Lexicometric strategies to detect textual specificities

  • Álvaro Iriarte Sanromán Universidade do Minho
  • Pablo Gamallo Otero Universidade de Santiago de Compostela
  • Alberto Simões Instituto Politécnico do Cávado e do Ave - 2Ai Lab
Keywords: Kullback–Leibler divergence, lexical divergence, lexicometry

Abstract

In this article we propose an automatic strategy for finding lexical specificities within sets of texts, using simple lexical units and multiword expressions (MWE).

We propose a methodology for calculating the divergence of lemma and MWE distributions that automatically finds differences and similarities between unlabeled texts. This methodology can then be used to identify groups of texts to which quantitative and qualitative analyses will be applied (semi-automatically and/or with human intervention).
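As a rough illustration of what a divergence between lemma distributions can look like, the sketch below computes a smoothed Kullback–Leibler divergence between two lists of lemmas. It is not the authors' implementation: the function names, the additive smoothing with `alpha`, the base-2 logarithm, and the toy token lists are all assumptions made for this example.

```python
import math
from collections import Counter


def distribution(tokens, vocabulary, alpha=1.0):
    """Relative frequencies over a shared vocabulary.

    Additive smoothing (alpha) is an assumption of this sketch; it keeps
    every probability non-zero so the logarithm below is always defined.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def divergence(tokens_a, tokens_b, alpha=1.0):
    """Divergence of text A from text B over their joint vocabulary."""
    vocabulary = set(tokens_a) | set(tokens_b)
    p = distribution(tokens_a, vocabulary, alpha)
    q = distribution(tokens_b, vocabulary, alpha)
    return kl_divergence(p, q)


# Toy, invented lemma lists (not data from the article).
specialized_a = "fever child vaccine dose child fever".split()
specialized_b = "child vaccine fever dose infant vaccine".split()
literary = "sea night wind sea dream night".split()

print(divergence(specialized_a, specialized_b))
print(divergence(specialized_a, literary))
```

Note that the Kullback–Leibler divergence is asymmetric, so `divergence(a, b)` and `divergence(b, a)` generally differ. A symmetric comparison would need an extra step, such as averaging the two directions. Multiword expressions could be handled the same way by treating each MWE as a single token.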

In a first test, we used two specialized texts (from the field of Paediatrics) and a literary text, on the assumption that the specialized texts should diverge more from the literary text than from each other. As the tests showed the expected trend, we decided to apply the same methodology to a second set of texts (three sets of interviews with visitors to the city of Santiago de Compostela).

Author Biographies

Álvaro Iriarte Sanromán, Universidade do Minho

Departamento de Estudos Portugueses e Lusófonos

Área Disciplinar de Linguística

Pablo Gamallo Otero, Universidade de Santiago de Compostela
Member of ProLNat@GE and CiTIUS
Alberto Simões, Instituto Politécnico do Cávado e do Ave - 2Ai Lab

Departamento de Tecnologias

Published
2018-08-04
How to Cite
Iriarte Sanromán, Álvaro, Gamallo Otero, P., & Simões, A. (2018). Lexicometric strategies to detect textual specificities. Linguamática, 10(1), 19-26. https://doi.org/10.21814/lm.10.1.263
Section
Research Articles