Lexicometric strategies to detect textual specificities

Authors

  • Álvaro Iriarte Sanromán University of Minho
  • Pablo Gamallo Otero Universidade de Santiago de Compostela
  • Alberto Simões Instituto Politécnico do Cávado e do Ave - 2Ai Lab

DOI:

https://doi.org/10.21814/lm.10.1.263

Keywords:

Kullback–Leibler divergence, lexical divergence, lexicometry

Abstract

In this article we define and develop an automatic strategy for finding lexical specificities within sets of texts, using both simple lexical units and multiword expressions (MWE).

We propose a methodology for calculating the divergence between lemma and MWE distributions, which automatically finds differences and similarities between unlabeled texts. The methodology can then be used to identify groups of texts to which quantitative and qualitative analyses will be applied (semi-automatically and/or with human intervention).

In a first test, we used two specialized texts (from the field of Paediatrics) and a literary text, on the assumption that the specialized texts should diverge more from the literary text than from each other. Since these tests showed the expected trend, we applied the same methodology to a second set of texts (three sets of interviews with visitors to the city of Santiago de Compostela).
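As a rough illustration of what a divergence between lemma distributions can look like in practice, the following minimal sketch computes a symmetrised Kullback–Leibler divergence between two lists of lexical units. It is not the authors' implementation: the additive smoothing (`alpha`), the symmetrisation, the base-2 logarithm, the function names, and the toy lemma lists are all assumptions made for this example, and the input is assumed to be already lemmatized (an MWE could simply be passed as a single unit).

```python
import math
from collections import Counter


def distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed relative frequencies over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """D_KL(P || Q) in bits; p and q share the same keys, all values > 0."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def divergence(tokens_a, tokens_b, alpha=1.0):
    """Symmetrised KL divergence between two lists of lexical units."""
    vocabulary = set(tokens_a) | set(tokens_b)
    p = distribution(tokens_a, vocabulary, alpha)
    q = distribution(tokens_b, vocabulary, alpha)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Toy, made-up lemma lists standing in for two specialized texts and a literary one.
med_1 = "child fever dose vaccine child infection dose".split()
med_2 = "vaccine child dose fever infection child".split()
novel = "sea wind night love sea house".split()

print(divergence(med_1, med_2))  # specialized vs. specialized
print(divergence(med_1, novel))  # specialized vs. literary
```

On this toy data the two specialized lists diverge less from each other than either does from the literary list, which is the kind of trend the first test looks for. Smoothing is used here only so that units absent from one text do not make the divergence undefined.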

Author Biographies

  • Álvaro Iriarte Sanromán, University of Minho

    Departamento de Estudos Portugueses e Lusófonos

    Área Disciplinar de Linguística

  • Pablo Gamallo Otero, Universidade de Santiago de Compostela
    Member of ProLNat@GE and CiTIUS
  • Alberto Simões, Instituto Politécnico do Cávado e do Ave - 2Ai Lab

    Departamento de Tecnologias

Published

2018-08-04

Section

Research Articles

How to Cite

Lexicometric strategies to detect textual specificities. (2018). Linguamática, 10(1), 19-26. https://doi.org/10.21814/lm.10.1.263