Automatic categorization of Spanish texts into linguistic registers: a contrastive analysis

  • John Roberto Rodríguez, Universidad de Barcelona
  • Maria Salamó Llorente, Universidad de Barcelona
  • Maria Antònia Martí Antonín, Universidad de Barcelona
Keywords: Natural language processing, machine learning, linguistic register

Abstract

Collaborative software such as recommender systems can benefit from the automatic classification of texts into linguistic registers. First, the linguistic register provides information about the users' profiles and the context of the recommendation. Second, taking the characteristics of each type of text into account can help to improve existing natural language processing methods. In this paper we contrast two approaches to register categorization for Spanish: the first focuses on morphosyntactic patterns and the second on lexical patterns. In the experimental evaluation we tested 38 machine learning algorithms, obtaining a precision higher than 89%.

Author Biographies

John Roberto Rodríguez, Universidad de Barcelona

Predoctoral fellow (FI)

Centre de Llenguatge i Computació (CLiC)

Department of Linguistics

Universidad de Barcelona

Maria Salamó Llorente, Universidad de Barcelona

Professor in the Department of Applied Mathematics and Analysis

Universidad de Barcelona

Maria Antònia Martí Antonín, Universidad de Barcelona

Head of the Department of General Linguistics

Director of CLiC, Centre de Llenguatge i Computació

Universidad de Barcelona

Published
2013-07-20
How to Cite
Roberto Rodríguez, J., Salamó Llorente, M., & Martí Antonín, M. A. (2013). Automatic categorization of Spanish texts into linguistic registers: a contrastive analysis. Linguamática, 5(1), 59-67. Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/153
Section
Research Articles