Social Network Multilingual Author Profiling using character and POS n-grams

  • Carlos-Emiliano González-Gallardo LIA-Université d’Avignon
  • Juan-Manuel Torres-Moreno Laboratoire Informatique d'Avignon - UAPV
  • Azucena Montes Rendón CENIDET
  • Gerardo Sierra GIL - UNAM

Abstract

In this paper we present an algorithm that combines the stylistic features represented by character n-grams and part-of-speech (POS) n-grams to classify multilingual social network documents. For both n-gram groups, a dynamic normalization by context was applied to capture as much as possible of the stylistic information encoded in the documents (emoticons, character flooding, capital letters, references to other users, hyperlinks, hashtags, etc.).
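As a rough illustration of the idea, the sketch below normalizes a few social network elements and then counts character n-grams. It is a minimal sketch under stated assumptions, not the authors' implementation: the placeholder symbols (`<url>`, `<user>`, `<tag>`), the regular expressions, and the function names `normalize` and `char_ngrams` are hypothetical, and POS n-grams are left out because they would require a language-specific tagger.

```python
import re
from collections import Counter

# Hypothetical patterns and placeholder symbols; the paper's actual
# normalization scheme is not specified in this abstract.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def normalize(text):
    """Replace hyperlinks, user references and hashtags with fixed symbols.

    URLs are substituted first so that '#' or '@' characters inside a link
    are not mistaken for hashtags or mentions.
    """
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    text = HASHTAG_RE.sub("<tag>", text)
    return text


def char_ngrams(text, n):
    """Count every overlapping character n-gram in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    tweet = "hi @bob see http://x.co #fun"
    cleaned = normalize(tweet)
    print(cleaned)
    print(char_ngrams(cleaned, 3).most_common(5))
```

Case and repeated characters are deliberately left untouched here, since capital letters and character flooding are among the stylistic signals the abstract mentions.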

The algorithm was applied to two different corpora: the Author Profiling training tweets of PAN-CLEF 2015 and the corpus of "Comments of Mexico City in time" (CCDMX). Results show up to 90% accuracy.

Author Biography

Juan-Manuel Torres-Moreno, Laboratoire Informatique d'Avignon - UAPV
Head of the Natural Language Processing Team (TALNE - LIA)
Published
2016-07-22
How to Cite
González-Gallardo, C.-E., Torres-Moreno, J.-M., Montes Rendón, A., & Sierra, G. (2016). Social Network Multilingual Author Profiling using character and POS n-grams. Linguamática, 8(1), 21-29. Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/v8n1-2
Section
Research Articles