Social Network Multilingual Author Profiling using character and POS n-grams

Authors

  • Carlos-Emiliano González-Gallardo LIA-Université d’Avignon
  • Juan-Manuel Torres-Moreno Laboratoire Informatique d'Avignon - UAPV
  • Azucena Montes Rendón CENIDET
  • Gerardo Sierra GIL - UNAM

Keywords:

Text mining, Machine learning, Classification, n-grams, Blogs, Tweets, Social networks

Abstract

In this paper we present an algorithm that combines the stylistic features represented by character and POS n-grams to classify multilingual social network documents. A dynamic, context-dependent normalization was applied to both n-gram groups to extract as much as possible of the stylistic information encoded in the documents (emoticons, character flooding, capital letters, references to other users, hyperlinks, hashtags, etc.).

The algorithm was applied to two different corpora: the training tweets of the PAN-CLEF 2015 Author Profiling task and the "Comments of Mexico City in time" corpus (CCDMX). Results show an accuracy of up to 90%.
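
To illustrate the kind of feature extraction described in the abstract, the following is a minimal Python sketch, not the authors' implementation. The regular expressions, the placeholder symbols (`<url>`, `<user>`, `<tag>`), the choice to cap character flooding at three repetitions, and the function names are all assumptions made for illustration; the abstract does not specify the actual normalization rules. POS tags are assumed to come from an external tagger and are passed in as a plain list.

```python
import re
from collections import Counter

# Hypothetical patterns and placeholders; the paper's real normalization
# rules are not given in the abstract.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
FLOOD_RE = re.compile(r"(.)\1{2,}")  # same character repeated 3+ times


def normalize(text):
    """Replace context-specific items with placeholders and cap flooding."""
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    text = HASHTAG_RE.sub("<tag>", text)
    # Keep the flooding signal but collapse it to exactly three characters.
    text = FLOOD_RE.sub(r"\1\1\1", text)
    return text


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pos_ngrams(tags, n=2):
    """Count overlapping n-grams over a list of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


if __name__ == "__main__":
    tweet = "Sooooo cool @bob http://x.co #fun"
    clean = normalize(tweet)
    print(clean)
    print(char_ngrams(clean, 3).most_common(3))
    # Tag names below are illustrative; any tagger's tag set would do.
    print(pos_ngrams(["ADV", "ADJ", "NOUN"], 2))
```

The two counters could then be merged into a single feature vector per document and passed to any standard classifier; which classifier and weighting scheme to use is left open here.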

Author Biography

  • Juan-Manuel Torres-Moreno, Laboratoire Informatique d'Avignon - UAPV
    Head of the Natural Language Processing Team (TALNE - LIA)

Published

2016-07-22

Section

Research Articles

How to Cite

Social Network Multilingual Author Profiling using character and POS n-grams. (2016). Linguamática, 8(1), 21-29. https://linguamatica.com/index.php/linguamatica/article/view/v8n1-2