A SMS-like language analyzer for Spanish

  • Andrés Alfonso Caurcel Díaz Universidad Politécnica de Madrid
  • Jose Maria Gomez Hidalgo Departamento de I+D, Optenet S.A.
  • Yovan Iñiguez del Rio Universidad Politécnica de Madrid
Keywords: SMS language, chat language, tokenizer, automated translation, Natural Language Processing, Age detection

Abstract

The use of specific language codes in chat and SMS-like messages is a major trend in electronic communications. This makes Natural Language Processing quite hard, even at the simplest step of text message tokenization, due to the widespread use of non-alphanumeric symbols, frequent typos and non-standard word separators.

In this work we present a new approach to text message tokenization, specific to the Spanish language as used in Social Networks and in electronic communications. Our system has been integrated into a more general application for age detection in Social Networks, developed in the research and development project WENDY. It has been quantitatively evaluated both directly and indirectly, through its impact on the general age-detection application, showing very promising results.
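
To illustrate why tokenization is hard for this kind of text, the following minimal sketch compares plain whitespace splitting with a small regular-expression tokenizer on an invented SMS-style Spanish message. The pattern, the function name `tokenize` and the sample message are assumptions made for illustration only; this is not the tokenizer described in the paper.

```python
import re

# Hypothetical token pattern for illustration; NOT the method from the paper.
TOKEN_RE = re.compile(
    r"""
    [:;=][-']?[()DPpOo]   # simple emoticons such as :) ;-) :D
    | \w+                 # runs of letters, digits or underscores
    | [^\w\s]             # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(message):
    """Return the list of tokens found in an SMS-like message."""
    return TOKEN_RE.findall(message)


# Invented example: a missing space after "?" and an emoticon glued to a word.
message = "hla q tal?toy bien:)"

print(message.split())
# ['hla', 'q', 'tal?toy', 'bien:)']

print(tokenize(message))
# ['hla', 'q', 'tal', '?', 'toy', 'bien', ':)']
```

Whitespace splitting leaves `tal?toy` and `bien:)` as single tokens, whereas the pattern separates the punctuation mark and keeps the emoticon whole. A sketch like this does not address typos or abbreviations such as `hla` and `q`, which would need further normalization.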

Published
2013-07-20
How to Cite
Caurcel Díaz, A. A., Gomez Hidalgo, J. M., & Iñiguez del Rio, Y. (2013). A SMS-like language analyzer for Spanish. Linguamática, 5(1), 31-39. Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/156
Section
Research Articles