Making Predictions with Textual Contents in Portuguese

  • Indira Gandi Mascarenhas de Brito, Instituto Superior Técnico
  • Bruno Martins, Instituto Superior Técnico
Keywords: Text-Driven Forecasting, Learning Regression Models, Word Clustering, Feature Engineering for NLP

Abstract

Forecasting real-world quantities from textual descriptions has recently attracted significant research interest, although previous studies have focused only on applications involving the English language. This paper presents an experimental study on making predictions with textual contents in Portuguese, using documents from three distinct domains. We specifically report on experiments with different types of regression models, with state-of-the-art feature weighting schemes, and with features derived from cluster-based word representations. Our experiments show that regression models using the textual information achieve better results than simple baselines such as the average value in the training data, and that richer document representations (i.e., using Brown clusters and the Delta-BM25 feature weighting scheme) result in slight performance improvements.
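
The average-value baseline mentioned above can be illustrated with a minimal sketch. The function names, the toy documents, and the choice of mean absolute error as the metric are assumptions made for illustration only; they are not taken from the paper.

```python
from statistics import mean


def mean_baseline(train_targets):
    """Build a predictor that ignores the text and always returns the training mean."""
    avg = mean(train_targets)
    return lambda _document: avg


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical toy data: target values for training documents,
# plus two held-out documents with their true target values.
train_y = [10.0, 20.0, 30.0]
test_docs = ["um hotel excelente", "um filme fraco"]
test_y = [15.0, 25.0]

predict = mean_baseline(train_y)
baseline_mae = mean_absolute_error(test_y, [predict(d) for d in test_docs])
print(baseline_mae)
```

A text-based regression model would be judged useful under this setup only if its error on the held-out documents were lower than `baseline_mae`.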

Published
2014-07-31
How to Cite
Mascarenhas de Brito, I. G., & Martins, B. (2014). Making Predictions with Textual Contents in Portuguese. Linguamática, 6(1), 53-68. Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/v6n1-04
Section
Research Articles