Evaluating computational resources for Portuguese

  • Matilde Gonçalves
  • Luisa Coheur (INESC-ID/Instituto Superior Técnico)
  • Jorge Baptista
  • Ana Mineiro
Keywords: natural language processing, evaluation of resources, Portuguese language, part-of-speech tagging, named entity recognition, dependency parsing

Abstract

Several tools are available for processing Portuguese. However, because they rest on different design choices (different pre-processing, different label sets, etc.), it is difficult to compare their performance. In this work, we evaluate publicly available, free tools that perform Part-of-Speech Tagging and Named Entity Recognition for Portuguese. We evaluate twelve different models for the first task and eight for the second. All the resources used in this evaluation (mapping tables between labels, test corpora, etc.) will be made available, allowing others to replicate or fine-tune the results presented here. We also present a qualitative analysis of two dependency parsers. To the best of our knowledge, no recent evaluation covering the currently available tools has been carried out for Portuguese.
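Because the tools use different label sets, comparing them requires a mapping table that projects each tool's tags onto a shared tagset before scoring. A minimal sketch, assuming a hypothetical mapping from Universal Dependencies-style tags to a small common tagset (the table and tag names below are illustrative, not the mapping tables released with this work) and assuming gold and system output share the same tokenisation:

```python
from typing import Dict, List

# Hypothetical mapping table: one tool's tags -> a shared coarse tagset.
# Illustrative only; not the authors' released mapping tables.
TOOL_TO_COMMON: Dict[str, str] = {
    "NOUN": "N",
    "PROPN": "N",
    "VERB": "V",
    "AUX": "V",
    "ADJ": "ADJ",
    "DET": "DET",
    "ADP": "PREP",
    "PUNCT": "PUNCT",
}


def map_tags(tags: List[str], table: Dict[str, str], unknown: str = "OTHER") -> List[str]:
    """Project tool-specific tags onto the common tagset; unmapped tags become `unknown`."""
    return [table.get(tag, unknown) for tag in tags]


def accuracy(gold: List[str], predicted: List[str]) -> float:
    """Token-level accuracy; assumes both sequences are aligned token by token."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy sentence: "O gato comeu o peixe ." with gold tags already in the common tagset.
    gold = ["DET", "N", "V", "DET", "N", "PUNCT"]
    # Hypothetical tool output in its own tagset; "peixe" is mis-tagged as ADJ.
    tool_output = ["DET", "NOUN", "VERB", "DET", "ADJ", "PUNCT"]
    mapped = map_tags(tool_output, TOOL_TO_COMMON)
    print(mapped)
    print(accuracy(gold, mapped))
```

Requiring aligned tokens is a simplification: in practice, differences in pre-processing (for example, whether a contraction such as "do" is split into "de" + "o") have to be reconciled before tags can be compared at all.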

Published
2020-12-31
How to Cite
Gonçalves, M., Coheur, L., Baptista, J., & Mineiro, A. (2020). Evaluating computational resources for Portuguese. Linguamática, 12(2), 51-68. https://doi.org/10.21814/lm.12.2.331
Section
Research Articles