Enhancing Named Entity Recognition in Portuguese Literary Texts with Adaptive Models

Keywords: named entity recognition, adaptive pre-training, literature in Portuguese

Abstract

We investigate pre-training strategies to improve Named Entity Recognition (NER) in Portuguese literary texts. We introduce two domain-adapted models, LitBERT-CRF and LitBERTimbau, built on general-domain language models, and we evaluate cross-domain transfer learning alongside a general-domain baseline (BERT-CRF). Experimental results show that the adapted and domain-specific models outperform the generic baseline, reaching an F1 score above 75% under strict evaluation and above 80% under partial evaluation. Overall, our findings highlight the effectiveness of these strategies and their implications for literary NER.
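The abstract reports separate F1 scores for a strict and a partial evaluation scenario. The exact matching rules used in the study are not given here, so the following is only a minimal sketch of one common way to draw that distinction. It assumes that entities are `(start, end, label)` token spans with an exclusive end, that a strict match requires identical boundaries and type, and that a partial match requires the same type plus at least one overlapping token. The function names and the example spans are hypothetical and are used only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (start_token, end_token_exclusive, label).
Entity = Tuple[int, int, str]


def strict_match(gold: Entity, pred: Entity) -> bool:
    # Strict: boundaries and entity type must be identical.
    return gold == pred


def partial_match(gold: Entity, pred: Entity) -> bool:
    # Assumed partial rule: same type and the two spans share at least one token.
    return gold[2] == pred[2] and gold[0] < pred[1] and pred[0] < gold[1]


def entity_f1(gold: List[Entity], pred: List[Entity],
              match: Callable[[Entity, Entity], bool]) -> float:
    # Greedy one-to-one matching: each gold entity can be credited only once.
    used = set()
    tp = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i not in used and match(g, p):
                used.add(i)
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 2, "PER"), (5, 6, "LOC")]   # hypothetical gold spans
pred = [(0, 1, "PER"), (5, 6, "LOC")]   # the prediction truncates the PER span
print(entity_f1(gold, pred, strict_match))   # 0.5
print(entity_f1(gold, pred, partial_match))  # 1.0
```

In this toy case, the truncated person name counts as an error under strict evaluation but as a hit under the assumed partial rule. This is why partial scores are typically higher than strict ones for the same predictions. Other partial schemes, such as those that award half credit for boundary overlaps, would produce different numbers.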

Published
2025-06-17
How to Cite
O. Silva, M., & Moro, M. (2025). Enhancing Named Entity Recognition in Portuguese Literary Texts with Adaptive Models. Linguamática, 17(1), preprint. Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/443
Section
PROPOR 2024 | Invited Articles