Annotating, analysing and learning named entities in Portuguese historical texts (18th century)

Keywords: named entity recognition, 18th century

Abstract

This article presents a study of 18th-century Portuguese texts that focuses on analysing named entities to enhance the texts' value for historical research. To this end, an annotated corpus was developed from a primary source (the Parish Memories), which was transcribed, revised, and standardised.

The distribution of named entities in the source was then analysed to reflect on the variations in the defined categories, which were established according to historians' requirements. The annotated corpus was subsequently used to develop Named Entity Recognition (NER) models that accommodate the complexity of historical analysis. Several solutions and language models for the NER task were trained and evaluated, with the best models achieving an F1 score of 0.70. This work thus demonstrates the usefulness of named entity recognition in the analysis of historical texts and provides a model capable of extending the annotations to a larger set of texts with the same characteristics.

Published
2025-06-30
How to Cite
Vieira, R., Olival, F., Cameron, H., Farrica, F., Santos, J., & Reyes, D. (2025). Annotating, analysing and learning named entities in Portuguese historical texts (18th century). Linguamática, 17(1), 121-136. https://doi.org/10.21814/lm.17.1.445
Section
PROPOR 2024 | Invited Articles