Challenges and advantages of the automatic identification of character gender and professions in DIP

  • Emanoel Pires, Universidade Estadual do Maranhão
  • Marcia Caetano Langfeldt
  • Rebeca Schumacher Fuão
Keywords: distant reading, character identification, gender, profession

Abstract

The central objective of the Character Identification Challenge (DIP) project, developed in conjunction with Linguateca, is to build systems that automatically identify characters and some of their characteristics. Among these characteristics, this article focuses on the gender and professions of the characters. First, we justify our choice to work with these two kinds of data and present the different paths we took to establish guidelines for their identification. Manual identification of gender and profession is laborious and error-prone, which makes the use of computer systems increasingly common for this task. The analysis of professions would allow reflection on issues such as the definition of a profession, its frequency in Brazilian and Portuguese works, and possible relationships with literary genres. We present some results from distant and close reading of a group of works, contrast these results, and comment on the challenges and advantages we encountered throughout this task, which seem to reinforce our hypothesis that character identification is best served by a combined effort of automatic systems and human interpretation.

Published
2023-07-02
How to Cite
Pires, E., Langfeldt, M. C., & Fuão, R. S. (2023). Challenges and advantages of the automatic identification of character gender and professions in DIP. Linguamática, 15(1), 55-67. https://doi.org/10.21814/lm.15.1.401
Section
DIP - Character Identification Challenge