Challenges and advantages of the automatic identification of character gender and professions in DIP

Authors

  • Emanoel Pires State University of Maranhão
  • Marcia Caetano Langfeldt
  • Rebeca Schumacher Fuão

DOI:

https://doi.org/10.21814/lm.15.1.401

Keywords:

distant reading, character identification, gender, profession

Abstract

The central objective of the Character Identification Challenge (DIP) project, developed in conjunction with Linguateca, is to build systems for the automatic identification of characters and some of their characteristics. Among these characteristics, this article focuses on the gender and professions of the characters. First, we justify our choice to work with these two characteristics and present the different paths we have taken to establish guidelines for their identification. Manual identification of gender and profession is exhausting and susceptible to errors, making the use of computer systems increasingly common for this task. The analysis of professions would allow reflection on issues such as the definition of a profession, its frequency in Brazilian and Portuguese works, and possible relationships with literary genres. We present some results from distant and close reading of a group of works, contrast these results, and comment on the challenges and advantages we encountered throughout this task, which seem to reinforce our hypothesis that character identification is best served by a combined effort of automatic systems and human interpretation.

Published

2023-07-02

Issue

Section

DIP - Character Identification Challenge

How to Cite

Challenges and advantages of the automatic identification of character gender and professions in DIP. (2023). Linguamática, 15(1), 55-67. https://doi.org/10.21814/lm.15.1.401