Automatic literary school assignment

Linguistic-statistical studies of lusophone literature

  • Diana Santos (Linguateca / Universidade de Oslo)
  • Emanoel Pires
  • Cláudia Freitas
  • Rebeca Schumacher Fuão
  • João Marques Lopes
Keywords: distant reading, corpus linguistics, literary school, Portuguese, Brazilian literature, Portuguese literature, lusophone literature

Abstract

In this paper, we use a set of syntactic and semantic features of Portuguese to automatically classify literary works into literary periods and/or schools for two different literary collections, and we address the issue of their appropriateness.

The first task attempts to replicate the work of Barufaldi and colleagues, who applied compression methods to 37 Brazilian works by 15 different authors and classified them into 4 different literary schools.
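Compression-based classification rests on a simple idea: if two texts share vocabulary and phrasing, compressing them together costs little more than compressing one of them alone. The sketch below illustrates this with the normalized compression distance (NCD) and a 1-nearest-neighbour rule. It is an assumption for illustration only: we do not know which compressor, distance, or decision rule Barufaldi and colleagues used, and the function names and toy texts are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).

    Values near 0 mean the texts compress well together (shared material);
    values near 1 mean concatenating them saves almost nothing.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(work: str, labelled: dict) -> str:
    """Assign `work` to the school of its nearest labelled text by NCD (1-NN).

    Hypothetical helper: `labelled` maps a school name to a non-empty list of
    texts already known to belong to that school.
    """
    best_school, best_dist = None, float("inf")
    for school, texts in labelled.items():
        for text in texts:
            d = ncd(work, text)
            if d < best_dist:
                best_school, best_dist = school, d
    return best_school


if __name__ == "__main__":
    # Toy, made-up "texts" standing in for two schools.
    sea = "the sea the sea the waves break on the shore " * 30
    land = "sertão seca gado vaqueiro caatinga mandacaru " * 30
    query = "the waves break on the sea the shore the sea " * 10
    print(classify(query, {"A": [sea], "B": [land]}))
```

One design caveat: zlib (DEFLATE) only looks back 32 KB, so for whole novels the concatenated text quickly exceeds the window and the distance degrades; a compressor with a much larger window, or chunked comparisons, would be needed on real works.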

The second collection, of 192 novels published in Portugal and Brazil between 1840 and 1919, includes many works that cannot be assigned to a single literary school, and which have been classified (not mutually exclusively) as romantic, realist, naturalist, symbolist, decadent, and modernist.

We use classification techniques in R, such as discriminant analysis and support vector machines, for the first task, and correspondence analysis for the second collection. We also apply topic modeling to (distinct subsets of) the second collection in order to investigate whether this technique can identify recurrent topics for different literary schools.
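Correspondence analysis starts from a contingency table (for instance, rows could be works and columns counts of linguistic features; we do not know the exact table analysed here) and computes the matrix of standardized residuals, which it then decomposes with a singular value decomposition to obtain the low-dimensional map. The authors worked in R; the Python sketch below is not their code and covers only the residual matrix and the total inertia, omitting the SVD step (which could be done with, e.g., `numpy.linalg.svd`). It assumes every row and column has a non-zero total.

```python
import math


def standardized_residuals(table):
    """Matrix S that correspondence analysis decomposes by SVD.

    S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), where p_ij = n_ij / n and
    r_i, c_j are the row and column masses (row/column totals divided by n).
    Assumes no row or column sums to zero.
    """
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]          # row masses
    c = [sum(col) / n for col in zip(*table)]    # column masses
    return [
        [(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
        for i in range(len(r))
    ]


def total_inertia(table):
    """Sum of squared standardized residuals; equals the chi-square statistic divided by n."""
    return sum(s * s for row in standardized_residuals(table) for s in row)


if __name__ == "__main__":
    # Hypothetical 2x2 table of counts, for illustration only.
    print(total_inertia([[10, 20], [30, 40]]))
```

Total inertia is zero when rows and columns are independent, and grows as the row profiles diverge from one another; the SVD then spreads this inertia over successive dimensions.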

Published: 2020-06-29

How to cite: Santos, D., Pires, E., Freitas, C., Fuão, R. S., & Lopes, J. M. (2020). Automatic literary school assignment: Linguistic-statistical studies of lusophone literature. Linguamática, 12(1), 81-95. https://doi.org/10.21814/lm.12.1.314

Section: Research Articles