Automatic text readability classification: resources and models for Galician
Abstract
The automatic readability assessment of texts is a growing field within Natural Language Processing, with significant implications for areas such as language teaching and learning and accessibility. In this context, this paper presents Corlega, the first corpus of Galician texts classified by readability level, consisting of 480 texts aimed at adult readers. The corpus covers 11 categories and 36 subcategories, including a variety of text types, genres and subgenres. The selection, compilation and classification of documents follow the standards of the iRead4Skills project, which develops resources and computational models for Portuguese, Spanish and French. To compile Corlega, this work defines six levels of readability in Galician and proposes a set of linguistic descriptors for each level. Using this taxonomy, we describe the compilation process of the corpus and its current distribution (across four of the six readability levels), as well as the main features of this new resource. Additionally, we used the corpus to train and evaluate automatic readability classification tools by fine-tuning monolingual and multilingual Transformer models and by implementing hybrid models. The results suggest that, with small training corpora, feature extraction from pre-trained models is an efficient method that achieves results competitive with supervised fine-tuning. However, combining corpora from different languages makes it possible to fine-tune multilingual models that perform better. Both the corpus and the models are available to the scientific community.
Copyright (c) 2025 Sandra Rodríguez Rey, Marcos Garcia

This work is licensed under a Creative Commons Attribution 4.0 International License.








