Enhancing Automatic Hyphenation in Portuguese for TeX
Abstract
Portuguese hyphenation rules for TeX have been in use for over three decades and perform well overall. However, they still produce some incorrect hyphenations and miss some valid hyphenation points. Most of these points occur near word boundaries, where they are irrelevant for typographic purposes in TeX, but they can matter in specific contexts, such as words outside the standard lexicon or applications that rely on syllabic or typographic segmentation. Based on an analysis of 49,528 hyphenated words obtained from online dictionaries, we propose 120 new rules to be incorporated into the existing Portuguese hyphenation rules. We also used patgen to create new rules and to improve existing ones, but the rules it generated did not generalize well. Ultimately, the manually adjusted rules performed best, yielding a 2.1% increase in the success rate: the number of correct hyphenation points rose from 38,519 to 39,808, while the number of incorrect hyphenation points fell from 2,059 to 33. The manually crafted rules also generalized better than the rules generated automatically by patgen.
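To make the terms in the abstract concrete, here is a minimal sketch of how TeX-style (Liang) hyphenation patterns are applied to a word, and how correct, incorrect, and missed hyphenation points could be counted against a dictionary hyphenation. It is an illustration under stated assumptions, not the authors' evaluation code. The pattern list `TOY_PATTERNS` is hypothetical and deliberately incomplete, and the names `hyphenate`, `reference_breaks`, and `score` and the `left_min`/`right_min` defaults are ours.

```python
import re

# Hypothetical toy patterns for illustration only (NOT the real Portuguese set).
# Odd digits allow a break at that position, even digits forbid it.
TOY_PATTERNS = ["1c", "1l", "1r", "1s", "1v", "v2r"]


def parse_pattern(pattern):
    """Split a pattern such as 'v2r' into its letters 'vr' and weights [0, 2, 0]."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight of the gap before letters[pos]
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=1, right_min=1):
    """Return the set of break positions k, meaning a break before word[k]."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] is the gap before padded[i]
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                # Competing patterns are resolved by keeping the highest weight.
                scores[start + offset] = max(scores[start + offset], weight)
            start = padded.find(letters, start + 1)
    last = len(word) - right_min
    return {k for k in range(left_min, last + 1) if scores[k + 1] % 2 == 1}


def reference_breaks(hyphenated):
    """Convert a dictionary form such as 'car-ro' into break positions {3}."""
    breaks, letters_seen = set(), 0
    for ch in hyphenated:
        if ch == "-":
            breaks.add(letters_seen)
        else:
            letters_seen += 1
    return breaks


def score(word, hyphenated, patterns):
    """Count (correct, incorrect, missed) hyphenation points for one word."""
    predicted = hyphenate(word, patterns)
    expected = reference_breaks(hyphenated)
    return (len(predicted & expected),
            len(predicted - expected),
            len(expected - predicted))


if __name__ == "__main__":
    print(hyphenate("livro", TOY_PATTERNS))        # 'v2r' suppresses a break in 'vr'
    print(score("carro", "car-ro", TOY_PATTERNS))  # the toy set yields one wrong point
```

With this toy set, `livro` is split as li-vro because `v2r` overrides `1r`, while `carro` receives a spurious break (ca-r-ro) in addition to the correct one. A missing inhibiting pattern of this kind is the sort of error that an added rule can remove.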
Copyright (c) 2024 Leonardo Carneiro Araujo, Aline Benevides
This work is licensed under a Creative Commons Attribution 4.0 International License.