Geração de Linguagem Natural para Conversão de Dados em Texto - Aplicação a um Assistente de Medicação para o Português

José Casimiro Pereira; António Teixeira

Trainable NLG for Data to Portuguese - With application to a Medication Assistant

José Casimiro Pereira Instituto Politécnico de Tomar
António Teixeira Dep Electrónica Telecomunicações e Informática Universidade de Aveiro

Abstract

New equipments, such as smartphones and tablets, are changing human computer interaction. These devices present several challenges, especially due to their small screen and keyboard. In order to use text and voice in multimodal interaction, it is essential to deploy modules to translate the internal information of the applications into sentences or texts, in order to display it on screen or synthesize it. Also, these modules must generate phrases and texts in the user's native language; the development should not require considerable resources; and the outcome of the generation should achieve a good degree of variability.

Our main objective is to propose, implement and evaluate a method of data conversion to Portuguese which can be developed with a minimum of time and knowledge, but without compromising the necessary variability and quality of what is generated. The developed system, for a Medication Assistant, is intended to create descriptions, in natural language, of medication to be taken. Motivated by recent results, we opted for an approach based on machine translation, with models trained on a small parallel corpus.

For that, a new corpus was created. With it, two variants of the system were trained: phrase-based translation and syntax-based translation. The two variants were evaluated by automatic measurements -- BLEU and Meteor -- and by humans. The results showed that a phrase-based approach produced better results than a syntax-based one: human evaluators evaluated 60% of phrase-based responses as good, or very good, compared to only 46% of syntax-based responses. Considering the corpus size, we judge this value (60%) as good.

PDF (Português (Portugal))

Published

2015-07-31

How to Cite

Pereira, J. C., & Teixeira, A. (2015). Trainable NLG for Data to Portuguese - With application to a Medication Assistant. Linguamática, 7(1), 3-21. Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/V7N1-1

Download Citation

Issue

Vol. 7 No. 1

Section

Research Articles

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).