MultiWOZ-PT: A Task-oriented Dialogue Dataset in Portuguese

  • Patrícia Ferreira, Universidade de Coimbra
  • Francisco Pais
  • Catarina Silva
  • Ana Alves
  • Hugo Gonçalo Oliveira
Keywords: task-oriented dialogue dataset, translation, MultiWOZ, dialogue state tracking, intent recognition, slot filling

Abstract

Despite the language's widespread use, publicly available annotated dialogue corpora in Portuguese are scarce. This poses a significant challenge for the development of effective dialogue systems that communicate in Portuguese. With this in mind, we present MultiWOZ-PT, a new task-oriented dialogue dataset that results from the manual translation of dialogues in the MultiWOZ dataset into the European variety of Portuguese, together with an adaptation of its database. We provide comprehensive guidelines and insights into the process of creating MultiWOZ-PT and, to demonstrate its practical utility, we conduct experiments in two task-oriented scenarios: Intent Recognition and Dialogue State Tracking, both useful for dialogue systems. The reported results illustrate the dataset's effectiveness and its potential for training and evaluating language understanding and dialogue management models for Portuguese. MultiWOZ-PT thus constitutes a significant contribution to the computational processing of this language, fostering further research and development.

Published
2024-12-27
How to Cite
Ferreira, P., Pais, F., Silva, C., Alves, A., & Oliveira, H. G. (2024). MultiWOZ-PT: A Task-oriented Dialogue Dataset in Portuguese. Linguamática, 16(2), preprint. Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/431
Section
PROPOR 2024 | Invited Articles