Relation Extraction using Different Features in Portuguese

  • Erick Nilsen Pereira Souza Universidade Federal da Bahia
  • Daniela Barreiro Claro Universidade Federal da Bahia

Abstract

Relation Extraction (RE) is an Information Extraction (IE) task responsible for discovering semantic relationships between concepts in unstructured text. When extraction is not limited to a predefined set of relations, the task is called Open Relation Extraction, whose main challenge is to reduce the proportion of invalid extractions among the relationships identified. Current methods that rely on a specific set of machine-learning features eliminate many of the invalid extractions. However, these solutions have the disadvantage of being highly language-dependent. This dependence arises from the difficulty of finding the most representative set of features for the Open RE problem, given the peculiarities of each language. In this context, the present work assesses the difficulties of feature-based classification for Open Relation Extraction in Portuguese, with the aim of laying the groundwork for new solutions that reduce language dependence in this task. The results indicate that many features that are representative in English cannot be mapped directly to Portuguese with satisfactory classification performance. Among the classification algorithms evaluated, J48 achieved the best results, with an F-measure of 84.1%, followed by SVM (83.9%), Perceptron (82.0%), and Naive Bayes (79.9%).
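The abstract describes filtering candidate extractions with a feature-based classifier and comparing classifiers by F-measure. The sketch below illustrates that setup with a plain perceptron (one of the four algorithms named) trained on hand-crafted binary features of (arg1, relation, arg2) triples. It is not the authors' implementation: the `Candidate` type, the feature functions, the simplified POS tagset ("V", "PREP", ...), and the toy Portuguese examples are all hypothetical. The F-measure here is the usual harmonic mean of precision and recall for the "valid" class, F = 2PR / (P + R); the abstract does not state whether its reported values are per-class or averaged across classes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A candidate open-RE triple (arg1, relation phrase, arg2).

    rel_pos holds one POS tag per relation-phrase token; the tag names
    ("V", "PREP", ...) are a hypothetical simplified tagset.
    """
    arg1: str
    rel: str
    arg2: str
    rel_pos: List[str]


def features(c: Candidate) -> List[float]:
    """Hypothetical binary features, for illustration only (not the paper's set)."""
    return [
        1.0,  # bias term
        1.0 if c.rel_pos and c.rel_pos[0] == "V" else 0.0,      # relation starts with a verb
        1.0 if c.rel_pos and c.rel_pos[-1] == "PREP" else 0.0,  # relation ends with a preposition
        1.0 if len(c.rel.split()) <= 4 else 0.0,                # short relation phrase
        1.0 if c.arg1[:1].isupper() else 0.0,                   # arg1 capitalized (proper-noun cue)
        1.0 if c.arg2[:1].isupper() else 0.0,                   # arg2 capitalized
    ]


def predict(w: List[float], x: List[float]) -> int:
    """Return 1 (valid extraction) if the weighted sum is positive, else 0 (invalid)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


def train_perceptron(X: List[List[float]], y: List[int], epochs: int = 20) -> List[float]:
    """Classic perceptron rule: on each mistake, add or subtract the feature vector."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if predict(w, xi) != yi:
                sign = 1.0 if yi == 1 else -1.0
                w = [wj + sign * xj for wj, xj in zip(w, xi)]
    return w


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (harmonic mean of P and R) for the 'valid' class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labelled data (1 = valid extraction, 0 = invalid), invented for this sketch.
TRAIN = [
    (Candidate("Salvador", "fica em", "Bahia", ["V", "PREP"]), 1),
    (Candidate("UFBA", "é uma universidade de", "Salvador", ["V", "ART", "N", "PREP"]), 1),
    (Candidate("ele", "disse que a empresa vai", "crescer", ["V", "CONJ", "ART", "N", "V"]), 0),
    (Candidate("o", "de", "ano", ["PREP"]), 0),
]

if __name__ == "__main__":
    w = train_perceptron([features(c) for c, _ in TRAIN], [y for _, y in TRAIN])
    test = [
        (Candidate("Recife", "fica em", "Pernambuco", ["V", "PREP"]), 1),
        (Candidate("isso", "mostra que o projeto foi", "aprovado", ["V", "CONJ", "ART", "N", "V"]), 0),
    ]
    y_pred = [predict(w, features(c)) for c, _ in test]
    print(precision_recall_f1([y for _, y in test], y_pred))  # prints (1.0, 1.0, 1.0)
```

On this toy data the learned weights end up as `[-1.0, 0.0, 0.0, 0.0, 1.0, 1.0]`, so the classifier relies only on the capitalization cues. This is a small illustration of how strongly a feature-based filter depends on which surface features are available, the kind of language-specific behaviour the abstract points to.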

Author Biography

Daniela Barreiro Claro, Universidade Federal da Bahia
Daniela is a professor (Professora Adjunta) at the Universidade Federal da Bahia. She received her Master's degree in Computer Science from the Universidade Federal de Santa Catarina (2000) and her PhD in Computer Science from the Université d'Angers, France (2006). In 2009 she founded the FORMAS research group (Formalismos e Aplicações Semânticas, "Formalisms and Semantic Applications"), registered with CNPq, and has led it ever since, promoting research in Semantic Similarity and Information Extraction. Her main research interests are Semantic Similarity, Semantic Web Services, Information Extraction, Data Mining, and Information Retrieval.

Published
2014-12-26
How to Cite
Souza, E. N. P., & Claro, D. B. (2014). Relation Extraction using Different Features in Portuguese. Linguamática, 6(2), 57-65. Retrieved from https://linguamatica.com/index.php/linguamatica/article/view/v6n2-4
Section
Research Articles