Extração de Relações utilizando Features Diferenciadas para Português

Erick Nilsen Pereira Souza; Daniela Barreiro Claro

Relation Extraction using Different Features in Portuguese

Authors

Erick Nilsen Pereira Souza, Federal University of Bahia
Daniela Barreiro Claro, Federal University of Bahia

Keywords:

Open Relation Extraction, Feature Selection

Abstract

Relation Extraction (RE) is an Information Extraction (IE) task responsible for discovering semantic relationships between concepts in unstructured text. When the extraction is not limited to a predefined set of relations, the task is called Open Relation Extraction (Open RE), whose main challenge is to reduce the proportion of invalid extractions among the relationships identified. Current methods based on a specific set of machine learning features eliminate many of the invalid extractions. However, these solutions have the disadvantage of being highly language-dependent. This dependence arises from the difficulty of finding the most representative set of features for the Open RE problem, given the peculiarities of each language. In this context, the present work assesses the difficulties of feature-based classification in Open RE for Portuguese, aiming to provide a basis for new solutions that can reduce language dependence in this task. The results indicate that many features that are representative in English cannot be mapped directly to Portuguese with satisfactory classification performance. Among the classification algorithms evaluated, J48 showed the best results, with an F-measure of 84.1%, followed by SVM (83.9%), Perceptron (82.0%), and Naive Bayes (79.9%).
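
For readers unfamiliar with the metric, the following is a minimal sketch of how an F-measure can be computed for a binary valid/invalid extraction classifier. It assumes the balanced F1 variant (the abstract does not state which variant was used), and the function name and the counts are hypothetical, chosen only for illustration; they are not taken from the article.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw classification counts.

    tp: extractions correctly labelled as valid (true positives)
    fp: invalid extractions labelled as valid (false positives)
    fn: valid extractions labelled as invalid (false negatives)
    beta: weight of recall relative to precision (1.0 gives balanced F1)
    """
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only (not results from the article).
score = f_measure(tp=90, fp=30, fn=10)
print(f"F-measure: {score:.3f}")
```

With beta set to 1.0 the score is the harmonic mean of precision and recall, so a classifier cannot reach a high value by favoring one at the expense of the other.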

Author Biography

Daniela Barreiro Claro, Federal University of Bahia

Daniela is a professor (Professora Adjunta) at the Federal University of Bahia. She received her Master's degree in Computer Science from the Federal University of Santa Catarina (2000) and her PhD in Computer Science from the Université d'Angers, France (2006). In 2009, she founded the FORMAS research group (Formalismos e Aplicações Semânticas; Formalisms and Semantic Applications), registered with CNPq, and has led the group since then, promoting research in Semantic Similarity and Information Extraction. Her main areas of interest are Semantic Similarity, Semantic Web Services, Information Extraction, Data Mining, and Information Retrieval.

Published

2014-12-26

Issue

Vol. 6 No. 2

Section

Research Articles

License

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

How to Cite

Relation Extraction using Different Features in Portuguese. (2014). Linguamática, 6(2), 57-65. https://linguamatica.com/index.php/linguamatica/article/view/v6n2-4
