BrAgriNews: A Temporal-Causal Brazilian-Portuguese Corpus for Agriculture

  • Brett Drury National University of Ireland Galway
  • Robson Fernandes ICMC, University of São Paulo
  • Alneu de Andrade Lopes ICMC, University of São Paulo

Abstract

Interest in applying machine learning and artificial intelligence to agricultural problems has recently risen sharply in both academia and industry. Text mining and related natural language processing techniques have rarely been used to tackle agricultural problems, and at the time of writing there was only a single such project in the Portuguese language. Researchers' failure to apply text mining to Portuguese texts for agricultural problems may be due to a lack of freely available corpora. To address the lack of a Portuguese-language, agriculture-centric corpus, we are releasing a Brazilian-Portuguese agricultural language resource, which is described in this paper. The corpus is partially non-contiguous and spans the period from 1996 to 2016. It consists of news stories scraped from Brazilian news sites and annotated with the following information types: causal, sentiment, and named entities (including temporal expressions). The corpus is accompanied by additional resources: a treebank; lists of frequent unigrams, bigrams and trigrams; and lists of words or phrases that journalists have identified as either "important" or domain specific. We hope that the release of this corpus will stimulate the adoption of text mining in agriculture within the Lusophone research community.

Author Biographies

Brett Drury, National University of Ireland Galway

Brett is currently a Senior Research Fellow at the National University of Ireland Galway and a member of its machine learning group. Prior to this position he was a post-doctoral researcher and FAPESP grant holder at the University of São Paulo under the supervision of Alneu Lopes. He obtained his doctoral degree in computer science at the University of Porto under the guidance of Luis Torgo and José João Almeida. Before that, Brett spent 14 years in industry as a software engineer. He holds undergraduate and postgraduate qualifications from Plymouth University and the University of London.

Robson Fernandes, ICMC, University of São Paulo

Robson is a Master's student in Mathematics, Statistics and Computing Applied to Industry at the Institute of Mathematical and Computer Sciences of the University of São Paulo (ICMC-USP), under the supervision of Alneu Lopes and the co-supervision of Brett Drury. He holds a postgraduate qualification in Distributed Software Architecture from the Pontifical Catholic University of Minas Gerais (PUC-MG), Brazil; an MBA in Service Oriented Software Engineering (SOA) from METROCAMP, Brazil; and an undergraduate degree in Information Technology Management from Anhanguera Educacional College, Brazil. He currently lectures on the postgraduate course in Software Engineering and Management and Governance in Information Technology at Sacred Heart University (USC), Brazil, and also works as a software developer.

Alneu de Andrade Lopes, ICMC, University of São Paulo

Alneu is an Assistant Professor at the University of São Paulo in São Carlos and a member of the Machine Learning Group (LABIC). His research interests lie in the fields of machine learning and data mining; in particular, he is interested in graph-based relational learning.

Published
2017-07-01
How to Cite
Drury, B., Fernandes, R., & Lopes, A. de A. (2017). BrAgriNews: A Temporal-Causal Brazilian-Portuguese Corpus for Agriculture. Linguamática, 9(1), 41-54. https://doi.org/10.21814/lm.9.1.245
Section
Project Presentations