MultiBooked_Corpora [research data]

Barnes, Jeremy

Citation

Barnes J. MultiBooked_Corpora [research data]. Repositori Digital de la UPF: Barcelona; 2015. Available at: http://hdl.handle.net/10230/33928

Permanent link

http://hdl.handle.net/10230/33928

Description

Abstract
We release two corpora of hotel reviews, one in Catalan and one in Basque, annotated for aspect-level sentiment analysis. We also include scripts that convert the annotations to sentence level, and we provide benchmarks for opinion holder, target, and expression extraction based on conditional random fields.
Description
The corpora are compiled from hotel reviews taken mainly from booking.com. They are distributed in KAF/NAF format [https://github.com/opener-project/kaf/wiki/KAF-structure-overview] [https://github.com/newsreader/NAF], an XML-based stand-off format that allows multiple layers of annotation. Each review was sentence- and word-tokenized and lemmatized using Freeling [http://nlp.lsi.upc.edu/freeling/node/1] for Catalan and ixa-pipes [http://ixa2.si.ehu.es/ixa-pipes/] for Basque. For each language, two annotators then marked opinion holders, opinion targets, and opinion expressions in every review, following the guidelines set out in the OpeNER project [http://www.opener-project.eu/]. Details can be found in the paper.
This package includes the two corpora, together with scripts to obtain corpus statistics (corpus_stats.py), reproduce the benchmarks reported in the paper (crf.py), extract only the opinionated units from the text (extract_opinions.py), or map the aspect-level annotations to sentence- or document-level annotated corpora (extract_sentences.py).
Requirements for stats and extraction: Python 3, NumPy
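The stand-off layering described above can be illustrated with a minimal sketch. It is not one of the packaged scripts: the sample document is invented, the element and attribute names (wf, term, opinion, opinion_target, opinion_expression, polarity) are assumed to follow the linked KAF structure overview, and the rule used to derive a sentence label is a simplifying assumption that may differ from what extract_sentences.py does.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented miniature document; real corpus files are larger and carry more attributes.
SAMPLE_KAF = """<KAF xml:lang="ca">
  <text>
    <wf wid="w1" sent="1">Hotel</wf>
    <wf wid="w2" sent="1">molt</wf>
    <wf wid="w3" sent="1">net</wf>
    <wf wid="w4" sent="2">Personal</wf>
    <wf wid="w5" sent="2">antipàtic</wf>
  </text>
  <terms>
    <term tid="t1" lemma="hotel"><span><target id="w1"/></span></term>
    <term tid="t2" lemma="molt"><span><target id="w2"/></span></term>
    <term tid="t3" lemma="net"><span><target id="w3"/></span></term>
    <term tid="t4" lemma="personal"><span><target id="w4"/></span></term>
    <term tid="t5" lemma="antipàtic"><span><target id="w5"/></span></term>
  </terms>
  <opinions>
    <opinion oid="o1">
      <opinion_target><span><target id="t1"/></span></opinion_target>
      <opinion_expression polarity="Positive"><span><target id="t2"/><target id="t3"/></span></opinion_expression>
    </opinion>
    <opinion oid="o2">
      <opinion_target><span><target id="t4"/></span></opinion_target>
      <opinion_expression polarity="Negative"><span><target id="t5"/></span></opinion_expression>
    </opinion>
  </opinions>
</KAF>"""


def load_opinions(root):
    """Resolve each opinion's term spans back to surface words (opinion -> term -> wf)."""
    # Token layer: word id -> (surface form, sentence number)
    words = {wf.get("wid"): (wf.text, int(wf.get("sent"))) for wf in root.iter("wf")}
    # Term layer: term id -> ids of the words it spans
    term_to_wfs = {
        term.get("tid"): [t.get("id") for t in term.findall("./span/target")]
        for term in root.iter("term")
    }

    def resolve(element):
        # Returns (text, sentence number of the first word), or ("", None) if absent.
        if element is None:
            return "", None
        wf_ids = [w for t in element.findall("./span/target")
                  for w in term_to_wfs[t.get("id")]]
        if not wf_ids:
            return "", None
        return " ".join(words[w][0] for w in wf_ids), words[wf_ids[0]][1]

    opinions = []
    for op in root.iter("opinion"):
        expr = op.find("opinion_expression")
        expr_text, sent = resolve(expr)
        opinions.append({
            "holder": resolve(op.find("opinion_holder"))[0],
            "target": resolve(op.find("opinion_target"))[0],
            "expression": expr_text,
            "polarity": (expr.get("polarity") or "") if expr is not None else "",
            "sentence": sent,
        })
    return opinions


def sentence_labels(opinions):
    """Assumed rule: one polarity in a sentence gives that label, both give 'mixed'."""
    seen = defaultdict(set)
    for op in opinions:
        pol = op["polarity"].lower()
        if op["sentence"] is None:
            continue
        if "positive" in pol:
            seen[op["sentence"]].add("positive")
        elif "negative" in pol:
            seen[op["sentence"]].add("negative")
    return {s: (next(iter(p)) if len(p) == 1 else "mixed") for s, p in seen.items()}


root = ET.fromstring(SAMPLE_KAF)
for opinion in load_opinions(root):
    print(opinion)
print(sentence_labels(load_opinions(root)))
```

Because each layer only points at ids in the layer below it, annotations can be added or removed without touching the tokenized text, which is what makes the aspect-to-sentence mapping a simple lookup.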
DOI
https://doi.org/10.34810/data398
Collections
Departament de Traducció i Ciències del llenguatge. Primary data
