MultiBooked_Corpora [research data]

Author: Barnes, Jeremy
Dates: 2018-02-19; 2015-01
Citation: Barnes J. MultiBooked_Corpora [research data]. Repositori Digital de la UPF: Barcelona; 2015. Available at: http://hdl.handle.net/10230/33928

Description: The corpora are compiled from hotel reviews taken mainly from booking.com. The corpora are in KAF/NAF format [https://github.com/opener-project/kaf/wiki/KAF-structure-overview] [https://github.com/newsreader/NAF], an XML-style stand-off format that allows for multiple layers of annotation. Each review was sentence- and word-tokenized and lemmatized using Freeling [http://nlp.lsi.upc.edu/freeling/node/1] for Catalan and ixa-pipes [http://ixa2.si.ehu.es/ixa-pipes/] for Basque. Finally, for each language, two annotators annotated opinion holders, opinion targets, and opinion expressions for each review, following the guidelines set out in the OpeNER project [http://www.opener-project.eu/]. Details can be found in the paper.

This package includes the two corpora, along with scripts to obtain corpus statistics (corpus_stats.py), reproduce the benchmarks reported in the paper (crf.py), extract only the opinionated units from the text (extract_opinions.py), or map the aspect-level annotations to sentence- or document-level annotated corpora (extract_sentences.py). Requirements for stats and extraction: Python 3, NumPy.

Abstract: We release two corpora of hotel reviews annotated for aspect-level sentiment analysis in Catalan and Basque. We also include scripts that convert the annotations to sentence level and provide benchmarks for opinion holder, target, and expression extraction based on conditional random fields.

Language: cat
License: Licensed under the terms of the Creative Commons CC-BY public license.
Type: info:eu-repo/semantics/other
DOI: https://doi.org/10.34810/data398
Keywords: Cross-lingual sentiment analysis; Sentiment analysis; Under-resourced languages; Catalan; Basque; Análisis de sentimiento; Catalán; Euskera; Análisi de sentiment; Català; IritzienAnalisia; Euskara
Access: info:eu-repo/semantics/openAccess
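
To illustrate how a stand-off format with several annotation layers can be read, the following is a minimal sketch, not the package's extract_opinions.py. It assumes the layer layout described in the KAF structure overview linked in the description (word forms in wf elements, terms pointing to word forms, opinions pointing to terms). The sample document, its polarity label, and the function names read_opinions and span_text are invented for illustration and may differ from the actual corpus files and scripts.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature KAF document. Element and attribute names follow the
# KAF structure overview; the review text and polarity label are invented.
SAMPLE_KAF = """<KAF>
  <text>
    <wf wid="w1" sent="1">Hotel</wf>
    <wf wid="w2" sent="1">molt</wf>
    <wf wid="w3" sent="1">net</wf>
  </text>
  <terms>
    <term tid="t1" lemma="hotel"><span><target id="w1"/></span></term>
    <term tid="t2" lemma="molt"><span><target id="w2"/></span></term>
    <term tid="t3" lemma="net"><span><target id="w3"/></span></term>
  </terms>
  <opinions>
    <opinion oid="o1">
      <opinion_target><span><target id="t1"/></span></opinion_target>
      <opinion_expression polarity="Positive">
        <span><target id="t2"/><target id="t3"/></span>
      </opinion_expression>
    </opinion>
  </opinions>
</KAF>"""


def span_text(element, term_to_words, word_forms):
    """Resolve an element's span of term ids to surface text ('' if absent)."""
    if element is None:
        return ""
    term_ids = [t.get("id") for t in element.findall("./span/target")]
    word_ids = [w for tid in term_ids for w in term_to_words[tid]]
    return " ".join(word_forms[w] for w in word_ids)


def read_opinions(kaf_xml):
    """Return one dict per opinion with holder, target, expression, polarity."""
    root = ET.fromstring(kaf_xml)
    # Text layer: word-form id -> token string.
    word_forms = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    # Term layer: term id -> list of word-form ids it covers.
    term_to_words = {
        term.get("tid"): [t.get("id") for t in term.findall("./span/target")]
        for term in root.iter("term")
    }
    opinions = []
    # Opinion layer: each part points back to terms, never to raw text.
    for opinion in root.iter("opinion"):
        expression = opinion.find("opinion_expression")
        opinions.append({
            "holder": span_text(opinion.find("opinion_holder"), term_to_words, word_forms),
            "target": span_text(opinion.find("opinion_target"), term_to_words, word_forms),
            "expression": span_text(expression, term_to_words, word_forms),
            "polarity": expression.get("polarity") if expression is not None else None,
        })
    return opinions


if __name__ == "__main__":
    for opinion in read_opinions(SAMPLE_KAF):
        print(opinion)
```

Because each layer refers to the one below it only by id, the opinion annotations can be added or replaced without touching the tokenized text, which is the property the stand-off format is chosen for.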