Barnes, Jeremy 2018-02-19T09:52:11Z 2018-02-19T09:52:11Z 2015-01
dc.identifier.citation Barnes J. MultiBooked_Corpora [research data]. Repositori Digital de la UPF: Barcelona; 2015. Available at:
dc.description The corpora are compiled from hotel reviews taken mainly from [...]. The corpora are in KAF/NAF format, which is an XML-style stand-off format that allows for multiple layers of annotation. Each review was sentence- and word-tokenized and lemmatized using FreeLing for Catalan and ixa-pipes for Basque. Finally, for each language, two annotators annotated opinion holders, opinion targets, and opinion expressions for each review, following the guidelines set out in the OpeNER project. Details can be found in the paper. This package includes the two corpora, as well as scripts to obtain corpus statistics, reproduce the benchmarks reported in the paper, extract only the opinionated units from the text, or map the aspect-level annotations to sentence- or document-level annotated corpora. Requirements for stats and extraction: Python 3, NumPy
dc.description.abstract We release two corpora of hotel reviews annotated for aspect-level sentiment analysis in Catalan and Basque. We also include scripts which allow the conversion to sentence-level annotations and provide benchmarks for opinion holder, target, and expression extraction based on conditional random fields.
dc.language.iso cat
dc.language.iso eus
dc.relation Related publication: Barnes J, Lambert P, Badia T. MultiBooked: a corpus of Basque and Catalan hotel reviews annotated for aspect-level sentiment classification. Paper presented at: Language Resources and Evaluation Conference (LREC); 2018 May 7-12; Miyazaki, Japan.
dc.rights Licensed under the terms of the Creative Commons CC-BY public license.
dc.title MultiBooked_Corpora [research data]
dc.type info:eu-repo/semantics/other
dc.type Dataset
dc.subject.keyword Cross-lingual sentiment analysis
dc.subject.keyword Sentiment analysis
dc.subject.keyword Under-resourced languages
dc.subject.keyword Catalan
dc.subject.keyword Basque
dc.subject.keyword Análisis de sentimiento
dc.subject.keyword Catalán
dc.subject.keyword Euskera
dc.subject.keyword Análisi de sentiment
dc.subject.keyword Català
dc.subject.keyword Iritzien
dc.subject.keyword Analisia
dc.subject.keyword Euskara
dc.rights.accessRights info:eu-repo/semantics/openAccess

