Robust multilingual Named Entity Recognition with shallow semi-supervised features

  • dc.contributor.author Agerri, Rodrigo
  • dc.contributor.author Rigau Claramunt, German
  • dc.date.accessioned 2017-12-19T10:20:49Z
  • dc.date.issued 2016
  • dc.description.abstract We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state-of-the-art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results.
  • dc.description.sponsorship This work has been supported by the European projects NewsReader (EC/FP7/316404) and QTLeap (EC/FP7/610516), and by the Spanish Ministry of Economy and Competitiveness (MINECO) projects SKATER (Grant No. TIN2012-38584-C06-01) and TUNER (Grant No. TIN2015-65308-C5-1-R).
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Agerri R, Rigau G. Robust multilingual Named Entity Recognition with shallow semi-supervised features. Artif Intell. 2016;238: 63-82. DOI: 10.1016/j.artint.2016.05.003
  • dc.identifier.doi http://dx.doi.org/10.1016/j.artint.2016.05.003
  • dc.identifier.issn 0004-3702
  • dc.identifier.uri http://hdl.handle.net/10230/33529
  • dc.language.iso eng
  • dc.publisher Elsevier
  • dc.relation.ispartof Artificial Intelligence. 2016;238: 63-82.
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/316404
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/610516
  • dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/TIN2012-38584-C06-01
  • dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/TIN2015-65308-C5-1-R
  • dc.rights © Elsevier http://dx.doi.org/10.1016/j.artint.2016.05.003
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.subject.keyword Named entity recognition
  • dc.subject.keyword Information extraction
  • dc.subject.keyword Clustering
  • dc.subject.keyword Semi-supervised learning
  • dc.subject.keyword Natural language processing
  • dc.title Robust multilingual Named Entity Recognition with shallow semi-supervised features
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/acceptedVersion