Distilling an ensemble of greedy dependency parsers into one MST parser

  • dc.contributor.author Kuncoro, Adhiguna
  • dc.contributor.author Ballesteros, Miguel
  • dc.contributor.author Kong, Lingpeng
  • dc.contributor.author Dyer, Chris
  • dc.contributor.author Smith, Noah A.
  • dc.date.accessioned 2024-02-19T10:09:07Z
  • dc.date.available 2024-02-19T10:09:07Z
  • dc.date.issued 2016
  • dc.description Paper presented at the 2016 Conference on Empirical Methods in Natural Language Processing, held November 1-5, 2016, in Austin, Texas.
  • dc.description.abstract We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a “distillation” of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German.
  • dc.description.sponsorship We thank Swabha Swayamdipta, Sam Thomson, Jesse Dodge, Dallas Card, Yuichiro Sawai, Graham Neubig, and the anonymous reviewers for useful feedback. We also thank Juntao Yu and Bernd Bohnet for re-running the parser of Bohnet and Nivre (2012) on Chinese with gold tags. This work was sponsored in part by the Defense Advanced Research Projects Agency (DARPA) Information Innovation Office (I2O) under the Low Resource Languages for Emergent Incidents (LORELEI) program issued by DARPA/I2O under Contract No. HR0011-15-C-0114; it was also supported in part by Contract No. W911NF-15-1-0543 with the DARPA and the Army Research Office (ARO). Approved for public release, distribution unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government. Miguel Ballesteros was supported by the European Commission under the contract numbers FP7-ICT-610411 (project MULTISENSOR) and H2020-RIA-645012 (project KRISTINA).
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Kuncoro A, Ballesteros M, Kong L, Dyer C, Smith NA. Distilling an ensemble of greedy dependency parsers into one MST parser. In: Su J, Duh K, Carreras X, editors. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016 Nov 1-5; Austin, Texas. [Texas]: Association for Computational Linguistics; 2016. p. 1744-53. DOI: 10.18653/v1/d16-1180
  • dc.identifier.doi http://dx.doi.org/10.18653/v1/d16-1180
  • dc.identifier.uri http://hdl.handle.net/10230/59142
  • dc.language.iso eng
  • dc.publisher ACL (Association for Computational Linguistics)
  • dc.relation.ispartof Su J, Duh K, Carreras X, editors. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016 Nov 1-5; Austin, Texas. [Texas]: Association for Computational Linguistics; 2016. p. 1744-53
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/610411
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/645012
  • dc.rights ACL materials are Copyright © 1963–2023 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/
  • dc.subject.other Bayesian statistics
  • dc.subject.other Neural networks (Computer science)
  • dc.subject.other Graph algorithms
  • dc.title Distilling an ensemble of greedy dependency parsers into one MST parser
  • dc.type info:eu-repo/semantics/conferenceObject
  • dc.type.version info:eu-repo/semantics/publishedVersion
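
The abstract describes turning ensemble votes into per-attachment uncertainty estimates and a Hamming-style cost. The following is a minimal Python sketch of that idea, not the authors' implementation: the function names, the head-list encoding (0 for the root), and the specific cost form (one minus the vote fraction, summed over words) are assumptions made for illustration, since this record does not give the paper's exact formulas.

```python
from collections import Counter


def vote_fractions(ensemble_heads):
    """For each word, map every proposed head to the fraction of parsers voting for it.

    ensemble_heads: list of K head lists, one per parser (hypothetical encoding);
    heads[m] is the predicted head of word m, with 0 standing for the root.
    """
    k = len(ensemble_heads)
    n = len(ensemble_heads[0])
    fractions = []
    for m in range(n):
        counts = Counter(heads[m] for heads in ensemble_heads)
        fractions.append({h: c / k for h, c in counts.items()})
    return fractions


def expected_hamming_cost(candidate_heads, fractions):
    """Expected number of attachments on which the candidate disagrees with a
    parser drawn uniformly from the ensemble (assumed cost form)."""
    return sum(1.0 - fractions[m].get(h, 0.0)
               for m, h in enumerate(candidate_heads))


def per_word_majority(fractions):
    """Pick the most-voted head for each word independently (no tree constraint)."""
    return [max(f, key=f.get) for f in fractions]


# Three hypothetical parsers, three-word sentence.
ensemble = [[2, 0, 2], [2, 0, 2], [0, 1, 2]]
fractions = vote_fractions(ensemble)
majority = per_word_majority(fractions)
risk = expected_hamming_cost(majority, fractions)
```

Minimizing this expected cost is the same as maximizing the summed vote fractions, which is why the abstract can frame consensus parsing as minimum Bayes risk decoding. Independent per-word voting does not guarantee a well-formed tree; a graph-based parser of the kind described would instead run a maximum spanning tree algorithm over these attachment scores.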