Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text
- dc.contributor.author Bravo Serrano, Àlex, 1984-
- dc.contributor.author Li, Tong Shu
- dc.contributor.author Su, Andrew I.
- dc.contributor.author Good, Benjamin M.
- dc.contributor.author Furlong, Laura I., 1971-
- dc.date.accessioned 2016-12-14T08:06:28Z
- dc.date.available 2016-12-14T08:06:28Z
- dc.date.issued 2016
- dc.description.abstract Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for identifying drug side effects in free text are key to developing up-to-date knowledge sources on adverse drug reactions. We present a new system for identifying drug side effects from the literature that combines three approaches: machine learning, rule-based and knowledge-based. The system was developed to address Task 3.B of the BioCreative V challenge (BC5), which deals with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence level, while the knowledge-based approach is applied at both the sentence and abstract levels. The machine learning method is based on the BeFree system and uses two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassification and areas for improvement in our system, and highlighted the need for consistent gold standard data sets to advance the state of the art in text mining of drug side effects.
- dc.description.sponsorship This work was supported by grants from the National Institutes of Health (GM114833, GM089820, TR001114); the Instituto de Salud Carlos III-Fondo Europeo de Desarrollo Regional (PI13/00082 and CP10/00524 to A.B. and L.I.F.); the Innovative Medicines Initiative-Joint Undertaking (eTOX no. 115002, Open PHACTs no. 115191, EMIF no. 115372, iPiE no. 115735 to A.B. and L.I.F.), resources of which are composed of financial contributions from the European Union’s Seventh Framework Programme (FP7/2007–13) and European Federation of Pharmaceutical Industries and Associations; and the European Union Horizon 2020 Programme 2014–20 (MedBioinformatics no. 634143 and Elixir-Excelerate no. 676559 to A.B. and L.I.F.). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB).
- dc.format.mimetype application/pdf
- dc.identifier.citation Bravo À, Li TS, Su AI, Good BM, Furlong LI. Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text. Database (Oxford). 2016 Jun 15; 2016. DOI: 10.1093/database/baw094
- dc.identifier.doi http://dx.doi.org/10.1093/database/baw094
- dc.identifier.issn 1758-0463
- dc.identifier.uri http://hdl.handle.net/10230/27762
- dc.language.iso eng
- dc.publisher Oxford University Press
- dc.relation.ispartof Database (Oxford). 2016 Jun 15; 2016
- dc.rights © Alex Bravo, Tong Shu Li, Andrew I. Su, Benjamin M. Good and Laura I. Furlong 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of a Creative Commons Attribution License
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by/4.0/
- dc.subject.other Drugs -- Side effects
- dc.title Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion