OSIRISv1.2: a named entity recognition system for sequence variants of genes in biomedical literature

Welcome to the UPF Digital Repository

Furlong LI, Dach H, Hofmann-Apitius M, Sanz F. OSIRISv1.2: a named entity recognition system for sequence variants of genes in biomedical literature. BMC Bioinformatics. 2008; 9: 84. DOI 10.1186/1471-2105-9-84
To cite or link this document: http://hdl.handle.net/10230/16433
dc.contributor.author Furlong, Laura I.
dc.contributor.author Dach, Holger
dc.contributor.author Hofmann-Apitius, Martin
dc.contributor.author Sanz, Ferran
dc.date.accessioned 2012-05-09T08:42:58Z
dc.date.available 2012-05-09T08:42:58Z
dc.date.issued 2008
dc.identifier.citation Furlong LI, Dach H, Hofmann-Apitius M, Sanz F. OSIRISv1.2: a named entity recognition system for sequence variants of genes in biomedical literature. BMC Bioinformatics. 2008; 9: 84. DOI 10.1186/1471-2105-9-84
dc.identifier.issn 1471-2105
dc.identifier.uri http://hdl.handle.net/10230/16433
dc.description.abstract Background: Single nucleotide polymorphisms (SNPs), among other types of sequence variants, are key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation are found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in the biomedical literature. Identifying the relevant documents and extracting information from them is hampered by the large size of literature databases and the lack of a widely accepted standard notation for biomedical entities. Automatic systems are therefore required to identify citations of allelic variants of genes in biomedical texts. Results: Our group has previously reported the development of OSIRIS, a system for retrieving literature about allelic variants of genes (http://ibi.imim.es/osirisform.html). Here we describe a new version, OSIRISv1.2 (http://ibi.imim.es/OSIRISv1.2.html), which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module uses a pattern-based search algorithm to identify variation terms in the texts and map them to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, yielding 99% precision, 82% recall, and an F-score of 0.89. As an example, we present the application of the system to collecting literature citations for allelic variants of genes related to intracranial aneurysm and breast cancer. Conclusion: OSIRISv1.2 can link literature references to dbSNP database entries with high accuracy, and is therefore suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. Combining OSIRISv1.2 with controlled vocabularies such as MeSH provides a way to identify associations of biomedical interest, such as those relating SNPs to diseases.
dc.language.iso eng
dc.publisher BioMed Central
dc.relation.ispartof BMC Bioinformatics. 2008; 9: 84
dc.rights (c) 2008 Furlong et al. Creative Commons Attribution License
dc.rights.uri http://creativecommons.org/licenses/by/2.0/
dc.subject.other Human genetics -- Variation
dc.subject.other Health sciences -- Bibliography -- Databases
dc.title OSIRISv1.2: a named entity recognition system for sequence variants of genes in biomedical literature
dc.type info:eu-repo/semantics/article
dc.identifier.doi http://dx.doi.org/10.1186/1471-2105-9-84
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/publishedVersion
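The abstract states that the entity recognition module relies on a pattern-based search for variation terms, followed by mapping of those terms to dbSNP identifiers. The Python sketch below illustrates that general idea only. The regular expressions, the `find_variants` function, and the `DBSNP_LOOKUP` table are hypothetical; they are not the patterns or data used by OSIRISv1.2, which draws on HgenetInfoDB.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Standard three-letter amino-acid codes, used to build the protein-level pattern.
AA3 = "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"

# Hypothetical patterns for three common ways variants are written in abstracts.
PATTERNS = {
    "dbsnp_id": re.compile(r"\brs\d+\b"),                             # rs1042522
    "protein": re.compile(rf"\b(?:p\.)?({AA3})(\d+)({AA3})\b"),       # Arg72Pro, p.Arg72Pro
    "nucleotide": re.compile(r"\b(?:c\.)?(\d+)([ACGT])>([ACGT])\b"),  # 1236C>T, c.1236C>T
}

# Toy stand-in for a dbSNP-derived table keyed by (gene, normalized variant).
# Assumption for illustration only; not OSIRISv1.2's data.
DBSNP_LOOKUP = {("TP53", "p.Arg72Pro"): "rs1042522"}


@dataclass
class Mention:
    kind: str             # which pattern matched
    text: str             # surface string found in the text
    normalized: str       # canonical form used for the lookup
    rs_id: Optional[str]  # mapped dbSNP identifier, if any


def find_variants(text: str, gene: str) -> List[Mention]:
    """Return every pattern match in `text`, mapped to dbSNP where possible."""
    mentions = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if kind == "dbsnp_id":
                norm = m.group(0)
                rs_id = norm  # already a dbSNP identifier
            else:
                # Normalize to an HGVS-like form so different spellings share one key.
                if kind == "protein":
                    norm = f"p.{m.group(1)}{m.group(2)}{m.group(3)}"
                else:
                    norm = f"c.{m.group(1)}{m.group(2)}>{m.group(3)}"
                rs_id = DBSNP_LOOKUP.get((gene, norm))
            mentions.append(Mention(kind, m.group(0), norm, rs_id))
    return mentions


if __name__ == "__main__":
    sentence = ("The TP53 Arg72Pro polymorphism (rs1042522) and a "
                "1236C>T substitution were genotyped.")
    for mention in find_variants(sentence, "TP53"):
        print(mention)
```

Normalizing each match before the lookup is what lets different surface spellings (for example `Arg72Pro` and `p.Arg72Pro`) resolve to the same dbSNP entry. A production system would need many more patterns (one-letter amino-acid codes, deletions, insertions) and a real dbSNP-derived table.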

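The F-score reported in the abstract is the harmonic mean of precision and recall. The short Python sketch below (the function names are illustrative) computes it either from the two rates or from raw true-positive, false-positive, and false-negative counts, which is how a system's output can be scored against a manually annotated corpus.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 gives the usual F-score."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall_f(tp: int, fp: int, fn: int):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_beta(precision, recall)


if __name__ == "__main__":
    # Rounded precision and recall as reported in the abstract.
    print(round(f_beta(0.99, 0.82), 3))
```

With the rounded values from the abstract (0.99 and 0.82) this gives about 0.897. The small gap from the published 0.89 is consistent with precision and recall having been rounded for reporting.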

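The abstract also mentions combining OSIRISv1.2 output with controlled vocabularies such as MeSH to find SNP-disease associations. One simple way to do this, shown here as a hypothetical Python sketch rather than the authors' method, is to count how many documents mention each (dbSNP identifier, MeSH heading) pair. The input format, with `rs_ids` and `mesh` fields per document, and the `snp_mesh_cooccurrence` function are assumptions.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List


def snp_mesh_cooccurrence(
    docs: Iterable[Dict[str, List[str]]], min_docs: int = 1
) -> Counter:
    """Count documents in which each (rs ID, MeSH heading) pair co-occurs."""
    counts: Counter = Counter()
    for doc in docs:
        # set() so a pair is counted at most once per document
        for pair in product(set(doc["rs_ids"]), set(doc["mesh"])):
            counts[pair] += 1
    # keep only pairs supported by at least `min_docs` documents
    return Counter({pair: n for pair, n in counts.items() if n >= min_docs})


if __name__ == "__main__":
    # Hypothetical annotated abstracts: rs IDs from a recognizer, MeSH headings from MEDLINE.
    docs = [
        {"rs_ids": ["rs1042522"], "mesh": ["Breast Neoplasms"]},
        {"rs_ids": ["rs1042522", "rs1042522"], "mesh": ["Breast Neoplasms", "Humans"]},
        {"rs_ids": [], "mesh": ["Intracranial Aneurysm"]},
    ]
    for (rs_id, heading), n in snp_mesh_cooccurrence(docs).most_common():
        print(rs_id, heading, n)
```

Counting each pair at most once per document keeps a single abstract that repeats an rs number from inflating the count. In practice one would also filter generic headings such as the check tag Humans and compare counts against a background expectation before treating a pair as an association.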
This document is licensed under the Creative Commons Attribution License 2.0 (http://creativecommons.org/licenses/by/2.0/).