Identification of the core chemical structure in SureChEMBL patents

  • dc.contributor.author Falaguera Mata, Maria José, 1994-
  • dc.contributor.author Mestres i López, Jordi
  • dc.date.accessioned 2021-09-27T06:28:00Z
  • dc.date.issued 2021
  • dc.description.abstract The SureChEMBL database provides open access to 17 million chemical entities mentioned in 14 million patents published since 1970. However, alongside the molecules covered by patent claims, the database is full of starting materials and intermediate products of little pharmacological relevance. Herein, we introduce a new filtering protocol to automatically select the core chemical structures best representing a congeneric series of pharmacologically relevant molecules in patents. The protocol was first validated against a selection of 890 SureChEMBL patents for which a total of 51,738 manually curated molecules are deposited in ChEMBL. Our protocol was able to select 92.5% of the molecules in ChEMBL from all 270,968 molecules in SureChEMBL for those patents. Subsequently, the protocol was applied to all 240,988 US pharmacological patents for which 9,111,706 molecules are available in SureChEMBL. The unsupervised filtering process selected 5,949,214 molecules (65.3% of the total number of molecules) that form highly congeneric chemical series in 188,795 of those patents (78.3% of the total number of patents). A SureChEMBL version enriched with molecules of pharmacological relevance is available for download at https://ftp.ebi.ac.uk/pub/databases/chembl/SureChEMBLccs.
  • dc.description.sponsorship This work was supported by a RETOS project from the Spanish Ministerio de Ciencia, Innovación y Universidades (SAF2017‐83614‐R).
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Falaguera MJ, Mestres J. Identification of the core chemical structure in SureChEMBL patents. J Chem Inf Model. 2021;61(5):2241-7. DOI: 10.1021/acs.jcim.1c00151
  • dc.identifier.doi http://dx.doi.org/10.1021/acs.jcim.1c00151
  • dc.identifier.issn 1549-9596
  • dc.identifier.uri http://hdl.handle.net/10230/48503
  • dc.language.iso eng
  • dc.publisher American Chemical Society (ACS)
  • dc.relation.ispartof J Chem Inf Model. 2021;61(5):2241-7
  • dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/SAF2017-83614-R
  • dc.rights This document is the Accepted Manuscript version of a Published Work that appeared in final form in Journal of Chemical Information and Modeling, copyright © American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see http://dx.doi.org/10.1021/acs.jcim.1c00151.
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.title Identification of the core chemical structure in SureChEMBL patents
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/acceptedVersion