Authors: López Massaguer, Oriol, 1972-; Pinto Gil, Kevin; Sanz, Ferran; Amberg, Alexander; Anger, Lennart T.; Stolte, Manuela; Ravagli, Carlo; Marc, Philippe; Pastor Maeso, Manuel
Dates: 2018-03-15; 2018-03-15; 2018
Citation: López-Massaguer O, Pinto-Gil K, Sanz F, Amberg A, Anger LT, Stolte M et al. Generating Modeling Data From Repeat-Dose Toxicity Reports. Toxicol Sci. 2018;162(1):287-300. DOI: 10.1093/toxsci/kfx254
ISSN: 1096-0929
URI: http://hdl.handle.net/10230/34135
Abstract: Over the past decades, pharmaceutical companies have conducted a large number of high-quality in vivo repeat-dose toxicity (RDT) studies for regulatory purposes. As part of the eTOX project, a high number of these studies have been compiled and integrated into a database. This valuable resource can be queried directly, but it can be further exploited to build predictive models. As the studies were originally conducted to investigate the properties of individual compounds, the experimental conditions across the studies are highly heterogeneous. Consequently, the original data required normalization/standardization, filtering, categorization and integration to make possible any data analysis (such as building predictive models). Additionally, the primary objectives of the RDT studies were to identify toxicological findings, most of which do not directly translate to in vivo endpoints. This article describes a method to extract datasets containing comparable toxicological properties for a series of compounds amenable for building predictive models. The proposed strategy starts with the normalization of the terms used within the original reports. Then, comparable datasets are extracted from the database by applying filters based on the experimental conditions. Finally, carefully selected profiles of toxicological findings are mapped to endpoints of interest, generating QSAR-like tables.
In this work, we describe in detail the strategy and tools used for carrying out these transformations and illustrate its application in a data sample extracted from the eTOX database. The suitability of the resulting tables for developing hazard-predicting models was investigated by building proof-of-concept models for in vivo liver endpoints.
Format: application/pdf
Language: eng
Rights: © The Author 2017. Published by Oxford University Press on behalf of the Society of Toxicology. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Title: Generating modeling data from repeat-dose toxicity reports
Type: info:eu-repo/semantics/article
Relation: http://dx.doi.org/10.1093/toxsci/kfx254
Keywords: RDT; Toxicology databases; In vivo data; In silico modeling; Ontologies
Access: info:eu-repo/semantics/openAccess