I was interested in a recent article in Genetic Engineering & Biotechnology News (GEN), entitled “Big Data Won’t Save Pharma, But Smart Data Might”. It argues that access to the huge amounts of genomic, genetic, medical and other biological and chemical data being generated is not enough: we also need to apply analytical approaches that turn those data into actionable insights. The article discusses some of the creative ways scientists in the industry are approaching these problems, and I felt that a recent project shows just this sort of inventive approach to getting better use from large data sets.
One of Instem’s customers has recently completed the development phase for a non-clinical Datamart. This project takes data feeds from several preclinical data collection systems (e.g. Watson LIMS for exposure data, Provantis for in-life data, etc.) and combines them into a central database that users can query, both for ongoing study data and for data from completed studies. Access to ongoing studies is very useful for study directors carrying out quick checks (e.g. for anomalies in current body weights or clinical observations), while access to the legacy data is used in a more investigational way, for example to review the data on compounds now in clinical trials and assess the translational impact of particular preclinical markers.
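As a rough illustration of the idea, the sketch below loads two feeds into one queryable store and runs a quick body-weight check. It is only a minimal sketch under stated assumptions: the table names, column names, the 10% weight-loss threshold and the sample rows are all hypothetical, and nothing here reflects the actual Watson LIMS or Provantis schemas or the customer's Datamart design.

```python
import sqlite3

def build_datamart(in_life_rows, exposure_rows):
    """Load two hypothetical feeds into one in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, invented for illustration only.
    conn.execute(
        "CREATE TABLE body_weight "
        "(study_id TEXT, animal_id TEXT, day INTEGER, weight_g REAL)"
    )
    conn.execute(
        "CREATE TABLE exposure "
        "(study_id TEXT, animal_id TEXT, day INTEGER, conc_ng_ml REAL)"
    )
    conn.executemany("INSERT INTO body_weight VALUES (?, ?, ?, ?)", in_life_rows)
    conn.executemany("INSERT INTO exposure VALUES (?, ?, ?, ?)", exposure_rows)
    conn.commit()
    return conn

def flag_weight_loss(conn, study_id, max_loss_fraction=0.10):
    """Return (animal_id, day) pairs where weight fell by more than the
    given fraction since that animal's previous measurement."""
    rows = conn.execute(
        "SELECT animal_id, day, weight_g FROM body_weight "
        "WHERE study_id = ? ORDER BY animal_id, day",
        (study_id,),
    ).fetchall()
    flags = []
    previous = {}  # last recorded weight per animal
    for animal_id, day, weight in rows:
        last = previous.get(animal_id)
        if last is not None and weight < last * (1 - max_loss_fraction):
            flags.append((animal_id, day))
        previous[animal_id] = weight
    return flags

# Made-up sample data, not taken from any real study.
in_life = [
    ("S1", "A1", 1, 250.0), ("S1", "A1", 8, 255.0),
    ("S1", "A2", 1, 248.0), ("S1", "A2", 8, 210.0),
]
exposure = [("S1", "A2", 8, 120.5)]

conn = build_datamart(in_life, exposure)
flags = flag_weight_loss(conn, "S1")
```

Because both feeds sit in the same store, a flagged animal can then be looked up in the exposure table with an ordinary join on study, animal and day.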
One aspect of the extract-transform-load process is the harmonisation and mapping of the pathology vocabularies, from the potentially very variable raw microhistopathology terms used in the data collection systems to something more controlled, so that users of the non-clinical Datamart can search for “all liver necrosis observations” quickly and easily. This is where Instem Scientific were able to help. We used our Metawise toolset to map around 10,000 raw terms to a standardised biomedical ontology, which includes concepts and identifiers from public vocabularies (e.g. INHAND, MeSH, MedDRA, UMLS). Each term was also categorised into top-level groupings such as vascular, inflammatory, neoplastic and degenerative. This harmonised ToxPath vocabulary facilitates the daily searches, providing a hit rate of over 95% for the Datamart users*.
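The general shape of this kind of term harmonisation can be sketched in a few lines. This is not how Metawise works internally, which the project description does not cover; it is a simple dictionary lookup written for illustration. The concept identifiers, raw terms and mapping table are all hypothetical, and "hit rate" is assumed here to mean the fraction of raw terms that map to a concept.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    concept_id: str      # hypothetical identifier, not a real ontology code
    preferred_term: str  # the controlled term users search on
    category: str        # top-level grouping, e.g. "degenerative"

NECROSIS = Concept("TOX:0001", "liver necrosis", "degenerative")
INFLAMMATION = Concept("TOX:0002", "liver inflammation", "inflammatory")

# Hypothetical mapping table: several raw variants point at one concept.
RAW_TO_CONCEPT = {
    "necrosis, hepatocellular": NECROSIS,
    "hepatocyte necrosis": NECROSIS,
    "inflammation, portal": INFLAMMATION,
}

def normalise(raw_term):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(raw_term.lower().split())

def harmonise(raw_terms):
    """Split raw terms into (raw, Concept) pairs and unmapped leftovers."""
    mapped, unmapped = [], []
    for raw in raw_terms:
        concept = RAW_TO_CONCEPT.get(normalise(raw))
        if concept is None:
            unmapped.append(raw)
        else:
            mapped.append((raw, concept))
    return mapped, unmapped

def hit_rate(raw_terms):
    """Fraction of raw terms that mapped to a concept."""
    if not raw_terms:
        return 0.0
    mapped, _ = harmonise(raw_terms)
    return len(mapped) / len(raw_terms)

def search(raw_terms, preferred_term):
    """Return the raw terms whose concept carries the given preferred term."""
    mapped, _ = harmonise(raw_terms)
    return [raw for raw, c in mapped if c.preferred_term == preferred_term]

# Made-up observations for illustration.
observations = [
    "Necrosis,  Hepatocellular",
    "hepatocyte necrosis",
    "Inflammation, portal",
    "unrecognised finding",
]
necrosis_hits = search(observations, "liver necrosis")
```

Terms that fail to map are returned separately rather than dropped, so they can be reviewed and added to the mapping table, which is how the hit rate would improve over time in a scheme like this.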
This project demonstrates an innovative use of data collected predominantly for regulatory purposes: once integrated and made visible and searchable, the data can be analysed to address many more problems within pharmaceutical development.
[* Instem's poster at the Society for Toxicologic Pathology demonstrated the use of the ToxPath vocabularies across public domain data]

