Strength in numbers: more is better for drug safety/risk prediction

The re-use of data is an ever-present issue within the pharmaceutical industry.  Publicly available data is an important resource for companies looking to develop new drugs and find new uses for current ones.  The search for such data has just become easier with the publication of the DrugMatrix database by the National Toxicology Program.

DrugMatrix is a large molecular toxicology reference database and informatics system. It contains data on the effects of more than 600 therapeutic, industrial and environmental chemicals at a variety of doses and exposure times. For each of these compounds, relevant data curated from the literature is available, as well as assay results for inhibition of 132 protein targets. These targets were chosen for their importance in drug development, and they include drug-metabolizing enzymes and proteins involved in important toxicities.

The core of DrugMatrix is a set of highly standardised toxicological experiments performed in male Sprague-Dawley rats, resulting in a wealth of data regarding histopathology, clinical chemistry and gene expression responses elicited by 638 compounds. The main strength of this database is that it provides a basis for linking macroscopic observations to alterations in genetic pathways. And the great news is that Safety Intelligence Program (SIP) users can now access this fantastic resource, with the addition of ~88,000 curated assertions with DrugMatrix evidence. The inclusion of the DrugMatrix data in SIP makes it possible to answer questions that previously could not be addressed through the DrugMatrix interface, as the data is now integrated with knowledge extracted from several other relevant sources such as Medline, DailyMed and FDA NDAs.


DrugMatrix interface showing the results of an experiment where the administration of 2mg/kg Cisplatin for 3 days caused a 1.4-fold increase in blood urea nitrogen.

To give an example, let’s look at the effects of Cisplatin in rats. According to DrugMatrix, this compound increases the blood urea nitrogen level, which is an important safety signal because it is an indicator of renal health. If the kidneys are not working properly and the glomerular filtration rate decreases, blood urea nitrogen will increase. Elevated blood urea nitrogen can also be associated with heart failure, dehydration, fever or a high-protein diet.


SIP ToxPath knowledgebase summary matrix showing that Cisplatin-induced increases in blood urea nitrogen occur in different species and relevant datasources

The next thing we might be interested in knowing is whether the effect is replicated in other animal species. While DrugMatrix only includes rat information, a quick search in SIP will point to the answer. Firstly, we will see that apart from DrugMatrix, there are Medline records describing the same observation in Sprague-Dawley and Fischer 344 strains. Increases in blood urea nitrogen are also described in mouse and rabbit. More importantly, this finding is also seen in humans according to DailyMed and the Electronic Medicines Compendium.

But should we worry about kidney function because of this increase? We can search SIP to see if Cisplatin is known to be associated with kidney dysfunction in patients. Here we can see that Cisplatin is linked to kidney disorder-related biomedical observations in 43 assertions from 6 different datasources. This makes sense, as Cisplatin is a well-known nephrotoxicant and kidney toxicity is dose-limiting in this type of chemotherapy.


Summary matrix of associations between Cisplatin and kidney disorder in humans, and the sources where the data has been obtained from.

Finally, we might also be interested in assessing whether compounds that cause an increase in blood urea nitrogen share a similar structure or protein target. While DrugMatrix is an excellent tool for this task, it only queries the 600+ compounds that are included in the dataset. In contrast, SIP currently contains 85,996 compounds and 22,036 proteins, which are part of 2,574,454 assertions. This allows users to expand the search and increases the likelihood of finding meaningful results.

This is a very good example of how, when it comes to toxicology and pathology data, there is certainly strength in numbers: when two powerful tools such as these are put together, their usefulness is greatly enhanced.

A small subset of curated SIP assertions linking Cisplatin to kidney disorder in humans.


Does your institution have a plan for BLAST?

Do you or your colleagues use NCBI’s BLAST algorithm to find and analyze biological sequences?

SRS Blast Results View

In a BeyeNetwork article, Dr. Richard Casey asserts what many of us have suspected for over a decade:  BLAST is probably the most widely used bioinformatics program.  Here’s a small list of tasks that are enabled by BLAST (see also http://www.ncbi.nlm.nih.gov/books/NBK21097/ ):

  • inferring the function of newly sequenced genes
  • exploring evolutionary relationships:  I am trying to develop a drug targeting HMG-CoA reductase in humans.  Is the rat homolog similar enough that I don’t need to use primates in my studies?
  • exploring the known natural diversity of genes similar to your gene of interest
  • keeping on top of intellectual property concerns:  Has anyone submitted a patent for an enzyme that I want to commercialize?

Many scientists are exposed to NCBI’s BLAST web interface as part of their academic training.  Due to this familiarity, they may be inclined to use the NCBI interface whenever they need to do a BLAST analysis.  But how can you protect the intellectual value of your sensitive queries if they are being sent over the public internet?  And naturally, if the scientists in your organization are using NCBI’s BLAST web interface, they will be limited to searching against the databases that NCBI provides.  If your organization has gone to the trouble to sequence some genomes, you are going to need a local solution for BLASTing those genomes.

Over the years, there have been open source initiatives for locally-deployed web-based BLAST interfaces, but over the long term, they are usually poorly maintained, with no single point of accountability for support.  Many of these tools are free only for academic or non-profit institutions.  Perhaps of greatest concern, these tools tend to be analytical dead-ends or offer just a single view of the results.  In contrast, Instem Scientific SRS is a true data integration platform.  Any BLAST analysis carried out in SRS is just the starting point for subsequent analyses, like a multiple sequence alignment.  You can also link to pages for your BLAST hits in their original, rich format.  Multiple canned or user-generated views help you make sense of large result sets, as do filtering and sorting options.

Organizations that recognize the limitations of NCBI’s public BLAST interface or of open-source BLAST interfaces often initiate in-house development of a BLAST interface from scratch.  In some organizations you will even find multiple efforts like this.  Is maintaining a custom BLAST interface the best use of your highly-skilled and highly-compensated bioinformatics developers’ time?  Shouldn’t they be focusing their efforts on innovative solutions within your business domain?

If you are using a home-brewed BLAST interface, do you have a plan for accommodating changes in the BLAST binaries?  For example, have you made the recommended migration to BLAST+?  How agile can you be in accommodating new command line parameters or new output formats?  Subscribers to SRS maintenance receive parser updates several times throughout the year to accommodate these kinds of changes to bioinformatics applications and file formats.
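To make the maintenance burden concrete, here is a minimal Python sketch of the kind of parser a home-brewed interface has to keep up to date. It assumes the default 12-column tabular output that BLAST+ produces with -outfmt 6; the function names, the E-value cut-off and the example rows are made up for illustration and have nothing to do with how SRS parses results.

```python
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_blast_tabular(handle):
    """Yield one dict per hit from tab-separated BLAST output."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            # A changed output format shows up here first.
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        hit = dict(zip(COLUMNS, fields))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            hit[name] = float(hit[name])
        yield hit


def best_hits(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cut-off, highest bit score first."""
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(kept, key=lambda h: h["bitscore"], reverse=True)


# Made-up rows, for illustration only (not real BLAST results).
EXAMPLE = (
    "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t360.0\n"
    "query1\tsubjB\t75.00\t80\t20\t0\t5\t84\t1\t80\t0.5\t30.1\n"
    "query1\tsubjC\t99.00\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540.0\n"
)

for hit in best_hits(parse_blast_tabular(io.StringIO(EXAMPLE))):
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```

Even in a sketch this small, the column list and the type conversions are hard-wired to one output format, which is exactly the part that has to be revisited whenever the binaries or their options change.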

For those of you who have an in-house BLAST interface, how do you keep up to date with your BLAST database sources?  The optional SRS Prisma module downloads and formats the latest version of common sequence databases.  Even seasoned bioinformaticians who prefer to work from the command line will benefit from the up-to-date, centralized BLAST databases delivered by Prisma.

Your organization needs a plan for a BLAST interface that

  • is easy to use
  • scales well for multiple users
  • is secure
  • comes with a commitment for support

Instem Scientific’s SRS meets all of these requirements.  Don’t let such an important tool become an afterthought in your institution – SRS has a decades-long track record of providing reliable BLASTing to institutions of all sizes.

Who knows what in your company? Identification of subject matter experts using novel technologies

Matrix visualisation (OmniViz CoMet) of the bioprocessing keywords with authors. The colour depicts correlation level, with white showing high levels of correlation, and black none.
It’s easy to see, for example, that Authors 13, 14 and 15 are relatively prolific in their publications, and that Author 13 appears to be expert in the fields of chromatography, spectroscopy and other pharmaceutical analysis techniques.

How does a big organisation know who its internal specialists are in breast cancer? In HPLC? In kinase inhibitors? It’s a difficult question, and the difficulty increases with the size of the organisation; and yet without the answers, valuable time and knowledge can be wasted. Instem Scientific have recently completed a project with one pharmaceutical company that came up with an ingenious solution.

They collated the publication records of every scientist within their organisation, from a number of sources (Medline, Embase, Biosis, Scopus, and Current Contents), into an EndNote dataset, and wanted to annotate each abstract with key categories – therapeutic areas, drug classes, drug processing techniques, analytical technologies. Then, they would use these categories to “tag” the abstract authors with that skill set, and hey presto! a searchable data set of internal talent and expertise.

Of course, many abstract providers already supply sets of keywords, but these can be sporadic, and assembling abstracts from several different sources means that the keywords aren’t consistent across the whole set. And this is where Instem Scientific came in. We used our data integration and harmonisation tools (Metawise) and Instem Scientific’s life science vocabularies to annotate all the abstracts with both keywords and categories across biomedical observations (e.g. Oncology, breast cancer), protein target classes (GPCR, 5-HT2B), analytical techniques (Chromatography, size exclusion-HPLC) and others.
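The tagging idea itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny vocabulary, the author names and the abstracts are all made up, and the naive substring matching stands in for, and is far cruder than, the concept recognition that Metawise performs.

```python
from collections import defaultdict

# Hypothetical mini-vocabulary: category -> concept -> lowercase synonyms.
VOCABULARY = {
    "analytical technique": {
        "chromatography": ["chromatography", "hplc"],
        "spectroscopy": ["spectroscopy"],
    },
    "biomedical observation": {
        "breast cancer": ["breast cancer", "mammary carcinoma"],
    },
}


def annotate(text):
    """Return the (category, concept) pairs whose synonyms occur in the text."""
    lowered = text.lower()
    found = set()
    for category, concepts in VOCABULARY.items():
        for concept, synonyms in concepts.items():
            # Naive substring match; a real engine would respect word
            # boundaries, term structure and semantics.
            if any(synonym in lowered for synonym in synonyms):
                found.add((category, concept))
    return found


def build_expert_index(records):
    """Map each concept to {author: number of abstracts mentioning it}."""
    index = defaultdict(lambda: defaultdict(int))
    for record in records:
        for _category, concept in annotate(record["abstract"]):
            for author in record["authors"]:
                index[concept][author] += 1
    return index


def experts_for(index, concept):
    """Authors tagged with a concept, most abstracts first, then by name."""
    counts = index.get(concept, {})
    return sorted(counts, key=lambda author: (-counts[author], author))


# Made-up publication records, for illustration only.
records = [
    {"authors": ["Author A", "Author B"],
     "abstract": "Size exclusion-HPLC analysis of a monoclonal antibody."},
    {"authors": ["Author A"],
     "abstract": "Chromatography methods for protein purification."},
    {"authors": ["Author C"],
     "abstract": "Kinase inhibitors in mammary carcinoma models."},
]

index = build_expert_index(records)
print(experts_for(index, "chromatography"))
print(experts_for(index, "breast cancer"))
```

Because every synonym maps to one harmonised concept, an abstract that mentions "HPLC" and one that mentions "chromatography" both count towards the same skill, which is the consistency that provider-supplied keywords lack.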

Instem’s Metawise is a concept identification and translation engine that utilizes proprietary entity recognition algorithms to identify key terms in text using a unique approach based on term structure and semantics. This means that however someone has described a particular type of cancer or protein class, Metawise can mark up the text and find the appropriate key concept, hugely enhancing both recall and accuracy.

Obviously, although this project was on public domain publication records, exactly the same process could be run across internal reports, documents, emails etc. – again so that any organisation can make best use of its own resources.

Clustering visualisation (OmniViz Galaxy, with K-means clustering) of a set of abstracts (~12k), clustered on the basis of all keywords across the Instem Scientific vocabularies. It is then easy to navigate and view individual (or multiple) clusters of abstracts which will contain similar topics.