Analysis of Drug Target Literature Using TxtViz: Differentiating between indications and adverse events

TxtViz is a powerful and flexible analytics tool that allows the rapid, unbiased assessment of text documents. Here we describe a workflow for the analysis of drug target literature, which could support drug development by revealing new therapeutic applications and possible safety liabilities. Conventional literature searches can reveal new insights around a drug or target, such as potential novel indications or side effects. However, one drug’s side effect can be another’s indication, so a standard keyword-based search returns a mixture of both, which can only be resolved through manual review. TxtViz, by contrast, detects and automatically categorises these two scenarios, allowing the user to focus their literature searches more accurately.

Ipilimumab, a CTLA-4 antagonist developed by Bristol-Myers Squibb, was approved for the treatment of melanoma in 2011 in the US and 2012 in the UK. A PubMed search of the Ipilimumab target ‘CTLA-4’ was imported and analysed in TxtViz, and articles were categorised according to the potential indications and side effects of drugs that target this receptor.

TxtViz is inclusive and adaptable; by providing a visual overview of the literature around a biological target, it lets you learn about a topic, refine the dataset and then use vocabularies to tease out specific information, such as disease associations. The ThemeMap view allows you to quickly get to grips with the major topics in a dataset (Figure 1). Navigating around the diagram, prevalent ideas such as treatment of melanoma and the association between CTLA-4 and graft survival can be immediately identified. This illustrates how a single biological target can be relevant to a number of different diseases.


Figure 1. ThemeMap of PubMed ‘CTLA-4’ search results

Peaks represent clusters of similar papers. The larger the peak, the richer in papers the corresponding cluster is, i.e. the more prevalent the theme is in the literature. The terms shown in the red boxes were obtained by placing probes around the diagram; these are the main terms associating the papers in the peak on which the probe was placed.

Views such as ThemeMap provide an instant overview of a broad thematic area. To investigate more detailed associations, such as indications and side effects, the TxtViz heat map, or CoMet view, is used (Figure 2). At this stage it can be beneficial to select a subset of literature that addresses your area of interest. For example, to investigate the effects associated with antagonism of a particular target, the subset of articles referring to antagonism, blockade, etc. can be rapidly identified with a simple synonym-enhanced text query. This tends to be more important for established biological targets that are already in therapeutic use, and less so for new targets.
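To make the idea of a synonym-enhanced text query concrete, here is a minimal Python sketch. It is not how TxtViz implements the feature; the synonym list, the record layout and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical synonym list for "antagonism"; a real thesaurus would be curated.
ANTAGONISM_SYNONYMS = [
    "antagonism", "antagonist", "blockade",
    "blocking", "inhibition", "inhibitor",
]

def build_query(synonyms):
    """Compile one case-insensitive pattern that matches any synonym.

    The pattern anchors at a word start and allows trailing letters,
    so plurals such as "antagonists" also match.
    """
    alternatives = "|".join(re.escape(s) for s in synonyms)
    return re.compile(r"\b(?:" + alternatives + r")\w*", re.IGNORECASE)

def select_subset(records, synonyms):
    """Return the records whose title or abstract mentions any synonym."""
    pattern = build_query(synonyms)
    return [
        r for r in records
        if pattern.search(r["title"] + " " + r["abstract"])
    ]

# Made-up records, for illustration only.
records = [
    {"pmid": "1", "title": "CTLA-4 blockade in melanoma", "abstract": ""},
    {"pmid": "2", "title": "CTLA-4 polymorphisms",
     "abstract": "Genetic association study."},
    {"pmid": "3", "title": "T cell regulation",
     "abstract": "A CTLA-4 antagonist enhanced responses."},
]

subset = select_subset(records, ANTAGONISM_SYNONYMS)
print([r["pmid"] for r in subset])
```

Only the first and third records mention a term from the list, so only those two would go forward into the more detailed analysis.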


Figure 2. CoMet View of a CTLA-4 ‘Blockade’ subset, which has been indexed using thesauri of disease terminology and indication/side effect language. Red colouration indicates statistical overassociation between disease terms and indications/side effects, with blue colouration denoting underassociation. Double-clicking a cell in this view shows the underlying evidence in the record viewer. This view identified indications such as Non-Hodgkin lymphoma, experimental autoimmune myocarditis, hepatitis and leishmaniasis. Side effects such as exacerbation of malaria and Myasthenia Gravis can also be seen.

The CoMet view can be used alongside thesauri developed on any subject. For added flexibility, groups can be created and used as rows or columns in CoMet. For example, to analyse the trends in research associated with different diseases over time, the publications can be grouped by publication date and run against a disease termlist. In this way, new or emerging ideas or disease areas can be differentiated from well-established ones, and novel potential indications can be identified.

After analysis, reading lists can be exported to Excel to produce a searchable database, based on the CoMet associations, for personal use or to share with colleagues.

A key strength of TxtViz is that it is inclusive; an unbiased overview can be gained allowing you to learn broadly about a topic. From this understanding, the dataset and views can be adapted to address specific questions, using thesauri, subsets and groups.

Find out more at http://www.txtviz.com/

Rapid review of all of the abstracts presented at the 2014 Annual Meeting of the Society of Toxicology in Phoenix, Arizona.

Next week I’ll be attending the 53rd annual meeting of the Society of Toxicology (SOT) in Phoenix, Arizona.  My manager is sending me from Cambridge, UK all the way over to Phoenix, AZ at great expense, and I will be expected to provide a thorough review of the topics and themes covered when I get back.  The Phoenix Convention Centre will be packed to the rafters with scientists from across the globe eager to present their most recent and exciting work.  Almost three thousand abstracts have been submitted.  If I allowed myself just one minute to read each abstract, I’d be done in around 50 hours!  Luckily I have our new personal text analysis program TxtViz (http://www.txtviz.com/) to help me.


Big data management with TxtViz: removing ambiguity in literature catalogue searches

In December 2013, the open-access, online publication PLOS ONE published their 100,000th article. This vast quantity of publications presents a great challenge to the people managing and maintaining the catalogue in which they are stored. A recent article in Bio-IT World exposes the enormity of the task of keeping the catalogue useful and searchable (Aaron Krol, 2014).

To search for literature on a desired topic, search terms are used to pull out papers from PLOS ONE’s catalogue; it is the job of taxonomy curators to tag articles and produce algorithms (or rules) which match the search terms to appropriate papers. With such a broad topic base, searches in PLOS ONE can be hindered by ambiguous terms; therefore, rules removing this ambiguity are required to make search results more appropriate to the search terms. These rules take into account the presence of other words in the abstract to distinguish ambiguous terms and remove inappropriate search results. One example, given by Rachel Drysdale in the Bio-IT World article, is the search term ‘snail’. Words which are associated with the different meanings of the word ‘snail’, the organism or the gene name, must be identified. This is a laborious process which could be aided by the use of Instem’s data mining and visual analytics tool, TxtViz.

TxtViz generates visualisations representing the relatedness of data in the form of clusters; the closer clusters are to each other, the more related they are. This data can be in the ‘MEDLINE’ format, therefore literature searches (e.g. in PubMed) can be imported; in this way TxtViz can produce visualisations of the relatedness of academic papers. Importantly, TxtViz can determine relatedness using the words/terms in textual data. Therefore, TxtViz could be used to identify the most distinguishing terms which can be used to generate rules, removing ambiguity in literature searches.

To demonstrate this, I ran an analysis of a PubMed search of ‘snail’ in TxtViz. Below are the visualisations which TxtViz produced. Figure 1 shows the ‘Galaxy’ visualisation; the most distant points in the Galaxy represent the most different sets of papers, based on the terms used in the abstract. The three main terms from each cluster (shown in the ‘Cluster Information’ panels) indicate that clusters in the top right of the visualisation are related to ‘SNAI1’, the gene (e.g. ‘cell’, ‘express’), and clusters in the bottom left refer to ‘Snail’, the organism (e.g. ‘species’, ‘population’). By navigating around the clusters in the ‘Galaxy’ view, the main terms differentiating papers which refer to the different meanings of the word ‘snail’ can be identified immediately.


Figure 1. Galaxy Visualisation produced in TxtViz
Individual data points are represented by the dark blue dots. Clusters are represented by the cluster image and are labelled (as seen in the ‘CLUSTER INFORMATION’ boxes) based on the three most important terms relating the data points.

 

Figure 2 shows the ThemeMap visualisation. The peaks represent the major themes; peak height indicates the frequency of theme terms, and peaks that are closer together indicate more closely related themes. The figure below shows the terms relating data in two of the ThemeMap peaks: one in the top left and one in the bottom right. Compared to the ‘Galaxy’ view, more of the terms which can be used to distinguish papers can be seen. From the stacked bar charts (to the left and right of the ThemeMap), ‘cell’, ‘express’ and ‘transcribe’ are identified as useful terms for papers referring to ‘snail’ the gene, and ‘species’, ‘evolve’, ‘morphology’ and ‘population’ for papers referring to ‘snail’ the organism.


Figure 2. ThemeMap visualisation produced in TxtViz
By placing probes on points in the ThemeMap, a stacked bar chart shows the terms relating data at that peak. Here, two different probes were placed on the ThemeMap, at the points indicated by the arrows, to produce the ‘Probe Terms Stacked Bar Charts’ seen in the black boxes to the right and left of the ThemeMap.

 

Therefore, by identifying words which remove ambiguity in literature searches, TxtViz could help taxonomy curators, such as those at PLOS ONE, improve the searchability and usefulness of their databases, aiding the management of ‘big data’.
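As an illustration of how distinguishing terms like these could feed a disambiguation rule, here is a minimal Python sketch. It is neither a PLOS ONE taxonomy rule nor TxtViz output: the stem lists are loosely based on the terms named above, and the simple majority vote is an assumption made for this example.

```python
import re

# Hypothetical stems derived from the distinguishing terms discussed above.
GENE_STEMS = ("cell", "express", "transcri")
ORGANISM_STEMS = ("species", "populat", "morpholog", "evol")

def count_hits(tokens, stems):
    """Count tokens that start with any of the given stems."""
    return sum(1 for token in tokens if token.startswith(stems))

def classify_snail(abstract):
    """Guess which sense of 'snail' an abstract uses from its context words."""
    tokens = re.findall(r"[a-z0-9]+", abstract.lower())
    gene_hits = count_hits(tokens, GENE_STEMS)
    organism_hits = count_hits(tokens, ORGANISM_STEMS)
    if gene_hits > organism_hits:
        return "gene"
    if organism_hits > gene_hits:
        return "organism"
    return "ambiguous"  # left for manual curation

# Made-up abstracts, for illustration only.
print(classify_snail(
    "Snail expression represses E-cadherin transcription in tumour cells."))
print(classify_snail(
    "Snail populations of three freshwater species differ in shell morphology."))
print(classify_snail("The snail was observed."))
```

A curator would still need to review the ambiguous cases and tune the term lists, but the context words do most of the sorting.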

See our website http://www.txtviz.com/ for more information.

Visualising the 2013 toxicity literature using TxtViz

I recently returned to work after a year’s maternity leave and wanted to get a quick overview of the scientific literature I’d missed around the subject of toxicity.  Searching for this term in PubMed and restricting the results by publication date identified 24,480 papers, which I downloaded in the Medline format and loaded into TxtViz, Instem’s text analytics platform.

I used the default species and tissue lists in the CoMet tool to visualise the statistical co-occurrence of these terms in the title/abstract.  As may be expected, there were many human publications in a wide variety of tissues.  As shown in Figure 1, an interesting finding was an over-representation of liver studies in several varieties of fish, e.g. the common carp (Cyprinus carpio) and the rainbow trout (Oncorhynchus mykiss).  As a sanity check, it is also reassuring to note that studies mentioning gills, the respiratory organ found in many aquatic species, are also common for these species.  By selecting the relevant cells in the CoMet tool, I viewed these 53 fish liver studies in the Gist tool, which summarises the terms occurring within these publications which most distinguish them from the rest.  These common terms include the insecticide chlorpyrifos and the common water pollutant fluoride.

Figure 1. The over-representation of fish liver studies in the 2013 toxicity literature.  Each cell shows a colour representation (red for positive and blue for negative) of the deviation (expected number of records subtracted from the actual number) for a particular species/tissue combination.
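The deviation described in the caption can be sketched in a few lines of Python. The caption does not say how the expected number of records is calculated, so this example assumes the usual independence estimate (row total × column total ÷ grand total); the counts are invented and the function names are hypothetical.

```python
def deviations(counts):
    """Return actual minus expected for every (species, tissue) cell.

    counts maps (species, tissue) -> number of records. The expected
    value assumes species and tissue occur independently, which is an
    assumption of this sketch and not a documented TxtViz formula.
    """
    row_totals, col_totals = {}, {}
    for (species, tissue), n in counts.items():
        row_totals[species] = row_totals.get(species, 0) + n
        col_totals[tissue] = col_totals.get(tissue, 0) + n
    grand_total = sum(counts.values())

    result = {}
    for species, row_total in row_totals.items():
        for tissue, col_total in col_totals.items():
            expected = row_total * col_total / grand_total
            actual = counts.get((species, tissue), 0)
            result[(species, tissue)] = actual - expected
    return result

def colour(deviation):
    """Map a deviation to the heat-map colours named in the caption."""
    if deviation > 0:
        return "red"
    if deviation < 0:
        return "blue"
    return "neutral"

# Invented counts, for illustration only.
counts = {
    ("human", "liver"): 30, ("human", "lung"): 30,
    ("carp", "liver"): 8, ("carp", "gill"): 6,
}

for cell, dev in sorted(deviations(counts).items()):
    print(cell, round(dev, 2), colour(dev))
```

In this toy table the carp/gill cell comes out red and the human/gill cell blue, the same kind of pattern as the sanity check described above. Real records can mention several species and tissues at once, so treating each count as a separate record is a simplification.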


Another useful view in TxtViz is the Galaxy, a two-dimensional representation of proximity of the publications to each other based on terms within the title/abstract.  Figure 2 shows that my 53 related records of interest (highlighted in yellow) lie within several clusters which are not necessarily neighbours.  As such, I used the Text Query By Example tool rather than cluster membership to identify other similar papers around hepatotoxic effects of environmental pollutants such as arsenic and benzene.

 Figure 2.  Galaxy view of the 2013 toxicity publications with the 53 fish liver records highlighted in yellow.  Default clustering (K-means with Euclidean distance metric) based on terms within the title/abstract.
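For readers unfamiliar with the clustering named in the caption, here is a minimal pure-Python sketch of K-means with a Euclidean distance metric. It is not TxtViz's implementation. The term-count vectors are invented, and representing each paper by raw term counts is an assumption of this example.

```python
import math
import random

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]

def kmeans(vectors, k, iterations=20, seed=0):
    """Assign each vector to one of k clusters; returns a list of labels."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # start from k distinct papers
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = centroid(members)
    return labels

# Invented term counts over a made-up vocabulary: [liver, fish, cell, drug].
papers = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]
labels = kmeans(papers, k=2)
print(labels)
```

The first two papers end up in one cluster and the last two in the other. Which cluster is numbered 0 depends on the random starting points.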


This review provides a useful reminder that the toxicity literature covers not only established drugs and those within the development pipeline, but also agrochemicals and environmental contaminants.  Instem’s ToxPath knowledgebase, the Safety Intelligence Program (SIP), integrates data on biological effects of over 100,000 compounds – including marketed and withdrawn drugs, agrochemicals, environmental toxins, natural products and test chemicals – across a wide range of species and tissues.

 

SIP ToxPath: Capabilities beyond toxicity studies – contributing to Replacement, Refinement and Reduction

Established by William Russell and Rex Burch in 1959, the concept of ‘Replacement, Refinement and Reduction’, the 3Rs, is central to the advancement of the drug discovery process, aiming to increase efficiency, reduce cost and improve ethical standards.

Instem’s Safety Intelligence Program (SIP) ToxPath knowledgebase has capabilities beyond its traditional use which could aid the application of the 3Rs to toxicity studies. An important aspect of making biomedical science and the drug discovery process more efficient is better use of available data; the public domain contains vast quantities of useful data in different formats, which are integrated in the SIP ToxPath knowledgebase and can be easily searched there. SIP, which is mainly used as a tool to assist drug toxicity and safety assessment, can also be used in concordance studies, evaluating which animal models best predict human toxicities, and can produce data that help to justify the development of alternatives to animal models in toxicity studies and to assess their potential benefit.

I ran an analysis using SIP ToxPath of the concordance between humans, rodent and non-rodent species in compounds associated with neoplastic biomedical observations (BMOs), the results of which are shown in the Venn diagram below (Figure 1). This is a quick demonstration of the insights SIP ToxPath can provide about toxicity concordance between species. With the capability to search BMOs in specific species and tissues, SIP ToxPath could be used to evaluate which animal model best predicts a certain toxicity/BMO. SIP ToxPath can therefore support a more informed choice of which animal model is most appropriate for a certain study, improving understanding of where animals can best be used within drug development.


Figure 1. Concordance of compound associations with neoplasia, data generated by SIP ToxPath

This search was conducted using data from Medline and PubMed. Non-rodent species uniquely predict only 1.5% of the observed human neoplasia caused by chemicals; in this dataset, rodent species therefore appear to be better than non-rodent species at uniquely predicting neoplastic outcomes of compound exposure. Further investigation could be carried out using data from more sources (e.g. FDA AERS) and could compare specific species, e.g. mice vs. guinea pigs.
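The comparison behind a Venn diagram of this kind comes down to set arithmetic, sketched below in Python. The compound identifiers are placeholders, not SIP ToxPath data, and the function is a hypothetical illustration, not part of any Instem API.

```python
def prediction_shares(human, rodent, non_rodent):
    """Split human-associated compounds by which animal groups share them.

    Each argument is a set of compound identifiers associated with the
    observation in that group. Returns percentages of the human set.
    """
    regions = {
        "rodent_only": (human & rodent) - non_rodent,
        "non_rodent_only": (human & non_rodent) - rodent,
        "both": human & rodent & non_rodent,
        "neither": human - rodent - non_rodent,
    }
    total = len(human)
    return {name: 100 * len(members) / total
            for name, members in regions.items()}

# Placeholder compound identifiers, for illustration only.
human = {"A", "B", "C", "D", "E"}
rodent = {"A", "B", "C", "F"}
non_rodent = {"A", "D", "G"}

shares = prediction_shares(human, rodent, non_rodent)
for region, percent in shares.items():
    print(f"{region}: {percent:.1f}%")
```

The "non_rodent_only" share corresponds to the kind of unique-prediction figure quoted above, and the "rodent_only" share is its rodent counterpart.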

Another area where SIP ToxPath is useful is in analysing current methods and justifying the development of alternative technologies for toxicity studies. Animal toxicity studies are expensive, with each study costing tens to hundreds of thousands of pounds; there is also a huge financial burden on the pharmaceutical industry due to drug attrition. The SIP ToxPath knowledgebase has been used in a recent study funded by FRAME, which aims to assess the use of a specific animal model in toxicity studies (J Bailey et al, 2013). Analysing data extracted from SIP ToxPath, the study found that ‘testing [using canine models] contributes essentially no additional confidence in the outcome, but at considerable extra cost, both in monetary terms and in terms of animal welfare’. This study therefore highlights the need for more informed use of animal species in toxicity studies and provides justification for the drive to develop alternative technologies to complement and enhance current methods.

SIP ToxPath can help improve data utilisation and hence support more informed and intelligent decision making in drug development, in accordance with the 3Rs. The drive for replacement, refinement and reduction is therefore an area where Instem’s products provide interesting new insights.