Rapid review of all of the abstracts presented at the 2014 Annual Meeting of the Society of Toxicology in Phoenix, Arizona.

Next week I’ll be attending the 53rd annual meeting of the Society of Toxicology (SOT) in Phoenix, Arizona. My manager is sending me from Cambridge, UK, all the way over to Phoenix, AZ, at great expense, and I will be expected to provide a thorough review of the topics and themes covered when I get back. The Phoenix Convention Centre will be packed to the rafters with scientists from across the globe eager to present their most recent and exciting work. Almost three thousand abstracts have been submitted. If I allowed myself just one minute to read each abstract, I’d be done in around 50 hours! Luckily I have our new personal text analysis program TxtViz (http://www.txtviz.com/) to help me.



Big data management with TxtViz: removing ambiguity in literature catalogue searches

In December 2013, the open-access, online publication PLOS ONE published their 100,000th article. This vast quantity of publications presents a great challenge to the people managing and maintaining the catalogue in which they are stored. A recent article in Bio-IT World exposes the enormity of the task of keeping the catalogue useful and searchable (Aaron Krol, 2014).

To search for literature on a desired topic, search terms are used to pull out papers from PLOS ONE’s catalogue; it is the job of taxonomy curators to tag articles and produce algorithms (or rules) which match the search terms to appropriate papers. With such a broad topic base, searches in PLOS ONE can be hindered by ambiguous terms, so rules which remove this ambiguity are required to make search results more appropriate to the search terms. These rules take into account the presence of other words in the abstract to distinguish ambiguous terms and remove inappropriate search results. An example is given by Rachel Drysdale in the Bio-IT World article: the search term ‘snail’. Words which are associated with the different meanings of the word ‘snail’, the organism or the gene name, must be identified. This is a laborious process which could be aided by the use of Instem’s data mining and visual analytics tool, TxtViz.
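As a rough illustration of what such a rule might look like, here is a minimal Python sketch. It is not PLOS ONE’s actual rule set and not how TxtViz works internally; the function name and the word prefixes are assumptions, loosely based on the distinguishing terms that the TxtViz analysis in this post brings out (‘cell’, ‘express’, ‘transcribe’ for the gene; ‘species’, ‘evolve’, ‘morphology’, ‘population’ for the organism).

```python
import re

# Hypothetical word prefixes (assumed for illustration only).
GENE_PREFIXES = ("cell", "express", "transcri")
ORGANISM_PREFIXES = ("species", "evol", "morpholog", "populat")


def classify_snail(abstract: str) -> str:
    """Label an abstract as 'gene', 'organism' or 'ambiguous'.

    The rule counts distinct words in the abstract that start with a
    gene-related or organism-related prefix, and picks the larger count.
    """
    words = set(re.findall(r"[a-z0-9]+", abstract.lower()))
    gene_hits = sum(1 for w in words if w.startswith(GENE_PREFIXES))
    organism_hits = sum(1 for w in words if w.startswith(ORGANISM_PREFIXES))
    if gene_hits > organism_hits:
        return "gene"
    if organism_hits > gene_hits:
        return "organism"
    return "ambiguous"
```

A curator could apply a rule like this to filter the results of a ‘snail’ search, sending anything labelled ‘ambiguous’ for manual review rather than guessing.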

TxtViz generates visualisations representing the relatedness of data in the form of clusters; the closer clusters are to each other, the more related they are. Data can be in the ‘MEDLINE’ format, so literature searches (e.g. from PubMed) can be imported; in this way TxtViz can produce visualisations of the relatedness of academic papers. Importantly, TxtViz can determine relatedness using the words/terms in textual data. TxtViz could therefore be used to identify the most distinguishing terms, which can in turn be used to generate rules that remove ambiguity in literature searches.

To demonstrate this, I ran an analysis of a PubMed search for ‘snail’ in TxtViz. Below are the visualisations which TxtViz produced. Figure 1 shows the ‘Galaxy’ visualisation; the most distant points in the Galaxy represent the most different sets of papers, based on the terms used in the abstract. The three main terms from each cluster (shown in the ‘Cluster Information’ panels) indicate that clusters in the top right of the visualisation are related to ‘SNAI1’, the gene (e.g. ‘cell’, ‘express’), and clusters in the bottom left refer to ‘Snail’, the organism (e.g. ‘species’, ‘population’). Immediately, by navigating around the clusters in the ‘Galaxy’ view, the main terms differentiating papers which refer to the different meanings of the word ‘snail’ can be identified.


Figure 1. Galaxy Visualisation produced in TxtViz
Individual data points are represented by the dark blue dots. Clusters are represented by the cluster image and are labelled (as seen in the ‘CLUSTER INFORMATION’ boxes) based on the three most important terms relating the data points.


Figure 2 shows the ThemeMap visualisation. The peaks represent the major themes; peak height indicates the frequency of theme terms, and closer peaks indicate more closely related themes. The figure below shows the terms relating data in two of the ThemeMap peaks: one in the top left and one in the bottom right. Compared to the ‘Galaxy’ view, more of the terms which can be used to distinguish papers can be seen. From the stacked bar charts (to the left and right of the ThemeMap), ‘cell’, ‘express’ and ‘transcribe’, and ‘species’, ‘evolve’, ‘morphology’ and ‘population’ are identified as useful terms for identifying papers referring to ‘snail’ the gene and ‘snail’ the organism respectively.


Figure 2. ThemeMap visualisation produced in TxtViz
By placing probes on points in the ThemeMap, a stacked bar chart shows the terms relating data at that peak. Here, two different probes were placed on the ThemeMap, at the points indicated by the arrows, to produce the ‘Probe Terms Stacked Bar Charts’ seen in the black boxes to the right and left of the ThemeMap.


Therefore, by identifying words which remove ambiguity in literature searches, TxtViz could help taxonomy curators, such as those at PLOS ONE, improve the searchability and usefulness of their databases, aiding the management of ‘big data’.

See our website http://www.txtviz.com/ for more information.

Visualising the 2013 toxicity literature using TxtViz

I recently returned to work after a year’s maternity leave and wanted to get a quick overview of the scientific literature I’d missed around the subject of toxicity. Searching for this term in PubMed and restricting the results by publication date identified 24,480 papers, which I downloaded in the Medline format and loaded into TxtViz, Instem’s text analytics platform.

I used the default species and tissue lists in the CoMet tool to visualise the statistical co-occurrence of these terms in the title/abstract. As may be expected, there were many human publications in a wide variety of tissues. As shown in Figure 1, an interesting finding was an over-representation of liver studies in several varieties of fish, e.g. the common carp (Cyprinus carpio) and the rainbow trout (Oncorhynchus mykiss). As a sanity check, it is also reassuring to note that studies mentioning gills, the respiratory organ found in many aquatic species, are also common for these species. By selecting the relevant cells in the CoMet tool, I viewed these 53 fish liver studies in the Gist tool, which summarises the terms occurring within these publications which most distinguish them from the rest. These common terms include the insecticide chlorpyrifos and the common water pollutant fluoride.

Figure 1. The over-representation of fish liver studies in the 2013 toxicity literature.  Each cell shows a colour representation (red for positive and blue for negative) of the deviation (expected number of records subtracted from the actual number) for a particular species/tissue combination.
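To make the deviation in the caption concrete, here is a small Python sketch of one way it could be computed. I am assuming that the expected count for a species/tissue cell is what you would see if species and tissue were independent (row total × column total ÷ grand total); the exact statistic CoMet uses is not described here, and the function name and record layout are hypothetical.

```python
from collections import Counter


def cooccurrence_deviation(records):
    """Return {(species, tissue): actual - expected} for every combination.

    records is a list of (species, tissue) pairs, one per co-mention.
    The expected count assumes species and tissue are independent
    (an assumption of this sketch, not a documented CoMet formula).
    """
    n = len(records)
    species_totals = Counter(species for species, _ in records)
    tissue_totals = Counter(tissue for _, tissue in records)
    actual = Counter(records)
    return {
        (s, t): actual[(s, t)] - species_totals[s] * tissue_totals[t] / n
        for s in species_totals
        for t in tissue_totals
    }
```

A positive value corresponds to a red cell (more records than expected) and a negative value to a blue cell (fewer than expected).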


Another useful view in TxtViz is the Galaxy, a two-dimensional representation of proximity of the publications to each other based on terms within the title/abstract.  Figure 2 shows that my 53 related records of interest (highlighted in yellow) lie within several clusters which are not necessarily neighbours.  As such, I used the Text Query By Example tool rather than cluster membership to identify other similar papers around hepatotoxic effects of environmental pollutants such as arsenic and benzene.

Figure 2. Galaxy view of the 2013 toxicity publications with the 53 fish liver records highlighted in yellow. Default clustering (K-means with Euclidean distance metric) based on terms within the title/abstract.
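For readers unfamiliar with the clustering named in the caption, the following is a generic, minimal sketch of K-means with a Euclidean distance metric in plain Python. It is not TxtViz’s implementation: the function name, the fixed starting centroids and the idea of feeding it simple term-count vectors are all assumptions made for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain K-means (Lloyd's algorithm) using Euclidean distance.

    points and centroids are lists of equal-length numeric tuples,
    e.g. per-paper term counts. Returns (labels, final_centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: give each point the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids
```

Because cluster membership depends on where the centroids end up, related records can fall into different clusters, which is one reason a similarity query can be more useful than cluster membership for finding related papers.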


This review proves a useful reminder that the toxicity literature covers not only established drugs and those within the development pipeline, but also agrochemicals and environmental contaminants.  Instem’s ToxPath knowledgebase, the Safety Intelligence Program (SIP), integrates data on biological effects of over 100,000 compounds – including marketed and withdrawn drugs, agrochemicals, environmental toxins, natural products and test chemicals – across a wide range of species and tissues.


SIP ToxPath: Capabilities beyond toxicity studies – contributing to Replacement, Refinement and Reduction

Established by William Russell and Rex Burch in 1959, the concept of ‘Replacement, Refinement and Reduction’, the 3Rs, is central to the advancement of the drug discovery process, aiming to increase efficiency, reduce cost and improve the ethical stance.

Instem’s Safety Intelligence Program (SIP) ToxPath knowledgebase has capabilities beyond its traditional use which could aid the application of the 3Rs to toxicity studies. An important aspect of making biomedical science and the drug discovery process more efficient is better use of available data; the public domain contains vast quantities of useful data in different formats, which are integrated in the SIP ToxPath knowledgebase and can be easily searched there. SIP, which is mainly used as a tool to assist drug toxicity & safety assessment, can also be used in concordance studies, evaluating which animal models best predict human toxicities, and can produce data supporting the justification for, and potential benefit of, developing alternatives to animal models in toxicity studies.

I ran an analysis using SIP ToxPath of the concordance between human, rodent and non-rodent species for compounds associated with neoplastic biomedical observations (BMOs), the results of which are shown in the Venn diagram below (Figure 1). This is a quick demonstration of the insights SIP ToxPath can provide about toxicity concordance between species. With the capability to search BMOs in specific species and tissues, SIP ToxPath could be used to evaluate which animal model best predicts a certain toxicity/BMO. SIP ToxPath can therefore support a more informed choice of which animal model is most appropriate for a certain study, improving understanding of where animals can best be used within drug development.


Figure 1. Concordance of compound associations with neoplasia, data generated by SIP ToxPath

This search was conducted using data from Medline and PubMed. Non-rodent species uniquely predict only 1.5% of the observed human neoplasia caused by chemicals, which suggests that rodent species are better than non-rodent species at uniquely predicting neoplastic outcomes of compound exposure. Further investigation could draw on more data sources (e.g. FDA AERS) and could compare specific species, e.g. mice vs. guinea pigs.
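The counts behind a Venn diagram like this can be reproduced with simple set operations once the compound lists for each species group are exported. The sketch below is an assumption about how one might do that in Python; the function names and compound names are hypothetical, and "uniquely predicts" is interpreted here as the share of human-associated compounds also seen in one animal group but not the other.

```python
def concordance(human, rodent, non_rodent):
    """Count the Venn regions that lie inside the human set.

    Each argument is a set of compound names associated with the
    observation (e.g. neoplasia) in that species group.
    """
    return {
        "human_only": len(human - rodent - non_rodent),
        "rodent_only": len((human & rodent) - non_rodent),
        "non_rodent_only": len((human & non_rodent) - rodent),
        "rodent_and_non_rodent": len(human & rodent & non_rodent),
    }


def percent_uniquely_predicted(human, predictor, other):
    """Percentage of human compounds seen in predictor but not in other."""
    if not human:
        return 0.0
    return 100 * len((human & predictor) - other) / len(human)


# Hypothetical compound names, for illustration only.
human = {"A", "B", "C", "D"}
rodent = {"A", "B", "E"}
non_rodent = {"B", "C"}
regions = concordance(human, rodent, non_rodent)
```

Calling `percent_uniquely_predicted(human, non_rodent, rodent)` on real exported sets would give the kind of percentage quoted above for non-rodent species.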

Another area where SIP ToxPath is useful is in analysing current methods and justifying the development of alternative technologies for toxicity studies. Animal toxicity studies are expensive, each costing tens to hundreds of thousands of pounds, and drug attrition places a further huge financial burden on the pharmaceutical industry. The SIP ToxPath knowledgebase has been used in a recent study funded by FRAME, which aims to assess the use of a specific animal model in toxicity studies (J Bailey et al, 2013). Analysing data extracted from SIP ToxPath, the study found that ‘testing [using canine models] contributes essentially no additional confidence in the outcome, but at considerable extra cost, both in monetary terms and in terms of animal welfare’. This study therefore highlights the need for more informed use of animal species in toxicity studies and provides justification for the drive to develop alternative technologies to complement and enhance current methods.

SIP ToxPath can help improve data utilisation and hence aid more informed and intelligent decision making in drug development, in accordance with the 3Rs. The drive for replacement, refinement and reduction is therefore an area where Instem’s products provide interesting new insights.

Overcoming Information Overload: The Value of OmniViz in Literature Searching

It’s always gratifying to hear that your products have had a positive impact on productivity. We’ve been very fortunate that Dr Gareth Evans, Principal Scientist at the Health and Safety Laboratory (HSL) in Buxton, has written an article about the benefits he gains from the use of our analytics software, OmniViz.

The HSL is the Scientific Agency of the Health and Safety Executive with responsibility for incident and health & safety investigations in workplaces across Britain. Increasingly, they are being asked by their customers to create evidence summaries to support complex and costly research. To produce these summaries they need to review and “sift” large quantities of scientific literature rapidly. Historically, this would be done using keyword searches and manual review of literature, but using OmniViz they have been able to make valuable time savings:

“Using OmniViz we have made valuable improvements to searching and sifting literature and large search processes undertaken several years ago that were taking several days to complete can now be completed by one person in several hours.”

The article is a great illustration of how one customer uses the software to increase productivity, meet tight deadlines and have greater confidence that no stone has been left unturned. If your job involves literature review or unearthing insight from unstructured text, you can request a copy of the article here. You can also contact us at omniviz.support@instem.com or leave a reply below.