Global Zika virus research trends

Zika virus is a mosquito-borne virus first identified in the 1940s and 1950s [1]. More recently, the world’s attention has been drawn to this mostly harmless infection because of its potentially serious implications for neonates. Initially observed in Africa and the Pacific region, the virus has since spread to South and Central America and is predicted to continue spreading.

Over the last 18 months, there has been a predictable surge in research on the Zika virus as the scientific community tries to better understand the disease. We decided to look at this topic to see how much research is being done across the globe and which phenotypes/symptoms have been mentioned to date.

Have a play with the interactive analysis below and see what insights you can uncover.

The challenge

Manually reviewing and analysing the text of over 1,000 articles requires a significant investment of time and effort. This is where our semantic analytics technologies can help. Our tools rapidly scan documents such as publications and extract their key terms, transforming raw text into scientifically relevant, machine-readable data.

Parsing structured XML

Our Inxights module allows users to perform complex mining of more structured documents such as Excel or XML files. Being able to select individual fields within a file and extract any combination of terms from them lets users build valuable datasets quickly and with minimal effort. Starting with a Zika virus XML download from PubMed, we used Inxights to extract the phenotypic terms from the abstract, the institution name from the affiliation field and the publication date of each document in the corpus.
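Inxights itself is not shown here. As a rough illustration of what field-level extraction from a PubMed XML export involves, the sketch below uses Python's standard `xml.etree.ElementTree` to pull the abstract, affiliations and publication year from each record. The sample XML, the `extract_records` helper and the output dictionary layout are all hypothetical, and the element paths assume the usual PubMed layout (`Article/Abstract/AbstractText`, `Author/AffiliationInfo/Affiliation`, `JournalIssue/PubDate/Year`).

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a PubMed XML export (illustrative only; a real
# download holds many <PubmedArticle> records and many more fields).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <Journal>
          <JournalIssue>
            <PubDate><Year>2016</Year></PubDate>
          </JournalIssue>
        </Journal>
        <Abstract>
          <AbstractText>Congenital Zika infection was associated with microcephaly.</AbstractText>
        </Abstract>
        <AuthorList>
          <Author>
            <AffiliationInfo>
              <Affiliation>Example Institute, Rio de Janeiro, Brazil.</Affiliation>
            </AffiliationInfo>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_records(xml_text):
    """Return one dict per article with its abstract, affiliations and year."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        # Structured abstracts are split across several AbstractText elements,
        # and itertext() keeps text nested inside inline markup such as <i>.
        abstract = " ".join(
            "".join(node.itertext()) for node in article.iter("AbstractText")
        )
        affiliations = [
            node.text.strip() for node in article.iter("Affiliation") if node.text
        ]
        # Some records carry <MedlineDate> instead of <Year>; those get None here.
        year_node = article.find(".//PubDate/Year")
        year = int(year_node.text) if year_node is not None else None
        records.append(
            {"abstract": abstract, "affiliations": affiliations, "year": year}
        )
    return records


records = extract_records(SAMPLE_XML)
print(records)
```

Each record ends up as a plain dictionary, which keeps the later term-extraction and location steps independent of the XML structure.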

Phenotypic extraction and normalisation

We scanned the XML using our phenotypic vocabulary (containing over 1.5 million terms) and extracted all the phenotype terms within the abstracts. There are multiple ways to describe any phenotype; microcephaly, for example, may also appear as "small skull" or "small head". Our VOCabs are designed to manage the synonymous language found in scientific literature and normalise the results with ease.
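To make the idea of synonym normalisation concrete, here is a minimal dictionary-based sketch. It is not how SciBite's VOCabs work internally: the three-entry `VOCAB` dictionary and the `build_matcher` and `normalise` helpers are hypothetical, and a real vocabulary would also need to handle ambiguity, spelling variants and overlapping terms. The approach maps every synonym to one preferred name, then scans the text with a single case-insensitive regular expression.

```python
import re

# Hypothetical mini-vocabulary mapping a preferred phenotype name to its
# synonyms. This is NOT SciBite's VOCab format, just an illustration.
VOCAB = {
    "Microcephaly": ["microcephaly", "small skull", "small head"],
    "Fever": ["fever", "pyrexia"],
    "Rash": ["rash", "exanthema"],
}


def build_matcher(vocab):
    """Compile one case-insensitive regex plus a synonym -> preferred-term map."""
    lookup = {}
    for preferred, synonyms in vocab.items():
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    # Try longer synonyms first so a multi-word term beats any shorter
    # synonym starting at the same position in the text.
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(s) for s in ordered) + r")\b", re.IGNORECASE
    )
    return pattern, lookup


def normalise(text, pattern, lookup):
    """Return the set of preferred phenotype names mentioned in text."""
    return {lookup[m.group(1).lower()] for m in pattern.finditer(text)}


pattern, lookup = build_matcher(VOCAB)
found = normalise(
    "Infants had a small head and pyrexia; microcephaly was confirmed.",
    pattern,
    lookup,
)
print(sorted(found))
```

Returning a set means an abstract that says both "small head" and "microcephaly" is counted once under Microcephaly, which is usually what you want when tallying phenotype mentions per document.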

Geo-location of publications

We’ve recently added a GEO library to our VOCabs, meaning you can now add a location to semantic searches wherever institutions or addresses are provided.
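As a simplified picture of assigning a location to an affiliation string, the sketch below looks up comma-separated parts of the affiliation in a small gazetteer. The `GAZETTEER` entries and the `locate` helper are hypothetical and say nothing about how the GEO library is built; they only show the general lookup idea, scanning from the end because PubMed affiliations typically finish with the city and country.

```python
# Hypothetical mini-gazetteer: lowercase place name -> country.
# A production geo vocabulary would be far larger and would handle
# institution names, abbreviations and ambiguous place names.
GAZETTEER = {
    "brazil": "Brazil",
    "rio de janeiro": "Brazil",
    "usa": "United States",
    "atlanta": "United States",
    "france": "France",
    "singapore": "Singapore",
}


def locate(affiliation):
    """Return the country for an affiliation string, or None if no cue matches."""
    # Affiliations usually end with the city and country, so check the
    # comma-separated parts from right to left.
    parts = [part.strip(" .").lower() for part in affiliation.split(",")]
    for part in reversed(parts):
        if part in GAZETTEER:
            return GAZETTEER[part]
    return None


print(locate("Example Institute, Rio de Janeiro, Brazil."))
```

Affiliations that match nothing return `None`, so they can be reported separately rather than silently assigned to a wrong country.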

Visualization

Using Tableau, we created the following interactive visualization. It shows, over time, where the publication powerhouses for Zika virus research are located and how microcephaly emerged as the most frequently mentioned phenotype (look at publications prior to 2015: it wasn’t always the case).
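Tableau is configured interactively, so the sketch below covers only the data-preparation step that could feed such a view. It assumes each article has already been reduced to a hypothetical record with a `year`, a list of `countries` and a list of `phenotypes` (for instance, from the extraction, normalisation and geo-location steps described above). The `tabulate` and `to_csv` helpers are illustrative, and the three records in the usage lines are made-up toy data, not results from this analysis. The output is a long-format CSV (one row per year and category), a shape that BI tools generally import easily.

```python
import csv
import io
from collections import Counter


def tabulate(records):
    """Count publications per (year, country) and mentions per (year, phenotype)."""
    by_country = Counter()
    by_phenotype = Counter()
    for rec in records:
        year = rec["year"]
        if year is None:
            continue  # skip records without a usable publication year
        # set() so a paper with several authors from one country counts once.
        for country in set(rec["countries"]):
            by_country[(year, country)] += 1
        for phenotype in set(rec["phenotypes"]):
            by_phenotype[(year, phenotype)] += 1
    return by_country, by_phenotype


def to_csv(counter, label):
    """Write a long-format table (year, label, count) as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["year", label, "count"])
    for (year, key), count in sorted(counter.items()):
        writer.writerow([year, key, count])
    return buf.getvalue()


# Toy records for illustration only (not real data).
toy_records = [
    {"year": 2014, "countries": ["France"], "phenotypes": ["Fever", "Rash"]},
    {"year": 2016, "countries": ["Brazil", "Brazil"], "phenotypes": ["Microcephaly"]},
    {"year": 2016, "countries": ["United States"], "phenotypes": ["Microcephaly", "Fever"]},
]
by_country, by_phenotype = tabulate(toy_records)
phenotype_csv = to_csv(by_phenotype, "phenotype")
print(phenotype_csv)
```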

Summary

Remember, this analysis stemmed from unstructured text contained in a single XML file downloaded from a PubMed search. SciBite technologies allow you to transform individual documents into semantically enriched scientific data, which can be built into powerful visualizations supporting a wide range of use cases, from disease exploration to identifying emerging centres of excellence in your specific research fields.
