The pivotal role of semantic enrichment in the evolution of data commons

In this blog post, discover how Pfizer have integrated SciBite’s semantically enriched vocabularies into their Data Commons project, which has the goal of enabling scientists to develop and refine hypotheses by investigating correlations between genetic and phenotypic data.

The value of clinical data in pharmaceutical research

Clinical data can provide valuable insights for pharmaceutical research. Mining adverse event data, for instance, can reveal opportunities for drug repositioning: an analysis of public clinical trials data in ClinicalTrials.gov identified a lower incidence of gastric cancer in patients treated with aliskiren, a treatment for hypertension, than in those given placebo, suggesting that the drug could possibly be repurposed to treat cancer [1].

However, one of the fundamental challenges associated with clinical data is that the information captured for a given clinical study is very specific to that study, typically resulting in a bespoke database for each trial with no common schema. While this may not be a problem when analyzing data for that specific study, the data is not interoperable across studies, which creates a barrier to translational research.

Evolution of Pfizer’s data commons project

During SciBite’s 2018 User Group Meeting in Boston, Cathy Marshall (Director, Genomics Data Informatics Strategy & Implementation) and Alicia Dana (Data Strategy Lead for Medicinal Sciences) from Pfizer gave an excellent presentation on the evolution of Pfizer’s Data Commons project, which has the goal of enabling scientists to develop and refine hypotheses by investigating correlations between genetic and phenotypic data.

In 2012, the initial version of Pfizer’s Data Commons was based on the deployment of tranSMART to provide a single platform to collect both clinical and genomic data. However, while tranSMART does have a unifying schema for all studies, Pfizer’s experience has been that the mapping of each new study requires extensive curation effort from technical and scientific experts.

In addition, because tranSMART has been designed with data capture in mind, Pfizer was unable to perform broad, cross-project queries in support of research hypotheses.

Figure 1: A conceptual view of the transition from Data Commons 1.0 to 2.0 [2]

More recently, Pfizer have integrated SciBite’s Gene, Drug, Species, and Technology vocabularies into their Data Commons 2.0 platform, augmented with proprietary dictionaries of internal study and compound IDs and a new dictionary based on the CDISC standard for Measures. This enables Pfizer to semantically enrich data from a range of unstructured documents held in file shares and in repositories such as ELN and SharePoint.

According to Alicia, this approach has required “minimal data manipulation, removing the need for formal curation” and has resulted in “an ontology-based index which enables intelligent searches using broad English term queries tailored to the translational/exploratory research domain.”
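To make the idea of vocabulary-driven enrichment concrete, here is a minimal sketch of dictionary-based tagging in Python. It is not TERMite and does not use SciBite’s API or vocabularies: the `VOCAB` table, the concept IDs, and the function names `build_matchers` and `annotate` are all invented for illustration, and a real engine would handle ambiguity, context, and far larger dictionaries.

```python
import re

# Toy vocabulary: entity type -> {concept ID -> synonyms}.
# IDs and entries are hypothetical, chosen only to illustrate the idea.
VOCAB = {
    "DRUG": {"DRUG:001": ["aliskiren"]},
    "GENE": {"GENE:001": ["TNF", "tumor necrosis factor"]},
    "SPECIES": {"SPECIES:001": ["human", "Homo sapiens"]},
}


def build_matchers(vocab):
    """Compile one case-insensitive, whole-word pattern per synonym."""
    matchers = []
    for entity_type, concepts in vocab.items():
        for concept_id, synonyms in concepts.items():
            for synonym in synonyms:
                pattern = re.compile(
                    r"\b" + re.escape(synonym) + r"\b", re.IGNORECASE
                )
                matchers.append((entity_type, concept_id, pattern))
    return matchers


def annotate(text, matchers):
    """Return every vocabulary hit in the text, ordered by position."""
    hits = []
    for entity_type, concept_id, pattern in matchers:
        for match in pattern.finditer(text):
            hits.append(
                {
                    "type": entity_type,
                    "id": concept_id,
                    "text": match.group(0),
                    "start": match.start(),
                }
            )
    return sorted(hits, key=lambda hit: hit["start"])


if __name__ == "__main__":
    matchers = build_matchers(VOCAB)
    sample = "Study notes: aliskiren dosing, TNF assay, Homo sapiens samples."
    for hit in annotate(sample, matchers):
        print(hit["type"], hit["id"], repr(hit["text"]))
```

Because every synonym resolves to a single concept ID, documents that say "TNF" and documents that say "tumor necrosis factor" end up tagged identically. Those concept IDs, rather than the raw strings, are what an ontology-based index would store.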

For example, users can now perform faceted, ‘Amazon-like’ scientific searches such as:

  • Find biomarkers that predict the response to a given drug or that predict disease progression.
  • What were the results from placebo versus treated subjects?
  • What was the clinical response to a particular class of drug?
  • Which studies included one or more specified exclusion criteria?
  • Which studies excluded obese people, yet saw patients gain weight beyond the limit set by the exclusion criteria?
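Faceted searches of this kind can be pictured as set operations over an index keyed by concept rather than by raw text. The sketch below is a simplified illustration under stated assumptions, not a description of Pfizer’s platform: the `DOCS` table, the study names, the concept IDs, and the functions `build_index`, `search`, and `facet_counts` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical annotated documents: doc ID -> set of (entity type, concept ID).
DOCS = {
    "study-A": {("DRUG", "DRUG:001"), ("GENE", "GENE:001")},
    "study-B": {("DRUG", "DRUG:001"), ("SPECIES", "SPECIES:001")},
    "study-C": {("GENE", "GENE:001"), ("SPECIES", "SPECIES:001")},
}


def build_index(docs):
    """Invert the annotations: (entity type, concept ID) -> set of doc IDs."""
    index = defaultdict(set)
    for doc_id, annotations in docs.items():
        for annotation in annotations:
            index[annotation].add(doc_id)
    return index


def search(index, required):
    """Return IDs of documents that carry every required annotation."""
    doc_sets = [index.get(annotation, set()) for annotation in required]
    return set.intersection(*doc_sets) if doc_sets else set()


def facet_counts(docs, doc_ids, entity_type):
    """Count concepts of one entity type across the matching documents."""
    counts = Counter()
    for doc_id in doc_ids:
        for found_type, concept_id in docs[doc_id]:
            if found_type == entity_type:
                counts[concept_id] += 1
    return counts


if __name__ == "__main__":
    index = build_index(DOCS)
    matches = search(index, [("DRUG", "DRUG:001")])
    print(sorted(matches))
    print(dict(facet_counts(DOCS, matches, "GENE")))
```

Selecting a facet value simply adds another required annotation to the query, which narrows the result set in the same way that filters narrow results on a shopping site.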

Cathy and Alicia also described how the ontology-based index can be used to power downstream applications, from Spotfire-based visualizations and statistical models to machine learning algorithms. We look forward to hearing how Pfizer’s Data Commons continues to evolve.

Learn more about TERMite, SciBite’s named entity recognition (NER) and extraction engine, and our high-quality, semantically enriched biomedical vocabularies.

References

[1] Su EW, Sanger TM. (2017). Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov. PeerJ 5:e3154.

[2] Taken from Cathy and Alicia’s presentation, ‘Termite Integration with Pfizer’s Data Commons Platform,’ presented at SciBite’s 2018 UGM in Boston.

Richard Harrison
Senior Manager, Portfolio Marketing, SciBite

Richard is a seasoned marketing professional with over two decades of experience in the information services and life sciences sectors. Currently, he is the Senior Manager, Portfolio Marketing at Elsevier’s SciBite, where he drives strategic campaigns and harnesses data-driven strategies to amplify the platform’s online visibility and impact.
