Delivery of precision medicine through alignment of clinical data to ontologies

Precision medicine is changing the way that we think about the treatment of disease, moving from broad-acting therapies to therapies tailored to the individual patient. This shift increasingly relies on real-world data (RWD) drawn from a diverse range of sources, spanning multi-omic molecular characterisation of the patient’s condition, clinical presentation, treatment and broader medical history.

Real-world evidence (RWE) is also collected during therapeutic studies or trials. These data are used not only to inform treatment options for individual patients but also to help optimise trial design for subsequent phases of treatment development.

The full promise of precision medicine is, however, still to be fulfilled. Researchers and clinicians face significant challenges when it comes to collecting and harmonising RWD and RWE. If managed correctly, this data can be used to develop robust predictive models, identify patients for enrolment into trials and support biomarker discovery.

Here we discuss how research institutions and hospital systems are bringing data together and how SciBite can help harmonise this data to advance precision medicine.

Bringing real-world data together

During patient diagnosis, treatment and clinical trials, data are captured using specialised tools, including electronic health record (EHR) platforms, clinical trial platforms and specialist scientific tools such as laboratory information management systems (LIMS) and electronic laboratory notebooks (ELNs). Data unification aims to de-silo the data captured across this varied toolset while addressing data access issues. Increasingly, institutions and hospital systems are developing data aggregation solutions to bring data together. For example, multi-omic analysis platforms are being adopted to aggregate complex multi-omic data and allow it to be interpreted in the context of the EHR. Unified views of this data in the correct context are crucial for developing predictive models to inform treatment choices, for optimising trial design and enrolment, and for biomarker discovery through Genome-Wide Association Studies (GWAS).

To serve these use cases, data must not only be brought together but also harmonised. For example, a researcher looking for EHR entries and clinical trials referencing breast cancer may have access to all the required data, yet a string search for "breast cancer" would fail to return records that refer to "neoplasm of the breast" or "malignant mammary tumour", producing incomplete result sets.
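To make the recall problem concrete, here is a minimal Python sketch, assuming a plain case-insensitive substring search over three hypothetical records (the record IDs and texts are invented for illustration, not real patient data).

```python
# Hypothetical records; the text snippets are illustrative, not real patient data.
records = [
    {"id": "EHR-001", "text": "Diagnosed with breast cancer, stage II."},
    {"id": "EHR-002", "text": "History of neoplasm of the breast."},
    {"id": "TRIAL-007", "text": "Inclusion: malignant mammary tumour."},
]


def string_search(records, query):
    """Return IDs of records whose text contains the query (case-insensitive)."""
    q = query.lower()
    return [r["id"] for r in records if q in r["text"].lower()]


print(string_search(records, "Breast Cancer"))
# ['EHR-001']: the other two records describe the same condition but are missed
```

All three records concern the same condition, but only one shares the exact wording of the query.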

The Center for Precision Medicine at City of Hope, one of the largest cancer research and treatment organisations in the United States, has created an enterprise-wide platform and precision medicine program to unlock the clinical value and research potential of complex and unique datasets by combining patient data with comprehensive genomic profiling and proprietary analytics. POSEIDON (Precision Oncology Software Environment Interoperable Data Ontologies Network) is a secure, cloud-based Oncology Analytics and Insights platform built on DNAnexus®, a multi-omics and clinical analysis platform. POSEIDON enables exploration, analysis, visualisation and collaboration on City of Hope’s patient clinico-genomic data alongside public data sources.

POSEIDON has been used by multiple research groups across oncology disciplines and institutions, generating insights that have led to improvements in individual patient care. It gives scientists and clinicians access to omics data in a single, secure cloud environment.

Harmonisation of clinical data

Increasingly, ontologies are being used to harmonise clinical data, and a number of clinical ontologies, such as MedDRA, SNOMED CT and ICD, are maintained by the community. Rather than manually reprocessing the original data into an agreed data standard, ontologies can be used to add a layer of semantic interoperability, in which data is aligned to and annotated with concepts or entities rather than existing purely as text strings. SciBite are industry leaders in applying ontologies to create harmonised, machine-readable data in the life sciences.

Taking a data-centric approach to the FAIR (Findable, Accessible, Interoperable and Reusable) principles, SciBite advocates adopting public community standards where they exist and aligning data to these standards. SciBite’s expert ontology team optimises public ontologies for Named Entity Recognition (NER), covering entities such as clinical procedures, drugs, clinical measures and quantitative measurements. A key part of this optimisation is adding synonyms: the different names and naming conventions used for the same entity.

This is achieved through a combination of manual curation and automated synonym expansion; the resulting optimised ontologies are called VOCabs. SciBite is developing additional VOCabs to support clinical data, such as a Sequence Ontology VOCab covering the terminology used to annotate the genome, e.g., coding sequence (CDS) and 3’-untranslated region (3’UTR). SciBite has also made progress in machine learning (ML), where NER models such as the genetic variation model may add further value by identifying genetic variants even when different nomenclature is used.
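As a rough illustration of combining automated expansion with manual curation, the sketch below applies a few simple lexical rules to a label and merges in curated synonyms. The rules, the `expand_synonyms` and `build_entry` helpers and the entry layout are hypothetical; this is not how SciBite builds VOCabs, whose expansion is considerably richer.

```python
import re


def expand_synonyms(label):
    """Generate simple lexical variants of a label (illustrative rules only)."""
    base = label.strip().lower()  # normalise case and surrounding whitespace
    variants = {base}
    if "-" in base:
        # Hyphen/space variant, e.g. "non-small" -> "non small".
        variants.add(base.replace("-", " "))
    # Only two-word "<site> <noun>" labels also get "<noun> of the <site>".
    m = re.fullmatch(r"(\w+) (cancer|neoplasm|tumour)", base)
    if m:
        site, noun = m.groups()
        variants.add(f"{noun} of the {site}")
    return sorted(variants)


def build_entry(concept_id, label, curated=()):
    """Combine auto-expanded variants of the label and of curated synonyms."""
    synonyms = set(expand_synonyms(label))
    for s in curated:
        synonyms.update(expand_synonyms(s))
    return {"id": concept_id, "label": label, "synonyms": sorted(synonyms)}


entry = build_entry(
    "10006187",
    "Breast cancer",
    curated=["Breast neoplasm", "Malignant mammary tumour"],  # manual curation
)
print(entry["synonyms"])
# ['breast cancer', 'breast neoplasm', 'cancer of the breast',
#  'malignant mammary tumour', 'neoplasm of the breast']
```

Note that "malignant mammary tumour" cannot be derived from "breast cancer" by any of these rules; it only appears because it was curated, which is why rules alone are not enough.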

Data is aligned to these ontologies using TERMite, SciBite’s named entity recognition tool, which draws on all of these synonyms during entity recognition. Once entities are identified, they are assigned an ID from the underlying ontology, which aligns the data with the standards used by the broader scientific community. In practice, breast cancer, which might also be written as cancer of the breast, malignant mammary tumour, breast neoplasm or a host of other synonyms, is assigned the ID 10006187 from MedDRA, and all of these synonyms are associated with that ID. When users search or run an analysis using any of these synonyms, the query resolves to ID 10006187 and matching results are returned regardless of which synonym was used to capture or search the data. This is known as semantic interoperability.
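The idea of mapping every synonym to one ID can be sketched with simple dictionary-based entity recognition followed by ID-based search. This is a minimal sketch assuming a hypothetical in-memory vocabulary; it is not the TERMite API. It shows that a query using any synonym retrieves all three example records, including those a plain string search for "breast cancer" would miss.

```python
import re

# Hypothetical mini-vocabulary keyed by concept ID. 10006187 is the MedDRA
# code cited above; the synonym list is illustrative, not a real VOCab.
VOCAB = {
    "10006187": [
        "breast cancer",
        "cancer of the breast",
        "breast neoplasm",
        "neoplasm of the breast",
        "malignant mammary tumour",
    ],
}

# Invert to synonym -> concept ID.
SYN_TO_ID = {syn: cid for cid, syns in VOCAB.items() for syn in syns}

# One case-insensitive pattern; longer synonyms are listed first so the
# longest matching synonym wins at each position.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(SYN_TO_ID, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def annotate(text):
    """Return the set of concept IDs mentioned in text."""
    return {SYN_TO_ID[m.group(1).lower()] for m in PATTERN.finditer(text)}


def semantic_search(records, query):
    """Resolve the query to concept IDs, then match records by ID, not by string."""
    wanted = annotate(query)
    return [r["id"] for r in records if wanted & annotate(r["text"])]


records = [
    {"id": "EHR-001", "text": "Diagnosed with breast cancer, stage II."},
    {"id": "EHR-002", "text": "History of neoplasm of the breast."},
    {"id": "TRIAL-007", "text": "Inclusion: malignant mammary tumour."},
]

print(semantic_search(records, "Breast Cancer"))
# ['EHR-001', 'EHR-002', 'TRIAL-007']
```

Because matching happens on the concept ID, searching for "malignant mammary tumour" returns the same three records.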

Another key factor in success and adoption is long-term flexibility. Ontologies need to be dynamic and evolve alongside the field to continue supporting the data. For research and treatment organisations and hospital networks, the ability to update public ontologies, for example by adding the name of a new drug included in a clinical trial, is paramount. City of Hope, for example, includes more than 35 clinical locations in Southern California, as well as clinical facilities in Arizona, Illinois and Georgia through its recent acquisition of Cancer Treatment Centers of America (CTCA). City of Hope uses CENtree, SciBite’s enterprise ontology management tool, to centrally manage, edit and serve ontologies to POSEIDON. In CENtree, users can add new entities or synonyms and push them to downstream systems, such as TERMite, enabling periodic re-alignment of data to the updated standard rather than extensive manual reprocessing.
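The "edit centrally, re-align downstream" pattern can be sketched as follows. The `Vocabulary` class, its version counter, the concept IDs and the drug name "examplumab" are all hypothetical, and this is not the CENtree or TERMite API; it only illustrates re-annotating stored records against an updated vocabulary instead of reprocessing them by hand.

```python
import re


class Vocabulary:
    """Hypothetical stand-in for a centrally managed ontology (not CENtree)."""

    def __init__(self):
        self.version = 0
        self.syn_to_id = {}

    def add_synonym(self, concept_id, synonym):
        self.syn_to_id[synonym.lower()] = concept_id
        self.version += 1  # every edit produces a new vocabulary version

    def annotate(self, text):
        """Return concept IDs whose synonyms appear in text."""
        if not self.syn_to_id:
            return set()
        alternatives = sorted(self.syn_to_id, key=len, reverse=True)
        pattern = re.compile(
            r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE
        )
        return {self.syn_to_id[m.group(1).lower()] for m in pattern.finditer(text)}


def realign(records, vocab):
    """Re-annotate only records tagged with an older vocabulary version."""
    for r in records:
        if r.get("vocab_version") != vocab.version:
            r["concepts"] = vocab.annotate(r["text"])
            r["vocab_version"] = vocab.version
    return records


vocab = Vocabulary()
vocab.add_synonym("DRUG:0002", "trastuzumab")
records = [{"id": "TRIAL-007", "text": "Patient randomised to the examplumab arm."}]

realign(records, vocab)
print(records[0]["concepts"])  # set(): the new drug is not yet in the vocabulary

vocab.add_synonym("DRUG:0001", "examplumab")  # central edit, e.g. a new trial drug
realign(records, vocab)
print(records[0]["concepts"])  # {'DRUG:0001'}
```

Tracking the vocabulary version on each record lets a periodic job skip records that are already aligned to the current standard.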

A future look to ontologies and clinical data

There is further scope to extend this semantic layer by using the CENtree API to guide data entry into clinical trial platforms, providing ontology backing at the point of data capture. Learn how this API has been used to guide smart data capture in electronic laboratory notebooks (ELNs) in our use case ‘Unlock the Full Potential of ELN Data’.
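Ontology-backed capture typically means offering controlled suggestions as the user types and storing the concept ID rather than only free text. The sketch below assumes a hypothetical local term list and hypothetical `suggest` and `capture` helpers; a real integration would query an ontology service such as CENtree, whose actual endpoints are not shown here.

```python
# Hypothetical term list: (concept ID, preferred label, synonyms).
TERMS = [
    ("10006187", "Breast cancer", ["breast neoplasm", "malignant mammary tumour"]),
    ("DRUG:0001", "Examplumab", []),
]


def suggest(prefix, terms=TERMS, limit=5):
    """Suggest (ID, preferred label) pairs whose label or a synonym starts with prefix."""
    p = prefix.strip().lower()
    if not p:
        return []
    matches = [
        (cid, label)
        for cid, label, syns in terms
        if any(name.lower().startswith(p) for name in [label, *syns])
    ]
    return matches[:limit]


def capture(value, terms=TERMS):
    """Accept an entry only if it exactly matches a known label or synonym."""
    v = value.strip().lower()
    for cid, label, syns in terms:
        if any(v == name.lower() for name in [label, *syns]):
            return {"id": cid, "label": label, "entered_as": value}
    raise ValueError(f"{value!r} is not a recognised term")


print(suggest("malig"))            # [('10006187', 'Breast cancer')]
print(capture("Breast neoplasm"))  # stored with ID 10006187
```

Storing both the ID and the text as entered keeps the user's original wording while making the value searchable by concept.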

Unlocking the value in clinical data to support precision medicine initiatives is a significant challenge, but doing so well would be a huge step forward. Patients benefit from informed, data-driven treatment plans, and clinical trials can be designed, and patients enrolled, far more effectively.

If you are currently using a multi-omic data platform and are interested in learning more about how SciBite could add semantic interoperability to support your precision medicine initiative, don’t hesitate to get in touch with SciBite directly.

About SciBite

Our data-first, award-winning semantic analytics software is for those who want to innovate and get more from their data. Built by scientists for scientists, we believe data fuels discovery and continue to push boundaries with our cutting-edge technology applications and people-first solutions that unlock the complexities of scientific content.
