Unlocking important RWE from patient data (Part 1) – Why and how?

Image and link to LinkedIn profile of blog author Arvind Swaminathan

In this three-part blog series, we explore the challenges healthcare organizations face in unlocking patient data for real-world evidence (RWE). Part 1 covers why this matters and how to get started.

Healthcare organizations are constantly asked to unlock their patient data to uncover real-world evidence (RWE) that can serve a variety of purposes, ranging from drug development to financial benefits and improved patient outcomes. However, many organizations are daunted by this task because of the perceived amount of work required to properly clean this data.

Here at SciBite, we are passionate about demystifying how healthcare organizations can use the resources available to them to start on this journey toward clean, FAIR data for all.

Today, it is apparent that gathering and analyzing real-world evidence or data (RWE/RWD) for use in clinical trials and studies is no longer optional. Instead, it is paramount that healthcare systems and pharmaceutical companies work together to unlock this data to inform real healthcare decisions.

The need for collaboration

Collaborating effectively will enable both sides to benefit: healthcare systems by providing (ideally high-quality) data to pharma companies, and pharma companies by using this data to inform their drug development decisions. The financial incentives of such a collaboration include revenue sharing, research funding, value-based care reimbursement and data sharing incentives.

Ultimately, putting aside the financial incentives of such a collaboration, if we can unlock this data, patients will benefit the most as the drug development process accelerates.

Healthcare organizations are already looking for the easiest and most reproducible way to get the most value out of their data. One example is our work with City of Hope, a healthcare organization with more than 35 locations in Southern California and additional facilities in Arizona, Illinois and Georgia, which enabled City of Hope to normalize its patient data within POSEIDON (Precision Oncology Software Environment Interoperable Data Ontologies Network).

To collaborate effectively and get to our ultimate goal, healthcare organizations need to clean their data and align it to industry standards that we can all understand. How can healthcare organizations undertake this sort of task?

At a high level, to get to the ultimate goal, healthcare organizations need to do the following:

  1. Select publicly accepted standards to use as references in various data domains.
  2. Align their data to those publicly accepted standards (normalize their data).
  3. Make that normalized, deidentified data available for researchers.

In this three-part series, we will explore the keys to following these steps and the existing work healthcare organizations can draw on to start this journey. In part 1, let’s explore the standards and how we can align at least one type of data to them.

Standard/ontology selection

Luckily, in many different clinical data domains, public standards have already been created that can be used for this work. Some of these commonly available standards are listed below:

  • Medication data – MeSH, MedDRA, IDMP, RxNorm
  • Diagnoses – ICD-9-CM, ICD-10-CM, SNOMED, NCIT
  • Lab data – LOINC
  • Genomic data – Gene Ontology
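As a rough illustration, the list above can be captured as a small lookup table so that downstream code knows which reference standards apply to each data domain. This is only a sketch: the domain keys and the `standards_for` helper are names invented for this example, not part of any SciBite product.

```python
# Illustrative registry of reference standards per clinical data domain.
# The domain keys are names chosen for this sketch; the standards are
# the publicly available ones listed above.
STANDARDS_BY_DOMAIN = {
    "medication": ["MeSH", "MedDRA", "IDMP", "RxNorm"],
    "diagnosis": ["ICD-9-CM", "ICD-10-CM", "SNOMED", "NCIT"],
    "lab": ["LOINC"],
    "genomic": ["Gene Ontology"],
}


def standards_for(domain: str) -> list[str]:
    """Return the candidate standards for a data domain (case-insensitive)."""
    key = domain.strip().lower()
    if key not in STANDARDS_BY_DOMAIN:
        raise KeyError(f"No standards registered for domain: {domain!r}")
    # Return a copy so callers cannot modify the registry by accident.
    return list(STANDARDS_BY_DOMAIN[key])


print(standards_for("Lab"))
```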

Using CENtree, SciBite’s award-winning ontology management platform, organizations can manage these standards. CENtree enables users to participate in a democratized management process to ensure the vocabularies are up to date. Importantly, CENtree enables organizations to deploy these standards downstream to any tool, including SciBite’s Named Entity Recognition Engine, TERMite.

By integrating CENtree with TERMite, organizations can realize the full value of flexibly updating their standards in CENtree. Let’s look at a specific data type and how organizations can normalize their data using CENtree and TERMite.

Unlocking real-world patient data within clinical notes

Clinical notes contain a lot of information but are often difficult to interpret. Some of that information is valuable and some is not, so any solution for unlocking this data must be able to sift through the notes to find what is actually of value. For this example, I asked ChatGPT to write clinical notes for fake patients. Suppose I am a researcher who wants to analyze the effects of different drugs in early-stage breast cancer treatment across patient populations, and I am trying to identify patients to include in my study.

Here is a sample note ChatGPT generated for a fake patient:

Patient 1:

  • Name: Sarah Thompson
  • Age: 45
  • Gender: Female
  • Medical History: No significant medical history
  • Family History: No known family history.
  • Clinical Presentation: Sarah presents with a painless lump in her left breast that she discovered during a routine self-examination. Possible breast cancer. No other symptoms reported.
  • Physical Examination:
    • General: Well-nourished and in good overall health.
    • Breasts: Left breast reveals a firm, non-mobile, palpable lump measuring approximately 2 cm in diameter. No nipple discharge or skin changes noted. Right breast examination is unremarkable.
    • Lymph Nodes: No palpable axillary lymphadenopathy.
  • Diagnostic Workup:
    • Mammogram: Shows an irregular mass in the upper outer quadrant of the left breast.
    • Ultrasound: Confirms the presence of a solid mass with irregular borders.
    • Core Needle Biopsy: Reveals invasive ductal carcinoma, estrogen receptor (ER) positive, progesterone receptor (PR) positive, and HER2/neu negative.
  • Stage: T1N0M0 breast cancer
  • Surgical Consultation: Referral for lumpectomy or mastectomy with sentinel lymph node biopsy.
  • Medical Oncology Consultation: Discussion of the need for adjuvant therapy, including endocrine therapy with Kentadex and consideration for radiation therapy.
  • Genetic Counseling: Evaluation for genetic testing due to the patient’s age and lack of family history.

From scanning the note above, I can see that the patient has stage 1 breast cancer. The note doesn’t explicitly say that, but as a human, I can see that mentioned via different synonyms and context clues. The question becomes the following: How can I get a machine to quickly and easily recognize the piece of information that I can see after quick analysis?

In this instance, we are focused on extracting diagnosis/indication and medication data from our notes. The screenshot below shows the important information TERMite was able to recognize using the standards managed in CENtree:

Figure 1: Screenshot of TERMite annotations on the sample clinical note

From the screenshot, we can see that TERMite has used SciBite’s manually curated synonyms to recognize important information and standardize it to terms within the NCIT VOCab, including “T1N0M0 breast cancer”. It has also recognized a synonym (“kentadex”) of tamoxifen, a drug used in early-stage cancer therapy.
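To make the idea of synonym-based recognition concrete, here is a minimal sketch of dictionary matching in Python. It is not how TERMite is implemented and does not use its API. The `VOCAB` structure, the entity type labels and the `annotate` function are assumptions made up for this illustration, and real vocabularies carry identifiers and far more synonyms than the two terms shown.

```python
import re

# Hypothetical mini-vocabulary: preferred term -> (entity type, synonyms).
# Only the terms discussed in this post are included; no real ontology
# identifiers are shown.
VOCAB = {
    "Stage I Breast Cancer": ("DIAGNOSIS", ["stage I breast cancer", "T1N0M0 breast cancer"]),
    "tamoxifen": ("DRUG", ["tamoxifen", "Kentadex"]),
}


def annotate(text: str) -> list[dict]:
    """Find vocabulary synonyms in text, case-insensitively, on word boundaries."""
    hits = []
    for preferred, (entity_type, synonyms) in VOCAB.items():
        for synonym in synonyms:
            pattern = r"\b" + re.escape(synonym) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append({
                    "type": entity_type,
                    "preferred": preferred,
                    "matched": match.group(0),
                    "start": match.start(),
                    "end": match.end(),
                })
    # Report hits in the order they appear in the note.
    return sorted(hits, key=lambda hit: hit["start"])


# Two fragments taken from the sample note above.
note = (
    "Stage: T1N0M0 breast cancer. Discussion of the need for adjuvant therapy, "
    "including endocrine therapy with Kentadex."
)
annotations = annotate(note)
for annotation in annotations:
    print(annotation["type"], annotation["matched"], "->", annotation["preferred"])
```

Each hit records the text as written in the note alongside the preferred term it maps to, which is the normalization step in miniature.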

Creating machine-readable annotations to use downstream

TERMite generates machine-readable annotations for use downstream. Organizations can feed these annotations to their NLP and AI teams so they can run a more rigorous analysis of the data for research purposes. Additionally, the researcher we started with can immediately see whether it makes sense to analyze this patient in more depth for their study. This simple example shows how TERMite can help standardize information within a chart for downstream data analysis.
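The sketch below illustrates one way such annotations might be used downstream to shortlist patients for a study. The record layout (`patient_id`, `type`, `preferred`) is an assumption for this example and not a documented TERMite output format, and the second patient is invented purely to show the filter excluding someone.

```python
import json

# Hypothetical annotation records, one per recognized entity.
# Field names are assumptions for this sketch; "patient-2" is made up.
annotations = [
    {"patient_id": "patient-1", "type": "DIAGNOSIS", "preferred": "Stage I Breast Cancer"},
    {"patient_id": "patient-1", "type": "DRUG", "preferred": "tamoxifen"},
    {"patient_id": "patient-2", "type": "DIAGNOSIS", "preferred": "Stage I Breast Cancer"},
]


def select_cohort(records, diagnosis, drug):
    """Return sorted IDs of patients annotated with both the diagnosis and the drug."""
    terms_by_patient = {}
    for record in records:
        terms = terms_by_patient.setdefault(record["patient_id"], set())
        terms.add((record["type"], record["preferred"]))
    wanted = {("DIAGNOSIS", diagnosis), ("DRUG", drug)}
    # Keep a patient only if every wanted term is among their annotations.
    return sorted(pid for pid, terms in terms_by_patient.items() if wanted <= terms)


cohort = select_cohort(annotations, "Stage I Breast Cancer", "tamoxifen")
print(json.dumps({"cohort": cohort}, indent=2))
```

Because the filter works on normalized preferred terms rather than raw note text, it does not matter which synonym the clinician originally wrote.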

As noted earlier, even if a synonym isn’t included in the standard initially, CENtree lets you augment these vocabularies as needed. Case in point: the phrase “T1N0M0 breast cancer” wasn’t included in our NCIT VOCab as a synonym of Stage I Breast Cancer, since it isn’t a common way of saying that. I added it in a matter of minutes:

Figure 2: Screenshot of adding an exact synonym to a term in CENtree

In the above screenshot, you can see how I added an exact synonym to this specific term. After deploying this change to TERMite, it was immediately picked up for use in entity recognition.
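The effect of that change can be sketched with a toy in-memory vocabulary. This is a simplified stand-in for illustration only: CENtree and TERMite are not used here, the function names are hypothetical, and the two starting synonyms are assumed rather than taken from the actual NCIT VOCab.

```python
# Hypothetical in-memory vocabulary: preferred term -> set of lowercase synonyms.
# The starting synonyms are assumptions for this sketch.
vocab = {"Stage I Breast Cancer": {"stage i breast cancer", "stage 1 breast cancer"}}


def add_exact_synonym(vocabulary, preferred, synonym):
    """Register a new exact synonym for an existing preferred term."""
    if preferred not in vocabulary:
        raise KeyError(f"Unknown preferred term: {preferred!r}")
    vocabulary[preferred].add(synonym.lower())


def recognize(vocabulary, text):
    """Return the preferred terms that have a synonym occurring in the text."""
    lowered = text.lower()
    return [
        preferred
        for preferred, synonyms in vocabulary.items()
        if any(synonym in lowered for synonym in synonyms)
    ]


phrase = "Stage: T1N0M0 breast cancer"
before = recognize(vocab, phrase)  # the phrase is not yet a known synonym
add_exact_synonym(vocab, "Stage I Breast Cancer", "T1N0M0 breast cancer")
after = recognize(vocab, phrase)   # the same phrase now maps to the preferred term
print(before, after)
```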

In the next blog…

In the later installments of this series, we will see how organizations can use the machine-readable annotations provided by TERMite and powered by CENtree to enable researchers to do their jobs quickly and effectively. This efficiency gain not only helps healthcare organizations internally, but also fosters collaboration to share RWD with pharmaceutical organizations for the betterment of patients everywhere.

About Arvind Swaminathan

Technical Consultant, SciBite

Arvind Swaminathan is a Technical Consultant at SciBite. He is passionate about helping organizations overcome their digital transformation challenges to enable data discovery and research. Throughout his career, which began at Epic Systems, Arvind has worked in the healthcare space to help clean and aggregate data for research and commercial use. He has been with SciBite since 2022.

