Healthcare organizations are constantly asked to unlock their patient data to uncover real-world evidence (RWE) that can serve a variety of purposes, ranging from drug development to financial benefits and improved patient outcomes. However, many organizations are daunted by this task because of the perceived amount of work required to properly clean this data.
Here at SciBite, we are passionate about demystifying the ways healthcare organizations can take advantage of the resources available to them to start themselves on this journey of FAIR clean data for all.
Today, it is apparent that gathering and analyzing real-world evidence or data (RWE/RWD) for use in clinical trials and studies is no longer optional. Instead, it is paramount that healthcare systems and pharmaceutical companies work together to unlock this data to inform real healthcare decisions.
Collaborating effectively will enable both sides to benefit: healthcare systems by providing ideally high-quality data to these companies, and pharma companies by using this data to inform their decisions in drug development. The financial incentives of such a collaboration include revenue sharing, research funding, value-based care reimbursement and data sharing incentives.
Ultimately, putting aside the financial incentives of such a collaboration, if we can unlock this data, patients will benefit the most as the drug development process accelerates.
Healthcare organizations are already looking for the easiest and most reproducible way to get the most value out of their data. This is evidenced by our work with City of Hope, a healthcare organization with more than 35 locations in Southern California and additional facilities in Arizona, Illinois and Georgia. That work enabled City of Hope to normalize their patient data within POSEIDON (Precision Oncology Software Environment Interoperable Data Ontologies Network).
To collaborate effectively and get to our ultimate goal, healthcare organizations need to clean their data and align it to industry standards that we can all understand. How can healthcare organizations undertake this sort of task?
At a high level, to get to the ultimate goal, healthcare organizations need to do the following:
In this 3-part series, we will explore the keys to following all these steps and the existing work healthcare organizations can draw on to start themselves on this journey to benefit us all. As a starting point in part 1, let’s explore our standards and how we can align at least one data type to them.
Luckily, in many different clinical data domains, public standards have already been created that can be used for this work. Some of these commonly available standards are listed below:
Using CENtree, SciBite’s award-winning ontology management platform, organizations can manage these standards. CENtree enables users to participate in a democratized management process to ensure the vocabularies are up to date. Importantly, CENtree enables organizations to deploy these standards downstream to any tool, including SciBite’s Named Entity Recognition Engine, TERMite.
By integrating CENtree and TERMite, organizations can realize the full value of flexibly updating their standards in CENtree. Let’s look at a specific data type and how organizations can normalize their data using CENtree and TERMite.
Many clinical notes are difficult to interpret, even though they contain a lot of information. Some of that information is valuable, and some is less so. Any solution to unlock this data must be able to sift through all the information to find what is actually of value. For this example, I asked ChatGPT to write example clinical notes for fake patients. Let’s say I am a researcher who wants to analyze the effects of different drugs in early-stage breast cancer treatment for different patient populations, and I’m trying to identify patients to include in my study.
Here is a sample note ChatGPT generated for a fake patient:
From scanning the note above, I can see that the patient has stage 1 breast cancer. The note doesn’t explicitly say that, but as a human, I can see that mentioned via different synonyms and context clues. The question becomes the following: How can I get a machine to quickly and easily recognize the piece of information that I can see after quick analysis?
In this instance, we are focused on getting diagnosis/indication and medication data from our notes. Using TERMite, as shown in the screenshot below, we can see the important information TERMite was able to recognize from those standards managed in CENtree:
Figure 1: A screenshot of a medical history
From the screenshot, we can see that TERMite has used SciBite’s manually curated synonyms to recognize important information, including “T1N0M0 breast cancer”, and standardize it to terms within the NCIT VOCab. It has also recognized a synonym of the early-stage cancer therapy drug tamoxifen (“kentadex”).
TERMite generates machine-readable annotations to use downstream. Organizations can therefore feed these machine-readable annotations to their NLP and AI teams so they can run a more rigorous analysis of the data for research purposes. Additionally, the researcher we started with can immediately see whether it makes sense to analyze this patient in more depth for their study. From this simple example, we can see how TERMite can help standardize information within a chart for use in downstream data analysis.
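To make the downstream step concrete, here is a minimal Python sketch of how a researcher might use normalized annotations to shortlist patients. It is an illustration only: the `Annotation` record, its field names, the entity type labels and the patient IDs are all assumptions made for this example, not TERMite’s actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; NOT TERMite's real output schema."""
    patient_id: str
    entity_type: str      # assumed labels: "INDICATION" or "DRUG"
    preferred_term: str   # the standardized term the text was normalized to
    matched_text: str     # the wording actually found in the note


def eligible_patients(annotations, indication, drug):
    """Return sorted IDs of patients annotated with both the indication and the drug."""
    terms_by_patient = {}
    for ann in annotations:
        key = (ann.entity_type, ann.preferred_term)
        terms_by_patient.setdefault(ann.patient_id, set()).add(key)
    return sorted(
        patient_id
        for patient_id, terms in terms_by_patient.items()
        if ("INDICATION", indication) in terms and ("DRUG", drug) in terms
    )


# Made-up annotations for fake patients, for illustration only.
ANNOTATIONS = [
    Annotation("patient-001", "INDICATION", "Stage I Breast Cancer", "T1N0M0 breast cancer"),
    Annotation("patient-001", "DRUG", "Tamoxifen", "kentadex"),
    Annotation("patient-002", "INDICATION", "Stage I Breast Cancer", "stage 1 breast cancer"),
    Annotation("patient-003", "DRUG", "Tamoxifen", "tamoxifen"),
]

shortlist = eligible_patients(ANNOTATIONS, "Stage I Breast Cancer", "Tamoxifen")
print(shortlist)
```

Because every note is normalized to the same preferred terms, the filter does not need to know which synonym a clinician happened to write.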
To reiterate earlier points in this article, even if a synonym isn’t included in the standard initially, with CENtree you can easily augment these vocabularies as needed. Case in point, the phrase “T1N0M0 breast cancer” wasn’t included in our NCIT VOCab as a synonym of Stage I Breast Cancer, since it isn’t a common way of saying exactly that. I added it in a matter of minutes:
Figure 2: A screenshot of an exact synonym being added to a term in CENtree
In the above screenshot, you can see how I added an exact synonym to this specific term. After deploying this change to TERMite, it was immediately picked up for use in entity recognition.
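The effect of adding a synonym can be sketched with a toy dictionary-based matcher in Python. This is a simplified stand-in I wrote for illustration, not how CENtree or TERMite are implemented: the `ToyVocabulary` class, its methods and the one-line note are all hypothetical, and real entity recognition handles far more than exact phrase lookup.

```python
import re


class ToyVocabulary:
    """Toy synonym dictionary; a stand-in for illustration, not CENtree or TERMite."""

    def __init__(self):
        self._synonym_to_term = {}

    def add_term(self, preferred_term, synonyms=()):
        """Register a preferred term together with any known synonyms."""
        for label in (preferred_term, *synonyms):
            self.add_synonym(preferred_term, label)

    def add_synonym(self, preferred_term, synonym):
        """Map one more surface form (case-insensitive) to a preferred term."""
        self._synonym_to_term[synonym.lower()] = preferred_term

    def annotate(self, text):
        """Return (start offset, matched text, preferred term) tuples in text order."""
        hits = []
        for synonym, term in self._synonym_to_term.items():
            # Match whole phrases only, ignoring case.
            pattern = r"(?<!\w)" + re.escape(synonym) + r"(?!\w)"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((match.start(), match.group(0), term))
        return sorted(hits)


vocab = ToyVocabulary()
vocab.add_term("Stage I Breast Cancer", ["stage 1 breast cancer"])
vocab.add_term("Tamoxifen", ["kentadex"])

# Made-up one-line note, for illustration only.
note = "Assessment: T1N0M0 breast cancer. Plan: start kentadex."

before = vocab.annotate(note)  # the staging phrase is not yet a known synonym
vocab.add_synonym("Stage I Breast Cancer", "T1N0M0 breast cancer")
after = vocab.annotate(note)   # the same note now normalizes to the standard term

print([term for _, _, term in before])
print([term for _, _, term in after])
```

The note itself never changes; only the vocabulary does, which is why keeping synonyms current in one managed place pays off everywhere the vocabulary is deployed.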
In the later installments of this series, we will see specifically how organizations can use these machine-readable annotations, provided by TERMite and powered by CENtree, to help researchers do their job quickly and effectively. This efficiency gain not only helps internally in healthcare organizations, but also fosters collaboration to share RWD with pharmaceutical organizations for the betterment of patients everywhere.
Richard is a seasoned marketing professional with over two decades of experience in the information services and life sciences sectors. Currently, he is the Senior Manager, Portfolio Marketing at Elsevier’s SciBite, where he drives strategic campaigns and harnesses data-driven strategies to amplify the platform’s online visibility and impact.