One of the biggest headaches a researcher faces is the huge volume of published literature they want to mine. The conundrum is how to get quickly to the most important and relevant points. Fast distillation is key.
Now, text mining is already out there, so you may be wondering what SciBite can bring to the semantic analytics party.
We offer a two-pronged solution. First, our high-quality VOCabs: hand-curated ontologies, tailored to the scientific domain. Second, our super-fast TERMite engine, which we pair with the VOCabs to liberate data that might otherwise have remained buried.
And the results?
1) You find direct links in the literature more readily
2) You find new links that may never have been stated previously (or explicitly)
3) You gain a better understanding of the mechanisms behind the disease – unraveling how and why someone gets it, its behavior, development, what it looks like, and its weak spot.
That gives you the start of a journey that could lead on to applying gene therapy and, eventually, to a potential treatment.
So let’s demonstrate this technology on a real-life rare disease and its related conditions.
Friedreich’s Ataxia is a debilitating disorder with heartbreaking degeneration. It’s described on the Rare Disease Day website:
“…a genetic, progressive, neurodegenerative movement disorder, with a mean age of onset between 10 and 15 years. Initial symptoms may include unsteady posture, frequent falling, and progressive difficulty walking due to impaired ability to coordinate voluntary movements (ataxia).”
What we’re aiming for here is a better characterization of this rare disease based on its similarities to more widely understood conditions.
We ran TERMite across 25 million Medline abstracts and extracted co-occurring pairs of conditions and clinical signs.
TERMite results from Medline abstracts
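To make the extraction step a little more concrete, here is a minimal Python sketch of counting co-occurring condition and clinical sign pairs. It assumes each abstract has already been annotated with the entities found in it. The input structure, the function name `count_cooccurrences`, and the toy abstracts are illustrative assumptions, not TERMite's actual output format.

```python
from collections import Counter
from itertools import product


def count_cooccurrences(annotated_abstracts):
    """Count how many abstracts mention each (condition, clinical sign) pair."""
    pair_counts = Counter()
    for abstract in annotated_abstracts:
        conditions = set(abstract["conditions"])
        signs = set(abstract["signs"])
        # A pair is counted once per abstract, however often it is mentioned.
        pair_counts.update(product(conditions, signs))
    return pair_counts


# Toy input for illustration only (hypothetical, not real annotation output).
abstracts = [
    {"conditions": ["Friedreich's Ataxia"], "signs": ["ataxia", "frequent falling"]},
    {"conditions": ["Friedreich's Ataxia"], "signs": ["ataxia"]},
    {"conditions": [], "signs": ["ataxia"]},
]
pair_counts = count_cooccurrences(abstracts)
```

Counting each pair once per abstract keeps a single long abstract from dominating the totals.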
We performed a statistical analysis of the results to identify the most scientifically interesting relationships.
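This post does not say which statistic was used. As one common option for scoring co-occurrence, the sketch below uses pointwise mutual information (PMI), which compares how often a pair appears together with how often it would be expected to by chance. The function name and parameters are assumptions made for illustration.

```python
import math


def pmi(pair_count, condition_count, sign_count, total_abstracts):
    """Pointwise mutual information (log base 2) for one condition/sign pair.

    A score above 0 means the pair co-occurs more often than expected
    if the two terms were independent.
    """
    if min(pair_count, condition_count, sign_count, total_abstracts) <= 0:
        raise ValueError("all counts must be positive")
    p_pair = pair_count / total_abstracts
    p_condition = condition_count / total_abstracts
    p_sign = sign_count / total_abstracts
    return math.log2(p_pair / (p_condition * p_sign))
```

PMI is known to favor rare terms, so in practice it is often combined with a minimum count threshold or replaced by a significance test.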
We then loaded the results into a graph database, providing us with scalable and flexible retrieval.
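As a rough idea of what a load step can look like, the sketch below prepares parameterized Cypher statements for a Neo4j-style graph. The node labels `Condition` and `Phenotype`, the relationship type `HAS_PHENOTYPE`, and the `score` property are assumptions made for this example, not the schema actually used. Nothing is sent to a database here; the code only builds the statements.

```python
# Cypher template: MERGE creates each node and relationship only if missing.
MERGE_ASSOCIATION = (
    "MERGE (c:Condition {name: $condition}) "
    "MERGE (p:Phenotype {name: $phenotype}) "
    "MERGE (c)-[r:HAS_PHENOTYPE]->(p) "
    "SET r.score = $score"
)


def build_load_statements(scored_pairs):
    """Turn (condition, phenotype, score) rows into (query, parameters) tuples."""
    return [
        (MERGE_ASSOCIATION, {"condition": c, "phenotype": p, "score": float(s)})
        for c, p, s in scored_pairs
    ]


# Placeholder row; the score value is made up for illustration.
statements = build_load_statements([("Friedreich's Ataxia", "ataxia", 2.0)])
```

Using `MERGE` rather than `CREATE` means the load can be re-run without duplicating nodes, and passing values as parameters avoids quoting problems with names that contain apostrophes.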
Here you can see an initial visualization of that graph database using Linkurious. The image below shows the major phenotypes associated with Friedreich’s Ataxia.
Now, let’s interrogate this knowledge base.
How Friedreich’s Ataxia shares multiple phenotypes with Huntington’s Disease
Now that we can calculate the major phenotypes associated with thousands of conditions, we can compare their phenotype profiles and apply similarity scoring algorithms.
The next image shows the conditions that have the most similar phenotype profiles to Friedreich’s Ataxia:
Indications related by similar phenotype profiles. The numbers on the grey lines represent the relative similarity score for each pair of conditions
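The profile comparison described above can be sketched in a few lines of Python. The specific similarity algorithm is not named in this post, so the example uses Jaccard similarity (shared phenotypes divided by all phenotypes across both conditions) purely as an assumption. The condition and phenotype names are placeholders.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two sets of phenotypes (0.0 to 1.0)."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target, profiles, top_n=5):
    """Rank every other condition by similarity to the target's profile."""
    scores = [
        (other, jaccard(profiles[target], phenotypes))
        for other, phenotypes in profiles.items()
        if other != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Placeholder profiles for illustration only.
profiles = {
    "condition_x": {"phenotype_a", "phenotype_b", "phenotype_c"},
    "condition_y": {"phenotype_a", "phenotype_b", "phenotype_d"},
    "condition_z": {"phenotype_e"},
}
ranking = most_similar("condition_x", profiles)
```

A weighted measure, such as cosine similarity over association scores, would let strongly associated phenotypes count for more than weak ones.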
We can also export the data from the Neo4j interface into Excel as a list of the related indications and their major shared phenotypes.
If you’re an expert in the field, you may be thinking that many of these indications are well-known, but keep scanning down the list – less well-known information may become apparent.
Let me make this clear – this was all worked out by the computer with no prior knowledge of the condition: a computer that can now also characterize thousands of other conditions in the same way.
So now it’s time to explore the genes associated with these phenotypically related conditions. By doing this, we’ll get an idea of where there are gaps in our knowledge of how these conditions might be mechanistically related. We can also show potential areas where these gaps might be filled.
By overlaying gene association data from DisGeNET, we can see some conditions with many known gene associations. However, for Friedreich’s Ataxia, there is only one – frataxin (FXN).
Are there any conditions with lots of gene associations? Yes – you can see Peripheral Neuropathies have a huge number of associated genes, which reflects the sheer amount of research done in this area.
By contrast, take a look at Friedreich’s Ataxia. There are clearly huge gaps in mechanistic understanding, and we can see that there’s not a great deal of investigation.
Going back to FXN, and to help get an idea of where it might fit in with the other gene/protein entities displayed on the graph, we added in protein-protein interaction data from iRefIndex. This fills in some of the gaps from the above image, and we now see FXN interacting with several genes that are known to be associated with phenotypically related conditions. In doing so, we’re building up a picture of related conditions and their underlying genetic mechanisms.
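The overlay described above boils down to a simple question: which interaction partners of a gene are themselves associated with phenotypically related conditions? Here is a minimal sketch under stated assumptions. The function name and data structures are hypothetical, the FXN and PASK entries follow the example discussed in this post, and `GENE_X`, `GENE_Y`, `GENE_Z`, and `unrelated_condition` are placeholders.

```python
def bridging_genes(gene, interactions, gene_to_conditions, related_conditions):
    """Find interaction partners of `gene` linked to any related condition."""
    # Interactions are undirected, so check both ends of each pair.
    partners = set()
    for a, b in interactions:
        if a == gene:
            partners.add(b)
        elif b == gene:
            partners.add(a)

    related = set(related_conditions)
    result = {}
    for partner in sorted(partners):
        shared = gene_to_conditions.get(partner, set()) & related
        if shared:
            result[partner] = sorted(shared)
    return result


# Toy data: only the FXN/PASK entries come from the post; the rest are placeholders.
interactions = [("FXN", "PASK"), ("GENE_X", "FXN"), ("GENE_Y", "GENE_Z")]
gene_to_conditions = {
    "FXN": {"Friedreich's Ataxia"},
    "PASK": {"Peripheral Neuropathies"},
    "GENE_X": {"unrelated_condition"},
}
hits = bridging_genes("FXN", interactions, gene_to_conditions, {"Peripheral Neuropathies"})
```

In a graph database the same question becomes a short path query, but the set logic is the same.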
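The incredibly useful thing about this method is that we’ve brought together three sets of data:
1) Condition and phenotype co-occurrences text-mined from Medline with TERMite
2) Gene-disease associations from DisGeNET
3) Protein-protein interactions from iRefIndex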
Once some interesting and plausible hypotheses have been derived from the graphs, an individual can help to drive research in new directions.
For example, the gene entity PASK (PAS domain containing serine/threonine kinase), seen in the image above, interacts with FXN and is also known to be associated with Peripheral Neuropathies, which the analysis identified as one of the conditions most phenotypically similar to Friedreich’s Ataxia. Likewise, SDHA (succinate dehydrogenase complex, subunit A – you can see why it’s shortened!) is linked to a number of related conditions.
What we love at SciBite about using our software in this way is exactly that – opening up new possibilities. And opening them up quickly, leaving researchers more time to, well, research.
Read part 3 in the Disease detective blog series: “Machine Learning and phenotype triangulation”.
We’ve written a White Paper on how we used Machine Learning to liberate data. To find out more about our work and how we could best help you, please contact us with your name, contact details, and your organization. We’d love to hear from you.
Copyright © 2023 Elsevier Ltd., its licensors, and contributors. All rights are reserved, including those for text and data mining, AI training, and similar technologies.