Exploring mechanistically-related diseases through shared phenotypic profiles

Disease detective part 2: Today, we’ll look at a fresh way of enabling scientific researchers, whether in pharmaceutical R&D or in medical institutes, to deepen their investigations and consider new links.


What’s the problem?

One of the biggest headaches a researcher faces is the huge volume of published literature that they’d want to mine. The conundrum is how to get quickly to the most important and relevant points. Fast distillation is key.

Now, text mining is already out there, so you may be wondering what it is that SciBite can bring to the semantic analytics party.

We offer a two-pronged resolution with our high-quality VOCabs – hand-curated ontologies, tailored to the scientific domain.  We then pair this with our super-fast TERMite engine to liberate more data that might have otherwise remained buried.

And the results?

1) It enables you to find direct links in literature more readily

2) You’re able to find new links which may have never been previously (or explicitly) stated

3) You gain a better understanding of the mechanisms behind the disease – unraveling how and why someone gets it, its behavior, development, what it looks like, and its weak spot.

Then, you have the start of a journey that could lead on to applying gene therapy and, eventually, a potential therapy or treatment.

Understanding the disease is key

So let’s demonstrate this technology on a real-life rare disease and its related conditions.

Friedreich’s Ataxia is a debilitating disorder with heartbreaking degeneration.  It’s described on the Rare Disease Day website:

“…a genetic, progressive, neurodegenerative movement disorder, with a mean age of onset between 10 and 15 years. Initial symptoms may include unsteady posture, frequent falling, and progressive difficulty walking due to impaired ability to coordinate voluntary movements (ataxia).”

What we’re aiming for here is a better characterization of this rare disease based on its similarities to more widely understood conditions.

Step 1: 

We ran TERMite across 25 million Medline abstracts and extracted co-occurring pairs of conditions and clinical signs.

TERMite results from Medline abstracts
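To make Step 1 concrete, here is a minimal Python sketch of counting co-occurring condition and clinical sign pairs. It assumes each abstract has already been annotated with the entities it mentions; the record layout and the toy data are hypothetical and do not reflect TERMite's actual output format.

```python
from collections import Counter
from itertools import product

# Hypothetical input: one record per abstract, listing the entities an
# annotator found in it. The field names are assumptions, not TERMite output.
abstracts = [
    {"conditions": {"Friedreich's Ataxia"}, "signs": {"ataxia", "cardiomyopathy"}},
    {"conditions": {"Friedreich's Ataxia", "Huntington's Disease"}, "signs": {"ataxia"}},
    {"conditions": {"Huntington's Disease"}, "signs": {"chorea", "ataxia"}},
]

def count_cooccurrences(records):
    """Count how many abstracts mention each (condition, sign) pair."""
    pair_counts = Counter()
    for record in records:
        # Sets ensure a pair is counted once per abstract, however often it recurs.
        for pair in product(record["conditions"], record["signs"]):
            pair_counts[pair] += 1
    return pair_counts

pair_counts = count_cooccurrences(abstracts)
```

Counting per abstract rather than per mention keeps one long, repetitive abstract from dominating the totals.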


Step 2:

We performed a statistical analysis of the results to identify the most scientifically interesting relationships.
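This post doesn't name the statistic used, so the sketch below is an assumption: it uses pointwise mutual information (PMI), one common way to separate pairs that co-occur more often than chance from pairs that are simply frequent. The function name and the numbers are illustrative only.

```python
import math

def pmi(pair_count, condition_count, sign_count, total_abstracts):
    """Pointwise mutual information (log base 2) for one condition-sign pair."""
    p_pair = pair_count / total_abstracts
    p_condition = condition_count / total_abstracts
    p_sign = sign_count / total_abstracts
    return math.log2(p_pair / (p_condition * p_sign))

# Toy numbers for illustration only (not taken from the Medline analysis).
# Same pair count, but the second sign appears in half of all abstracts.
score_specific = pmi(pair_count=50, condition_count=100, sign_count=200,
                     total_abstracts=100_000)
score_common = pmi(pair_count=50, condition_count=100, sign_count=50_000,
                   total_abstracts=100_000)
```

Both pairs co-occur 50 times, yet the pair with the rarer sign scores far higher, which is the kind of filtering that pulls out the interesting relationships.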

Step 3:

We then loaded the results into a graph database, providing us with scalable and flexible retrieval.
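As a rough illustration of the loading step, the sketch below prepares parameterized Cypher statements for a graph database such as Neo4j. The node labels, relationship type and property names are hypothetical choices, not the schema used in the original analysis, and the code only builds the statements rather than connecting to a database.

```python
# Hypothetical schema: the labels, relationship type and property names below
# are illustrative choices, not the ones used in the original analysis.
MERGE_EDGE = (
    "MERGE (c:Condition {name: $condition}) "
    "MERGE (p:Phenotype {name: $phenotype}) "
    "MERGE (c)-[r:HAS_PHENOTYPE]->(p) "
    "SET r.score = $score"
)

def build_statements(scored_pairs):
    """Turn (condition, phenotype, score) tuples into (query, parameters) pairs."""
    return [
        (MERGE_EDGE, {"condition": c, "phenotype": p, "score": s})
        for c, p, s in scored_pairs
    ]

# Toy scores for illustration only.
statements = build_statements([
    ("Friedreich's Ataxia", "ataxia", 7.9),
    ("Friedreich's Ataxia", "cardiomyopathy", 6.2),
])
```

Using MERGE rather than CREATE means the same condition or phenotype is reused across rows instead of being duplicated, and passing values as parameters avoids quoting problems with names such as "Friedreich's Ataxia".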

Step 4:

Here you can see an initial visualization of that graph database using Linkurious.  The image below shows the major phenotypes associated with Friedreich’s Ataxia.

Step 5:

Now, let’s interrogate this knowledge base.

How Friedreich’s Ataxia shares multiple phenotypes with Huntington’s Disease

Now that we can calculate the major phenotypes associated with thousands of conditions, we can compare their phenotype profiles and apply similarity scoring algorithms.

The next image shows the conditions that have the most similar phenotype profiles to Friedreich’s Ataxia:

Indications related by similar phenotype profiles. The numbers on the grey lines represent the relative similarity score for each pair of conditions
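The post doesn't say which similarity scoring algorithm was applied. As one plausible example, the sketch below compares phenotype profiles with Jaccard similarity; the function names and the toy profiles are assumptions for illustration.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity: shared phenotypes divided by all phenotypes in either profile."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)

def rank_similar(target, profiles):
    """Rank every other condition by similarity to the target, most similar first."""
    scores = [
        (other, jaccard(profiles[target], phenotypes))
        for other, phenotypes in profiles.items()
        if other != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Toy profiles for illustration only; real profiles would come from the graph.
profiles = {
    "Friedreich's Ataxia": {"ataxia", "dysarthria", "cardiomyopathy", "scoliosis"},
    "Condition A": {"ataxia", "dysarthria", "chorea"},
    "Condition B": {"rash", "fever"},
}
ranking = rank_similar("Friedreich's Ataxia", profiles)
```

A weighted measure, such as cosine similarity over association scores, would be a natural alternative if the strength of each phenotype link should count as well as its presence.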

We can also export the data from the Neo4j interface into Excel as a list of the related indications and their major shared phenotypes.

If you’re an expert in the field, you may be thinking that many of these indications are well-known, but keep scanning down the list – less well-known information may become apparent.

Let me make this clear – this was all worked out by the computer with no prior knowledge of the condition: a computer that can now also characterize thousands of other conditions in the same way.

Exploiting the power of this analysis

So now it’s time to explore the associated genes for these phenotypically related conditions. By doing this, we’ll get an idea of where there are knowledge gaps for how these conditions might be mechanistically related. We can also show potential areas where these gaps might be filled.

By overlaying gene association data from DisGeNET, we can see some conditions with many known gene associations. However, for Friedreich’s Ataxia, there is only one – frataxin (FXN).

Are there any conditions with lots of gene associations? Yes – you can see Peripheral Neuropathies have a huge number of associated genes, which reflects the sheer amount of research done in this area.

By contrast, take a look at Friedreich’s Ataxia.  There are clearly huge gaps in mechanistic understanding, and we can see that there’s not a great deal of investigation.

Going back to FXN, and to help get an idea of where it might fit in with the other gene/protein entities displayed on the graph, we added in protein-protein interaction data from iRefIndex. This fills in some of the gaps from the above image, and we now see FXN interacting with several genes that are known to be associated with phenotypically related conditions. In doing so, we’re building up a picture of related conditions and their underlying genetic mechanisms.

The incredibly useful thing about this method is that we’ve brought together three sets of data:

  1. Text-mined data from Medline (courtesy of TERMite) – seen here in yellow lines
  2. Gene-disease associations from DisGeNET – pink lines
  3. Protein-protein interaction data from iRefIndex – orange lines
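To show how the three layers can be queried together, here is a minimal Python sketch that looks for genes which interact with a condition's known genes and are also associated with a phenotypically similar condition. The dictionaries are stand-ins for the graph, and only the PASK links mirror an example from this post; every other entry is a made-up placeholder.

```python
# Edge sets standing in for the three data sources. Only the PASK links are
# taken from the post; every other entry is a made-up placeholder.
similar_conditions = {"Friedreich's Ataxia": {"Peripheral Neuropathies", "Condition A"}}
gene_disease = {
    "Friedreich's Ataxia": {"FXN"},
    "Peripheral Neuropathies": {"PASK", "GENE_X"},
    "Condition A": {"GENE_Y"},
}
protein_interactions = {"FXN": {"PASK", "GENE_Z"}}

def bridging_genes(condition, similar, associations, interactions):
    """Find genes that interact with the condition's own genes and are
    associated with a phenotypically similar condition."""
    partners = set()
    for gene in associations.get(condition, set()):
        partners |= interactions.get(gene, set())
    bridges = {}
    for other in similar.get(condition, set()):
        shared = partners & associations.get(other, set())
        if shared:
            bridges[other] = shared
    return bridges

bridges = bridging_genes("Friedreich's Ataxia", similar_conditions,
                         gene_disease, protein_interactions)
```

With this toy data the only bridge is PASK, reached via Peripheral Neuropathies. In a graph database the same question would be a short path query across the three relationship types.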

Once some interesting and plausible hypotheses have been derived from the graphs, a researcher can use them to drive research in new directions.

For example, the gene entity PASK (PAS domain containing serine/threonine kinase) seen in the image above interacts with FXN and is also known to be associated with Peripheral Neuropathies, which the analysis identified as one of the most phenotypically similar conditions to Friedreich’s Ataxia. SDHA (succinate dehydrogenase complex, subunit A – you can see why it’s shortened!) is also linked to a number of related conditions.

Could this be a new area of research?

What we love at SciBite about using our software in this way is exactly that – opening up new possibilities.  And opening them up quickly, leaving researchers more time to, well, research.

Up next in the Disease detective blog series

Read part 3 in the Disease detective blog series: “Machine Learning and phenotype triangulation”.

We’ve written a White Paper on how we used Machine Learning to liberate data.  To find out more about our work and how we could best help you, please contact us with your name, contact details, and your organization.  We’d love to hear from you.

Related articles

  1. Rare disease collaboration networks

    Disease Detective Part 1: In celebration of Rare Disease Day (28th Feb), we have a three-part blog post looking into some of the challenges and analysis techniques involved in the research process.

  2. Machine Learning and phenotype triangulation

    Disease detective part 3: In our final disease detective article, we’ll take Part 2’s topic a little further and zoom in on how we can find new relationships between diseases where direct evidence is sparse.

