VOCab of the Week: ORPHAN

Gaining a deeper insight into rare diseases with our ORPHAN VOCab.


Welcome to the second in our series of VOCab of the Week.  As explained last week, SciBite’s VOCabs underpin our solutions – they are based on a public resource but with many times more synonyms and disambiguation rules to make them perfect for text analytics.

This week we’re moving from the world of food to rare diseases, so please give a warm welcome to…ORPHAN!

Here’s the lowdown on our star guest:

VOCab name

ORPHAN

Special features

This VOCab is based on the disease classification from Orphanet – a comprehensive resource for rare diseases and orphan drugs.

Orphanet provides downloads of its data (and it is very comprehensive data), but its focus is not really on representing synonyms for use in text analytics. That is where our VOCab adds value: our curation team performs an extensive analysis to handle highly ambiguous terms and expand the synonyms for these entities, providing a deeply enriched resource.
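To make the role of synonyms concrete, here is a minimal sketch of dictionary-based tagging in Python. It is not SciBite's implementation: the identifiers (RARE:0001 and so on) are placeholders rather than real Orphanet IDs, the synonym lists are tiny illustrative samples, and the function names are our own. It only shows how several surface forms can resolve to one identifier.

```python
import re

# Hypothetical mini-vocabulary: placeholder IDs and a handful of synonyms.
# A real VOCab holds far more synonyms plus disambiguation rules.
VOCAB = {
    "RARE:0001": ["Friedreich's ataxia", "Friedreich ataxia", "FRDA"],
    "RARE:0002": ["ataxia-telangiectasia", "Louis-Bar syndrome"],
}


def build_matcher(vocab):
    """Compile one case-insensitive pattern and a synonym -> ID lookup."""
    lookup = {syn.lower(): ident for ident, syns in vocab.items() for syn in syns}
    # Longest synonyms first so longer names win over shorter overlaps.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(s) for s in alternatives) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern, lookup


def tag(text, pattern, lookup):
    """Return (matched text, identifier) pairs in order of appearance."""
    return [(m.group(1), lookup[m.group(1).lower()]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    pattern, lookup = build_matcher(VOCAB)
    print(tag("Patients with FRDA or Louis-Bar syndrome were studied.", pattern, lookup))
```

Plain case-insensitive matching like this would also tag short, ambiguous abbreviations wherever they appear, which is exactly the kind of problem curated disambiguation rules are meant to handle.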

Top stat

>9,000 entries, with over 3 million synonyms

What’s it useful for?

Rare diseases are big news in pharma: although each disease individually affects relatively few people, collectively they are thought to affect 300 million people worldwide.

One of the issues when researching rare diseases is the huge volume of data available. This becomes even more problematic when you take into account that many rare diseases share similar phenotypes.  Add numerous synonyms for each disease to the mix, and you have a lot of complexity to work through.

So, let’s look at it from the viewpoint of a rare disease researcher.  You know many rare diseases share phenotypes – could there be an underlying link between all of these?

How it works

As before, DOCstore, our Elastic-powered semantic search engine, is an excellent starting point.

Step 1:

We searched for typical symptoms or phenotypes of Friedreich’s Ataxia (FA), giving us the results below:

Top phenotypes of Friedreich’s Ataxia

As you can see, there are quite a few!  Chances are, a vast number of diseases will share the same phenotypes, so let’s pick a less common one which will (hopefully) narrow down the results.

Step 2:

At number 40 is limb ataxia.  If we click on this, we’re taken to a whole raft of pertinent papers:

Papers with co-occurrences of Friedreich’s Ataxia and limb ataxia

Step 3 (an alternative route):

We can also reverse the search and see what other diseases are out there with limb ataxia as a phenotype.  Could they share something?  If we search for this symptom, we get this rundown of results:

Top diseases associated with limb ataxia
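The searches in the steps above boil down to counting co-occurrences across annotated documents. The Python sketch below illustrates that idea only. It does not use or represent the DOCstore API, the documents and the names "Disease B" and "Disease C" are invented for illustration, and rank_cooccurring is a hypothetical helper.

```python
from collections import Counter

# Hypothetical annotated documents: each lists the entity names found in it.
DOCS = [
    {"diseases": {"Friedreich's ataxia"}, "phenotypes": {"limb ataxia", "dysarthria"}},
    {"diseases": {"Friedreich's ataxia", "Disease B"}, "phenotypes": {"limb ataxia"}},
    {"diseases": {"Disease B"}, "phenotypes": {"dysarthria"}},
    {"diseases": {"Disease C"}, "phenotypes": {"limb ataxia"}},
]


def rank_cooccurring(docs, anchor_field, anchor, target_field):
    """Count, per entity in target_field, the documents that also mention anchor."""
    counts = Counter()
    for doc in docs:
        if anchor in doc[anchor_field]:
            counts.update(doc[target_field])
    return counts.most_common()


if __name__ == "__main__":
    # Step 1: phenotypes that co-occur with a disease.
    print(rank_cooccurring(DOCS, "diseases", "Friedreich's ataxia", "phenotypes"))
    # Step 3: the reverse search, diseases that co-occur with a phenotype.
    print(rank_cooccurring(DOCS, "phenotypes", "limb ataxia", "diseases"))
```

The same function serves both directions because only the anchor field and the target field swap, which mirrors how the search is reversed in Step 3.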

Now that we’re armed with this information, we can go back into Orphanet and link it all up.

Linking up Orphanet data

Within any text analytics project, being able to link up mined data with data from curated sources of evidence is incredibly valuable. Our ORPHAN VOCab enables you to do just this because each rare disease entity extracted from the text gets annotated with its unique Orphanet identifier.  Once you have this, it opens up the opportunity to incorporate Orphanet’s own curated data into your analysis.  This includes datasets connecting rare diseases with associated genes and phenotypes and also data on the epidemiology of each rare disease.
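As a rough illustration of this linking step, the Python sketch below joins text-mined disease mentions to curated records through a shared identifier. Everything in it is an assumption made for the example: the identifiers are placeholders rather than real Orphanet IDs, the gene and phenotype values are invented, and the record layout does not reflect Orphanet's actual download formats.

```python
# Hypothetical text-mining output: one row per disease mention.
MENTIONS = [
    {"doc_id": "doc-1", "disease_id": "RARE:0001"},
    {"doc_id": "doc-2", "disease_id": "RARE:0001"},
    {"doc_id": "doc-2", "disease_id": "RARE:0002"},
    {"doc_id": "doc-3", "disease_id": "RARE:0003"},
]

# Hypothetical curated records keyed by the same identifiers.
CURATED = {
    "RARE:0001": {"genes": ["GENE_A"], "phenotypes": ["limb ataxia"]},
    "RARE:0002": {"genes": ["GENE_B"], "phenotypes": ["limb ataxia", "phenotype Y"]},
}


def link(mentions, curated):
    """Group documents by disease ID and attach the curated record, if any."""
    linked = {}
    for mention in mentions:
        disease_id = mention["disease_id"]
        entry = linked.setdefault(
            disease_id, {"docs": [], "curated": curated.get(disease_id)}
        )
        if mention["doc_id"] not in entry["docs"]:
            entry["docs"].append(mention["doc_id"])
    return linked


if __name__ == "__main__":
    for disease_id, entry in link(MENTIONS, CURATED).items():
        print(disease_id, entry["docs"], entry["curated"])
```

Mentions with no curated record keep a value of None instead of being dropped, so gaps in the curated source stay visible in the analysis.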

At this point, you’ve transformed your unstructured text into structured data.  With it, you have a crucial element for driving strategic decisions about drug-repurposing candidates, allowing you to focus your drug development research.

In another of our blogs, we go even further in looking at how you could investigate phenotypic links between rare diseases.

If you’d like to know more about our ORPHAN VOCab or any others and how SciBite can transform your data, get in touch with the team. We’d love to hear from you.

See you next week for another VOCab!


