SciBite and Stardog build a Knowledge Graph for drug discovery

In a recent webinar, SciBite and Stardog team members demonstrated how to build a knowledge graph to identify candidate drugs for a rare disease.


There are thousands of rare diseases in the world, many without any cures or treatments available. The fact that they affect a relatively low number of people means there is little commercial drive for pharmaceutical companies to develop new drugs for them. Drug repurposing has presented an attractive solution to this problem by finding other already-approved drugs that have the potential to treat rare diseases.

However, identifying drug candidates is still a data-intensive endeavour, and it is one that requires a view of data from multiple sources and at huge scale. The creation of a Knowledge Graph can help with the drug discovery process by bringing together different datasets and enabling researchers to make important connections that simply would not be possible using manual review.

SciBite and Stardog: Stronger together

Life sciences researchers are often challenged to navigate an ever-growing deluge of complex scientific data in their search for solutions. New technologies and tools in data science are the key to empowering them by providing clean, readable data along with the ability to map scientific concepts (e.g., genes, drugs) and leverage public ontologies.

SciBite uses its data modelling, tooling, and expertise to create that clean, machine-readable data for analysis. Working together with Stardog, a leading platform that does data analysis at scale, these combined forces create an analytical solution that enables insights that simply wouldn’t be possible through manual review and curation.

To demonstrate this synergy, SciBite and Stardog recently held an illuminating webinar called “Building a Knowledge Graph for Drug Discovery,” in which experts from both companies showed how a handful of powerful tools work together to find answers quickly.

Making connections

For this practical demonstration, which took only about 40 minutes, three presenters worked in succession to find candidate drugs that could be repurposed to treat Friedreich’s ataxia, a rare genetic disorder.

After Sam Shelton, Partnership Manager at SciBite, opened with an overview of drug repurposing, SciBite Senior Solutions Engineer Simon Jupp introduced CENtree, SciBite’s ontology management platform. CENtree provides one-click access to over 60 biomedical ontologies, along with features such as a governance layer for creating and editing ontologies and numerous export formats.

“Ontologies are used to give us the shared understanding of those things that go into our knowledge graphs – so those entities might be genes, drugs, diseases, whatever we’re interested in,” Simon explained as he loaded relevant ontologies and built a schema in CENtree. “If we use common identifiers for those things, then they naturally become connected in our knowledge graph, and we can start to query across different data sources that are described using the same set of ontologies.”
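To make the point about common identifiers concrete, here is a minimal Python sketch. It is not CENtree output or anything shown in the webinar: the records, field names, and `EX:` identifiers are all invented for illustration. It only shows that two independent sources link up once they refer to an entity by the same identifier.

```python
# Hypothetical records from two independent sources. The "EX:" identifiers
# are placeholders invented for this sketch, not real ontology IDs.
literature_mentions = [
    {"doc": "doc-1", "entity_id": "EX:GENE_0001", "label": "frataxin"},
    {"doc": "doc-2", "entity_id": "EX:DRUG_0042", "label": "example drug"},
]
compound_records = [
    {"entity_id": "EX:DRUG_0042", "max_phase": 4},
    {"entity_id": "EX:DRUG_0099", "max_phase": 2},
]


def link_by_identifier(mentions, records):
    """Join two sources on their shared entity identifier."""
    by_id = {record["entity_id"]: record for record in records}
    linked = []
    for mention in mentions:
        record = by_id.get(mention["entity_id"])
        if record is not None:
            # Merge the fields from both sources into one connected record.
            linked.append({**mention, **record})
    return linked


linked = link_by_identifier(literature_mentions, compound_records)
print(linked)
```

Only the drug mentioned in `doc-2` appears in both sources, so it is the only record that gets linked. Without a shared identifier, the same join would have to rely on matching free-text names.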

CENtree allows users to deploy ontologies to another important tool, TERMite. TERMite automatically expands the set of supported synonyms and uses them to identify entities within unstructured text, mapping each entity back to a common identifier.
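The general idea of synonym-based entity recognition can be sketched in a few lines of Python. This is a toy dictionary matcher, not TERMite or its API. The vocabulary, the `EX:` identifiers, and the function names are assumptions made up for the example, and a production engine would handle ambiguity and context far more carefully.

```python
import re

# Hypothetical vocabulary: one identifier per entity, with its synonyms.
VOCAB = {
    "EX:GENE_0001": ["frataxin", "FXN"],
    "EX:DISEASE_0001": ["Friedreich's ataxia", "Friedreich ataxia", "FRDA"],
}


def build_matcher(vocab):
    """Compile one regex covering every synonym, plus a synonym -> ID lookup."""
    synonym_to_id = {}
    for entity_id, synonyms in vocab.items():
        for synonym in synonyms:
            synonym_to_id[synonym.lower()] = entity_id
    # Try longer synonyms first so multi-word names win over shorter ones.
    alternatives = sorted(synonym_to_id, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, synonym_to_id


def annotate(text, vocab=VOCAB):
    """Return (matched text, entity identifier, character offset) per hit."""
    pattern, synonym_to_id = build_matcher(vocab)
    return [
        (m.group(0), synonym_to_id[m.group(0).lower()], m.start())
        for m in pattern.finditer(text)
    ]


hits = annotate("FXN encodes frataxin, which is deficient in Friedreich ataxia (FRDA).")
for hit in hits:
    print(hit)
```

Four different surface forms in the sentence collapse to two identifiers, which is what lets mentions from many documents connect to the same node in a graph. Case-insensitive matching is a simplification here; short gene symbols in particular usually need stricter rules.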

Structuring data

“TERMite is SciBite’s very precise named entity recognition engine that annotates unstructured natural language content by using a set of controlled vocabularies we call VOCabs,” said Tiago Almeida, Senior Data Scientist at SciBite, as he took the reins of the webinar. These controlled vocabularies, which SciBite builds and maintains, are based on public ontologies wherever they exist. TERMite can also handle any unstructured textual information, even a spreadsheet full of content that isn’t natural language.

As he showed the webinar audience how TERMite could annotate documents on Friedreich’s ataxia, Tiago explained that by converting unstructured text into structured, semantic data, TERMite allows for the identification of actionable insights. He also demonstrated how the TExpress tool works alongside TERMite, matching patterns between entities and inferring relationships from the literature.

In just a few minutes, CENtree, TERMite, and TExpress were used to build an ontology and extract the terms from some 4,000 documents, creating a machine-readable output ready to be transferred from SciBite into Stardog, where the knowledge graph can be created.

Putting it together

At this point in the presentation, Nick McHugh, Senior Solutions Consultant at Stardog, showed off the platform’s impressive ability to explore, query, and visualize data.

“It provides a flexible abstraction layer to connect any type of data, regardless of its format,” he said of the Stardog platform. “Connect data in a way that is not possible or (perhaps) practical in rigid, siloed systems, often discovering new connections in the process. It gives a simple way for users and machines to query and explore the graph, because data and metadata live together, providing user and machine-readable views.”

He also noted that, in a knowledge graph, data is not stored or organized in a tree. “There’s no top, middle, or center, and this allows users—or machines, for that matter—to start querying, exploring, and then retrieve data from any point in the graph.”

Nick demonstrated various Stardog features as he worked on pinpointing the information they were looking for on Friedreich’s ataxia – specifically, drugs that increase the expression of frataxin and are in Phase IV. With the help of information from the ChEMBL database, he was quickly able to identify four possible drug candidates that would be most suitable for further exploration.
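The shape of that question, drugs that increase frataxin expression and have reached a given clinical phase, can be sketched over a toy set of triples in Python. This is not the query from the webinar and does not use Stardog. In Stardog such a question would typically be written as a SPARQL graph query. The drugs, predicate names, and phase values below are invented placeholders, not results from the demonstration or from ChEMBL.

```python
# Toy graph as (subject, predicate, object) triples. All names and values
# are hypothetical placeholders for illustration only.
TRIPLES = [
    ("drug:A", "increasesExpressionOf", "gene:FXN"),
    ("drug:B", "increasesExpressionOf", "gene:FXN"),
    ("drug:C", "decreasesExpressionOf", "gene:FXN"),
    ("drug:A", "maxPhase", 4),
    ("drug:B", "maxPhase", 2),
    ("drug:C", "maxPhase", 4),
]


def subjects(triples, predicate, obj):
    """All subjects that have the given predicate pointing at obj."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(triples, subject, predicate):
    """All objects reachable from subject via the given predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def candidate_drugs(triples, gene, phase):
    """Drugs that increase expression of `gene` and are at `phase`."""
    upregulators = subjects(triples, "increasesExpressionOf", gene)
    return sorted(
        drug for drug in upregulators
        if phase in objects(triples, drug, "maxPhase")
    )


candidates = candidate_drugs(TRIPLES, "gene:FXN", 4)
print(candidates)
```

The query combines a relationship of the kind extracted from literature with a property of the kind held in a compound database, which is only possible because both refer to each drug by the same identifier.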

Finding answers faster

The presentation ably proved how easily these tools could together help surface needed information rapidly, providing potentially enormous time savings during the drug discovery process.

The webinar concluded with a Q&A that attracted a variety of interesting questions, which made it clear that many people in the life sciences community are eager to discover how knowledge graphs might help them advance their own research.

Watch the webinar presentation to learn more, or get in touch to find out how our team can help you get the most from your data.

