In a recent webinar, SciBite and Stardog team members demonstrated how to build a knowledge graph to identify candidate drugs for a rare disease.
There are thousands of rare diseases in the world, many without any cures or treatments available. The fact that they affect a relatively low number of people means there is little commercial drive for pharmaceutical companies to develop new drugs for them. Drug repurposing has presented an attractive solution to this problem by finding other already-approved drugs that have the potential to treat rare diseases.
However, identifying drug candidates is still a data-intensive endeavour, and it is one that requires a view of data from multiple sources and at huge scale. The creation of a Knowledge Graph can help with the drug discovery process by bringing together different datasets and enabling researchers to make important connections that simply would not be possible using manual review.
Life sciences researchers are often challenged to navigate an ever-growing deluge of complex scientific data in their search for solutions. New technologies and tools in data science are the key to empowering them by providing clean, readable data along with the ability to map scientific concepts (e.g., genes, drugs) and leverage public ontologies.
SciBite uses its data modelling, tooling, and expertise to create that clean, machine-readable data for analysis. Working together with Stardog, a leading platform that does data analysis at scale, these combined forces create an analytical solution that enables insights that simply wouldn’t be possible through manual review and curation.
To demonstrate this synergy, SciBite and Stardog recently held an illuminating webinar called “Building a Knowledge Graph for Drug Discovery,” in which experts from both companies showed how a handful of powerful tools work together to find answers quickly.
For this practical demonstration, which took only about 40 minutes, three presenters worked in succession toward the goal of finding candidate drugs for repurposing to treat Friedreich’s ataxia, a rare genetic disorder.
After an opening by Sam Shelton, Partnership Manager at SciBite, on drug repurposing, SciBite Senior Solutions Engineer Simon Jupp began by introducing CENtree, SciBite’s ontology management platform. Providing one-click access to over 60 biomedical ontologies, CENtree has a number of impressive features, including a governance layer for creating and editing ontologies and numerous export formats.
“Ontologies are used to give us the shared understanding of those things that go into our knowledge graphs – so those entities might be genes, drugs, diseases, whatever we’re interested in,” Simon explained as he loaded relevant ontologies and built a schema in CENtree. “If we use common identifiers for those things, then they naturally become connected in our knowledge graph, and we can start to query across different data sources that are described using the same set of ontologies.”
CENtree allows users to deploy ontologies to another important tool, TERMite. This automatically expands the synonyms supported and uses these synonyms to identify entities within unstructured text, mapping these entities back to a common identifier.
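To make the idea of synonym-based entity recognition concrete, here is a minimal Python sketch of dictionary matching that maps each synonym found in text back to a common identifier. It is not how TERMite is implemented; the vocabulary, the identifiers (`GENE:0001`, `DISEASE:0001`) and the function names are all hypothetical placeholders chosen for illustration.

```python
import re

# Hypothetical vocabulary: each identifier maps to its known synonyms.
# Identifiers and synonym lists are illustrative, not real VOCab content.
VOCAB = {
    "GENE:0001": ["frataxin", "FXN"],
    "DISEASE:0001": ["Friedreich's ataxia", "Friedreich ataxia", "FRDA"],
}


def build_lookup(vocab):
    """Invert the vocabulary so each lower-cased synonym points at its identifier."""
    return {syn.lower(): ident for ident, syns in vocab.items() for syn in syns}


def annotate(text, vocab):
    """Return (start, end, matched_text, identifier) for every synonym found."""
    lookup = build_lookup(vocab)
    # Try longer synonyms first so multi-word names win over shorter overlaps.
    synonyms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(s) for s in synonyms) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in annotate("FXN levels are reduced in FRDA patients.", VOCAB):
        print(hit)
```

Because every synonym resolves to the same identifier, documents that mention "FXN" and documents that mention "frataxin" end up annotated with the same entity, which is what lets them connect later in a graph.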
“TERMite is SciBite’s very precise named entity recognition engine that annotates unstructured natural language content by using a set of controlled vocabularies we call VOCabs,” said Tiago Almeida, Senior Data Scientist at SciBite, as he took the reins of the webinar. These controlled vocabularies, which SciBite builds and maintains, are based on public ontologies wherever they exist. TERMite can also handle unstructured text that isn’t natural language, such as a spreadsheet full of information.
As he showed the webinar audience how TERMite could annotate documents on Friedreich’s ataxia, Tiago explained that by converting unstructured text into structured, semantic data, TERMite allows for the identification of actionable insights. He also demonstrated how the TExpress tool works alongside TERMite, matching patterns between entities and inferring relationships from the literature.
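The idea of matching a pattern between two recognised entities can be illustrated with a small Python sketch. This is a deliberately simple stand-in, not TExpress syntax or behaviour: the entity table, the trigger phrases, the relation name `increases_expression_of` and the placeholder drug "Compound X" are all assumptions made for the example.

```python
import re

# Hypothetical typed entities: surface form -> (entity type, identifier).
ENTITIES = {
    "compound x": ("DRUG", "DRUG:0001"),
    "frataxin": ("GENE", "GENE:0001"),
}

# Assumed trigger phrases that signal an "increases expression" relationship.
TRIGGER = r"(?:increases|upregulates|induces)\s+(?:the\s+)?(?:expression\s+of\s+)?"


def extract_relations(sentence, entities):
    """Find DRUG <trigger> GENE patterns and return them as identifier triples."""
    drugs = [name for name, (etype, _) in entities.items() if etype == "DRUG"]
    genes = [name for name, (etype, _) in entities.items() if etype == "GENE"]
    pattern = re.compile(
        r"(?P<drug>" + "|".join(map(re.escape, drugs)) + r")\s+"
        + TRIGGER
        + r"(?P<gene>" + "|".join(map(re.escape, genes)) + r")",
        re.IGNORECASE,
    )
    return [
        (
            entities[m.group("drug").lower()][1],
            "increases_expression_of",
            entities[m.group("gene").lower()][1],
        )
        for m in pattern.finditer(sentence)
    ]


if __name__ == "__main__":
    text = "In this study, Compound X increases the expression of frataxin in cells."
    print(extract_relations(text, ENTITIES))
```

Each match becomes a subject, predicate, object triple expressed in identifiers rather than raw text, which is the kind of structured output a graph database can load directly.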
In just a few minutes, CENtree, TERMite, and TExpress were used to build an ontology and extract the terms from some 4,000 documents, creating a machine-readable output ready to be transferred from SciBite into Stardog, where the knowledge graph can be created.
At this point in the presentation, Nick McHugh, Senior Solutions Consultant at Stardog, showed off the platform’s impressive ability to explore, query, and visualize data.
“It provides a flexible abstraction layer to connect any type of data, regardless of its format,” he said of the Stardog platform. “Connect data in a way that is not possible or (perhaps) practical in rigid, siloed systems, often discovering new connections in the process. It gives a simple way for users and machines to query and explore the graph, because data and metadata live together, providing user and machine-readable views.”
He also noted that, in a knowledge graph, data is not stored or organized in a tree. “There’s no top, middle, or center, and this allows users—or machines, for that matter—to start querying, exploring, and then retrieve data from any point in the graph.”
Nick demonstrated various Stardog features as he worked on pinpointing the information they were looking for on Friedreich’s ataxia: specifically, drugs that increase the expression of frataxin and are in Phase IV. With the help of information from the ChEMBL database, he was quickly able to identify four possible drug candidates that would be most suitable for further exploration.
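The shape of that question, combining a relationship extracted from the literature with a clinical-phase attribute from a second source, can be sketched in plain Python over a handful of triples. This is only an illustration of the join on shared identifiers, not the query shown in the webinar or Stardog's own query interface. The drug identifiers, the predicate names and the data are invented placeholders and do not correspond to the four candidates found in the demonstration.

```python
# Hypothetical triples (subject, predicate, object) from two sources:
# literature-derived relations and a clinical-phase attribute per drug.
TRIPLES = [
    ("DRUG:0001", "increases_expression_of", "GENE:FXN"),
    ("DRUG:0002", "increases_expression_of", "GENE:FXN"),
    ("DRUG:0003", "decreases_expression_of", "GENE:FXN"),
    ("DRUG:0001", "max_phase", 4),
    ("DRUG:0002", "max_phase", 2),
    ("DRUG:0003", "max_phase", 4),
]


def candidate_drugs(triples, gene, phase=4):
    """Return drugs that increase expression of `gene` and have reached `phase`."""
    upregulators = {
        s for s, p, o in triples
        if p == "increases_expression_of" and o == gene
    }
    at_phase = {s for s, p, o in triples if p == "max_phase" and o == phase}
    # The join works only because both sources use the same drug identifiers.
    return sorted(upregulators & at_phase)


if __name__ == "__main__":
    print(candidate_drugs(TRIPLES, "GENE:FXN"))
```

In this toy data only one drug satisfies both conditions; the other two each fail one of them, which shows why neither source alone would answer the question.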
The presentation showed how readily these tools, used together, can surface the needed information, potentially saving a great deal of time during the drug discovery process.
The webinar concluded with a Q&A that attracted a variety of interesting questions, which made it clear that many people in the life sciences community are eager to discover how knowledge graphs might help them advance their own research.
Watch the webinar presentation to learn more, or get in touch to find out how our team can help you get the most from your data.
SciBite works with leading data analytics companies to deliver large sets of clean, machine-readable data that simply wouldn’t be possible using manual curation methods. Learn how, together, we can propel your digital transformation with the data, software, and service expertise to make large-scale clean data an opportunity, not a hurdle.
Sam leads partnerships and alliances at SciBite, working collaboratively with existing partners and developing new partnerships aligned to SciBite’s strategic goals. He has a strong technical background in the life sciences, with a PhD in Protein Biochemistry from the University of Nottingham and post-doctoral training in bioinformatics within the Department of Neurosurgery at the University of California, San Francisco.
Prior to joining SciBite, he held technical sales and commercial roles at Carl Zeiss and most recently led business development at Repositive, building relationships with contract research organisations, biotechs and pharma companies, and facilitating data exchange and search across multiomic datasets. He has a good grasp of the challenges of dealing with unstructured scientific data and of collaboratively developing practical solutions to overcome them.