Raw data is inherently unstructured and often has quality issues: it may be inaccurate, incomplete, inconsistent, or duplicated. It must therefore be processed before it can support subsequent analysis and confident data-driven decisions.
This is where ontologies come into play.
Ontologies provide a way to codify and standardize the business language of organizations across a domain in a machine-readable form. They enable information sharing between diverse systems within the same domain. In the life sciences domain, the terms (entities) of interest include molecules, targets, diseases, tissues, cell lines, anatomical terms, species, and strains, to name a few.
Those entities may have associated properties (such as labels, definitions, synonyms, mappings) and relationships (such as ‘part of’, ‘derives from’, ‘develops from’) to other entities or to categories of entities. For example, a drug molecule can interact with a target and modulate its activity (a relationship between different entities), or it may be an analgesic, a corticosteroid, an expectorant, a sedative… (a category).
Figure 1: Example of a drug’s category from the Chemical Entities of Biological Interest (ChEBI) ontology.
Each entity has a unique identifier and, depending on the queried ontology, potentially several externally issued ones (such as drug INN, CAS, SMILES, InChI).
Figure 2: Examples of externally issued drug identifiers from the ChEBI ontology.
The categories can be thought of as adjectives, while the relationships can be thought of as prepositional phrases (such as ‘is enantiomer of’, ‘is conjugate acid of’).
Figure 3: Example of a chemical entity’s relationship from the ChEBI ontology.
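To make the vocabulary above concrete, here is a minimal sketch of how an entity with properties, external identifiers, categories, and relationships could be modelled in Python. The class and field names (`Entity`, `Relationship`, `external_ids`, and so on) are assumptions made for illustration; they are not the ChEBI or CENtree data model, and the sample values should be checked against ChEBI before being relied upon.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """A 'prepositional phrase' linking one entity to another."""
    predicate: str  # e.g. "is conjugate acid of"
    target: str     # label (or identifier) of the related entity


@dataclass
class Entity:
    """Hypothetical, simplified view of a single ontology term."""
    identifier: str                                        # unique identifier in the ontology
    label: str                                             # preferred name
    synonyms: list[str] = field(default_factory=list)
    external_ids: dict[str, str] = field(default_factory=dict)  # e.g. CAS, INN, SMILES
    categories: list[str] = field(default_factory=list)    # the "adjectives"
    relationships: list[Relationship] = field(default_factory=list)

    def related(self, predicate: str) -> list[str]:
        """Return the targets of all relationships with the given predicate."""
        return [r.target for r in self.relationships if r.predicate == predicate]


# Illustrative values only.
aspirin = Entity(
    identifier="CHEBI:15365",
    label="acetylsalicylic acid",
    synonyms=["aspirin"],
    external_ids={"CAS": "50-78-2"},
    categories=["analgesic"],
    relationships=[Relationship("is conjugate acid of", "acetylsalicylate")],
)
```

Calling `aspirin.related("is conjugate acid of")` returns the targets of that relationship, while `aspirin.categories` holds the category labels, which mirrors the adjective versus prepositional-phrase distinction described above.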
An ontology defines a set of classes, attributes (or properties), and relationships with which to model a domain of knowledge (see this article for an explanation of ontology jargon). With the increased adoption of ontologies in the life sciences domain, CENtree plays a fundamental role in data standardization and interoperability.
Standardization results from using ontologies to eliminate ambiguity from technical languages. It enables communication and knowledge sharing between agents, whether human or software, through resolvable, globally unique, and persistent identifiers. It also increases compatibility and interoperability across data sets, allowing information to be shared within a larger network (e.g., collaborations, partnerships).
CENtree aims to facilitate ontology development, editing, and visualization, but it can also be used as an ontology server. It provides an intuitive Web interface for displaying the details and hierarchy of a specific ontology term.
Several Web sites provide lists of scientific ontologies (e.g., OBO Foundry, BioPortal). When an application (e.g., an ELN, a data catalog, a knowledge graph) needs to consume some of these ontologies, its developers must learn each underlying API and manage a separate authentication mechanism for each one.
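As a rough illustration of how identifiers remove ambiguity, the sketch below maps free-text names to a single ontology identifier through a synonym index. The function names (`build_synonym_index`, `normalize`) and the tiny vocabulary are assumptions made for this example; a real ontology would supply the synonyms, and matching is usually more sophisticated than a case-insensitive lookup.

```python
from typing import Optional


def build_synonym_index(terms: dict[str, list[str]]) -> dict[str, str]:
    """Map every known name (case-insensitive) to its ontology identifier."""
    index: dict[str, str] = {}
    for identifier, names in terms.items():
        for name in names:
            index[name.strip().casefold()] = identifier
    return index


def normalize(raw_values: list[str], index: dict[str, str]) -> list[Optional[str]]:
    """Replace each raw value with its identifier, or None when it is unknown."""
    return [index.get(value.strip().casefold()) for value in raw_values]


# Illustrative vocabulary: one entity with its label and two synonyms.
vocabulary = {"CHEBI:15365": ["acetylsalicylic acid", "aspirin", "ASA"]}
index = build_synonym_index(vocabulary)

raw = ["Aspirin", " acetylsalicylic acid", "ASA", "unknown compound"]
normalized = normalize(raw, index)
# normalized == ["CHEBI:15365", "CHEBI:15365", "CHEBI:15365", None]
```

Three differently written values collapse to the same identifier, which is what makes duplicates detectable and data sets comparable; the unmatched value is left as `None` so that it can be reviewed rather than silently guessed.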
With CENtree, this process is much simpler. Like all applications developed at SciBite, CENtree is an API-first application: we developed exhaustive collections of API endpoints first and then designed the Web user interface (UI) on top of those collections. As a result, every action available in the UI can also be performed through the API endpoints (for example, those in the “search-resource” and “ontology-metadata-resource” collections).
So, to integrate the ontologies hosted and maintained in CENtree, an application needs knowledge of only one API and a single authentication mechanism. This is a major benefit over handling the individual APIs of the different public ontologies. The return on investment (ROI) of using CENtree as an ontology server is immediate in terms of the effort and resources needed to integrate the required ontologies in a downstream application.
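The "one API, one authentication" idea can be sketched as a small client that builds requests for several ontologies with the same base URL and the same token. Everything specific here is an assumption made for illustration: the class name `OntologyServerClient`, the base URL, the `/ontologies/{id}/search` path, and the bearer-token header are placeholders, not documented CENtree endpoints, so the CENtree API documentation should be consulted for the real ones. The requests are only constructed, not sent.

```python
from dataclasses import dataclass
from urllib.parse import quote, urlencode
from urllib.request import Request


@dataclass(frozen=True)
class OntologyServerClient:
    """Hypothetical client: one base URL and one token for every hosted ontology."""
    base_url: str  # placeholder, e.g. "https://centree.example.org/api"
    token: str     # single credential reused for all ontologies

    def search_request(self, ontology_id: str, query: str) -> Request:
        """Build (but do not send) a search request for one ontology."""
        # NOTE: this path is a placeholder, not a real CENtree endpoint.
        path = f"/ontologies/{quote(ontology_id, safe='')}/search"
        url = f"{self.base_url}{path}?{urlencode({'q': query})}"
        headers = {
            "Authorization": f"Bearer {self.token}",  # assumed auth scheme
            "Accept": "application/json",
        }
        return Request(url, headers=headers)


client = OntologyServerClient("https://centree.example.org/api", "my-token")

# The same client and credential serve every ontology hosted on the server.
requests_to_send = [client.search_request(o, "aspirin") for o in ("chebi", "efo")]
```

With separate public services, each entry in that loop would need its own client, URL scheme, and credential; here only the ontology identifier changes.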
The UI of CENtree has been designed to be accessible to a broad audience, not just experts (e.g., ontologists). Loading public ontologies is a one-click process for CENtree users with the appropriate permissions, the related API endpoint being:
Figure 4: One-click feature to load public ontologies in the CENtree UI.
Here are some of the benefits of using CENtree as an ontology server:
Figure 5: Process to publish public ontologies and SciBite VOCabs in the CENtree UI.
With a PhD from Newcastle University in computational approaches to drug repositioning, Joe brings a strong scientific foundation rooted in semantic data integration, knowledge graphs, and data mining. Since joining SciBite in 2017, he has had the privilege of leading the Data Science and Professional Services teams, where he combined cutting-edge technology with our core data enrichment products to create tailored solutions for a diverse range of customers.
Today, as Product Director, Joe is passionate about shaping the vision of our software solutions, aligning them with strategic goals, and most importantly, supporting our clients in unlocking the full potential of their scientific data.
His focus is on driving innovation that empowers scientists and organizations to make impactful discoveries faster and more efficiently.