How biomedical ontologies are unlocking the full potential of biomedical data

SciBite / News / How biomedical ontologies are unlocking the full potential of biomedical data

Our latest blog explains how SciBite’s Ontologies team takes public biomedical ontologies and tailors them so that they can be used for named entity recognition (NER).

The importance of biomedical data curation

In a world where we have access to an ever-expanding sea of information – be it music, science, photos or news – curation is increasingly key. This is well-known to the major tech companies, Google, for example, employs an army of 10,000 curators (‘raters’) to assess the quality of its output.

It is especially true in the life-sciences, where the process of sifting, sorting and normalizing biomedical data (a process known as biocuration), is critical for making that data accessible and findable.

The role of biocuration

Biocuration generally involves applying standard semantics – terms from biomedical ontologies and vocabularies– to data. This process may also be called ‘data annotation’. It is typically a manual process, for which professional biocurators are required to have domain expertise as well as the ability to navigate often large and complex ontologies. It is expert, labour-intensive work and as such often not scalable for large amounts of data.

This is where SciBite’s tools come in. Our Ontologies team takes public biomedical ontologies and tailors them so that they can be used for named entity recognition (NER). We do this tailoring in two key ways. First, we add more synonyms to increase the search breadth, and to allow for normalisation across entities.

For example: PDE5A, Phosphodiesterase V, Phosphodiesterase 5a, and PDEVa are all synonyms for the gene “Phosphodiesterase 5” and they all resolve to the same entity via its ID. Second, we handle ambiguity so that the matches are made only within the correct contexts, so ‘EGFR’ will match to either ‘Epidermal growth factor receptor’ or ‘e-glomerular filtration rate’ depending on the surrounding text.

The biocuration process at scale

This is itself a manual biocuration process, but one which allows us to scale across large amounts of information very quickly. Valuable manual effort can be reused efficiently and at scale. And while at SciBite we take pride in our manual expertise, we also super-charge work that with technologies such as machine learning and rule-based systems to make the manual work faster and more efficient.

Obviously NER can only take you so far, but once you have your entities reliably recognised and typed, you can leave your biocurators to focus on the parts that really benefit from human intervention, like extrapolating inferences from multiple sources or extracting subtle meaning from text. Alternatively, the annotated text can be used as the input to your machine learning/AI systems, helping improve performance by reducing noise and variation.

Custom solutions for specific needs

At SciBite we often build custom TERMite vocabularies for our customers, with their own data or in some specialist area, or to augment SciBite’s own vocabularies. Increasingly, we get asked to provide other specialist ontology related services such as creating new vocabularies or more formal ontologies where no public source exists, or to create custom semantic patterns to detect certain elements from text.

This is why we have decided in 2019 to launch Expert Ontology Services, which will allow our specialist team, with their many years of experience of working with ontologies, to help tackle your organisations data challenges and get you on the road to clean data.

Get in touch with the team to find out how we can work with you.

Jane Lomax

Head of Ontologies, SciBite

Jane leads the development of SciBite’s vocabularies and ontology services. With a Ph.D. in Genetics from Cambridge University and 15 years of experience working with biomedical ontologies, including at the EBI and Sanger Institute, she focussed on bioinformatics and developing biomedical ontologies. She has published over 35 scientific papers, mainly in ontology development.

Other articles by Jane:

Introduction to ontologies; Watch on demand
Why do you need FAIR data? read more
The key to being FAIR read more
Using ontologies to unlock the full potential of your scientific data – Part 1; read more
How biomedical ontologies are unlocking the full potential of biomedical data; read more

Share this article

Relevant resources, events and news

News Creating a SciBite VOCab from a public ontology

Public ontologies are key for applying FAIR principles but are not built for use in NER discover how we create a VOCab from a Public Ontology