A global pharmaceutical company recognized the potential of the huge volumes of bioassay data it had generated but struggled to gain insights from this valuable resource. A lack of standardization across its data repositories, including LIMS and other bioassay databases, had resulted in different ways of describing the same thing (for example, ‘mouse’, ‘mice’, ‘Mus musculus’ and ‘m. musculus’), making it hard to collate data for a particular species. This was compounded by the fact that some database fields were sparsely populated, while others contained useful information buried in long assay descriptions.
We enriched our species, gene, and bioassay vocabularies with customer-specific terms and synonyms to ensure all relevant information would be recognized. We then analysed the assay names from the legacy database and identified the different entities within each one. Each extracted entity was mapped to a single, standard vocabulary term to normalize the data.
Figure: Extraction of Cell Line, Drug, Species and Target entities within the unstructured titles of a selection of assays. The resulting semantic index enables connections to be made between bioassays.
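To make the normalization step concrete, here is a minimal sketch of dictionary-based entity extraction over assay titles. It is an illustration only, not the tooling used in the project: the vocabulary entries (apart from the mouse synonyms quoted above), the example titles, and the function names `build_matchers` and `annotate` are all assumptions made for the example.

```python
import re

# Hypothetical mini-vocabulary: entity type -> standard term -> synonyms.
# Only the mouse synonyms come from the text; the rest are illustrative.
VOCAB = {
    "Species": {
        "Mus musculus": ["mouse", "mice", "Mus musculus", "m. musculus"],
        "Homo sapiens": ["human", "Homo sapiens", "h. sapiens"],
    },
    "Target": {
        "EGFR": ["EGFR", "ErbB1"],
    },
}


def build_matchers(vocab):
    """Compile one case-insensitive pattern per standard term."""
    matchers = []
    for entity_type, terms in vocab.items():
        for standard, synonyms in terms.items():
            # Try longer synonyms first so multi-word forms win over short ones.
            ordered = sorted(synonyms, key=len, reverse=True)
            alternatives = "|".join(re.escape(s) for s in ordered)
            # Lookarounds stop a synonym matching inside a longer word.
            pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
            matchers.append((entity_type, standard, pattern))
    return matchers


def annotate(title, matchers):
    """Return {entity type: set of standard terms} found in a free-text title."""
    found = {}
    for entity_type, standard, pattern in matchers:
        if pattern.search(title):
            found.setdefault(entity_type, set()).add(standard)
    return found


MATCHERS = build_matchers(VOCAB)

if __name__ == "__main__":
    # Hypothetical legacy assay titles using different names for the same species.
    titles = [
        "Inhibition of ErbB1 in mice",
        "EGFR binding assay, m. musculus liver",
        "Cytotoxicity in human hepatocytes",
    ]
    for title in titles:
        print(title, "->", annotate(title, MATCHERS))
```

Because every synonym resolves to one standard term, titles that say "mice" and "m. musculus" end up under the same species label and can be collated with a simple group-by. A production system would also need to handle ambiguity and context, which this sketch ignores.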
Copyright © 2023 Elsevier Ltd., its licensors, and contributors. All rights are reserved, including those for text and data mining, AI training, and similar technologies.