Bioassays are crucial in helping pharmaceutical companies determine the potency (biological activity) of their products. However, due to their complex nature, bioassays are among the most challenging experiments to perform reliably and accurately.
Databases dedicated to managing bioassay data contain a wealth of research and development knowledge, but hurdles exist when it comes to extracting knowledge from these resources.
Many companies deploy data management systems that are geared towards entering rather than mining data. In addition, replacing such systems over time results in silos of legacy data in a variety of formats and aligned to different standards.
Bioassay data management systems are typically based on relational databases. While this affords some structure to the data, front-end applications tend to capture data in free-text fields to avoid burdening or restricting users.
Even for defined entries, the meaning of a field or its contents may be ambiguous or imprecise, or a single field may mix multiple data types such as Gene, Target and Species.
The inconsistent use of synonyms during data entry may also make it difficult to collate data for a disease or target of interest. For example, searching a bioassay database for the Alzheimer's-related gene PSEN1 would miss references to synonyms such as Presenilin-1, AD3 and PSNL1.
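To make this concrete, below is a minimal Python sketch of synonym-aware search. The synonym map and assay records are invented for illustration; in practice, the mappings would come from a curated vocabulary or ontology service.

```python
# A minimal sketch of synonym-aware search. The synonym map below is a
# hand-built, illustrative stand-in for a curated vocabulary.
SYNONYMS = {
    "PSEN1": {"presenilin-1", "presenilin 1", "ad3", "psnl1"},
}

def canonical_gene(term: str) -> str | None:
    """Map a free-text term to its canonical gene symbol, if known."""
    t = term.lower()
    for symbol, names in SYNONYMS.items():
        if t == symbol.lower() or t in names:
            return symbol
    return None

def search_assays(assays: list[dict], query: str) -> list[dict]:
    """Return assays whose gene field normalises to the queried gene."""
    target = canonical_gene(query)
    if target is None:
        return []
    return [a for a in assays if canonical_gene(a.get("gene", "")) == target]

assays = [
    {"id": 1, "gene": "Presenilin-1"},
    {"id": 2, "gene": "PSNL1"},
    {"id": 3, "gene": "BACE1"},
]
print(search_assays(assays, "PSEN1"))  # matches assays 1 and 2, not 3
```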
The normalisation of literature and the alignment of text to ontologies and industry standards are vital parts of this process, as illustrated below.
The above diagram shows the normalisation of entities captured in unstructured bioassay titles: extracting the Cell Line, Drug, Species and Target entities results in a semantic index that allows users to make connections between different bioassays.
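As a rough illustration of that step, the following Python sketch extracts entities from assay titles using tiny, invented dictionaries (a production system would use large curated vocabularies and proper text mining) and inverts the hits into a semantic index:

```python
# A simplified, dictionary-based entity extractor for bioassay titles.
# The entity dictionaries are illustrative stand-ins for curated vocabularies.
DICTIONARIES = {
    "Cell Line": {"hek293", "hela"},
    "Drug": {"donepezil", "imatinib"},
    "Species": {"human", "mouse"},
    "Target": {"psen1", "bace1"},
}

def extract_entities(title: str) -> dict[str, set[str]]:
    """Tag tokens in a title against each entity dictionary."""
    tokens = {tok.strip(",.()").lower() for tok in title.split()}
    return {etype: tokens & vocab for etype, vocab in DICTIONARIES.items()}

def build_index(titles: dict[int, str]) -> dict[str, set[int]]:
    """Invert entity hits into a semantic index: entity -> assay IDs."""
    index: dict[str, set[int]] = {}
    for assay_id, title in titles.items():
        for hits in extract_entities(title).values():
            for entity in hits:
                index.setdefault(entity, set()).add(assay_id)
    return index

titles = {
    101: "Inhibition of BACE1 in HEK293 cells (human)",
    102: "Donepezil activity against human BACE1",
}
print(build_index(titles)["bace1"])  # {101, 102}
```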
When applying standard ontologies and vocabularies to bioassay data, the source of these ontologies is a key consideration if an organisation is to avoid relying on specific vendors.
Employing public standards such as BAO (BioAssay Ontology), ChEMBL (chemical entities), CLO (Cell Line Ontology) and EFO (Experimental Factor Ontology) keeps the resulting enriched data open and interoperable from system to system, which is also crucial for adhering to the FAIR data principles within the enterprise.
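The sketch below shows the shape of such ontology-aligned annotation. Note that the record, the lookup tables and the ontology identifiers are all placeholders for illustration; real annotations would be resolved against the published BAO, ChEMBL, CLO and EFO resources.

```python
# A sketch of ontology-aligned annotation. The record, lookup tables and
# ontology IDs below are invented placeholders, not verified terms.
raw_record = {
    "title": "Donepezil potency assay in HEK293 cells",
    "cell_line": "HEK293",
    "compound": "donepezil",
}

# Hypothetical lookup tables standing in for full ontology services.
CLO_LOOKUP = {"HEK293": "CLO:0000000"}          # placeholder CLO identifier
CHEMBL_LOOKUP = {"donepezil": "CHEMBL0000000"}  # placeholder ChEMBL identifier

def annotate(record: dict) -> dict:
    """Attach ontology identifiers alongside the original free text."""
    annotated = dict(record)
    annotated["cell_line_id"] = CLO_LOOKUP.get(record["cell_line"])
    annotated["compound_id"] = CHEMBL_LOOKUP.get(record["compound"])
    return annotated

print(annotate(raw_record))
```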
Implementing a change in the organisation's data management strategy should not be confined to legacy data. Prospective data capture should also be aligned to the same ontological standards to ensure seamless integration of historic and future data.
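One simple way to achieve this is to validate new entries against the same controlled vocabularies at capture time. The following sketch, with invented vocabularies, illustrates the idea:

```python
# A minimal sketch of validating entries at capture time so prospective
# data aligns to the same controlled vocabularies as the enriched legacy
# data. The vocabulary contents are illustrative only.
ALLOWED_SPECIES = {"Homo sapiens", "Mus musculus"}
ALLOWED_CELL_LINES = {"HEK293", "HeLa"}

def validate_entry(entry: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the entry conforms."""
    errors = []
    if entry.get("species") not in ALLOWED_SPECIES:
        errors.append(f"Unknown species: {entry.get('species')!r}")
    if entry.get("cell_line") not in ALLOWED_CELL_LINES:
        errors.append(f"Unknown cell line: {entry.get('cell_line')!r}")
    return errors

new_entry = {"species": "human", "cell_line": "HEK293"}
print(validate_entry(new_entry))  # flags 'human' -> should be 'Homo sapiens'
```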
Below is a workflow that combines ontology management with semantic enrichment to unlock the value of bioassay data.
This workflow also ensures that bioassay data and metadata conform to the FAIR data principles, making the data Findable, Accessible, Interoperable and Reusable.
Enriching bioassay data not only makes it simpler to interrogate, it also allows more complex, ontology-based questions to be asked of the assay data, such as finding every assay annotated with a given term or any of its ontological descendants.
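The sketch below illustrates such a hierarchy-aware query over a tiny, invented assay-type hierarchy:

```python
# A sketch of a hierarchy-aware query: find assays annotated with a term
# or any of its ontological descendants. The tiny hierarchy is invented.
PARENTS = {  # child term -> parent term
    "secretase inhibition assay": "enzyme inhibition assay",
    "kinase inhibition assay": "enzyme inhibition assay",
}

def descendants(term: str, parents: dict[str, str]) -> set[str]:
    """All terms whose chain of parents reaches `term`."""
    found = set()
    for child in parents:
        node = child
        while node in parents:
            node = parents[node]
            if node == term:
                found.add(child)
                break
    return found

assay_types = {1: "secretase inhibition assay", 2: "binding assay"}
wanted = {"enzyme inhibition assay"} | descendants("enzyme inhibition assay", PARENTS)
print([a for a, t in assay_types.items() if t in wanted])  # [1]
```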
Once data is normalised and aligned to ontologies, the task of enriching it with alternative sources (internal or external) becomes much simpler, allowing additional evidence to be integrated to automate and deepen the data analysis process.
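For instance, once internal assays and external evidence share the same ontology identifiers, integration reduces to a simple join. All records and identifiers in this sketch are invented:

```python
# Once both sources are keyed on the same ontology identifiers, merging
# them is a straightforward lookup. All data below is illustrative.
internal_assays = [
    {"assay_id": 1, "target_id": "GENE:PSEN1", "ic50_nM": 42.0},
    {"assay_id": 2, "target_id": "GENE:BACE1", "ic50_nM": 8.5},
]
external_evidence = {
    "GENE:PSEN1": {"disease": "Alzheimer's disease"},
    "GENE:BACE1": {"disease": "Alzheimer's disease"},
}

def enrich(assays: list[dict], evidence: dict[str, dict]) -> list[dict]:
    """Merge external evidence into each assay via its shared target ID."""
    return [{**a, **evidence.get(a["target_id"], {})} for a in assays]

for row in enrich(internal_assays, external_evidence):
    print(row)
```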
This use case, which combines retrospective and prospective data management, has been deployed by a number of SciBite's customers.
It brings intelligent scientific search to any bioassay platform, making bioassay data computationally accessible for automated analysis and ensuring its full value is realised.