We provide thought leadership based on a combination of extensive industry experience and unparalleled expertise in ontology development, standards and biocuration.
With an extensive range of tools and techniques at our disposal, we deliver significant value across a range of use cases, including:
Flex your capacity and extend the capabilities of your team quickly and easily
Extensive Life Science experience and active engagement with industry initiatives
Unparalleled expertise in ontology development, standards and biocuration
Get in touch with us to find out how we can transform your data.
A global bioscience company wanted to standardise its use of scientific terminology. However, it found that most publicly available ontologies focused on pharmaceutical terminology and did not adequately cover its own business of developing solutions for the food, nutritional and agricultural industries. The manual development and review of a new vocabulary would have consumed significant internal resource.
In an attempt to identify articles of interest amongst the background ‘noise’, LifeArc’s Scientific Horizon Scanning Team members used to manually scan through PubMed, grant information and a range of biotech-focussed news websites for potentially interesting articles. This process was resource intensive, limiting the coverage, depth and frequency of review possible.
The Business Challenge:
Many applications of AI involve pattern recognition, but their accuracy is highly dependent on the data being unambiguous. Machine learning models can be used to identify sentences describing positive and negative relations between entities (i.e. X has some relation with Y). However, in order to train such models, it is vital to have as clean a dataset as possible. For example, without prior semantic enrichment of the text, a machine learning model would not be able to correctly identify that the phrase “...the binding of repaglinide to HSA in human plasma...” refers to an interaction between a drug and a protein, rather than between two proteins.
The SciBite Solution:
We created a tool that makes use of SciBite’s Named Entity Recognition (NER) engine, TERMite, to accurately identify and categorise examples of sentences that mention protein-protein interactions. First, all sentences mentioning entity type 1 and entity type 2 were extracted from MEDLINE. In the case of protein-protein interactions, we were looking for two GENE mentions in a sentence. These sentences were then surfaced to a curator, along with related metadata. The curator then assigned each sentence to one of three sets: i) sentences that describe a positive interaction, ii) sentences that describe a negative interaction, or iii) coincidental mentions. This data can then be used to train machine learning models to automate the extraction of sentences describing a relation of interest.
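The extract-then-curate workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary lookup stands in for a real NER engine and is not TERMite's API, and the names TOY_DICTIONARY, tag_entities, candidate_sentences and group_by_label are hypothetical.

```python
import re
from enum import Enum

# Hypothetical toy dictionary standing in for a real NER engine:
# lower-cased surface form -> entity type.
TOY_DICTIONARY = {
    "brca1": "GENE",
    "bard1": "GENE",
    "tp53": "GENE",
    "mdm2": "GENE",
    "repaglinide": "DRUG",
}


class Label(Enum):
    """The three sets a curator can assign a candidate sentence to."""
    POSITIVE = "positive interaction"
    NEGATIVE = "negative interaction"
    COINCIDENTAL = "coincidental mention"


def tag_entities(sentence):
    """Return (surface form, entity type) pairs found by dictionary lookup."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    return [(t, TOY_DICTIONARY[t.lower()]) for t in tokens if t.lower() in TOY_DICTIONARY]


def candidate_sentences(sentences, type_1="GENE", type_2="GENE"):
    """Keep sentences that mention both entity types.

    When both types are the same (e.g. GENE and GENE), two distinct
    entities of that type are required.
    """
    candidates = []
    for sentence in sentences:
        entities = tag_entities(sentence)
        first = {t.lower() for t, kind in entities if kind == type_1}
        second = {t.lower() for t, kind in entities if kind == type_2}
        if type_1 == type_2:
            keep = len(first) >= 2
        else:
            keep = bool(first) and bool(second)
        if keep:
            candidates.append(sentence)
    return candidates


def group_by_label(decisions):
    """Group curator decisions, given as (sentence, Label) pairs, into the three sets."""
    sets = {label: [] for label in Label}
    for sentence, label in decisions:
        sets[label].append(sentence)
    return sets
```

In this sketch the curator's decisions are passed in as plain (sentence, Label) pairs; the grouped sets are the kind of labelled data that could then be handed to a model training step.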
A global pharmaceutical company recognized the potential of the huge volumes of bioassay data that they had generated but struggled to gain insights from this valuable resource. A lack of standardization across their data repositories, including LIMS and other bioassay databases, had resulted in different ways to describe the same thing, for example, ‘mouse’, ‘mice’, ‘Mus musculus’ and ‘m. musculus’, making it hard to collate data for a particular species. This was compounded by the fact that some database fields were sparsely populated while others contained useful information buried in long assay descriptions.
We enriched our species, gene, and bioassay vocabularies with customer-specific terms and synonyms to ensure all relevant information would be recognized. We then analysed the assay names from the legacy database and extracted the different entities within each one. Each entity was then mapped to a single, standard vocabulary term to normalize the data.
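As a rough illustration of this kind of normalization, the sketch below maps species synonyms found in free-text assay names to one preferred term. The vocabulary contents and the function names (SPECIES_VOCAB, build_lookup, build_pattern, normalise_species) are assumptions made for the example and do not represent SciBite's vocabularies or tooling.

```python
import re

# Hypothetical vocabulary: preferred term -> synonyms (all lower-cased).
SPECIES_VOCAB = {
    "Mus musculus": ["mouse", "mice", "mus musculus", "m. musculus"],
    "Homo sapiens": ["human", "homo sapiens", "h. sapiens"],
}


def build_lookup(vocab):
    """Invert the vocabulary into a synonym -> preferred term lookup."""
    return {syn.lower(): preferred for preferred, syns in vocab.items() for syn in syns}


def build_pattern(lookup):
    """Compile one case-insensitive pattern matching any synonym as a whole word.

    Longer synonyms are listed first so that multi-word forms are tried
    before shorter alternatives.
    """
    alternatives = sorted(lookup, key=len, reverse=True)
    body = "|".join(re.escape(a) for a in alternatives)
    return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)


def normalise_species(text, lookup, pattern):
    """Return the preferred terms mentioned in text, in order of first appearance."""
    found = []
    for match in pattern.finditer(text):
        preferred = lookup[match.group(0).lower()]
        if preferred not in found:
            found.append(preferred)
    return found


LOOKUP = build_lookup(SPECIES_VOCAB)
PATTERN = build_pattern(LOOKUP)
```

With this in place, assay names such as "Mice hepatocyte uptake assay" and "M. musculus liver microsome stability" both resolve to the single term "Mus musculus", so records for the same species can be collated.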
The Business Challenge:
A leading business intelligence company had developed and acquired a range of life sciences databases. However, each database was indexed differently, resulting in silos of data that had to be searched independently.
The SciBite Solution:
As an initial step, we mapped the client's internal lists of indexing terms to standard controlled life sciences vocabularies, including the Indication branch of MeSH (Medical Subject Headings) and Drugs from ChEMBL. This resulted in a single consistent means to index the client's databases. With the index mapping in place, connections could be made between entries in previously disconnected databases, enabling users to seamlessly navigate the content within them.
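The idea of connecting databases through a shared index can be sketched as follows. Everything here is hypothetical: the database names, record IDs, local index terms and the "INDICATION:0001"-style identifiers are placeholders, not real MeSH or ChEMBL identifiers, and the mapping table would in practice come from the term-mapping step described above.

```python
from collections import defaultdict

# Hypothetical mapping from (database, local index term) to a shared identifier.
TERM_TO_SHARED_ID = {
    ("db_a", "T2DM"): "INDICATION:0001",
    ("db_b", "Type 2 diabetes"): "INDICATION:0001",
    ("db_a", "Breast Ca"): "INDICATION:0002",
    ("db_b", "Breast cancer"): "INDICATION:0002",
}

# Hypothetical records from two independently indexed databases.
RECORDS = [
    {"db": "db_a", "id": "A-17", "index_term": "T2DM"},
    {"db": "db_b", "id": "B-204", "index_term": "Type 2 diabetes"},
    {"db": "db_b", "id": "B-311", "index_term": "Breast cancer"},
    {"db": "db_a", "id": "A-42", "index_term": "Unmapped term"},
]


def build_unified_index(records, mapping):
    """Group records from all databases under their shared identifier.

    Returns (index, unmapped): index maps each shared identifier to a list of
    (database, record id) pairs; unmapped holds records whose local term has
    no entry in the mapping and so needs review.
    """
    index = defaultdict(list)
    unmapped = []
    for record in records:
        shared_id = mapping.get((record["db"], record["index_term"]))
        if shared_id is None:
            unmapped.append(record)
        else:
            index[shared_id].append((record["db"], record["id"]))
    return dict(index), unmapped
```

A single lookup on a shared identifier then returns entries from every database that mentions the same concept, regardless of how each one originally indexed it.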
Our latest blog explains how SciBite's Ontologies team takes public biomedical ontologies and tailors them so that they can be used for named entity recognition (NER).
In our latest blog we discuss the challenges life sciences companies, like LifeArc, face in keeping up to date with scientific literature, and how semantic enrichment technology can automate this process to reduce the time spent mining data by up to 80%.
© SciBite Limited / Registered in England & Wales No. 07778456