The Business Challenge:
Many applications of AI involve pattern recognition, but their accuracy depends heavily on the input data being unambiguous. Machine learning models can be used to identify sentences describing positive and negative relations between entities (i.e. X has some relation with Y). However, training such models requires as clean a dataset as possible. For example, without prior semantic enrichment of the text, a machine learning model would not be able to correctly identify that the phrase “…the binding of repaglinide to HSA in human plasma…” refers to an interaction between a drug and a protein, rather than between two proteins.
The SciBite Solution:
We created a tool that uses SciBite’s Named Entity Recognition (NER) engine, TERMite, to accurately identify and categorise sentences that mention protein-protein interactions. First, all sentences mentioning both entity type 1 and entity type 2 are extracted from MEDLINE. In the case of protein-protein interactions, this means looking for two GENE mentions in a sentence. These sentences are then surfaced to a curator, along with related metadata. The curator assigns each sentence to one of three sets: i) sentences that describe a positive interaction, ii) sentences that describe a negative interaction, or iii) coincidental mentions. This data can then be used to train machine learning models to automate the extraction of sentences describing a relation of interest.
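The workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not SciBite's implementation: it does not call TERMite, and it assumes the sentences have already been annotated with entity mentions. The names Entity, Sentence, Label and candidate_sentences are hypothetical, the DRUG type name is an assumption, and the second example sentence was written for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Label(Enum):
    """The three sets a curator can assign a sentence to."""
    POSITIVE = "positive interaction"
    NEGATIVE = "negative interaction"
    COINCIDENTAL = "coincidental mention"


@dataclass(frozen=True)
class Entity:
    text: str
    entity_type: str  # e.g. "GENE"; "DRUG" is an assumed type name


@dataclass
class Sentence:
    text: str
    entities: List[Entity]  # assumed to come from a prior NER step


def candidate_sentences(sentences: Iterable[Sentence],
                        type_1: str = "GENE",
                        type_2: str = "GENE") -> List[Sentence]:
    """Keep sentences that mention both entity types.

    When both types are the same (e.g. GENE and GENE), at least two
    mentions of that type are required.
    """
    selected = []
    for sentence in sentences:
        of_type_1 = [e for e in sentence.entities if e.entity_type == type_1]
        of_type_2 = [e for e in sentence.entities if e.entity_type == type_2]
        if type_1 == type_2:
            if len(of_type_1) >= 2:
                selected.append(sentence)
        elif of_type_1 and of_type_2:
            selected.append(sentence)
    return selected


# Two pre-annotated sentences standing in for the MEDLINE extraction step.
corpus = [
    Sentence("…the binding of repaglinide to HSA in human plasma…",
             [Entity("repaglinide", "DRUG"), Entity("HSA", "GENE")]),
    Sentence("BRCA1 interacts with BARD1.",
             [Entity("BRCA1", "GENE"), Entity("BARD1", "GENE")]),
]

# Protein-protein candidates: only the sentence with two GENE mentions.
ppi_candidates = candidate_sentences(corpus)

# The same filter with different types finds the drug-protein sentence.
drug_protein = candidate_sentences(corpus, "DRUG", "GENE")

# Curation: each candidate is placed in exactly one of the three sets.
# Here the label is hard-coded; in practice it is the curator's decision.
curated: Dict[Label, List[Sentence]] = {label: [] for label in Label}
for sentence in ppi_candidates:
    curated[Label.POSITIVE].append(sentence)
```

Because the entity types are known before filtering, the repaglinide sentence is never offered as a protein-protein candidate, which is the kind of ambiguity the semantic enrichment step is meant to remove. The three curated sets can then serve as labelled training data.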
Key Business Benefits:
Find out more about how our Ontology Services can benefit your business.
SciBite CSO and Founder Lee Harland shares his views on why ontologies are relevant in a machine learning-centric world and are essential to help "clean up" scientific data in the Life Sciences industry.
Get in touch with us to find out how we can transform your data
Copyright © 2024 Elsevier Ltd., its licensors, and contributors. All rights are reserved, including those for text and data mining, AI training, and similar technologies.