For most drug discovery companies, maintaining an awareness of the scientific literature relevant to their business’ therapeutic programmes remains a challenge. Yet to ensure competitiveness, this task is essential. It commonly involves manual scanning of a wide array of sources by individuals or teams, which can be time-consuming, expensive, and laborious.
Researchers at the life sciences medical research charity LifeArc have experienced this challenge first-hand, with their team of analysts spending multiple hours per day scanning through literature in PubMed, publicly available grant information, and a range of biotech-focussed news websites in an attempt to identify articles of interest amongst the background ‘noise’. Manually searching different sources with multiple keywords or phrases is resource-intensive, which constrains the number of sources that can be scanned, the frequency at which scanning can be performed, and the depth of review possible for potentially relevant articles. However, this work is necessary if companies wish, like LifeArc, to uncover a wealth of information regarding novel technologies, new drug targets, biomarkers and rare disease connections.
Automating the process of scanning multiple sources of data is challenging because terms or phrases of interest can be spread out across an article, and different authors use different terminology to describe the same thing. Fortunately for LifeArc, SciBite’s semantic enrichment technology provides a seamless, automated solution to this issue, which led to our collaboration in developing a machine learning technology combining artificial intelligence approaches.
“We knew this process was inefficient and we were keen to find an alternative way to triage through publications and data in order to free up our analysts’ time and allow them to focus on the most important and relevant findings,” explains Ben Cryar, Senior Analyst, Opportunity Assessment Group, LifeArc. “Ultimately we want to streamline the research process so that we can focus on the next steps in identifying the most exciting new discoveries.”
Through semantic enrichment technology, SciBite ensures that all relevant information is found through search, regardless of which synonym is used as the search term. Users can create specific searches containing multiple relevant terms and entities, forming so-called search patterns. To accommodate the reality that there are often multiple ways of describing an outcome of interest, multiple patterns can be aggregated into ‘bundles’, which can be run across the same data simultaneously. SciBite calculates a score for a piece of text based on how many patterns match and whether those matches are complementary or competing, providing an incredibly powerful yet clear method to identify relevant data.
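To make the idea of pattern bundles and scoring more concrete, here is a minimal sketch in Python. It is not SciBite's implementation or API: the `Pattern` class, the `score_text` function, the regular expressions and the signed weights (positive for complementary matches, negative for competing ones) are all assumptions chosen purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A hypothetical search pattern: a name, a regex and a signed weight."""
    name: str
    regex: str
    weight: float  # assumption: > 0 supports relevance, < 0 competes with it


def score_text(text: str, bundle: list[Pattern]) -> tuple[float, list[str]]:
    """Run every pattern in the bundle over the text and sum the weights
    of the patterns that match. Returns the score and the matched names."""
    matched = [p for p in bundle if re.search(p.regex, text, flags=re.IGNORECASE)]
    return sum(p.weight for p in matched), [p.name for p in matched]


# An illustrative bundle; the patterns and weights are invented.
BUNDLE = [
    Pattern(
        "novel_target",
        r"\b(novel|first-in-class|previously unknown)\b.*\b(target|receptor|kinase)\b",
        1.0,
    ),
    Pattern("biomarker", r"\bbiomarkers?\b", 0.5),
    Pattern("review_article", r"\b(review|meta-analysis)\b", -1.0),
]

if __name__ == "__main__":
    text = "We report a novel kinase target and a candidate biomarker."
    print(score_text(text, BUNDLE))
```

In this sketch, a text that matches two complementary patterns scores higher than one that matches a single pattern, while a competing match (here, a sign that the article is a review) pulls the score down. A real system would match on normalised entities rather than raw regular expressions.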
SciBite instantly enriches data sources to derive more value from the reading experience so reviewers can focus on understanding interesting findings rather than searching for them. With SciBite’s technology, LifeArc can cast a wide net to continuously monitor a comprehensive set of data sources and be automatically alerted to interesting scientific and medical advances.
“The technology is designed to reduce the time spent mining data by up to 80%, providing researchers with a subset of scientifically-relevant information filtered from the vast amounts of raw data in a rapid, easy-to-interpret manner, allowing them to focus and accelerate their research,” says Jane Lomax, Head of Ontologies at SciBite. “We want more and more of the hidden knowledge in scientific content to be unlocked by simple services provided by our platform, helping application developers and informatics professionals build even more intelligent systems.”
The foundation of the SciBite platform is its vocabularies and ontologies, which apply an explicit, unique meaning and description to a term. This enables unstructured text to be contextualised so that it can be understood and used as high quality, actionable data, irrespective of its source. Comprising tens of millions of synonyms, SciBite’s manually curated vocabularies have unrivalled depth and breadth, ensuring comprehensive coverage of relevant terminology and providing the robust foundation necessary for an effective and impactful literature monitoring strategy.
These vocabularies can be augmented with bespoke terms, such as those relating to novelty and to specific technologies such as biomarkers and diagnostics. This enables SciBite to cope with the evolving language of new scientific fields, including terms such as “CRISPR”, “10x Genomics” and “Drop-Seq”.
Figure 1: Examples of different phrases used to reflect novelty
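The vocabulary approach described above, in which many synonyms resolve to a single unambiguous concept and bespoke terms can be layered on top, can be sketched as follows. This is an illustrative toy, not SciBite's vocabulary format: the concept IDs (`EX:GENE1` and so on), the `build_lookup` and `annotate` functions and the tiny term lists are all hypothetical.

```python
import re

# Hypothetical mini-vocabulary: concept ID -> (preferred label, synonyms).
# The IDs are invented placeholders, not identifiers from any real ontology.
BASE_VOCAB = {
    "EX:GENE1": ("ERBB2", ["HER2", "HER-2", "neu"]),
    "EX:DIS1": ("breast cancer", ["breast carcinoma", "mammary cancer"]),
}

# Bespoke terms layered on top of the base vocabulary (also invented IDs).
BESPOKE = {
    "EX:TECH1": ("CRISPR", ["CRISPR-Cas9"]),
}


def build_lookup(vocab):
    """Map every lower-cased label and synonym to its concept ID."""
    lookup = {}
    for concept_id, (label, synonyms) in vocab.items():
        for term in [label, *synonyms]:
            lookup[term.lower()] = concept_id
    return lookup


def annotate(text, vocab):
    """Return (matched text, concept ID) pairs in order of appearance."""
    lookup = build_lookup(vocab)
    # Try longer terms first so "CRISPR-Cas9" is preferred over "CRISPR".
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    vocab = {**BASE_VOCAB, **BESPOKE}
    print(annotate("A CRISPR-Cas9 screen in HER2-positive mammary cancer", vocab))
```

Because every synonym maps to the same concept ID, a search for the concept finds the text whichever wording the author used. Simple string matching like this ignores ambiguity and context, which is where curated vocabularies and ontologies do the real work.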
Working together with LifeArc, SciBite’s team of experienced curators manually curated a library comprising tens of millions of synonyms tailored specifically to LifeArc’s internal vocabularies, such as compound identities and study codes, to identify genes, diseases, devices and many more scientific concepts. By combining this with bespoke semantic patterns that extract key pieces of information from heterogeneous sources, LifeArc has been able to reduce the duration of the review process by over 80% and evaluate the vast body of research and news with greater certainty, allowing it to track trends, highlight up-and-coming technologies and gain early insight into potentially groundbreaking scientific advances. SciBite’s Ontology Services draw on its team’s many years of experience of working with ontologies to help life sciences companies like LifeArc tackle their data challenges.
SciBite is not limited to publicly available data; users can also apply semantic enrichment to internal and third-party data sources such as internal document repositories, ELNs, patents, and commercially available databases. Since the resulting data will be as well-structured and interoperable as public data, it becomes easy to integrate multiple disparate sources and gain a holistic view of everything that is known, both internally and externally, about any compound, target, or disease. SciBite provides users with a single, consistent, and simple user interface, enabling them to ask questions across data sources that would otherwise have been time-consuming or impossible to answer.
The results of such analyses deliver valuable business insight and enable companies to understand the research landscape, for example by identifying trending topics and revealing how the volume of information associated with the current ‘hot topic’ has changed over time.
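As a rough illustration of how the volume of information on a topic could be tracked over time once documents have been annotated with concepts, consider the sketch below. The record structure, the `mentions_per_year` function and the sample records are invented for this example and do not represent real data or SciBite's output format.

```python
from collections import Counter


def mentions_per_year(records, concept_id):
    """Count the annotated records per year that mention the given concept."""
    counts = Counter(
        rec["year"] for rec in records if concept_id in rec["concepts"]
    )
    return dict(sorted(counts.items()))


# Invented records purely for illustration; each one stands for a document
# that has already been annotated with (hypothetical) concept IDs.
RECORDS = [
    {"year": 2017, "concepts": {"EX:TECH1"}},
    {"year": 2018, "concepts": {"EX:TECH1", "EX:GENE1"}},
    {"year": 2018, "concepts": {"EX:TECH1"}},
    {"year": 2018, "concepts": {"EX:DIS1"}},
]

if __name__ == "__main__":
    print(mentions_per_year(RECORDS, "EX:TECH1"))
```

Counting by concept rather than by keyword means that documents using different synonyms for the same topic all contribute to the same trend line.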
It becomes straightforward for a company to define its internal strengths and assess the competitive landscape relating to specific targets or diseases. Similarly, it is possible to identify which companies or institutions are working in which disease or technology area of interest to explore options for collaboration or acquisition. In each case, alerts can be set up to ensure information is highlighted to the right people in a timely manner.
This innovative project with LifeArc on real time ontology-led horizon scanning has been shortlisted for Bio-IT World’s 2019 inaugural Innovative Practices Awards.
To find out how your business could benefit from comprehensive, real-time competitive intelligence monitoring in a flexible, easy-to-use, accessible environment, download the use case or get in touch with the team to learn more about our Ontology Services.
1 See SciBite’s publication ‘Unlock the Full Potential of Departmental Scientific Documents’, for details
2 See SciBite’s publication ‘Unlock the Full Potential of ELN Data’, for details
Copyright © 2023 Elsevier Ltd., its licensors, and contributors. All rights are reserved, including those for text and data mining, AI training, and similar technologies.