Novelty in life science: Looking into the unseen

By Zahra Hosseini

In life science research, navigating the complexities of innovation is crucial for breakthroughs. SciBite’s Novelty model, a sophisticated Machine Learning classifier, distinguishes true innovation in scientific texts.

In the dynamic realm of life science research, groundbreaking discoveries not only pave the way for commercial success but are also pivotal in propelling scientific advancement. The innovation and uniqueness behind these discoveries often mark the fine line between triumph and obscurity in the competitive landscape of research. For instance, many would find it difficult to recall the name of the second antibiotic ever discovered. Similarly, naming alternatives to landmark drugs, like Viagra, is a challenge for most people! This underscores the crucial role of differentiation and novelty in the life sciences: pioneering achievements set new benchmarks and open uncharted territory for exploration and development.

The quest for novelty

In the world of scientific discovery, where evidence-based decision-making reigns supreme, the stakes are exceptionally high. Missteps, such as an incorrect dosage or the misapplication of a treatment, can lead to dire, even life-threatening, consequences. This underscores the importance of utilizing high-quality data that rigorously adheres to established standards, ensuring the processes of review, control, and regulation are consistent and reliable.

Such rigor in data integrity not only guards against potential errors or deviations that could undermine the validity of scientific findings or breach regulatory compliance but also empowers researchers to precisely navigate through documents detailing critical relationships among diseases, compounds, genes, and more. Furthermore, this foundation becomes even more crucial when the goal extends to uncovering cutting-edge insights, such as identifying the latest research, or the most recent biomarkers associated with a specific therapeutic area.

The pursuit of such advanced knowledge demands access to data that not only meets rigorous standards but is also capable of capturing the forefront of scientific innovation.

Limitations of tradition

In the dynamic world of life science research, the essence of novelty, crucial for scientific breakthroughs, eludes strict definition and traditional capture methods. Traditional approaches often fall short in recognizing the nuanced intricacies of innovation, as the concept of novelty itself resists simplification into rigid rules. This reflects the complex nature of scientific discovery in life sciences, where advancements frequently defy conventional boundaries, underscoring the necessity for a more adaptable and nuanced appreciation of innovation.

Spotting the gems

What do we mean by novelty?

In the realm of life science, discussing novelty brings us to the forefront of both groundbreaking discoveries and the equally vital iterative innovations—those incremental enhancements that progressively build on established knowledge. Each piece of scientific literature, whether it unveils a pioneering breakthrough or details subtle, iterative advancements, holds the potential to alter the field significantly.

At SciBite, we emphasize not just the identification of key entities within scientific texts (using our FAIRFactory capabilities) but also the critical context surrounding these mentions. To this end, we’ve engineered a Novelty model that classifies sentences to uncover complex patterns indicative of true ‘innovation.’ The model goes beyond a simple binary classification: it distinguishes sentences describing groundbreaking or recent scientific discoveries from sentences that merely present known facts (see Figure 1). Through such nuanced analysis, we underscore the multifaceted nature of innovation, recognizing the transformative power lurking within every detail of scientific inquiry.

Figure 1: Adding different Novelty concepts to the search (Novel, New, Report, Others).

The model we developed is a Machine Learning (ML) model (BioBERT, to be precise) that classifies sentences into four categories: Novel, New, Report, or Other – described below.

Novel sentences either explicitly mention synonyms for novelty or report recent technologies, advancements, and approaches that have the potential to be game-changers or more efficient. For example: “Furthermore, these data might provide the theoretical foundation for further clinical applications of hUCMSCs in burn areas.”

Sentences falling into the New category state facts and recent findings without including novelty-related vocabulary. For instance: “A regulatory relationship between miR-361-3p and circPOLR1C was confirmed by qRT-PCR, circRNA in vivo precipitation, RIP, FISH, CircInteractome database, dual-luciferase reporter assay, and immunohistochemistry.”

The Report category covers factual statements in the literature, methods used, statistics, and benchmarking reports. Example: “Similar results were seen with candida; all milled ZNPs inhibited C. albicans, followed by C. tropicalis, whereas C. krusei was resistant to all ZNP sizes.”

The final class identifies all other sentences. For instance: “Circulation of seasonal influenza is the product of a complex interplay among multiple drivers, yet characterizing the underlying mechanism remains challenging.”
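To make the four-way labelling concrete, here is a minimal sketch of the last step of such a classifier: turning four raw scores for a sentence into one of the categories above. It is not SciBite's implementation. The names `NoveltyLabel`, `LABEL_ORDER`, `softmax`, and `pick_label`, the order of the labels, and the example scores are all assumptions made for illustration; in practice the scores would come from a fine-tuned model such as the BioBERT classifier mentioned earlier.

```python
from enum import Enum
import math


class NoveltyLabel(Enum):
    # The four categories described in the post.
    NOVEL = "Novel"
    NEW = "New"
    REPORT = "Report"
    OTHER = "Other"


# Assumed mapping from score position to label (hypothetical order).
LABEL_ORDER = [
    NoveltyLabel.NOVEL,
    NoveltyLabel.NEW,
    NoveltyLabel.REPORT,
    NoveltyLabel.OTHER,
]


def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(scores):
    """Return the most probable label and its probability."""
    if len(scores) != len(LABEL_ORDER):
        raise ValueError("expected one score per category")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_ORDER[best], probs[best]


# Made-up scores standing in for a model's output on one sentence.
label, confidence = pick_label([2.1, 0.3, -0.5, -1.2])
print(label.value, round(confidence, 2))
```

Keeping the probability alongside the label would let a downstream search tool hide low-confidence annotations, although whether SciBite Search does this is not stated here.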

SciBite Search to surface NER and ML-derived annotations

But how do we ensure that these invaluable annotations reach those who need them most? Enter SciBite Search, our cutting-edge platform designed to transform the way knowledge is accessed and analyzed. With SciBite Search, users can seamlessly navigate through diverse data sources, all enhanced by our sophisticated Named Entity Recognition (NER) technology. By complementing our NER annotations with those captured by ML models (in this instance the Novelty model), users can not only identify documents capturing key entities, but also home in on sentences that capture these entities AND are indicative of novelty (Figure 2)! SciBite Search empowers users to easily access curated content and unearth potentially new discoveries, paving the way for unprecedented insights and exploration in the life sciences.

Figure 2: SciBite Search empowers users via ontologies, NER, ML models, semantic search, and advanced filters.
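The idea of combining the two kinds of annotation can be sketched in a few lines: keep only the sentences that mention a given entity AND carry a novelty-related label. This is an illustration under stated assumptions, not the SciBite Search API. The `AnnotatedSentence` class, the `novel_mentions` function, and the hand-written entity sets are hypothetical; the sentences and their labels are the examples given earlier in this post (shortened).

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSentence:
    text: str
    # Entity names found by NER (hypothetical, simplified format).
    entities: set = field(default_factory=set)
    # One of: Novel, New, Report, Other.
    novelty: str = "Other"


def novel_mentions(sentences, entity, labels=("Novel",)):
    """Keep sentences that mention `entity` AND carry one of `labels`."""
    return [
        s for s in sentences
        if entity in s.entities and s.novelty in labels
    ]


# Entity sets below are written by hand for illustration only.
corpus = [
    AnnotatedSentence(
        "Furthermore, these data might provide the theoretical foundation "
        "for further clinical applications of hUCMSCs in burn areas.",
        {"hUCMSCs"},
        "Novel",
    ),
    AnnotatedSentence(
        "A regulatory relationship between miR-361-3p and circPOLR1C "
        "was confirmed by qRT-PCR ...",
        {"miR-361-3p", "circPOLR1C"},
        "New",
    ),
    AnnotatedSentence(
        "Circulation of seasonal influenza is the product of a complex "
        "interplay among multiple drivers ...",
        {"influenza"},
        "Other",
    ),
]

for hit in novel_mentions(corpus, "hUCMSCs"):
    print(hit.novelty, "|", hit.text)
```

Passing `labels=("Novel", "New")` widens the filter to recent findings that do not use novelty-related vocabulary.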

Time is of the essence, but what else?

Time plays a pivotal role in defining the novelty and relevance of scientific discoveries. What may be considered groundbreaking today could very well become standard knowledge in five years. Recognizing that the significance of scientific findings can evolve over time, our system, SciBite Search, enables users to apply a variety of functional and semantic filters, including date – ensuring that our assessments are not only contextually relevant but also dynamically aligned with the shifting paradigms of scientific research. Moreover, SciBite Search offers a feature to set up notifications for updates on specific queries, ensuring that users are always in the loop with the latest discoveries and advancements as they are integrated into the tool!

Figure 3: Novelty search in combination with SciBite Search capabilities.

Figure 4: Look for recently published scientific reports discussing new results.
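Because novelty fades with time, a novelty label is most useful when paired with a date filter. The sketch below shows one way to express "recently published and labelled Novel or New". It rests on assumptions: the `Hit` class, the `recent_hits` function, the one-year default window, and the placeholder records with made-up dates are all hypothetical and do not reflect how SciBite Search implements its filters.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Hit:
    sentence: str
    novelty: str      # Novel, New, Report, or Other
    published: date   # publication date of the source document


def recent_hits(hits, labels, today, max_age_days=365):
    """Keep hits with a wanted label published within the time window."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        h for h in hits
        if h.novelty in labels and h.published >= cutoff
    ]


# Placeholder records with made-up dates, for illustration only.
hits = [
    Hit("placeholder sentence A", "Novel", date(2024, 3, 15)),
    Hit("placeholder sentence B", "New", date(2019, 1, 10)),
    Hit("placeholder sentence C", "Report", date(2024, 5, 20)),
]

# `today` is passed in explicitly so the result is reproducible.
fresh = recent_hits(hits, labels={"Novel", "New"}, today=date(2024, 6, 1))
for h in fresh:
    print(h.published.isoformat(), h.novelty, h.sentence)
```

Sentence B is dropped because it is too old, and sentence C because its label is not in the requested set.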

Annotation with ML models is by no means limited to the Novelty model. Take eligibility criteria, for example, where negation is a vital component: there is a model for that too (Figure 5).

Figure 5: Adding Negated sentence to the search.


In the fast-paced world of life science, SciBite Search stands out as an effective tool that combines annotations of known entities, drawn from our rich set of manually curated VOCabs, with annotations derived from bespoke machine learning models. We have demonstrated how one of these models can offer a shortcut to groundbreaking discoveries, enabling users to stay at the forefront of innovation. This approach not only saves precious time but also revolutionizes our exploration of life sciences, inviting us to embrace the unknown and unlock new realms of knowledge.


About Zahra Hosseini

Senior Data Scientist, SciBite

Zahra Hosseini is a Senior Data Scientist at SciBite. She holds a Ph.D. in machine learning from the Science and Research University of Tehran, focusing on Natural Language Processing (NLP) and knowledge discovery. She was an assistant professor at Azad University of Isfahan for 7 years before switching to industry. She has been with SciBite since 2021 as part of the data science team.
