Scientific output in the form of peer-reviewed journal articles, conference abstracts and book chapters is increasing dramatically. The complexity of this literature is also growing, with fast-moving research areas, as exemplified by the recent COVID-19 pandemic. A consensus on vocabulary in these new research areas may not yet have been established, presenting real challenges for search and analysis. Building custom ontologies to support search using CENtree, SciBite’s ontology management tool, was covered in our recent webinar. The recent reduction in personal interaction between scientists has further increased the reliance on scientific content to stay up to date.
Researchers face real challenges in staying in touch with their field. They need to keep their knowledge current; however, they cannot physically monitor the huge volume of content being generated. So how can researchers tackle these challenges?
With so much content available, finding relevant material can be like finding a needle in a haystack. First and foremost, researchers need access to a corpus of scientific literature; secondly, they need a robust and reproducible way to search that content. Through their partnership, SciBite and Copyright Clearance Center bring together scientific content and advanced search capabilities.
SciBite takes a rule-based approach to semantic enrichment, the process by which key scientific concepts are identified within a body of text. Starting with public ontologies as a base, SciBite’s VOCabs are optimized for named entity recognition through extensive expansion of the supported synonyms, combining manual annotation with automated synonym expansion. Together with TERMite, SciBite’s named entity recognition (NER) tool, the VOCabs are used to identify and tag key entities within the scientific literature and to assign a specific identifier to each concept. SciBite supports around 100 scientific concepts. From a researcher’s perspective, this means that searches return consistent and inclusive results, independently of how they searched or which vocabulary the author adopted.
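To make the idea of dictionary-driven tagging concrete, here is a minimal Python sketch. It is not TERMite or a real VOCab: the vocabulary, the identifiers and the function names (`VOCAB`, `build_matcher`, `tag_entities`) are all hypothetical, and the matching is a simple whole-word, case-insensitive lookup. It only illustrates how different synonyms can resolve to one identifier.

```python
import re

# Hypothetical toy vocabulary: identifier -> (entity type, synonyms).
# The IDs and synonyms are illustrative only, not taken from any SciBite VOCab.
VOCAB = {
    "GENE:0001": ("GENE", ["TNF", "tumor necrosis factor", "TNF-alpha"]),
    "INDICATION:0001": ("INDICATION", ["rheumatoid arthritis", "RA"]),
}


def build_matcher(vocab):
    """Compile one case-insensitive regex matching any synonym as a whole word."""
    lookup = {}
    for concept_id, (entity_type, synonyms) in vocab.items():
        for synonym in synonyms:
            lookup[synonym.lower()] = (concept_id, entity_type)
    # Try longer synonyms first so "TNF-alpha" wins over "TNF".
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(s) for s in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern, lookup


def tag_entities(text, pattern, lookup):
    """Return each matched mention with the identifier and type it maps to."""
    hits = []
    for match in pattern.finditer(text):
        concept_id, entity_type = lookup[match.group(1).lower()]
        hits.append(
            {
                "text": match.group(1),
                "id": concept_id,
                "type": entity_type,
                "start": match.start(1),
                "end": match.end(1),
            }
        )
    return hits


pattern, lookup = build_matcher(VOCAB)
for hit in tag_entities("Tumor necrosis factor is elevated in RA.", pattern, lookup):
    print(hit["text"], "->", hit["id"])
```

Because both "TNF-alpha" and "tumor necrosis factor" map to the same hypothetical identifier, a search on that identifier finds documents regardless of which synonym the author used. A production system would also need to handle ambiguous short synonyms such as "RA", which this sketch does not attempt.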
Having content semantically enriched unlocks a host of additional benefits that researchers can use to zero in on related information. Semantic concepts such as gene, indication or drug can be leveraged for document clustering, identifying research articles with related concepts. These concepts can also be utilized for trend analysis, looking for the frequency with which these concepts occur in the literature. Furthermore, these concepts can also be used to identify key opinion leaders (KOLs), collaborators or competitors who publish on these concepts.
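As a rough illustration of how tagged concepts could feed trend analysis and document clustering, the Python sketch below counts concept mentions per year and scores the overlap between two documents. The input format (a list of dictionaries with `year` and `concepts` keys), the identifiers and the function names are assumptions made for this example, and Jaccard overlap is just one simple similarity measure that could be used, not a description of SciBite's method.

```python
from collections import Counter, defaultdict


def concept_trend(documents):
    """Count, per year, how many documents mention each concept identifier."""
    trend = defaultdict(Counter)
    for doc in documents:
        # Use a set so that a concept counts once per document.
        for concept_id in set(doc["concepts"]):
            trend[doc["year"]][concept_id] += 1
    return {year: dict(counts) for year, counts in sorted(trend.items())}


def jaccard(concepts_a, concepts_b):
    """Share of concepts two documents have in common (0.0 to 1.0)."""
    a, b = set(concepts_a), set(concepts_b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical enriched documents; identifiers are illustrative only.
docs = [
    {"year": 2019, "concepts": ["GENE:IL6", "INDICATION:ASTHMA"]},
    {"year": 2020, "concepts": ["GENE:IL6", "DRUG:X"]},
    {"year": 2020, "concepts": ["GENE:IL6", "INDICATION:ASTHMA"]},
]
print(concept_trend(docs))
print(jaccard(docs[0]["concepts"], docs[2]["concepts"]))
```

Documents with a high overlap score share most of their concepts and are candidates for the same cluster, while the per-year counts show whether a concept is appearing more or less often over time.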
In addition to improving search recall, SciBite enables researchers to improve the precision of search, increasing the chances of finding relevant content. An example of this is sentence-level searching, where entities need to be matched within the same sentence, rather than simply being mentioned in the same document. This is particularly relevant in the life sciences, where lists of genes are often published within a journal article. What researchers are really interested in are the connections between a particular gene and another entity, such as an indication. A genuine relationship is much more likely when the gene and the indication occur in the same sentence than when they simply appear in the same document.
Identifying relevant content is the first step on the ladder; however, time-strapped researchers also need to get straight to the part of the document that relates to their query. Doing so enables researchers to quickly discount an article or flag it for further review. SciBite enables entities to be highlighted within a document and provides a semantic search solution in the form of SciBite Search. Copyright Clearance Center exploits SciBite’s semantic search capabilities in a number of its own solutions, providing customers with a powerful search experience over a broad range of scientific content.
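The difference between document-level and sentence-level matching can be sketched in a few lines of Python. This is an illustrative toy, not SciBite's implementation: the sentence splitter is deliberately naive, the term lists stand in for synonym sets, and all function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(text, terms):
    """True if any term occurs in the text as a whole word (case-insensitive)."""
    return any(
        re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", text, re.IGNORECASE)
        for term in terms
    )


def document_cooccurrence(text, terms_a, terms_b):
    """Document-level match: both entities appear somewhere in the text."""
    return mentions(text, terms_a) and mentions(text, terms_b)


def sentence_cooccurrences(text, terms_a, terms_b):
    """Sentence-level match: return sentences that mention both entities."""
    return [
        sentence
        for sentence in split_sentences(text)
        if mentions(sentence, terms_a) and mentions(sentence, terms_b)
    ]


text = (
    "We profiled TNF, IL6 and BRCA1. "
    "Asthma patients were recruited separately. "
    "IL6 was elevated in asthma."
)
print(document_cooccurrence(text, ["BRCA1"], ["asthma"]))
print(sentence_cooccurrences(text, ["BRCA1"], ["asthma"]))
print(sentence_cooccurrences(text, ["IL6"], ["asthma"]))
```

In this made-up text, BRCA1 and asthma co-occur at document level only because BRCA1 appears in a gene list, so the sentence-level search returns nothing for that pair, whereas IL6 and asthma are matched in the one sentence that actually links them.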
Snippets of documents returned by search can be used to further contextualize the search hit. In many cases, this will provide the researcher with the information that they are looking for, e.g., gene x is upregulated in disease y. If these relationships are of particular interest to researchers, SciBite offers another tool to refine the search process. TExpress is SciBite’s relationship extraction capability. Patterns of entities can be developed to pull out specific relationships between entities, and these are fully customizable. As with the scientific entities, these relationships can also be highlighted within the document, contextualising where the match occurred.
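A pattern over entity types can be sketched as follows. This is not TExpress syntax or behaviour: the trigger words, the entity dictionary format and the function name `extract_relations` are assumptions made purely to illustrate the idea of "a gene, then a trigger phrase, then an indication" within one sentence.

```python
import re

# Hypothetical trigger words that signal a relationship between two entities.
TRIGGERS = re.compile(r"\b(upregulated|downregulated|overexpressed)\b", re.IGNORECASE)


def extract_relations(sentence, entities, first_type="GENE", second_type="INDICATION"):
    """Find (first entity, trigger, second entity) patterns in one sentence.

    `entities` is a list of dicts with "type", "start" and "end" keys, such as
    a named entity recognition step might produce for the sentence.
    """
    relations = []
    ordered = sorted(entities, key=lambda e: e["start"])
    # Look only at neighbouring entity pairs, in reading order.
    for left, right in zip(ordered, ordered[1:]):
        if left["type"] != first_type or right["type"] != second_type:
            continue
        between = sentence[left["end"]:right["start"]]
        trigger = TRIGGERS.search(between)
        if trigger:
            relations.append(
                (
                    sentence[left["start"]:left["end"]],
                    trigger.group(1).lower(),
                    sentence[right["start"]:right["end"]],
                )
            )
    return relations


sentence = "IL6 is upregulated in asthma."
entities = [
    {"type": "GENE", "start": 0, "end": 3},
    {"type": "INDICATION", "start": sentence.index("asthma"), "end": sentence.index("asthma") + 6},
]
print(extract_relations(sentence, entities))
```

Changing the entity types or the trigger list changes which relationships are extracted, which is the sense in which such patterns are customizable. The returned spans could also drive highlighting of the matched relationship in the document.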
While the volume and complexity of scientific content have increased dramatically over the last decade, so has the technology for searching and extracting knowledge from this content. This has enabled researchers to better cope with the escalation in scientific output and focus their efforts on the content relevant to their work.
SciBite works closely with Copyright Clearance Center to bring the power of semantic enrichment to scholarly information. Interested in learning more?
© 2024 Elsevier Ltd., its licensors, and contributors. All rights are reserved, including those for text and data mining, AI training, and similar technologies.