Scientific output in the form of peer-reviewed journals, conference abstracts and book chapters is increasing dramatically. The complexity of this literature is also growing, with fast-moving research areas exemplified by the recent COVID-19 pandemic. A consensus on vocabulary may not yet have been established in these new research areas, which presents real challenges for search and analysis. Building custom ontologies to support search using CENtree, SciBite’s ontology management tool, was covered in our webinar. The recent reduction in personal interaction between scientists has further increased their reliance on scientific content to stay up to date.
Researchers face real challenges in keeping up with their field. They need to ensure their knowledge base is current; however, they cannot physically monitor the huge volume of content being generated. So how can researchers tackle these challenges?
With an ocean of content out there, finding relevant content can be like finding a needle in a haystack. First, researchers need access to a corpus of scientific literature; second, they need a robust and reproducible way to search that content. Through partnership, SciBite and Copyright Clearance Center bring together scientific content and advanced search capabilities.
SciBite takes a rule-based approach to semantic enrichment, the process by which key scientific concepts are identified within a body of text. Starting with public ontologies as a base, SciBite’s VOCabs are optimized for named entity recognition (NER). This involves extensive expansion of the supported synonyms through a combination of manual annotation and automated synonym expansion. Together with TERMite, SciBite’s NER tool, the VOCabs are used to identify and tag key entities within the scientific literature and to assign a specific identifier to each concept. SciBite supports around 100 scientific concepts. From a researcher’s perspective, this means that searches return consistent and inclusive results, independently of how the researcher phrased the query or which vocabulary the author adopted.
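To make the idea of synonym normalization concrete, here is a minimal sketch of dictionary-based entity tagging. It is not TERMite or a VOCab: the `SYNONYMS` table, the identifiers and the function names are all hypothetical, chosen only to show how different surface forms can resolve to the same concept identifier.

```python
import re

# Hypothetical mini-vocabulary mapping lowercase synonyms to concept identifiers.
# The identifiers are invented for illustration and do not come from any SciBite VOCab.
SYNONYMS = {
    "covid-19": "EX_DISEASE:1",
    "coronavirus disease 2019": "EX_DISEASE:1",
    "ace2": "EX_GENE:1",
    "angiotensin-converting enzyme 2": "EX_GENE:1",
}

# Try longer synonyms first so multi-word names win over shorter overlaps.
_ALTERNATIVES = "|".join(
    re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True)
)
_PATTERN = re.compile(r"(?<!\w)(" + _ALTERNATIVES + r")(?!\w)", re.IGNORECASE)


def tag_entities(text):
    """Return (matched text, concept id, start offset) for each synonym hit."""
    return [
        (m.group(1), SYNONYMS[m.group(1).lower()], m.start(1))
        for m in _PATTERN.finditer(text)
    ]


def concept_ids(text):
    """Return the set of concept identifiers found in the text."""
    return {concept for _, concept, _ in tag_entities(text)}


doc_a = "ACE2 expression in COVID-19 patients"
doc_b = "Angiotensin-converting enzyme 2 and coronavirus disease 2019"
same_concepts = concept_ids(doc_a) == concept_ids(doc_b)
```

Because both documents resolve to the same identifiers, a search on either wording would retrieve both, which is the consistency described above. A production vocabulary would also need to handle ambiguity, spelling variants and context, which this sketch ignores.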
Having content semantically enriched unlocks a host of additional benefits that researchers can use to zero in on related information. Semantic concepts such as gene, indication or drug can be leveraged for document clustering, identifying research articles with related concepts. These concepts can also be utilized for trend analysis, looking for the frequency with which these concepts occur in the literature. Furthermore, these concepts can also be used to identify key opinion leaders (KOLs), collaborators or competitors who publish on these concepts.
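As a rough illustration of how tagged concepts could feed trend analysis and related-article discovery, the sketch below assumes each document has already been reduced to a publication year and a set of concept identifiers. The data layout and the function names are assumptions made for this example, not part of any SciBite product, and Jaccard overlap is used here simply as one common way to compare concept sets.

```python
from collections import Counter


def concept_trend(documents, concept_id):
    """Count, per publication year, the documents annotated with concept_id.

    `documents` is an iterable of (year, set_of_concept_ids) pairs.
    """
    counts = Counter(
        year for year, concepts in documents if concept_id in concepts
    )
    return dict(sorted(counts.items()))


def jaccard(concepts_a, concepts_b):
    """Overlap between two concept sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = concepts_a | concepts_b
    if not union:
        return 0.0
    return len(concepts_a & concepts_b) / len(union)


# Made-up annotations, for illustration only.
docs = [
    (2019, {"EX_GENE:1"}),
    (2020, {"EX_GENE:1", "EX_DISEASE:1"}),
    (2020, {"EX_GENE:1"}),
    (2021, {"EX_DISEASE:1"}),
]
trend = concept_trend(docs, "EX_GENE:1")
similarity = jaccard(docs[1][1], docs[2][1])
```

Per-year counts give a simple view of how often a concept appears over time, while pairwise overlap scores could serve as the input to a clustering step.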
In addition to improving search recall, SciBite enables researchers to improve the precision of search, increasing the chances of finding relevant content. An example of this is sentence-level searching, where entities must be matched within the same sentence, rather than simply being mentioned in the same document. This is particularly relevant in the life sciences, where lists of genes are often published within a journal article. What researchers are really interested in are the connections between a particular gene and another entity, such as an indication. A genuine relationship is much more likely when the gene and the indication occur in the same sentence than when they merely appear in the same document.
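The difference between document-level and sentence-level matching can be sketched in a few lines. This example assumes a naive regular-expression sentence splitter and plain substring matching; a real pipeline would more likely compare tagged concept identifiers and use a proper sentence tokenizer. The function names and the sample text are invented for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccur_in_document(text, term_a, term_b):
    """True if both terms appear anywhere in the text."""
    lowered = text.lower()
    return term_a.lower() in lowered and term_b.lower() in lowered


def cooccur_in_sentence(text, term_a, term_b):
    """Return the sentences that mention both terms."""
    a, b = term_a.lower(), term_b.lower()
    return [
        s for s in split_sentences(text)
        if a in s.lower() and b in s.lower()
    ]


# Invented example text: a gene list followed by a specific statement.
text = (
    "We screened BRCA1, TP53 and EGFR. "
    "TP53 mutations were associated with breast cancer."
)
doc_hit = cooccur_in_document(text, "BRCA1", "breast cancer")
sentence_hits = cooccur_in_sentence(text, "BRCA1", "breast cancer")
```

Here the document-level check matches BRCA1 with breast cancer only because BRCA1 appears in a gene list, while the sentence-level check returns nothing for that pair and keeps only the sentence that actually links TP53 to the indication.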
Identifying relevant content is the first step on the ladder; however, time-strapped researchers also need to get straight to the part of the document that relates to their query. Doing so enables them to quickly discount an article or flag it for further review. SciBite enables entities to be highlighted within a document and provides a semantic search solution in the form of SciBite Search. Copyright Clearance Center uses SciBite’s semantic search capabilities in a number of its own solutions, providing customers with a powerful search experience over a broad range of scientific content.
Snippets of documents returned by search can be used to further contextualize the search hit. In many cases, a snippet will give researchers the information they are looking for, e.g., gene x is upregulated in disease y. If these relationships are of particular interest to researchers, SciBite offers another tool to refine the search process: TExpress, SciBite’s relationship extraction capability. Fully customizable patterns of entities can be developed to pull out specific relationships between entities. As with the scientific entities, these relationships can also be highlighted within the document, contextualizing where the match occurred.
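The general idea of pattern-based relationship extraction can be illustrated with a plain regular expression. The sketch below does not use TExpress syntax, which is not described here; the entity lists, the pattern and the function name are hypothetical, and the entity names are placeholders in the spirit of "gene x is upregulated in disease y".

```python
import re

# Placeholder entity lists and trigger verbs, invented for illustration.
GENES = ["GeneX", "GeneZ"]
DISEASES = ["disease Y"]
VERBS = ["upregulated", "downregulated", "overexpressed"]


def _alternation(terms):
    """Build a regex alternation, longest terms first."""
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# Pattern: <gene> is/was/are/were <verb> in <disease>
RELATION = re.compile(
    rf"\b(?P<gene>{_alternation(GENES)})\b\s+(?:is|was|are|were)\s+"
    rf"(?P<verb>{_alternation(VERBS)})\s+in\s+"
    rf"(?P<disease>{_alternation(DISEASES)})",
    re.IGNORECASE,
)


def extract_relations(text):
    """Return (gene, verb, disease, (start, end)) for each pattern match."""
    return [
        (m["gene"], m["verb"].lower(), m["disease"], m.span())
        for m in RELATION.finditer(text)
    ]


sample = "GeneX is upregulated in disease Y. GeneZ was studied in disease Y."
relations = extract_relations(sample)
```

Only the first sentence matches, because "studied" is not one of the trigger verbs. The returned character span is what would allow the match to be highlighted in the source document.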
While the volume and complexity of scientific content have increased dramatically over the last decade, so has the technology for searching and extracting knowledge from this content. This has enabled researchers to better cope with the escalation in scientific output and focus their efforts on the content relevant to their work.
We partner with leading enterprise search platforms to enhance real-time big data analytics for pharma and biotech companies. Semantic search capabilities improve the accuracy of search results, allowing companies to make data-informed decisions. Find out more about how SciBite’s solutions can help unlock the potential of the R&D data in your business.
Sam leads partnerships and alliances at SciBite, working collaboratively with existing partners and developing new partnerships aligned to SciBite’s strategic goals. He has a strong technical background in the life sciences, with a PhD in Protein Biochemistry from the University of Nottingham and post-doctoral training in bioinformatics within the Department of Neurosurgery at the University of California San Francisco.
Prior to joining SciBite, he held technical sales and commercial roles at Carl Zeiss and most recently led business development at Repositive, building relationships with contract research organisations, biotechs and pharma companies, and facilitating data exchange and search across multiomic datasets. He has a good grasp of the challenges of dealing with unstructured scientific data and of collaboratively developing practical solutions to overcome them.