AI-based chat application for life sciences [Part 3]: Design decisions


We now have the key components in place: large language models (LLMs) and ontologies. Our goal is to integrate them into an ontology-based retrieval-augmented generation (RAG) system, choosing the points at which combining their strengths benefits our users most.

LLMs excel at interpreting human language and at generating and summarizing text. Ontologies, on the other hand, ensure uniformity of scientific language across different systems by providing consistent vocabularies.

As mentioned in Part II, enriching natural language questions with ontologies helps us extract scientific concepts and understand the context of the question. Indexing the data with the same ontologies means questions and documents are described with the same concepts, which makes information retrieval efficient.

Ontology-based information retrieval

The foundational design choice made by the SciBite team was to establish a robust information retrieval system. Using ontology-based, expert-curated, and life-sciences-optimized vocabularies for entity recognition, SciBite is well-equipped to efficiently and accurately extract scientific concepts. These extracted concepts are then used as indices for a precise search mechanism. The information retrieval system developed by SciBite includes the following features:

The system employs Elasticsearch, a mature technology with lower operational costs than running a vector search or fine-tuning LLMs.
It provides flexible deployment options, allowing SciBite to implement the system as SaaS or on customer premises.
The system is equipped with connectors and parsers for various data sources, whether public or proprietary, external or internal to your network. Examples include Embase, Science Direct, Medline, PMC, clinicaltrials.gov, SharePoint, S3, sFTP, and more (a separate license may be required for certain data sources). It also possesses the functionality to build pipelines for regular updates, ensuring the breadth and currency of available data.
Document-level security is implemented, ensuring that users can only access data they are authorized to view. If a user doesn’t have access to a specific document, the answers generated will not include information from those documents.
The system comes with a dedicated user interface. This allows users to delve into the answer-generation process, examine the underlying documents, and independently engage with the information retrieval mechanism using further filters and facets.
In line with SciBite’s design philosophy, the system is built with an “API-first” approach. This means it is designed to integrate easily with existing systems or to support the creation of a custom interface.
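To make the first design decision more concrete, here is a minimal sketch of concept-based retrieval combined with document-level security. It is an in-memory toy and not SciBite's implementation or the Elasticsearch API: the class names, the group-based permission model, and the `EX:` concept identifiers are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    concept_ids: set[str]     # ontology IDs found by entity recognition (assumed precomputed)
    allowed_groups: set[str]  # hypothetical permission model: groups allowed to read the document


class ConceptIndex:
    """Toy inverted index mapping an ontology concept ID to document IDs."""

    def __init__(self) -> None:
        self._docs: dict[str, Document] = {}
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc: Document) -> None:
        self._docs[doc.doc_id] = doc
        for concept_id in doc.concept_ids:
            self._postings[concept_id].add(doc.doc_id)

    def search(self, concept_ids: set[str], user_groups: set[str]) -> list[str]:
        """Return IDs of documents that mention every concept and that the user may read."""
        if not concept_ids:
            return []
        candidate_sets = [self._postings.get(c, set()) for c in concept_ids]
        candidates = set.intersection(*candidate_sets)
        # Document-level security: drop anything the user's groups cannot see.
        visible = [d for d in candidates if self._docs[d].allowed_groups & user_groups]
        return sorted(visible)


# Illustrative data; the "EX:" identifiers are placeholders, not real ontology IDs.
index = ConceptIndex()
index.add(Document("doc-1", "Aspirin reduces the risk of myocardial infarction.",
                   {"EX:aspirin", "EX:mi"}, {"research"}))
index.add(Document("doc-2", "Internal report on aspirin and heart attack outcomes.",
                   {"EX:aspirin", "EX:mi"}, {"clinical"}))
index.add(Document("doc-3", "Aspirin pharmacokinetics.",
                   {"EX:aspirin"}, {"research"}))

print(index.search({"EX:aspirin", "EX:mi"}, {"research"}))  # ['doc-1']
```

Note that doc-2 mentions both concepts but is filtered out before any answer is generated, so a user in the "research" group never sees content drawn from it.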

Structuring the natural language question

The second design decision focuses on semantically parsing the user’s natural language question to identify user intent and establish a scientific context.
The SciBite team harnesses the combined strengths of LLMs and SciBite’s award-winning scientific named entity recognition (NER) engine to recognize scientific entities in the question and transform it into a structured query. This approach offers several benefits:

The use of structured queries improves consistency in results. This means that every time the same structured query is used, it will consistently return the same set of candidate documents, thus enhancing the reproducibility of the question-answering process.
The structured query is displayed in the interface. Users can identify which part of their query is recognized as a scientific entity and match it against potential documents when generating answers, adding a level of transparency to the process.
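As a rough illustration of the second design decision, the sketch below turns a question into a structured query using a plain dictionary lookup. The lookup is a stand-in for a real NER engine, and the LLM-based intent detection is left out entirely; the vocabulary, the `EX:` identifiers, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-vocabulary: surface form -> (concept ID, entity type).
VOCAB = {
    "aspirin": ("EX:aspirin", "DRUG"),
    "acetylsalicylic acid": ("EX:aspirin", "DRUG"),
    "heart attack": ("EX:mi", "INDICATION"),
    "myocardial infarction": ("EX:mi", "INDICATION"),
}


@dataclass(frozen=True)
class EntityMatch:
    surface: str      # text exactly as the user typed it
    concept_id: str   # normalized identifier shared by all synonyms
    entity_type: str
    start: int        # character offset in the question


def structure_question(question: str) -> dict:
    """Find vocabulary terms in the question and return a structured query."""
    matches = []
    for term, (concept_id, entity_type) in VOCAB.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, question, flags=re.IGNORECASE):
            matches.append(EntityMatch(m.group(0), concept_id, entity_type, m.start()))
    matches.sort(key=lambda e: e.start)
    return {
        "question": question,
        "entities": matches,  # can be shown in the interface for transparency
        "concept_ids": sorted({e.concept_id for e in matches}),
    }


q1 = structure_question("Does aspirin prevent heart attack?")
q2 = structure_question("Does acetylsalicylic acid prevent myocardial infarction?")

# Different wording, same structured query, hence the same candidate documents.
print(q1["concept_ids"] == q2["concept_ids"])  # True
```

Because both phrasings normalize to the same concept identifiers, they retrieve the same candidate set, which is the reproducibility benefit described above.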

Explainability in answer generation

The third design decision capitalizes on the benefits of the ontology-based retrieval mechanism. Because the search algorithms match scientific entities between the question and the documents, they can return not only a list of candidate documents but also the exact snippets that match the query, together with the scientific entities found in each individual sentence.

These evidence snippets, identified by the search algorithm, are then used to generate an answer using LLMs’ language generation capabilities. Additionally, the matching scientific entities provide a rationale to the user, explaining why the application believes these documents answer their question.
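The sketch below shows one way the evidence step could look: sentences that mention query concepts are collected together with the concepts they matched, and are then assembled into a prompt for an LLM. The sentence splitter, the substring matching, the prompt wording, and the `EX:` identifiers are simplifying assumptions, and no actual LLM call is made.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter; a real pipeline would use a proper sentence tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_evidence(query_terms: dict[str, list[str]], documents: dict[str, str]) -> list[dict]:
    """query_terms maps concept ID -> lowercase surface forms.

    Returns every sentence that mentions at least one query concept,
    along with the concepts it matched (the rationale shown to the user).
    """
    evidence = []
    for doc_id, text in documents.items():
        for sentence in split_sentences(text):
            lowered = sentence.lower()
            matched = sorted(
                cid for cid, forms in query_terms.items()
                if any(form in lowered for form in forms)  # simple substring match
            )
            if matched:
                evidence.append({"doc_id": doc_id, "snippet": sentence, "matched": matched})
    # Sentences covering more of the query's concepts come first.
    evidence.sort(key=lambda e: len(e["matched"]), reverse=True)
    return evidence


def build_prompt(question: str, evidence: list[dict]) -> str:
    """Assemble a prompt that restricts the LLM to the retrieved snippets."""
    lines = ["Answer the question using only the numbered evidence. Cite snippet numbers.", ""]
    for i, e in enumerate(evidence, start=1):
        lines.append(f"[{i}] ({e['doc_id']}) {e['snippet']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Illustrative data; identifiers and texts are made up.
documents = {
    "doc-1": "Aspirin lowered the rate of myocardial infarction in the trial. Adverse events were mild.",
    "doc-2": "Aspirin is absorbed rapidly.",
}
query_terms = {
    "EX:aspirin": ["aspirin", "acetylsalicylic acid"],
    "EX:mi": ["heart attack", "myocardial infarction"],
}

evidence = find_evidence(query_terms, documents)
prompt = build_prompt("Does aspirin prevent heart attack?", evidence)
print(prompt)
```

Keeping the matched concept IDs next to each snippet is what lets the interface explain why a document was selected, independently of whatever text the LLM generates.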

How did we meet the key considerations?

To recap, here’s how our design decisions help us meet the key considerations set out in Part I:

Accuracy: The use of ontology-based retrieval with high-quality, expert-curated, NER-optimized vocabularies ensures precise results.
Provenance: The RAG system provides a traceable source of information.
Transparency: The use of a structured query and explainability in answer generation provides clear insight into the process.
Domain Expertise: Ontology-based vocabularies, specifically tailored for life sciences and curated by SciBite experts, provide deep domain knowledge.
Dynamic Source Selection: The flexibility of the underlying information retrieval system, with its multiple connectors and parsers, allows for dynamic source selection.
Security & Privacy: Document-level security provided by the underlying information retrieval system ensures data safety and privacy.
Operational Efficiency: Implementing the ontology-based retrieval mechanism on mature search technology keeps operational costs lower than running a vector search or fine-tuning LLMs.

Whether you’re seeking an application to equip your researchers with the ability to uncover scientific insights using natural language or on a journey to build your own application for scientific questions and answers, stay tuned. In the next part, we’ll discuss how SciBite technology can assist you.

Harpreet Singh Riat

Director of Technical Sales, SciBite

Harpreet is the Director of Technical Sales at SciBite, a leading data-first, semantic analytics software company. With a strong background in data management and analytics, Harpreet has played a vital role in assisting numerous organizations in implementing knowledge graphs, from data preparation to visualization to gaining insights.

Other articles by Harpreet:

AI-based chat application for life sciences, Part I: Key considerations
AI-based chat application for life sciences, Part II: Role of ontologies
AI-based chat application for life sciences, Part III: Design decisions
Utilising the power of LLMs and ontologies in life sciences (webinar)
