AI-based chat application for life sciences [Part 1]: Key considerations

Are your teams already putting potentially confidential questions to consumer tools such as Bard and ChatGPT and relying on the responses? Or has information overload slowed your research process, making it harder to identify critical findings quickly?

Whatever the reason, you’ve acknowledged the pressing need for a dedicated AI-based chat application for your teams. If this scenario resonates with you, allow us to guide you to the next level.

What are the essential requirements that must be met by such an application?

  1. Accuracy: Before your researchers commit additional resources on the strength of an answer, the application must answer their questions accurately. In life sciences, medicine, and clinical domains especially, teams can rely on such a system only if its information is accurate and current. Known limitations of Large Language Models (LLMs), such as hallucinations and bias, can lead to inaccuracies; can your application address them effectively?
  2. Provenance: Even if the summarized answer is accurate, it may still be insufficient. Can you justify conducting research solely based on the fact that “the LLM said that”? The application should enable you to trace the answers back to the reference documents, whether they are external or internal. Evidence-based decision-making is paramount in life sciences, where wrong decisions can have dire consequences.
  3. Transparency: Let’s push for more. Why settle for just a list of reference documents? The application should provide insights into why it considers the given reference documents relevant to the search and highlight which sections of the documents contain evidence that contributed to the answer.
  4. Domain expertise: Life sciences data can be intricate and rife with ambiguities. The application must navigate this complexity without being tripped up by subtle nuances, terminology, abbreviations, and similar intricacies.
  5. Dynamic source selection: Your question might pertain to a competitor, internal research, a patent, or a clinical trial – relying on a single document source may not yield all the answers you seek. To cater to a broader range of users, an application must support a diverse array of data sources and possess the capability to dynamically switch between them based on the nature of the questions being posed.
  6. Security & privacy: Regardless of whether it concerns the confidentiality of the questions posed or the user’s access level to the document housing the answer, the application must uphold data privacy and respect the user’s access permissions.
  7. Operational efficiency: Generative AI is in high demand, and you don’t need to be an economist to see the substantial operational and computing costs of running LLMs. The application must stay up to date while keeping these costs in check, meeting most of the requirements without exhausting the budget.

In addition to these requirements, any application striving to thrive in today’s world must meet a minimum standard, which is already set quite high, for user experience (UX), performance, and availability.
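To make requirements 2, 5, and 6 more concrete, here is a minimal Python sketch of how an answer could carry its evidence, draw on several document sources, and respect access permissions. It is an illustration under stated assumptions, not SciBite's implementation: the `Passage` and `Answer` types, the `answer_with_provenance` function, the toy corpus, and the keyword-overlap matching are all hypothetical, and a real system would generate the summary with an LLM rather than concatenating passages.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Passage:
    doc_id: str                # identifier used to trace an answer back to its document
    source: str                # hypothetical source name, e.g. "internal" or "patents"
    text: str
    allowed_groups: frozenset  # user groups permitted to read this passage


@dataclass
class Answer:
    text: str
    evidence: list = field(default_factory=list)  # passages that support the answer


def tokens(text):
    """Lowercased words with basic punctuation stripped (toy tokenizer)."""
    return {w.strip(".,?!").lower() for w in text.split()}


def answer_with_provenance(question, passages, user_groups, sources):
    """Answer only from passages the user may read, within the chosen sources."""
    q = tokens(question)
    evidence = [
        p for p in passages
        if p.source in sources              # dynamic source selection
        and p.allowed_groups & user_groups  # access permissions
        and q & tokens(p.text)              # naive relevance: shared words
    ]
    if not evidence:
        return Answer("No supporting evidence found in the permitted sources.")
    # Placeholder for the LLM summarization step: join the evidence text.
    return Answer(" ".join(p.text for p in evidence), evidence)


# Toy corpus with placeholder content (not real data).
CORPUS = [
    Passage("DOC-1", "internal", "Compound X inhibits kinase Y in vitro.",
            frozenset({"research"})),
    Passage("DOC-2", "patents", "A patent claims compound X for oncology use.",
            frozenset({"research", "legal"})),
]
```

With this corpus, a user in the hypothetical `legal` group who asks about compound X across both sources receives an answer whose evidence is limited to `DOC-2`, because `DOC-1` is restricted to `research`. Every answer keeps the list of passages it was built from, so a reader can always trace it back.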

What do we know about the technology?

Before we attempt to meet these requirements, let’s pause to understand the strengths and weaknesses of LLMs. LLMs essentially rely on statistical probabilities derived from extensive training data, which determine the likelihood of word sequences within sentences. Consequently, if the training data lacks an answer to a query, the model still generates sentences based solely on these statistics, which can produce plausible-sounding but unfounded outputs, or “hallucinations.”
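The statistical idea can be illustrated with a deliberately tiny bigram model. This is a toy sketch, not how a production LLM is built: real models use neural networks over far longer contexts, and the training sentences below are placeholders invented for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Relative frequency of each word seen after `word` in training."""
    following = counts.get(word, Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}


def most_likely_next(counts, word):
    """Most frequent continuation, or None if `word` was never seen."""
    probs = next_word_probs(counts, word)
    return max(probs, key=probs.get) if probs else None


# Placeholder training sentences (not real pharmacology).
TRAINING = [
    "drug_a inhibits target_1",
    "drug_b inhibits target_1",
    "drug_c inhibits target_2",
]
```

After training on these three sentences, the model assigns `target_1` a probability of two thirds after the word `inhibits`, whatever came before it. Asked to continue "drug_c inhibits", it would therefore prefer `target_1`, even though the only sentence about `drug_c` says otherwise. The output is fluent and statistically likely, but not grounded in the facts, which is the essence of a hallucination.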

Moreover, if the training data contains inherent biases, the generated answers are prone to reflecting those biases. Additionally, since the model is trained on data without preserving its sources, it lacks the technical capability to provide source links for generated responses.

Given that the training data for LLMs comprises a very large share of the text accessible on the internet, retraining an LLM often enough to keep it consistently current is an exceedingly costly endeavor.

Nevertheless, LLMs excel in summarizing text, generating content, and interpreting human language.

Can we meet these requirements?

At SciBite, our teams have conducted rigorous experiments with various flavors of LLMs, vector-based retrieval, ontology-based retrieval, and hybrid approaches. We also integrated ontology enrichment at different stages of the question-answer flow. These efforts have culminated in an AI chat application that fulfills all of the requirements above.

As an added advantage, the application makes the answer-generation process entirely transparent. It uses ontologies to clarify how and why results were identified. It keeps not only a list of relevant documents but also the segments of those documents used to generate the answer, along with an explanation of why it considers them to contain the answer. This stands in contrast to systems that operate as a black box and cannot offer this level of transparency.

Using ontologies also helps structure the natural-language question, which makes queries reproducible and allows them to be saved or approved.
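As a rough illustration of what structuring a question with an ontology can mean, the sketch below maps synonyms in a free-text question to concept identifiers, so that differently worded questions reduce to the same structured query. Everything here is an assumption made for the example: the `ONTOLOGY` dictionary, the `EX:` identifiers, and the `structure_question` function are hypothetical, and real ontology-based entity recognition is considerably more sophisticated than whole-phrase matching.

```python
import re

# Hypothetical mini-ontology: synonym -> (concept ID, preferred label).
# The "EX:" identifiers are invented for this example.
ONTOLOGY = {
    "heart attack": ("EX:0001", "myocardial infarction"),
    "myocardial infarction": ("EX:0001", "myocardial infarction"),
    "aspirin": ("EX:0002", "acetylsalicylic acid"),
    "acetylsalicylic acid": ("EX:0002", "acetylsalicylic acid"),
}


def structure_question(question):
    """Return the sorted (concept ID, label) pairs mentioned in the question."""
    text = question.lower()
    found = {}
    for synonym, (concept_id, label) in ONTOLOGY.items():
        # Match the synonym as a whole phrase, not as part of another word.
        if re.search(rf"\b{re.escape(synonym)}\b", text):
            found[concept_id] = label
    return sorted(found.items())


q1 = "Does aspirin prevent heart attack?"
q2 = "Is acetylsalicylic acid used against myocardial infarction?"
```

Both `q1` and `q2` resolve to the same pair of concepts, `EX:0001` and `EX:0002`. Because the structured form is independent of the wording, it can be stored, re-run later with the same meaning, or reviewed and approved before use.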

In the next part, I will explore how using ontologies for enrichment at various stages addresses gaps in a retrieval-augmented generation (RAG) application and enhances its accuracy, reliability, and efficiency.

Harpreet Singh Riat
Director of Technical Sales, SciBite

Harpreet is the Director of Technical Sales at SciBite, a leading data-first, semantic analytics software company. With a strong background in data management and analytics, Harpreet has played a vital role in assisting numerous organizations in implementing knowledge graphs, from data preparation to visualization to gaining insights.

Other articles by Harpreet:

  1. AI-based chat application for Life Sciences: Part I, key considerations
  2. AI-based chat application for Life Sciences: Part II, role of ontologies
  3. AI-based chat application for Life Sciences: Part III, design decisions
  4. Utilising the power of LLMs and ontologies in life sciences (webinar)