AI (and other stuff!) in life sciences: Insights from recent industry events

We have been busy these last few weeks at SciBite!

From April 15th to 17th, we immersed ourselves in the esteemed Bio-IT World conference held in Boston. Not only did we attend and present, but we also proudly launched our newest offering, SciBite Chat. As always, it was a wonderful and tiring event!

Next up, on April 23rd, we had the privilege of hosting a workshop in London on behalf of the renowned Pistoia Alliance. The focus was on “Democratising Data in the Life Sciences: the role of LLMs.” This event provided a platform for sharing insights, addressing common issues, and fostering non-competitive collaborations that benefit all. If you’re not already part of the Alliance, it’s worth considering!

Finally, the following day, we returned to the main Spring Pistoia Conference to deliver another presentation. These events serve as invaluable opportunities to reconnect with our customers, colleagues, and friends while staying abreast of the latest advancements in our field.

Although each event had distinct objectives, we noticed several common themes resonating throughout. Below, we would like to provide a whistle-stop tour of some of these.


AI is everywhere

Coming as a surprise to… well, no one! AI was undeniably the talk of the town at recent events, with GenAI taking center stage. It was fascinating to learn about industry efforts to facilitate seamless access to AI technology, particularly Large Language Models (LLMs), within enterprises. Brian Martin of AbbVie and Xiaoying Xu of J&J deserve specific mentions for the great insight they provided into their respective organizations’ plans in this area at Bio-IT.

Despite some immediate wins in the pragmatic application of LLMs to search and accessibility, the current rollout of GenAI is still struggling to live up to its silver-bullet expectations. This is reflected in the Gartner hype cycle, which predicts a descent into the trough of disillusionment soon enough. Nevertheless, there have been clear and notable successes, particularly in scenarios where human expertise is involved, such as regulatory or clinical document generation. Challenges remain around reproducibility, due to the stochastic nature of LLMs.

Additionally, LLMs have demonstrated their prowess in translating free text into a target query syntax, aiding code generation, and breaking down data silos by removing the technical debt associated with learning yet ANOTHER query language! In the search and prediction space, the importance of grounding LLMs is, indeed, well accepted, and the field is clearly well-versed in RAG architectures.

Although the limitations of vectors used in isolation for the information-retrieval step of RAG systems are well understood, the ‘magic’ of vectors, albeit with more domain-specific embedding models, appears to be widely utilized.
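
To make the retrieval step concrete, here is a minimal Python sketch of vector-based retrieval with a domain-specific embedding model, as used in the information-retrieval stage of a RAG pipeline. The model name is a placeholder for whichever biomedical embedding model you use, and the documents are invented for illustration.

# Minimal sketch of the information-retrieval step of a RAG pipeline,
# using a domain-specific embedding model (the model name is a placeholder).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("your-org/biomedical-embeddings")  # hypothetical model ID

documents = [
    "BRCA1 mutations are associated with hereditary breast cancer.",
    "Metformin is a first-line treatment for type 2 diabetes.",
    "The IDMP ontology standardizes the identification of medicinal products.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (cosine similarity)."""
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q_vec  # cosine similarity on normalized vectors
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

# The retrieved passages would then be passed to the LLM as grounding context.
print(retrieve("Which gene is linked to hereditary breast cancer?"))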

Another interesting thread was the role of alternative architectures, in which general-purpose LLMs may find their place orchestrating workflows as part of autonomous agents. This could lead to a shift towards smaller, customized models, with an emphasis on routing and other less glamorous components, which was discussed in detail during the aforementioned workshop.
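
As a rough illustration of those less glamorous routing components, the sketch below dispatches a request to one of several specialised handlers. In a real agent, a general-purpose LLM would make the routing decision; here a trivial rule-based router stands in for it, and all handler names are hypothetical.

# Toy sketch of agent-style routing: a general-purpose model decides which
# specialised component should handle a request. A rule-based router stands in
# for the LLM's decision; all handlers are illustrative placeholders.
from typing import Callable

def query_knowledge_graph(question: str) -> str:
    return f"[KG] structured answer for: {question}"

def search_documents(question: str) -> str:
    return f"[Search] passages relevant to: {question}"

def run_small_model(question: str) -> str:
    return f"[SmallLM] bespoke prediction for: {question}"

ROUTES: dict[str, Callable[[str], str]] = {
    "kg": query_knowledge_graph,
    "search": search_documents,
    "predict": run_small_model,
}

def route(question: str) -> str:
    """Pick a handler; in a real agent an LLM would make this choice."""
    q = question.lower()
    if "interact" in q or "pathway" in q:
        choice = "kg"
    elif "predict" in q or "activity" in q:
        choice = "predict"
    else:
        choice = "search"
    return ROUTES[choice](question)

print(route("Which pathways does EGFR interact with?"))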

While the limitations of GenAI are acknowledged, the community remains hopeful for ground-breaking use cases. For instance, in the field of GenChem, there is a desire to apply the approach that emerged from AlphaFold for target structure prediction to small-molecule prediction. However, the lack of comparable evolutionary data in chemistry presents challenges that still need to be addressed.

Workshop attendees reached a consensus that LLMs contribute to efficiency improvements, but not necessarily to accuracy. They can assist in generating email blueprints or ontology candidates, yet human expert review remains essential to ensure accuracy. Alternatively, one can utilize computational representations of domain-specific knowledge to help ground, or steer, a generalized LLM in the right direction… enter ontologies.

Ontologies and knowledge graphs

Modeling knowledge in ontologies is not a new idea – the community has been capturing domain-specific knowledge in these standards for decades, and this is showing no sign of slowing!

It was wonderful to see some of the collaborative efforts in place across a variety of Pistoia Alliance projects to continue this ontology work, with presentations touching on:

i) a potential project to develop an ontology to capture metadata for RWD (real-world data) and

ii) a deeper dive into the IDMP ontology (Identification of Medicinal Products), a collaborative industrial effort that was the proud recipient of an Innovative Practices award at Bio-IT.

Furthermore, a notable appetite for manufacturing ontologies appeared in numerous presentations and conversations as part of the Bio-IT conference.

The concept of a Digital Twin, which was expertly walked through by Caroline Chung of MD Anderson at Bio-IT, shows wonderful potential within medicine and demands a mention. Replicating in healthcare the idea demonstrated in aerospace presents challenges, due to the complexity of modeling multiple scales, from the genome through cells and organs to the entire human body, as well as the need for high-quality, real-time data. The application of ontologies and standards is crucial for progress in this area.

The importance of ontologies in generating and modeling knowledge graphs is well known. However, a common topic during the events was the interplay between knowledge graphs and LLMs, particularly during a fantastic panel discussion entitled “Pharma Knowledge Graphs and LLMs: Antagonistic or Synergistic,” chaired by Tom Plasterer of AstraZeneca. The main interplay between the technologies appears to be twofold: i) generating relationships and KG content, and ii) simplifying querying through NL-to-syntax conversion.
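
As a hedged sketch of the second point, NL-to-syntax conversion can be as simple as prompting an LLM with the graph schema so it can translate a natural-language question into a query language such as SPARQL. The schema, prompt, and call_llm stand-in below are all illustrative rather than any specific product’s API.

# Illustrative sketch of NL-to-syntax conversion for querying a knowledge graph.
# `call_llm` is a placeholder for whichever LLM client is in use; the schema,
# prompt, and returned query are invented for illustration.

SCHEMA = """
Classes:   :Drug, :Disease, :Target
Relations: :treats (:Drug -> :Disease), :inhibits (:Drug -> :Target)
"""

PROMPT_TEMPLATE = """You translate questions into SPARQL for this schema:
{schema}
Return only the SPARQL query.

Question: {question}
SPARQL:"""

def call_llm(prompt: str) -> str:
    # Stand-in: a real implementation would send the prompt to an LLM here.
    return "SELECT ?drug WHERE { ?drug :treats :Type2Diabetes . }"

def question_to_sparql(question: str) -> str:
    return call_llm(PROMPT_TEMPLATE.format(schema=SCHEMA, question=question))

print(question_to_sparql("Which drugs treat type 2 diabetes?"))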

With the understanding that ontologies and knowledge graphs can be used to capture the knowns, an interesting thread emerged around how they may be used in conjunction with LLM output to help distinguish hallucinations and identify candidates for novel hypothesis generation. Obviously, one must acknowledge the risks associated with such approaches, ensuring that one can disprove, and fail fast on, the hypotheses that are going nowhere.
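
As a toy illustration of that idea, the sketch below checks LLM-asserted triples against a small set of “known” KG facts: anything already in the graph is treated as supported, while the remainder is flagged as a candidate hypothesis (or hallucination) for expert review. The triples are invented for illustration.

# Sketch of using a knowledge graph of "knowns" to triage LLM output:
# assertions already in the graph are supported; the rest are flagged for
# expert review as candidate hypotheses or hallucinations. Data is illustrative.

KNOWN_TRIPLES = {
    ("metformin", "treats", "type 2 diabetes"),
    ("imatinib", "inhibits", "BCR-ABL"),
}

def triage(llm_triples):
    """Split LLM-asserted triples into KG-supported ones and those needing review."""
    supported, needs_review = [], []
    for triple in llm_triples:
        (supported if triple in KNOWN_TRIPLES else needs_review).append(triple)
    return supported, needs_review

llm_output = [
    ("metformin", "treats", "type 2 diabetes"),      # known fact
    ("metformin", "treats", "alzheimer's disease"),  # novel hypothesis or hallucination?
]
supported, needs_review = triage(llm_output)
print("Supported:", supported)
print("For expert review:", needs_review)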

A typical application of LLMs in the context of KGs was lowering the barrier to entry, supporting the concept of data democratization. However, one thing is obvious, or if it is not, it should be: we cannot lose focus on the data. There is no point in lowering the barrier of entry to rubbish data…


Data still rules supreme

The wonderful presentation by Dame Janet Thornton from the EBI at Pistoia highlighted the importance of quality data. Dame Janet touched on some of the limitations of AI within the Life Sciences, including data quality and a lack of standards (as well as confidentiality and ethical barriers). Her advice? …begin with data.

Furthermore, the importance of quality and FAIR data was covered in the Data Readiness for AI panel, led by Santha Ramakrishnan of Bayer. A direct quote from that discussion was a desire to ‘increase the value we get out of data’, and that, to do so, scientists should play a more central role in data management processes.

Image data, particularly in pathology, came up throughout the Bio-IT conference, as did an increasing need to support multi-modal data, particularly text and images; other modalities such as audio, video, 3D, and IoT received less attention.

As we navigate limitations and explore new frontiers, AI continues to shape the future of the Life Sciences industry, and the requirement for quality data to train these models persists. For data-sparse areas, synthetic data remains a noteworthy topic, with LLMs being used to generate data for training smaller, more bespoke models for business tasks.
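
A rough sketch of that synthetic-data pattern, assuming an LLM is available to generate labelled examples: the placeholder call_llm produces synthetic sentences per label, which are then used to train a small, task-specific classifier. The labels, prompt, and downstream task are illustrative.

# Sketch of the synthetic-data idea: use an LLM to generate labelled examples
# for a sparse domain, then train a small bespoke model on them.
# `call_llm` is a placeholder; labels and prompts are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def call_llm(prompt: str) -> list[str]:
    # Stand-in for an LLM call that returns synthetic sentences for a label.
    return [f"synthetic example {i} about {prompt}" for i in range(20)]

labels = ["adverse event", "no adverse event"]
texts, targets = [], []
for label in labels:
    for sentence in call_llm(f"clinical sentences describing: {label}"):
        texts.append(sentence)
        targets.append(label)

# Train a small, task-specific classifier on the synthetic corpus.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, targets)
print(model.predict(["Patient reported severe nausea after dosing."]))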

Lastly, the notion of a Data Product gained significant traction. These self-contained data solutions directly address business challenges and have the potential for monetization.

Launch of SciBite Chat

As part of Bio-IT World, we also launched SciBite Chat – our conversational chatbot that sits atop SciBite Search and enables users to have complex natural language-based conversations with their data.

SciBite Chat builds on our foundational semantic search tool, SciBite Search. SciBite’s augmented ontologies are used to enrich textual data, enabling human-explainable and repeatable information retrieval as part of a RAG-based system.

By combining quality semantic data with conversational chat, SciBite Chat supports evidence-based decision-making, providing accurate, transparent, and flexible answers to complex scientific questions!
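
For readers unfamiliar with the general pattern, here is a minimal sketch (not SciBite’s implementation) of ontology-enriched retrieval: documents are tagged with ontology identifiers up front, and retrieval matches the entities recognized in a question against those tags, which makes the retrieval step explainable and repeatable. The ontology entries and documents are invented for illustration.

# Minimal sketch of ontology-enriched, explainable retrieval (illustrative only).
# Documents are tagged with ontology identifiers; retrieval matches the entities
# recognized in the question against those tags and reports the overlap as evidence.

# Hypothetical ontology dictionary: surface forms -> canonical identifiers.
ONTOLOGY = {
    "breast cancer": "MONDO:0007254",
    "brca1": "HGNC:1100",
    "metformin": "CHEBI:6801",
}

DOCUMENTS = [
    {"text": "BRCA1 variants raise hereditary breast cancer risk.",
     "entities": {"HGNC:1100", "MONDO:0007254"}},
    {"text": "Metformin lowers hepatic glucose production.",
     "entities": {"CHEBI:6801"}},
]

def tag(text: str) -> set[str]:
    """Map surface forms in the text to ontology identifiers (naive matching)."""
    lowered = text.lower()
    return {oid for term, oid in ONTOLOGY.items() if term in lowered}

def retrieve(question: str):
    """Return documents sharing ontology identifiers with the question, plus the evidence."""
    q_entities = tag(question)
    return [(doc["text"], doc["entities"] & q_entities)
            for doc in DOCUMENTS if doc["entities"] & q_entities]

# The shared identifiers explain *why* each document was retrieved.
print(retrieve("What is the link between BRCA1 and breast cancer?"))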

Conclusion

While LLMs offer immense potential, their value is yet to be fully realized across the industry. With a focus on getting LLMs into the hands of all, caution is needed to avoid, ironically, stifling innovation by relying on them as THE solution looking for problems. Rather, they should be treated like any other ‘tool’ and utilized as and when problems require them… and when it makes sense!

It is encouraging to see the community recognize this to some extent and consider how LLMs (and small LMs) can play roles in larger architectures designed to address specific needs. Generalized LLMs, for example, show potential in orchestrating workflows for autonomous agents (the natural progression from interactive chatbots), calling out to, or interacting with, smaller customized models as well as other tools and FAIR data silos.

The value that LLMs bring right now sits in the democratization of data and the lowering of the entry barriers created by complex query syntaxes. This is a great, pragmatic, and sensible application of the tech. However, we must maintain a focus on ensuring that the data we democratize is of high quality; reducing barriers to poor data serves no purpose.

To conclude, finding the right balance between AI capabilities and human expertise is the key to unlocking the full potential of AI in various domains, and still, as it always has, relies on… quality data!

Joe Mullen
Director of Data Science & Professional Services, SciBite

Leading SciBite’s data science and professional services team, Joe is dedicated to helping customers unlock the full potential of their data using SciBite’s semantic stack, spearheading R&D initiatives within the team and pushing the boundaries of the possible. Joe’s expertise is rooted in a PhD from Newcastle University focusing on novel computational approaches to drug repositioning, building on semantic data integration, knowledge graphs, and data mining.

Since joining SciBite in 2017, Joe has been enthused by the rapid advancements in technology, particularly within AI. Recognizing the immense potential of AI, Joe combines this cutting-edge technology with SciBite’s core technologies to craft tailored, bespoke solutions that cater to diverse customer needs.

Other articles by Joe

  1. Are ontologies still relevant in the age of LLMs?
  2. What is Retrieval Augmented Generation, and why is the data you feed it so important?
  3. Large language models (LLMs) and search; it’s a FAIR game
  4. Revolutionizing Life Sciences: The incredible impact of AI in Life Science [Part 1]
  5. Why use your ontology management platform as a central ontology server?