SciBite AI in depth

An Artificial Intelligence (AI) platform combining deep learning with powerful semantic algorithms to enable our customers to exploit life science data and accelerate its downstream use in research and development.

Within the pharmaceutical industry, the combination of Artificial Intelligence (AI) and big data is triggering a revolution across the entire drug development lifecycle, from the way new drugs and treatments are discovered, to identifying opportunities to re-purpose those already in the market.

For a sector that has seen the cost of bringing a new drug to market rise from $1.2bn to $2bn in the last ten years, and its return on investment drop from 10% to under 2% over the same period, AI has the potential to deliver unprecedented productivity improvements and drive better outcomes for both pharmaceutical companies and patients.

The challenges posed by big data are well articulated by the so-called 5 V’s, which extend Doug Laney’s benchmark three-V definition:

  • Volume: The quantity of data produced across all sources.
  • Velocity: The speed at which new data are created, collected, and analysed.
  • Variety: The different types of data being created, including structured data sets, semi-structured data, and unstructured text.
  • Veracity: The trustworthiness of the data collected. Is it of good quality? Did it come from a trusted source?
  • Value: The worth of the data being collected.

Within the pharmaceutical and healthcare sectors, big data represents an even greater hurdle, as approximately 80% of clinical data is stored as unstructured text. AI techniques such as text mining and Natural Language Processing (NLP) are therefore required to identify concepts, entities, and relationships within the document corpus.

While the volume and variety of big data represent a major technical challenge to any pharmaceutical organization, the payoffs are also substantial: enabling patterns and trends to be identified to inform decision-making at all stages of the drug development process.


SciBite’s Artificial Intelligence platform (SciBite AI) combines deep learning with our powerful semantic algorithms to enable customers to exploit life science data and accelerate its downstream use in research and development.

Implemented as a server-based application and deployed via Docker, SciBite AI enables users to rapidly load and run deep learning models.

SciBite AI’s application programming interface (API) provides customers with a simple, consistent interface for both users and applications, insulating them from the complexities of the underlying implementation.

Building on SciBite’s wealth of experience in data preparation and standards mapping, we also offer consulting services to help select, train, and test machine learning models for specific use cases.

Download the SciBite AI datasheet or SciBite AI use case to learn more.


Cornerstone components

SciBite AI provides a framework for leveraging AI and deep learning models alongside our award-winning semantic technologies to unlock insights into your data.

Making data AI-ready (SciBite AI .prepare)

“If your data is bad, your machine learning tools are useless…” (Harvard Business Review).

Even today, 80% or more of an organization’s data is held in unstructured text such as Word documents, PowerPoint slides, and PDFs. This is also true of external data sources such as patents, blogs, clinical notes, call center scripts, literature databases, and the growing body of experimental data typically entered via online forms or electronic laboratory notebooks (ELNs).

SciBite’s standards-based semantic tools enable Findable, Accessible, Interoperable, Reusable (FAIR) data across the entire enterprise, a crucial prerequisite to obtaining the high-quality training data required by machine learning models.

Our powerful ontology management builds on the FAIR data approach, turning “strings into things” and delivering a dataset capable of sophisticated operations such as synonym-independent searches (e.g., Viagra or Sildenafil), ontology searches (e.g., “Find projects on kinases…”), and connection searches (e.g., “Drugs that reduce inflammation…”).
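The idea of turning “strings into things” can be sketched as mapping every synonym of a concept to one canonical identifier, so that searches match the concept rather than the literal string. The minimal sketch below uses a toy synonym table and a ChEMBL-style identifier purely for illustration; real SciBite vocabularies are large, curated resources, not hand-written dictionaries.

```python
# Illustrative only: a toy synonym map standing in for a curated vocabulary.
# The identifier below is a ChEMBL-style ID used purely as an example.
SYNONYMS = {
    "viagra": "CHEMBL192",             # brand name
    "sildenafil": "CHEMBL192",         # INN for the same compound
    "sildenafil citrate": "CHEMBL192", # salt form, still the same "thing"
}

def normalize(term):
    """Map a free-text string to a canonical ontology identifier (or None)."""
    return SYNONYMS.get(term.strip().lower())

# A synonym-independent search: both query strings resolve to the same thing.
assert normalize("Viagra") == normalize("Sildenafil") == "CHEMBL192"
```

Once every mention is normalized to an identifier, ontology and connection searches become lookups over those identifiers rather than brittle string matches.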

This next-generation ontology-based FAIR data is the essential bedrock for AI in all its forms.

Training machine learning models (SciBite AI .model)

Among the deep learning models employed within biomedicine, three of the most important are named-entity recognition (NER), semantic relationship extraction, and question-answering based on semantic structures.

At SciBite, we have in-depth experience in building all three of these models, and our consultancy service offers you the opportunity to work with our experts in creating, refining, and deploying sophisticated deep-learning models for your project.

With first-hand experience of industry-leading models such as BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), LSTM (Long Short-Term Memory), and Word2vec, we can help you select the right algorithm for your data.
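Whatever model family is chosen, embedding approaches like Word2vec ultimately reduce to comparing word vectors, most commonly by cosine similarity. The sketch below illustrates that core operation with made-up four-dimensional vectors; real embeddings have hundreds of dimensions learned from large corpora, and nothing here reflects an actual trained model.

```python
import math

# Toy "embeddings" for illustration only; real Word2vec/BioBERT vectors are
# learned from large text corpora and have hundreds of dimensions.
vectors = {
    "aspirin":   [0.9, 0.1, 0.0, 0.2],
    "ibuprofen": [0.8, 0.2, 0.1, 0.3],
    "kinase":    [0.0, 0.9, 0.8, 0.1],
}

def cosine(a, b):
    """Cosine similarity: dot product of the vectors over their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Semantically related terms score higher than unrelated ones.
assert cosine(vectors["aspirin"], vectors["ibuprofen"]) > \
       cosine(vectors["aspirin"], vectors["kinase"])
```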

Our practical experience within life sciences means we can also assist in planning and costing your project – right down to calculating the number of training samples required to prepare a deep learning model for a specific application.

The models can detect context-specific relationships within text and distinguish simple ‘mentions’ of entities from sentences that actually assert a statement. For instance, protein-protein interactions appear in sentences where two proteins are mentioned; however, not all sentences containing two proteins describe an interaction. Determining whether a relationship is actually being asserted requires genuine understanding of the sentence. The following example shows a protein interaction, expressed in complicated language, that SciBite AI can identify:

That NPM1 promotes PEDV growth is due to N protein inhibition of caspase-3-mediated cleavage of NPM1, which prevents proteolytic cleavage of NPM1 and enhances host cell survival.

By contrast, the following text mentions proteins without describing an interaction, and was correctly not flagged by the SciBite AI PPI model:

In order to identify cellular RNAs that stimulate mutant MDA5, Ahmad et al. recently described an RNase protection assay where total RNA extracted from cells is mixed in the test tube with recombinant MDA5 protein bearing a mutation in its helicase domain.
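The mention-versus-assertion distinction can be caricatured with a keyword heuristic, as in the sketch below. This is deliberately crude: matching interaction verbs with a regular expression is exactly the kind of brittle rule that a trained classifier replaces with learned, context-sensitive understanding, and it is not how the SciBite AI PPI model works.

```python
import re

# Crude illustration only: a regex over common interaction-asserting verbs.
# A real PPI model learns context rather than relying on a fixed cue list.
INTERACTION_CUES = re.compile(
    r"\b(inhibit(s|ion)?|activat\w+|bind(s|ing)?|phosphorylat\w+|cleav\w+)\b",
    re.IGNORECASE,
)

def looks_like_interaction(sentence):
    """Return True if the sentence contains an interaction-asserting cue."""
    return bool(INTERACTION_CUES.search(sentence))

asserted = ("That NPM1 promotes PEDV growth is due to N protein inhibition "
            "of caspase-3-mediated cleavage of NPM1.")
mention = ("Ahmad et al. recently described an RNase protection assay where "
           "total RNA is mixed with recombinant MDA5 protein.")

assert looks_like_interaction(asserted)
assert not looks_like_interaction(mention)
```

A heuristic like this fails on negation, hypotheticals, and long-range syntax, which is precisely why deep models are needed for this task.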

Deploying machine learning models (SciBite AI .deploy)

Several machine-learning language models are now in the public domain, the best known being BERT, BioBERT, ELMo, and Word2vec. While these represent a genuine leap forward in our ability to process natural language, they do not fully address real-world use cases:

  • They are algorithms, not services, making them cumbersome to install and integrate.
  • The code for machine learning models (e.g., a Python script) can be difficult to maintain and distribute within an organization – a significant constraint as these models change frequently.
  • Machine learning models can be difficult to re-train using internal data, and many organizations struggle to achieve the metrics reported for models trained on public data.
  • To realise their full potential, these models still require domain-specific ontologies at the training, validation, and interpretation stages.

At SciBite, we understand these limitations and recognize that customers need simple, deployable machine-learning services. Our solution, therefore, separates the API from the implementation and does not require labor-intensive Python coding.

SciBite AI is a Docker container-based application for serving multiple models via a simple REST API, enabling you to leverage the power of deep learning models across the whole enterprise.
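Serving models behind a REST API means a client only has to build an HTTP request; it never touches the model code. The sketch below constructs such a request with Python’s standard library. The base URL, route, and payload fields are hypothetical illustrations, not the documented SciBite AI API; consult the product documentation for the actual endpoints.

```python
import json
from urllib import request

# Hypothetical endpoint and payload shape for illustration only; the real
# SciBite AI routes and field names are defined in its API documentation.
BASE_URL = "http://localhost:8000"

def build_predict_request(model_name, sentence):
    """Build (but do not send) a JSON POST request to a model endpoint."""
    payload = json.dumps({"model": model_name, "text": sentence}).encode("utf-8")
    return request.Request(
        f"{BASE_URL}/models/{model_name}/predict",  # hypothetical route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request("ppi", "NPM1 promotes PEDV growth.")
# request.urlopen(req) would return the model's prediction as JSON once a
# SciBite AI container is running; here we only construct the request.
assert req.get_method() == "POST"
assert json.loads(req.data)["model"] == "ppi"
```

Because every model sits behind the same request shape, swapping or retraining a model changes nothing on the client side.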

Amazon SageMaker can also be used to train and deploy machine learning models created using SciBite AI, allowing customers to develop their models within an AWS cloud environment.

Connecting machine learning output (SciBite AI .connect)

SciBite AI offers a REpresentational State Transfer (REST) API for leveraging the power of deep learning models across your enterprise.

The API provides a consistent, easy-to-use interface that can be quickly adapted to new architectures, and which shields users from implementation issues associated with the underlying machine learning models.

The API is also integrated into TERMite 6.4.

Leverage our experience of deep learning

At SciBite, we have experience in developing and deploying semantic deep learning models that perform a wide variety of functions:

  • Named Entity Recognition (NER): Identifying concepts not covered by existing vocabularies;
  • Context-specific detection: Detecting concepts only in certain contexts, for example new vs. pre-existing conditions, or the anatomical sites of tumors;
  • Relationship identification: Identifying complex relationships between concepts, such as protein-protein interactions or the reporting of drug adverse events;
  • Assisted ontology development: Using AI to suggest new terms, identify inconsistencies, and accelerate ontology development and quality control;
  • Predictors: Spotting patterns in data that help predict future outcomes;
  • Clustering and classification: Grouping documents and concepts based on their underlying data relationships.
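The clustering idea in the list above can be sketched with a simple similarity measure over word sets. The example below uses Jaccard overlap on toy sentences purely for illustration; production clustering would operate on learned embeddings rather than raw token overlap.

```python
# Minimal sketch of similarity-based document grouping via Jaccard overlap.
# Illustrative only: real systems cluster learned vector representations.
def jaccard(a, b):
    """Jaccard similarity between the word sets of two documents."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

docs = [
    "sildenafil inhibits PDE5 in smooth muscle",
    "PDE5 inhibition by sildenafil relaxes smooth muscle",
    "patient reported mild headache after dosing",
]

# The two PDE5 documents overlap far more with each other than with the third,
# so a threshold on this score would group them together.
assert jaccard(docs[0], docs[1]) > jaccard(docs[0], docs[2])
```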

Download the SciBite AI datasheet, the SciBite AI use case, or contact the team to learn more about SciBite AI.


How could the SciBite semantic platform help you?

Get in touch with us to find out how we can transform your data
