Departmental Search

The challenge of dealing with the plethora of data sources within pharmaceutical departments

Much of the knowledge held by pharmaceutical departments and cross-functional teams is contained in the many documents, reports and emails they produce, and in the scientific articles they download.

The unstructured nature of these files, the range of formats used and the fact that they are typically spread across different locations limit the ability to mine them for useful information. Even where such documents are organised in a file store or a document management system, the accompanying search capabilities are limited to exact matches of the author’s wording.

Similarly, inconsistent use of synonyms during data entry makes it difficult to identify and collate all relevant data for a disease or target of interest. For example, a document describing work on “muscarinic acetylcholine receptor M1” will not be found by anyone searching for the commonly used synonym “cholinergic receptor muscarinic 1”.

Even for more structured entries, the meaning of a field may be ambiguous or imprecise, and a single field may mix several different types of data, such as “gene”, “assay type” and “species”.

SciBite provides scientific teams with the opportunity to semantically enrich their documents, opening up new possibilities to mine the data more effectively and derive valuable insights.

Semantic Enrichment of Scientific Documents

SciBite can handle a wide range of file formats, including emails, Word documents, PowerPoint presentations, CSV files and PDFs, and supports batch loading of zip files. Loading can be automated by polling a location for new content.
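Polling a location means checking it at a regular interval and passing on any files that have not been seen before. The Python sketch below illustrates that general pattern only; it is not SciBite’s loader. The suffix list (including `.eml` and `.msg` for emails), the function names `find_new_files` and `poll`, and the `handle` callback are all hypothetical.

```python
from __future__ import annotations

import time
from pathlib import Path

# Hypothetical suffixes for the formats named above; not SciBite's actual list.
SUPPORTED_SUFFIXES = {".eml", ".msg", ".docx", ".pptx", ".csv", ".pdf", ".zip"}


def find_new_files(folder: str, seen: set[str]) -> list[str]:
    """Return supported files in `folder` not yet in `seen`, and mark them as seen."""
    new_files = []
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_file() and entry.suffix.lower() in SUPPORTED_SUFFIXES:
            key = str(entry.resolve())
            if key not in seen:
                seen.add(key)
                new_files.append(key)
    return new_files


def poll(folder: str, handle, interval_s: float = 60.0, rounds: int | None = None) -> None:
    """Check `folder` every `interval_s` seconds and pass each new file to `handle`.

    `rounds=None` polls forever; a number limits the checks (useful for testing).
    """
    seen: set[str] = set()
    done = 0
    while rounds is None or done < rounds:
        for path in find_new_files(folder, seen):
            handle(path)  # e.g. hand the file to a (hypothetical) indexing step
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

Keeping the set of seen paths in memory keeps the sketch short; a long-running service would more likely persist that state and also compare modification times so that edited files are re-processed.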

At the core of SciBite’s platform are established vocabularies that assign an explicit, unique meaning and description to each scientific term. This allows complex experimental text to be placed in context, so that it can be understood and used as high-quality, actionable data, irrespective of its source.

Standard reference vocabularies can be augmented with proprietary information, such as project codes and the IDs used to track materials like compounds and cell lines.

SciBite’s tools generate a semantic index that transforms unstructured experimental text (including supporting files such as Word documents, PowerPoint presentations and PDFs) into a structure that can be queried via a simple user interface. These queries can answer questions that would otherwise require time-consuming, error-prone manual aggregation.

This semantically enriched data can be explored using SciBite’s built-in user interface or a third-party search and visualisation tool such as Spotfire.

Semantic Search

Benefits of Semantic Enrichment

Most document repositories have only rudimentary search capabilities. For example, a search of a typical document store for the Alzheimer’s-related gene PSEN1 would miss references to its synonyms, such as Presenilin-1, AD3 and PSNL1.
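The difference between exact matching and synonym-aware search can be shown with a minimal Python sketch. It assumes a tiny hand-written vocabulary that maps the PSEN1 synonyms listed above to one hypothetical concept ID, `GENE:PSEN1`, and uses simple whole-word regular-expression matching. It does not reflect how SciBite’s own vocabularies or annotation engine work; the document texts are invented for the example.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical vocabulary: one concept ID with the synonyms named in the text.
SYNONYMS = {
    "GENE:PSEN1": ["PSEN1", "Presenilin-1", "AD3", "PSNL1"],
}

# One case-insensitive, whole-word pattern per synonym.
PATTERNS = [
    (concept, re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE))
    for concept, terms in SYNONYMS.items()
    for term in terms
]


def annotate(text: str) -> set[str]:
    """Return the concept IDs mentioned in `text`, whichever synonym was used."""
    return {concept for concept, pattern in PATTERNS if pattern.search(text)}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each concept ID to the IDs of the documents that mention it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in annotate(text):
            index[concept].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Resolve the query to concept IDs, then return every matching document."""
    hits: set[str] = set()
    for concept in annotate(query):
        hits |= index.get(concept, set())
    return hits


docs = {
    "report-1": "Knock-down of presenilin-1 reduced secretion in the assay.",
    "report-2": "PSNL1 expression was unchanged.",
    "report-3": "No relevant genes were assessed.",
}
index = build_index(docs)
print(sorted(search(index, "PSEN1")))  # ['report-1', 'report-2']
```

Because documents and queries are mapped to the same concept ID, a search for any one synonym returns the documents that use any of the others.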


Semantic enrichment ensures that all relevant data is found, regardless of which synonym is used as the search term. SciBite not only makes it simpler to interrogate the information held in internal documents but also supports more complex, ontology-based questions, for example:

  • Find all references to project ABC-101, regardless of the syntax used by the author (e.g. ABC-101, ABC101 or ABC 101)
  • Find all experiments that reference a compound of interest used in combination with one or more other compounds of interest;
  • Find all experiments for a specific target across the organisation, regardless of which synonym was used by the author of the experiment;
  • Which projects are investigating potential biological therapeutics?
  • Which targets have we studied that are associated with inflammatory disorders?
  • Which diseases have we studied for both a target of interest and other targets in the same class, and what were the outcomes?
  • Which pre-clinical studies have utilised a specified mouse model?
  • Which experimental techniques are being used increasingly across the organisation and would benefit from a core facility?
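The first question in this list relies on recognising one project code written in several ways. Assuming, purely for illustration, that codes consist of three capital letters and three digits, optionally separated by a hyphen or a space, a regular expression can rewrite every variant into one canonical form. The pattern and the function names below are hypothetical, not part of SciBite’s platform.

```python
import re

# Hypothetical code scheme: three capitals, optional hyphen or space, three digits.
PROJECT_CODE = re.compile(r"\b([A-Z]{3})[-\s]?(\d{3})\b")


def normalise_project_codes(text: str) -> list[str]:
    """Return every project code in `text` rewritten in the canonical ABC-101 form."""
    return [f"{prefix}-{number}" for prefix, number in PROJECT_CODE.findall(text)]


def mentions_project(text: str, code: str = "ABC-101") -> bool:
    """True if `text` refers to `code` in any of the supported spellings."""
    return code in normalise_project_codes(text)


print(normalise_project_codes("Results for ABC101 and ABC 101 extend the ABC-101 data."))
# ['ABC-101', 'ABC-101', 'ABC-101']
```

A pattern this loose would also match unrelated strings of the same shape, which is one reason for registering real project codes in a custom vocabulary rather than relying on a pattern alone.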

SciBite accurately marks up all relevant terms and concepts within a document, enabling scientists to quickly identify the topics covered in an experiment, interpret the text and understand what a document is about.

SciBite enables users to perform searches for terms that co-occur within a sentence or within a document. For example, new avenues for research can be revealed by generating a list of genes that are mentioned most frequently with a disease of interest.
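Once each sentence has been annotated with concept IDs, ranking genes by how often they co-occur with a disease reduces to counting. The sketch below assumes the annotation step has already run and represents each sentence as a dict of `genes` and `diseases` sets; the function name and the placeholder IDs (`GENE_A`, `DISEASE_X` and so on) are invented for the example and are not SciBite output.

```python
from __future__ import annotations

from collections import Counter


def genes_cooccurring_with(sentences: list[dict[str, set[str]]], disease: str) -> list[tuple[str, int]]:
    """Rank genes by the number of sentences in which they appear with `disease`."""
    counts = Counter()
    for sentence in sentences:
        if disease in sentence["diseases"]:
            counts.update(sentence["genes"])  # each gene counted once per sentence
    return counts.most_common()


# Placeholder annotations, one dict per sentence.
annotated = [
    {"genes": {"GENE_A", "GENE_B"}, "diseases": {"DISEASE_X"}},
    {"genes": {"GENE_A"}, "diseases": {"DISEASE_X"}},
    {"genes": {"GENE_C"}, "diseases": {"DISEASE_X", "DISEASE_Y"}},
    {"genes": {"GENE_D"}, "diseases": {"DISEASE_Y"}},
]
print(genes_cooccurring_with(annotated, "DISEASE_X"))
# [('GENE_A', 2), ('GENE_B', 1), ('GENE_C', 1)]
```

Document-level co-occurrence works the same way if each unit is a whole document rather than a sentence; sentence-level co-occurrence is the stricter of the two criteria.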

SciBite can also summarise the content of one or more projects or studies. This summary can be presented as Spotfire dashboards and Linkurious network views, giving users insight into what a study is about without having to read all the associated documents.
