We understand that sometimes terminology can be ambiguous, particularly when it's technical, so we've put together this resource to ensure we're on the same page.


Application Programming Interface (API)

An application programming interface (API) is a defined set of rules and protocols that allows software applications to communicate with each other, for example to request data from, or send data to, another system.

Artificial intelligence

Artificial intelligence (AI) is the ability of a computer to perform intellectual processes commonly associated with human beings, such as the ability to reason, discover meaning, generalize, or learn from experience.


Big data

Big data refers to data sets that are too large or complex for traditional data-processing applications to manage. Big data is typically described using four attributes: Volume (the scale of the data), Velocity (the speed at which data is generated and needs to be processed), Variety (the range of different data types) and Veracity (the uncertainty or trustworthiness of the data). A fifth attribute, Value (whether the data will help deliver valuable insights), is also sometimes used.


Drug repositioning

The goal of drug repositioning is to discover new uses for drugs to treat clinical indications other than those for which they were originally intended.


Hadoop

Apache Hadoop is an open source framework that manages data processing and storage for big data applications running in clustered systems.
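
Hadoop's original processing layer follows the MapReduce model: a map step emits key-value pairs from each input split, the framework groups the pairs by key (the shuffle), and a reduce step aggregates each group. The sketch below imitates those three phases in plain Python on a single machine to illustrate the idea only; it does not use Hadoop or any Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Aggregate all the values emitted for one key."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a real cluster each split would be mapped in parallel on a different node.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["aspirin inhibits COX1", "ibuprofen inhibits COX2"]
    print(word_count(splits))
```

Because the map and reduce functions only see one split or one key at a time, the work can be spread across many machines, which is what makes the model suit clustered systems.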

Heterogeneous data

Data that lacks standardisation or consistency and has a high degree of variability in terms of both type and format. This variety in the data is often a barrier to integrating disparate data sources.
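
One common way to integrate heterogeneous sources is to map each source's field names and formats onto a single shared schema. The sketch assumes two hypothetical sources ("lab_a" and "lab_b") that record the same information under different field names and date formats; the source names, fields and target schema are all invented for illustration.

```python
from datetime import datetime

# Hypothetical per-source mappings: source field name -> shared field name.
FIELD_MAPS = {
    "lab_a": {"compound": "drug", "test_date": "date", "result": "value"},
    "lab_b": {"DrugName": "drug", "Date": "date", "Measurement": "value"},
}

# Each hypothetical source writes dates in its own format.
DATE_FORMATS = {"lab_a": "%d/%m/%Y", "lab_b": "%Y-%m-%d"}


def harmonise(record: dict, source: str) -> dict:
    """Rename fields and normalise values so records from any source share one schema."""
    mapping = FIELD_MAPS[source]
    out = {mapping[key]: value for key, value in record.items() if key in mapping}
    out["drug"] = out["drug"].strip().lower()
    out["date"] = datetime.strptime(out["date"], DATE_FORMATS[source]).date().isoformat()
    out["value"] = float(out["value"])
    out["source"] = source
    return out


if __name__ == "__main__":
    print(harmonise({"compound": "Aspirin", "test_date": "03/02/2021", "result": "4.2"}, "lab_a"))
    print(harmonise({"DrugName": "aspirin ", "Date": "2021-02-03", "Measurement": 4.5}, "lab_b"))
```

Both records end up with the same field names, the same drug spelling and ISO dates, so they can be stored and queried together.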


Keyword search

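Keyword searches are limited to exact matches of the words the author happened to use, so documents that describe the same concept in different terms, such as a synonym, are missed. This creates a high risk that something important will be overlooked.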
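
The limitation is easy to reproduce. The toy example below runs a case-insensitive exact-match search over three invented sentences: a search for "heart attack" misses the sentence that says "myocardial infarction", while a variant that expands the query with a hand-written synonym list finds it. The synonym list is an assumption standing in for a curated vocabulary.

```python
DOCUMENTS = [
    "Patient admitted after a heart attack.",
    "History of myocardial infarction in 2019.",
    "No cardiac events reported.",
]

# Hypothetical synonym list; a real system would draw on a curated vocabulary.
SYNONYMS = {"heart attack": ["myocardial infarction"]}


def keyword_search(query: str, docs: list[str]) -> list[int]:
    """Return indices of documents that contain the query string exactly (ignoring case)."""
    q = query.lower()
    return [i for i, doc in enumerate(docs) if q in doc.lower()]


def expanded_search(query: str, docs: list[str]) -> list[int]:
    """Search for the query and for each of its listed synonyms."""
    terms = [query] + SYNONYMS.get(query.lower(), [])
    hits = {i for term in terms for i in keyword_search(term, docs)}
    return sorted(hits)


if __name__ == "__main__":
    print(keyword_search("heart attack", DOCUMENTS))   # [0]
    print(expanded_search("heart attack", DOCUMENTS))  # [0, 1]
```

Plain substring matching like this is deliberately naive; it also shows why short abbreviations are risky, since a term such as "MI" would match inside unrelated words.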

Machine learning

Machine learning is an application of artificial intelligence (AI) based on the premise that systems can learn, improve and make predictions from experience without being explicitly programmed.
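
To make "learning from experience" concrete: instead of hard-coding a rule, the program below estimates the slope and intercept of a straight line from example (x, y) pairs using ordinary least squares, then uses the fitted line to predict an unseen input. This is one of the simplest possible learning algorithms, chosen purely for illustration, and the training data are made up.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Learn a slope and intercept from examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: tuple[float, float], x: float) -> float:
    """Apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Invented training data that roughly follows y = 2x + 1.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.1, 4.9, 7.2, 8.8]
    model = fit_line(xs, ys)
    print(model, predict(model, 5.0))
```

Nothing in the code states that y is about 2x + 1; the parameters come from the examples, and supplying different examples would produce a different model.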


Named entity recognition

Named entity recognition (NER) is the ability to identify and extract relevant terms found in scientific text. NER transforms unstructured content into rich, machine-readable data.
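
Production NER systems rely on large curated vocabularies and/or statistical models, but the core idea can be sketched with a small dictionary: scan the text for known terms and return each match with its entity type and character offsets. The dictionary below is a hypothetical stand-in, and matching is a simple case-insensitive, whole-word regular expression.

```python
import re

# Hypothetical mini-dictionary mapping a term to its entity type.
DICTIONARY = {
    "aspirin": "DRUG",
    "ptgs2": "GENE",
    "rheumatoid arthritis": "DISEASE",
}

# Try longer terms first so multi-word names win over shorter overlapping terms.
_TERMS = sorted(DICTIONARY, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _TERMS)) + r")\b", re.IGNORECASE)


def recognise_entities(text: str) -> list[tuple[str, str, int, int]]:
    """Return (matched text, entity type, start offset, end offset) for every hit."""
    return [
        (m.group(0), DICTIONARY[m.group(0).lower()], m.start(), m.end())
        for m in _PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for entity in recognise_entities("Aspirin inhibits PTGS2 in rheumatoid arthritis."):
        print(entity)
```

The output is machine-readable: each mention carries a type and a position, which downstream steps can index or link to other data.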


Ontologies

Ontologies contain the terms associated with a domain and encapsulate a common model of knowledge about that domain. Ontologies are organised as a hierarchy to represent the relationships between scientifically related terms, such as ‘inflammatory diseases’ or ‘DNA binding proteins’.
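
Because an ontology is a hierarchy, a query for a broad term can be answered by walking down to all of its narrower terms. The sketch below stores a tiny, hand-made fragment of a disease hierarchy as child-to-parent links and collects every descendant of "inflammatory diseases". The fragment is illustrative only and is not taken from any published ontology.

```python
from collections import defaultdict

# Hypothetical fragment: each term maps to its broader (parent) term.
PARENT = {
    "inflammatory bowel disease": "inflammatory diseases",
    "crohn's disease": "inflammatory bowel disease",
    "ulcerative colitis": "inflammatory bowel disease",
    "rheumatoid arthritis": "inflammatory diseases",
    "inflammatory diseases": "diseases",
}


def children_index(parent: dict[str, str]) -> dict[str, list[str]]:
    """Invert child -> parent links into parent -> children lists."""
    children: dict[str, list[str]] = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    return children


def descendants(term: str, parent: dict[str, str]) -> set[str]:
    """Return every term below `term` in the hierarchy using a depth-first walk."""
    children = children_index(parent)
    found: set[str] = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


if __name__ == "__main__":
    print(sorted(descendants("inflammatory diseases", PARENT)))
```

Real ontologies often let a term have several parents; this sketch keeps one parent per term for brevity, although the `found` set would already guard against visiting a term twice in a multi-parent graph.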


Pharmacovigilance

Pharmacovigilance, also known as drug safety, relates to the detection, assessment, monitoring and prevention of adverse effects associated with pharmaceutical products.


Semantic enrichment

The process of semantic enrichment applies an explicit meaning and description to all scientific terms found within a database or document. It enables complex scientific text to be contextualised so that it can be understood and used as high quality, actionable data, irrespective of its source.
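
One way to picture enrichment is attaching a stable concept identifier to each recognised term, so that different wordings of the same concept resolve to the same ID. The sketch assumes a small hand-built vocabulary in which each synonym points to an invented identifier (`DIS:0001`, `DRG:0001`); real systems would use identifiers from curated ontologies.

```python
import re

# Hypothetical vocabulary: every synonym points to one invented concept ID.
SYNONYM_TO_ID = {
    "heart attack": "DIS:0001",
    "myocardial infarction": "DIS:0001",
    "aspirin": "DRG:0001",
    "acetylsalicylic acid": "DRG:0001",
}

# Preferred label for each concept ID (also hypothetical).
PREFERRED_LABEL = {"DIS:0001": "myocardial infarction", "DRG:0001": "aspirin"}

# Longest synonyms first so multi-word terms are matched whole.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(SYNONYM_TO_ID, key=len, reverse=True))) + r")\b",
    re.IGNORECASE,
)


def enrich(text: str) -> dict:
    """Attach the concept IDs and preferred labels found in a piece of text."""
    ids = sorted({SYNONYM_TO_ID[m.group(0).lower()] for m in _PATTERN.finditer(text)})
    return {"text": text, "concepts": [(cid, PREFERRED_LABEL[cid]) for cid in ids]}


if __name__ == "__main__":
    print(enrich("Aspirin after a heart attack"))
    print(enrich("Acetylsalicylic acid following myocardial infarction"))
```

The two sentences share no keywords, yet both are annotated with the same pair of concepts, which is what lets enriched text be searched and combined irrespective of its source.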

Semantic patterns

Semantic patterns describe a relationship between two concepts, such as a gene and drug, in the form Gene-Verb-Drug.
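
A pattern of the form Gene-Verb-Drug is a template over entity types rather than literal words. The sketch assumes sentences have already been tagged by an NER step (here the tags are written by hand) and looks for a GENE token, followed within a few tokens by a verb from a chosen list, followed within a few tokens by a DRUG token. The verb list, gap size and tag names are illustrative assumptions, not a description of any particular product.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    tag: str  # entity type from a prior NER step, or "O" for ordinary words


# Hypothetical verbs that express the relation of interest.
VERBS = {"inhibited", "targeted", "blocked"}


def match_gene_verb_drug(tokens: list[Token], max_gap: int = 3) -> list[tuple[str, str, str]]:
    """Find GENE ... VERB ... DRUG triples with at most `max_gap` tokens between parts."""
    matches = []
    for i, tok in enumerate(tokens):
        if tok.tag != "GENE":
            continue
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if tokens[j].text.lower() not in VERBS:
                continue
            for k in range(j + 1, min(j + 2 + max_gap, len(tokens))):
                if tokens[k].tag == "DRUG":
                    matches.append((tok.text, tokens[j].text, tokens[k].text))
                    break
            break  # only consider the first verb after each gene
    return matches


if __name__ == "__main__":
    sentence = [Token("BRAF", "GENE"), Token("is", "O"), Token("targeted", "O"),
                Token("by", "O"), Token("vemurafenib", "DRUG")]
    print(match_gene_verb_drug(sentence))
```

Swapping in a different verb list turns the same function into a different pattern, which is the sense in which patterns describe relationships rather than fixed phrases.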

Semantic search

Semantic search enables the natural language used in scientific text to be queried by understanding the intent of the searcher, the context of the query and the relationships between the words in the query.

Semi-structured data

Data that doesn’t reside in a structured database but that can be organised to some degree to make it simpler to find and analyse. Examples include file types such as Excel spreadsheets, XML and JSON. Data held in NoSQL databases is also often considered semi-structured.
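
JSON illustrates why such data is only partly structured: each record carries its own field names and nesting, but fields can be missing or differ between records. The sketch uses Python's standard `json` module to parse two invented records and flatten them into rows with a fixed set of columns, filling any gaps with `None`. The field names are hypothetical.

```python
import json

# Two invented records: the second lacks "targets" and has an empty "meta" object.
RAW = """
[
  {"id": 1, "drug": "aspirin", "targets": ["PTGS1", "PTGS2"], "meta": {"source": "lab_a"}},
  {"id": 2, "drug": "ibuprofen", "meta": {}}
]
"""


def flatten(record: dict) -> dict:
    """Map a nested, possibly incomplete JSON record onto fixed columns."""
    return {
        "id": record.get("id"),
        "drug": record.get("drug"),
        "targets": ";".join(record.get("targets", [])) or None,
        "source": record.get("meta", {}).get("source"),
    }


rows = [flatten(r) for r in json.loads(RAW)]

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Once every record has the same columns, the rows can be loaded into a table and treated as structured data.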

Structured data

Data that has been organised into a formatted repository, typically a relational database or data warehouse, so that it is accessible for downstream processing and analysis.
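
Because structured data lives in tables with a fixed schema, it can be queried directly with a language such as SQL. The sketch uses Python's built-in `sqlite3` module with a throwaway in-memory database; the table name, columns and rows are chosen for illustration only.

```python
import sqlite3


def targets_of(conn: sqlite3.Connection, drug: str) -> list[str]:
    """Look up a drug's targets with a parameterised SQL query."""
    rows = conn.execute(
        "SELECT target FROM interactions WHERE drug = ? ORDER BY target", (drug,)
    ).fetchall()
    return [target for (target,) in rows]


# Illustrative in-memory table with a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (drug TEXT NOT NULL, target TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO interactions (drug, target) VALUES (?, ?)",
    [("aspirin", "PTGS1"), ("aspirin", "PTGS2"), ("ibuprofen", "PTGS1"), ("ibuprofen", "PTGS2")],
)
conn.commit()

if __name__ == "__main__":
    print(targets_of(conn, "aspirin"))
```

Every row has the same columns and types, so questions like "which drugs share a target?" become a single query rather than a parsing exercise.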

Synonyms

Synonyms are alternative terms used to describe the same thing, such as ‘heart attack’ and ‘myocardial infarction’. This variation in terminology presents a challenge for computational techniques such as keyword search.


TExpress Bundles

TExpress Bundles enable multiple semantic patterns that encompass different ways of describing the same thing to be aggregated and run across the same data simultaneously.


Unstructured data

Data that doesn’t fit neatly into a database is considered to be unstructured. Examples of unstructured data include scientific articles, word processing documents, presentations, images, emails, web pages and blogs. Unstructured data presents a problem for many computational techniques, yet around 80% of data is considered unstructured. Hence, the ability to mine unstructured data has the potential to deliver enormous business benefits.

How could the SciBite semantic platform help you?

Get in touch with us to find out how we can transform your data