TERMinology

We understand that sometimes terminology can be ambiguous, particularly when it’s technical, so we’ve put together this resource to ensure we’re on the same page.

A

Algorithm

An algorithm is a systematic, logical approach to solving problems, often implemented in programming languages to perform tasks such as data analysis, pattern recognition, and decision-making.

Application Programming Interface (API)

An application programming interface is a set of defined protocols, tools, and standards that enable different software applications to communicate with each other. APIs provide a structured way to access the functionalities and services of an external system, without needing to understand its internal workings. This abstraction facilitates interoperability, modularity, and the integration of diverse systems.
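
As an illustrative sketch only (the endpoint URL, parameters, and response fields below are hypothetical, not a real SciBite API), a client might call a REST API over HTTP and parse the JSON response without knowing anything about the server's internal implementation:

```python
import requests  # third-party HTTP client, assumed installed

# Hypothetical endpoint and parameters, for illustration only
response = requests.get(
    "https://api.example.com/v1/terms",
    params={"query": "myocardial infarction"},
    timeout=10,
)
response.raise_for_status()  # fail loudly on HTTP errors

# Assumes the service returns a JSON body with a "results" list
for term in response.json()["results"]:
    print(term["id"], term["label"])
```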

Artificial Intelligence (AI)

Artificial intelligence is an interdisciplinary field of computer science and engineering dedicated to the development of algorithms and systems that exhibit cognitive functions typically associated with human intelligence. These functions include learning from experience, understanding complex data, reasoning through problems, and making autonomous decisions.

Augmented Generation

Augmented generation is a technique in artificial intelligence where the generative capabilities of models are supplemented with external information or constraints. This can include integrating real-time data, user preferences, or domain-specific knowledge to produce more tailored and effective results. For example, a system generating medical reports uses Augmented Generation by combining AI-generated summaries with patient-specific data from electronic health records (EHRs). This ensures the reports are tailored to each patient’s unique medical history, providing more precise and relevant information.

B

Big Data

Big data refers to data sets that are too large or complex for traditional data-processing applications to manage. Big data is typically described using four attributes: Volume (the scale of the data), Velocity (the speed at which data is generated and must be processed), Variety (the range of different data types) and Veracity (the uncertainty or trustworthiness of the data). A fifth attribute, Value, is sometimes added: whether the data can deliver valuable insights.

C

Clinical Trials

A clinical trial is a controlled study conducted on human subjects to evaluate the safety, efficacy, and pharmacokinetics of new medical interventions, including drugs and devices. These trials are conducted in phases (I, II, III, IV) to assess the intervention under various conditions.

Cloud Computing

Cloud computing is a paradigm that enables ubiquitous, on-demand access to a shared pool of configurable computing resources, such as servers, storage, applications, and services. These resources can be rapidly provisioned and released with minimal management effort or service provider interaction, facilitating scalability, flexibility, and cost-efficiency.

D

Data Curation

Data curation involves the active and ongoing management of data throughout its lifecycle, including collection, annotation, validation, storage, and preservation. This process ensures data quality, accessibility, and reusability for future research and analysis. For example, curators collect and standardize cancer-related data to ensure consistent terminology across sources.

Dataset

A dataset is an organized collection of data, typically structured in rows and columns, where each row represents a record, and each column represents a variable.

Drug Repositioning

The goal of drug repositioning is to discover new uses for existing drugs, treating clinical indications other than those for which they were originally intended.

E

Embeddings

An embedding is a technique in machine learning that converts data, such as words or sentences, into numerical vectors that capture their meaning and relationships. For instance, the disease “diabetes” and its treatment “insulin” can be represented as vectors in a multi-dimensional space. These vectors would be close to each other, indicating a strong relationship.
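
A minimal sketch of that idea, using made-up three-dimensional vectors (real models produce hundreds or thousands of dimensions) and NumPy to measure how close two embeddings are via cosine similarity:

```python
import numpy as np

# Toy embeddings, invented for illustration
diabetes = np.array([0.82, 0.10, 0.55])
insulin  = np.array([0.78, 0.15, 0.60])
rainfall = np.array([0.05, 0.90, 0.12])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(diabetes, insulin))   # high: related concepts
print(cosine_similarity(diabetes, rainfall))  # low: unrelated concepts
```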

Entity Relationship Model (ERM)

An entity relationship model is a conceptual framework used to represent the structure and relationships of entities within a system. It identifies entities and the relationships between them, providing a clear visualization of how various parts of a system interact and depend on each other.

F

FAIR Principles

The FAIR Principles are a set of guidelines to ensure that data is Findable, Accessible, Interoperable, and Reusable. These principles help organizations manage their data more effectively, making it easier to share and utilize. In life sciences, the FAIR Principles ensure that research data, such as genetic information or clinical studies, are organized and accessible. This supports data sharing and collaboration, enhancing the reproducibility and impact of scientific research.

G

Generative AI / GenAI

Generative AI or GenAI is a type of artificial intelligence that can create new content, such as text, images, or music, based on patterns it has learned from existing data. This technology can help businesses automate content creation, enhance creativity, and improve customer experiences.

H

Hallucinations

Hallucinations in large language models (LLMs) refer to instances where the models produce outputs that are fabricated or incorrect, lacking grounding in the training data or real-world knowledge.

Heterogeneous Data

Data that lacks standardization or consistency and has a high degree of variability in both type and format. This variability is often a barrier to integrating disparate data sources.

I

Information Retrieval (IR)

Information retrieval is the process of locating relevant information from vast amounts of data, improving decision-making and efficiency by providing the right information at the right time.

K

Keyword Search

Keyword search is a technique used to find specific information by entering precise words or phrases into a search system, which then retrieves results that match those terms exactly.
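
A bare-bones illustration of exact keyword matching over a handful of invented example documents; note that the literal match misses documents that use a synonym for the query term:

```python
documents = [
    "Aspirin reduces the risk of myocardial infarction.",
    "Patients with diabetes were treated with insulin.",
    "Heart attack symptoms include chest pain.",
]

def keyword_search(query: str, docs: list[str]) -> list[str]:
    """Return documents containing the exact query string (case-insensitive)."""
    return [d for d in docs if query.lower() in d.lower()]

# Matches only the literal phrase; "heart attack" is not retrieved
print(keyword_search("myocardial infarction", documents))
```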

Knowledge Graph (KG)

A knowledge graph structures information into a network of interconnected data points (nodes) and their relationships (edges), providing a clear and intuitive way to access and analyze complex information.
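
As a rough sketch, a tiny knowledge graph can be represented as a set of (node, relationship, node) triples and queried by walking the edges; the entities and relationships below are illustrative only:

```python
# Each triple is (subject, relationship, object)
triples = [
    ("metformin", "TREATS", "type 2 diabetes"),
    ("type 2 diabetes", "IS_A", "metabolic disease"),
    ("metformin", "IS_A", "biguanide"),
]

def neighbours(node: str) -> list[tuple[str, str]]:
    """Return (relationship, target) pairs for edges leaving a node."""
    return [(rel, obj) for subj, rel, obj in triples if subj == node]

print(neighbours("metformin"))
# [('TREATS', 'type 2 diabetes'), ('IS_A', 'biguanide')]
```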

L

Large Language Models (LLMs)

LLMs are a type of artificial intelligence model that can recognize and generate human-like text by processing vast amounts of language data, enabling them to assist with tasks like writing, summarizing, translating, and answering questions.

M

Machine Learning (ML)

Machine learning is an application of artificial intelligence (AI) based on the premise that systems can learn, improve and make predictions from experience without being explicitly programmed.

N

Named Entity Recognition (NER)

Named entity recognition is the ability to identify and extract relevant terms found in scientific text. NER transforms unstructured content into rich, machine-readable data.
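
A deliberately simplified, dictionary-based sketch of the idea; real NER systems use far richer vocabularies plus disambiguation, and the mini-vocabulary below is invented:

```python
# Tiny illustrative vocabulary mapping surface forms to entity types
vocabulary = {
    "myocardial infarction": "INDICATION",
    "heart attack": "INDICATION",
    "aspirin": "DRUG",
}

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (term, entity_type) pairs found in the text, longest terms first."""
    found = []
    lowered = text.lower()
    for term in sorted(vocabulary, key=len, reverse=True):
        if term in lowered:
            found.append((term, vocabulary[term]))
    return found

print(tag_entities("Aspirin is used after a heart attack."))
# [('heart attack', 'INDICATION'), ('aspirin', 'DRUG')]
```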

O

Ontologies

Ontologies contain the terms associated with a domain and encapsulate a common model of knowledge associated with that domain. Ontologies are organised as a hierarchy to represent the relationships between scientifically-related terms, such as ‘inflammatory diseases’ or ‘DNA binding proteins’.
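
A minimal sketch of such a hierarchy as parent/child relationships, with a helper that walks up to all broader terms; the disease terms are illustrative:

```python
# child -> parent relationships in a toy disease hierarchy
parent_of = {
    "rheumatoid arthritis": "inflammatory disease",
    "asthma": "inflammatory disease",
    "inflammatory disease": "disease",
}

def ancestors(term: str) -> list[str]:
    """Walk up the hierarchy collecting every broader term."""
    result = []
    while term in parent_of:
        term = parent_of[term]
        result.append(term)
    return result

print(ancestors("rheumatoid arthritis"))
# ['inflammatory disease', 'disease']
```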

Ontology-Based Information Retrieval (OBIR)

Ontology-based information retrieval is a method of searching unstructured or semi-structured text using a structured framework of concepts and relationships, leading to more accurate and relevant results by understanding context and meaning.

Ontology Mapping

Ontology mapping is the process of establishing relationships between entities in different ontologies. This integration allows users to align and reconcile knowledge from various sources, making data more interoperable. For instance, mouse anatomy (MA) terms can be mapped to human anatomy terms within the target National Cancer Institute (NCIt) ontology.
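
Conceptually, a mapping can be as simple as a lookup from identifiers in one ontology to equivalent identifiers in another; the identifiers and labels below are placeholders, not real MA or NCIt codes:

```python
# Hypothetical cross-ontology mapping: source ID -> (target ID, target label)
ma_to_ncit = {
    "MA:0000001": ("NCIT:C00001", "Lung"),
    "MA:0000002": ("NCIT:C00002", "Kidney"),
}

def map_term(source_id: str):
    """Return the mapped target term, or None if no mapping exists."""
    return ma_to_ncit.get(source_id)

print(map_term("MA:0000001"))  # ('NCIT:C00001', 'Lung')
```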

OWL (Web Ontology Language)

OWL is a language for creating and sharing ontologies. It can be expressed in formats such as RDF/XML and Turtle, and allows for automated reasoning and the representation of complex relationships in ontologies.

P

Pharmacovigilance

Pharmacovigilance, also known as drug safety, relates to the detection, assessment, monitoring and prevention of adverse effects associated with pharmaceutical products.

R

Retrieval Augmented Generation (RAG)

Retrieval augmented generation is a technique where a generative AI model first retrieves relevant documents or pieces of information from a large dataset and then uses this retrieved information to generate its output. This approach enhances the performance of LLMs by grounding their text generation in specific, relevant data, thereby improving the accuracy and relevance of the output.
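
A highly simplified sketch of the retrieve-then-generate flow. The retrieval step here is a naive keyword-overlap score (a real system would typically use vector-based retrieval), and call_llm is a placeholder for whatever language model the system actually uses:

```python
documents = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "Insulin therapy is used when oral agents are insufficient.",
    "Aspirin reduces the risk of myocardial infarction.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an API request to a hosted model)."""
    return f"[generated answer grounded in a prompt of {len(prompt)} characters]"

query = "How is type 2 diabetes treated?"
context = "\n".join(retrieve(query, documents))
answer = call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
print(answer)
```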

S

Semantic Enrichment

The process of semantic enrichment applies an explicit meaning and description to all scientific terms found within a database or document. It enables complex scientific text to be contextualised so that it can be understood and used as high quality, actionable data, irrespective of its source.

Semantic Patterns

Semantic patterns describe a relationship between two concepts, such as a gene and a drug, in the form Gene-Verb-Drug.
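
As a toy illustration, a Gene-Verb-Drug pattern can be expressed as a regular expression over text in which entities have already been tagged; the tagging format and sentence below are invented for this sketch:

```python
import re

# Sentence where NER has already wrapped entities in GENE{...} and DRUG{...} tags
sentence = "GENE{EGFR} is inhibited by DRUG{gefitinib} in lung cancer cell lines."

# Pattern: a gene, an active or passive verb phrase, then a drug
pattern = re.compile(
    r"GENE\{(?P<gene>[^}]+)\}\s+(?P<verb>is \w+ by|\w+s)\s+DRUG\{(?P<drug>[^}]+)\}"
)

match = pattern.search(sentence)
if match:
    print(match.group("gene"), "-", match.group("verb"), "-", match.group("drug"))
# EGFR - is inhibited by - gefitinib
```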

Semantic Search

Semantic search enables the natural language used in scientific text to be queried by understanding the intent of the searcher, the context of the query and the relationships between words in the query.

Semi-Structured Data

Data that doesn’t reside in a structured database but that can be organised to some degree to make it simpler to find and analyse. Examples include file types such as Excel spreadsheets, XML and JSON. NoSQL databases are also considered semi-structured.

SKOS (Simple Knowledge Organization System)

SKOS is a W3C standard designed for representing knowledge organization systems such as thesauri, classification schemes, taxonomies, and subject heading systems. It defines a model or framework for describing common features of controlled terminology, such as standard label predicates (preferred label, alternative label) and taxonomic relationships.
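
A small sketch using the rdflib library (assumed installed) to express a preferred label, an alternative label, and a broader relationship for a made-up concept, then print it as Turtle:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/vocab/")  # hypothetical vocabulary namespace
g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

# A concept with a preferred label, an alternative label, and a broader concept
g.add((EX.MyocardialInfarction, SKOS.prefLabel, Literal("myocardial infarction", lang="en")))
g.add((EX.MyocardialInfarction, SKOS.altLabel, Literal("heart attack", lang="en")))
g.add((EX.MyocardialInfarction, SKOS.broader, EX.CardiovascularDisease))

print(g.serialize(format="turtle"))
```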

Structured Query Language (SQL)

SQL is a standard language used to manage and manipulate databases. It allows businesses to store, retrieve, and update data efficiently, making it easier to handle large data flows and generate reports.
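
A minimal sketch using Python’s built-in sqlite3 module; the table and rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE drugs (name TEXT, indication TEXT)")
conn.executemany(
    "INSERT INTO drugs VALUES (?, ?)",
    [("metformin", "type 2 diabetes"), ("aspirin", "myocardial infarction")],
)

# A simple SQL query: retrieve drugs for a given indication
for row in conn.execute("SELECT name FROM drugs WHERE indication = ?", ("type 2 diabetes",)):
    print(row[0])  # metformin
```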

SciBite Search Query Language (SSQL)

SSQL is an expressive query language designed for advanced search capabilities within the SciBite platform. It enables users to construct complex search queries to retrieve precise information.

Structured Data

Data that has been organized into a formatted repository, typically a relational database or data warehouse, so that it is accessible for downstream processing and analysis.

Synonyms

Synonyms are alternative terms used to describe the same thing, such as ‘heart attack’ and ‘myocardial infarction’. This variation presents a challenge for computational techniques.

U

Unstructured Data

Data that doesn’t fit neatly in a database is considered to be unstructured. Examples of unstructured data include scientific articles, word processing documents, presentations, images, emails, web pages and blogs. Unstructured data presents a problem for many computational techniques, yet around 80% of data is considered unstructured. Hence, the ability to mine unstructured data has the potential to deliver enormous business benefits.

User Experience (UX)

User experience is the process of designing and evaluating a product or service to ensure it is user-friendly, efficient, and satisfying to use. It involves understanding user behaviors, needs, and motivations through research and testing.

Usability Testing

Usability testing is a process where real users try out a product, like a tool or app, to evaluate how easy and intuitive it is to use. The goal is to identify any problems or areas for improvement to ensure a better user experience.

V

Vector-Based Information Retrieval (VBIR)

Vector-based information retrieval is a method used to find and rank relevant documents or data by converting text into multi-dimensional numerical values, called vectors. The similarity between the query and documents is then measured using mathematical functions, allowing the system to identify and rank the most relevant documents based on their vector representations.
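
A sketch of the ranking step, using NumPy and made-up document and query vectors; a real system would produce these vectors with an embedding model:

```python
import numpy as np

# Pretend each document has already been converted into a vector by an embedding model
doc_vectors = {
    "doc1: insulin dosing in diabetes": np.array([0.80, 0.10, 0.55]),
    "doc2: rainfall patterns in forests": np.array([0.05, 0.90, 0.10]),
    "doc3: metformin and blood glucose": np.array([0.75, 0.20, 0.60]),
}
query_vector = np.array([0.82, 0.12, 0.58])  # hypothetical vector for "treatments for diabetes"

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query vector, most relevant first
ranked = sorted(doc_vectors.items(), key=lambda kv: cosine(query_vector, kv[1]), reverse=True)
for name, _ in ranked:
    print(name)
```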

Richard Harrison
Senior Manager, Portfolio Marketing, SciBite

Richard is a seasoned marketing professional with over two decades of experience in the information services and life sciences sectors. Currently, he is the Senior Manager, Portfolio Marketing at Elsevier’s SciBite, where he drives strategic campaigns and harnesses data-driven strategies to amplify the platform’s online visibility and impact.
