Harnessing our latest VOCab: Emtree

The 6.5.2 release of SciBite’s VOCabs introduces a range of new VOCab packs, as well as updates to existing vocabularies. In this blog series, we’ll introduce each of the new VOCabs: IDMP, a new Sequence Ontology VOCab in the Genotype-Phenotype VOCab pack, and, first up, the new Emtree VOCab pack.


What is Emtree?

Originally created by biomedical subject-matter experts, Emtree is Elsevier’s authoritative life science thesaurus. It includes over 90,000 entities covering a wide range of topics: drugs and medical devices, diseases and biological functions, and medical procedures, as well as broader topics in society and the environment.


Figure 1: Emtree’s comprehensive hierarchical structure.

Emtree is used to index the full text of journal articles in Embase, Elsevier’s comprehensive biomedical and pharmacological literature database, in a consistent and uniform manner. This helps researchers keep pace with the ever-accelerating rate of research and development, whether they are conducting systematic literature reviews, monitoring drug safety, or complying with medical device regulations.

Enriching and optimizing Emtree for text analytics

SciBite further enriches and optimizes the Emtree thesaurus for text analytics, using a combination of proprietary tools, rules, additional vocabularies, and manual curation. The result is our Emtree VOCab pack, which includes over 2.5 million synonyms. For example, the SciBite-curated version of the term ‘HEPA filter’ has 72 synonyms, compared with 17 in the original Emtree; a selection of them is shown in Figure 2.


Figure 2: A selection of synonyms for HEPA filter.

Importantly, SciBite curation takes context into account when applying meaning to unstructured text, so that matches are relevant. For example, medical device names often include common words (e.g., Advance, ATOMS, Oxford, Exponent, Absolute). Curators address this problem with a variety of techniques: making terms case-sensitive; adding sub-synonyms, which are matched only if another synonym for the same concept is present in the text; and defining ‘booster’ words, such as the type of device or an associated surgical procedure, to identify the most relevant mentions.
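To make these three techniques concrete, here is a minimal Python sketch of context-aware matching. It is an illustration under stated assumptions, not SciBite’s implementation or the TERMite API: the Concept structure, its field names, the concept IDs, and the example entries are all hypothetical. Case-sensitive terms must match with exact casing, sub-synonyms are counted only when a primary synonym for the same concept has already matched, and booster words add to a simple confidence score.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical vocabulary entry; field names are illustrative only."""
    concept_id: str
    synonyms: list[str]  # matched case-insensitively
    case_sensitive: list[str] = field(default_factory=list)  # matched only with exact casing
    sub_synonyms: list[str] = field(default_factory=list)  # counted only if a primary synonym matched
    boosters: list[str] = field(default_factory=list)  # context words that add confidence


def found(term: str, text: str, ignore_case: bool = True) -> bool:
    """Whole-word search for a term in the text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(rf"\b{re.escape(term)}\b", text, flags) is not None


def annotate(text: str, vocab: list[Concept]) -> dict[str, dict]:
    """Return matched terms and a booster count for each concept found in the text."""
    hits = {}
    for concept in vocab:
        primary = [s for s in concept.synonyms if found(s, text)]
        primary += [s for s in concept.case_sensitive if found(s, text, ignore_case=False)]
        if not primary:
            continue  # sub-synonyms alone never trigger a match
        secondary = [s for s in concept.sub_synonyms if found(s, text)]
        boost = sum(found(b, text) for b in concept.boosters)
        hits[concept.concept_id] = {"matched": primary + secondary, "boost": boost}
    return hits


# Hypothetical entries modelled on the device names mentioned above.
VOCAB = [
    Concept("DEVICE:ATOMS", synonyms=["adjustable transobturator male system"],
            case_sensitive=["ATOMS"], boosters=["incontinence", "sling"]),
    Concept("DEVICE:OXFORD_KNEE", synonyms=["Oxford partial knee"],
            sub_synonyms=["Oxford"], boosters=["arthroplasty"]),
]

print(annotate("The ATOMS device was implanted to treat incontinence.", VOCAB))
print(annotate("Researchers in Oxford studied atoms.", VOCAB))  # no device mention
```

With these entries, "ATOMS" in capitals next to "incontinence" is annotated with a booster count of 1, while "Oxford" and lowercase "atoms" on their own produce no annotation. A production system would use far richer rules; the point here is only how each rule narrows or strengthens a match.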

Emtree, a common search standard across all of your data

Until now, the use of Emtree has been confined to indexing the articles within Embase. The Emtree VOCab expands existing use cases and opens up a host of new opportunities, including aligning your data to this common standard, expanding context-aware search results, and powering inference-based discovery and predictive analysis.

With the new Emtree VOCab and TERMite, SciBite’s entity extraction engine, users can for the first time annotate any text-based dataset so that it aligns with the Emtree standard. This allows you to search multiple data sources through a common interface such as SciBite Search, SciBite’s next-generation scientific search and analytics platform, which offers powerful interrogation and analysis of structured and unstructured data from both public and proprietary sources.
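Why does a common standard make cross-source search possible? The minimal Python sketch below illustrates the idea. Once documents from different sources are annotated against the same concept IDs, a single concept query retrieves every document that mentions that concept, however it was worded. The synonym map, concept IDs, source names, and documents are hypothetical, and simple substring matching stands in for a real entity extraction engine such as TERMite.

```python
from collections import defaultdict

# Hypothetical synonym-to-concept map; the IDs are illustrative, not real Emtree IDs.
SYNONYMS = {
    "hepa filter": "CONCEPT:HEPA_FILTER",
    "high efficiency particulate air filter": "CONCEPT:HEPA_FILTER",
    "heart attack": "CONCEPT:MI",
    "myocardial infarction": "CONCEPT:MI",
}

# Documents from different (hypothetical) sources: (source, document ID, text).
DOCS = [
    ("publications", "P1", "Patients with myocardial infarction were followed up."),
    ("trials", "T7", "Enrolled adults recovering after a heart attack."),
    ("internal_reports", "R3", "A HEPA filter was installed in the clean room."),
]


def build_index(docs, synonyms):
    """Map each concept ID to the (source, document ID) pairs that mention it."""
    index = defaultdict(set)
    for source, doc_id, text in docs:
        lowered = text.lower()
        for synonym, concept in synonyms.items():
            if synonym in lowered:  # naive substring match, for illustration only
                index[concept].add((source, doc_id))
    return index


index = build_index(DOCS, SYNONYMS)
print(sorted(index["CONCEPT:MI"]))  # [('publications', 'P1'), ('trials', 'T7')]
```

A search for the myocardial infarction concept returns both the publication and the trial record, even though one says "myocardial infarction" and the other "heart attack".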


Figure 3: Use Emtree to annotate and search across all of your data sources, both public and proprietary.

Use case – Delivery of medical devices

Medical devices are key to patient healthcare, assisting in the diagnosis, prediction, and monitoring of disease, as well as in treating and alleviating symptoms. In recent years, governments have sought greater assurances from medical device manufacturers about the quality, safety, and reliability of these products, both before and after they reach the market.

In 2021, the more stringent Regulation (EU) 2017/745 [1] began to apply across the European Union (EU), requiring manufacturers to consult EU-level experts before placing high-risk medical devices on the market. The regulation also tightens controls on clinical evaluations and on the bodies that certify medical devices. It places greater emphasis on making information about medical devices available to patients, and it strengthens the continued vigilance and market surveillance of devices after they are marketed.

In May 2022, this was complemented by Regulation (EU) 2017/746, which applies specifically to in vitro diagnostic medical devices such as pregnancy and COVID-19 tests. In February 2021, the UK government passed the Medicines and Medical Devices Act 2021 [2] into law, with a similar focus on patient safety.

Emtree has long been regarded as a gold standard by regulatory agencies, including the European Medicines Agency (EMA) and the Food and Drug Administration (FDA). With terminologies including the Medical Device Trade and General Names and the Global Medical Device Nomenclature (GMDN), the Emtree VOCab supports those involved in the manufacture and conformity assessment of medical devices, effectiveness studies, regulatory submissions, or the (semi-)automated vigilance monitoring of these products.


Figure 4: Use Emtree to analyze multiple data sources and data types for medical devices, including ClinicalTrials.gov, journal publications, and FDA press releases.

Use Emtree procedures in your R&D

Having a detailed, clear, and unambiguous understanding of the methodology used in a given piece of research is key to validating and reproducing it, and to aligning with best practices in research and development. The ‘procedures’ branch of Emtree includes an extensive collection of scientific, medical, and statistical techniques organized by subject. Users can combine this vocabulary with SciBite’s extensive biomedical dictionaries, such as Drug, Indication, and HGNC gene, to analyze literature across R&D, clinical, and manufacturing settings and address a wide range of questions, such as:

  • What genetic mapping technique was used to identify the mutant gene X responsible for disease Y?
  • Which statistical techniques should be used to analyze specific expression data?
  • Which chemical transformations were used to generate compound Z?
  • Which formulations are typically used to manufacture particular drugs?
  • What is the most common anesthetic procedure for drug X?
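Questions like the last one can be approached by counting which procedure concepts co-occur with a given drug across annotated documents. The Python sketch below assumes documents have already been annotated; the per-document structure, vocabulary names, and drug and procedure terms are hypothetical. Document-level co-occurrence is only a simple proxy for a relationship. Restricting the results to anesthetic procedures would additionally require filtering on the relevant branch of the procedures hierarchy, which is not shown here.

```python
from collections import Counter

# Hypothetical pre-annotated documents: each maps a vocabulary name to the
# normalized concepts found in that document.
DOCS = [
    {"DRUG": {"propofol"}, "PROCEDURE": {"general anesthesia", "intubation"}},
    {"DRUG": {"propofol"}, "PROCEDURE": {"general anesthesia"}},
    {"DRUG": {"lidocaine"}, "PROCEDURE": {"nerve block"}},
    {"DRUG": {"propofol"}, "PROCEDURE": {"procedural sedation"}},
]


def co_occurring(docs, anchor_vocab, anchor_term, target_vocab):
    """Count target-vocabulary concepts in documents that mention the anchor concept."""
    counts = Counter()
    for doc in docs:
        if anchor_term in doc.get(anchor_vocab, set()):
            counts.update(doc.get(target_vocab, set()))
    return counts


top = co_occurring(DOCS, "DRUG", "propofol", "PROCEDURE").most_common(1)
print(top)  # [('general anesthesia', 2)]
```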


Use case – Other biomedical and scientific applications

The delivery of drugs, medical devices, and medical services is supported by Emtree’s ‘health care’ branch. Users can select health care concepts from multiple dictionaries, including disease management, quality, and economics terms. These may be particularly useful in electronic lab notebooks (ELNs) or administrative systems for clean data capture, or for the economic evaluation of particular diseases or drugs; for example, what is the “cost of illness” for heart disease or for a particular drug treatment?

A much broader ‘society and environment’ branch also features in Emtree, giving users even more options, such as finding articles about pollutants and energy resources, or looking for safety information using the danger, risk, safety, and related phenomena branches.

For more information about SciBite’s VOCabs and products, contact us.


About SciBite

SciBite is an award-winning semantic software company offering an ontology-led approach to transforming unstructured content into machine-readable clean data. Supporting the top 20 pharma with use cases across life sciences, SciBite empowers customers with a suite of fast, flexible, deployable API technologies, making it a critical component in scientific data-led strategies. Contact us to find out how we can help you get more from your data.


[1] EUR-Lex, Access to European Union Law. Consolidated text: Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, amending Directive 2001/83/EC, Regulation (EC) No 178/2002 and Regulation (EC) No 1223/2009 and repealing Council Directives 90/385/EEC and 93/42/EEC (Text with EEA relevance). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:02017R0745-20170505
[2] UK Parliament. Medicines and Medical Devices Act 2021. https://bills.parliament.uk/bills/2700


