At SciBite, terminologies underpin everything we do. There are many ways to represent and build a standardised terminology, each with a different level of complexity. At one end of the spectrum are simple, informal, lightweight terminologies (e.g., glossaries, dictionaries, and thesauri), where the meaning (semantics) of terms is captured in natural language.
These become more informative when structure and semantic relationships are encoded into them, as in taxonomies or controlled vocabularies. At the other end of the spectrum are full-blown formal ontology languages, such as OWL, that allow you to build terminologies with strict and precise semantics.
At SciBite, we recognize that there’s a mixture of formats for capturing terminologies in the wild and that each serves different use cases.
Standards such as Medical Subject Headings (MeSH) are designed for document indexing and categorization. Concepts in MeSH are organized into a hierarchy using generic broader/narrower relationships that are useful in supporting document retrieval and navigation.
For example, in MeSH, the Anatomy branch organizes Body Regions into a hierarchy where concepts such as Eye, Mouth, Nose, and Chin are all narrower terms under Face, which is itself a narrower term for Head. In contrast, other standards, such as Uberon, represent body regions using an OWL ontology where stricter relationships such as subclass and part-of are used to organise the hierarchy and provide a more meaningful description of these concepts.
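To make the contrast concrete, here is a minimal Python sketch of a broader/narrower hierarchy that encodes the MeSH Body Regions example as a plain dictionary. It is an illustration only. It assumes each concept has a single broader term, although real MeSH descriptors can appear in several places in the tree. The names `BROADER`, `ancestors`, and `narrower` are hypothetical. Notice what the structure cannot express: Nose is recorded as narrower than Face, but nothing states whether it is a part of the face or a kind of face. Uberon's subclass and part-of relations make exactly that distinction explicit.

```python
# Toy broader/narrower hierarchy modelled on the MeSH Body Regions example.
# Each key maps a concept label to its single broader concept (None at the top).
# Illustrative only: this is not MeSH's data format, and real MeSH allows
# a descriptor to sit under more than one broader term.
BROADER = {
    "Head": None,
    "Face": "Head",
    "Eye": "Face",
    "Mouth": "Face",
    "Nose": "Face",
    "Chin": "Face",
}


def ancestors(concept: str) -> list[str]:
    """Follow broader links upward, nearest ancestor first."""
    chain = []
    parent = BROADER.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = BROADER.get(parent)
    return chain


def narrower(concept: str) -> list[str]:
    """Invert the broader links to list the direct narrower concepts."""
    return sorted(c for c, b in BROADER.items() if b == concept)


print(ancestors("Nose"))  # ['Face', 'Head']
print(narrower("Face"))   # ['Chin', 'Eye', 'Mouth', 'Nose']
```

The link only says "Nose sits under Face". That is enough for browsing and retrieval, but not for reasoning about anatomy.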
Figure 1: Terminologies overview. Terminologies may be represented with varying levels of expressivity and formality, depending on the use case each is designed to serve and its level of maturity.
Figure 2: Strict semantics in OWL vs. weaker semantics in MeSH. Above we can see a class from Uberon, an OWL-based ontology, in CENtree and the equivalent class in MeSH. The graph view shows the relationship types for each class: strict part_of relations in Uberon and less specific broader/narrower relations in MeSH.
Where only weaker semantics are needed to organize concepts into hierarchies, the Simple Knowledge Organization System (SKOS), designed to improve the interoperability of controlled vocabularies and terminologies, provides a convenient alternative to more formal ontology modeling languages like OWL.
SKOS was developed by the W3C in the late 2000s as a standard for representing controlled vocabularies and thesauri. It is often a good starting point when building new vocabularies that may later become ontologies, because it provides a standard way to describe the common features of a controlled terminology, such as label predicates (skos:prefLabel, skos:altLabel, etc.) and taxonomic information (skos:broader/skos:narrower relationships).
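To illustrate the core SKOS vocabulary, the Python sketch below writes two concepts from the anatomy example as N-Triples, a line-based RDF serialization, using only the standard library. The SKOS and RDF namespace IRIs are the real W3C ones. The `http://example.org/anatomy/` namespace, the helper functions, and the illustrative synonym "Faces" are assumptions made for this example. In practice, an RDF library such as rdflib would normally handle serialization.

```python
# Real W3C namespace IRIs; EX is a hypothetical namespace for this sketch.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/anatomy/"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str = "en") -> str:
    """Build a language-tagged literal, escaping characters N-Triples reserves."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def concept_triples(local_id, pref_label, alt_labels=(), broader=None):
    """Describe one skos:Concept with its labels and an optional broader concept."""
    subject = iri(EX + local_id)
    triples = [
        (subject, iri(RDF_TYPE), iri(SKOS + "Concept")),
        (subject, iri(SKOS + "prefLabel"), literal(pref_label)),
    ]
    for alt in alt_labels:
        triples.append((subject, iri(SKOS + "altLabel"), literal(alt)))
    if broader is not None:
        triples.append((subject, iri(SKOS + "broader"), iri(EX + broader)))
    return triples


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


doc = to_ntriples(
    concept_triples("Head", "Head")
    + concept_triples("Face", "Face", alt_labels=["Faces"], broader="Head")
)
print(doc)
```

Writing the triples by hand makes the shape of a SKOS concept visible: a type, at most one preferred label per language, any number of alternative labels, and links to broader concepts.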
SKOS is predominantly used to support search and navigation. In these settings, skos:altLabel captures synonyms, while skos:broader and skos:narrower let users browse for search terms and let information retrieval applications use the hierarchy to expand queries automatically.
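The query-expansion idea can be sketched in a few lines of Python. This is a hypothetical, in-memory example, and it does not describe how any SciBite product implements expansion. The vocabulary, its illustrative synonyms, and the function names are all assumptions. A search term is matched against preferred and alternative labels, and the query is then widened to the labels of every concept reachable through narrower links. SKOS itself declares skos:broader non-transitive (skos:broaderTransitive is the transitive variant), so following the links recursively here is an application design choice.

```python
from collections import defaultdict

# Hypothetical mini-vocabulary: concept id -> (prefLabel, altLabels, broader id).
VOCAB = {
    "head": ("Head", ["Heads"], None),
    "face": ("Face", ["Faces"], "head"),
    "eye": ("Eye", ["Eyes"], "face"),
    "mouth": ("Mouth", [], "face"),
    "nose": ("Nose", ["Noses"], "face"),
    "chin": ("Chin", [], "face"),
}


def build_indexes(vocab):
    """Index every label (case-folded) to its concept, and invert broader into narrower."""
    label_to_id = {}
    narrower = defaultdict(list)
    for cid, (pref, alts, broader) in vocab.items():
        for label in [pref, *alts]:
            label_to_id[label.casefold()] = cid
        if broader is not None:
            narrower[broader].append(cid)
    return label_to_id, narrower


LABEL_TO_ID, NARROWER = build_indexes(VOCAB)


def expand_query(term: str) -> set[str]:
    """Return all labels of the matched concept and every concept beneath it."""
    cid = LABEL_TO_ID.get(term.casefold())
    if cid is None:
        return {term}  # unknown term: search for it exactly as typed
    labels, stack, seen = set(), [cid], set()
    while stack:  # depth-first walk down the narrower links
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        pref, alts, _ = VOCAB[current]
        labels.update([pref, *alts])
        stack.extend(NARROWER[current])
    return labels


print(sorted(expand_query("faces")))
```

With this vocabulary, a search for "faces" also matches documents that mention only "Nose" or "Eyes", while an unrecognised term passes through unchanged.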
Furthermore, SKOS-XL, an extension to SKOS, allows literal entities (e.g., a label or synonym) to be represented as resources in their own right. Vocabulary editors can therefore give textual labels unique identifiers and define relationships between them. At SciBite, we can take advantage of the SKOS-XL representation to add information about synonyms that aids named entity recognition (NER) in our TERMite system.
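The sketch below shows what treating labels as resources makes possible. Each skosxl:Label gets its own IRI and a skosxl:literalForm, so extra statements can be attached to a single synonym rather than to the whole concept. The SKOS-XL namespace and terms are real. The `caseSensitive` property, the example IRIs, and the Python helpers are hypothetical, and they are not TERMite's actual rule format. The example uses the gene symbol CAT (catalase), where a case-sensitivity hint on that one synonym could help an NER engine avoid matching the everyday word "cat".

```python
from dataclasses import dataclass, field

# Real W3C namespace IRIs.
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"
# Hypothetical namespace and annotation property, NOT part of SKOS-XL.
EX = "http://example.org/vocab/"
CASE_SENSITIVE = EX + "caseSensitive"


@dataclass
class XLLabel:
    """A label as a resource: it has its own IRI, so statements can target it."""
    local_id: str
    literal_form: str
    lang: str = "en"
    case_sensitive: bool = False  # hypothetical NER hint


@dataclass
class XLConcept:
    local_id: str
    pref_label: XLLabel
    alt_labels: list = field(default_factory=list)


def label_triples(concept: XLConcept) -> list:
    """Emit SKOS-XL triples linking a concept to its label resources.

    Assumes literal forms contain no quotes or backslashes (no escaping is done).
    """
    subject = f"<{EX}{concept.local_id}>"
    pairs = [("prefLabel", concept.pref_label)]
    pairs += [("altLabel", label) for label in concept.alt_labels]
    out = []
    for predicate, label in pairs:
        node = f"<{EX}{label.local_id}>"
        out.append((subject, f"<{SKOSXL}{predicate}>", node))
        out.append((node, f"<{RDF_TYPE}>", f"<{SKOSXL}Label>"))
        out.append((node, f"<{SKOSXL}literalForm>",
                    f'"{label.literal_form}"@{label.lang}'))
        if label.case_sensitive:
            # The annotation sits on the label resource, not on the concept.
            out.append((node, f"<{CASE_SENSITIVE}>", f'"true"^^<{XSD_BOOLEAN}>'))
    return out


catalase = XLConcept(
    "catalase",
    pref_label=XLLabel("catalase-pref", "catalase"),
    alt_labels=[XLLabel("catalase-alt-CAT", "CAT", case_sensitive=True)],
)

for s, p, o in label_triples(catalase):
    print(s, p, o, ".")
```

Plain SKOS could only attach skos:altLabel "CAT" to the concept. SKOS-XL gives that synonym an identity of its own, so it can carry its own metadata.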
This makes SKOS-XL well suited to representing and sharing vocabularies within the SciBite stack. SKOS-XL allows NER ‘rules’ to be captured in the vocabulary itself before it is passed to TERMite for marking up text. Our search solution, SciBite Search, can use the same SKOS-XL representation to encode the vocabulary’s taxonomy.
Until recently, CENtree primarily supported OWL, using its internal representation model to hide much of the complexity that OWL captures. That internal representation, which is also used for controlled vocabularies, aligns well with SKOS. We are very pleased to announce that CENtree 2.1 adds features that build on CENtree’s ability to ingest, manipulate, and export SKOS-based terminologies.
CENtree is designed to handle a wide variety of standards seamlessly, and in the latest release we have extended its SKOS support. This additional support not only provides smooth integration from CENtree to TERMite but also helps two groups of users: those at the start of their ontology journey, who are ingesting less complex terminologies into CENtree, and those using the SKOS format to represent terminologies across the business.
To learn more about CENtree, or about how we can help you get more from your data, contact the SciBite team.
Andy Balfe received his BSc and PhD in organic chemistry from the University of East Anglia. He coordinates the delivery of innovative projects across SciBite’s product suite.