At SciBite, terminologies underpin all that we do. There are many ways to represent and build a standardised terminology, each with a different level of complexity. At one end of the spectrum are simple, informal, lightweight terminologies (e.g., glossaries, dictionaries, and thesauri), where the meaning (semantics) of terms is captured using natural language.
Terminologies become more informative when structure and semantic relationships are encoded into them, as in taxonomies and controlled vocabularies. At the other end of the spectrum are full-blown formal ontology languages, such as OWL, which allow you to build terminologies with strict and precise semantics.
Figure 1. Terminologies overview. A terminology may be represented with varying levels of expressivity and formality, depending on the use case it is designed to serve and on its level of maturity.
At SciBite, we recognize that there’s a mixture of formats for capturing terminologies in the wild and that each serves different use cases.
Standards such as Medical Subject Headings (MeSH) are designed for document indexing and categorization. Concepts in MeSH are organized into a hierarchy using generic broader/narrower relationships that are useful in supporting document retrieval and navigation. For example, in MeSH, the Anatomy branch organizes Body Regions into a hierarchy where concepts such as Eye, Mouth, Nose, and Chin are all narrower terms under Face, which is itself a narrower term for Head. In contrast, other standards, such as Uberon, represent body regions using an OWL ontology where stricter relationships such as subclass and part-of are used to organise the hierarchy and provide a more meaningful description of these concepts.
Figure 2. Strict semantics in OWL vs. weaker semantics in MeSH. Above we can see a class from Uberon, an OWL-based ontology, in CENtree and the equivalent class in MeSH. The graph view shows the type of relationships for each class: strict part_of relations in Uberon and less specific relations in MeSH.
Where weaker semantics are sufficient to organize concepts into hierarchies, the Simple Knowledge Organization System (SKOS) provides a convenient alternative to more formal ontology modeling languages like OWL, and helps improve the interoperability of controlled vocabularies and terminologies.
SKOS was developed by the W3C in the late 2000s as a standard for representing controlled vocabularies and thesauri. It is often a good starting point when building new vocabularies that may later become ontologies, because it provides a standard way to describe the common features of a controlled terminology, such as label predicates (skos:prefLabel, skos:altLabel, etc.) and taxonomic information (skos:broader and skos:narrower relationships).
SKOS is predominantly used to support search and navigation use cases. In such settings, the alt label predicate enables synonyms to be captured, while the broader and narrower predicates allow users to browse for search terms and enable information retrieval applications to use this structure to automatically expand queries.
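To make the query expansion idea concrete, here is a minimal sketch in plain Python. It is not how CENtree, TERMite, or SciBite Search implement expansion; the Concept class, the concept ids, and the synonyms are assumptions made for illustration, and only the Head > Face > Eye/Mouth/Nose/Chin hierarchy comes from the MeSH example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A minimal stand-in for a skos:Concept (hypothetical helper class)."""
    pref_label: str                                       # skos:prefLabel
    alt_labels: list[str] = field(default_factory=list)   # skos:altLabel
    narrower: list[str] = field(default_factory=list)     # skos:narrower (concept ids)


# Hierarchy follows the MeSH example; ids and synonyms are illustrative only.
VOCAB = {
    "head": Concept("Head", narrower=["face"]),
    "face": Concept("Face", alt_labels=["Facial region"],
                    narrower=["eye", "mouth", "nose", "chin"]),
    "eye": Concept("Eye", alt_labels=["Eyes"]),
    "mouth": Concept("Mouth", alt_labels=["Oral cavity"]),
    "nose": Concept("Nose"),
    "chin": Concept("Chin"),
}


def expand_query(concept_id: str, vocab: dict[str, Concept]) -> list[str]:
    """Collect labels for a concept and everything reachable via narrower links."""
    terms: set[str] = set()
    seen: set[str] = set()
    stack = [concept_id]
    while stack:
        current = stack.pop()
        if current in seen:          # guard against cycles in the vocabulary
            continue
        seen.add(current)
        concept = vocab[current]
        terms.add(concept.pref_label)
        terms.update(concept.alt_labels)
        stack.extend(concept.narrower)
    return sorted(terms)


if __name__ == "__main__":
    # A search for "Face" is widened to its synonyms and narrower terms.
    print(expand_query("face", VOCAB))
```

A search application could OR these terms together, so that a query for Face also retrieves documents that only mention Chin or Eyes. Expansion here follows narrower links only; walking broader links as well would widen recall at the cost of precision.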
Furthermore, SKOS-XL, an extension of SKOS, allows literal entities (e.g., a label or synonym) to be represented as resources in their own right. This allows vocabulary editors to give textual labels unique identifiers and to define relationships between these entities. At SciBite, we take advantage of the SKOS-XL representation to attach additional information to synonyms, which aids named entity recognition (NER) in our TERMite system.
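As a rough illustration of what "a label as a resource" means, the sketch below builds the triples for one SKOS-XL label in plain Python. The SKOS and SKOS-XL namespace URIs and the skosxl:Label, skosxl:altLabel, skosxl:prefLabel, and skosxl:literalForm terms come from the W3C vocabularies; the example.org namespace, the xl_label_triples helper, and the caseSensitive annotation are hypothetical and do not reflect the properties TERMite or CENtree actually use.

```python
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/vocab/"  # hypothetical namespace for this sketch


def xl_label_triples(concept, label_id, text, preferred=False, annotations=None):
    """Build (subject, predicate, object) triples for one SKOS-XL label.

    The label gets its own identifier, so extra statements can be made
    about the label itself rather than about the concept.
    """
    label = EX + label_id
    link = SKOSXL + ("prefLabel" if preferred else "altLabel")
    triples = [
        (concept, link, label),                   # concept points at the label resource
        (label, RDF_TYPE, SKOSXL + "Label"),      # the label is a resource in its own right
        (label, SKOSXL + "literalForm", text),    # the actual text of the label
    ]
    # Hypothetical per-label annotations, e.g. hints for an NER engine.
    for prop, value in (annotations or {}).items():
        triples.append((label, EX + prop, value))
    return triples


if __name__ == "__main__":
    for triple in xl_label_triples(
        EX + "face",
        "face-label-1",
        "facial region",
        annotations={"caseSensitive": False},  # made-up rule name
    ):
        print(triple)
```

With plain SKOS, the synonym would be a bare string attached to the concept, leaving nowhere to hang the extra annotation. For brevity, literal values are kept as ordinary Python values here rather than typed RDF literals.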
This makes SKOS-XL the perfect means for representing and sharing vocabularies within the SciBite stack. SKOS-XL allows for NER ‘rules’ to be captured in the vocabulary before it is passed on to TERMite to be used for marking up text. The same SKOS-XL representation can also be used by our search solution, SciBite Search, for encoding the associated taxonomy of the vocabulary.
Until recently, CENtree primarily supported OWL, hiding much of the complexity of OWL behind CENtree's internal representation model. This internal representation is also used for controlled vocabularies, which aligns well with SKOS. We are very pleased to announce that CENtree 2.1 includes additional features that build upon CENtree’s ability to support the ingestion, manipulation, and export of SKOS-based terminologies.
Although CENtree has been designed to handle a wide variety of standards in a seamless manner, we have extended some of the SKOS support in the latest release of the tool. The additional SKOS support will not only provide a smooth integration from CENtree to TERMite but will also help users who are at the start of their ontology journey, and are therefore ingesting less complex terminologies into CENtree, as well as those who use the SKOS format as a means of representing terminologies across the business.
To learn more about CENtree or find out more about how we can help you get more from your data, contact the SciBite team.