What do medical devices and medicines, food products and cosmetics, household products and even smoking devices have in common? They all contain multiple ingredients, natural and synthetic, in different quantities, and the population will be exposed to all of them.
Toxicology is the branch of science that examines whether these components can harm our health. The field is governed by regulators, including the EMA and FDA, and manufacturers must comply with these regulations before putting any product on the market.
So how is this done? First, we must look at all relevant information about exposure to the substance as well as its reported effects, covering not only the substance itself but also its interactions with other substances.
Furthermore, the potential risks may differ depending on how the substance is encountered: as part of a medical device, in a smoking product, ingested as food, or used to dye our hair.
So let’s say I work for a company that produces food and beverages, such as beers. After some research, we want to launch a new product: a coffee-flavoured beer. To provide that flavor and aroma, the R&D team has used a coffee extract.
Now we need to submit a toxicology profile on this extract so that we prove compliance to the relevant authority. To provide such a profile, we need to conduct a thorough search of the scientific literature as well as relevant toxicology databases, including both publicly available data and proprietary information.
Alas, these different sources of data may not use the same standards, making this process a lot more difficult. We might end up having to search for multiple different strings, including ‘Coffee bean extract,’ ‘Coffee extract,’ ‘Coffee essence,’ ‘Coffee bean oil,’ ‘Oils,’ ‘coffee,’ ‘Coffea arabica extract,’ ‘Extract of coffee,’ ‘Extract of coffee bean,’ as well as proper chemical identifiers such as CCRIS 7095, EINECS 283-481-1, RN: 84650-00-0, etc.
You can see how finding all the relevant information can be a complex task unless some form of standardization and harmonization is implemented before the search. Furthermore, in addition to identifying a chemical entity or substance, it is necessary to be able to locate, within the available literature, which articles pertain to the specific aspect of risk being evaluated (say, the cytotoxicity, or the carcinogenicity).
Successfully completing this quest for knowledge on a particular substance requires a team of toxicology experts to navigate these different sources and find the signal in either huge amounts of data or very little specialist information, depending on the substance. This is a time-consuming task.
To get that tasty, caffeinated brew onto the market, your team of experts must spend hours and hours just finding out how safe the product is. Instead of spending that time critically evaluating the literature, they spend it figuring out how to search all those different sources and standards.
Semantic technology can make a big difference to the search problem outlined above. By enabling the integration and harmonization of data from different sources, ontologies and controlled vocabularies can be used to standardize the representation of concepts and terms used in different databases and sources of information.
This can help to overcome the problems of data heterogeneity and improve the accuracy and efficiency of data retrieval. It means that scientists and toxicology experts can focus on evaluating the information returned by the search. In the above example, mapping all those diverse identifiers to a single concept in a vocabulary, say, Coffee extract, and annotating all our sources with it ensures that whenever we search for this entity, we capture it regardless of the string that has been used to describe it.
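To make the idea of mapping many strings to one concept more concrete, here is a minimal Python sketch. It is an illustration only, not how SciBite Search is implemented: the concept ID `EX:0001`, the tiny vocabulary (a subset of the strings quoted earlier), the sample documents and all function names are assumptions made up for this example.

```python
import re


def normalise(text):
    """Lower-case and collapse whitespace so matching ignores formatting."""
    return re.sub(r"\s+", " ", text.strip().lower())


# Hypothetical concept ID and a subset of the synonyms quoted in the text.
CONCEPT_ID = "EX:0001"
VOCAB = {
    CONCEPT_ID: {
        "label": "Coffee extract",
        "synonyms": [
            "Coffee bean extract", "Coffee extract", "Coffee essence",
            "Coffee bean oil", "Coffea arabica extract",
            "Extract of coffee", "Extract of coffee bean",
            "CCRIS 7095", "EINECS 283-481-1", "84650-00-0",
        ],
    }
}


def build_index(vocab):
    """Map every normalised synonym to its concept ID."""
    index = {}
    for concept_id, entry in vocab.items():
        for synonym in entry["synonyms"]:
            index[normalise(synonym)] = concept_id
    return index


def annotate(text, index):
    """Return the set of concept IDs whose synonyms occur in the text."""
    norm = normalise(text)
    found = set()
    for synonym, concept_id in index.items():
        # Require non-word characters (or text edges) around the match.
        pattern = r"(?<!\w)" + re.escape(synonym) + r"(?!\w)"
        if re.search(pattern, norm):
            found.add(concept_id)
    return found


# Made-up document titles standing in for heterogeneous sources.
DOCS = {
    "doc1": "Cytotoxicity of Coffea arabica extract in vitro",
    "doc2": "Safety data for EINECS 283-481-1",
    "doc3": "Hop varieties in brewing",
    "doc4": "Coffee essence as a flavouring agent",
}

INDEX = build_index(VOCAB)
ANNOTATIONS = {doc_id: annotate(text, INDEX) for doc_id, text in DOCS.items()}


def search(concept_id):
    """Find documents by concept, regardless of the string they used."""
    return sorted(d for d, concepts in ANNOTATIONS.items() if concept_id in concepts)


print(search(CONCEPT_ID))
```

Because the documents are annotated once with the concept ID, a single query retrieves all three coffee-related documents even though none of them share the same wording.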
Another advantage of using controlled vocabularies and ontologies is that you can exploit the relationships across concepts when the amount of information on a given query is very small or absent. So, for example, you can search for compounds or effects related to the one you are interested in, in order to inform decisions as to which other tests would have to be performed or if some of that information could be extrapolated to the substance under study.
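As a rough sketch of how relationships between concepts could be used to widen a search, the Python below walks a toy concept graph to collect neighbours within a given number of hops. The relations listed and the function names are assumptions chosen for illustration; a real ontology would be far richer and is not represented here.

```python
from collections import deque

# Toy, hand-written relations (subject, relation, object) for illustration only.
EDGES = [
    ("coffee extract", "is_a", "plant extract"),
    ("tea extract", "is_a", "plant extract"),
    ("coffee extract", "has_constituent", "caffeine"),
    ("coffee extract", "has_constituent", "chlorogenic acid"),
    ("tea extract", "has_constituent", "caffeine"),
]


def build_graph(edges):
    """Build an undirected adjacency map, ignoring the relation type."""
    graph = {}
    for subject, _relation, obj in edges:
        graph.setdefault(subject, set()).add(obj)
        graph.setdefault(obj, set()).add(subject)
    return graph


def related_concepts(start, graph, max_hops=1):
    """Breadth-first search returning {concept: hop distance} up to max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    distances.pop(start)  # the starting concept is not "related" to itself
    return distances


GRAPH = build_graph(EDGES)
print(related_concepts("coffee extract", GRAPH, max_hops=2))
```

When a direct query on the substance returns little or nothing, the concepts returned here (closest first) could be used as fallback queries, with the hop distance giving a crude indication of how far the extrapolation reaches.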
But how can SciBite help with this? SciBite Search is a semantic search platform that provides advanced search capabilities over a wide range of data sources. It uses controlled vocabularies developed in alignment with public standards, where those are available, and enriched with millions of synonyms and professional curation by our ontologies team. Connectors to public and proprietary sources of data enable the semantic annotation of their content in a single platform, so that querying is performed using the same standard and over all of the sources at the same time.
Highly customizable, SciBite Search can be tailored to the specific use case of the toxicology workflow, supporting features such as the set-up of search alerts, which will notify you of new results from any given query over all your data sources, or the addition of new document schemas and metadata that can make the search more efficient. SciBite’s controlled vocabularies already cover some of the more relevant terms and concepts important in toxicology, but they are also easily extendable as the standards progress.
As with the rest of SciBite’s products, SciBite Search provides standard APIs for all the tasks that can be performed through the graphical interface. This API-first design enables automation of workflows such as performing recurrent searches over a list of multiple substances, a common scenario for manufacturers going through the process of updating the toxicology profiles of all the substances in their products.
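The recurrent-search scenario might be automated along the following lines. This Python sketch deliberately does not show the real SciBite Search API: `search_fn` is a placeholder for whatever client call your deployment exposes, and `fake_search` with its in-memory `FAKE_INDEX` is a stand-in invented for this example.

```python
def run_recurrent_searches(substances, search_fn, seen):
    """Run one search per substance and return only results not seen before.

    substances: iterable of substance names to query.
    search_fn:  callable taking a substance name and returning document IDs
                (placeholder for a real API client call).
    seen:       dict mapping substance -> set of document IDs already
                reported; it is updated in place.
    """
    new_hits = {}
    for substance in substances:
        hits = set(search_fn(substance))
        fresh = hits - seen.get(substance, set())
        if fresh:
            new_hits[substance] = sorted(fresh)
        seen[substance] = seen.get(substance, set()) | hits
    return new_hits


# Stand-in for a real search backend: an in-memory lookup table.
FAKE_INDEX = {
    "coffee extract": ["doc1", "doc2"],
    "hop extract": ["doc3"],
}


def fake_search(substance):
    """Hypothetical search function used only for this illustration."""
    return FAKE_INDEX.get(substance, [])


seen_results = {}
portfolio = ["coffee extract", "hop extract"]

# First run reports everything; a later run reports only newly added documents.
print(run_recurrent_searches(portfolio, fake_search, seen_results))
FAKE_INDEX["coffee extract"].append("doc9")
print(run_recurrent_searches(portfolio, fake_search, seen_results))
```

Passing the search call in as a function keeps the bookkeeping (which results are new for which substance) separate from the details of any particular API.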
To conclude, the SciBite stack provides a solution to a quite common problem that manufacturers face, and this solution benefits not only the business trying to sell such products but all of us, who can now enjoy our coffee-flavored product safely.
Please get in touch with us to discuss how best we can support you!
Claudia Millan, Tech Consultant. Holds a Ph.D. from the University of Barcelona in the development of computational methods for structural biology, a field where she has worked for more than 8 years. She has been with SciBite since 2022, supporting customers on their projects and helping them make the best out of the SciBite technology.