In this blog we introduce our new package of vocabularies designed to enable the FAIR data principles and help pharmaceutical companies navigate their documents with respect to Chemistry, Manufacture and Control (CMC) procedures.
The vast majority of people have, at some stage in their life, had to rely on medication of one type or another. What is not apparent to most, however, is the sheer extent of procedures and testing that drug has gone through to make it to market.
Aside from the initial discovery of a therapeutic chemical or biologic and the subsequent clinical trials that are required to ensure it is safe and effective for a particular health condition, there are numerous other requirements that a new drug has to achieve in order for the final drug product to be licensed and approved for manufacture and marketing. The collective name for these procedures is Chemistry, Manufacture and Control, or CMC.
CMC procedures are enforced by regulators such as the Food and Drug Administration (FDA) and the European Medicines Agency (EMA) and are essential to ensure that the performance of a drug observed in clinical trials is stable and consistent when the drug product formulation is scaled up during commercial production. CMC includes the more technical assays for physicochemical properties, such as stability, solubility and particle size distribution, which can affect bioavailability of the drug, but also includes the formulation of a drug, e.g. coated tablet or transdermal patch, and even the colour of the drug substance or how the drug is packaged, which can affect it’s shelf life.
With so many different types of requirements to fulfil, pharmaceutical companies soon accumulate a vast collection of reports, forms and general paperwork that document each of the procedures applied to a drug substance. The need for a fast and effective method of tracking the progress of each of their products through the CMC process is essential to avoid delays in delivering the drug to patients.
It should be easy to search through a collection of documents for a piece of information, right? The answer may be ‘Yes’, if you want only the one answer and you know exactly which words to search for.
Let’s imagine, however, you need to find all the dosage forms of a particular drug product that you’re interested in. You would need to first determine all possible ways in which the drug product had been referred to in the documents you are searching. This in itself is likely to be a time-consuming task. Take, for example, the common pain relief medicine, Acetaminophen. This alone has over 150 different names, not including each manufacturer’s internal codes and identifiers. Add also the potential of spelling mistakes (a high possibility in internal documents and memos) and variations in word order and suddenly you are looking at hundreds, possibly thousands, of different variations to refer to a single drug.
If we now think about searching dosage forms and all the variations in terminology for these, we can see that this simple query is very quickly turning into a complex task. Dosage forms are the ways in which a drug is formulated for administration to the patient, so it could be a tablet, capsule or intravenous solution, for example. An additional problem with the dosage form terminology is that some of the terms are incredibly general, for example “film” or “oral”, which could refer to countless other concepts not related to drug formulation. We now begin to see that context is also an essential consideration when searching for keywords in documents.
Take these two sentences that match a simple text search for acetaminophen or one of its synonyms, paracetamol, and ‘oral’:
Both meet the criteria for the search, but the context in which the drug name appears is very different. Only the second sentence is referring to the dosage form of the drug, the first sentence uses ‘oral’ in the wrong context for our purposes.
The principle of FAIR (Findable, Accessible, Interoperable, Reusable) data is that users should be able to effectively find and re-use corporate data. Thus, reducing the variances described above is key. In order to combine context with the many variations of the concepts involved, SciBite are building master vocabularies encompassing terminology used in the CMC field.So far, our Ontology team has created six new vocabularies (Table 1) and augmented a further six existing ones with CMC-specific terms.
This set will form a new CMC package (Figure 1) that can be used by customers to mine information from their internal documents and ELNs or incorporate into electronic data capture systems to ensure consistency at data entry point.
Learn more in our Webinar on scaling the data mountain with ontologies, deep learning & FAIR.
We have also worked with customers to prepare custom dictionaries containing their proprietary codes and identifiers for their portfolio of drug compounds, including intermediate chemicals and impurities. When used alongside the CMC package of vocabularies, this allows for more sophisticated and thorough text querying. If we go back to our original example of searching for all the formulations of a particular drug product, this would mean the capability to include internal drug codes and identifiers in the query, allowing for comprehensive and fast searching of internal documents and memos.Figure 2 shows an example of how the CMC vocabularies can be used in SciBite’s named entity recognition tool, TERMite. Here, the CHEMREC (Chemical Reactions), CHMO (Chemical Methods), DRUG and MATPROP (Material Property) vocabularies are used to find matching hits from a document; different names for the same concept are federated under a single term.
By working closely with our customers, we are able to tailor SciBite technologies to their specific needs, delivering bespoke solutions to their data management requirements and helping to make their data FAIR.
If you would like to discuss your specific requirements around CMC, please contact the SciBite team.
SciBite's latest TERMite 6.3 release includes a new set of clinical ontologies as it introduces a set of CDISC vocabularies.Read
Here's how some of the SciBite team got on at the FAIRplus Project SME and Innovation Forum.Read
Get in touch with us to find out how we can transform your data
© SciBite Limited / Registered in England & Wales No. 07778456