Unstructured and siloed data in the life sciences remains a significant barrier to fulfilling the promise of digital transformation. Awareness is growing of the importance of capturing and storing data so that it can be effectively found, accessed, used interoperably, and reused. These are the foundations of FAIR. Capturing data with FAIR in mind, ensuring your data is “born FAIR”, is key to unlocking its full potential.
However, strategies that support a unified data-centric, rather than application-centric, approach to FAIR remain a challenge to implement. Third-party applications need to embrace the concept for it to be effectively adopted. In this blog post, we present how SciBite is supporting enterprise-wide FAIR through its centralized ontology management tool CENtree, and how we have collaborated with Benchling, a leading provider of cloud R&D solutions for the life science industry, to help mutual customers achieve enterprise FAIR.
Implementing enterprise FAIR requires a common vocabulary or terminology to be used across the entire organization. Ontologies are increasingly being used for this purpose, with broad support from the scientific community, which maintains gold-standard public ontologies such as BAO, MedDRA, and SNOMED. While these provide a great starting point, they often need to be augmented. Additional terms or synonyms may need to be added, and bespoke ontologies for proprietary business-centric concepts, such as internal product names, need to be developed and maintained from scratch.
A specific set of capabilities is required of an ontology management tool to allow ontologies to be built, maintained, and deployed at an enterprise level. Some of these are captured below:
These capabilities, along with many more, are captured in CENtree, which explains the tool’s rapid adoption within the market.
Benchling, a leader in cloud informatics software for R&D, provides digital solutions that support the biopharmaceutical, agritech, and industrial biotechnology industries. They work with hundreds of biotech companies that are on the cutting edge of innovation in fields such as cell therapy, gene editing (CRISPR), and agricultural science.
As a forward-thinking tool, Benchling supports ontology-backed data capture and registration natively within their products. This allows users to model their data types and construct an ontology directly within the application. These ontologies can then be used to populate dropdowns, enabling semantic search within the application. In addition, Benchling can import an ontology in the form of a CSV file to populate their registry system. This capability proved to be a valuable starting point for a connection between CENtree and Benchling’s R&D Cloud.
While Benchling provides strong ontology support natively within their R&D Cloud, the joint customer that we recently worked with is implementing an enterprise-wide approach to FAIR and is using CENtree’s ontology management capabilities (highlighted above) to achieve this. They need a common set of ontologies to be deployed across the enterprise, not just within their Benchling deployment. In addition, these ontologies need to be kept in sync so that Benchling users are not creating edits and changes to the ontologies that conflict with another area of the business.
This requires dedicated tooling that is managed at the enterprise level. The ontologies are managed in CENtree and deployed to internal and third-party applications across the enterprise.
SciBite implemented a middleware solution that allows ontologies managed in CENtree to be pushed to Benchling and served in drop-down boxes during data entry.
This solution requires a user to first create a custom entity schema in Benchling (see Fig. 1). A middleware script then pulls a specified ontology from CENtree and pushes it to Benchling, using the APIs of both systems. A custom entity represents a term from an ontology and allows us to capture useful metadata about that term inside Benchling, such as the primary identifier, preferred label, synonyms, textual definition, and ontology mappings. The labels and synonyms can be used as entity aliases to enable type-ahead style lookups for ontology terms within any kind of Benchling template.
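To make the mapping step more concrete, here is a minimal sketch in Python of how a middleware script might turn one ontology term into a custom entity record. It is an illustration only, not SciBite's actual middleware: the function name, the input term layout, the payload keys, and the schema ID "ts_example" are all assumptions, and the calls to the CENtree and Benchling APIs are deliberately left out. The term ID and label come from the BAO example in Figure 1; the synonyms are made up for the example.

```python
from typing import Any


def term_to_custom_entity(term: dict[str, Any], schema_id: str) -> dict[str, Any]:
    """Map one ontology term to a custom-entity record.

    The key names used here are hypothetical; a real integration would
    follow the schema defined in Benchling and the term format returned
    by CENtree.
    """
    label = term["label"]

    # Use synonyms as aliases for type-ahead lookup. Skip case-insensitive
    # duplicates and anything identical to the preferred label.
    seen = {label.lower()}
    aliases = []
    for synonym in term.get("synonyms", []):
        key = synonym.lower()
        if key not in seen:
            seen.add(key)
            aliases.append(synonym)

    return {
        "name": label,
        "schemaId": schema_id,
        "aliases": aliases,
        "fields": {
            "primary_id": {"value": term["id"]},
            "definition": {"value": term.get("definition", "")},
            "mappings": {"value": term.get("mappings", [])},
        },
    }


# Example term: ID and label from Figure 1, synonyms invented for illustration.
term = {
    "id": "BAO:0700004",
    "label": "FDA-approved compound library",
    "synonyms": ["FDA approved library", "fda approved library"],
}
payload = term_to_custom_entity(term, schema_id="ts_example")
```

Keeping the mapping as a pure function, separate from any network calls, means it can be checked against sample terms before anything is pushed to a live registry.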
Figure 1: CENtree is used to populate the Custom Entity schema with terms from the Bio Assay Ontology. Here is how a single term for “FDA-approved compound library” (BAO:0700004) appears in CENtree on the left and in Benchling on the right.
Figure 2: Users can create data input templates in Benchling that ensure data is annotated to standard ontologies coming from CENtree. Here we show a simple sample registration table where the Benchling user can select values from the Bio Assay Ontology to describe the data using a simple typeahead. Note that in row 10 under the tissue column, colorectal is missing from the ontology and this is highlighted with the red triangle in the top corner of the cell.
The integration presented above can provide great value to our mutual customers. If you are interested in finding out more, please reach out to us. We would love to discuss your requirements and have customers shape a deeper integration in the future, such as real-time synchronization with CENtree. This would allow users to interact directly with a centralized set of ontologies rather than localized copies within specific applications, streamlining enterprise FAIR.
Achieving true enterprise FAIR is an aspiration of many leading biopharma organizations, allowing them to operate more efficiently and realize the true value of their internal data assets. Enterprise FAIR is also a prerequisite for more advanced applications of the data, such as using AI to optimize processes or exploring alternative uses for their products via knowledge graphs. While great progress has been made in moving towards enterprise FAIR, we are still relatively early in this process. SciBite is supporting many of our customers in achieving this, and the adoption of our tooling is growing rapidly.
In this blog post, we have demonstrated how, with simple connections into Benchling’s R&D Cloud, we are able to take centralized, enterprise-wide ontologies and use them within Benchling to populate pick lists for ontology-backed data capture. While this connection is relatively basic in its implementation, it provides significant value to our mutual customers and helps them achieve enterprise FAIR. Through CENtree’s APIs, we can extend this connection to offer a much deeper integration, and we will continue to work with Benchling and mutual customers to meet their requirements.
If you are a Benchling user and are interested in centralized ontology management, please get in touch. Likewise, if you have another application that you would like to explore this type of integration with, please reach out to SciBite.
Find out more about how SciBite’s solutions can help unlock the potential of the R&D data in your business.
Sam leads partnerships and alliances at SciBite, working collaboratively with existing partners and developing new partnerships aligned to SciBite’s strategic goals. He has a strong technical background in the life sciences, with a PhD in Protein Biochemistry from the University of Nottingham and post-doctoral training in bioinformatics within the department of Neurosurgery at the University of California San Francisco.
Prior to joining SciBite, he held technical sales and commercial roles at Carl Zeiss and most recently led business development at Repositive, building relationships with contract research organizations, biotechs, and pharma companies, and facilitating data exchange and search across multiomic datasets. He has a good grasp of the challenges of dealing with unstructured scientific data and of collaboratively developing practical solutions to overcome them.