In evaluating the question of FAIR (Findability, Accessibility, Interoperability, and Reusability), the discussion included defining what FAIR means to SciBite and questions addressing what progress and been made on customer digital transformation and how to know where to begin. The roundtable also explored the broader strategic implications of FAIR, the interplay between Artificial Intelligence (AI), new developments, and the future direction of the industry.
Question 1 – There are many definitions of FAIR, but what does FAIR mean to SciBite?
Question 2 – What progress has been made on customer digital transformation?
Question 3 – How do you know where to begin?
Question 4 – How is FAIR fitting within AI?
Question 5 – What are your thoughts on new developments and the direction the industry is heading?
Global Head of Sales & Partnerships, SciBite
Heading up the Sales and Partnerships side of the business, Julien and his team, work side-by-side with life sciences organizations across the globe to help them transform how they leverage scientific text for decision-making. With an MSc in Business Administration and Management and over 13 years of experience in various roles within the life sciences sector, most notably leading the EMEA Strategic Account Team within the life sciences division of Thomson Reuters.
View LinkedIn Profile
Head of Ontologies, SciBite
Jane leads the development of SciBite’s vocabularies and ontology services. With a Ph.D. in Genetics from Cambridge University and 15 years of experience working with biomedical ontologies, including at the EBI and Sanger Institute, she focussed on bioinformatics and developing biomedical ontologies. She has published over 35 scientific papers, mainly in ontology development.
View LinkedIn Profile
Director of Technical Consultants, SciBite
Joe Mullen, Director of Professional Services, EMEA. Holds a Ph.D. from Newcastle University in the development of computational approaches to drug repositioning, with a focus on semantic data integration and data mining. He has been with SciBite since 2017, initially as part of the Data Science team.
View LinkedIn Profile
Head of Semantic Technology, SciBite
Simon heads the semantic technology group at SciBite and has over 15 years of experience in helping customers getting value out of data and specializes in semantic technology solutions and ontologies. Prior to SciBite, Simon spent eight years at was a technical coordinator and EMBL European Bioinformatics Institute (EBI), developing software for ontology tools and services.
View LinkedIn Profile
Head of Sales and Operations, North America, SciBite
Joe oversees the company’s presence and growth within the US market, leading the sales and operations within our North American side of the business. Prior to SciBite, Joe spent eight years at Thomson Reuters, followed by Clarivate Analytics (the spin-out of TR’s IP & Science business unit), in sales and sales leadership roles managing the North American strategic account management team. Joe earned his J.D. from Suffolk University Law School in Boston.
View LinkedIn Profile
The following whitepaper is the transcript from that SciBite Virtual User Group meeting in 2022.
Go to the top.
Joe McCarthy: What we thought we would do here with this time is bring together a group of specialists from inside SciBite to have a roundtable discussion about the concept of FAIR and best practices and observations that we can share and discuss with everyone. SciBite has been thinking about the concept of FAIR for quite a long time.
A few days ago, we celebrated our 11th birthday as an organization, and the technologies that we have were founded on this concept of FAIR.
We’ve been involved in this space for quite some time, and we thought this would be an opportunity to have a conversation about this.
Go to the top.
Joe McCarthy: So, I’d like to pose the first question to the group. There are many definitions of FAIR, but it would be great to hear from you about what FAIR means to SciBite?
Joe Mullen: Thank you, Joe. I think I can probably have a first stab at this one. I think we’ve seen some fantastic presentations today. We saw Charles Bettembourg’s (Sanofi) fantastic definition of the FAIR principles as part of his talk. So, I won’t go through the details of the principles in any great granular detail. But I will say that our interpretation of FAIR, or how we believe that we support customers in the FAIRification journey, focuses on supporting them in the various stages of that digitalization journey.
We know many customers are in the process of their data digitalization journey, and part of that is driven by or is leading to data-driven innovation. Now a lot of the focus and the emphasis on getting to this point where you can drive insights from your data is ensuring that you have data-centric approaches to managing that data. To take a data-centric approach, the aim is to ensure that your data becomes your primary asset. It reduces the necessity of cleaning, providing, and serving data to various applications. You’re managing that at the heart of the enterprise.
FAIRication plays a massive part in this data-centric approach, and the ability for SciBite to provide customers with the capabilities to support this approach, and ensure that the customer has the data talking the language they speak is our primary focus here at SciBite. I think it’s also worth mentioning that it’s not just a one-step thing, the FAIRification process; it’s a journey, a path, and something that you need to follow as new data comes out and your standards evolve.
Our aim at SciBite is to help customers in their journey to a data-centric approach by providing the technologies and the software to ensure that they can get along their FAIRication journey.
Jane Lomax: I think Joe, you’re right, when you said that we’ve been working in this field for much longer than the acronym FAIR has been around, and it wasn’t called FAIR when I first started work in this field, it was called biocuration. It is all about making your data usable, making it so that you can use it for other applications, so it’s interoperable. I think that’s kind of what it means to us.
Joe Mullen: And I think that was highlighted very well in the talk AbbVie gave from their interoperable approach and how CENtree supports their interoperability of the work that is part of the ARCH* Platform (*ARCH = AbbVie R&D Convergence Hub).
Jane Lomax: I think the acronym FAIR gets used, as I have heard, because it fits nicely on PowerPoint slides!
Go to the top.
Joe McCarthy: So to that point, we’re involved with customers of SciBite at various points within their organizations and at different levels across this team. Can you comment on what you’ve seen across the multiple customers that you engage with on the progress that they’ve made, on their digital transformation journeys, and on their journeys toward FAIR?
Julien Debeauvais: From a sales standpoint, as you can imagine, we have had a number of discussions with several organizations across the board. We get the opportunity to speak with organizations just starting on the journey and organizations that are well into it. A topic that often concerns FAIR, is FAIR at an application level versus a data level. That’s a topic that comes around quite often on our side and that we’re very keen to take on because it speaks to the approach SciBite decides to take in the way we build technology.
FAIRifying and enriching data using public industry standards at an application level, such as an ELN, a bioassay repository, or a data lake, is the norm nowadays. And to some extent, it’s also becoming an essential requirement in vendor selection when pharmaceutical companies are looking for new pieces of technology. So, I don’t need to go through the benefits of FAIR, we all know what they are. But I want to highlight some limitations people may face if they’re focused on FAIRifying at an application level rather than a data level.
First, each application uses different standards because they come from other vendors, and you still have that silo effect that the FAIR principles aim to eradicate. From a data integration standpoint, that will make things challenging.
The second aspect of things is that as organizations embark on their FAIR journey, and I’m sure you’re going to hear a lot of the word journey as we go along this discussion because it is one of those; they’re creating their own standards and their own view of the world, and they want to apply those to the downstream applications that they select. So, it’s key.
So, to summarize it to some extent, it all depends on where you are in the journey, but it’s essential to think ahead. Today you may want only to have a single project and an application that provides basic standards that will be enough to serve the use case or the issue you’re trying to solve. But in the future, if you plan to go beyond, it’s vital to ensure that the applications that are selected early on in that journey will be able to evolve as your capabilities and your skills.
The technology that you use to FAIRify data is also changing. So that ultimately the aim is that everything in the architecture is connected in the longer term. That’s one of the topics we see raising quite a lot out of FAIR at the application level vs. the data level. I don’t know if anyone wants to chip in or add anything on top of that.
Joe Mullen: I think that’s a good point, Jules, and I think if you already have a lot of FAIR at an application level, it’s also ensuring that you prioritize; there’s going to be a little effort to get that moved out into more of a data-centric approach. It’s about understanding the priorities with these different use cases, showing the value, and having some metrics. You can show the value add of the FAIR principles with feedback and something tangible back to businesses as you look to the scope and scale out across the rest of the enterprise.
Simon Jupp: So, I can add to that; I think what we are observing is the development of FAIR ecosystems, where you have multiple vendor applications capable of ingesting and surfacing data that is aligned with standards. We see this growing now, and I think the importance of adopting open standards is what we try and encourage and enable through our tech to take all these amazing freely available open standards that we have and build on that.
So, no matter which kind of vendors you go with as part of your FAIR journey, it’s important that any data that you put into these systems, is that you can get it back out; are they aligned to these standards and are constantly trying to reduce the cost of how the data is moving through these different systems.
It’s something that’s important to us; when we design and build our software, we always have that adherence to standards. When we come up to blockers, we try to interact with other vendor systems where they don’t have a similar sort of API first or open standard approach to software. I think for a FAIR ecosystem to work with, you need to put pressure on all the vendors, and it’s this kind of alignment helps the organization achieve what they are trying to do with FAIR.
Jane Lomax: Presuming that would be an expectation at some point that customers will expect the software they buy to be able to inter-operate within that kind of ecosystem.
Joe Mullen: Maybe Jules can touch on some of the partner ecosystems that we’ve gotten into and some of the conversations in that regard as well.
Julien Debeauvais: Absolutely. I wanted to jump in and mention that we’re constantly building on the partnership ecosystem. We are looking to partner and get close to several key technology vendors in our ecosystem. But building those bridges is proving difficult, from a commercial level, that’s the simple part, and then, from a technical level, how you plug them together, it’s where the complexity comes in.
Simon just mentioned the API first philosophy of SciBite. From our standpoint, we set out of the gate with the view that we wanted to be integrated. The technology we provided was designed to play nicely with other pieces of tech. I think, in the same way, some of our customers are starting their FAIR journey, and it’s the same on the vendor side. Not all the vendors have yet realized the importance of this.
I can’t stress enough how important it is; Whether that’s us at SciBite, to continue educating and evangelizing the importance of FAIR, but also you as the pharmaceutical industry or life science industry to keep asking those vendors about what are their capabilities in terms of FAIR; Would they be able to accept other pieces of technology or are integrable with other pieces of technology to build that ecosystem.
We’re counting on you to ensure that as part of your vendor selection process, this is one of the line items that feature in those big documents, RFP documents, thrown at technology vendors like us.
Go to the top.
Joe McCarthy: Let me jump in with a question about getting started on this. So, we heard earlier from Charles Bettembourg at Sanofi, about the question that was posed to him after his presentation: how do you know where to begin? You could easily try to boil the ocean right out of the gate, but that will not have meaningful returns. What have you seen across our customers who have successfully implemented FAIR methodologies on their journeys? As you said, Jules, with starting small and proving value, how do we see that working best?
Jane Lomax: Perhaps I can jump in on that one. I think it’s such a big thing to tackle. You’ve got a lot of data. There’s a lot of legacy data. Where do we even start? The scale of the thing can be overwhelming. Where we’ve seen the most success across our customer base is where you take the low-hanging fruit and a use case at a time.
You do the most amenable things, and then take what you learned there. You take the principles, you accept the technology, and then you kind of iterate that across, and then every time it gets a little bit better; you take another use case, you tack on another department, and then bit-by-bit, it’s never going to be finished, but you’re going to improve with each step.
Joe Mullen: I think the point is to start small and prove value as quickly as possible. We talk about the low-hanging fruit. I think it makes sense to pick something where you can see it as a priority in terms of a business use case. There are demonstrable key metrics that show you’ve improved efficiency and quality. Just to Jane’s point, you can build and rinse and repeat and scale that out across to use cases as well. I think that choosing that use case, that entry point to prove the value, is critical in ensuring that the FAIRification process runs as smoothly and efficiently as possible within our customer base.
Jane Lomax: It is not always easy because you’re using one set of ontologies in one part, and then in another, they’re using something else, so you will always have to adapt, but I think you got to start somewhere.
Simon Jupp: But to add on, I think it’s important to stress, especially with managing things like ontologies, that there isn’t a known or documented process for doing this, and it’s one of the reasons we built CENtree because we saw the struggle. It’s pretty interesting to see the evolution of CENtree over the last two years as we’ve been working with the customer’s way we started simply.
We’ve been developing many of those kinds of processes with you and hearing your use cases, and building those features into CENtree, there isn’t a known recipe for doing this. We are working, and we often have to sort of work out and see these different challenges. What’s great is that we’re seeing the same challenges from other customers, which helps us prioritize these features.
But we’ve seen a significant maturity in how people think about deploying ontologies at scale in multiple applications across an enterprise. Where several years ago, that wasn’t happening. We’re only really at the start of understanding the challenges and the consequences of saying, “right, we’re going to adopt a full FAIR strategy, and now these are things that we’ll have to start solving and making important”.
Jane Lomax: Interestingly, many of the workflows we envisaged when the first CENtree was built didn’t materialize, and they evolved in many different ways. But we’ve been able to respond to that.
Julien Debeauvais: Yeah, and I was going to add that, what you and Joe described around the start small, pick up the low-hanging fruit; I think that’s good practice in any project. I remember sitting down at a roundtable at an in-person event where we talked about big data, and the same questions applied: Where do you start? You can tackle the most complex problem or even try to boil the ocean, as the term is, and then you don’t get the momentum going.
So again, starting small, building momentum, deliver some value immediately to the business by taking on one of the easy issues to solve is important. That’s how we tend to work from a sales standpoint at SciBite. Most of you on the call have that experience working with us; we’ll know that when we started our engagement with your organizations, we started with a proof of concept.
The proof of concept is as much to prove that demonstrates the capabilities of the software, but also the value that you can generate on the use cases. So very often, those objectives that are selected for those proof of concept are to demonstrate the tech but also to make sure that we deliver value quickly to the business to get that momentum and get the resources, whether that’s human resources or monetary funding, for the projects to keep going.
Go to the top.
Joe McCarthy: Let me transition to the concept of AI and FAIR in that context; What would the panelists say about how they broadly see FAIR fitting within AI?
Joe Mullen: That is an excellent question. I suppose there is a variety of different ways that you can look at the interplay between AI and FAIR. You’ve got the concept of feeding models in terms of creating models with FAIRified data upfront. Lee Harland (SciBite Founder & CSO) touched a bit on some of the requirements around the provenance of the data used to produce future models, that’ll be required as part of regulatory submissions.
If you’re creating models, then you’re using them to create or speed up regulatory submissions, and you’re going to need to be able to provide evidence as to what data was used to produce them.
The ability to have metadata, or rich metadata at that, assigned to the data fed into the production of the AI models is obviously of great importance. There’s obviously great relevance there and something that is not only just a nice thing to do, but there are going to be regulatory requirements to do so in the future.
If I may also talk to a different strand and I will ask for others’ input as well. But there’s also the concept of using AI to actually FAIRify your data in the first place. So can you make use of some of the big kinds of transformer-based models, you have BERT models, you have other models for doing some of the normalizations, or if you look at the NER of these datasets.
Now obviously, there are limitations to making use of AI models in the context of FAIRifying your data where you’re identifying things or strings that look like something; this model is able to identify this string that looks potentially like a gene, but you also want to have the ability to be still able to align that to standards to ensure that you have that really FAIRified and enriched and semantically available for the downstream processing.
So, you can see that there’s an excellent opportunity for FAIR to support the creation of models. I think that AI in the kind of downstream process for FAIRification of data still comes with quite a few caveats that people should be aware of; With that, I will probably ask if anybody else has anything else to add to those kinds of caveats that we should be thinking about.
Jane Lomax: In terms of our own kinds of workflows, we’ve managed to use machine learning internally to do some of the tasks. One area we’ve managed to use it for is where we need to build out a Named Entity Recognition (NER) module for an area that’s new or particularly underserved in the public domain.
Things we’ve done include medical devices and lab equipment, so there isn’t a public ontology to use. So we’ve taken a sort of a small set of terms related to the area and then used a word2vec model to pull out related terms. We then curate those, and then ultimately, we can turn those into a BioBERT model to pull out things that we think might be relevant for a corpus, and then, we can curate those into an ontology. So that’s the kind of workflow we’ve used.
Joe Mullen: I think it’s important to stress that up front, there is still a human in the loop that we’re not entirely going off and going wild. There is a necessity to have that human in the loop for curation’s sake, to ensure that these are the candidate terms and are approved by real curators and SEMs in the domain, to ensure that this is as true positive as possible.
Jane Lomax: We do use it to feed into the rule-based system. When you talk about something like automatic ontology building, it is that ontologies are human artifacts, and that is what makes them valuable. They’re like a human model of understanding that can be used by a computer as well.
They have to have a human in the loop. You can make it easier by pulling out potential new terms and pulling out potential relationships, and we have been exploring ways to feed those into CENtree.
Simon Jupp: Yeah. I think we can see, definitely in the immediate future and in some of the SciBite tooling, how we’ll start to exploit some of this AI within tools, working alongside the human curator in tools like CENtree; where we can look at things like AI assistance in terms of suggesting, new relationships, new synonyms like that in your ontology, or what will be really nice, indicating where you might be making or breaking changes or things that don’t seem to make sense with respect to other ontologies. There’s probably a nice role where the AI could sit alongside and help with these large curation tasks.
Jane Lomax: Yeah, watch this space!
Go to the top.
Joe McCarthy: Yeah. OK, so let’s begin to wrap this up with the biggest question. The broad question, the hard ones about where things are headed. So, what thoughts do you have as a team, on new developments and the direction that the industry is heading?
Joe Mullen: I can probably touch on what we, as SciBite, will be focusing on and what things we will be looking at in the near to long term. Obviously, we’ve got a steady platform to build on and some fantastic technology that can support FAIRification. Now, what we’ve also got as part of our larger acquisition from Elsevier is a lot of fantastic data as well.
What we’re looking at is, how can we go from data to answers? How can we take that data and use the fantastic technology we’ve developed over the years, leverage the expertise, the scientific domain expertise that we have in-house, and come to some great answers with that data?
Bringing together a combination of all our strengths and what we’ve also got coming in from Elsevier’s part of the acquisition and helping customers get to the crux of the matter, which is answers. I think it’s essential to say that building on not only potentially Elsevier data but customer data as well. We have all the pieces in place. I believe we are in a fantastic position to go forward and help customers extract insights from not only Elsevier data but also from their internal data and other external data sources.
Jane Lomax: I think there’s going to be an expectation from customers that if you’re providing content, if that’s papers or whatever, then you need to provide that FAIRly. You can’t expect the customers to have to do that job themselves. I think that is going to be an expectation.
So, in the future, AI isn’t going anywhere, FAIR isn’t going anywhere, and these are going to be big problems; the more that people want to adopt machine learning and AI technologies, the more that they’re going to require FAIR data, so that is not going to change.
Joe Mullen: We’re fortunate in as much as we have an excellent R&D machine learning team at SciBite, continuously looking at how we can best align some of these methodologies and approaches to our existing technology; Not just for the sake of being able to call up the fact that we make use of the AI, but because they do bring value to what it is that we’re doing.
So, wherever innovation is cropping up, we will be in front of it and looking to pull that into our existing technology; Can we better and improve the features, as we move forward? The conversation with customers always drives that. We’re constantly conversing, talking to customers, feeding features back into the team, and making sure that we’re not just going off on a tangent; we are, driving toward what our customer needs.
Joe McCarthy: Thanks to our panelists for their thoughts on “Best Practice, Observations, and Realization in FAIR”.
Go to the top.
Public ontologies are essential for applying FAIR principles to data but are not built for use in named entity recognition pipelines. At SciBite, we build on the public ontologies to create VOCabs optimized for NER. In this blog, discover how we create a SciBite VOCab from a Public Ontology.Read
Get in touch with us to find out how we can transform your data
© Copyright © 2023 Elsevier Ltd., its licensors, and contributors. All rights are reserved, including those for text and data mining, AI training, and similar technologies.