How SciBite and Elsevier manage KOL identification

By Zahra Hosseini




Our customers are among the most influential pharma companies. They aim to improve people’s well-being and quality of life and to rank among the most valued pharma companies. For that reason, they are keen to stay aware of the insights of Key Opinion Leaders (KOLs). KOLs are significant assets in life science, as they can speed up research programs and significantly improve public awareness and acceptance of new therapeutic areas.

KOLs are key references that help pharma companies keep up with the pace of change. Identifying them enables our customers and partners to be the first to follow the latest trends and markets, start new collaborations, grasp new opportunities, plan next steps, and optimize their activities. Some organizations may also wish to track and assess KOLs’ findings through their publications, allowing them to spot new collaborators and plan future partnerships. As you can imagine, spotting and engaging KOLs as quickly and accurately as possible is crucial.

The issue

For many years, scientists had to review publications, authors, affiliations, engagements, conferences, and more to identify the most influential names and brands in a single domain. I say “had to” because Elsevier and SciBite data and technology have put an end to this inefficient, laborious method, which cannot keep up with the speed and volume of change.

To answer our customers’ questions, we studied the available solutions and found that each of them was missing something. For instance:

  • Lack of access to up-to-date, high-quality, granular, annotated data makes it hard to follow rapid shifts in the scientific focus of known influencers. While public resources such as PubMed are actively updated, search strategies are limited by the lack of annotated FAIR data.
  • Most of the challenges in KOL analysis revolve around author and affiliation disambiguation. For instance, searching for “Amy Kang” in life science returns several different researchers who need to be told apart correctly: one is a nephrology researcher, while another studies infectious diseases.
  • Proper knowledge representation is another critical factor. Visualizing FAIR, normalized data together with semantic enrichments makes it easier to identify emerging game changers.
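To make the disambiguation problem concrete, here is a minimal Python sketch over hypothetical toy records (not real Scopus data). Counting publications by the name string merges two different researchers called “Amy Kang”, whereas counting by a unique author ID keeps them apart. The record layout and the IDs are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy records: (publication_id, display_name, author_id, topic).
# Two different researchers share the display name "Amy Kang".
records = [
    ("pub-1", "Amy Kang", "author-001", "nephrology"),
    ("pub-2", "Amy Kang", "author-001", "nephrology"),
    ("pub-3", "Amy Kang", "author-002", "infectious diseases"),
]

# Naive approach: count publications per name string.
# Both researchers collapse into a single "Amy Kang" entry.
by_name = Counter(name for _, name, _, _ in records)

# Disambiguated approach: count publications per unique author ID.
by_id = Counter(author_id for _, _, author_id, _ in records)

print(by_name)  # a single entry for "Amy Kang"
print(by_id)    # two separate authors
```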

Read on to see how we address these issues, how you could benefit from contacting us, and what SciBite and Elsevier can deliver to your research lab.

The Solution

At Elsevier and SciBite, we empower our customers with scientific data curation and FAIRification, and we also offer normalized Scopus data. This means you have access to a superset of high-quality publications enriched with normalized entities, disambiguated authors, affiliations, and citations, and related patents.

The Elsevier Scopus team has addressed author and affiliation disambiguation by assigning each author and each affiliation a unique ID, which makes them easy to search. Each abstract also contains normalized citations and a list of associated patents. And that is not all: TERMite has processed the abstracts to identify and annotate scientific terms, giving a more detailed picture of each author’s specialist topics.
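As a rough illustration, one enriched abstract can be pictured as a record that carries disambiguated IDs, citations, patents, and annotated entities. The minimal Python sketch below assumes hypothetical field names (this is not the actual Scopus or SciBite export format) and shows how an author’s specialist topics could be summarized by counting the entities annotated in their abstracts.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical shape of one enriched abstract; field names are assumptions.
@dataclass
class EnrichedAbstract:
    abstract_id: str
    author_ids: list[str]        # disambiguated author IDs
    affiliation_ids: list[str]   # disambiguated affiliation IDs
    cited_abstract_ids: list[str] = field(default_factory=list)
    citing_patent_ids: list[str] = field(default_factory=list)
    entity_ids: list[str] = field(default_factory=list)  # annotated terms


def topic_profile(abstracts: list[EnrichedAbstract], author_id: str) -> Counter:
    """Count how often each annotated entity appears in one author's abstracts."""
    profile: Counter = Counter()
    for abstract in abstracts:
        if author_id in abstract.author_ids:
            profile.update(abstract.entity_ids)
    return profile


# Usage with hypothetical toy data.
corpus = [
    EnrichedAbstract("abs-1", ["author-001"], ["aff-10"],
                     entity_ids=["DISEASE:kidney_failure", "GENE:X"]),
    EnrichedAbstract("abs-2", ["author-001", "author-002"], ["aff-10", "aff-20"],
                     entity_ids=["DISEASE:kidney_failure"]),
]
print(topic_profile(corpus, "author-001").most_common(1))
```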

TERMite draws on numerous public and customized ontologies and state-of-the-art natural language processing (NLP) pipelines. Our advanced vocabularies cover various domains, including but not limited to drugs, indications, chemicals, genes, pathways, and lab equipment. Wow! It’s all about FAIR, clean data! Exciting, isn’t it?
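TERMite’s own pipelines are far richer than this, but the core idea of vocabulary-based annotation (mapping different synonyms in free text to a single concept ID) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TERMite’s API; the vocabulary, concept IDs, and function names are hypothetical.

```python
import re

# Hypothetical vocabulary: each concept ID maps to its known synonyms.
VOCABULARY = {
    "DISEASE:0001": ["Alzheimer's disease", "Alzheimer disease"],
    "DRUG:0001": ["drug x", "compound-x"],
}


def build_pattern(vocabulary):
    """Compile one case-insensitive regex; longer synonyms are tried first."""
    synonym_to_id = {}
    for concept_id, synonyms in vocabulary.items():
        for synonym in synonyms:
            synonym_to_id[synonym.lower()] = concept_id
    alternatives = sorted(synonym_to_id, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(s) for s in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, synonym_to_id


def annotate(text, vocabulary=VOCABULARY):
    """Return (matched_text, concept_id) pairs in order of appearance."""
    # Rebuilt on every call for simplicity; cache it in real use.
    pattern, synonym_to_id = build_pattern(vocabulary)
    return [
        (match.group(0), synonym_to_id[match.group(0).lower()])
        for match in pattern.finditer(text)
    ]


print(annotate("Drug X was tested in patients with Alzheimer disease."))
```

Both spellings of the disease resolve to the same ID, which is what makes later graph queries independent of how a term was written.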

With all the data collected, we populated a knowledge graph to create a holistic view. Figure 1 depicts the graph schema.


Figure 1: Knowledge graph schema.

We use seven types of nodes to describe a Scopus abstract: the abstract itself (which can cite other abstracts), the category/class of the abstract, authors, affiliations, annotations (entities), the entities’ taxonomy data, and, finally, recent patents that cite the abstract.
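As an illustration, the seven node types listed above can be written down as a small schema against which each (source, relation, target) edge is checked before loading. The node types follow the text; the relationship names (CITES, WRITTEN_BY, and so on) are hypothetical labels and may differ from those in Figure 1.

```python
# Node types from the text; relationship names below are hypothetical.
NODE_TYPES = {
    "Abstract", "Category", "Author", "Affiliation", "Entity", "Taxonomy", "Patent",
}

# Allowed (source type, relation, target type) combinations.
EDGE_SCHEMA = {
    ("Abstract", "CITES", "Abstract"),
    ("Abstract", "IN_CATEGORY", "Category"),
    ("Abstract", "WRITTEN_BY", "Author"),
    ("Author", "AFFILIATED_WITH", "Affiliation"),
    ("Abstract", "MENTIONS", "Entity"),
    ("Entity", "BELONGS_TO", "Taxonomy"),
    ("Patent", "CITES", "Abstract"),
}


def is_valid_triple(source_type: str, relation: str, target_type: str) -> bool:
    """Return True if both node types exist and the edge is allowed by the schema."""
    return (
        source_type in NODE_TYPES
        and target_type in NODE_TYPES
        and (source_type, relation, target_type) in EDGE_SCHEMA
    )


print(is_valid_triple("Abstract", "WRITTEN_BY", "Author"))
```

Keeping the schema explicit like this also makes personalization easier: adding a new entity or relation type is a one-line change.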

Integrating all the data into a knowledge graph improves your view and interpretation across many domains, from coarse-grained activities such as KOL analysis or idea monitoring to finer-grained tasks such as drug repurposing and horizon scanning.

The proposed graph is easy to query for new inferences. For instance, you can look for healthcare professionals and researchers who are new to the field and would otherwise be hard to spot among more established authors. Our approach allows you to search over several types of entities and taxonomies. Personalizing the graph requires little effort: you can add entities, relations, or categories to tailor it to your applications.
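One possible way to surface newcomers is sketched below in Python, assuming each publication on a topic carries a year: flag authors whose first publication falls on or after a chosen year, then rank them by output. Both the toy data and this definition of “new” are hypothetical; a real analysis would likely tune the criterion.

```python
from collections import defaultdict

# Hypothetical toy data: (author_id, publication_year) for one topic.
publications = [
    ("author-001", 2009), ("author-001", 2015), ("author-001", 2022),
    ("author-002", 2021), ("author-002", 2022), ("author-002", 2023),
    ("author-003", 2023),
]


def find_newcomers(pubs, since_year):
    """Authors whose first publication is in or after `since_year`,
    ranked by number of publications (most active first)."""
    years = defaultdict(list)
    for author_id, year in pubs:
        years[author_id].append(year)
    newcomers = [
        (author_id, len(ys))
        for author_id, ys in years.items()
        if min(ys) >= since_year
    ]
    # Sort by publication count (descending), then by ID for stable output.
    return sorted(newcomers, key=lambda item: (-item[1], item[0]))


print(find_newcomers(publications, since_year=2020))
```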

What types of questions can we answer?

The names and numbers presented in this section may be incomplete, as they come from a limited set of publications.

Use case 1:

I want to find the authors with the most publications on Alzheimer’s Disease.

Since TERMite annotates entities before the abstracts are added to the graph, you can quickly look for the research that mentions the unique ID for Alzheimer’s disease, group the results by author, and finally sort the authors by their number of contributions.
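The Use case 1 query can be sketched in Python over hypothetical toy data. In a graph database this would more likely be a single query, but the logic is the same: filter abstracts by the annotated concept ID, then count contributions per author ID. The concept IDs and the dictionary layout are assumptions.

```python
from collections import Counter

# Hypothetical toy data: abstract -> annotated entity IDs and author IDs.
ABSTRACTS = {
    "abs-1": {"entities": {"DISEASE:alzheimers"}, "authors": ["author-001", "author-002"]},
    "abs-2": {"entities": {"DISEASE:alzheimers", "GENE:X"}, "authors": ["author-001"]},
    "abs-3": {"entities": {"DISEASE:pemphigus"}, "authors": ["author-003"]},
}


def top_authors(abstracts, entity_id, limit=10):
    """Rank authors by the number of abstracts that mention `entity_id`."""
    counts = Counter()
    for record in abstracts.values():
        if entity_id in record["entities"]:   # filter by annotated concept ID
            counts.update(record["authors"])  # one contribution per listed author
    return counts.most_common(limit)


print(top_authors(ABSTRACTS, "DISEASE:alzheimers"))
```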

Table 1: Influencers on Alzheimer’s disease

Use case 2:

I am interested in finding out how hub authors who collaborate externally on Alzheimer’s disease research interact with one another. This type of information is helpful for discovering active communities for collaboration.

This is straightforward as well: look for author IDs that are involved in the same research but have different affiliation IDs. You can also rank them by their number of contributions.

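The pairing step can be sketched in Python over hypothetical toy data, assuming each abstract lists its contributors as (author ID, affiliation ID) pairs. The sketch counts, for each pair of authors, the abstracts they share while listing different affiliations, then ranks the pairs by that count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each abstract lists (author_id, affiliation_id) pairs.
ABSTRACTS = {
    "abs-1": [("author-001", "aff-A"), ("author-002", "aff-B")],
    "abs-2": [("author-001", "aff-A"), ("author-002", "aff-B"), ("author-003", "aff-A")],
    "abs-3": [("author-003", "aff-A"), ("author-004", "aff-C")],
}


def external_collaborations(abstracts):
    """Count shared abstracts for author pairs with different affiliations."""
    pair_counts = Counter()
    for contributors in abstracts.values():
        for (a1, aff1), (a2, aff2) in combinations(sorted(set(contributors)), 2):
            # Skip the same author listed twice and same-affiliation pairs.
            if a1 != a2 and aff1 != aff2:
                pair_counts[(a1, a2)] += 1
    return pair_counts.most_common()


print(external_collaborations(ABSTRACTS))
```

The resulting pairs can be read directly as weighted edges of a co-authorship network, from which hub authors stand out as the nodes with the most external links.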
Table 2: Author / Affiliation


Figure 2: Hub authors

Use case 3:

I am looking for researchers working on drugs that could be used to treat Alzheimer’s disease.

First, we can search our graph for drug mentions in Alzheimer-related research. For instance, in Figure 3, two drugs related to cholinesterase inhibitors show up. You can then search the graph for all the authors who took part in that research (Table 3).


Figure 3: Drugs that can treat Alzheimer’s Disease

Table 3: Author and Affiliation

We store the hierarchy of entities where applicable, so it is easy to query for all the nodes in the same taxonomy and then look for the authors involved in the related research.
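A minimal Python sketch of this taxonomy expansion is shown below, assuming the hierarchy is stored as parent-to-children links. A breadth-first walk collects every concept under a drug class, and then the authors of abstracts that mention both the disease and any of those concepts are gathered. The taxonomy, concept IDs, and abstracts are hypothetical.

```python
from collections import deque

# Hypothetical taxonomy: parent concept -> child concepts.
TAXONOMY = {
    "CLASS:cholinesterase_inhibitor": ["CLASS:acetylcholinesterase_inhibitor", "DRUG:x"],
    "CLASS:acetylcholinesterase_inhibitor": ["DRUG:y"],
}

# Hypothetical abstracts with annotated entity IDs and author IDs.
ABSTRACTS = {
    "abs-1": {"entities": {"DISEASE:alzheimers", "DRUG:x"}, "authors": {"author-001"}},
    "abs-2": {"entities": {"DISEASE:alzheimers", "DRUG:y"}, "authors": {"author-002"}},
    "abs-3": {"entities": {"DRUG:y"}, "authors": {"author-003"}},
}


def descendants(taxonomy, root):
    """Return the root concept plus every concept below it (breadth-first)."""
    seen, queue = {root}, deque([root])
    while queue:
        for child in taxonomy.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def authors_for_class(abstracts, taxonomy, class_id, disease_id):
    """Authors of abstracts that mention the disease and any concept in the class."""
    concepts = descendants(taxonomy, class_id)
    authors = set()
    for record in abstracts.values():
        if disease_id in record["entities"] and record["entities"] & concepts:
            authors |= record["authors"]
    return authors


print(sorted(authors_for_class(
    ABSTRACTS, TAXONOMY, "CLASS:cholinesterase_inhibitor", "DISEASE:alzheimers")))
```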

Use case 4:

Which genes are most closely related to a rare disease such as pemphigus, and which scientists have recently researched them?

In this scenario, you can create a subgraph around recent pemphigus research (based on publication date) and the genes it mentions. Then, apply the PageRank algorithm to rank those genes by their importance in the subgraph (Table 4). Finally, you can search the graph for the authors and organizations involved.
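A minimal pure-Python sketch of the PageRank step is shown below. It assumes the subgraph has already been reduced to an undirected co-mention graph, with an edge wherever two concepts (the disease and a gene, or two genes) appear in the same recent abstract. It uses the standard power-iteration form of PageRank with a damping factor of 0.85; the edges are hypothetical, and this is not necessarily the exact configuration behind the published results. Libraries such as NetworkX also provide a ready-made `pagerank` function.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration on an undirected graph given as node pairs."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = sorted(neighbours)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Teleportation term, then each node shares its rank with its neighbours.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            share = damping * rank[node] / len(neighbours[node])
            for neighbour in neighbours[node]:
                new_rank[neighbour] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical co-mention edges from recent pemphigus abstracts.
edges = [
    ("DISEASE:pemphigus", "GENE:A"),
    ("DISEASE:pemphigus", "GENE:B"),
    ("DISEASE:pemphigus", "GENE:C"),
    ("GENE:A", "GENE:B"),
    ("GENE:A", "GENE:D"),
]
scores = pagerank(edges)
gene_ranking = sorted(
    (node for node in scores if node.startswith("GENE:")),
    key=scores.get,
    reverse=True,
)
print(gene_ranking)
```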

Table 4: Correlated genes with Pemphigus disease


Now, is it possible to find out which authors have been involved in MT-RNR2 research? Yes! We just need to follow the links from that research to its authors.
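Following those links amounts to a two-hop traversal (concept, then abstract, then author). The Python sketch below is a minimal version over hypothetical toy data: it keeps only abstracts annotated with all of the requested concepts and maps each of their authors to the matching abstract IDs. The concept IDs and data layout are assumptions.

```python
# Hypothetical abstracts with annotated entity IDs and author IDs.
ABSTRACTS = {
    "abs-1": {"entities": {"DISEASE:PV", "GENE:MT-RNR2"}, "authors": ["author-001", "author-002"]},
    "abs-2": {"entities": {"GENE:MT-RNR2"}, "authors": ["author-003"]},
    "abs-3": {"entities": {"DISEASE:PV", "GENE:MT-RNR2"}, "authors": ["author-002"]},
}


def authors_linking(abstracts, *entity_ids):
    """Map each author to the abstracts that mention all of the given entities."""
    required = set(entity_ids)
    result = {}
    for abstract_id, record in sorted(abstracts.items()):
        if required <= record["entities"]:  # abstract mentions every required entity
            for author_id in record["authors"]:
                result.setdefault(author_id, []).append(abstract_id)
    return result


print(authors_linking(ABSTRACTS, "DISEASE:PV", "GENE:MT-RNR2"))
```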

Table 5: List of authors who investigate connections between pemphigus vulgaris (PV) and the MT-RNR2 gene


And finally…

Our normalized and annotated Scopus data can help you answer questions about the dynamics of your target market and research. You can speed up your processes by drawing inferences and checking facts against the data. As mentioned, we can offer a focused subset of Scopus data with your preferred customizations.

While we have only showcased KOL analysis here, many other types of questions can be answered using our data and technology. You can visit us later if you’d like to learn more about how we can help your business.


About Zahra Hosseini

Senior Data Scientist, SciBite

Zahra Hosseini is a Senior Data Scientist. She holds a Ph.D. in machine learning from the Science and Research University of Tehran, focusing on natural language processing (NLP) and knowledge discovery. She was an assistant professor at Azad University of Isfahan for seven years before moving to industry. She has been with SciBite since 2021 as part of the data science team.

