The highly informal language used in social media posts is challenging to analyse. To address this, SciBite has created a machine learning-based model capable of identifying adverse drug reactions associated with medications from the informal language found in social media posts.
Social media posts may not seem like a traditional channel for reporting adverse drug reactions (ADRs), but patients are increasingly turning to Facebook, Twitter, and forums like Reddit to report these experiences. This gives pharmacovigilance experts, who are tasked with monitoring the safety of medicines, an additional source of information.
“Mining data from social media gives us a greater chance of capturing adverse drug reactions that a patient wouldn’t necessarily complain about to their doctor or nurse…” says David Lewis, head of global safety at the Switzerland-based pharmaceutical company, Novartis. Added to this, “Patients are great at reporting subjective reactions and feelings… a psychiatrist can’t see suicidal ideation as an ADR while a patient can describe it perfectly”.
However, the highly informal language used in this type of real-world evidence (RWE) is challenging to analyse. The following graphic shows how the language used to report adverse drug events through official regulatory channels differs from the colloquial language found in social media.
Figure 1: Adverse drug reactions reported through formal regulatory channels vs. the informal language found in social media posts.
To address this challenge, SciBite has created a machine learning-based model capable of identifying adverse events associated with medications from the informal language found in social media. Specifically, we trained a machine learning model on social media posts that mentioned adverse events related to a COVID‑19 vaccine. Because it had been trained to “understand” this context, the model was also able to identify adverse reactions to other drugs.
The first step of our method was to manually identify social media sources in which a COVID‑19 vaccine was mentioned alongside side effects. In this phase, we collated Facebook and Reddit posts in which users mentioned receiving a COVID‑19 vaccine and subsequently reported an adverse reaction. Examples of these posts are displayed in Figure 2.
Figure 2: Real-world evidence (RWE) social media posts that include adverse events associated with a COVID‑19 vaccine.
A training data set, prepared by curating these social media posts with adverse events, was used to create a paragraph-level adverse event RWE named entity recognition (NER) model. The following example (Figure 3) shows how adverse events, highlighted in green by our curators, were used as part of the training data set to produce the adverse event RWE model.
Figure 3: Example of manually curated training data where adverse events are highlighted in green.
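To make the curation step more concrete, here is a minimal sketch of how highlighted phrases could be turned into token-level labels for NER training. It assumes the common BIO tagging scheme and a simple regex tokeniser; the example post, the `spans_to_bio` helper, and the `AE` label are all hypothetical and are not taken from SciBite's pipeline.

```python
import re

def spans_to_bio(text, spans, label="AE"):
    """Convert curated character spans into token-level BIO tags.

    text  : the raw social media post
    spans : list of (start, end) character offsets highlighted by a curator
    """
    tagged = []
    # Simple tokeniser (an assumption): words, or single punctuation marks
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end in spans:
            if match.start() >= start and match.end() <= end:
                # First token of a span gets B-, later tokens get I-
                tag = ("B-" if match.start() == start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged

# Invented post and highlights, for illustration only
post = "Got my second jab yesterday, now my arm is killing me and I feel wiped out."
phrases = ["arm is killing me", "wiped out"]
spans = [(post.index(p), post.index(p) + len(p)) for p in phrases]

for token, tag in spans_to_bio(post, spans):
    print(f"{token}\t{tag}")
```

Each token inside a highlighted phrase is tagged as part of an adverse event and everything else is tagged `O`, which is the kind of input a sequence-labelling model typically learns from.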
The trained model correctly identifies the adverse events (true positives) in the corresponding paragraph when run in SciBite AI, as displayed in Figure 4.
Figure 4: Model identifies adverse events in a test set of unseen data, as shown by SciBite AI.
In the final testing phase, the adverse event RWE model was run on previously unseen social media posts. As well as annotating adverse events associated with the COVID‑19 vaccine, the model was capable of indexing side effects mentioned alongside other drugs.
The following example (Figure 5) demonstrates how the ADRs (headaches, migraines, shakiness) associated with an anti-asthmatic drug were identified from a social media post using SciBite’s adverse event RWE model.
Figure 5: Results from the testing phase – NER model identifies adverse events associated with asthma drugs in a social media post.
This study shows how SciBite AI was used to identify and annotate adverse event data from the highly informal language found in real-world evidence sources such as Facebook and Reddit posts. While it can be difficult to draw meaningful conclusions from individual reports of adverse events, this solution crucially gives pharmacovigilance experts the ability to collate similar, frequently reported side effects and so identify emerging adverse event trends associated with new treatments.
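As a rough illustration of the collation step, the sketch below counts how often each drug and adverse event pair appears across a batch of annotated posts. The `annotations` list, the drug names, and the `frequent_events` helper are hypothetical; they stand in for whatever export format an NER model's output takes and do not represent SciBite AI's actual output.

```python
from collections import Counter

# Hypothetical annotations: one (drug, adverse_event) pair per mention,
# as might be exported after running an NER model over a batch of posts.
annotations = [
    ("drug A", "headache"),
    ("drug A", "Headache"),
    ("drug A", "shakiness"),
    ("drug B", "sore arm"),
    ("drug A", "headache"),
]

def frequent_events(pairs, min_count=2):
    """Count normalised (drug, event) pairs and keep those seen at least min_count times."""
    counts = Counter((drug, event.strip().lower()) for drug, event in pairs)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

trends = frequent_events(annotations)
print(trends)
```

Lowercasing is only a stand-in for normalisation here. Colloquial phrasings of the same reaction would likely need mapping to a controlled vocabulary before counts like these become meaningful.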
SciBite’s automated text annotation tooling allows researchers to mine data for a host of use cases and can be integrated to support expert-led research activities. The SciBite AI platform can be used to develop models, allowing customers to analyse and exploit complex and idiomatic data sets in life science research and development.
© SciBite Limited / Registered in England & Wales No. 07778456