When running semantic search, many researchers use article abstracts rather than full-text articles because abstracts are easily accessible via biomedical databases like MEDLINE and are available in XML, a format widely used to encode documents so that computer programs can parse them. Although using abstracts seems like a reasonable approach, there are major advantages to searching across the full text of an article. For example, abstracts often omit essential facts and relationships, secondary findings, and adverse event data.
While abstracts do provide some valuable information, researchers need access to full-text articles to get the best results from semantic search efforts.
Full-text articles also contain more relationships between named entities than abstracts. According to a study published in the Journal of Biomedical Informatics, only 8% of the scientific claims made in full-text articles were found in their abstracts.
The same Elsevier study compared the use of abstracts and full-text articles to derive relevant information about drugs and proteins that affect the progression of fibromyalgia. The researchers found 31 relationships in the literature by mining abstracts and an additional 53 relationships when they ran the same search across the full-text articles.
A recent study conducted by bioinformaticians at the University of Copenhagen and the Technical University of Denmark confirms that vital information goes undiscovered when mining abstracts rather than full-text articles. Using a named entity recognition system, the team analyzed more than 15 million full-text scientific documents published between 1823 and 2016 and compared the full-text findings to corresponding results from a matching set of MEDLINE abstracts.
The team extracted protein-protein, disease-gene, and protein-subcellular compartment associations. In every case, the results showed that mining the full-text article corpus outperformed the same analysis using abstracts only. The biggest performance gain from mining full-text articles was in the associations found between diseases and genes (see figure below).
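To make the idea of mining associations concrete, here is a minimal Python sketch of sentence-level co-occurrence extraction run over an abstract and over the matching full text. It is an illustration only, not the method used in the studies described above: the `DISEASES` and `GENES` dictionaries, the helper functions, and the sample sentences are all invented for this example, and a real named entity recognition system would rely on curated vocabularies and much more robust matching.

```python
import re
from itertools import product

# Hypothetical toy dictionaries (assumption); real systems use curated vocabularies.
DISEASES = {"fibromyalgia", "asthma"}
GENES = {"comt", "il6", "tnf"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_terms(sentence, vocabulary):
    """Return the vocabulary terms that occur as whole tokens in the sentence."""
    tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return tokens & vocabulary


def disease_gene_pairs(text):
    """Collect (disease, gene) pairs that are mentioned in the same sentence."""
    pairs = set()
    for sentence in split_sentences(text):
        diseases = find_terms(sentence, DISEASES)
        genes = find_terms(sentence, GENES)
        pairs.update(product(diseases, genes))
    return pairs


# Invented sample text (assumption), not taken from any real article.
abstract = (
    "We studied pain sensitivity in fibromyalgia. "
    "COMT variants were associated with fibromyalgia."
)
full_text = abstract + (
    " In a secondary analysis, IL6 levels were elevated in fibromyalgia patients."
    " TNF was not measured."
)

abstract_pairs = disease_gene_pairs(abstract)
full_text_pairs = disease_gene_pairs(full_text)

# Associations that only surface when the full text is mined.
only_in_full_text = full_text_pairs - abstract_pairs

print("Found in abstract:", sorted(abstract_pairs))
print("Found only in full text:", sorted(only_in_full_text))
```

In this toy case the secondary finding sits in the body of the article, so the extra pair appears only when the full text is searched. Set difference is used for the comparison because it directly answers the question of what an abstract-only search would have missed.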
While article abstracts yield some information, there are limitations to what can be discovered through that process. Researchers need access to the full text of the articles to ensure they don’t miss vital data and undiscovered assertions that can lead to new discoveries.
CCC (Copyright Clearance Center) and SciBite offer an integrated solution to help organizations improve the results of semantic enrichment initiatives, reduce costs and simplify copyright compliance. For more information, visit www.copyright.com.
© SciBite Limited / Registered in England & Wales No. 07778456