Loving the Data Others Don’t

The Challenge of Data Mining

Like it or loathe it, plain text is a goldmine of information. The challenge is that ambiguity often complicates data mining. Identifying, disambiguating and extracting scientific terms from free text is a big challenge, but we’ve got it covered.

Fit for purpose

Our 80+ hand-curated vocabularies contain more than 20 million synonyms and are enriched beyond any publicly available alternative, which makes them just what’s needed for the job.

Combine them with our entity extraction engine, which analyses over 1 million words per second (that’s the entire Harry Potter collection every second), and you have a pretty powerful solution in front of you.
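To make the idea of vocabulary-driven extraction concrete, here is a minimal sketch in Python of dictionary-based entity matching. It assumes a tiny made-up vocabulary: the names `VOCAB`, `build_index` and `extract_entities`, and the entity identifiers, are hypothetical and purely illustrative. This is not the engine described above or its API, and it makes no attempt at disambiguation.

```python
import re

# Hypothetical toy vocabulary: entity ID -> list of synonyms.
VOCAB = {
    "GENE:TP53": ["TP53", "p53", "tumor protein p53"],
    "DRUG:ASPIRIN": ["aspirin", "acetylsalicylic acid"],
}

def build_index(vocab):
    """Map each lower-cased synonym to its entity ID."""
    index = {}
    for entity_id, synonyms in vocab.items():
        for synonym in synonyms:
            index[synonym.lower()] = entity_id
    return index

def extract_entities(text, index):
    """Return (entity_id, matched_text, start_offset) tuples in text order."""
    # Try longer synonyms first so "tumor protein p53" wins over "p53".
    alternatives = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [(index[m.group(0).lower()], m.group(0), m.start())
            for m in pattern.finditer(text)]

index = build_index(VOCAB)
hits = extract_entities("Acetylsalicylic acid may affect p53 signalling.", index)
for hit in hits:
    print(hit)
```

Both synonyms resolve to a single identifier each, which is the point of curating synonyms in the first place: different surface forms in the text end up as the same entity.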

The end result

Take Medline as an example: over 24 million articles of unstructured plain text. Using just 6 vocabularies, we identified 121 million individual, disambiguated assertions in 4.5 hours (did we mention our tools are fast?), and that was just on a laptop.
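For a sense of scale, the figures above can be turned into rough rates. The short Python calculation below uses only the numbers quoted in the text and treats "over 24 million" as exactly 24 million, so the results are approximate.

```python
# Figures quoted in the text; "over 24 million" is rounded down to 24 million.
articles = 24_000_000
assertions = 121_000_000
hours = 4.5

seconds = hours * 3600  # 4.5 hours = 16,200 seconds
articles_per_second = articles / seconds
assertions_per_second = assertions / seconds
assertions_per_article = assertions / articles

print(f"{articles_per_second:,.0f} articles per second")
print(f"{assertions_per_second:,.0f} assertions per second")
print(f"{assertions_per_article:.1f} assertions per article")
```

That works out to roughly 1,481 articles and 7,469 assertions per second, or about 5 assertions per article.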

To return to the topic of data mining: learn more about our text analysis engine.

