DOCstore 1.2 – the semantic search tool is released and live

Search just got a whole lot more powerful. DOCstore lets researchers use semantic search to scan multiple biomedical sources rapidly and comprehensively.


It’s here! DOCstore v1.2 is ready with a whole host of new features, including an all-new Connectors package. Powered by Elastic, DOCstore provides faceted, semantic search for unstructured data. It suits a range of roles, from bench scientists to business analysts, and allows you to:

  • Create a highly enriched, more analytical in-house version of Medline
  • Combine multiple data sources such as grants, trials and literature into a broad literature search tool
  • Create bespoke project team databases and organise your team’s documents in a relevant way
  • Build an intelligent business/competitive intelligence platform to share information across your organisation

We thought it would be the perfect time to share the new features with you, so here’s a rundown of what you can expect.

  • Advanced Search
  • Customisation of the User Interface
  • Optional Google Analytics Integration
  • Better memory profile
  • DOI field now searchable
  • API additions
  • Text attributes added to the data model
  • Colour-coded entities in the User Interface
  • Connectors – automated data management pipeline

These developments are a direct result of our customers’ feedback. As always at SciBite, we love hearing from clients about how we can make things even better.

Let’s look at each one a bit more closely.

Advanced Search

Search just got a whole lot more powerful with the ability to add multiple queries on multiple fields. For example, if I want to search for documents that mention ‘PDE5A’ in the title and also mention ‘university’ in the organisation field, I can now do this using the Advanced Search feature.

I can also filter on publication date or document index date.

And that’s not all. Additional filters exist for the SubSource, Project ID and SubProject ID fields if they’re populated. Each of these fields represents a way to sub-categorise a document. They can be set at DOCstore load time and allow you to hold multiple copies of the same underlying document, perhaps indexed in different ways or under different contexts.
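As a rough illustration of how a multi-field search like the one above could be expressed, here is a minimal sketch that builds a query in the style of the Elasticsearch query DSL. It is not DOCstore’s own API or internal query format: the function name `build_advanced_query` and the field names `title`, `organisation`, `publication_date` and `sub_source` are all assumptions made for the example.

```python
import json
from typing import Optional


def build_advanced_query(
    field_terms: dict,
    published_from: Optional[str] = None,
    published_to: Optional[str] = None,
    sub_source: Optional[str] = None,
) -> dict:
    """Combine per-field terms and optional filters into one bool query."""
    # Every field/term pair must match: "multiple queries on multiple fields".
    must = [{"match": {field: term}} for field, term in field_terms.items()]

    filters = []
    date_range = {}
    if published_from:
        date_range["gte"] = published_from
    if published_to:
        date_range["lte"] = published_to
    if date_range:
        # "publication_date" is a hypothetical field name.
        filters.append({"range": {"publication_date": date_range}})
    if sub_source:
        # "sub_source" is a hypothetical field name for the SubSource filter.
        filters.append({"term": {"sub_source": sub_source}})

    return {"query": {"bool": {"must": must, "filter": filters}}}


query = build_advanced_query(
    {"title": "PDE5A", "organisation": "university"},
    published_from="2015-01-01",
)
print(json.dumps(query, indent=2))
```

The `must` clauses carry the per-field queries, while the date and SubSource restrictions sit in `filter`, since they narrow the result set rather than affect relevance.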

Customisation of the User Interface

You can now customise the user interface by supplying custom HTML snippets in the configuration. Examples include adding bespoke links in the ‘Explore’ dropdown, or adding an icon to the results panels that links back to the source document.

Additionally, DOCstore is now able to serve static content such as original PDF files. If you’ve indexed PDF files, you can now link to the originals and open them in your browser.

Optional Google Analytics Integration

If you want to integrate Google Analytics into your DOCstore server to monitor patterns of usage, you can. This makes it easy to see what DOCstore is being used for and when, which we know matters to organisations.

Better memory profile

DOCstore can now run in 2 GB of memory with Medline data loaded. That’s about a 6x saving compared with DOCstore v1.1 for this dataset.

DOI field now searchable

The DOI field is now included in the list of searched fields, making it possible to find articles by their DOI.

API Additions

The REST API now has more comprehensive input validation and better error messages. Co-occurrence Matrix API calls can also return unique document identifiers for the top 200 documents (sorted on publication date) that fulfil the co-occurrence criteria.
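To make the “top 200, sorted on publication date” rule concrete, here is a small sketch of that selection done on plain Python dictionaries. It only illustrates the idea: the record layout and the function `top_document_ids` are hypothetical, and the assumption that the most recent documents come first is ours, as the sort direction is not stated.

```python
from datetime import date

MAX_IDS = 200  # limit described for the Co-occurrence Matrix API


def top_document_ids(docs, limit=MAX_IDS):
    """Return the ids of the most recently published documents, newest first."""
    # Assumption: "sorted on publication date" means newest first.
    ranked = sorted(docs, key=lambda d: d["publication_date"], reverse=True)
    return [d["id"] for d in ranked[:limit]]


# Hypothetical documents that all satisfy the co-occurrence criteria.
matching_docs = [
    {"id": "doc-1", "publication_date": date(2016, 5, 1)},
    {"id": "doc-2", "publication_date": date(2018, 1, 15)},
    {"id": "doc-3", "publication_date": date(2017, 9, 30)},
]

print(top_document_ids(matching_docs, limit=2))
```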

You can also run document-level or sentence-level searches and retrieve only document metadata, such as ids and sources, rather than the entire documents. This reduced payload option is ideal if you only need a small part of the data from each returned document. You get just the data you require, without a large amount of other information that would cost transfer time and slow down your processing.

This is supported by a new operation at the sentence level and a new parameter at the document level.
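To give a feel for why a metadata-only response helps, the sketch below compares the serialised size of a full document with a metadata-only version of it. This is a client-side illustration only: in DOCstore the reduction happens on the server, and the field names and the `metadata_only` helper here are hypothetical.

```python
import json

METADATA_FIELDS = ("id", "source")  # hypothetical metadata keys


def metadata_only(doc, fields=METADATA_FIELDS):
    """Keep only the requested metadata keys that are present in the document."""
    return {key: doc[key] for key in fields if key in doc}


# A made-up document with a large body, standing in for a full search hit.
full_doc = {
    "id": "doc-1",
    "source": "medline",
    "title": "An example title",
    "body": "lorem ipsum " * 2000,
}

slim_doc = metadata_only(full_doc)
print(len(json.dumps(full_doc)), "bytes ->", len(json.dumps(slim_doc)), "bytes")
```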

Text Attributes added to the data model

Attributes in the ‘attributes’ section of the TERMite output are now stored as key-value text in DOCstore. They’re not yet searchable (you’ll have to wait for the next version for that), but they’re returned in the data, and if you apply the user interface customisations described above, you can display them there.

Colour-coded entities in the User Interface

There are now visual cues as to the entity types in the User Interface, such as different coloured underlines.


Connectors – automated data management pipeline

Now you can take the pain out of maintaining your data pipeline into DOCstore. Imagine being able to automate the management of that data pipeline without manually setting up, checking and approving each update, or having to outsource the task.

Without Connectors, scripting is necessary to maintain the load pipeline. Usually, this involves separate scripts to:

  1. Fetch the data (e.g. Medline)
  2. Annotate it (run TERMite on it)
  3. Load that output into DOCstore

With Connectors, this is all handled via a web-based user interface with no command line access required.

And that’s not all. The parameters required to run TERMite vary according to the data source. Connectors helps here too, suggesting the appropriate parameters for each data source.

Connectors moves the burden of loading DOCstore from the IT technician to the scientist.

Other features include:

  • Scheduling: run updates regularly or every day, with the option to schedule more precisely
  • Control: you define the pipeline and the number of steps
  • Intelligent: using a checkpoint system, Connectors will return to the last sound update point should anything go awry
  • Non-expert and expert modes
  • Extensible architecture: simply plug your own code into the pipeline
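The fetch, annotate and load steps and the checkpoint idea described above can be sketched as a tiny pipeline runner. This is a minimal illustration under our own assumptions, not how Connectors is implemented: `run_pipeline`, the JSON checkpoint file and the placeholder step functions are all hypothetical, and the real steps would call out to the data source, TERMite and DOCstore.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, checkpoint_path):
    """Run named steps in order, skipping any already recorded in the checkpoint."""
    path = Path(checkpoint_path)
    done = json.loads(path.read_text()) if path.exists() else []
    executed = []
    for name, step in steps:
        if name in done:
            continue  # completed in an earlier run
        step()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        path.write_text(json.dumps(done))
        executed.append(name)
    return executed


# Placeholder steps; the annotate step fails on the first run to show recovery.
state = {"fail_annotate": True}
log = []


def fetch():
    log.append("fetch")


def annotate():
    if state["fail_annotate"]:
        raise RuntimeError("annotation failed")
    log.append("annotate")


def load():
    log.append("load")


steps = [("fetch", fetch), ("annotate", annotate), ("load", load)]

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.json"
    try:
        run_pipeline(steps, checkpoint)  # stops at the failing annotate step
    except RuntimeError:
        pass
    state["fail_annotate"] = False
    resumed = run_pipeline(steps, checkpoint)  # picks up after "fetch"

print(resumed)  # ['annotate', 'load']
print(log)      # ['fetch', 'annotate', 'load']
```

Because the checkpoint is written after each successful step, the second run does not fetch the data again; it resumes from the first step that has not yet completed.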

And that’s DOCstore 1.2. Faster, more efficient search, allowing you to cut out the noise of unwanted information and customise what you see. And with Connectors, once again, our developments are democratising data management for the life sciences.

To find out more about how DOCstore and the rest of the SciBite platform can transform your data, get in touch with the team today.
