Search just got a whole lot more powerful. DOCstore enables researchers to harness the power of semantic analysis search to rapidly and comprehensively scan multiple biomedical sources.
It’s here! DOCstore v1.2 is ready with a whole host of new features, including an all-new Connectors package. Powered by Elastic, DOCstore provides faceted, semantic search for unstructured data. The ideal tool for a range of roles, from bench scientists to business analysts, it allows you to:
We thought it would be the perfect time to share the new features with you, so here’s a rundown of what you can expect.
These developments are a direct result of our customers’ feedback – as always at SciBite, we love hearing from clients as to how we can make things even better.
Let’s look at each one a bit more closely.
Search just got a whole lot more powerful with the ability to add multiple queries on multiple fields. So, for example, if I want to search for documents that mention ‘PDE5A’ in the title, but also mention ‘university’ in the organisation field, I can now do this using the Advanced Search feature.
I can also filter on publication date or document index date.
And that’s not all. Additional filters exist for the SubSource, Project ID and SubProject ID fields if they’re populated. Each of these fields represents a way to sub-categorise a document. They can be set at DOCstore load time and allow you to hold multiple copies of the same underlying document, perhaps indexed in different ways or under different contexts.
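This post does not show DOCstore's own query syntax, so the following is only a sketch of how a multi-field search with a date filter might look if written directly in Elasticsearch's query DSL, the engine DOCstore is built on. The field names (`title`, `organisation`, `publication_date`) and the helper `build_advanced_query` are assumptions for illustration, not DOCstore's actual schema or API.

```python
import json
from typing import Optional


def build_advanced_query(field_terms: dict,
                         date_from: Optional[str] = None,
                         date_to: Optional[str] = None,
                         date_field: str = "publication_date") -> dict:
    """Build an Elasticsearch-style bool query (hypothetical field names)."""
    # Every field/term pair must match: "multiple queries on multiple fields".
    must = [{"match": {field: term}} for field, term in field_terms.items()]

    # Optional date bounds become a non-scoring range filter.
    date_range = {}
    if date_from:
        date_range["gte"] = date_from
    if date_to:
        date_range["lte"] = date_to

    bool_clause = {"must": must}
    if date_range:
        bool_clause["filter"] = [{"range": {date_field: date_range}}]
    return {"query": {"bool": bool_clause}}


# 'PDE5A' in the title AND 'university' in the organisation field,
# restricted to documents published on or after an example date.
query = build_advanced_query(
    {"title": "PDE5A", "organisation": "university"},
    date_from="2015-01-01",
)
print(json.dumps(query, indent=2))
```

In the product itself, none of this needs to be written by hand: the Advanced Search feature builds the equivalent query from the form.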
You can now customise the user interface by supplying custom HTML snippets in the configuration. Examples include adding bespoke links in the ‘Explore’ dropdown, or adding an icon to the results panels that links back to the source document.
Additionally, DOCstore is now able to serve static content such as original PDF files. If you’ve indexed PDF files, you can now link to the original ones and have them show up in your browser.
If you want to integrate Google Analytics into your DOCstore server to monitor patterns of usage, you can. It’s extremely useful for seeing what DOCstore is being used for and when, which we know matters to organisations.
DOCstore can now be run in 2 GB of memory with Medline and ct.gov data. That’s about a 6x saving compared to DOCstore v1.1 for this dataset.
The DOI field is now included in the list of searched fields, making it possible to find articles by their DOI.
The REST API now has more comprehensive input validation and better error messages. Co-occurrence Matrix API calls can also return unique document identifiers for the top 200 documents (sorted on publication date) that fulfil the co-occurrence criteria.
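To make the 200-document cap concrete, here is a minimal sketch of the selection step in plain Python. It is not the Co-occurrence Matrix API itself: the record fields (`id`, `publication_date`), the function `top_document_ids`, and the newest-first ordering are all assumptions, since the post only says the documents are sorted on publication date.

```python
from datetime import date


def top_document_ids(docs, limit=200):
    """Return ids of up to `limit` documents, newest first (assumed order)."""
    ranked = sorted(docs, key=lambda d: d["publication_date"], reverse=True)
    return [d["id"] for d in ranked[:limit]]


# Hypothetical documents that all satisfy some co-occurrence criteria.
docs = [
    {"id": "doc-1", "publication_date": date(2016, 3, 1)},
    {"id": "doc-2", "publication_date": date(2018, 7, 15)},
    {"id": "doc-3", "publication_date": date(2017, 1, 9)},
]
print(top_document_ids(docs, limit=2))
```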
You can also do document or sentence level searches and retrieve only document metadata, such as ids, sources etc, rather than the entire documents themselves. This reduced payload option is ideal if you only need to use a small part of the data from each returned data set. You can now simply get the data you require, instead of a huge amount of other information that would just cost transfer time and slow down your processing.
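The payload saving is easy to see with a small, self-contained illustration. The document structure and the `metadata_only` helper here are made up for this sketch. In DOCstore the trimming happens on the server, so the client never receives the full text in the first place.

```python
import json

# Assumed metadata keys; DOCstore's real field names may differ.
METADATA_KEYS = ("id", "source", "publication_date")


def metadata_only(doc):
    """Keep only the metadata keys that are present in a document dict."""
    return {key: doc[key] for key in METADATA_KEYS if key in doc}


# A hypothetical full document with a large text body.
full_doc = {
    "id": "doc-1",
    "source": "Medline",
    "publication_date": "2018-07-15",
    "title": "An example title",
    "body": "Full text of the document. " * 200,
}
slim_doc = metadata_only(full_doc)

print(slim_doc)
print(len(json.dumps(full_doc)), "bytes vs", len(json.dumps(slim_doc)), "bytes")
```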
A new operation was added for this to happen on the sentence level:
and a new parameter for the document level:
example output:
Attributes in the ‘attributes’ section of the TERMite output are now stored as key-value text in DOCstore. They’re not yet searchable (you’ll have to wait for the next version for that), but they’re returned in the data, and if you apply the user interface customisations detailed above, you can see the data there.
There are now visual cues as to the entity types in the User Interface, such as different coloured underlines.
Now you can take the pain out of maintaining your data pipeline into DOCstore. Imagine being able to automate the management of that data pipeline without manually setting up, checking and approving each update, or having to outsource the task.
Without Connectors, scripting is necessary to maintain the load pipeline. Usually, this involves separate scripts to:
With Connectors, this is all handled via a web-based user interface with no command line access required.
And that’s not all. The parameters required to run TERMite vary according to the data source. Again, Connectors helps here by suggesting the appropriate parameters for each data source.
Connectors moves the burden of loading DOCstore from the hands of the IT technician to the scientist.
Other features include:
And that’s DOCstore 1.2. Faster, more efficient search, allowing you to cut out the noise of unwanted information and customise what you see. And with Connectors, once again, our developments are democratising data management for the life sciences.
To find out more about how DOCstore and the rest of the SciBite platform can transform your data, get in touch with the team today.
© SciBite Limited / Registered in England & Wales No. 07778456