SciBite-Toolkit: Our Python library to assist your semantic workflow

Introducing the revamped scibite-toolkit: our SciBite Python library has levelled up! With a new name and added functionality, the SciBite toolkit aims to be your companion in making the most of your SciBite platform. We will dive into the toolkit today, but first, let’s look at why we chose Python as the language for this library.

Why Python?

It has been my language of choice, first during my PhD and now in my career in industry. And I am not alone: Python is one of the most widely adopted programming languages in the world, as many statistics show. To cite just one, the TIOBE index of the most popular programming languages has it at the top, a spot it has consistently held or closely shared with C and the various flavors of Java for a while.

The reasons for Python’s success are varied and not the main topic of today’s piece, but I want to comment on a few of them from my personal experience.

1. Low entry barrier

Python has a very low entry barrier. Going from learning a few basics to automating a small task, and feeling that you have made progress, takes a matter of hours. I’ve supported short introductory sessions at meetups and in introductory university courses and seen how, with a few useful examples and pointers, people get the gist of it very quickly.

When you are a bit more advanced or need to do slightly more complex prototyping, Python requires much less boilerplate code than other languages, so the time needed to get something working is quite low.

2. Ecosystem of libraries

A very important part of its success also comes from the ecosystem of libraries that has grown around it. As much as I’m a fan of Python, for certain types of applications its performance might not be enough compared to, let’s say, a compiled programming language. In those situations, you might need to resort to a more performant language (e.g., good old FORTRAN or shiny Julia).

However, in some of those situations you can also make use of libraries designed to speed up Python (such as NumPy, Numba, etc.). Not only that, but the projects built around specific use cases are widely used in the scientific and technical community and hence have wonderful documentation and support (think of Django for web development and TensorFlow for machine learning).

3. Integration with other languages and technologies

Another big win for Python is its ability to integrate with other languages and technologies. The options range from libraries that interface between languages (think Numba and C, or rpy2 for interacting with R) to libraries provided by vendors and tech developers for interacting programmatically with their software (e.g., boto3 from AWS).

From the perspective of data science and machine learning in general, Python is a common choice amongst practitioners, and you’ll find many resources, courses, and examples in this area that use Python as their vehicle.

4. Community and social aspects

Last but not least, there is a point that is not technical but that I believe is key, particularly when you’re starting out: the community that shapes Python is very large, welcoming, and super active. There are Python community chapters all over the world. They host events and conferences (PyCons, PyDatas, EuroPython, SciPy conferences, meetups) where, whether you’re a beginner or a seasoned programmer, you’ll find chances to learn more, ask questions, and discover more about the language.

Not only that, but it is a very diverse community, and it puts that diversity to good use. For example, I participated in a project to translate the Python docs into Spanish, and many other languages have their own translation of the documentation thanks to volunteers who contributed to it.

The birth of SciBite-Toolkit

My SciBite colleagues and I are no exception within the scientific and tech community. We have Python ingrained in our workflows and everyday work. The SciBite toolkit was born not as a product or even a customer-facing library but rather as a collection of Python functionality that would make our internal research, development, and customer support faster. As our customer base grew, so did the toolkit codebase, and we thought it would be helpful to share it with our customers.

API-first approach

One thing we have pushed hard to ensure in our software is that every tool is truly API-first. We recognized very early on that, as foundational data management components, our solutions need to play well with each other as well as with third-party components in the kind of workflows that bring out the best in them.

The SciBite toolkit has been around for a while now, and many colleagues from SciBite have contributed to it. In our next release, 1.0.0, it gets a long-overdue change of name: it was formerly named termite-toolkit, due to its origins. This version contains the same modules that were previously available, which enable interaction with TERMite, Workbench and SciBite Search, but it also starts to introduce functionality for interacting with the CENtree search endpoint.

The toolkit is object-oriented, with a module for interacting with each of our products plus some extra functionality. In each module, you can instantiate a request builder object that takes care of authenticating against your server and keeps track of the headers, etc. The object’s methods not only wrap the request calls to some of the API endpoints but also post-process their JSON responses, either converting them to other data formats such as data frames or manipulating them in some other useful way. Some modules also provide static functions for manipulations that do not need to interact with the server.
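
To make the request builder pattern described above more concrete, here is a minimal sketch using only the Python standard library. It is not the scibite-toolkit API: the class name RequestBuilder, its methods, the /annotate endpoint, the server address, and the shape of the sample response are all hypothetical and chosen purely for illustration. The request is prepared but never sent, so the sketch runs without a server.

```python
import base64
import json
import urllib.request


class RequestBuilder:
    """Hypothetical request builder; NOT the scibite-toolkit class."""

    def __init__(self):
        self.url = ""
        self.headers = {"Accept": "application/json"}

    def set_url(self, url):
        # Store the server address once, without a trailing slash.
        self.url = url.rstrip("/")

    def set_basic_auth(self, username, password):
        # Keep the credentials as a header so every later call reuses them.
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        self.headers["Authorization"] = f"Basic {token}"

    def build(self, endpoint, payload):
        # Prepare (but do not send) a POST request to a placeholder endpoint.
        data = json.dumps(payload).encode("utf-8")
        headers = {**self.headers, "Content-Type": "application/json"}
        full_url = f"{self.url}/{endpoint.lstrip('/')}"
        return urllib.request.Request(
            full_url, data=data, headers=headers, method="POST"
        )

    @staticmethod
    def hits_to_rows(response):
        # Flatten a nested JSON response into one flat dict per hit.
        # No server interaction is needed, hence a static method.
        rows = []
        for entity_type, hits in response.get("entities", {}).items():
            for hit in hits:
                rows.append({
                    "type": entity_type,
                    "id": hit["id"],
                    "name": hit["name"],
                    "count": hit.get("count", 0),
                })
        return rows


builder = RequestBuilder()
builder.set_url("https://example.invalid/api/")  # placeholder server
builder.set_basic_auth("user", "secret")         # placeholder credentials
request = builder.build("/annotate", {"text": "BRCA1 and breast cancer"})

# Made-up response shape, used only to exercise the post-processing step.
sample = {
    "entities": {
        "GENE": [{"id": "G1", "name": "BRCA1", "count": 2}],
        "INDICATION": [{"id": "D1", "name": "breast cancer"}],
    }
}
rows = RequestBuilder.hits_to_rows(sample)
```

A list of flat dictionaries like rows can be handed directly to a data frame constructor, which is the kind of post-processing step the paragraph above refers to.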

With the set of functionalities provided by the toolkit, our licensed customers can set up workflows and applications that build on top of the SciBite platform, such as:

– Dashboard applications that look into the scientific literature and other data hosted in SciBite Search to find aggregated co-occurrences of genes and diseases, in order to prioritize further research on those genes as targets. These applications call the search and aggregation endpoints programmatically and process the output scores, entities, and snippets to feed the dashboard.
– Scripts that run the same type of search in SciBite Search (for example, mentions of certain toxicological events) while iterating over different main entities (e.g., chemicals for the toxicology use case), and output the results in a dataframe or Excel-like format for expert review.
– Scripts that, once a desired configuration of vocabs and rules has been established in Workbench, process multiple files containing experimental metadata and annotate them with the same standards in a programmatic fashion.
– Scripts that use TERMite named entity recognition (NER) to enrich, on the fly, data obtained from an internal source.
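
As a rough illustration of the second kind of script in the list above, the following sketch runs one search per entity and collects the results in a spreadsheet-friendly table. Everything here is an assumption made for the example: fake_search stands in for a real search call and returns made-up hit counts, and the "event AND entity" query string is not meant to reflect actual SciBite Search syntax. In a real workflow, the stand-in would be replaced by a call to your server.

```python
import csv
import io


def fake_search(query):
    """Stand-in for a real search call; returns canned, made-up hit counts."""
    canned = {
        "hepatotoxicity AND paracetamol": 3,
        "hepatotoxicity AND ibuprofen": 1,
    }
    return {"query": query, "total_hits": canned.get(query, 0)}


def run_batch(event, entities, search):
    # Run the same search once per entity and collect one flat row each.
    rows = []
    for entity in entities:
        result = search(f"{event} AND {entity}")  # hypothetical query syntax
        rows.append({
            "event": event,
            "entity": entity,
            "total_hits": result["total_hits"],
        })
    return rows


def rows_to_csv(rows):
    # Serialise the rows as CSV text that opens directly in a spreadsheet.
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["event", "entity", "total_hits"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


entities = ["paracetamol", "ibuprofen", "caffeine"]
rows = run_batch("hepatotoxicity", entities, fake_search)
table = rows_to_csv(rows)
print(table)
```

Passing the search function in as an argument keeps the loop independent of any particular client, so the same script can be pointed at a test stub or a live server.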

Explore the new scibite-toolkit

If you are a licensed customer and already use our APIs, we suggest you give the toolkit a try! It is an ongoing project in active development, so new functionality will be added, and we’re keen to hear what would be most useful for you. Give us a shout if you try it, and let us know what you think and which enhancements you would like to see.

A Python community – PyLady

Disclaimer: I’m a PyLady at heart. For me, the Python programming language went from being something cool and helpful when I was finishing my degree (and such an improvement over Perl and the others I played with at the time) to being a wonderful community project that I got to know in depth thanks to the local events held by the PyData and PyLadies chapters in the cities I have lived in since then.

Resources

– PyPI project page (old): https://pypi.org/project/termite-toolkit/
– PyPI project page (new): https://pypi.org/project/scibite-toolkit/

Claudia Millán

Technical Consultant, SciBite

Claudia holds a Ph.D. from the University of Barcelona in the development of computational methods for structural biology, a field in which she has worked for more than eight years. She has been with SciBite since 2022, supporting customers on their projects and helping them make the most of SciBite technology.

Other articles by Claudia:

Toxicology keeps us safe using scientific evidence
Accurate prediction of protein structures and interactions using a three-track neural network, Science, 2021, 373, Issue 6557, 871-876; DOI: 10.1126/science.abj8754
