By Jupyter–Is This the Future of Open Science?

jupyter logo

Taking the scientific paper to the next level.

In a recent article, I explained why open source is a vital part of open science. As I pointed out, alongside a massive failure on the part of funding bodies to make open source a key aspect of their strategies, there's also a similar lack of open-source engagement with the needs and challenges of open science. There's not much that the Free Software world can do to change the priorities of funders. But, a lot can be done on the other side of things by writing good open-source code that supports and enhances open science.

People working in science potentially can benefit from every piece of free software code—the operating systems and apps, and the tools and libraries—so the better those become, the more useful they are for scientists. But there's one open-source project in particular that already has had a significant impact on how scientists work—Project Jupyter:

Project Jupyter is a set of open-source software projects that form the building blocks for interactive and exploratory computing that is reproducible and multi-language. The main application offered by Jupyter is the Jupyter Notebook, a web-based interactive computing platform that allows users to author documents that combine live code, equations, narrative text, interactive dashboard and other rich media.

Project Jupyter was spun-off from IPython in 2014 by Fernando Pérez. Although it began as an environment for programming Python, its ambitions have grown considerably. Today, dozens of Jupyter kernels exist that allow other languages to be used. Indeed, the project itself speaks of supporting "interactive data science and scientific computing across all programming languages". As well as this broad-based support for programming languages, Jupyter is noteworthy for its power. It enables users to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization and machine learning.