Jupyter Notebook Fatigue

11 Feb 2025

I have written a lot of Jupyter notebooks, about 1,300 in total the last time I checked, and I don’t want to do it any more. I’ve tried a few different tools, including the regular Jupyter Notebook web interface, Jupyter Lab, Jupytext, VSCode’s Jupyter plugin and PyCharm’s notebook and scientific mode. Nothing really felt great.

Maybe it’s because I picked up Data Science many years after becoming a Software Engineer, but I have never liked the linear workflow that notebooks suggest, not to mention how easy that linearity is to break by running cells out of order. I want to build out components of the analysis and then compile them into a single report.

It’s also virtually impossible to read old notebooks, let alone run them. Code, output and exposition are all mixed together, and grepping is a nightmare because notebooks are serialized as JSON. (I’ve cooked up some jq incantations that should never see the light of day, but those weren’t enough.)
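To illustrate the problem: even pulling just the code out of a notebook takes a detour through JSON. The `cells`, `cell_type` and `source` fields below are from the real .ipynb schema, but `notebook_source` is a name I made up for this sketch:

```python
import json

def notebook_source(path):
    """Extract just the code from a .ipynb file, dropping outputs and metadata."""
    with open(path) as f:
        nb = json.load(f)
    chunks = []
    for cell in nb.get("cells", []):
        if cell["cell_type"] == "code":
            # source is stored as a list of lines (or a single string)
            chunks.append("".join(cell["source"]))
    return "\n\n".join(chunks)
```

This is roughly what those jq incantations were doing, and it's already more ceremony than grepping a plain .py file.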

I’ve contemplated whether or not this is a skill issue on my part, but I’ve come to the conclusion that there’s just something about the format that encourages sloppy output. After about ten cells of non-trivial work, you end up with many potential branches for the analysis, charts and tables that are not entirely useful, but also not worth deleting (yet), and multiple versions of models with various parameter tweaks.

An append-mostly log is not a good data structure for this kind of thing. You really need a tree: you can drill down to one part in isolation, without worrying about (or affecting) everything else. Folding cells can help, but that’s too ephemeral; the structure needs to be baked in.

Jupytext is an improvement, but it still suffers from the linearity problem, and you still need the notebook format if you want to capture output. PyCharm’s scientific mode was the best of the approaches I tried, mostly because it doesn’t enforce a linear sequence of notebook cells. But it doesn’t really enforce anything, so I ended up reinventing some kind of structure for each analysis, which got just as messy as notebooks.

I just wanted to do my analytics work in the terminal, using tools that I already use for software development. All of these tools are more or less nice wrappers around the IPython shell, so firing that up alongside vim in a tmux session actually wasn’t too bad: more or less equivalent to PyCharm’s scientific mode, minus its nice management of charts and data frames.

Missing those features was kind of annoying at first, but after a while it occurred to me that they were more of a hindrance than a help for my goal: tree-like analyses that compile to reports and remain readable later on. The new approach forced me to factor the code that generated those charts and tables into functions, choose good names for things, and store output in a well-defined location.
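The refactor tends to look the same every time: a throwaway plotting cell becomes a named function that writes to a known path. A minimal sketch (the function name and paths here are hypothetical, not part of any library):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this works in a plain terminal session
import matplotlib.pyplot as plt

def plot_totals(labels, values, out_path):
    """Bar chart of totals, saved to a well-defined location instead of
    rendered inline in a notebook."""
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_ylabel("total")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
    return out_path
```

Calling `plot_totals(["Jan", "Feb"], [3, 5], "figures/totals.png")` from the shell does what the notebook cell did, but the chart now has a name and a home.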

Now, that might sound like extra friction that slows down the process. But it’s not. The extra friction forces you to do the thing that you’re supposed to be doing: thinking. Plus you end up with a nice library of composable functions, which makes compiling the final report much easier.

I packaged this up into a little library called jove (shorter Jupiter, get it?), which bootstraps a basic directory structure:

(base) ➜  /tmp jove start myanalysis
INFO:jove:Created /tmp/myanalysis
INFO:jove:Created /tmp/myanalysis/data
INFO:jove:Created /tmp/myanalysis/figures
INFO:jove:Created /tmp/myanalysis/README.md
INFO:jove:Created /tmp/myanalysis/libjove.py
INFO:jove:Created /tmp/myanalysis/code.py
INFO:jove:Created /tmp/myanalysis/shell.sh
  • data is where CSVs, JSON files, etc. go
  • figures is where chart PNGs go
  • code.py is where analysis code / functions go
  • libjove.py contains some helper functions
  • shell.sh boots up an IPython shell and loads libjove.py and code.py
  • README.md contains all the analysis exposition / notes

The libjove.py file contains a few functions, which make it easier to get code and data out of the IPython session:

  • save_csv saves a DataFrame to CSV to the data directory, using sequential numbering (e.g. data/table-0.csv, data/table-1.csv, etc.)
  • save_fig saves a Matplotlib Figure to the figures directory, using sequential numbering (e.g. figures/fig-0.png, figures/fig-1.png, etc.)
  • save_wip dumps the IPython session history into code.py (and clears the history), so the code can be refactored into functions

The main document is named README.md so you still get a nicely rendered document when viewing the directory on GitHub. As the analysis progresses, tables and figures can be saved to their respective directories and linked from README.md.
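For example (with hypothetical filenames), the README links the saved artifacts with plain Markdown:

```markdown
## Monthly totals

The totals are computed by a function in `code.py`.

![Monthly totals](figures/fig-0.png)

Raw data: [table-0.csv](data/table-0.csv)
```

Because the links are relative, they render both on GitHub and in any local Markdown preview.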

The library is available here, if you’re interested. It’s not particularly impressive at the moment, but it’s at least a good starting point for structured analyses outside of the notebook format: