4 Literate Programming with Jupyter

Jupyter Notebooks are another tool we can use to interweave code and narrative to write more complete records of our work. A Jupyter Notebook document is actually a fancy JSON document that uses the “.ipynb” extension. It contains an ordered list of input/output cells, which can hold code, text (using Markdown), mathematics, plots, and rich media. Notebook documents are both human-readable documents containing the analysis description and the results (figures, tables, etc.) and executable documents that can be run.
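To make the "fancy JSON" point concrete, here is a minimal sketch that builds a tiny notebook by hand, serialises it, and reads the cells back in order. The cell contents are made up for illustration; the keys follow the version 4 notebook format as I understand it, and a notebook saved by Jupyter carries more metadata than this.

```python
import json

# A hand-written, minimal notebook in the nbformat 4 layout (illustrative content only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Some narrative."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello world')"],
        },
    ],
}

# Serialising gives the kind of text that is stored in a .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)

# Reading it back: the cells come out in the same order they were written.
loaded = json.loads(ipynb_text)
cell_summary = [(cell["cell_type"], "".join(cell["source"])) for cell in loaded["cells"]]
for cell_type, source in cell_summary:
    print(f"[{cell_type}] {source!r}")
```

Because the file is plain JSON, any tool that reads JSON can inspect a notebook, which is also why notebooks can be put under version control.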

A kernel is a process running an interactive session. When using IPython, this kernel is a Python process. There are kernels in many languages other than Python. In Jupyter, notebooks and kernels are strongly separated. A notebook is a file, whereas a kernel is a process.

The kernel receives snippets of code from the Notebook interface, executes them, and sends the outputs and possible errors back to the Notebook interface. A notebook is persistent (it’s a file), whereas a kernel may be closed at the end of an interactive session and is therefore not persistent. When a notebook is re-opened, its saved outputs are still there, but the kernel’s state (variables, imports) is gone, so the cells need to be re-executed.
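The notebook/kernel split can be sketched with a toy stand-in. This is not the real Jupyter messaging protocol; `ToyKernel` is a hypothetical class that only mimics the idea: it receives a snippet, runs it against state held in memory, and hands back the output or the error.

```python
import contextlib
import io
import traceback


class ToyKernel:
    """Toy stand-in for a kernel: keeps state in memory for as long as it lives."""

    def __init__(self):
        self.namespace = {}  # variables defined by earlier snippets

    def execute(self, code):
        """Run one snippet and return (output_text, error_text_or_None)."""
        buffer = io.StringIO()
        error = None
        with contextlib.redirect_stdout(buffer):
            try:
                exec(code, self.namespace)
            except Exception:
                error = traceback.format_exc()
        return buffer.getvalue(), error


kernel = ToyKernel()
kernel.execute("x = 21")
# State from the previous snippet is still available to the next one.
output, error = kernel.execute("print(x * 2)")

# A new kernel starts with an empty namespace, like re-opening a notebook.
restarted = ToyKernel()
_, restart_error = restarted.execute("print(x * 2)")
```

The second kernel fails with a `NameError` because `x` only ever lived in the first kernel's memory, never in the notebook file.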

You can use A TON of languages with Jupyter Notebooks, provided you can get the kernel (which is waaay harder for the proprietary languages): Python (of course), R, SageMath, Bash, Octave, Julia, Haskell, Ruby, JavaScript, Scala, PHP, Go, and many more. When installed as Jupyter kernels, each language becomes accessible in the same way, using the same notebook interface. You can’t mix programming languages in one notebook, however, because each notebook talks to a single kernel. One notebook = one language.
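One way to see "one notebook = one language" is that the notebook file records which kernel it expects. A minimal sketch, assuming the usual `kernelspec` entry in the notebook metadata; the metadata below is hand-written for illustration and `notebook_kernel` is a hypothetical helper.

```python
import json

# Illustrative metadata, shaped like what Jupyter writes for a Python 3 notebook.
ipynb_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "cells": [],
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
        "language_info": {"name": "python"},
    },
})


def notebook_kernel(ipynb_json):
    """Return (kernel name, language) recorded in a notebook, or None where absent."""
    metadata = json.loads(ipynb_json).get("metadata", {})
    kernelspec = metadata.get("kernelspec", {})
    return kernelspec.get("name"), kernelspec.get("language")


kernel_name, language = notebook_kernel(ipynb_text)
print(kernel_name, language)
```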

Jupyter notebooks work with basically two parts:

  1. Web Application (notebook + kernel)
  • In-browser editing for code with automatic syntax highlighting, indentation, tab completion/introspection
  • In-browser code execution, with results attached to the code that generated them
  • Display results of computation in rich media (LaTeX, HTML, SVG, etc.)
  2. Notebook (document)
  • A complete computation record of a session, interweaving executable code with text, maths, and rich representations of objects
  • Can be exported to LaTeX, PDF, slideshows, etc., or made available from a public URL that renders it as a static webpage

Jupyter notebooks are what some call an “executable paper” because of the functions outlined above. However, the same problems occur where computing environments differ, so you have to take extra steps to make these reproducible, which we’ll see in the next module.

Examples:

CONVERSATION BREAK 1:

Do you see any similarities between the notebooks? What do you like/dislike about the way they are structured?


CHALLENGE 1:

  1. Download this notebook: https://github.com/arokem/visual-white-matter/blob/03336b9b24f6ad453ad50a0f4284cbbecb24f55e/download-data.ipynb
  2. Upload it to our class Jupyter Lab instance.
  3. Try to run it!
  4. If that works, run this notebook in our class instance next: https://github.com/arokem/visual-white-matter/blob/03336b9b24f6ad453ad50a0f4284cbbecb24f55e/dMRI-signals.ipynb
  5. Raise your hand to show you’ve finished!
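A note on step 1: a GitHub `/blob/` link points at a web page about the file, not the file itself. If you'd rather script the download, here is a small sketch that rewrites such a link into its raw-file form. `github_raw_url` is a hypothetical helper, and it assumes the usual `github.com/<user>/<repo>/blob/<ref>/<path>` layout.

```python
from urllib.parse import urlsplit


def github_raw_url(blob_url):
    """Turn a github.com '/<user>/<repo>/blob/<ref>/<path>' URL into its raw-file URL."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    user, repo, _, ref, *path = segments
    return f"https://raw.githubusercontent.com/{user}/{repo}/{ref}/{'/'.join(path)}"


raw_url = github_raw_url(
    "https://github.com/arokem/visual-white-matter/blob/"
    "03336b9b24f6ad453ad50a0f4284cbbecb24f55e/download-data.ipynb"
)
print(raw_url)

# To fetch the file (needs network access), something like this should work:
# from urllib.request import urlretrieve
# urlretrieve(raw_url, "download-data.ipynb")
```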


4.1 Getting started

You should all have access to our class JupyterLab instance here: https://reproduce.jupyter.med.utah.edu/. If you don’t, give me your GitHub user name and I will add you real quick.

If you want to run Jupyter notebooks locally, I recommend installing Anaconda with Python 3.6. It has a lot of packages pre-installed that are great for research, like pandas, scipy, numpy, matplotlib, etc.

Once you are in our class interface, you should see something like this: Class Jupyter interface

The dashboard contains several tabs:

  • Files shows all files and notebooks in the current directory.
  • Running shows all kernels currently running on your computer.
  • Clusters lets you launch kernels for parallel computing.

Click that button on the top right corner that says ‘New’ and select Python 3. A blank notebook should then launch: Blank notebook

Then you get your brand-new notebook!

The layout of the notebook | source


The main components of the interface, from top to bottom:

  • The notebook name: you can change it by clicking on it. This is also the name of the .ipynb file.
  • The menu bar gives you access to several actions pertaining to the notebook (like saving it!) and the kernel (like restarting it!)
  • To the right of the menu bar is the Kernel name. You can change the kernel language of your notebook from the Kernel menu.
  • The toolbar contains icons for common actions. In particular, the dropdown menu showing Code lets you change the type of a cell.
  • Below is the actual Notebook. It consists of a linear list of cells. Always run your notebook from top to bottom, and only in that order, so that each cell sees the state left by the cells above it.
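Here is a small sketch of why the top-to-bottom rule matters. The two "cells" below are just strings run with `exec` in a fresh namespace, which is only a rough imitation of what a kernel does; the names `rate` and `total` are made up for the example.

```python
cells = [
    "rate = 0.5",          # cell 1
    "total = 100 * rate",  # cell 2 depends on cell 1
]


def run(cell_order):
    """Execute the chosen cells in the given order in one fresh namespace."""
    namespace = {}
    for index in cell_order:
        exec(cells[index], namespace)
    return namespace.get("total")


top_to_bottom = run([0, 1])

try:
    run([1, 0])  # cell 2 first: 'rate' does not exist yet
    out_of_order = "ran"
except NameError as err:
    out_of_order = f"NameError: {err}"

print(top_to_bottom)
print(out_of_order)
```

In a live notebook the failure is often quieter than this: an old value of `rate` from an earlier run may still be sitting in the kernel, so the cell runs but gives a result nobody can reproduce from the file alone.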

First, let’s change the name from ‘Untitled’ to something useful. There’s a running gag in the data science community that at any given time, a data scientist has a series of Jupyter notebooks that look like:

Untitled.ipynb
Untitled1.ipynb
...
Untitled31.ipynb

CHALLENGE 2:

  1. Change the name of your Jupyter notebook.
  2. Raise your hand to show you’ve finished!


You can see the notebook itself consists of cells – we have one to start out with. Once we double click on a cell, we are in edit mode. This means that we are able to edit the cell, just as you would if this were a Word document. We can tell that we are in edit mode because of the green border around the cell.

When we’re in a Jupyter notebook, there are some useful shortcuts to get us started:

  • esc in highlighted cell to toggle command options:
    • esc + l - show line numbers (that’s a lowercase L)
    • esc + m - format cell as Markdown cell
    • esc + a - insert cell above current cell
    • esc + b - insert cell below current cell
  • shift + enter - run an active cell
  • command + z - undo (macOS)
  • control + z - undo (Windows)

CHALLENGE 3:

  1. Add line numbers to your Jupyter notebook.
  2. Insert a cell below the current cell you’re on.
  3. Raise your hand to show you’ve finished!


Ok, so let’s edit a cell! Once you’ve double clicked on the first cell, let’s write some text about what this notebook is going to do. The cells default to code, so we need to press esc + m to change the cells to markdown. This is how we’ll write our narratives in between code cells! Since in our last challenge we added an extra cell, I made them both markdown:

Writing markdown cells in Jupyter


Then, let’s add a cell beneath our markdown cells (esc + b) – in this next cell, we can begin our analysis and enter some basic Python code!

print('hello world')

Then press shift + enter to run it! You should see the output of your code immediately below the cell that generated it:

The results from code execute beneath the cell that generates them


We are going to add some basic code & plots to our notebook, so we have something pretty to export at the end of the session! To do that, we first need to import the relevant Python libraries for our plotting and data analysis:

from pylab import *
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

Ideally, this comes first in the notebook, but this is an intro class, so some concepts need to come first!

In the following code cell, type this to ensure that our plots are inline in the notebook, instead of exported out of it as standalone images:

%matplotlib inline

Each cell should represent a conceptually different process. You can separate code cells with markdown cells explaining what each one does.

CHALLENGE 4:

  1. Add a markdown cell above our cell that tells matplotlib to stay inline.
  2. Add some text about what the notebook will do next!
  3. Raise your hand to show you’ve finished!


Editing our new cells


In the next code cell we are going to assign two variables, so let’s add a markdown cell that says that!

# Plot Variables
In the next cell, I am going to assign the variables that make up the X and Y axes for my plot.

Then we can add our actual code cell right afterwards:

x = np.linspace(0, 5, 10)
y = x ** 2
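If you're curious what those two lines actually produce, a quick check like the one below helps. This is just a sketch for poking at the variables; it assumes numpy is installed, as it is in Anaconda.

```python
import numpy as np

x = np.linspace(0, 5, 10)  # 10 evenly spaced values from 0 to 5, both ends included
y = x ** 2                 # element-wise square, so y has the same shape as x

print(x.shape, y.shape)
print(x[0], x[-1])         # first and last x values
print(y[-1])               # square of the last x value
```

A throwaway inspection cell like this is handy while you work, but consider deleting it (or explaining it in markdown) before you share the notebook.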

After making our variables, we can then make our plot! But first, an expository markdown cell!

# Line plot
In the next cell, I am going to plot my variables x and y using matplotlib.

The `r` in my plot() means that the line is going to be red.

The next code cell is solely responsible for building the plot:

figure()
plot(x, y, 'r')
xlabel('x')
ylabel('y')
title('title')
show()
Editing our new cells
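The plotting cell above leans on `from pylab import *`, which drops names like `plot` and `figure` straight into the notebook. As an alternative, here is a sketch of the same plot using matplotlib's explicit `plt.subplots()` interface, which keeps it clear where each function comes from. It assumes numpy and matplotlib are installed and redefines `x` and `y` so it can run on its own.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 10)
y = x ** 2

fig, ax = plt.subplots()  # one figure containing one set of axes
ax.plot(x, y, "r")        # "r" is the format string for a red line
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("title")
```

With `%matplotlib inline` active, the figure should show up beneath the cell once it finishes running; outside a notebook you would add `plt.show()` or `fig.savefig(...)`.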


# Changing plot colors
I want to test out what colors look best in my plot, so I am going to change them!

The `r--` in my plot() means that the line is going to be a dashed red line. The `g*-` means it will be a green line with star markers.

The next code cell is solely responsible for building the two new plots:

subplot(1,2,1)
plot(x, y, 'r--')
subplot(1,2,2)
plot(y, x, 'g*-');
Editing our new cells


Now that we’re done editing all our cells, we can execute the notebook from top-to-bottom via the Cell menu > Run all.

Executing our new cells


4.2 Saving & Exporting

Jupyter autosaves your notebook, but just to be sure I always save after anything important. You can go to File > Save & Checkpoint or press ctrl + s.

I also recommend Save & Checkpoint because you can revert back to previous checkpoints in case something breaks! Go to File > Revert to Checkpoint and select the checkpoint you want to go back to, which is labeled with its date and time.

One of the best things about Jupyter notebook is that you can export the notebook in a variety of formats:

  • PDF - it’s generated via LaTeX but you don’t have to touch it
  • HTML - a static rendering for the web
  • Python - a Python script
  • LaTeX - if you want the raw LaTeX source
  • reST - reStructuredText, another text format
  • Markdown - like how I am writing this book!
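The same exports are also available outside the menus through the `jupyter nbconvert` command, which is useful if you want to script them. Below is a minimal sketch that builds that command line from Python. `nbconvert_command` is a hypothetical helper and `analysis.ipynb` is a made-up file name; the conversion itself only runs if Jupyter is installed and that file exists.

```python
import shutil
import subprocess
from pathlib import Path


def nbconvert_command(notebook_path, to="html", execute=False):
    """Build the `jupyter nbconvert` command line for one notebook."""
    command = ["jupyter", "nbconvert", "--to", to]
    if execute:
        command.append("--execute")  # run every cell top to bottom before converting
    command.append(notebook_path)
    return command


command = nbconvert_command("analysis.ipynb", to="html")
print(" ".join(command))

# Only run the conversion when Jupyter is installed and the (hypothetical) file exists.
if shutil.which("jupyter") and Path("analysis.ipynb").exists():
    subprocess.run(command, check=True)
```

Other values for `to` that I'd expect to work include `pdf`, `latex`, `markdown`, `rst`, and `script`, matching the list above. PDF export also needs a working LaTeX installation.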

A lot of folks I know actually write their blog posts in Jupyter notebooks, then export it for their website!

CHALLENGE 5:

  1. Export your Jupyter notebook as a PDF
  2. How does it look? Would you write a paper in Jupyter?
  3. Raise your hand to show you’ve finished!


4.2.1 Moving a Notebook to the Web

You might have noticed some of our

  • NBViewer: for static rendering of notebook files
    • Put the notebook (.ipynb) file on the web (e.g. GitHub, GitLab… somewhere) so that the URL is http://NAME-OF-NOTEBOOK.ipynb
    • Enter the URL into NBViewer.
    • Click Go! (check out example here)
  • Binder: for interactive rendering of notebook files!
    • You need a repository of Jupyter notebooks plus a requirements.txt file that lists all the Python libraries and version of Python you use for your notebooks.
    • Enter your repository information (a URL to a GitHub repo with Jupyter Notebooks) in Binder
    • Binder builds a Docker image of your repository using a requirements.txt file from the repository.
    • Binder builds the notebook environment for you, and lets you interact with your notebooks in a live environment!
    • You can also get a reusable link and badge for your live repository that you can easily share with others, like so: https://hub.mybinder.org/user/tiesdekok-learnpythonforresearch-zjynb3wo/tree
  • GitHub Pages or GitLab Pages:
    • You can use a static site generator like Nikola or Jekyll to blog with Jupyter notebooks!
    • If you want a simpler approach, just export your notebook as index.html, put it in a repository, and configure the settings in your repository for pages (links above for tutorial).
    • Example: GitLab pages
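For the Binder route, the requirements.txt file is the piece people most often forget. Here is a rough sketch of one way to generate pinned `name==version` lines for the packages a notebook imports, using the versions installed in your current environment. `pinned_requirements` is a hypothetical helper, and the package list is just the one from this chapter's example; adjust it to whatever your notebook really uses.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Return 'name==version' lines for the named packages that are installed here."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment, so there is nothing to pin
    return lines


requirements = pinned_requirements(["numpy", "matplotlib"])
print("\n".join(requirements))
# Save the printed lines as requirements.txt at the top level of the repository.
```

Note that this sketch silently skips packages it can't find, so double-check the output against your imports before you commit it.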

4.3 CONGRATS

4.4 Further Reading