Simple and quick Jupyter notebook environment setup

Kiril Aleksovski
Mar 5, 2021

In this short article, I explain a simple and quick way to set up a Jupyter environment for our everyday DS/ML work.

To keep things clean, I try not to paste too many command outputs.
The main tools used are Docker and docker-compose, together with the corresponding images from the official Jupyter repos on GitHub and Docker Hub. This should work no matter whether it's a Linux or a Windows 10 machine.

Let us begin with setting up Docker first, which should be fairly easy these days.

1. Install and configure Docker
  • Follow the official documentation on docs.docker.com.
    The easiest way is the convenience script from get.docker.com (see the sketch after this list).
  • Check my previous article about setting up Docker.
    Not valid anymore, since manually setting up Docker is so last year :)
  • Use Docker Desktop with Windows 10 and Mac.
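
A minimal sketch of the convenience-script route (the script is served from get.docker.com; the usermod step is optional and requires logging out and back in to take effect):

$ curl -fsSL https://get.docker.com -o get-docker.sh
$ sudo sh get-docker.sh
$ sudo usermod -aG docker $USER
$ docker --version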

The following section will use Docker Desktop on Windows 10 with WSL2, although everything also applies after installing and configuring Docker with the other methods, including on Linux.

2. Installing docker-compose

$ sudo curl -L "https://github.com/docker/compose/releases/download/1.23.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
$ docker-compose --version
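
As a side note, newer Docker releases bundle Compose as a v2 plugin, so you may already have the same functionality available as a docker subcommand:

$ docker compose version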

In this case, I use the all-spark-notebook image, which according to the docs contains the following:

jupyter/all-spark-notebook includes Python, R, and Scala support for Apache Spark:
  • Everything in jupyter/pyspark-notebook and its ancestor images
  • IRKernel to support R code in Jupyter notebooks
  • Apache Toree and spylon-kernel to support Scala code in Jupyter notebooks
  • ggplot2, sparklyr, and rcurl packages

Feel free to check other images and decide which one best suits your needs.
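
Before writing any compose file, the chosen image can be sanity-checked with plain docker (8888 is the notebook server's default port inside the container; the server prints a tokenized URL to open in the browser):

$ docker pull jupyter/all-spark-notebook
$ docker run -it --rm -p 8888:8888 jupyter/all-spark-notebook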

Example docker-compose file where we specify the desired image, volume mapping, port number, and so on:

version: "3"
services:
  datascience-notebook:
    image: jupyter/all-spark-notebook
    volumes:
      - /home/${USER}/notebooks/:/home/jovyan/work
    ports:
      - 0.0.0.0:8890:8888
    container_name: datascience-notebook-container
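
One assumption worth making explicit: the host directory in the volume mapping should exist before the first up, otherwise Docker may create it as root and the notebook server (which runs as the jovyan user) could fail to write to it:

$ mkdir -p /home/${USER}/notebooks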

This file can be further modified to change the port number, the container_name, the service name, or the image, as explained in the previous section.

We can check the stack that will be created:

$ docker-compose -f ./docker-compose.yml config
services:
  datascience-notebook:
    container_name: datascience-notebook-container
    image: jupyter/all-spark-notebook
    ports:
    - 0.0.0.0:8890:8888/tcp
    volumes:
    - /home/USER/notebooks/:/home/jovyan/work:rw
version: '3.0'

Some useful commands for running the services.

First, we create and run the services in the background:

$ docker-compose -f ./docker-compose.yml up -d

After the service starts, we inspect the logs from the running container and grab the token for the Jupyter notebook:

$ docker-compose -f ./docker-compose.yml logs -f
$ docker-compose -f ./docker-compose.yml logs | grep '?token'
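
Alternatively, we can ask the running server directly for its URL and token (assuming the image ships the classic notebook CLI; datascience-notebook is the service name from our compose file):

$ docker-compose -f ./docker-compose.yml exec datascience-notebook jupyter notebook list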

Starting and stopping the services after they have been created, for example when we finish our work or want to restart the service:

$ docker-compose -f ./docker-compose.yml start
$ docker-compose -f ./docker-compose.yml stop
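
And when the stack is no longer needed at all, down removes the container and its network (the notebooks survive on the host thanks to the volume mapping):

$ docker-compose -f ./docker-compose.yml down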

The Jupyter notebook can be accessed at http://localhost:configured_port (in my case http://localhost:8890); after entering the token to authenticate, we can start working with our notebooks.

Conclusion:

This was a short guide to setting up a Jupyter notebook environment that can serve as an easy and quick way to get started on a local machine for prototyping and testing our DS/ML work. I assume this approach can be scaled up to a bigger machine or a cloud instance, which I will try to explore in a future article.
