Bonobo with Docker

Brand new feature

This feature is now released in the stable channel of Bonobo.

Docker extension for Bonobo

Bonobo-Docker is a lightweight package that adds Docker-related capabilities to Bonobo.

To use it, you'll need the bonobo[docker] extra, or the bonobo-docker package.

You also need a working Docker installation and a correctly set up docker client accessible from PATH (if you can run docker ps, you should be good to go).
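The docker ps check can also be scripted, for instance before launching jobs from a wrapper script. The following is a minimal sketch using only the Python standard library. The function name docker_is_usable and its injectable which/run parameters are illustrative assumptions, not part of Bonobo or bonobo-docker.

```python
import shutil
import subprocess


def docker_is_usable(which=shutil.which, run=subprocess.run):
    """Return True if a docker client is on PATH and `docker ps` succeeds.

    Hypothetical helper, not part of bonobo-docker. `which` and `run` are
    injectable only so the check can be exercised without a real daemon.
    """
    executable = which("docker")
    if executable is None:
        # No docker client was found on PATH.
        return False
    try:
        # `docker ps` exits with 0 only if the client can reach a daemon.
        result = run(
            [executable, "ps"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling docker_is_usable() with no arguments performs the real check; a False result suggests fixing the Docker setup before trying bonobo runc.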

pip install bonobo[docker]

Run an ETL job within a container

The extension defines a new runc command for bonobo to run a (local) ETL job within a docker container.

bonobo runc uses the same syntax as bonobo run.

It currently supports running either a file (with bonobo runc <filename>) or a directory (with bonobo runc <path>), but not a Python module referred to using dot notation (bonobo run -m foo.bar.baz has no container equivalent).

# Example invocation
bonobo runc myjob.py

Run a Python REPL console in a container

Running bonobo runc without arguments spawns a Python read-eval-print loop (REPL) in a docker container, with bonobo pre-installed and pre-loaded (you can use the bonobo symbol without importing it first).

# Spawn a python shell (with bonobo preloaded)
bonobo runc

Run a bash shell in a container

To explore a bonobo-enabled container using a bash shell, you can run bonobo runc --shell. Of course, bonobo is pre-installed.

# Spawn a bash shell
bonobo runc --shell

Volume management

A few goodies are available for containers run using bonobo runc, including volume mounting.

  • ~/.cache/ will be mounted as /home/bonobo/.cache/ in the container, allowing efficient use of caches (for example, pip will store its download caches and wheels here, so multiple installs of the same software won't trigger a new download every time, and won't trigger recompilation of C modules).
  • You can use the --with-local-packages flag (or -L) with the bonobo runc command to ask for your local editable bonobo* packages to be linked and installed within the containers. This makes it easy to work on dependency sources within a container without losing your changes every time you kill the process. Those packages will be pip-editable-installed in the container on start.
  • You can manually mount additional volumes using the -v flag of bonobo runc, which simply passes your additional volumes to the docker client.
  • In the future, more options will allow mounting other editable packages, so you can work on your own software without losing changes.
  • In the future, it'll be possible to mount data-directories to either get some input data or save output data.
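To illustrate how the flags above combine, here is a small sketch that assembles a bonobo runc command line. It uses only flags mentioned in this document (--shell, -L and -v). The helper name build_runc_command is hypothetical, the example paths are made up, and the sketch assumes that options precede the target (as with bonobo run) and that -v values use docker's usual HOST:CONTAINER form, since they are passed on to the docker client.

```python
import shlex


def build_runc_command(target=None, local_packages=False, volumes=(), shell=False):
    """Build an argv list for `bonobo runc` (hypothetical helper).

    `volumes` is an iterable of (host_path, container_path) pairs. Docker
    expects an absolute host path for a bind mount, so pass absolute paths.
    """
    command = ["bonobo", "runc"]
    if shell:
        # Spawn a bash shell instead of running a job or a Python REPL.
        command.append("--shell")
    if local_packages:
        # Short form of --with-local-packages.
        command.append("-L")
    for host_path, container_path in volumes:
        # Assumed to be forwarded unchanged to the docker client.
        command.extend(["-v", "{}:{}".format(host_path, container_path)])
    if target is not None:
        command.append(target)
    return command


def render(command):
    """Render an argv list as a shell-quoted string, for display only."""
    return shlex.join(command)
```

For example, render(build_runc_command("myjob.py", local_packages=True, volumes=[("/srv/data", "/data")])) gives bonobo runc -L -v /srv/data:/data myjob.py, and the argv list itself can be handed to subprocess.run.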

Build images

To ease the process of building images for bonobo, the extension also contains a wrapper for docker build that tags the images with the right version numbers, using metadata retrieved from the package's PyPI page.

You should not normally need to use this, as we will build and publish images after each bonobo release (and probably on a schedule for old versions too, so we can include security updates from the base operating system).

As this feature is less user-land and more release-land, it's not published as a bonobo command, and is only usable through the following command:

python -m bonobo_docker build
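The idea of deriving image tags from PyPI metadata can be sketched as follows. This is not the wrapper's actual implementation: the function name, the example/bonobo repository name and the tagging scheme (full version plus latest) are assumptions for illustration. The only external fact relied on is that PyPI's JSON API (https://pypi.org/pypi/<package>/json) exposes the latest release under info.version.

```python
import json


def image_tags_from_pypi(raw_json, repository="example/bonobo"):
    """Derive docker image tags from a PyPI JSON API response body.

    `raw_json` is the text served at https://pypi.org/pypi/<package>/json.
    The repository name and the tagging scheme are hypothetical.
    """
    payload = json.loads(raw_json)
    # PyPI reports the latest released version under info.version.
    version = payload["info"]["version"]
    return [
        "{}:{}".format(repository, version),
        "{}:latest".format(repository),
    ]
```

Each resulting tag could then be passed to docker build with its -t option. Fetching the JSON document is left out of the sketch so that it stays free of network access.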

Docker in docker (dind)

For now, bonobo-docker does not support running docker within a docker container. This is something we want to look into in the future, especially for bonobo-docker continuous integration processes.

Happy containerization!

Did I say we need feedback? Slack discussions and issues are more than welcome!