Bonobo with Cookiecutter

Brand new feature

This feature is now released in the stable channel of Bonobo.

Cookiecutter template for ETL jobs

Cookiecutter-Bonobo is a small template for use with the cookiecutter project generator.

It takes about 60 seconds to bootstrap a new ETL job or ETL job collection.

To use it, you need both bonobo and cookiecutter installed in your current Python environment.

pip install bonobo cookiecutter

Create a new ETL job collection

Let's create a brand new job collection (project).

bonobo init example-jobs


This will call cookiecutter with the cookiecutter-bonobo template, and create a directory that contains:

  • .env → placeholder for local environment, used to configure your dependencies, secrets, etc.
  • .gitignore → ignore local environment if used with git.
  • _services.py → services definitions.
  • main.py → main entrypoint, where your code lives.
  • requirements.txt → Python dependencies.

To run the example job provided in the main.py file, you can simply bonobo run it:

bonobo run example-jobs


Congratulations, you just ran your first ETL project! Wasn't that fast?

Detailed exploration

Local environment (.env)

The local environment file (.env) is a simple text file where you can set up default environment values, using a KEY=value format, one per line. It's usually good practice to separate config values from your codebase, and this is a simple way to do it. You can then use os.environ.get('KEY') in your code to retrieve those values, and you can always override them at runtime using your shell's syntax (for example, KEY=othervalue bonobo run ... in bash).

Please note that this is a convention, and you're free to do otherwise.
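To make the "default value, overridable at runtime" behaviour concrete, here is a minimal sketch of how KEY=value lines can act as defaults. It is not how Bonobo itself loads the file; the helper name apply_env_defaults, the TIMEOUT key, and the handling of blank lines and # comments are assumptions made for illustration.

```python
def apply_env_defaults(text, environ):
    """Apply KEY=value lines as defaults; keys already set are left untouched."""
    for line in text.splitlines():
        line = line.strip()
        # Skipping blank lines and '#' comments is an assumption of this sketch.
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        # setdefault() keeps any value that was already provided by the shell.
        environ.setdefault(key.strip(), value.strip())
    return environ


# Simulate `KEY=othervalue bonobo run ...`: the shell value is already set.
environ = {'KEY': 'othervalue'}
apply_env_defaults('KEY=value\nTIMEOUT=30\n', environ)

print(environ.get('KEY'))      # othervalue (runtime override wins)
print(environ.get('TIMEOUT'))  # 30 (default taken from the file content)
```

A plain dict stands in for the process environment here so the example has no side effects; passing os.environ instead would apply the same logic to the real environment.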

Main entrypoint (main.py)

This is where your main code lives.

The only requirement to be able to bonobo run example-jobs is that this file defines a graph instance somewhere, and by convention it should be named graph (although bonobo will recognize it even if you name it something else).
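As a rough sketch of what such a file might contain, the callables below are plain Python and can be exercised without Bonobo installed. The function names, the sample data, and the run_chain helper are hypothetical; in a real main.py the callables would be chained with graph = bonobo.Graph(extract, transform, load), shown here only as a comment.

```python
def extract():
    """Yield the raw rows (hypothetical sample data)."""
    yield 'hello'
    yield 'world'


def transform(row):
    """Return the row in title case."""
    return row.title()


def load(row):
    """Send the row to its destination (here, standard output)."""
    print(row)


# In main.py, the module-level instance that `bonobo run` looks for would be:
#     import bonobo
#     graph = bonobo.Graph(extract, transform, load)


def run_chain():
    """Plain-Python walk through the same chain, for illustration only."""
    results = []
    for row in extract():
        out = transform(row)
        load(out)
        results.append(out)
    return results


run_chain()  # prints "Hello" then "World"
```

The run_chain helper only mimics a single linear pass; it says nothing about how Bonobo schedules or executes the nodes of a graph.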

Services definitions (_services.py)

When you define graphs, you'll reference external dependencies (like an API client, a SQL engine/connection, an HTTP client, a filesystem, etc.) by name, which lets you specify the implementation at runtime instead of at coding time. This makes it easy to move beyond the «just runs on my laptop» stage by using different service implementations depending on where the job runs (you may want a mock database in tests, a cache layer for HTTP requests in production, etc.).

This file should contain a get_services() callable that returns a dict of your service implementations, and by default, it provides only one fs service used to access your filesystem.
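The idea of choosing implementations per stage can be sketched as follows. Everything here other than the get_services() name is an assumption for illustration: the STAGE variable, the optional stage parameter, the 'database' service name, and both stand-in classes are hypothetical, and the template's default fs entry is left out so the example needs only the standard library.

```python
import os


class FakeDatabase:
    """Hypothetical in-memory stand-in, suitable for tests."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)


class RealDatabase:
    """Hypothetical placeholder; a real client would open a connection."""

    def insert(self, row):
        raise NotImplementedError('connect to an actual database here')


def get_services(stage=None):
    """Return a dict mapping service names to implementations."""
    # STAGE is a hypothetical variable name; any convention would do.
    stage = stage or os.environ.get('STAGE', 'dev')
    return {
        'database': FakeDatabase() if stage == 'test' else RealDatabase(),
    }


services = get_services('test')
services['database'].insert({'id': 1})
print(services['database'].rows)  # [{'id': 1}]
```

Because the graph refers to the service only by its name, swapping FakeDatabase for RealDatabase requires no change to the transformations themselves.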

More information about services is available in the Bonobo documentation.

Python dependencies (requirements.txt)

If your job collection depends on more than just bonobo (which will most likely be the case for real-world transformations), you should add those dependencies to this requirements.txt file.

Happy job collection-ing!

Did I say we need feedback? Slack discussions and issues are more than welcome!