Ploomber

Diagram: an example pipeline. Dataset A and dataset B are each fetched, cleaned, and turned into features; the two feature sets are joined and then used to train and evaluate a model. The dataset A tasks are marked done; all other tasks are pending.

Write better data pipelines without having to learn a specialized framework. By adopting a convention-over-configuration philosophy, Ploomber streamlines pipeline execution, allowing teams to develop data products with confidence.

Simple

  1. (Optional) List your pipeline scripts in a pipeline.yaml file
  2. Inside each script, state dependencies (other scripts) via an upstream variable
  3. Use a product variable to declare the output file(s) that downstream scripts will use as inputs
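The three steps describe a convention: each script declares what it needs (upstream) and what it writes (product), and the execution order follows from those declarations. The sketch below models that idea in plain Python and does not use Ploomber's API; the `TASKS` dict, the task names, and the `execution_order` and `resolve_inputs` functions are hypothetical, loosely following the example pipeline of two datasets that are cleaned, featurized, joined, and used for training and evaluation. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks modelled on the example pipeline. "upstream" plays the role of
# a script's `upstream` variable and "product" the role of its `product` variable.
TASKS = {
    "get_a": {"upstream": [], "product": "output/a.csv"},
    "clean_a": {"upstream": ["get_a"], "product": "output/a_clean.csv"},
    "features_a": {"upstream": ["clean_a"], "product": "output/a_features.csv"},
    "get_b": {"upstream": [], "product": "output/b.csv"},
    "clean_b": {"upstream": ["get_b"], "product": "output/b_clean.csv"},
    "features_b": {"upstream": ["clean_b"], "product": "output/b_features.csv"},
    "join": {"upstream": ["features_a", "features_b"], "product": "output/features.csv"},
    "train": {"upstream": ["join"], "product": "output/model.pkl"},
    "evaluate": {"upstream": ["train"], "product": "output/report.html"},
}


def execution_order(tasks):
    """Order task names so every task comes after all of its upstream tasks."""
    graph = {name: spec["upstream"] for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


def resolve_inputs(tasks, name):
    """Map each upstream task of `name` to the product file it writes."""
    return {up: tasks[up]["product"] for up in tasks[name]["upstream"]}


if __name__ == "__main__":
    for name in execution_order(TASKS):
        print(f"{name}: inputs={resolve_inputs(TASKS, name)}")
```

In this model, adding a task only means declaring its upstream names and product; the ordering and the input paths are derived rather than configured separately.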

Powerful

  1. Automated end-to-end execution
  2. Incremental builds (skip up-to-date tasks)
  3. Parametrized pipelines
  4. Hooks for pipeline testing
  5. Integration with debugging tools
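Incremental builds depend on deciding which tasks are up to date. The following is a minimal sketch, assuming a task must re-run when its product is missing, when its source code has changed since the last run (detected here by a stored SHA-256 hash), or when any upstream task must re-run. Ploomber's actual rules and metadata may differ; the `Task` class, `is_outdated`, and `build` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    source: str                                   # the task's code
    upstream: list = field(default_factory=list)  # names of upstream tasks
    product_exists: bool = False                  # whether the output file is present
    last_source_hash: Optional[str] = None        # recorded after the last successful run


def source_hash(source):
    return hashlib.sha256(source.encode()).hexdigest()


def is_outdated(task, tasks, cache):
    """True if `task` must re-run; outdated upstream tasks propagate downstream."""
    if task.name not in cache:
        cache[task.name] = (
            not task.product_exists
            or task.last_source_hash != source_hash(task.source)
            or any(is_outdated(tasks[up], tasks, cache) for up in task.upstream)
        )
    return cache[task.name]


def build(tasks, order):
    """Run only outdated tasks, in dependency order; return the names that ran."""
    cache = {}
    # Decide everything before running, so a re-run upstream task still
    # marks its downstream tasks as outdated.
    to_run = [name for name in order if is_outdated(tasks[name], tasks, cache)]
    for name in to_run:
        task = tasks[name]
        # A real runner would execute the task's script here.
        task.product_exists = True
        task.last_source_hash = source_hash(task.source)
    return to_run


if __name__ == "__main__":
    tasks = {
        "clean": Task("clean", "df = df.dropna()"),
        "features": Task("features", "df['x2'] = df['x'] ** 2", upstream=["clean"]),
    }
    order = ["clean", "features"]
    print(build(tasks, order))  # ['clean', 'features'] (first run)
    print(build(tasks, order))  # [] (everything up to date)
    tasks["clean"].source = "df = df.dropna(how='all')"
    print(build(tasks, order))  # ['clean', 'features'] (upstream code changed)
```

Hashing the source, rather than comparing file timestamps, means that saving a file without changing its code does not trigger a re-run in this sketch.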

Integrates with Jupyter

A Jupyter plugin lets you open your scripts as notebooks in the Jupyter notebook app. When a script is opened this way, a new cell is injected containing the input paths inferred from its upstream dependencies.
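To illustrate what the injected cell provides, here is a sketch that generates such a cell from a task's upstream products and its own product. The function `make_injected_cell` and the dictionary layout are hypothetical; the exact cell Ploomber injects may differ. The idea is that a script declares only the names of its upstream tasks, and the injected cell replaces that declaration with concrete paths.

```python
def make_injected_cell(upstream_products, product):
    """Return the source of a cell that defines `upstream` and `product`.

    upstream_products: maps each upstream task name to its product path(s).
    product: this task's own output path(s).
    """
    lines = [
        "# Injected cell (hypothetical sketch): paths inferred from upstream dependencies",
        f"upstream = {upstream_products!r}",
        f"product = {product!r}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    cell = make_injected_cell({"clean": "output/clean.csv"}, "output/features.csv")
    print(cell)

    # Running the cell defines the variables the rest of the notebook reads.
    namespace = {}
    exec(cell, namespace)
    print(namespace["upstream"]["clean"])  # output/clean.csv
```

Using `repr` keeps the generated assignments valid Python literals for plain strings, dicts, and lists of paths.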

Polyglot

Python, R and SQL are officially supported. Other languages can be easily integrated via Jupyter kernels or through shell scripts.