Ploomber

graph LR
    r1[Get dataset A] --> c1[Clean] --> f1[Features] --> f[Join features]
    f --> t[Train model] --> e[Evaluate model]
    r2[Get dataset B] --> c2[Clean] --> f2[Features] --> f
    class r1 done;
    class c1 done;
    class f1 done;
    class r2 pending;
    class c2 pending;
    class f2 pending;
    class f pending;
    class t pending;
    class e pending;

Coding an entire analysis pipeline in a single notebook file lets you develop your code interactively, but it creates an unmaintainable monolith that breaks easily. Ploomber lets you modularize your analysis into smaller tasks without losing the power of an interactive notebook.

Ploomber is the simplest way to turn your notebooks, scripts (Python/R/SQL), or Python functions into a reproducible data pipeline.

Simple

  1. (Optional) List your pipeline scripts in a pipeline.yaml file
  2. Inside each notebook (or script), state dependencies via an upstream variable
  3. Use a product variable to declare output file(s) that the next notebook (or script) will use as inputs
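The idea behind the steps above can be sketched in plain Python. This is an illustration of the concept only, not Ploomber's API: the task names, file paths, and the helper functions `execution_order` and `inputs_for` are hypothetical. Each task declares the tasks it depends on (`upstream`) and the file it writes (`product`); from those declarations alone, a run order and each task's input files can be derived.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task declarations, mirroring the `upstream` and `product`
# variables each script would define. Names and paths are illustrative only.
tasks = {
    "get": {"upstream": [], "product": "output/raw.csv"},
    "clean": {"upstream": ["get"], "product": "output/clean.csv"},
    "features": {"upstream": ["clean"], "product": "output/features.csv"},
}


def execution_order(tasks):
    """Return task names ordered so every task runs after its upstream tasks."""
    graph = {name: set(spec["upstream"]) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


def inputs_for(name, tasks):
    """Map each upstream task name to the product file that task declared."""
    return {up: tasks[up]["product"] for up in tasks[name]["upstream"]}


order = execution_order(tasks)
print(order)
print(inputs_for("features", tasks))
```

Because dependencies are stated by name rather than by file path, a task never hard-codes where its inputs live; it only needs to know which task produces them.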

Powerful

  1. Incremental builds (skip up-to-date tasks)
  2. Pipeline testing
  3. Pipeline inspection and debugging
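To make the notion of an incremental build concrete, here is a minimal sketch of one way to decide which tasks to skip. It assumes a task is out of date when its source code has changed since the last build or when any of its upstream tasks is out of date; the function names and the hash-based bookkeeping are assumptions made for illustration, and Ploomber's own bookkeeping may differ.

```python
import hashlib


def source_hash(source: str) -> str:
    """Fingerprint a task's source code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def outdated_tasks(order, sources, upstream, stored_hashes):
    """Return the tasks that must re-run, in execution order.

    A task is outdated if its source changed since the last build or if any
    of its upstream tasks is outdated. `order` must list upstream tasks first.
    """
    outdated = []
    for name in order:
        changed = stored_hashes.get(name) != source_hash(sources[name])
        stale_input = any(up in outdated for up in upstream[name])
        if changed or stale_input:
            outdated.append(name)
    return outdated


# Hypothetical three-task pipeline; the source strings stand in for scripts.
order = ["get", "clean", "features"]
sources = {"get": "load()", "clean": "dropna()", "features": "scale()"}
upstream = {"get": [], "clean": ["get"], "features": ["clean"]}

# Hashes recorded by a previous (simulated) build.
stored = {name: source_hash(src) for name, src in sources.items()}

# Edit one task after the build: it and its downstream task must re-run.
sources["clean"] = "dropna(); dedupe()"
to_run = outdated_tasks(order, sources, upstream, stored)
print(to_run)
```

In this sketch, `get` is skipped because neither its source nor its inputs changed, while `clean` and everything downstream of it are rebuilt.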

Integrates with Jupyter

  1. Automatically inject a new cell with the location of your input files, as inferred from your upstream variable
  2. Python and R scripts are converted to a notebook on the fly
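The cell injection described above can be pictured with a short sketch. It is not Ploomber's implementation: `render_injected_cell`, the task names, and the paths are hypothetical. The point is only that the names listed in `upstream` are enough to generate a cell that binds each name to the file its task produces.

```python
def render_injected_cell(upstream_names, products):
    """Build the source text of a cell that maps upstream names to file paths.

    `upstream_names` is what the author wrote in the script; `products` maps
    every task in the pipeline to the file it declared as its output.
    """
    resolved = {name: products[name] for name in upstream_names}
    lines = [
        "# Injected cell (illustrative): resolved input locations",
        f"upstream = {resolved!r}",
    ]
    return "\n".join(lines)


# Hypothetical pipeline-wide product declarations.
products = {
    "get": "output/raw.csv",
    "clean": "output/clean.csv",
}

# The author only wrote `upstream = ["clean"]` in the script.
cell_source = render_injected_cell(["clean"], products)
print(cell_source)
```

The rest of the notebook can then read its input through `upstream["clean"]` without knowing where the file is stored.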