Pipeline diagram: Get dataset A → Clean → Features and Get dataset B → Clean → Features both feed Join features → Train model → Evaluate model. The dataset A branch is marked done; all other tasks are marked pending.
Coding an entire analysis pipeline in a single notebook file allows you to develop your code interactively,
but it creates an unmaintainable monolith that breaks easily. Ploomber lets you modularize your
analysis into smaller tasks without losing the power of an interactive notebook.
Ploomber is the simplest way to turn your notebooks, (Python/R/SQL) scripts or Python functions into a reproducible data pipeline.
- (Optional) List your pipeline scripts in a
- Inside each notebook (or script), state dependencies via an
- Use a product variable to declare output file(s) that the next notebook (or script) will use as inputs
- Incremental builds (skip up-to-date tasks)
- Pipeline testing
- Pipeline inspection and debugging
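The incremental-build idea listed above (skip up-to-date tasks) can be illustrated with a small standard-library sketch. This is not Ploomber's implementation or API: the `build` function, the task dictionaries, their `upstream`/`source`/`run` keys and the JSON metadata file are all hypothetical. The sketch assumes a task is outdated when its product file is missing, its source text has changed, or one of its dependencies was re-run.

```python
import hashlib
import json
import tempfile
from graphlib import TopologicalSorter  # Python 3.9+
from pathlib import Path


def build(tasks, metadata_path):
    """Run tasks in dependency order, skipping the ones that are up to date.

    tasks maps a task name to a dict with (hypothetical layout):
      "upstream": list of task names it depends on
      "product":  Path of the file the task writes
      "source":   string standing in for the task's source code
      "run":      callable(product, inputs) that writes the product
    Returns the names of the tasks that were actually executed.
    """
    metadata_path = Path(metadata_path)
    stored = json.loads(metadata_path.read_text()) if metadata_path.exists() else {}
    order = TopologicalSorter({name: t["upstream"] for name, t in tasks.items()})

    executed = []
    for name in order.static_order():  # dependencies come before dependents
        task = tasks[name]
        digest = hashlib.sha256(task["source"].encode()).hexdigest()
        outdated = (
            not task["product"].exists()
            or stored.get(name) != digest
            or any(up in executed for up in task["upstream"])
        )
        if outdated:
            inputs = {up: tasks[up]["product"] for up in task["upstream"]}
            task["run"](task["product"], inputs)
            stored[name] = digest
            executed.append(name)

    metadata_path.write_text(json.dumps(stored))
    return executed


def demo():
    """Build a two-task pipeline three times and report what ran each time."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        tasks = {
            "get": {
                "upstream": [],
                "product": tmp / "raw.txt",
                "source": "v1",
                "run": lambda product, inputs: product.write_text(" a,b \n"),
            },
            "clean": {
                "upstream": ["get"],
                "product": tmp / "clean.txt",
                "source": "v1",
                "run": lambda product, inputs: product.write_text(
                    inputs["get"].read_text().strip()
                ),
            },
        }
        meta = tmp / "metadata.json"
        first = build(tasks, meta)   # nothing exists yet
        second = build(tasks, meta)  # nothing changed
        tasks["clean"]["source"] = "v2"
        third = build(tasks, meta)   # only "clean" was edited
        return first, second, third
```

Calling `demo()` runs both tasks on the first build, none on the second, and only the edited `clean` task on the third. Comparing a stored hash of the source is one possible staleness check; file timestamps would be another.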
Integrates with Jupyter
- Automatically inject a new cell with the location of your input files, as inferred from your
- Python and R scripts are converted to a notebook on the fly
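To make the two Jupyter points above concrete, here is a rough standard-library sketch of converting a script to a notebook structure and adding a cell that holds file locations. It is an illustration under stated assumptions, not how Ploomber does it: `script_to_notebook` is a hypothetical helper, the script is assumed to use `# %%` cell markers, the injected cell is simply placed first, and the `upstream` variable name and file paths are made up for the example.

```python
import json


def script_to_notebook(script, injected):
    """Turn a '# %%'-delimited script into a notebook dict (nbformat 4 layout).

    injected maps variable names to values; they are rendered as plain
    assignments in an extra code cell placed before the script's own cells.
    """
    chunks = [chunk.strip() for chunk in script.split("# %%")]
    params = "\n".join(f"{name} = {value!r}" for name, value in injected.items())
    sources = [params] + [chunk for chunk in chunks if chunk]
    cells = [
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": source,
        }
        for source in sources
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical script: the second cell reads a file produced by a "clean" task.
script = """# %%
import pandas as pd

# %%
df = pd.read_csv(upstream["clean"])
"""

nb = script_to_notebook(
    script,
    {"upstream": {"clean": "output/clean.csv"}, "product": "output/features.csv"},
)
print(nb["cells"][0]["source"])   # the injected cell
notebook_json = json.dumps(nb)    # what would be written to an .ipynb file
```

The script stays the source of truth in this design; the notebook is a derived view that can be regenerated whenever the script or the injected locations change.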