Example pipeline (diagram): Get dataset A → Clean → Features and Get dataset B → Clean → Features both feed into Join features → Train model → Evaluate model. The three dataset A tasks are marked done; all other tasks are pending.
Write better data pipelines without having to learn a specialized framework. By adopting a convention over configuration philosophy, Ploomber streamlines pipeline execution, allowing teams to confidently develop data products.
- (Optional) List your pipeline scripts in a
- Inside each script, state dependencies (other scripts) via an
- Use a `product` variable to declare output file(s) that the next script will use as inputs
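
The convention described above can be illustrated with a small, framework-free sketch. It does not use Ploomber's API: the task names, file paths, and the `execution_order` and `input_paths` helpers are hypothetical, and each dictionary entry merely stands in for what a script would declare (assuming the dependency declaration is a variable named `upstream`, alongside `product`). The point is only that declared dependencies and products are enough to derive an execution order and each task's inputs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each entry mimics the declarations a script would make.
tasks = {
    "get": {"upstream": [], "product": "output/raw.csv"},
    "clean": {"upstream": ["get"], "product": "output/clean.csv"},
    "features": {"upstream": ["clean"], "product": "output/features.csv"},
}


def execution_order(tasks):
    """Return task names so that every task comes after its upstream tasks."""
    graph = {name: set(spec["upstream"]) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


def input_paths(tasks, name):
    """Map each upstream task to its declared product (this task's inputs)."""
    return {up: tasks[up]["product"] for up in tasks[name]["upstream"]}


order = execution_order(tasks)
features_inputs = input_paths(tasks, "features")
```

Because the order is derived from the declarations, adding a task only requires stating what it depends on and what it produces.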
- Automated end-to-end execution
- Incremental builds (skip up-to-date tasks)
- Parametrized pipelines
- Hooks for pipeline testing
- Integration with debugging tools
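
As a rough illustration of incremental builds, the sketch below skips a task when its product already exists and its source has not changed since the last recorded run. This is a simplified stand-in written for illustration, not Ploomber's implementation; `needs_run`, `record_run`, and the metadata file layout are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def source_hash(source: str) -> str:
    """Fingerprint the task's source code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def needs_run(source: str, product: Path, metadata: Path) -> bool:
    """True when the product is missing or the source changed since the last run."""
    if not product.exists() or not metadata.exists():
        return True
    stored = json.loads(metadata.read_text())
    return stored.get("source_hash") != source_hash(source)


def record_run(source: str, metadata: Path) -> None:
    """Remember which version of the source produced the current product."""
    metadata.write_text(json.dumps({"source_hash": source_hash(source)}))


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    product = Path(tmp) / "clean.csv"
    metadata = Path(tmp) / "clean.csv.meta.json"
    source = "df = df.dropna()"

    before_first_run = needs_run(source, product, metadata)

    product.write_text("a,b\n1,2\n")  # pretend the task ran
    record_run(source, metadata)

    after_run = needs_run(source, product, metadata)
    after_edit = needs_run(source + "  # edited", product, metadata)
```

In the demonstration, the task needs to run before any product exists, is skipped once the run is recorded, and needs to run again after its source is edited.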
Integrates with Jupyter
A Jupyter plugin allows your scripts to be rendered as notebooks in the Jupyter notebook app. Once rendered, a new cell with input paths (inferred from your upstream dependencies) is injected.
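
To make the idea of an injected cell concrete, here is a minimal sketch that builds the text of such a cell from resolved upstream products. The actual contents and format of the cell are determined by Ploomber; the `render_injected_cell` helper, the variable names it writes, and the example paths are assumptions made for illustration.

```python
def render_injected_cell(upstream_products: dict, product: str) -> str:
    """Build the source of a cell that exposes resolved input and output paths."""
    lines = [
        "# Injected parameters (illustrative)",
        f"upstream = {upstream_products!r}",
        f"product = {product!r}",
    ]
    return "\n".join(lines)


# Hypothetical task "features" that depends on a task named "clean".
cell = render_injected_cell(
    {"clean": "output/clean.csv"},
    "output/features.csv",
)
```

Code further down in the script can then read its inputs from `upstream["clean"]` and write to `product` without hardcoding paths.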
Python, R and SQL are officially supported. Other languages can be easily integrated via Jupyter kernels or through shell scripts.