Ploomber

Point Ploomber to your Python and SQL scripts in a pipeline.yaml file and it will figure out execution order by extracting dependencies from them.

It also keeps track of source code changes to speed up builds by skipping up-to-date tasks. This is a great way to interactively develop your projects, sync work with your team and quickly recover from crashes (just fix the bug and build again).
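Deriving the execution order from declared dependencies amounts to a topological sort over the task graph. A minimal sketch of the idea (task names are hypothetical and this is not Ploomber's actual implementation; `graphlib` requires Python 3.9+):

```python
# Illustrative sketch: deriving an execution order from declared
# upstream dependencies (requires Python 3.9+ for graphlib; task
# names are hypothetical, not Ploomber's internals).
from graphlib import TopologicalSorter

# each task maps to the set of tasks it depends on
dependencies = {
    'clean': set(),          # no upstream tasks
    'aggregate': {'clean'},  # reads the cleaned table
    'dump': {'aggregate'},   # dumps the aggregate to a csv file
    'plot': {'dump'},        # plots the dumped csv
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['clean', 'aggregate', 'dump', 'plot']
```

Because each task here has a single upstream dependency, the order is fully determined; in general any topological order is a valid execution order.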

Try out the live demo (no installation required).

Click here for documentation.

Our blog.

Works with Python 3.5 and higher.

pipeline.yaml example

# pipeline.yaml

# clean data from the raw table
- source: clean.sql
  product: clean_data
  # function that returns a db client
  client: db.get_client

# aggregate clean data
- source: aggregate.sql
  product: agg_data
  client: db.get_client

# dump data to a csv file
- class: SQLDump
  source: dump_agg_data.sql
  product: output/data.csv
  client: db.get_client

# visualize data from csv file
- source: plot.py
  product:
    # where to save the executed notebook
    nb: output/executed-notebook-plot.ipynb
    # tasks can generate other outputs
    data: output/some_data.csv

Python script example

# annotated python file (it will be converted to a notebook during execution)
import pandas as pd

# + tags=["parameters"]
# this script depends on the output generated by a task named "clean"
upstream = {'clean': None}
product = None

# during execution, a new cell is added here

# +
df = pd.read_csv(upstream['clean'])
# do data processing...
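Conceptually, the injected cell overwrites the placeholder values with concrete paths resolved from the pipeline, so downstream code can read them directly. The paths below are hypothetical, for illustration only:

```python
# sketch of an injected cell: upstream task names now map to the
# paths of their products (all paths here are hypothetical)
upstream = {'clean': 'output/clean_data.csv'}
product = 'output/executed-notebook.ipynb'
```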

SQL script example


CREATE TABLE {{product}} AS
-- this task depends on the output generated by a task named "clean"
SELECT * FROM {{upstream['clean']}}
WHERE x > 10
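The `{{product}}` and `{{upstream[...]}}` placeholders are jinja2-style templates. A sketch of how the SQL above resolves, rendering directly with jinja2 for illustration (Ploomber handles this internally; the table names come from the pipeline.yaml example):

```python
# Illustration of how the {{product}} / {{upstream[...]}} placeholders
# resolve; jinja2 is assumed to be installed.
from jinja2 import Template

sql = Template(
    "CREATE TABLE {{product}} AS\n"
    "SELECT * FROM {{upstream['clean']}}\n"
    "WHERE x > 10"
)
rendered = sql.render(product='agg_data', upstream={'clean': 'clean_data'})
print(rendered)
```

Rendering substitutes the product and upstream names, yielding a plain SQL statement ready to run against the database.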


Installation

pip install ploomber

To install Ploomber along with all optional dependencies:

pip install "ploomber[all]"

graphviz is required for plotting pipelines:

# if you use conda (recommended)
conda install graphviz
# if you use homebrew
brew install graphviz
# for more options, see the graphviz documentation

Create a new project

ploomber new

Python API

There is also a Python API for advanced use cases. It lets you build flexible abstractions such as dynamic pipelines, where the exact number of tasks is determined by the pipeline's parameters.


