Spec API vs Python API

There are two ways of writing pipelines with Ploomber. This document discusses the differences and how to decide which API to use.

Data projects span a wide range of applications, from small projects that need only a few scripts to large ones that require a greater degree of flexibility. Ploomber is designed to keep pipelines as simple as possible, so you should adopt a more sophisticated solution only if you need to.

For examples using both APIs, click here

tl; dr;

Spec API

Directory entry point

The Spec API is a “no-code” solution for writing pipelines. Its simplest (and implicit) form is a directory of scripts: Ploomber analyzes your code, finds dependencies, and executes the pipeline. There is no need to write any “plumbing code”, which makes this approach a good fit for simple script-based projects. Since you execute your pipeline by pointing Ploomber to a directory, this approach is known as a directory entry point.
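To make the idea of "analyzing your code to find dependencies" concrete, here is a toy sketch that uses only the standard library (Python 3.9+). It is not Ploomber's actual analysis: it assumes, for illustration, that each script declares its dependencies in a top-level variable named upstream, and the script names and contents are hypothetical.

```python
import ast
from graphlib import TopologicalSorter


def declared_upstream(source):
    """Return the task names assigned to a top-level `upstream` variable."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "upstream" in targets:
                # literal_eval only accepts plain literals (lists, None, ...)
                return list(ast.literal_eval(node.value) or [])
    return []


# Hypothetical scripts, keyed by task name (normally read from a directory)
scripts = {
    "load": "upstream = None\nprint('loading')\n",
    "clean": "upstream = ['load']\nprint('cleaning')\n",
    "plot": "upstream = ['clean']\nprint('plotting')\n",
}

# Map each task to the tasks it depends on, then derive an execution order
dependencies = {name: declared_upstream(src) for name, src in scripts.items()}
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The point of the sketch is that the execution order comes from the scripts themselves, so no separate file has to list which task runs after which.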

Spec entry point

If you want to customize how Ploomber analyzes and executes your pipeline, you have to create a pipeline.yaml file. This approach uses the Spec API explicitly and gives you a place to configure your pipeline. The most common use case is a small pipeline with SQL tasks: by specifying how to connect to a database in your pipeline.yaml file, you let Ploomber manage database connections while you focus on writing SQL scripts. Since you execute your pipeline by pointing Ploomber to a spec file, this is known as a spec entry point.

This approach also supports pipeline parametrization. To learn more, see Parametrized pipelines.

Python API

Factory entry point

The last approach requires you to write Python code to specify your pipeline. It has a steeper learning curve because you have to become familiar with the API specifics, but it provides the greatest level of flexibility.

The biggest advantage is support for dynamic pipelines, whose exact number of tasks and dependency relations are determined when your Python code executes. For example, you might use a for loop to dynamically generate a few tasks based on some input parameters.
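As a minimal sketch of such a loop, the example below creates one task per chunk and a final task that depends on all of them. The names chunk_names, process, combine, make and n_chunks, as well as the output file names, are hypothetical and chosen only for illustration. It assumes Ploomber's DAG, PythonCallable and File classes.

```python
from pathlib import Path


def chunk_names(n_chunks):
    """Task names depend on an input parameter (hypothetical naming scheme)."""
    return [f"process-{i}" for i in range(n_chunks)]


def process(product, index):
    # Each generated task writes its own output file
    Path(str(product)).write_text(f"chunk {index}\n")


def combine(upstream, product, names):
    # Concatenate the outputs of every dynamically generated task
    text = "".join(Path(str(upstream[name])).read_text() for name in names)
    Path(str(product)).write_text(text)


def make(n_chunks=3):
    # Imported here so the helpers above can be used without Ploomber installed
    from ploomber import DAG
    from ploomber.products import File
    from ploomber.tasks import PythonCallable

    dag = DAG()
    names = chunk_names(n_chunks)
    combined = PythonCallable(combine, File("combined.txt"), dag,
                              name="combine", params={"names": names})

    # The number of tasks is only known once this loop runs
    for index, name in enumerate(names):
        task = PythonCallable(process, File(f"{name}.txt"), dag,
                              name=name, params={"index": index})
        task >> combined  # `combine` depends on every `process-*` task

    return dag
```

Calling make(n_chunks=5) instead of make() would produce a pipeline with five processing tasks, without any other change to the code.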

For Ploomber to know how to build your pipeline written as Python code, you have to provide a factory entry point, which is just a function that returns a DAG object. For example, if your factory is a function called make in a file called pipeline.py, then your entry point is the dotted path pipeline.make. Internally, Ploomber will do something like this:

from pipeline import make

# call the factory to get the DAG object, then execute the pipeline
dag = make()
dag.build()
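The step from a dotted path such as pipeline.make to the function itself can be sketched with the standard library. This is an illustration of the general technique, not Ploomber's actual loader. The helper name load_factory is hypothetical, and json.dumps stands in for a real factory so the snippet runs without a pipeline.py file.

```python
import importlib


def load_factory(dotted_path):
    """Resolve a dotted path like 'pipeline.make' to the callable it names."""
    # Everything before the last dot is the module, the rest is the attribute
    module_name, _, attribute = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")

    module = importlib.import_module(module_name)
    factory = getattr(module, attribute)

    if not callable(factory):
        raise TypeError(f"{dotted_path!r} does not refer to a callable")
    return factory


# Stand-in for load_factory("pipeline.make"), using a standard library function
dumps = load_factory("json.dumps")
print(dumps([1, 2]))
```

With a real project, load_factory("pipeline.make")() would return the DAG object, which is then built as shown above.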