Python API

This section lists the available classes and functions in the Python API. If you’re writing pipelines with the Spec API (i.e., a pipeline.yaml file), you won’t interact with this API directly. However, you may still want to learn about ploomber.spec.DAGSpec if you need to load your pipeline as a Python object.

Brief usage examples are included after each section below.

DAG

DAG([name, clients, executor])

A collection of tasks with dependencies

OnlineModel(module)

A subclass of ploomber.OnlineDAG that provides a simpler interface for online DAGs whose terminal task calls model.predict.

OnlineDAG()

Execute partial DAGs in memory.

DAGConfigurator([d])

An object to customize DAG behavior

InMemoryDAG(dag[, return_postprocessor])

Converts a DAG to a DAG-like object that performs all operations in memory (products are not serialized).
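
A minimal sketch of assembling and building a pipeline with the Python API (file names, task names, and the helper functions are illustrative):

    from pathlib import Path

    from ploomber import DAG
    from ploomber.tasks import PythonCallable
    from ploomber.products import File


    def _raw(product):
        # root tasks receive their product; str(File) yields the path
        Path(str(product)).write_text('1,2,3')


    def _clean(upstream, product):
        # dependent tasks also receive an upstream dict keyed by task name
        text = Path(str(upstream['raw'])).read_text()
        Path(str(product)).write_text(text.replace(',', '\n'))


    dag = DAG(name='example')
    raw = PythonCallable(_raw, File('raw.csv'), dag=dag, name='raw')
    clean = PythonCallable(_clean, File('clean.csv'), dag=dag, name='clean')
    raw >> clean  # declare the dependency
    dag.build()

Incremental builds work out of the box: running dag.build() a second time skips tasks whose source code and upstream dependencies have not changed.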

Tasks

Task(product, dag[, name, params])

Abstract class for all Tasks

PythonCallable(source, product, dag[, name, ...])

Execute a Python function

NotebookRunner(source, product, dag[, name, ...])

Run a Jupyter notebook using papermill.

ScriptRunner(source, product, dag[, name, ...])

Similar to NotebookRunner, except it runs the code with the python interpreter instead of papermill; hence, it does not generate an output notebook.

SQLScript(source, product, dag[, name, ...])

Execute a script in a SQL database to create a relation or view

SQLDump(source, product, dag[, name, ...])

Dumps data from a SQL SELECT statement to one or more files

SQLTransfer(source, product, dag[, name, ...])

Transfers data from one SQL database to another (Note: this relies on pandas, only use it for small to medium-sized datasets)

SQLUpload(source, product, dag[, name, ...])

Upload data to a SQL database from a parquet or CSV file.

PostgresCopyFrom(source, product, dag[, ...])

Efficiently copy data to a PostgreSQL database using COPY FROM (a faster alternative to SQLUpload for PostgreSQL).

ShellScript(source, product, dag[, name, ...])

Execute a shell script.

DownloadFromURL(source, product, dag[, ...])

Download a file from a URL (uses urllib.request.urlretrieve)

Link(product, dag, name)

A dummy Task used to "plug" an external Product into a pipeline; this task is always considered up-to-date

Input(product, dag, name)

A dummy task used to represent input provided by the user; it is always considered outdated.
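
For instance, a NotebookRunner task executes a notebook and saves an executed copy as its product. A sketch, assuming an analysis.py script exists in percent format with a cell tagged "parameters" (papermill injects the product path there):

    from pathlib import Path

    from ploomber import DAG
    from ploomber.tasks import NotebookRunner
    from ploomber.products import File

    dag = DAG()
    NotebookRunner(Path('analysis.py'), File('analysis.html'),
                   dag=dag, name='analysis')
    dag.build()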

Products

Product(identifier)

Abstract class for all Products

File(identifier[, client])

A file (or directory) in the local filesystem

SQLRelation(identifier)

A product that represents a SQL relation (table or view) with no metadata (incremental builds won't work).

PostgresRelation(identifier[, client])

A PostgreSQL relation

SQLiteRelation(identifier[, client])

A SQLite relation

GenericSQLRelation(identifier[, client])

A GenericProduct whose identifier is a SQL relation; uses SQLite as the metadata backend

GenericProduct(identifier[, client])

GenericProduct is used when there is no specific Product implementation.
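
Products are not used on their own; they are attached to the task that creates them. A sketch of the two most common identifier styles (names are illustrative):

    from ploomber.products import File, SQLiteRelation

    # a file (or directory) on the local filesystem
    report = File('output/report.html')

    # SQL relations are identified by a (schema, name, kind) tuple,
    # where kind is 'table' or 'view' and schema may be None
    clean = SQLiteRelation((None, 'clean_data', 'table'))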

Clients

Client()

Abstract class for all clients

DBAPIClient(connect_fn, connect_kwargs[, ...])

A client for PEP 249-compliant database libraries

SQLAlchemyClient(uri[, split_source, ...])

Client for connecting to any SQLAlchemy-supported database

ShellClient([run_template, ...])

Client to run commands in the local shell

S3Client(bucket_name, parent[, ...])

Client for uploading File products to Amazon S3

GCloudStorageClient(bucket_name, parent[, ...])

Client for uploading File products to Google Cloud Storage
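
Clients are registered on the DAG, keyed by the task or product class that should use them: tasks use them to execute code, products use them to store metadata for incremental builds. A sketch with a SQLite database (the connection string and table names are illustrative):

    from ploomber import DAG
    from ploomber.clients import SQLAlchemyClient
    from ploomber.tasks import SQLScript
    from ploomber.products import SQLiteRelation

    dag = DAG()

    client = SQLAlchemyClient('sqlite:///example.db')
    dag.clients[SQLScript] = client
    dag.clients[SQLiteRelation] = client

    # the {{product}} placeholder renders to the relation's name
    SQLScript('CREATE TABLE {{product}} AS SELECT * FROM raw',
              SQLiteRelation((None, 'clean', 'table')),
              dag=dag, name='clean')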

Spec

DAGSpec(data[, env, lazy_import, reload, ...])

A DAG spec is a dictionary with a certain structure that can be converted to a DAG using DAGSpec.to_dag().
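
A sketch of loading a spec and converting it to a DAG (assumes a pipeline.yaml in the current directory; DAGSpec.find() can locate one automatically):

    from ploomber.spec import DAGSpec

    dag = DAGSpec('pipeline.yaml').to_dag()
    dag.build()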

Env

with_env(source)

A function decorated with @with_env starts an environment during its execution.

load_env(fn)

A function decorated with @load_env will be called with the current environment in an env keyword argument

Env([source])

Return the current environment
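
A sketch combining both decorators (the keys in the dictionary are illustrative; with_env also accepts a path to an env.yaml file instead of a dict):

    from ploomber import with_env, load_env


    @load_env
    def helper(env):
        # called with the currently active environment
        return env.sample


    @with_env({'sample': True})
    def make(env):
        # the environment is active only while make() runs
        print(env.sample)
        print(helper())


    make()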

Serialization

serializer([extension_mapping, fallback, ...])

Decorator for serializing functions

serializer_pickle(obj, product)

A serializer that pickles everything

unserializer([extension_mapping, fallback, ...])

Decorator for unserializing functions

unserializer_pickle(product)

An unserializer that unpickles everything
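
A sketch of a serializer/unserializer pair that writes .json products as JSON and falls back to pickle for any other extension (the helper names are illustrative):

    import json

    from ploomber.io import serializer, unserializer


    def write_json(obj, product):
        with open(str(product), 'w') as f:
            json.dump(obj, f)


    def read_json(product):
        with open(str(product)) as f:
            return json.load(f)


    @serializer({'.json': write_json}, fallback=True)
    def my_serializer(obj, product):
        pass


    @unserializer({'.json': read_json}, fallback=True)
    def my_unserializer(product):
        pass

Pass them to tasks (e.g., PythonCallable(..., serializer=my_serializer, unserializer=my_unserializer)) so functions can return values instead of writing their products directly.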

Executors

Serial([build_in_subprocess, ...])

Executor that runs one task at a time

Parallel([processes, print_progress, ...])

Runs a DAG in parallel using multiprocessing
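
Executors are passed to the DAG constructor. A sketch of both options:

    from ploomber import DAG
    from ploomber.executors import Serial, Parallel

    # one task at a time, in the current process (simplifies debugging)
    dag = DAG(executor=Serial(build_in_subprocess=False))

    # independent tasks run concurrently via multiprocessing
    dag = DAG(executor=Parallel(processes=2))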

SourceLoader

SourceLoader([path, module])

Load source files using a jinja2.Environment
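
A sketch, assuming a templates/create_table.sql jinja2 template that references {{product}} (a client must still be registered for the task and product, as shown in the Clients section):

    from ploomber import DAG, SourceLoader
    from ploomber.tasks import SQLScript
    from ploomber.products import SQLiteRelation

    loader = SourceLoader(path='templates')

    dag = DAG()
    SQLScript(loader['create_table.sql'],
              SQLiteRelation((None, 'data', 'table')),
              dag=dag, name='create_table')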