Python API

This section lists the available classes and functions in the Python API. If you’re writing pipelines with the Spec API (i.e., a pipeline.yaml file), you won’t interact with this API directly. However, you may still want to learn about ploomber.spec.DAGSpec if you need to load your pipeline as a Python object.

Brief usage examples are included after each section below.

DAG

DAG([name, clients, executor])

A collection of tasks with dependencies

OnlineModel(module)

A subclass of ploomber.OnlineDAG that provides a simpler interface for online DAGs whose terminal task calls model.predict.

OnlineDAG()

Execute partial DAGs in memory.

DAGConfigurator([d])

An object to customize DAG behavior

InMemoryDAG(dag[, return_postprocessor])

Converts a DAG to a DAG-like object that performs all operations in memory (products are not serialized).
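
A minimal sketch of assembling and building a pipeline with the Python API (file names, task names, and the helper functions are illustrative):

    from pathlib import Path

    from ploomber import DAG
    from ploomber.tasks import PythonCallable
    from ploomber.products import File


    def _raw(product):
        # root tasks receive their product; str(File) yields the path
        Path(str(product)).write_text('1,2,3')


    def _clean(upstream, product):
        # dependent tasks also receive an upstream dict keyed by task name
        text = Path(str(upstream['raw'])).read_text()
        Path(str(product)).write_text(text.replace(',', '\n'))


    dag = DAG(name='example')
    raw = PythonCallable(_raw, File('raw.csv'), dag=dag, name='raw')
    clean = PythonCallable(_clean, File('clean.csv'), dag=dag, name='clean')
    raw >> clean  # declare the dependency
    dag.build()

Incremental builds work out of the box: running dag.build() a second time skips tasks whose source code and upstream dependencies have not changed.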

Tasks

Task(product, dag[, name, params])

Abstract class for all Tasks

PythonCallable(source, product, dag[, name, ...])

Execute a Python function

NotebookRunner(source, product, dag[, name, ...])

Run a Jupyter notebook using papermill.

ScriptRunner(source, product, dag[, name, ...])

Similar to NotebookRunner, except it runs the code with the python interpreter instead of papermill; hence, it does not generate an output notebook.

SQLScript(source, product, dag[, name, ...])

Execute a script in a SQL database to create a relation or view

SQLDump(source, product, dag[, name, ...])

Dumps data from a SQL SELECT statement to one or more files

SQLTransfer(source, product, dag[, name, ...])

Transfers data from one SQL database to another (Note: this relies on pandas, only use it for small to medium-sized datasets)

SQLUpload(source, product, dag[, name, ...])

Upload data to a SQL database from a parquet or CSV file.

PostgresCopyFrom(source, product, dag[, ...])

Efficiently copy data to a PostgreSQL database using COPY FROM (a faster alternative to SQLUpload for PostgreSQL).

ShellScript(source, product, dag[, name, ...])

Execute a shell script.

DownloadFromURL(source, product, dag[, ...])

Download a file from a URL (uses urllib.request.urlretrieve)

Link(product, dag, name)

A dummy Task used to "plug" an external Product into a pipeline; this task is always considered up-to-date

Input(product, dag, name)

A dummy task used to represent input provided by the user; it is always considered outdated.
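
For instance, a NotebookRunner task executes a notebook and saves an executed copy as its product. A sketch, assuming an analysis.py script exists in percent format with a cell tagged "parameters" (papermill injects the product path there):

    from pathlib import Path

    from ploomber import DAG
    from ploomber.tasks import NotebookRunner
    from ploomber.products import File

    dag = DAG()
    NotebookRunner(Path('analysis.py'), File('analysis.html'),
                   dag=dag, name='analysis')
    dag.build()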

Products

Product(identifier)

Abstract class for all Products

File(identifier[, client])

A file (or directory) in the local filesystem

SQLRelation(identifier)

A product that represents a SQL relation (table or view) with no metadata (incremental builds won't work).

PostgresRelation(identifier[, client])

A PostgreSQL relation

SQLiteRelation(identifier[, client])

A SQLite relation

GenericSQLRelation(identifier[, client])

A GenericProduct whose identifier is a SQL relation; uses SQLite as the metadata backend

GenericProduct(identifier[, client])

GenericProduct is used when there is no specific Product implementation.
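
Products are not used on their own; they are attached to the task that creates them. A sketch of the two most common identifier styles (names are illustrative):

    from ploomber.products import File, SQLiteRelation

    # a file (or directory) on the local filesystem
    report = File('output/report.html')

    # SQL relations are identified by a (schema, name, kind) tuple,
    # where kind is 'table' or 'view' and schema may be None
    clean = SQLiteRelation((None, 'clean_data', 'table'))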

Clients

Client()

Abstract class for all clients

DBAPIClient(connect_fn, connect_kwargs[, ...])

A client for PEP 249-compliant database libraries

SQLAlchemyClient(uri[, split_source, ...])

Client for connecting to any SQLAlchemy-supported database

ShellClient([run_template, ...])

Client to run commands in the local shell

S3Client(bucket_name, parent[, ...])

Client for uploading File products to Amazon S3

GCloudStorageClient(bucket_name, parent[, ...])

Client for uploading File products to Google Cloud Storage
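
Clients are registered on the DAG, keyed by the task or product class that should use them: tasks use them to execute code, products use them to store metadata for incremental builds. A sketch with a SQLite database (the connection string and table names are illustrative):

    from ploomber import DAG
    from ploomber.clients import SQLAlchemyClient
    from ploomber.tasks import SQLScript
    from ploomber.products import SQLiteRelation

    dag = DAG()

    client = SQLAlchemyClient('sqlite:///example.db')
    dag.clients[SQLScript] = client
    dag.clients[SQLiteRelation] = client

    # the {{product}} placeholder renders to the relation's name
    SQLScript('CREATE TABLE {{product}} AS SELECT * FROM raw',
              SQLiteRelation((None, 'clean', 'table')),
              dag=dag, name='clean')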

Spec

DAGSpec(data[, env, lazy_import, reload, ...])

A DAG spec is a dictionary with a certain structure that can be converted to a DAG using DAGSpec.to_dag().
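
A sketch of loading a spec and converting it to a DAG (assumes a pipeline.yaml in the current directory; DAGSpec.find() can locate one automatically):

    from ploomber.spec import DAGSpec

    dag = DAGSpec('pipeline.yaml').to_dag()
    dag.build()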

Env

with_env(source)

A function decorated with @with_env starts an environment during its execution.

load_env(fn)

A function decorated with @load_env will be called with the current environment in an env keyword argument

Env([source])

Return the current environment
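
A sketch combining both decorators (the keys in the dictionary are illustrative; with_env also accepts a path to an env.yaml file instead of a dict):

    from ploomber import with_env, load_env


    @load_env
    def helper(env):
        # called with the currently active environment
        return env.sample


    @with_env({'sample': True})
    def make(env):
        # the environment is active only while make() runs
        print(env.sample)
        print(helper())


    make()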

Serialization

serializer([extension_mapping, fallback, ...])

Decorator for serializing functions

serializer_pickle(obj, product)

A serializer that pickles everything

unserializer([extension_mapping, fallback, ...])

Decorator for unserializing functions

unserializer_pickle(product)

An unserializer that unpickles everything
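
A sketch of a serializer/unserializer pair that writes .json products as JSON and falls back to pickle for any other extension (the helper names are illustrative):

    import json

    from ploomber.io import serializer, unserializer


    def write_json(obj, product):
        with open(str(product), 'w') as f:
            json.dump(obj, f)


    def read_json(product):
        with open(str(product)) as f:
            return json.load(f)


    @serializer({'.json': write_json}, fallback=True)
    def my_serializer(obj, product):
        pass


    @unserializer({'.json': read_json}, fallback=True)
    def my_unserializer(product):
        pass

Pass them to tasks (e.g., PythonCallable(..., serializer=my_serializer, unserializer=my_unserializer)) so functions can return values instead of writing their products directly.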

Executors

Serial([build_in_subprocess, ...])

Executor that runs one task at a time

Parallel([processes, print_progress, ...])

Runs a DAG in parallel using multiprocessing
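
Executors are passed to the DAG constructor. A sketch of both options:

    from ploomber import DAG
    from ploomber.executors import Serial, Parallel

    # one task at a time, in the current process (simplifies debugging)
    dag = DAG(executor=Serial(build_in_subprocess=False))

    # independent tasks run concurrently via multiprocessing
    dag = DAG(executor=Parallel(processes=2))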

SourceLoader

SourceLoader([path, module])

Load source files using a jinja2.Environment
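
A sketch, assuming a templates/create_table.sql jinja2 template that references {{product}} (a client must still be registered for the task and product, as shown in the Clients section):

    from ploomber import DAG, SourceLoader
    from ploomber.tasks import SQLScript
    from ploomber.products import SQLiteRelation

    loader = SourceLoader(path='templates')

    dag = DAG()
    SQLScript(loader['create_table.sql'],
              SQLiteRelation((None, 'data', 'table')),
              dag=dag, name='create_table')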