FAQ and Glossary

Why do products have clients?

Clients exist in tasks and products because they serve different purposes. A task client handles communication with the system where the source code will be executed. A product’s client, on the other hand, only handles the product’s metadata.

To enable incremental runs, Ploomber has to store the source code that generates any given product. To make this process simpler, metadata is stored in the same system as the product. But saving metadata requires a system-specific implementation. Currently, only SQLite and PostgreSQL are supported, via ploomber.products.SQLiteRelation and ploomber.products.PostgresRelation respectively. In these two cases, the task client and the product client communicate with the same system.

For any other database, we provide two alternatives. In both cases, the task’s client is different from the product’s client. The first alternative is ploomber.products.GenericSQLRelation, which represents a generic table or view and saves metadata in a SQLite database. In this case, the task’s client is the database client (e.g. Oracle, Hive, Snowflake) but the product’s client is a SQLite client. The second alternative, if you don’t need the incremental builds feature, is ploomber.products.SQLRelation, which is a product with no metadata.
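The idea behind storing metadata in a separate SQLite database can be sketched with the standard library alone. This is a simplified illustration, not Ploomber’s implementation: the metadata table layout and the function names (make_metadata_store, is_outdated, save_metadata, build) are hypothetical, and an in-memory SQLite connection stands in for the remote database that the task’s client would talk to.

```python
import sqlite3


def make_metadata_store(conn):
    """Create the (hypothetical) metadata table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(product TEXT PRIMARY KEY, source TEXT)"
    )


def is_outdated(conn, product, source):
    """Return True if no source is stored or it differs from the current one."""
    row = conn.execute(
        "SELECT source FROM metadata WHERE product = ?", (product,)
    ).fetchone()
    return row is None or row[0] != source


def save_metadata(conn, product, source):
    """Record the source code that generated the product."""
    conn.execute(
        "INSERT OR REPLACE INTO metadata (product, source) VALUES (?, ?)",
        (product, source),
    )
    conn.commit()


def build(task_conn, metadata_conn, product, source):
    """Run the source on the task connection only if it has changed.

    Returns True if the source was executed, False if it was skipped.
    """
    if not is_outdated(metadata_conn, product, source):
        return False
    # Identifiers cannot be bound as parameters; `product` is trusted here.
    task_conn.execute(f"DROP TABLE IF EXISTS {product}")
    task_conn.execute(source)
    task_conn.commit()
    save_metadata(metadata_conn, product, source)
    return True


# Two separate connections: one plays the role of the task's client,
# the other plays the role of the product's (metadata) client.
task_conn = sqlite3.connect(":memory:")
metadata_conn = sqlite3.connect(":memory:")
make_metadata_store(metadata_conn)

source = "CREATE TABLE clean AS SELECT 1 AS x"
first = build(task_conn, metadata_conn, "clean", source)   # executed
second = build(task_conn, metadata_conn, "clean", source)  # skipped
```

The first call runs the source and records it. The second call finds identical source in the metadata store and skips execution, which is the behavior incremental runs depend on.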

Which databases are supported?

The answer depends on the task to use. Tasks interact with databases via clients, and there are two clients available. If the database you want to use is supported by SQLAlchemy, you can use ploomber.clients.SQLAlchemyClient. If the database has a client that implements Python’s Database API Specification (PEP 249), you can use ploomber.clients.DBAPIClient.
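To make the PEP 249 requirement concrete, here is a minimal sketch of code that works with any driver following that specification. It does not use Ploomber; fetch_all is a hypothetical helper, and the standard library’s sqlite3 module stands in for whichever PEP 249 driver your database provides.

```python
import sqlite3


def fetch_all(connect, connect_kwargs, query):
    """Run a query through any PEP 249 driver and return all rows.

    `connect` is the driver's connect function and `connect_kwargs`
    are the keyword arguments it expects.
    """
    conn = connect(**connect_kwargs)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        conn.close()


# sqlite3 implements PEP 249, so its connect function can be passed as is.
rows = fetch_all(sqlite3.connect, {"database": ":memory:"}, "SELECT 1 + 1")
```

Only connect, cursor, execute, fetchall and close are used, all of which are part of PEP 249, so swapping in another compliant driver should only require changing the first two arguments.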

ploomber.tasks.SQLDump supports both types of clients, so you should be able to dump data to local files from pretty much any database.

ploomber.tasks.SQLScript supports both types of clients, but since it is intended to create new tables/views in the database, the product also needs a client. See the answer to “Why do products have clients?” for details.

ploomber.tasks.SQLUpload relies on pandas.DataFrame.to_sql to upload a local file to a database. This method relies on SQLAlchemy to work, hence the task only supports SQLAlchemyClient.

ploomber.tasks.PostgresCopyFrom is a faster alternative to SQLUpload when using PostgreSQL. It relies on pandas.DataFrame.to_sql only to create the table, but the actual data upload is done using psycopg, which calls the native COPY FROM command.

Glossary

  1. Dotted path. A dot-separated string pointing to a Python module/class/function, e.g. “ploomber.DAG”.

  2. Entry point. A location that tells Ploomber how to initialize a DAG. It can be a spec file, a directory or a dotted path.

  3. Hook. A function executed after a certain event happens, e.g. the task “on finish” hook executes after the task executes successfully.

  4. Spec. A dictionary-like specification to initialize a DAG, usually provided via a YAML file.
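As an illustration of the “dotted path” entry, the sketch below resolves such a string to a Python object using importlib. It is one simple way to do it, not necessarily how Ploomber resolves dotted paths; load_dotted_path is a hypothetical name.

```python
import importlib


def load_dotted_path(dotted_path):
    """Import the object that a dotted path such as 'math.sqrt' points to."""
    module_name, _, attribute = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(
            f"Expected at least one dot in dotted path, got: {dotted_path!r}"
        )
    # Everything before the last dot is treated as the module to import;
    # the final component is looked up as an attribute of that module.
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


sqrt = load_dotted_path("math.sqrt")
```

This simple version assumes the final component is an attribute of an importable module, so a path that names a submodule its parent package does not import may need extra handling.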