FAQ and Glossary
Why do products have clients?
Clients exist in both tasks and products because they serve different purposes. A task’s client handles communication with the system where the source code will be executed. A product’s client, on the other hand, only handles the product’s metadata.
To enable incremental runs, Ploomber has to store the source code that generates
any given product. To make this process simpler, metadata is stored in the
same system as the product. But saving metadata requires a system-specific implementation.
Currently, only SQLite and PostgreSQL are supported via
ploomber.products.PostgresRelation respectively. In these two cases, the
task’s client and the product’s client communicate with the same system.
For any other database, we provide two alternatives. In both cases, the
task’s client is different from the product’s client. The first alternative is
ploomber.products.GenericSQLRelation, which represents a generic
table or view and saves metadata in a SQLite database. In this case, the
task’s client is the database client (e.g. Oracle, Hive, Snowflake) but
the product’s client is a SQLite client. If you don’t need the incremental
build feature, you can use
which is a product with no metadata.
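The split between the two clients can be illustrated with a small standalone sketch. It is not Ploomber's implementation: `MetadataStore` and `build` are hypothetical names, the "task client" is just a callable that receives the source code, and the "product client" is a plain `sqlite3` connection that only stores metadata (here, the source code that generated each product).

```python
import sqlite3


class MetadataStore:
    """Hypothetical stand-in for a product's client: it only stores metadata."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metadata "
            "(product TEXT PRIMARY KEY, source TEXT NOT NULL)"
        )

    def is_outdated(self, product, source):
        # A product is outdated if it was never built or its source changed
        row = self.conn.execute(
            "SELECT source FROM metadata WHERE product = ?", (product,)
        ).fetchone()
        return row is None or row[0] != source

    def save(self, product, source):
        self.conn.execute(
            "INSERT OR REPLACE INTO metadata (product, source) VALUES (?, ?)",
            (product, source),
        )
        self.conn.commit()


def build(product, source, run_on_task_client, store):
    """Run the source only if it changed since the last build.

    `run_on_task_client` is an assumed callable that sends the source code
    to the system where it executes (e.g. a remote database).
    Returns True if the task ran, False if it was skipped.
    """
    if not store.is_outdated(product, source):
        return False
    run_on_task_client(source)
    store.save(product, source)
    return True


if __name__ == "__main__":
    store = MetadataStore(sqlite3.connect(":memory:"))
    executed = []
    query = "CREATE TABLE sales_clean AS SELECT * FROM sales"
    build("sales_clean", query, executed.append, store)  # runs
    build("sales_clean", query, executed.append, store)  # skipped, same source
    print(executed)
```

In this sketch the metadata lives in SQLite regardless of where the task runs, which mirrors the situation described above for databases without a system-specific metadata implementation.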
Which databases are supported?
The answer depends on the task to use. Tasks interact with databases via clients,
and there are two clients available. If the database you want
to use is supported by SQLAlchemy, you can use
ploomber.clients.SQLAlchemyClient. If the database has a client that
implements Python’s Database API Specification (PEP 249), you can use
ploomber.tasks.SQLDump supports both types of clients, so you should
be able to dump data to local files from pretty much all databases.
ploomber.tasks.SQLScript supports both types of clients, but since
it is intended to create new tables/views in the database, the product also
needs a client. See the answer above for details.
ploomber.tasks.SQLUpload relies on pandas.DataFrame.to_sql to upload a local
file to a database. This method relies on SQLAlchemy to work, hence it only
ploomber.tasks.PostgresCopyFrom is a faster alternative to
SQLUpload when using PostgreSQL. It relies on pandas.DataFrame.to_sql only
to create the table, but the actual data upload is done using
which calls the native
COPY FROM procedure.
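To make the PEP 249 route more concrete, here is a minimal sketch of dumping a query result to a local CSV file using only the standard DB-API cursor interface. It is an illustration of the idea, not how ploomber.tasks.SQLDump is implemented; `dump_query` and its parameters are hypothetical. The standard library's `sqlite3` module is used as the driver, but any PEP 249 connection should behave the same way for these calls.

```python
import csv
import sqlite3


def dump_query(conn, query, path, chunksize=1000):
    """Write the result of `query` to a CSV file and return the row count.

    `conn` is assumed to be any PEP 249 connection. Only cursor(),
    execute(), description, fetchmany() and close() are used.
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
        # description holds one 7-item sequence per column; item 0 is the name
        header = [col[0] for col in cur.description]
        n_rows = 0
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            # Fetch in chunks so large results do not have to fit in memory
            while True:
                rows = cur.fetchmany(chunksize)
                if not rows:
                    break
                writer.writerows(rows)
                n_rows += len(rows)
        return n_rows
    finally:
        cur.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (n INTEGER, squared INTEGER)")
    conn.executemany(
        "INSERT INTO numbers VALUES (?, ?)", [(i, i * i) for i in range(5)]
    )
    print(dump_query(conn, "SELECT * FROM numbers", "numbers.csv"))
```

Because the function depends only on the DB-API cursor interface, swapping `sqlite3` for another compliant driver should only change how `conn` is created.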
Dotted path. A dot-separated string pointing to a Python module/class/function, e.g. “ploomber.DAG”.
Entry point. A location that tells Ploomber how to initialize a DAG. This can be a spec file, a directory, or a dotted path.
Hook. A function executed after a certain event happens, e.g. the task “on finish” hook executes after the task executes successfully.
Spec. A dictionary-like specification to initialize a DAG, usually provided via a YAML file.
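As an illustration of the dotted path entry above, the following sketch resolves such a string with the standard library. `load_dotted_path` is a hypothetical helper written for this example, and it is not necessarily how Ploomber resolves dotted paths internally.

```python
import importlib


def load_dotted_path(dotted_path):
    """Return the module, class, or function a dotted path points to."""
    try:
        # The whole string may name a module, e.g. "os.path"
        return importlib.import_module(dotted_path)
    except ModuleNotFoundError:
        # Otherwise treat the last component as an attribute of a module,
        # e.g. "collections.OrderedDict" -> module "collections"
        module_name, _, attribute = dotted_path.rpartition(".")
        if not module_name:
            raise
        module = importlib.import_module(module_name)
        return getattr(module, attribute)


if __name__ == "__main__":
    ordered_dict_class = load_dotted_path("collections.OrderedDict")
    print(ordered_dict_class)
```

This simple version only handles attributes defined directly on a module; nested attributes such as a method on a class would need an extra loop over the remaining components.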