Ploomber has many features specifically tailored to accelerate Machine Learning workflows.
Data cleaning and feature engineering
Data cleaning and feature engineering are highly iterative processes. Ploomber accelerates them via incremental builds, which let you change your pipeline and bring results up to date without recomputing everything from scratch.
Ploomber also plays nicely with experiment trackers; for instance, it can be integrated with MLflow.
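To make the idea of an incremental build concrete, here is a minimal sketch in plain Python. It is not Ploomber's implementation or API: the `fingerprint` and `build` helpers, the task tuples, and the in-memory `store` are all hypothetical, and the sketch assumes a task is outdated when its source or parameters changed, or when something upstream was re-executed.

```python
import hashlib
import json


def fingerprint(source, params):
    """Hash a task's source code together with its parameters."""
    payload = json.dumps({"source": source, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build(tasks, store):
    """Run only the tasks that are outdated (hypothetical helper).

    tasks: list of (name, source, params, upstream_names), in execution order.
    store: dict mapping task name -> fingerprint recorded by the previous build.
    Returns the names of the tasks that were (re)executed.
    """
    executed = []
    for name, source, params, upstream in tasks:
        current = fingerprint(source, params)
        changed = store.get(name) != current
        # A task is also outdated if any of its upstream tasks just re-ran
        upstream_ran = any(dep in executed for dep in upstream)
        if changed or upstream_ran:
            # A real pipeline would execute the task here
            store[name] = current
            executed.append(name)
    return executed


store = {}
tasks = [
    ("clean", "df.dropna()", {}, []),
    ("features", "df['x2'] = df['x'] ** 2", {}, ["clean"]),
    ("train", "model.fit(X, y)", {"n_estimators": 100}, ["features"]),
]

first = build(tasks, store)   # nothing recorded yet, so every task runs
second = build(tasks, store)  # nothing changed, so every task is skipped

# Edit the feature engineering step only
tasks[1] = ("features", "df['x3'] = df['x'] ** 3", {}, ["clean"])
third = build(tasks, store)   # "clean" is skipped; "features" and "train" re-run
```

The point of the sketch is the third build: editing one step leaves the upstream cleaning step untouched and re-runs only what depends on the change.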
To help you find the best-performing model, Ploomber allows you to parallelize
Machine Learning experiments. If you’re using the Spec API,
you can use the grid feature to create many tasks
at once with different parameters. If you’re using the Python API, you can
easily create highly customized grids of tasks.
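As a rough illustration of what a grid of tasks amounts to, the sketch below expands a parameter grid into one run per combination and executes the runs concurrently. It uses only the Python standard library rather than Ploomber's Spec or Python API; `expand_grid` and `train` are hypothetical names, and `train` is a stand-in that returns its parameters instead of fitting a model.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def expand_grid(grid):
    """Expand {'a': [1, 2], 'b': [3]} into one dict per parameter combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def train(n_estimators, max_depth):
    # Hypothetical stand-in for fitting and scoring a model
    return {"n_estimators": n_estimators, "max_depth": max_depth}


grid = {"n_estimators": [50, 100], "max_depth": [3, 5, 10]}
param_sets = expand_grid(grid)  # 2 x 3 = 6 combinations

# Threads keep the sketch simple; CPU-bound training would normally use processes
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda p: train(**p), param_sets))
```

Each entry in `param_sets` corresponds to one task in the grid, so adding a value to any list in `grid` adds a whole set of new runs without writing more code.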
Large-scale model training
If one machine isn’t enough, you can parallelize training jobs in a cluster by exporting your pipeline to any of our supported platforms (Kubernetes, Airflow, and AWS Batch).