Machine Learning

Ploomber has many features specifically tailored to accelerate Machine Learning workflows.

graph LR
    la[Load dataset A] --> ca[Clean] --> fa[Features] --> merge[Merge]
    lb[Load dataset B] --> cb[Clean] --> fb[Features] --> merge
    merge --> train1[NN] --> eval[Evaluate]
    merge --> train2[Random Forest] --> eval
    merge --> train3[SVM] --> eval
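The pipeline in the diagram above is a directed acyclic graph (DAG). Each task runs once its upstream tasks have finished. That rule is also why the two loading branches and the three models can run in parallel. The sketch below illustrates the idea only. It uses Python's standard-library `graphlib` (Python 3.9+), not Ploomber's API, and the task names are hypothetical labels for the diagram's nodes. It computes the "waves" of tasks whose dependencies are satisfied at the same time.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (its upstream), mirroring the diagram.
pipeline = {
    "clean_a": {"load_a"},
    "features_a": {"clean_a"},
    "clean_b": {"load_b"},
    "features_b": {"clean_b"},
    "merge": {"features_a", "features_b"},
    "nn": {"merge"},
    "random_forest": {"merge"},
    "svm": {"merge"},
    "evaluate": {"nn", "random_forest", "svm"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Every task whose upstream tasks have all been marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Pretend the whole wave ran; this unlocks its downstream tasks.
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(pipeline), start=1):
    print(number, wave)
```

This prints six waves: the two load tasks first, then the two cleaning tasks, the two feature tasks, `merge`, the three models together, and `evaluate` last.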

Tip

Check out our sklearn-evaluation library. It contains a large collection of Machine Learning evaluation plots, an experiment tracker, and many other features!

Data cleaning and feature engineering

Data cleaning and feature engineering are highly iterative processes. Ploomber speeds them up with incremental builds: after you change part of your pipeline, Ploomber brings results up to date by re-running only the tasks affected by the change, instead of re-computing everything from scratch.
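Ploomber keeps its own records to decide which tasks are outdated. The sketch below is a simplified, in-memory illustration of the general idea, not Ploomber's implementation or API. A task is skipped when neither its code nor anything upstream of it has changed. Fingerprinting a function through its bytecode, constants, and referenced names is an assumption that stands in for hashing source code. Every name here (`IncrementalRunner`, `build`, and the toy `load`/`clean`/`features` tasks) is hypothetical.

```python
import hashlib


def code_fingerprint(func):
    """Hash a function's bytecode, constants, and referenced names.

    Editing the function body changes at least one of these; this is a
    stand-in for hashing the task's source code.
    """
    code = func.__code__
    payload = code.co_code + repr((code.co_consts, code.co_names)).encode()
    return hashlib.sha256(payload).hexdigest()


class IncrementalRunner:
    """Toy in-memory runner that skips tasks whose code and upstream are unchanged."""

    def __init__(self):
        self.cache = {}     # task name -> (fingerprint, result)
        self.executed = []  # names of tasks that actually ran, in order

    def run(self, name, func, upstream=()):
        # Fold upstream fingerprints into this task's fingerprint, so a change
        # anywhere upstream also marks every downstream task as outdated.
        h = hashlib.sha256(code_fingerprint(func).encode())
        for up in upstream:
            h.update(self.cache[up][0].encode())
        fingerprint = h.hexdigest()

        cached = self.cache.get(name)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]  # up to date: reuse the stored result

        result = func(*(self.cache[up][1] for up in upstream))
        self.cache[name] = (fingerprint, result)
        self.executed.append(name)
        return result


def load():
    return [3, 1, None, 4]


def clean(raw):
    return [x for x in raw if x is not None]


def features(rows):
    return [x * 2 for x in rows]


def build(runner, features_fn):
    runner.run("load", load)
    runner.run("clean", clean, upstream=["load"])
    return runner.run("features", features_fn, upstream=["clean"])


runner = IncrementalRunner()
build(runner, features)
build(runner, features)  # nothing changed, so every task is skipped
print(runner.executed)   # ['load', 'clean', 'features']


def features_v2(rows):  # "edit" the feature engineering step
    return [x * 3 for x in rows]


print(build(runner, features_v2))  # [9, 3, 12]
print(runner.executed)             # ['load', 'clean', 'features', 'features']
```

Each fingerprint includes the fingerprints of its upstream tasks. Editing `clean` would therefore re-run both `clean` and `features`. Editing only `features` leaves the cached `load` and `clean` results untouched.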

Experiment tracking

Ploomber also plays nicely with experiment trackers, allowing you to train hundreds of models and track the results.

Example: Integration with MLflow


pip install ploomber
ploomber examples -n templates/mlflow -o ploomber-mlflow
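At its core, an experiment tracker stores one record per training run: the hyperparameters that went in and the metrics that came out. You can then query across hundreds of runs. The sketch below is a hypothetical minimal tracker built on the standard-library `sqlite3` module. It is not the MLflow example's code or the sklearn-evaluation tracker's API. With MLflow itself, the analogous calls would be `mlflow.log_param` and `mlflow.log_metric` inside an `mlflow.start_run()` block. The scores below come from a toy formula, not from real models.

```python
import json
import sqlite3
import uuid


class RunTracker:
    """Hypothetical minimal experiment tracker: one SQLite row per training run."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS runs (id TEXT PRIMARY KEY, params TEXT, metrics TEXT)"
        )

    def log_run(self, params, metrics):
        """Store a run's hyperparameters and metrics as JSON; return its id."""
        run_id = uuid.uuid4().hex
        self.conn.execute(
            "INSERT INTO runs VALUES (?, ?, ?)",
            (run_id, json.dumps(params), json.dumps(metrics)),
        )
        self.conn.commit()
        return run_id

    def best(self, metric, maximize=True):
        """Return (params, metrics) for the run with the best value of `metric`."""
        rows = self.conn.execute("SELECT params, metrics FROM runs").fetchall()
        runs = [(json.loads(p), json.loads(m)) for p, m in rows]
        choose = max if maximize else min
        return choose(runs, key=lambda run: run[1][metric])


tracker = RunTracker()
for depth in (2, 4, 8):
    # Placeholder metric from a toy formula, standing in for a real evaluation.
    toy_score = 1 - abs(depth - 4) / 10
    tracker.log_run({"max_depth": depth}, {"score": toy_score})

print(tracker.best("score"))  # ({'max_depth': 4}, {'score': 1.0})
```

Passing a file path instead of `":memory:"` would persist runs across pipeline builds. That is what lets each training task log independently and still be compared later.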

Parallel experiments

To help you find the best-performing model, Ploomber lets you run Machine Learning experiments in parallel.

Example: Running a grid of experiments in parallel

pip install ploomber
ploomber examples -n cookbook/grid -o grid
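A grid of experiments parallelizes well because every parameter combination is an independent run. The sketch below is a generic illustration in plain Python using `concurrent.futures`, not Ploomber's API or the code in the `cookbook/grid` example. It expands a parameter grid and evaluates the combinations across worker processes. `train_and_score` is a hypothetical toy objective, and the grid values are arbitrary.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product


def expand_grid(grid):
    """Turn {"param": [values, ...]} into a list with every parameter combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


def train_and_score(params):
    """Hypothetical stand-in for fitting and evaluating one model.

    A real task would train on data; this toy formula peaks at
    learning_rate=0.1 and slightly penalizes deeper networks.
    """
    score = -((params["learning_rate"] - 0.1) ** 2) - 0.001 * params["n_layers"]
    return params, score


def run_grid(grid, map_fn=map):
    """Evaluate every combination with `map_fn`; return the best (params, score)."""
    results = list(map_fn(train_and_score, expand_grid(grid)))
    return max(results, key=lambda result: result[1])


if __name__ == "__main__":
    grid = {"learning_rate": [0.01, 0.1, 1.0], "n_layers": [1, 2, 3]}
    # Each combination is evaluated in a separate worker process.
    with ProcessPoolExecutor() as executor:
        best_params, best_score = run_grid(grid, executor.map)
    print(best_params)  # {'learning_rate': 0.1, 'n_layers': 1}
```

Calling `run_grid(grid)` with the default `map_fn=map` runs the same grid sequentially, which is handy for debugging. Because the runs never depend on each other, swapping in `executor.map` is the only change needed to parallelize them.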

Example: Model selection with nested cross-validation

pip install ploomber
ploomber examples -n cookbook/nested-cv -o nested-cv
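Nested cross-validation separates two jobs. An inner loop selects hyperparameters, and an outer loop estimates how well that whole selection procedure generalizes, so the reported score is not biased by the tuning. Each outer fold is independent, which is what makes it a good fit for parallel execution. The sketch below is a minimal standard-library illustration of the procedure, not the code in the `cookbook/nested-cv` example. It assumes a hypothetical 1-D k-nearest-neighbour regressor and synthetic data. The data is generated in random order, so contiguous folds are acceptable here; real data usually needs shuffling first.

```python
import math
import random
import statistics


def kfold(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over range(n)."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def subset(values, indices):
    return [values[i] for i in indices]


def knn_mse(x_tr, y_tr, x_te, y_te, k):
    """Mean squared error of 1-D k-nearest-neighbour regression."""
    errors = []
    for x, y in zip(x_te, y_te):
        nearest = sorted(range(len(x_tr)), key=lambda i: abs(x_tr[i] - x))[:k]
        prediction = statistics.fmean(y_tr[i] for i in nearest)
        errors.append((prediction - y) ** 2)
    return statistics.fmean(errors)


def nested_cv(xs, ys, candidate_ks, outer_folds=5, inner_folds=3):
    """Return (chosen k, held-out MSE) for each outer fold."""
    results = []
    for train, test in kfold(len(xs), outer_folds):
        x_tr, y_tr = subset(xs, train), subset(ys, train)

        # Inner loop: choose k using ONLY the outer training data.
        def inner_error(k):
            return statistics.fmean(
                knn_mse(subset(x_tr, itr), subset(y_tr, itr),
                        subset(x_tr, ite), subset(y_tr, ite), k)
                for itr, ite in kfold(len(x_tr), inner_folds)
            )

        best_k = min(candidate_ks, key=inner_error)
        # Outer loop: score the chosen k on data the selection step never saw.
        held_out = knn_mse(x_tr, y_tr, subset(xs, test), subset(ys, test), best_k)
        results.append((best_k, held_out))
    return results


random.seed(0)
xs = [random.uniform(0, 10) for _ in range(60)]
ys = [math.sin(x) + random.gauss(0, 0.1) for x in xs]
for k, error in nested_cv(xs, ys, candidate_ks=[1, 3, 5, 9]):
    print(f"chosen k={k}, held-out MSE={error:.3f}")
```

The chosen `k` may differ between outer folds. The average of the held-out errors estimates the performance of the selection procedure as a whole, not of any single `k`.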

Large-scale model training

If one machine isn’t enough, you can parallelize training jobs in a cluster by exporting your pipeline to any of our supported platforms (Kubernetes, Airflow, and AWS Batch).

Deployment

Once you find the best-performing model, you can deploy it for batch processing or as an online API.
