
MLflow Python API


This article provides an overview of the MLflow Python API.

It is intended for anyone who wants to quickly become familiar with the MLflow Python API, and it gives a whirlwind tour that favors breadth over depth.

The MLflow API is so vast that I feel the best way to learn it is to dive into the top-level core “fluent” module and summarize its lesser-used functions.

We will also look at a few other key modules and briefly describe the lesser-used ones.
Users who are interested in these modules can consult the documentation. I may also write follow-up articles with deeper dives into these other modules.

Core mlflow module

The mlflow module is described as a high-level “fluent” API for managing MLflow runs.

It can be used for the following:

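i. Starting and ending an MLflow run:

import mlflow

with mlflow.start_run() as run:
    # Anything logged inside this block is attached to this run;
    # the run is ended automatically when the block exits.
    print(f"Active run ID: {run.info.run_id}")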

ii. Logging parameters and metric values:

mlflow.log_param("param_1", 6.7)
mlflow.log_metric("f1_score", 0.7)

iii. Turning on autologging:

mlflow.autolog()

iv. Creating an experiment:

# Optionally pass artifact_location to choose where the experiment's artifacts are stored.
experiment_id = mlflow.create_experiment("Experiment Name")

v. Logging metadata associated with a dataset:

mlflow.log_input(dataset, context="training")  # dataset: an mlflow.data Dataset object

vi. Searching for runs based on parameters:

mlflow.search_runs(filter_string="params.param_1 = '6.7'")  # returns a pandas DataFrame

Here is a link to a notebook illustrating the above functionality:

Here is a table listing some of the functions in the mlflow module and their uses:

Function                            Description
mlflow.active_run()                 Get the currently active run, or None if no run is active.
mlflow.autolog(..)                  Automatically log metrics, parameters and models for supported machine learning libraries.
mlflow.start_run(..)                Start a new run (or resume an existing one), creating an entry in the tracking backend.
mlflow.delete_run(..)               Delete a run from the tracking backend.
mlflow.create_experiment(..)        Create a new experiment in the tracking backend.
mlflow.delete_experiment(..)        Delete an experiment from the tracking backend.
mlflow.delete_tag(..)               Delete a tag from the active run.
mlflow.end_run()                    End the currently active run, if there is one.
mlflow.get_experiment(..)           Get an experiment by its ID.
mlflow.get_experiment_by_name(..)   Get an experiment by its name.
mlflow.get_parent_run(..)           Get the parent run of the given run, if any.
mlflow.get_registry_uri()           Get the current registry URI. If none has been specified, defaults to the tracking URI.
mlflow.get_run(..)                  Fetch a run from the backend store.
mlflow.get_tracking_uri()           Get the current tracking URI.
mlflow.load_table(..)               Load a table from MLflow Tracking as a pandas.DataFrame.
mlflow.log_artifact(..)             Log a local file or directory as an artifact of the currently active run.
mlflow.log_figure(..)               Log a figure (matplotlib or plotly) as an artifact.
mlflow.log_image(..)                Log an image in MLflow.
mlflow.log_input(..)                Log a dataset used in the current run.
mlflow.log_metric(..)               Log a metric under the current run.
mlflow.log_metrics(..)              Log multiple metrics for the current run.
mlflow.log_param(..)                Log a parameter (e.g. a model hyperparameter) under the current run.
mlflow.log_params(..)               Log a batch of parameters for the current run.
mlflow.log_text(..)                 Log text as an artifact.
mlflow.login(..)                    Configure MLflow server authentication and connect MLflow to the tracking server.
mlflow.log_table(..)                Log a table to MLflow Tracking as a JSON artifact.

mlflow.client module

The mlflow.client module is a lower-level API that provides a Python CRUD interface to MLflow experiments, runs, model versions and registered models.
It maps directly to MLflow REST API calls.

Here is an example of mlflow.client usage:

import mlflow
import xgboost as xgb
from mlflow.client import MlflowClient  # same class as mlflow.tracking.MlflowClient
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Prepare Data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, 
    test_size=0.2, random_state=42)

# 2. Train an XGBoost Model
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {"objective": "multi:softprob", "num_class": 3}
bst = xgb.train(params, dtrain, num_boost_round=10)

# 3. Log Model to MLflow
with mlflow.start_run() as run:
    # Log the model to the current active run
    mlflow.xgboost.log_model(bst, artifact_path="xgboost_model")

    # Log some additional metrics or parameters
    predictions = bst.predict(dtest)
    best_preds = [int(pred.argmax()) for pred in predictions]
    accuracy = accuracy_score(y_test, best_preds)
    mlflow.log_metric("accuracy", accuracy)

    # Get the run ID where the model was logged
    run_id = run.info.run_id

# 4. Use MlflowClient to register the model
client = MlflowClient()

# Create a new registered model
model_name = "Iris_XGBoost_Model"
try:
    client.create_registered_model(model_name)
except mlflow.exceptions.MlflowException:
    # The model might already exist, which is fine.
    print(f"Model {model_name} already exists.")

# 5. Register the specific model version
model_uri = f"runs:/{run_id}/xgboost_model"
client.create_model_version(name=model_name, source=model_uri, run_id=run_id)

print(f"Model registered: {model_name}")

There is a high degree of overlap between mlflow and mlflow.client, the main difference being
that mlflow is a higher-level interface than mlflow.client.

Use mlflow functions for ease of use, logging, tracking and automatic run management.
Use mlflow.client functions when you need more control, to manage experiments and runs, and to programmatically query the MLflow server outside the context of a specific run.

As of this writing, some functions unique to mlflow are:

start_run(), end_run()

while these are unique to mlflow.client:

create_registered_model(), delete_registered_model()

These examples are by no means exhaustive, and the API can change over time.
It is best to consult the documentation for a comprehensive list of each module’s functions.

Other mlflow modules

Other mlflow APIs worth noting are as follows:

  • MLflow Client Tracing functions
    These are part of mlflow.client. This API enables fine-grained control over tracing, allowing users to create, manage and retrieve traces programmatically.
  • mlflow.models
    This module provides an API for managing machine learning models throughout their lifecycle.
    Together with the flavor modules, it allows users to log, load, save and serve ML models. It provides a unified framework to package models from different libraries, such as scikit-learn, PyTorch and TensorFlow, and deploy them to production for inference.
    MLflow models come in different flavors, which correspond to the libraries the models were built with.
    Each flavor has its own module, for example mlflow.sklearn, mlflow.pytorch and mlflow.tensorflow.
    One flavor worth mentioning is mlflow.pyfunc, which serves as a default model interface for MLflow Python models and lets you wrap any piece of Python code as a custom model.

Summary

The MLflow Python API provides a high-level interface for managing machine learning experiments. The core mlflow module offers a “fluent” API for tracking runs, logging parameters, and creating experiments. The mlflow.client module provides a lower-level API for interacting with the MLflow server. Additional modules, such as mlflow.models, support model management and deployment. This API enables efficient tracking, logging, and deployment of machine learning models.