Quick Intro to MLflow

What is MLflow?

MLflow is an open source platform for managing the machine learning lifecycle. It supports ML model development, experimentation, tracking, and deployment. It was originally developed by Databricks and is now a Linux Foundation project. It provides Python, R, and Java APIs, along with a REST API.

Components of MLflow

MLflow is composed of the following core components:

  • Tracking – API and UI for logging and querying experiments: parameters, metrics, artifacts, and code versions.
  • Models – Allows one to manage and deploy ML models.
  • Model Registry – Provides a centralized store for managing model lifecycles and versions.
  • Projects – Enables one to package ML code in a reusable, reproducible form for sharing and deployment.
  • Model Serving – Tooling for serving ML models through REST endpoints.

Additional components include:

  • Evaluate – API for model evaluation.
  • MLflow Deployments – server for managing access to LLMs across an organization.
  • Recipes – framework for creating ML pipelines and deploying them to production.
  • Prompt Engineering UI – UI for prompt engineering.

Uses of MLflow

MLflow has many uses, reflecting the complexity of managing the machine learning lifecycle.

Common challenges include:

  • Experiment management and tracking – how to keep track of experiments and their results.
  • Reproducibility – how to reproduce experiments and results across multiple runs.
  • Deployment consistency – how to ensure that models are deployed consistently.
  • Model management – how to manage models and their versions.
  • Library agnosticism – how to work with different ML libraries while keeping models usable across them.

MLflow is designed to address these challenges.

It offers the following features:

  • Traceability
  • Consistency
  • Flexibility

Users of MLflow

Various personas in the Data/ML space can make use of MLflow.

  • Data Scientists
  • Data & ML Engineers
  • Prompt Engineers

Use Cases of MLflow

Typical use cases of MLflow include:

  • Experiment tracking and management.
  • Model selection and management.
  • Model performance evaluation and monitoring.
  • Project Management of Models and collaboration.

Scalability

MLflow is designed to be scalable.

It supports the following:

  • Distributed Execution
  • Parallel Runs
  • Interoperability with Distributed Storage
  • Centralized model management with the Model Registry

References

For a more in-depth look at MLflow, check out the following:

Official MLflow docs

Databricks Managed MLflow