5.1. Basic MLflow Usage

5.1.1. Getting Started with MLflow

To enable MLflow tracking in mloptimizer, set the use_mlflow parameter to True when creating a GeneticSearch instance.

5.1.1.1. Installation

First, ensure MLflow is installed:

pip install mlflow

If you attempt to use use_mlflow=True without MLflow installed, mloptimizer will display a clear error message with installation instructions.
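
A script that should run with or without MLflow can check for the package before turning tracking on. The following is a minimal sketch using only the standard library; the helper name mlflow_available is hypothetical and not part of mloptimizer.

```python
import importlib.util


def mlflow_available() -> bool:
    """Return True if the `mlflow` package can be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("mlflow") is not None


# Enable tracking only when MLflow is present.
use_mlflow = mlflow_available()
print(f"MLflow tracking enabled: {use_mlflow}")
```

The resulting boolean can then be passed as use_mlflow=use_mlflow when constructing GeneticSearch.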

5.1.1.2. Enabling MLflow Tracking

Add use_mlflow=True to your GeneticSearch configuration:

from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

# Load data
X, y = load_breast_cancer(return_X_y=True)

# Get hyperparameter space
space = HyperparameterSpaceBuilder.get_default_space(RandomForestClassifier)

# Create GeneticSearch with MLflow enabled
opt = GeneticSearch(
    estimator_class=RandomForestClassifier,
    hyperparam_space=space,
    cv=StratifiedKFold(n_splits=5),
    scoring="balanced_accuracy",
    generations=10,
    population_size=20,
    early_stopping=True,
    patience=3,
    use_mlflow=True  # Enable MLflow tracking
)

# Run optimization - all results logged to MLflow
opt.fit(X, y)

print(f"Best model: {opt.best_estimator_}")
print("\nView results in MLflow UI:")
print("  mlflow ui --port 5000")
print("  Then open: http://localhost:5000")

5.1.2. What Gets Logged

When MLflow tracking is enabled, mloptimizer automatically logs the following information about each optimization run:

5.1.2.1. Parent Run (Optimization-Level)

Each optimization creates a parent run with a timestamp-based name (e.g., 20260118_161428_RandomForestClassifier):

Generation-Level Metrics

generation_best_fitness: Best fitness in each generation
generation_avg_fitness: Average fitness in each generation
generation_worst_fitness: Worst fitness in each generation
generation_median_fitness: Median fitness in each generation
final_best_fitness: Best fitness achieved overall
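
To make the meaning of the per-generation metrics concrete, here is a minimal sketch of how such summaries can be computed from one generation's fitness values. It assumes higher fitness is better, so "best" is the maximum; the function name generation_summary and the sample values are illustrative and not taken from mloptimizer.

```python
from statistics import mean, median


def generation_summary(fitnesses: list[float]) -> dict[str, float]:
    """Summarize one generation's fitness values (higher is assumed better)."""
    return {
        "generation_best_fitness": max(fitnesses),
        "generation_avg_fitness": mean(fitnesses),
        "generation_worst_fitness": min(fitnesses),
        "generation_median_fitness": median(fitnesses),
    }


# Illustrative fitness values for a population of four individuals.
summary = generation_summary([0.5, 0.75, 1.0, 0.25])
print(summary)
```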

Configuration Parameters

population_size: Population size
generations: Maximum generations
early_stopping: Whether early stopping is enabled
patience: Early stopping patience
use_parallel: Parallelization status
n_evolvable_params: Number of evolvable hyperparameters
evolvable_params: List of evolvable parameter names

Dataset Metadata (Tags)

dataset_samples: Number of training samples
dataset_features: Number of features
dataset_classes: Number of classes (or 'regression')

Optimization Results (Tags)

estimator_class: Name of the estimator being optimized
early_stopped: Whether optimization stopped early
stopped_at_generation: Generation where optimization stopped
total_evaluations: Total number of model evaluations
optimization_time_seconds: Total optimization time
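
When post-processing many runs, it can help to recover the start time and estimator from the parent run name. The sketch below assumes the name always follows the layout of the example above (date, time, then the estimator class name, separated by underscores); parse_run_name is a hypothetical helper, not an mloptimizer function.

```python
from datetime import datetime


def parse_run_name(run_name: str) -> tuple[datetime, str]:
    """Split a name like '20260118_161428_RandomForestClassifier' into parts."""
    # Assumed layout: <YYYYMMDD>_<HHMMSS>_<estimator class name>.
    date_part, time_part, estimator = run_name.split("_", 2)
    started = datetime.strptime(f"{date_part}_{time_part}", "%Y%m%d_%H%M%S")
    return started, estimator


started, estimator = parse_run_name("20260118_161428_RandomForestClassifier")
print(started.isoformat(), estimator)
```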

5.1.2.2. Child Runs (Individual Evaluations)

Each individual evaluation creates a nested child run:

Hyperparameters: All hyperparameter values for that individual
Fitness Metrics: Evaluation scores (accuracy, balanced_accuracy, etc.)
Generation Info: Tags indicating which generation and individual index

5.1.3. Default Storage Location

By default, MLflow stores runs locally in the ./mlruns/ directory:

./mlruns/
├── 0/                          # Default experiment
├── 1/                          # mloptimizer experiment
│   ├── meta.yaml
│   └── <run_id>/
│       ├── meta.yaml
│       ├── metrics/
│       │   ├── generation_best_fitness
│       │   ├── generation_avg_fitness
│       │   └── final_best_fitness
│       ├── params/
│       │   ├── population_size
│       │   ├── generations
│       │   └── ...
│       └── tags/
│           ├── estimator_class
│           ├── dataset_samples
│           └── ...
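
For a quick look at a run without starting the UI, the metric files in this tree can be read directly. The sketch below is an assumption-laden illustration: it relies on the directory layout shown above and on each metric file holding one whitespace-separated record per line in the order timestamp, value, step, which is how MLflow's local file store has typically written metrics. The helper read_metric_history is hypothetical; the MLflow API remains the supported way to query runs.

```python
from pathlib import Path


def read_metric_history(run_dir: Path, metric: str) -> list[tuple[int, float]]:
    """Return (step, value) pairs for one metric of a file-store run."""
    history = []
    for line in (run_dir / "metrics" / metric).read_text().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Assumed record layout: "<timestamp_ms> <value> [<step>]".
        step = int(parts[2]) if len(parts) > 2 else 0
        history.append((step, float(parts[1])))
    return sorted(history)


# Hypothetical usage; substitute a real experiment and run directory:
# run_dir = Path("mlruns") / "1" / "<run_id>"
# print(read_metric_history(run_dir, "generation_best_fitness"))
```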

5.1.4. Custom Experiment Name

You can specify a custom experiment name using the MLflow API before creating your GeneticSearch:

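import mlflow
from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Load data and build the default hyperparameter space
X, y = load_breast_cancer(return_X_y=True)
space = HyperparameterSpaceBuilder.get_default_space(RandomForestClassifier)

# Set custom experiment name
mlflow.set_experiment("breast_cancer_optimization")

# Create GeneticSearch - will log to this experiment
opt = GeneticSearch(
    estimator_class=RandomForestClassifier,
    hyperparam_space=space,
    use_mlflow=True
)

opt.fit(X, y)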

5.1.5. Example with Early Stopping

MLflow tracking captures early stopping information:

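from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold

# Load data and set up cross-validation
X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5)

# Assumes a default hyperparameter space is available for this estimator
space = HyperparameterSpaceBuilder.get_default_space(GradientBoostingClassifier)

opt = GeneticSearch(
    estimator_class=GradientBoostingClassifier,
    hyperparam_space=space,
    cv=cv,
    scoring="balanced_accuracy",
    generations=20,
    population_size=30,
    early_stopping=True,  # Enable early stopping
    patience=5,
    min_delta=0.001,
    use_mlflow=True  # Track everything
)

opt.fit(X, y)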

The MLflow run will include:

Tag indicating early stopping was enabled
Tag showing which generation optimization stopped at
Tag showing the reason (no improvement for N generations)
Final best fitness achieved
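
To illustrate how patience and min_delta interact, the sketch below applies the conventional early-stopping rule: stop once the best fitness has failed to improve by more than min_delta for patience consecutive generations. This is an assumption about the general technique, not a copy of mloptimizer's implementation, and the function name and fitness history are made up for the example.

```python
def stopping_generation(best_per_generation, patience, min_delta=0.0):
    """Return the 1-based generation at which the search would stop, or None."""
    best_so_far = float("-inf")
    stale = 0  # consecutive generations without a meaningful improvement
    for generation, fitness in enumerate(best_per_generation, start=1):
        if fitness > best_so_far + min_delta:
            best_so_far = fitness
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return generation
    return None


# Illustrative best-fitness history: gains after generation 2 stay below min_delta.
history = [0.90, 0.93, 0.9305, 0.9302, 0.9308, 0.9301, 0.9309]
print(stopping_generation(history, patience=5, min_delta=0.001))
```

With these values, the last improvement larger than min_delta happens in generation 2, so the rule triggers five generations later.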

5.1.6. Disabling MLflow

MLflow tracking is disabled by default. To explicitly disable it:

opt = GeneticSearch(
    estimator_class=YourEstimator,
    hyperparam_space=space,
    use_mlflow=False  # Disable MLflow (default)
)

When disabled, no MLflow data is logged, and the library works without requiring MLflow to be installed.

Note

MLflow tracking adds minimal overhead to optimization time. The benefits of comprehensive experiment tracking typically outweigh the small performance cost.

Tip

For quick experiments, use local MLflow storage. For production or team collaboration, configure a remote MLflow tracking server (see Remote MLflow Tracking).