5.1. Basic MLflow Usage
5.1.1. Getting Started with MLflow
MLflow tracking is easily enabled in mloptimizer by setting the use_mlflow parameter to True when creating a GeneticSearch instance.
5.1.1.1. Installation
First, ensure MLflow is installed:
pip install mlflow
If you set use_mlflow=True without MLflow installed, mloptimizer displays an error message with installation instructions.
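If you want a script to run without tracking rather than fail when MLflow is missing, you can check for the package before building the optimizer. This is a minimal sketch using only the standard library; `mlflow_available` is a hypothetical helper, not part of mloptimizer or MLflow:

```python
import importlib.util


def mlflow_available() -> bool:
    """Return True if the mlflow package can be imported in this environment."""
    return importlib.util.find_spec("mlflow") is not None


# Hypothetical usage: pass this flag as use_mlflow=... when creating GeneticSearch,
# so tracking is only enabled where MLflow is actually installed.
use_mlflow = mlflow_available()
print(f"use_mlflow={use_mlflow}")
```

Passing `use_mlflow=mlflow_available()` to GeneticSearch keeps the same script usable on machines with and without MLflow.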
5.1.1.2. Enabling MLflow Tracking
Simply add use_mlflow=True to your GeneticSearch configuration:
from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
# Load data
X, y = load_breast_cancer(return_X_y=True)
# Get hyperparameter space
space = HyperparameterSpaceBuilder.get_default_space(RandomForestClassifier)
# Create GeneticSearch with MLflow enabled
opt = GeneticSearch(
    estimator_class=RandomForestClassifier,
    hyperparam_space=space,
    cv=StratifiedKFold(n_splits=5),
    scoring="balanced_accuracy",
    generations=10,
    population_size=20,
    early_stopping=True,
    patience=3,
    use_mlflow=True  # Enable MLflow tracking
)
# Run optimization - all results logged to MLflow
opt.fit(X, y)
print(f"Best model: {opt.best_estimator_}")
print("\nView results in MLflow UI:")
print(" mlflow ui --port 5000")
print(" Then open: http://localhost:5000")
5.1.2. What Gets Logged
When MLflow tracking is enabled, mloptimizer automatically logs comprehensive information about your optimization run:
5.1.2.1. Parent Run (Optimization-Level)
Each optimization creates a parent run with a timestamp-based name (e.g., 20260118_161428_RandomForestClassifier):
- Generation-Level Metrics
generation_best_fitness: Best fitness in each generation
generation_avg_fitness: Average fitness in each generation
generation_worst_fitness: Worst fitness in each generation
generation_median_fitness: Median fitness in each generation
final_best_fitness: Best fitness achieved overall
- Configuration Parameters
population_size: Population size
generations: Maximum number of generations
early_stopping: Whether early stopping is enabled
patience: Early stopping patience
use_parallel: Parallelization status
n_evolvable_params: Number of evolvable hyperparameters
evolvable_params: List of evolvable parameter names
- Dataset Metadata (Tags)
dataset_samples: Number of training samples
dataset_features: Number of features
dataset_classes: Number of classes (or 'regression')
- Optimization Results (Tags)
estimator_class: Name of the estimator being optimized
early_stopped: Whether optimization stopped early
stopped_at_generation: Generation where optimization stopped
total_evaluations: Total number of model evaluations
optimization_time_seconds: Total optimization time in seconds
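To make the generation-level metrics concrete, here is a minimal sketch of how the four per-generation statistics summarize the fitness values of one population. It is not mloptimizer's internal code: it assumes fitness is a score to maximize (such as balanced_accuracy), and the fitness values and the `generation_summary` helper are made up for illustration:

```python
import statistics


def generation_summary(fitness: list[float]) -> dict[str, float]:
    """Summarize one generation's fitness values (assumes higher is better)."""
    return {
        "generation_best_fitness": max(fitness),
        "generation_avg_fitness": statistics.mean(fitness),
        "generation_worst_fitness": min(fitness),
        "generation_median_fitness": statistics.median(fitness),
    }


# Hypothetical fitness values: three generations of four individuals each
history = [
    [0.90, 0.92, 0.88, 0.91],
    [0.93, 0.91, 0.94, 0.90],
    [0.94, 0.95, 0.93, 0.94],
]
summaries = [generation_summary(gen) for gen in history]

# Best fitness seen in any generation
final_best_fitness = max(s["generation_best_fitness"] for s in summaries)
print(summaries[0])
print(final_best_fitness)  # prints 0.95
```

Comparing the best and median curves across generations shows whether the whole population is converging or only a few individuals are improving.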
5.1.2.2. Child Runs (Individual Evaluations)
Each individual evaluation creates a nested child run:
Hyperparameters: All hyperparameter values for that individual
Fitness Metrics: Evaluation scores (accuracy, balanced_accuracy, etc.)
Generation Info: Tags indicating which generation and individual index
5.1.3. Default Storage Location
By default, MLflow stores runs locally in the ./mlruns/ directory:
./mlruns/
├── 0/                                # Default experiment
├── 1/                                # mloptimizer experiment
│   ├── meta.yaml
│   └── <run_id>/
│       ├── meta.yaml
│       ├── metrics/
│       │   ├── generation_best_fitness
│       │   ├── generation_avg_fitness
│       │   └── final_best_fitness
│       ├── params/
│       │   ├── population_size
│       │   ├── generations
│       │   └── ...
│       └── tags/
│           ├── estimator_class
│           ├── dataset_samples
│           └── ...
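The metric files in this layout can also be read directly, for example to check convergence without starting the UI. This is a minimal sketch, assuming the local file-based store shown above, where each line of a metric file has the form `timestamp value step`, and assuming each generation is logged with its index as the step. `read_metric_history` is a hypothetical helper, not part of MLflow or mloptimizer; for anything beyond quick inspection, MLflow's own client API (`MlflowClient.get_metric_history`) is the more robust choice:

```python
from pathlib import Path


def read_metric_history(run_dir: Path, metric: str) -> list[tuple[int, float]]:
    """Return (step, value) pairs from a file-store metric file, sorted by step.

    Assumes each line is "timestamp value step" (the step column may be absent).
    """
    history = []
    for line in (run_dir / "metrics" / metric).read_text().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        value = float(parts[1])
        step = int(parts[2]) if len(parts) > 2 else 0
        history.append((step, value))
    return sorted(history)


# Usage sketch: substitute the experiment and run IDs from your own ./mlruns/ tree
# run_dir = Path("./mlruns") / "<experiment_id>" / "<run_id>"
# print(read_metric_history(run_dir, "generation_best_fitness"))
```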
5.1.4. Custom Experiment Name
You can specify a custom experiment name using the MLflow API before creating your GeneticSearch:
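import mlflow
from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
# Load data and the default hyperparameter space
X, y = load_breast_cancer(return_X_y=True)
space = HyperparameterSpaceBuilder.get_default_space(RandomForestClassifier)
# Set custom experiment name (MLflow creates it if it does not exist)
mlflow.set_experiment("breast_cancer_optimization")
# Create GeneticSearch - will log to this experiment
opt = GeneticSearch(
    estimator_class=RandomForestClassifier,
    hyperparam_space=space,
    use_mlflow=True
)
opt.fit(X, y)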
5.1.5. Example with Early Stopping
MLflow tracking captures early stopping information:
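from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold
# Load data, hyperparameter space, and cross-validation strategy
X, y = load_breast_cancer(return_X_y=True)
space = HyperparameterSpaceBuilder.get_default_space(GradientBoostingClassifier)
cv = StratifiedKFold(n_splits=5)
opt = GeneticSearch(
    estimator_class=GradientBoostingClassifier,
    hyperparam_space=space,
    cv=cv,
    scoring="balanced_accuracy",
    generations=20,
    population_size=30,
    early_stopping=True,  # Enable early stopping
    patience=5,           # Generations without improvement before stopping
    min_delta=0.001,      # Smallest gain that counts as an improvement
    use_mlflow=True       # Track everything
)
opt.fit(X, y)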
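The MLflow run will include:
- The early_stopping parameter, showing that early stopping was enabled
- The early_stopped and stopped_at_generation tags, showing whether and at which generation optimization stopped
- A tag showing the reason (no improvement for N generations)
- The final_best_fitness metric, the best fitness achieved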
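The interplay of patience and min_delta can be illustrated with the common patience-based rule: stop once the best fitness has failed to improve by more than min_delta for patience consecutive generations. This is a minimal sketch of that rule, not mloptimizer's implementation, whose exact criterion may differ; `early_stop_generation` is a hypothetical helper, and its generation indices are 0-based, which may not match how stopped_at_generation counts:

```python
from typing import Optional, Sequence


def early_stop_generation(
    best_per_generation: Sequence[float], patience: int, min_delta: float = 0.0
) -> Optional[int]:
    """Return the 0-based generation at which patience-based early stopping
    would trigger, or None if it never does (assumes fitness is maximized)."""
    best_so_far = float("-inf")
    generations_without_improvement = 0
    for generation, fitness in enumerate(best_per_generation):
        if fitness > best_so_far + min_delta:
            # Improvement larger than min_delta: reset the patience counter
            best_so_far = fitness
            generations_without_improvement = 0
        else:
            generations_without_improvement += 1
            if generations_without_improvement >= patience:
                return generation
    return None


# Hypothetical best-fitness history: gains after generation 2 are below 0.001
history = [0.90, 0.93, 0.95, 0.9505, 0.950, 0.949, 0.9502, 0.9508]
print(early_stop_generation(history, patience=5, min_delta=0.001))  # prints 7
```

With min_delta=0.0 the same history never triggers a stop, because the small gains at generations 3 and 7 each reset the counter. This is why a non-zero min_delta matters when fitness creeps upward by tiny amounts.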
5.1.6. Disabling MLflow
MLflow tracking is disabled by default. To explicitly disable it:
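from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder
from sklearn.ensemble import RandomForestClassifier
space = HyperparameterSpaceBuilder.get_default_space(RandomForestClassifier)
opt = GeneticSearch(
    estimator_class=RandomForestClassifier,
    hyperparam_space=space,
    use_mlflow=False  # Disable MLflow (default)
)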
When disabled, no MLflow data is logged, and the library works without requiring MLflow to be installed.
Note
MLflow tracking adds a small overhead to optimization time, since every individual evaluation is logged as its own nested child run. The benefits of comprehensive experiment tracking typically outweigh this cost.
Tip
For quick experiments, use local MLflow storage. For production or team collaboration, configure a remote MLflow tracking server (see Remote MLflow Tracking).