Basic MLflow Usage ================== Getting Started with MLflow ---------------------------- MLflow tracking is easily enabled in mloptimizer by setting the ``use_mlflow`` parameter to ``True`` when creating a ``GeneticSearch`` instance. Installation ~~~~~~~~~~~~ First, ensure MLflow is installed: .. code-block:: bash pip install mlflow If you attempt to use ``use_mlflow=True`` without MLflow installed, mloptimizer will display a clear error message with installation instructions. Enabling MLflow Tracking ~~~~~~~~~~~~~~~~~~~~~~~~~ Simply add ``use_mlflow=True`` to your ``GeneticSearch`` configuration: .. code-block:: python from mloptimizer.interfaces import GeneticSearch, HyperparameterSpaceBuilder from sklearn.ensemble import RandomForestClassifier from sklearn.datasets import load_breast_cancer from sklearn.model_selection import StratifiedKFold # Load data X, y = load_breast_cancer(return_X_y=True) # Get hyperparameter space space = HyperparameterSpaceBuilder.get_default_space(RandomForestClassifier) # Create GeneticSearch with MLflow enabled opt = GeneticSearch( estimator_class=RandomForestClassifier, hyperparam_space=space, cv=StratifiedKFold(n_splits=5), scoring="balanced_accuracy", generations=10, population_size=20, early_stopping=True, patience=3, use_mlflow=True # Enable MLflow tracking ) # Run optimization - all results logged to MLflow opt.fit(X, y) print(f"Best model: {opt.best_estimator_}") print("\nView results in MLflow UI:") print(" mlflow ui --port 5000") print(" Then open: http://localhost:5000") What Gets Logged ---------------- When MLflow tracking is enabled, mloptimizer automatically logs comprehensive information about your optimization run: Parent Run (Optimization-Level) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Each optimization creates a parent run with a timestamp-based name (e.g., ``20260118_161428_RandomForestClassifier``): **Generation-Level Metrics** - ``generation_best_fitness``: Best fitness in each generation - ``generation_avg_fitness``: Average fitness in each generation - ``generation_worst_fitness``: Worst fitness in each generation - ``generation_median_fitness``: Median fitness in each generation - ``final_best_fitness``: Best fitness achieved overall **Configuration Parameters** - ``population_size``: Population size - ``generations``: Maximum generations - ``early_stopping``: Whether early stopping is enabled - ``patience``: Early stopping patience - ``use_parallel``: Parallelization status - ``n_evolvable_params``: Number of evolvable hyperparameters - ``evolvable_params``: List of evolvable parameter names **Dataset Metadata (Tags)** - ``dataset_samples``: Number of training samples - ``dataset_features``: Number of features - ``dataset_classes``: Number of classes (or 'regression') **Optimization Results (Tags)** - ``estimator_class``: Name of the estimator being optimized - ``early_stopped``: Whether optimization stopped early - ``stopped_at_generation``: Generation where optimization stopped - ``total_evaluations``: Total number of model evaluations - ``optimization_time_seconds``: Total optimization time Child Runs (Individual Evaluations) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Each individual evaluation creates a nested child run: - **Hyperparameters**: All hyperparameter values for that individual - **Fitness Metrics**: Evaluation scores (accuracy, balanced_accuracy, etc.) - **Generation Info**: Tags indicating which generation and individual index Default Storage Location ------------------------- By default, MLflow stores runs locally in the ``./mlruns/`` directory: .. code-block:: text ./mlruns/ ├── 0/ # Default experiment ├── 1/ # mloptimizer experiment │ ├── meta.yaml │ └── / │ ├── meta.yaml │ ├── metrics/ │ │ ├── generation_best_fitness │ │ ├── generation_avg_fitness │ │ └── final_best_fitness │ ├── params/ │ │ ├── population_size │ │ ├── generations │ │ └── ... │ └── tags/ │ ├── estimator_class │ ├── dataset_samples │ └── ... Custom Experiment Name ---------------------- You can specify a custom experiment name using the MLflow API before creating your ``GeneticSearch``: .. code-block:: python import mlflow from mloptimizer.interfaces import GeneticSearch # Set custom experiment name mlflow.set_experiment("breast_cancer_optimization") # Create GeneticSearch - will log to this experiment opt = GeneticSearch( estimator_class=YourEstimator, hyperparam_space=space, use_mlflow=True ) opt.fit(X, y) Example with Early Stopping ---------------------------- MLflow tracking captures early stopping information: .. code-block:: python from mloptimizer.interfaces import GeneticSearch from sklearn.ensemble import GradientBoostingClassifier opt = GeneticSearch( estimator_class=GradientBoostingClassifier, hyperparam_space=space, cv=cv, scoring="balanced_accuracy", generations=20, population_size=30, early_stopping=True, # Enable early stopping patience=5, min_delta=0.001, use_mlflow=True # Track everything ) opt.fit(X, y) The MLflow run will include: - Tag indicating early stopping was enabled - Tag showing which generation optimization stopped at - Tag showing the reason (no improvement for N generations) - Final best fitness achieved Disabling MLflow ----------------- MLflow tracking is disabled by default. To explicitly disable it: .. code-block:: python opt = GeneticSearch( estimator_class=YourEstimator, hyperparam_space=space, use_mlflow=False # Disable MLflow (default) ) When disabled, no MLflow data is logged, and the library works without requiring MLflow to be installed. .. note:: MLflow tracking adds minimal overhead to optimization time. The benefits of comprehensive experiment tracking typically outweigh the small performance cost. .. tip:: For quick experiments, use local MLflow storage. For production or team collaboration, configure a remote MLflow tracking server (see :doc:`mlflow_remote`).