.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_xgboost_example.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_xgboost_example.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_xgboost_example.py:


XGBoost optimization with MLflow tracking
=========================================

A complete example showing hyperparameter optimization for XGBoost
with MLflow integration for experiment tracking.

.. GENERATED FROM PYTHON SOURCE LINES 7-20

.. code-block:: default

    from sklearn.datasets import fetch_covtype
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    import xgboost as xgb
    import numpy as np
    import plotly
    import os
    from mloptimizer.interfaces import HyperparameterSpaceBuilder, GeneticSearch
    from mloptimizer.application.reporting.plots import (plotly_search_space, plotly_logbook,
                                                         plot_logbook)
    import matplotlib.pyplot as plt


.. GENERATED FROM PYTHON SOURCE LINES 21-23

Load and prepare a complex classification dataset
-------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 23-36

.. code-block:: default

    print("Loading Forest CoverType dataset...")
    data = fetch_covtype()
    X, y = data.data, data.target
    y = y - 1  # Adjust labels to start from 0

    # Use a subset for faster execution
    np.random.seed(42)
    sample_indices = np.random.choice(len(X), size=2000, replace=False)
    X = X[sample_indices]
    y = y[sample_indices]

    print(f"Dataset shape: {X.shape}")
    print(f"Number of classes: {len(np.unique(y))}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Loading Forest CoverType dataset...
    Dataset shape: (2000, 54)
    Number of classes: 7


.. GENERATED FROM PYTHON SOURCE LINES 37-38

Split the data

.. GENERATED FROM PYTHON SOURCE LINES 38-42

.. code-block:: default

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )


.. GENERATED FROM PYTHON SOURCE LINES 43-47

Define the XGBoost hyperparameter space using HyperparameterSpaceBuilder
-------------------------------------------------------------------------
We can build a custom hyperparameter space by adding individual parameters.
This gives fine-grained control over the search space for each hyperparameter.

.. GENERATED FROM PYTHON SOURCE LINES 47-55

.. code-block:: default

    hyperparam_space = (HyperparameterSpaceBuilder()
                        .add_int_param('max_depth', min_value=2, max_value=10)
                        .add_float_param('learning_rate', min_value=10, max_value=30, scale=100)
                        .add_int_param('n_estimators', min_value=50, max_value=300)
                        .add_float_param('subsample', min_value=60, max_value=100, scale=100)
                        .add_float_param('colsample_bytree', min_value=60, max_value=100, scale=100)
                        .build())


.. GENERATED FROM PYTHON SOURCE LINES 56-65

Configure and run the genetic optimization with MLflow
-------------------------------------------------------
Genetic algorithm configuration:

- ``generations``: Number of evolutionary iterations
- ``population_size``: Number of configurations per generation
- ``n_elites``: Number of best individuals preserved each generation
- ``seed``: Random seed for reproducibility
- ``use_mlflow``: Enable MLflow experiment tracking

Note: the values below are kept small for documentation builds.
For production runs, increase to 20 or more generations.

.. GENERATED FROM PYTHON SOURCE LINES 65-89

.. code-block:: default

    genetic_params = {
        'generations': 5,
        'population_size': 8,
        'n_elites': 2,
        'seed': 42,
        'use_mlflow': True,
        'use_parallel': False
    }

    opt = GeneticSearch(
        estimator_class=xgb.XGBClassifier,
        hyperparam_space=hyperparam_space,
        cv=3,
        scoring='accuracy',
        disable_file_output=False,
        **genetic_params
    )

    print("Starting XGBoost optimization with MLflow tracking...")
    print(f"use_mlflow parameter: {opt.use_mlflow}")

    # Run the optimization
    opt.fit(X_train, y_train)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/checkouts/master/examples/plot_xgboost_example.py:74: UserWarning: Some hyperparameters have very small integer ranges (< 10 distinct values): 'max_depth' (9 values: 2 to 10). Small ranges limit search granularity. Consider increasing the range or scale for float types.
      opt = GeneticSearch(
    Starting XGBoost optimization with MLflow tracking...
    use_mlflow parameter: True
    Genetic execution: 0%| | 0/6 [00:00`_ for more details.
      warnings.warn(
    Genetic execution: 17%|█▋ | 1/6 [00:02<00:13, 2.75s/it, best fitness=?]
    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/envs/master/lib/python3.11/site-packages/sklearn/model_selection/_split.py:776: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=3.
      warnings.warn(
    Genetic execution: 17%|█▋ | 1/6 [00:04<00:13, 2.75s/it, best fitness=0.708]
    Genetic execution: 17%|█▋ | 1/6 [00:05<00:13, 2.75s/it, best fitness=0.723]
    Genetic execution: 33%|███▎ | 2/6 [00:10<00:21, 5.42s/it, best fitness=0.723]
    Genetic execution: 50%|█████ | 3/6 [00:16<00:17, 5.70s/it, best fitness=0.723]
    Genetic execution: 50%|█████ | 3/6 [00:18<00:17, 5.70s/it, best fitness=0.726]
    Genetic execution: 67%|██████▋ | 4/6 [00:20<00:10, 5.39s/it, best fitness=0.726]
    Genetic execution: 67%|██████▋ | 4/6 [00:21<00:10, 5.39s/it, best fitness=0.726]
    Genetic execution: 83%|████████▎ | 5/6 [00:26<00:05, 5.44s/it, best fitness=0.726]
    Genetic execution: 83%|████████▎ | 5/6 [00:26<00:05, 5.44s/it, best fitness=0.73]
    Genetic execution: 83%|████████▎ | 5/6 [00:29<00:05, 5.44s/it, best fitness=0.732]
    Genetic execution: 100%|██████████| 6/6 [00:32<00:00, 5.46s/it, best fitness=0.732]
    Genetic execution: 100%|██████████| 6/6 [00:36<00:00, 6.09s/it, best fitness=0.732]
    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/envs/master/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
      with pd.option_context('mode.use_inf_as_na', True):


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    GeneticSearch(cv=StratifiedKFold(n_splits=3, random_state=42, shuffle=True),
                  estimator_class=<class 'xgboost.sklearn.XGBClassifier'>,
                  generations=5,
                  hyperparam_space=HyperparameterSpace(fixed_hyperparams={}, evolvable_hyperparams={'max_depth': Hyperparam('max_depth', 2, 10, 'int'), 'learning_rate': Hyperparam('learning_rate', 10, 30, 'float', 100), 'n_estimators': Hyperparam('n_estimators', 50, 300, 'int'), 'subsample': Hyperparam('subsample', 60, 100, 'float', 100), 'colsample_bytree': Hyperparam('colsample_bytree', 60, 100, 'float', 100)}),
                  n_elites=2, population_size=8, scoring='accuracy', seed=42,
                  use_mlflow=True, use_parallel=False)

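The estimator representation above lists the float hyperparameters with integer bounds and a scale, for example ``Hyperparam('learning_rate', 10, 30, 'float', 100)``. The following is a minimal sketch of how such bounds can be read, assuming that the optimizer evolves an integer between ``min_value`` and ``max_value`` and divides it by ``scale`` before passing it to the estimator. The helper ``decode_float_param`` is hypothetical and is not part of mloptimizer; the bounds are copied from the space defined in this example.

```python
def decode_float_param(raw_value, min_value, max_value, scale):
    """Clamp an integer gene to [min_value, max_value], then divide by scale.

    Assumption: this mirrors how a scaled float hyperparameter is decoded;
    it is an illustration, not the library's implementation.
    """
    clamped = max(min_value, min(max_value, int(raw_value)))
    return clamped / scale


# (min_value, max_value, scale) as declared with add_float_param in this example
float_params = {
    "learning_rate": (10, 30, 100),
    "subsample": (60, 100, 100),
    "colsample_bytree": (60, 100, 100),
}

# Decode both ends of each range to see the values the estimator would receive
decoded_ranges = {
    name: (decode_float_param(lo, lo, hi, scale), decode_float_param(hi, lo, hi, scale))
    for name, (lo, hi, scale) in float_params.items()
}

for name, (low, high) in decoded_ranges.items():
    print(f"{name}: {low:.2f} to {high:.2f}")
```

Under that assumption, ``learning_rate`` is searched between 0.10 and 0.30 in steps of 0.01, and a larger ``scale`` with proportionally larger bounds would give a finer grid over the same interval.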

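The run output above repeats scikit-learn's warning that the least populated class has only 2 members, which is fewer than ``n_splits=3``. With a 2,000-sample subset, some cover types are left with very few rows in the training labels. The following is a minimal sketch of one way to detect this before starting the search, using only the standard library. The helper ``classes_below_n_splits`` and the toy labels are hypothetical and only illustrate the check.

```python
from collections import Counter


def classes_below_n_splits(labels, n_splits):
    """Return {class_label: count} for classes with fewer than n_splits samples."""
    counts = Counter(labels)
    return {label: n for label, n in sorted(counts.items()) if n < n_splits}


# Toy labels for illustration only (not the CoverType data)
toy_labels = [0] * 50 + [1] * 40 + [2] * 5 + [3] * 2

rare = classes_below_n_splits(toy_labels, n_splits=3)
print(rare)
```

In this example the same check could be applied to ``y_train.tolist()``. If rare classes are reported, possible responses include drawing a larger subset or lowering ``cv``; which one is appropriate depends on the use case.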
.. GENERATED FROM PYTHON SOURCE LINES 90-91

Get results and evaluate

.. GENERATED FROM PYTHON SOURCE LINES 91-104

.. code-block:: default

    best_clf = opt.best_estimator_
    y_pred = best_clf.predict(X_test)

    test_accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')

    print(f"\nOptimization completed!")
    print(f"Test accuracy: {test_accuracy:.4f}")
    print(f"Test precision: {precision:.4f}")
    print(f"Test recall: {recall:.4f}")
    print(f"Test F1: {f1:.4f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/envs/master/lib/python3.11/site-packages/sklearn/metrics/_classification.py:1531: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
      _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

    Optimization completed!
    Test accuracy: 0.7025
    Test precision: 0.6866
    Test recall: 0.7025
    Test F1: 0.6895


.. GENERATED FROM PYTHON SOURCE LINES 105-106

Generate visualizations

.. GENERATED FROM PYTHON SOURCE LINES 106-120

.. code-block:: default

    population_df = opt.populations_

    # Search space visualization
    top_params = ['max_depth', 'learning_rate', 'n_estimators', 'subsample', 'fitness']
    df_filtered = population_df[top_params]
    g_search_space = plotly_search_space(df_filtered, top_params)
    g_search_space.update_layout(
        title="XGBoost Hyperparameter Search Space - CoverType Dataset",
        autosize=True,
        width=None,
        height=650
    )
    plotly.io.show(g_search_space, config={'responsive': True})


.. raw:: html
    :file: images/sphx_glr_plot_xgboost_example_001.html


.. GENERATED FROM PYTHON SOURCE LINES 121-122

Simple logbook visualization

.. GENERATED FROM PYTHON SOURCE LINES 122-126

.. code-block:: default

    g_logbook_s = plot_logbook(opt.logbook_)
    # Show plot
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_plot_xgboost_example_002.png
   :alt: plot xgboost example
   :srcset: /auto_examples/images/sphx_glr_plot_xgboost_example_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/envs/master/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
      with pd.option_context('mode.use_inf_as_na', True):


.. GENERATED FROM PYTHON SOURCE LINES 127-128

Evolution logbook visualization

.. GENERATED FROM PYTHON SOURCE LINES 128-137

.. code-block:: default

    g_logbook = plotly_logbook(opt.logbook_, population_df)
    g_logbook.update_layout(
        title="XGBoost Optimization Evolution - CoverType Dataset",
        autosize=True,
        width=None,
        height=500
    )
    plotly.io.show(g_logbook, config={'responsive': True})


.. raw:: html
    :file: images/sphx_glr_plot_xgboost_example_003.html


.. GENERATED FROM PYTHON SOURCE LINES 138-139

Analyze optimization results

.. GENERATED FROM PYTHON SOURCE LINES 139-157

.. code-block:: default

    print("\n=== Optimization Analysis ===")
    print(f"Unique evaluations performed: {opt.n_trials_}")
    print(f"Total individuals in population history: {len(population_df)}")
    print(f"Optimization time: {opt.optimization_time_:.4f} seconds")
    print(f"Time per evaluation: {opt.optimization_time_ / opt.n_trials_:.4f} seconds")
    print(f"Generations completed: {opt.generations}")

    final_gen = population_df[population_df['population'] == opt.generations]
    initial_gen = population_df[population_df['population'] == 1]

    final_avg_fitness = final_gen['fitness'].mean()
    initial_avg_fitness = initial_gen['fitness'].mean()
    improvement = final_avg_fitness - initial_avg_fitness

    print(f"Average fitness improvement: {improvement:.4f}")
    print(f"Initial average fitness: {initial_avg_fitness:.4f}")
    print(f"Final average fitness: {final_avg_fitness:.4f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    === Optimization Analysis ===
    Unique evaluations performed: 38
    Total individuals in population history: 48
    Optimization time: 37.7855 seconds
    Time per evaluation: 0.9944 seconds
    Generations completed: 5
    Average fitness improvement: 0.0112
    Initial average fitness: 0.7138
    Final average fitness: 0.7250


.. GENERATED FROM PYTHON SOURCE LINES 158-171

.. code-block:: default

    # Access generated files
    print("\n=== Generated Files ===")
    graphics_path = opt._optimizer_service.optimizer.tracker.graphics_path
    results_path = opt._optimizer_service.optimizer.tracker.results_path

    print(f"Graphics path: {graphics_path}")
    if os.path.exists(graphics_path):
        print("Graphics files:", [f for f in os.listdir(graphics_path) if f.endswith('.html')])

    print(f"Results path: {results_path}")
    if os.path.exists(results_path):
        print("Results files:", [f for f in os.listdir(results_path) if f.endswith('.csv')])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    === Generated Files ===
    Graphics path: ./20260406_230807_XGBClassifier/graphics
    Graphics files: ['logbook.html', 'search_space.html']
    Results path: ./20260406_230807_XGBClassifier/results
    Results files: ['logbook.csv', 'populations.csv']


.. GENERATED FROM PYTHON SOURCE LINES 172-194

MLflow UI Instructions
----------------------
To inspect the results recorded during the optimization, you can launch
the MLflow user interface from a terminal.

**Starting the MLflow UI**

Open a console and run::

    mlflow ui --port 5000

Then open a web browser and go to:

    http://localhost:5000

**In the MLflow UI you can**

- View all optimization runs in the experiment
- Compare hyperparameters and metrics across runs
- See the evolution of fitness scores across generations
- Inspect logs and stored artifacts (TODO)
- Track model performance and optimization progress


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (1 minutes 26.251 seconds)


.. _sphx_glr_download_auto_examples_plot_xgboost_example.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_xgboost_example.py <plot_xgboost_example.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_xgboost_example.ipynb <plot_xgboost_example.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_