.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_xgboost_hyperparam_opt_comparison.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_xgboost_hyperparam_opt_comparison.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_xgboost_hyperparam_opt_comparison.py:


XGBoost - Genetic vs Grid Search vs Random Search vs Bayesian Optimization
==========================================================================
This mloptimizer example tunes XGBoost hyperparameters on the Iris dataset and
compares four hyperparameter tuning techniques:

1) Genetic optimization - mloptimizer
2) Grid Search - scikit-learn
3) Random Search - scikit-learn
4) Bayesian Optimization - hyperopt

.. GENERATED FROM PYTHON SOURCE LINES 13-16

Imports
-------
The necessary libraries for the example are imported.

.. GENERATED FROM PYTHON SOURCE LINES 16-40

.. code-block:: default

    import pandas as pd
    import numpy as np
    from time import time
    from functools import reduce
    import plotly

    from mloptimizer.domain.hyperspace import HyperparameterSpace, Hyperparam
    from mloptimizer.domain.evaluation import kfold_stratified_score
    from mloptimizer.application.reporting.plots import plotly_search_space
    from mloptimizer.interfaces import HyperparameterSpaceBuilder, GeneticSearch

    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score, \
        StratifiedKFold
    from sklearn.datasets import load_iris
    from xgboost import XGBClassifier

    from hyperopt import STATUS_OK, hp, tpe
    from hyperopt import Trials, fmin

    width = 1000
    height = 1000


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/envs/alpha/lib/python3.10/site-packages/hyperopt/atpe.py:19: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html.
    The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.


.. GENERATED FROM PYTHON SOURCE LINES 41-65

1) Dataset Description
----------------------
The Iris dataset is a classic dataset used for classification tasks. It consists of 150 samples,
each representing an iris flower.

Features:

- Sepal Length: Length of the sepal in centimeters.
- Sepal Width: Width of the sepal in centimeters.
- Petal Length: Length of the petal in centimeters.
- Petal Width: Width of the petal in centimeters.

Target:

- Species of the flower, which can be one of the following three classes:

  1. Setosa
  2. Versicolor
  3. Virginica

Characteristics:

- The dataset is balanced, containing 50 samples for each class.
- Features are continuous and have varying scales, which may require normalization or
  standardization for certain machine learning algorithms.

.. GENERATED FROM PYTHON SOURCE LINES 65-72

.. code-block:: default

    name = 'iris'
    X, y = load_iris(return_X_y=True)

    print("1) Description of the dataset")
    print(f"Dataset: {name}, X shape: {X.shape}, y shape: {y.shape}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    1) Description of the dataset
    Dataset: iris, X shape: (150, 4), y shape: (150,)


.. GENERATED FROM PYTHON SOURCE LINES 73-101

2) Genetic Search of XGBoost Algorithm
--------------------------------------------
Genetic search optimization is performed using the mloptimizer library to fine-tune the
hyperparameters of the XGBoost algorithm.

Hyperparameters to Optimize:

- ``colsample_bytree``: Subsample ratio of columns when constructing each tree.
- ``gamma``: Minimum loss reduction required to make a further partition on a leaf node of the tree.
- ``learning_rate``: Step size shrinkage used in updates to prevent overfitting.
- ``max_depth``: Maximum depth of a tree.
- ``n_estimators``: Number of boosting rounds.
- ``subsample``: Subsample ratio of the training instances.
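
The example code for this section declares each of these hyperparameters with integer bounds, and the float-typed ones carry an extra scale argument. The following is a minimal sketch of how such bounds could translate into real values, assuming that a float hyperparameter is obtained by dividing the integer value by its scale. That decoding rule and the ``decode`` and ``decoded_ranges`` helpers are assumptions made for illustration, not mloptimizer's API; the bounds themselves are copied from the example.

```python
# Sketch only: `decode` and `decoded_ranges` are hypothetical helpers, and the
# "integer / scale" rule for float hyperparameters is an assumption.

# (min, max, type, scale) copied from the example's evolvable_hyperparams;
# a scale of 1 is used here for the int-typed entries.
SPACE = {
    "colsample_bytree": (3, 10, "float", 10),
    "gamma": (0, 20, "int", 1),
    "learning_rate": (1, 100, "float", 1000),
    "max_depth": (2, 20, "int", 1),
    "n_estimators": (100, 500, "int", 1),
    "subsample": (700, 1000, "float", 1000),
}


def decode(gene, hyperparam_type, scale=1):
    """Map an integer gene to the value handed to the estimator (assumed rule)."""
    return gene / scale if hyperparam_type == "float" else gene


def decoded_ranges(space):
    """Return the (low, high) range of real values covered by each hyperparameter."""
    return {
        name: (decode(low, kind, scale), decode(high, kind, scale))
        for name, (low, high, kind, scale) in space.items()
    }


if __name__ == "__main__":
    for name, (low, high) in decoded_ranges(SPACE).items():
        print(f"{name}: {low} to {high}")
```

Under that assumption, ``subsample`` would be searched between 0.7 and 1.0 in steps of 0.001, while ``max_depth`` stays an integer between 2 and 20.
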

Optimization Process:

- Population Size: 10
- Generations: 10
- Fitness Score: Balanced accuracy
- Evaluation Function: Stratified k-fold cross-validation with 5 folds

The genetic optimization explores the hyperparameter space defined by the evolvable hyperparameters.
It searches within the defined minimum and maximum values for each hyperparameter.

Advantages:

- Adapts the search to earlier results: later generations concentrate on regions where fitter
  individuals were found, whereas grid search and random search sample the space without feedback.
- Can be more efficient in finding good hyperparameters for the XGBoost algorithm, although this is
  not guaranteed and depends on the population size, the number of generations and the random seed.

.. GENERATED FROM PYTHON SOURCE LINES 101-147

.. code-block:: default

    print("2) Genetic optimization of the algorithm XGBoost")

    fixed_hyperparams = {}
    evolvable_hyperparams = {
        'colsample_bytree': Hyperparam('colsample_bytree', 3, 10, 'float', 10),
        'gamma': Hyperparam('gamma', 0, 20, 'int'),
        'learning_rate': Hyperparam('learning_rate', 1, 100, 'float', 1000),
        'max_depth': Hyperparam('max_depth', 2, 20, 'int'),
        'n_estimators': Hyperparam('n_estimators', 100, 500, 'int'),
        'subsample': Hyperparam('subsample', 700, 1000, 'float', 1000)
    }
    hyperparameter_space = HyperparameterSpace(fixed_hyperparams, evolvable_hyperparams)

    population_size = 10
    generations = 10
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    opt = GeneticSearch(
        estimator_class=XGBClassifier,
        hyperparam_space=hyperparameter_space,
        **{"generations": generations, "population_size": population_size},
        #eval_function=kfold_stratified_score,
        cv=cv,
        scoring="balanced_accuracy",
        seed=0,
        use_parallel=False
    )

    t0_gen = time()
    clf = opt.fit(X, y)
    # Approx. 110 evaluations: population_size * (generations + 1)
    t1_gen = time()
    print(f"Genetic optimization around {population_size * (generations + 1)} algorithm executions")
    execution_time_gen = round(t1_gen - t0_gen, 2)
    print(f"Time of the genetic optimization {execution_time_gen} s")

    population_df = opt.populations_
    print(f"Genetic optimization {population_df.shape[0]} algorithm executions")
    df = population_df[list(hyperparameter_space.evolvable_hyperparams.keys()) +
                       ['fitness']]
    fig_gen = plotly_search_space(df).update_layout(
        autosize=True,
        width=width,  # Adjust width as needed
        height=height,  # Adjust height as needed
        margin=dict(l=20, r=20, t=50, b=20)  # Adjust margins as needed
    )
    plotly.io.show(fig_gen)



.. raw:: html
    :file: images/sphx_glr_plot_xgboost_hyperparam_opt_comparison_001.html


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2) Genetic optimization of the algorithm XGBoost
    Genetic execution:   0%|          | 0/11 [00:00
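
The genetic search configured in this section (a population of 10 evolved for 10 generations within per-hyperparameter bounds) can be pictured with a toy loop. This is a minimal sketch in plain Python and not mloptimizer's implementation: the two-gene search space, the stand-in ``toy_fitness`` objective, and the selection, crossover and mutation rules are all assumptions chosen for illustration.

```python
import random

# Hypothetical integer bounds per gene (illustrative, not the page's full space).
BOUNDS = {"max_depth": (2, 20), "n_estimators": (100, 500)}


def toy_fitness(individual):
    """Stand-in objective (an assumption): best at max_depth=6, n_estimators=300."""
    return (-abs(individual["max_depth"] - 6) / 18
            - abs(individual["n_estimators"] - 300) / 400)


def random_individual(rng):
    """Draw every gene uniformly within its bounds."""
    return {name: rng.randint(low, high) for name, (low, high) in BOUNDS.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {name: (parent_a if rng.random() < 0.5 else parent_b)[name] for name in BOUNDS}


def mutate(individual, rng, rate=0.3):
    """Resample each gene within its bounds with probability `rate`."""
    return {
        name: (rng.randint(*BOUNDS[name]) if rng.random() < rate else value)
        for name, value in individual.items()
    }


def genetic_search(population_size=10, generations=10, seed=0):
    """Return the best individual and the best fitness seen at each step."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=toy_fitness, reverse=True)
        history.append(toy_fitness(ranked[0]))
        parents = ranked[: population_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - 1)
        ]
        population = [ranked[0]] + children  # elitism: keep the current best
    best = max(population, key=toy_fitness)
    history.append(toy_fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = genetic_search()
    print(best, history[-1])
```

Because the best individual is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next. The initial population plus 10 generations gives 11 evaluated populations, which matches the ``0/11`` total shown in the progress bar above.
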

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                Method  Best Metric   Time  Evaluated
    0          Genetic     0.946667  18.76        110
    1      Grid Search     0.960000  29.29        108
    2    Random Search     0.960000  31.12        110
    3  Bayesian Search     0.960000  30.95        110
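
The comparison table can be read in terms of evaluation budget and throughput. A minimal sketch follows, with the rows copied from the table above; the ``evals_per_second`` helper is a hypothetical name introduced here, and the time column is assumed to be in seconds, as in the genetic search's timing message.

```python
# Rows copied from the results table: (method, best metric, time, evaluated).
RESULTS = [
    ("Genetic", 0.946667, 18.76, 110),
    ("Grid Search", 0.960000, 29.29, 108),
    ("Random Search", 0.960000, 31.12, 110),
    ("Bayesian Search", 0.960000, 30.95, 110),
]

# Budget of the genetic search: the initial population plus one per generation.
population_size, generations = 10, 10
genetic_budget = population_size * (generations + 1)


def evals_per_second(row):
    """Hypothetical helper: evaluated configurations per unit of time."""
    _, _, elapsed, evaluated = row
    return evaluated / elapsed


if __name__ == "__main__":
    print(f"Genetic budget: {genetic_budget} evaluations")
    for row in sorted(RESULTS, key=evals_per_second, reverse=True):
        method, best_metric, _, _ = row
        print(f"{method:<16} best={best_metric:.4f} {evals_per_second(row):.2f} evals/s")
```

All four methods evaluate a similar number of configurations (108 to 110), so the timings are roughly comparable. The figures come from a single seeded run on a small dataset, so small differences in the best metric should be read with caution.
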


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (1 minutes 59.645 seconds)


.. _sphx_glr_download_auto_examples_plot_xgboost_hyperparam_opt_comparison.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_xgboost_hyperparam_opt_comparison.py <plot_xgboost_hyperparam_opt_comparison.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_xgboost_hyperparam_opt_comparison.ipynb <plot_xgboost_hyperparam_opt_comparison.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_