.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_xgboost_hyperparam_opt_comparison.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_xgboost_hyperparam_opt_comparison.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_xgboost_hyperparam_opt_comparison.py:


XGBoost - Genetic vs Grid Search vs Random Search vs Bayesian Optimization
==========================================================================
An mloptimizer example that tunes XGBoost on the Iris dataset and compares
four hyperparameter tuning techniques:

1) Genetic optimization - mloptimizer
2) Grid Search - scikit-learn
3) Random Search - scikit-learn
4) Bayesian Optimization - hyperopt

.. GENERATED FROM PYTHON SOURCE LINES 13-16

Imports
-------
The necessary libraries for the example are imported.

.. GENERATED FROM PYTHON SOURCE LINES 16-35

.. code-block:: default

    import pandas as pd
    import numpy as np
    from time import time

    import plotly
    from mloptimizer.domain.hyperspace import HyperparameterSpace, Hyperparam
    from mloptimizer.application.reporting.plots import plotly_search_space
    from mloptimizer.interfaces import GeneticSearch

    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score, \
        StratifiedKFold
    from sklearn.datasets import load_iris
    from xgboost import XGBClassifier

    from hyperopt import STATUS_OK, hp, tpe
    from hyperopt import Trials, fmin

.. GENERATED FROM PYTHON SOURCE LINES 36-60

1) Dataset Description
----------------------
The Iris dataset is a classic dataset used for classification tasks.
It consists of 150 samples, each representing an iris flower.

Features:

- Sepal Length: Length of the sepal in centimeters.
- Sepal Width: Width of the sepal in centimeters.
- Petal Length: Length of the petal in centimeters.
- Petal Width: Width of the petal in centimeters.

Target:

- Species of the flower, which can be one of the following three classes:

  1. Setosa
  2. Versicolor
  3. Virginica

Characteristics:

- The dataset is balanced, containing 50 samples for each class.
- Features are continuous and have varying scales, which may require normalization
  or standardization for certain machine learning algorithms.

.. GENERATED FROM PYTHON SOURCE LINES 60-67

.. code-block:: default

    name = 'iris'
    X, y = load_iris(return_X_y=True)

    print(f"1) Description of the dataset")
    print(f"Dataset: {name}, X shape: {X.shape}, y shape: {y.shape}")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    1) Description of the dataset
    Dataset: iris, X shape: (150, 4), y shape: (150,)

.. GENERATED FROM PYTHON SOURCE LINES 68-116

2) Genetic Search of XGBoost Algorithm
---------------------------------------
Genetic search optimization is performed using the mloptimizer library
to tune the hyperparameters of the XGBoost algorithm.

Hyperparameters to Optimize:

- `colsample_bytree`: Subsample ratio of columns when constructing each tree.
- `gamma`: Minimum loss reduction required to make a further partition on a leaf node of the tree.
- `learning_rate`: Step size shrinkage used in updates to prevent overfitting.
- `max_depth`: Maximum depth of a tree.
- `n_estimators`: Number of boosting rounds.
- `subsample`: Subsample ratio of the training instances.

Optimization Process:

- Population Size: 8
- Generations: 8
- Fitness Score: Balanced accuracy
- Evaluation Function: Stratified k-fold cross-validation with 5 folds

The genetic optimization explores the hyperparameter space defined by the evolvable hyperparameters.
It searches between the minimum and maximum values defined for each hyperparameter.

Advantages:

- Adapts the search as it runs: later generations concentrate on regions that scored well,
  whereas grid search and random search choose their candidates without using earlier results.
- Can be more efficient in finding good hyperparameters for the XGBoost algorithm.
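To make the search procedure described above concrete, here is a minimal, self-contained sketch of a genetic loop with elitism, tournament selection, crossover and mutation. It is a toy illustration only and is not mloptimizer's implementation: the ``BOUNDS`` dictionary, the ``toy_fitness`` function (a stand-in for the cross-validated balanced accuracy) and all helper names are hypothetical. The keyword names of ``evolve`` simply reuse the configuration names that appear in this example.

```python
import random

# Hypothetical integer bounds (min, max) per hyperparameter; illustrative only.
BOUNDS = {"max_depth": (2, 20), "n_estimators": (100, 500)}


def toy_fitness(individual):
    """Stand-in for a cross-validated score; peaks at max_depth=8, n_estimators=300."""
    return -abs(individual["max_depth"] - 8) - abs(individual["n_estimators"] - 300) / 50


def random_individual(rng):
    """Draw every hyperparameter uniformly within its bounds."""
    return {name: rng.randint(low, high) for name, (low, high) in BOUNDS.items()}


def tournament(population, rng, tournsize):
    """Return the fittest of `tournsize` randomly drawn individuals."""
    return max(rng.sample(population, tournsize), key=toy_fitness)


def crossover(a, b, rng):
    """Uniform crossover: each hyperparameter is copied from one of the two parents."""
    return {name: rng.choice((a[name], b[name])) for name in BOUNDS}


def mutate(individual, rng, indpb):
    """Resample each hyperparameter within its bounds with probability `indpb`."""
    return {
        name: rng.randint(*BOUNDS[name]) if rng.random() < indpb else value
        for name, value in individual.items()
    }


def evolve(generations=8, population_size=8, n_elites=2, tournsize=3,
           cxpb=0.5, mutpb=0.8, indpb=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    best_per_generation = [max(map(toy_fitness, population))]
    for _ in range(generations):
        # Elitism: the best individuals pass to the next generation unchanged.
        elites = sorted(population, key=toy_fitness, reverse=True)[:n_elites]
        children = []
        while len(children) < population_size - n_elites:
            child = tournament(population, rng, tournsize)
            if rng.random() < cxpb:
                child = crossover(child, tournament(population, rng, tournsize), rng)
            if rng.random() < mutpb:
                child = mutate(child, rng, indpb)
            children.append(child)
        population = elites + children
        best_per_generation.append(max(map(toy_fitness, population)))
    return max(population, key=toy_fitness), best_per_generation


best, history = evolve()
print(best, history[-1])
```

Because the elites are copied unchanged, the best fitness recorded in ``history`` can never decrease from one generation to the next; every candidate also stays inside its declared bounds, which mirrors the bounded search described above.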
Genetic Algorithm Configuration
--------------------------------
The following parameters control the behavior of the genetic algorithm:

- `population_size`: Number of individuals (hyperparameter configurations) in each generation.
- `generations`: Number of evolutionary iterations to perform.
- `n_elites`: Number of best individuals to preserve unchanged in the next generation (elitism).
- `tournsize`: Tournament size for selection (number of individuals competing in each tournament).
- `cxpb`: Probability of mating (crossover) two individuals (0.0 to 1.0).
- `mutpb`: Probability of mutating an individual (0.0 to 1.0).
- `indpb`: Independent probability of mutating each hyperparameter within an individual (0.0 to 1.0).
- `early_stopping`: Enable early stopping if fitness does not improve for a number of generations.
- `patience`: Number of generations to wait for improvement before stopping early.
- `min_delta`: Minimum change in fitness to be considered an improvement.
- `seed`: Random seed for reproducibility.
- `use_parallel`: Whether to use parallel evaluation of individuals.

Note: Values reduced for faster documentation builds. For production comparison,
use generations=20-30 and population_size=20-30 for more robust results.

.. GENERATED FROM PYTHON SOURCE LINES 116-166

.. code-block:: default

    print(f"2) Genetic Search optimization of XGBoost")

    fixed_hyperparams = {}

    evolvable_hyperparams = {
        'colsample_bytree': Hyperparam('colsample_bytree', 3, 10, 'float', 10),
        'gamma': Hyperparam('gamma', 0, 20, 'int'),
        'learning_rate': Hyperparam('learning_rate', 1, 100, 'float', 1000),
        'max_depth': Hyperparam('max_depth', 2, 20, 'int'),
        'n_estimators': Hyperparam('n_estimators', 100, 500, 'int'),
        'subsample': Hyperparam('subsample', 700, 1000, 'float', 1000)
    }

    hyperparameter_space = HyperparameterSpace(fixed_hyperparams, evolvable_hyperparams)

    genetic_params = {
        'generations': 8,
        'population_size': 8,
        'n_elites': 2,
        'tournsize': 3,
        'cxpb': 0.5,
        'mutpb': 0.8,
        'indpb': 0.2,
        'early_stopping': True,
        'patience': 4,
        'min_delta': 0.005,
        'seed': 0,
        'use_parallel': False
    }

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    opt = GeneticSearch(
        estimator_class=XGBClassifier,
        hyperparam_space=hyperparameter_space,
        cv=cv,
        scoring="balanced_accuracy",
        **genetic_params
    )

    t0_gen = time()
    clf = opt.fit(X, y)
    t1_gen = time()
    execution_time_gen = round(t1_gen - t0_gen, 2)
    print(f"Time of the Genetic Search optimization: {execution_time_gen} s")

    population_df = opt.populations_
    print(f"Genetic Search evaluated {population_df.shape[0]} configurations")
    population_df_filtered = population_df[list(hyperparameter_space.evolvable_hyperparams.keys()) + ['fitness']]
    fig_gen = plotly_search_space(population_df_filtered)
    fig_gen.update_layout(autosize=True, width=None, height=650)
    plotly.io.show(fig_gen, config={'responsive': True})

.. raw:: html
    :file: images/sphx_glr_plot_xgboost_hyperparam_opt_comparison_001.html

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2) Genetic Search optimization of XGBoost
    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/checkouts/master/examples/plot_xgboost_hyperparam_opt_comparison.py:146: UserWarning: Some hyperparameters have very small integer ranges (< 10 distinct values):
    'colsample_bytree' (8 values: 0.300 to 1.000).
    Small ranges limit search granularity. Consider increasing the range or scale for float types.
      opt = GeneticSearch(
    Genetic execution: 0%| | 0/9 [00:00
                      Method  Best Metric   Time  Evaluated  colsample_bytree  gamma  learning_rate  max_depth  n_estimators  subsample
    0                Genetic     0.953333  13.28         56          0.400000      1       0.080000          8           377   0.718000
    1            Grid Search     0.946667   5.15         32          0.300000      5       0.100000          2           200   0.700000
    2          Random Search     0.946667   6.66         30          0.400000     15       0.100000          5           400   0.785714
    3  Bayesian Optimization     0.946667   6.77         30          0.583052     16       0.206965          1           298   0.690069
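The warning printed during the genetic search reports 8 values from 0.300 to 1.000 for `colsample_bytree`, which was declared with the integers 3 and 10 and a trailing value of 10. This suggests that float hyperparameters are searched as integers and then divided by that trailing scale. The sketch below reproduces this arithmetic under that assumption; ``float_grid`` is a hypothetical helper written for illustration and is not part of mloptimizer.

```python
def float_grid(min_int, max_int, scale):
    """List the float values reachable when an integer in [min_int, max_int]
    is divided by `scale` (assumed encoding, inferred from the warning text)."""
    return [i / scale for i in range(min_int, max_int + 1)]


# colsample_bytree was declared with integer bounds 3..10 and scale 10.
colsample_grid = float_grid(3, 10, 10)
print(len(colsample_grid), colsample_grid[0], colsample_grid[-1])

# learning_rate was declared with integer bounds 1..100 and scale 1000.
learning_rate_grid = float_grid(1, 100, 1000)
print(len(learning_rate_grid), learning_rate_grid[0], learning_rate_grid[-1])
```

Under this reading, the best genetic configuration in the table is consistent with the declared ranges: a `subsample` of 0.718 corresponds to the integer 718 with scale 1000, and a `learning_rate` of 0.08 corresponds to 80 with scale 1000. A larger scale with proportionally wider integer bounds gives a finer grid, which is what the warning recommends for `colsample_bytree`.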


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 38.807 seconds)


.. _sphx_glr_download_auto_examples_plot_xgboost_hyperparam_opt_comparison.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_xgboost_hyperparam_opt_comparison.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_xgboost_hyperparam_opt_comparison.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_