
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_hist_gradient_boosting.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_hist_gradient_boosting.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_hist_gradient_boosting.py:


HistGradientBoosting Optimization
==================================
Hyperparameter optimization for sklearn's fast HistGradientBoosting algorithms.

.. GENERATED FROM PYTHON SOURCE LINES 6-16

.. code-block:: default

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, f1_score
    import numpy as np
    import plotly

    from mloptimizer.interfaces import HyperparameterSpaceBuilder, GeneticSearch
    from mloptimizer.application.reporting.plots import plotly_search_space, plotly_logbook

.. GENERATED FROM PYTHON SOURCE LINES 17-18

Load and prepare the dataset

.. GENERATED FROM PYTHON SOURCE LINES 18-24

.. code-block:: default

    print("Loading Breast Cancer dataset...")
    data = load_breast_cancer()
    X, y = data.data, data.target

    print(f"Dataset shape: {X.shape}")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Loading Breast Cancer dataset...
    Dataset shape: (569, 30)

.. GENERATED FROM PYTHON SOURCE LINES 25-26

Split the data

.. GENERATED FROM PYTHON SOURCE LINES 26-30

.. code-block:: default

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

.. GENERATED FROM PYTHON SOURCE LINES 31-32

Define the hyperparameter space

.. GENERATED FROM PYTHON SOURCE LINES 32-36

.. code-block:: default

    hyperparam_space = HyperparameterSpaceBuilder.get_default_space(
        estimator_class=HistGradientBoostingClassifier
    )

.. GENERATED FROM PYTHON SOURCE LINES 37-38

Configure and run the genetic optimization

.. GENERATED FROM PYTHON SOURCE LINES 38-58

.. code-block:: default

    genetic_params = {
        'generations': 5,
        'population_size': 8,
        'n_elites': 2,
        'seed': 42,
        'use_mlflow': False,
        'use_parallel': False
    }

    opt = GeneticSearch(
        estimator_class=HistGradientBoostingClassifier,
        hyperparam_space=hyperparam_space,
        cv=3,
        scoring='accuracy',
        **genetic_params
    )

    print("Starting HistGradientBoostingClassifier optimization...")
    opt.fit(X_train, y_train)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Starting HistGradientBoostingClassifier optimization...
    Genetic execution: 0%| | 0/6 [00:00


.. code-block:: none

    GeneticSearch(cv=StratifiedKFold(n_splits=3, random_state=42, shuffle=True),
                  estimator_class=<class 'sklearn.ensemble._hist_gradient_boosting.gradient_boosting.HistGradientBoostingClassifier'>,
                  generations=5,
                  hyperparam_space=HyperparameterSpace(fixed_hyperparams={}, evolvable_hyperparams={'learning_rate': Hyperparam('learning_rate', 1, 100, 'float', 1000), 'ma...': Hyperparam('max_depth', 2, 15, 'int'), 'max_iter': Hyperparam('max_iter', 50, 500, 'int'), 'max_leaf_nodes': Hyperparam('max_leaf_nodes', 20, 100, 'int'), 'min_samples_leaf': Hyperparam('min_samples_leaf', 10, 50, 'int'), 'l2_regularization': Hyperparam('l2_regularization', 0, 10, 'float', 10)}),
                  n_elites=2, population_size=8, scoring='accuracy', seed=42,
                  use_parallel=False)



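The float entries in the search space printed above, such as
``Hyperparam('learning_rate', 1, 100, 'float', 1000)``, appear to store
integer bounds together with a scale factor. The sketch below assumes that a
sampled integer is divided by that last argument to obtain the value passed
to the estimator. This rule is inferred from the printed bounds and is not
confirmed here against the mloptimizer source; the helper
``decode_float_gene`` is hypothetical and does not call mloptimizer.

.. code-block:: python

    def decode_float_gene(gene: int, scale: int) -> float:
        """Map an integer gene to a float by dividing by ``scale``.

        Assumed decoding rule, not taken from the mloptimizer source.
        """
        return gene / scale


    # (name, lower bound, upper bound, scale) copied from the printed search space.
    float_hyperparams = [
        ("learning_rate", 1, 100, 1000),
        ("l2_regularization", 0, 10, 10),
    ]

    # Decode both bounds of each float hyperparameter under the assumed rule.
    decoded_ranges = {
        name: (decode_float_gene(low, scale), decode_float_gene(high, scale))
        for name, low, high, scale in float_hyperparams
    }

    for name, (low, high) in decoded_ranges.items():
        print(f"{name}: {low} to {high}")

Under this assumption, ``learning_rate`` spans 0.001 to 0.1 and
``l2_regularization`` spans 0.0 to 1.0, which is consistent with the best
values of 0.09 and 0.9 reported in the evaluation output of this example.
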

.. GENERATED FROM PYTHON SOURCE LINES 59-60

Evaluate the optimized model

.. GENERATED FROM PYTHON SOURCE LINES 60-70

.. code-block:: default

    best_clf = opt.best_estimator_
    y_pred = best_clf.predict(X_test)
    test_accuracy = accuracy_score(y_test, y_pred)
    test_f1 = f1_score(y_test, y_pred, average='binary')

    print(f"\nOptimization completed!")
    print(f"Best parameters: {opt.best_params_}")
    print(f"Test accuracy: {test_accuracy:.4f}")
    print(f"Test F1: {test_f1:.4f}")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Optimization completed!
    Best parameters: {'categorical_features': 'warn', 'class_weight': None, 'early_stopping': 'auto', 'interaction_cst': None, 'l2_regularization': 0.9, 'learning_rate': 0.09, 'loss': 'log_loss', 'max_bins': 255, 'max_depth': 2, 'max_features': 1.0, 'max_iter': 264, 'max_leaf_nodes': 48, 'min_samples_leaf': 50, 'monotonic_cst': None, 'n_iter_no_change': 10, 'random_state': 42, 'scoring': 'loss', 'tol': 1e-07, 'validation_fraction': 0.1, 'verbose': 0, 'warm_start': False}
    Test accuracy: 0.9561
    Test F1: 0.9660

.. GENERATED FROM PYTHON SOURCE LINES 71-72

Visualize the search space

.. GENERATED FROM PYTHON SOURCE LINES 72-84

.. code-block:: default

    population_df = opt.populations_
    top_params = ['learning_rate', 'max_depth', 'max_iter', 'max_leaf_nodes', 'fitness']
    df_filtered = population_df[top_params]

    g_search_space = plotly_search_space(df_filtered, top_params)
    g_search_space.update_layout(
        title="HistGradientBoostingClassifier Hyperparameter Search Space",
        autosize=True,
        width=None,
        height=650
    )
    plotly.io.show(g_search_space, config={'responsive': True})

.. raw:: html
    :file: images/sphx_glr_plot_hist_gradient_boosting_001.html

.. GENERATED FROM PYTHON SOURCE LINES 85-86

Visualize the optimization evolution

.. GENERATED FROM PYTHON SOURCE LINES 86-95

.. code-block:: default

    g_logbook = plotly_logbook(opt.logbook_, population_df)
    g_logbook.update_layout(
        title="HistGradientBoostingClassifier Optimization Evolution",
        autosize=True,
        width=None,
        height=500
    )
    plotly.io.show(g_logbook, config={'responsive': True})

.. raw:: html
    :file: images/sphx_glr_plot_hist_gradient_boosting_002.html

.. GENERATED FROM PYTHON SOURCE LINES 96-97

Analyze optimization performance

.. GENERATED FROM PYTHON SOURCE LINES 97-102

.. code-block:: default

    print("\n=== Optimization Performance ===")
    print(f"Unique evaluations performed: {opt.n_trials_}")
    print(f"Total individuals in population history: {len(population_df)}")
    print(f"Optimization time: {opt.optimization_time_:.4f} seconds")
    print(f"Time per evaluation: {opt.optimization_time_ / opt.n_trials_:.4f} seconds")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    === Optimization Performance ===
    Unique evaluations performed: 38
    Total individuals in population history: 48
    Optimization time: 9.2006 seconds
    Time per evaluation: 0.2421 seconds

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 10.543 seconds)


.. _sphx_glr_download_auto_examples_plot_hist_gradient_boosting.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_hist_gradient_boosting.py <plot_hist_gradient_boosting.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_hist_gradient_boosting.ipynb <plot_hist_gradient_boosting.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_