.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_catboost_example.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_catboost_example.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_catboost_example.py:


CatBoost optimization example
==============================
A simple example showing hyperparameter optimization for CatBoost Classifier
with genetic algorithms.

.. GENERATED FROM PYTHON SOURCE LINES 7-19

.. code-block:: default

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    from catboost import CatBoostClassifier
    import plotly
    import os
    from mloptimizer.interfaces import HyperparameterSpaceBuilder, GeneticSearch
    from mloptimizer.application.reporting.plots import (plotly_search_space, plotly_logbook, plot_logbook)
    import matplotlib.pyplot as plt


.. GENERATED FROM PYTHON SOURCE LINES 20-22

Load and prepare the dataset
-----------------------------

.. GENERATED FROM PYTHON SOURCE LINES 22-29

.. code-block:: default

    print("Loading Breast Cancer dataset...")
    data = load_breast_cancer()
    X, y = data.data, data.target
    print(f"Dataset shape: {X.shape}")
    print(f"Number of classes: {len(set(y))}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Loading Breast Cancer dataset...
    Dataset shape: (569, 30)
    Number of classes: 2


.. GENERATED FROM PYTHON SOURCE LINES 30-31

Split the data

.. GENERATED FROM PYTHON SOURCE LINES 31-35

.. code-block:: default

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )


.. GENERATED FROM PYTHON SOURCE LINES 36-42

Define the CatBoost hyperparameter space
-----------------------------------------
CatBoost has specific hyperparameters that differ from other gradient boosting libraries.
We can use the default hyperparameter space or build a custom one.

Option 1: Load default space (recommended for quick start)

.. GENERATED FROM PYTHON SOURCE LINES 42-59

.. code-block:: default

    hyperparam_space = HyperparameterSpaceBuilder.get_default_space(
        estimator_class=CatBoostClassifier
    )

    # Option 2: Build custom space (uncomment to use)
    # hyperparam_space = (HyperparameterSpaceBuilder()
    #     .add_int_param('depth', min_value=4, max_value=10)
    #     .add_float_param('learning_rate', min_value=1, max_value=10, scale=100)
    #     .add_int_param('iterations', min_value=50, max_value=200)
    #     .add_float_param('subsample', min_value=700, max_value=1000, scale=1000)
    #     .set_fixed_param('verbose', False)
    #     .build())

    print("\nHyperparameter space configuration:")
    print(f"Evolvable parameters: {list(hyperparam_space.evolvable_hyperparams.keys())}")
    print(f"Fixed parameters: {list(hyperparam_space.fixed_hyperparams.keys())}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Hyperparameter space configuration:
    Evolvable parameters: ['learning_rate', 'depth', 'n_estimators', 'subsample', 'l2_leaf_reg', 'colsample_bylevel', 'random_strength']
    Fixed parameters: ['auto_class_weights', 'bootstrap_type', 'allow_writing_files', 'verbose', 'thread_count']


.. GENERATED FROM PYTHON SOURCE LINES 60-68

Configure and run the genetic optimization
------------------------------------------
Genetic Algorithm Configuration:

- generations: Number of evolutionary iterations
- population_size: Number of configurations per generation
- n_elites: Number of best individuals preserved each generation
- seed: Random seed for reproducibility

Note: Small values for faster documentation builds. For production, increase to 20+ generations.

.. GENERATED FROM PYTHON SOURCE LINES 68-92

.. code-block:: default

    genetic_params = {
        'generations': 5,
        'population_size': 8,
        'n_elites': 2,
        'seed': 42,
        'use_parallel': False
    }

    opt = GeneticSearch(
        estimator_class=CatBoostClassifier,
        hyperparam_space=hyperparam_space,
        cv=3,
        scoring='accuracy',
        disable_file_output=False,
        **genetic_params
    )

    print("\nStarting CatBoost optimization...")
    print(f"Generations: {opt.generations}")
    print(f"Population size: {opt.population_size}")

    # Run the optimization
    opt.fit(X_train, y_train)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/checkouts/master/examples/plot_catboost_example.py:76: UserWarning: Some hyperparameters have very small integer ranges (< 10 distinct values): 'depth' (7 values: 4 to 10). Small ranges limit search granularity. Consider increasing the range or scale for float types.
      opt = GeneticSearch(

    Starting CatBoost optimization...
    Generations: 5
    Population size: 8
    Genetic execution: 0%| | 0/6 [00:00

GeneticSearch(cv=StratifiedKFold(n_splits=3, random_state=42, shuffle=True),
                  estimator_class=<class 'catboost.core.CatBoostClassifier'>,
                  generations=5,
                  hyperparam_space=HyperparameterSpace(fixed_hyperparams={'auto_class_weights': 'Balanced', 'bootstrap_type': 'Bernoulli', 'allow_writing_files': False, 'verbose': False, 'thread_count': 1}, evolvable_hyperparams={'l...ram('n_estimators', 100, 500, 'int'), 'subsample': Hyperparam('subsample', 700, 1000, 'float', 1000), 'l2_leaf_reg': Hyperparam('l2_leaf_reg', 1, 10, 'int'), 'colsample_bylevel': Hyperparam('colsample_bylevel', 50, 100, 'float', 100), 'random_strength': Hyperparam('random_strength', 1, 10, 'int')}),
                  n_elites=2, population_size=8, scoring='accuracy', seed=42,
                  use_parallel=False)
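
The ``Hyperparam`` entries in the representation above carry a trailing scale
value for float parameters, for example
``Hyperparam('subsample', 700, 1000, 'float', 1000)``. One plausible reading,
stated here as an assumption rather than taken from the mloptimizer source, is
that float parameters are evolved as integers and divided by the scale before
they reach the estimator. The sketch below illustrates that reading with a
hypothetical helper, ``decode_float_param``, which is not part of mloptimizer.

.. code-block:: python

    def decode_float_param(raw_value, scale):
        """Map an integer-encoded value to the float the estimator would receive.

        Hypothetical helper for illustration only; it assumes that the decoded
        value is simply raw_value / scale.
        """
        return raw_value / scale


    # Bounds copied from the representation above, as (min, max, scale).
    float_bounds = {
        "subsample": (700, 1000, 1000),
        "colsample_bylevel": (50, 100, 100),
    }

    for name, (low, high, scale) in float_bounds.items():
        decoded_low = decode_float_param(low, scale)
        decoded_high = decode_float_param(high, scale)
        print(f"{name}: {decoded_low} to {decoded_high}")

Under this assumption, a raw value of 835 with scale 1000 decodes to 0.835 and
a raw value of 86 with scale 100 decodes to 0.86, which is consistent with the
``subsample`` and ``colsample_bylevel`` values reported in the best
hyperparameters of this example.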


.. GENERATED FROM PYTHON SOURCE LINES 93-94

Get results and evaluate

.. GENERATED FROM PYTHON SOURCE LINES 94-109

.. code-block:: default

    best_clf = opt.best_estimator_
    y_pred = best_clf.predict(X_test)

    test_accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='binary')
    recall = recall_score(y_test, y_pred, average='binary')
    f1 = f1_score(y_test, y_pred, average='binary')

    print(f"\nOptimization completed!")
    print(f"Best hyperparameters: {opt.best_params_}")
    print(f"\nTest performance:")
    print(f" Accuracy: {test_accuracy:.4f}")
    print(f" Precision: {precision:.4f}")
    print(f" Recall: {recall:.4f}")
    print(f" F1 Score: {f1:.4f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Optimization completed!
    Best hyperparameters: {'learning_rate': 0.02, 'depth': 4, 'l2_leaf_reg': 5, 'thread_count': 1, 'verbose': False, 'auto_class_weights': 'Balanced', 'random_strength': 8, 'allow_writing_files': False, 'bootstrap_type': 'Bernoulli', 'subsample': 0.835, 'n_estimators': 479, 'colsample_bylevel': 0.86, 'random_state': 42}

    Test performance:
     Accuracy: 0.9649
     Precision: 0.9722
     Recall: 0.9722
     F1 Score: 0.9722


.. GENERATED FROM PYTHON SOURCE LINES 110-111

Generate visualizations

.. GENERATED FROM PYTHON SOURCE LINES 111-125

.. code-block:: default

    population_df = opt.populations_

    # Search space visualization
    top_params = ['depth', 'learning_rate', 'n_estimators', 'l2_leaf_reg', 'fitness']
    df_filtered = population_df[top_params]
    g_search_space = plotly_search_space(df_filtered, top_params)
    g_search_space.update_layout(
        title="CatBoost Hyperparameter Search Space - Breast Cancer Dataset",
        autosize=True,
        width=None,
        height=650
    )
    plotly.io.show(g_search_space, config={'responsive': True})


.. raw:: html
    :file: images/sphx_glr_plot_catboost_example_001.html


.. GENERATED FROM PYTHON SOURCE LINES 126-127

Simple logbook visualization

.. GENERATED FROM PYTHON SOURCE LINES 127-130

.. code-block:: default

    g_logbook_s = plot_logbook(opt.logbook_)
    # plt.show()  # Commented out for non-interactive environments


.. image-sg:: /auto_examples/images/sphx_glr_plot_catboost_example_002.png
   :alt: plot catboost example
   :srcset: /auto_examples/images/sphx_glr_plot_catboost_example_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/envs/master/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
      with pd.option_context('mode.use_inf_as_na', True):
    /home/docs/checkouts/readthedocs.org/user_builds/mloptimizer/envs/master/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
      with pd.option_context('mode.use_inf_as_na', True):


.. GENERATED FROM PYTHON SOURCE LINES 131-132

Evolution logbook visualization

.. GENERATED FROM PYTHON SOURCE LINES 132-141

.. code-block:: default

    g_logbook = plotly_logbook(opt.logbook_, population_df)
    g_logbook.update_layout(
        title="CatBoost Optimization Evolution - Breast Cancer Dataset",
        autosize=True,
        width=None,
        height=500
    )
    plotly.io.show(g_logbook, config={'responsive': True})


.. raw:: html
    :file: images/sphx_glr_plot_catboost_example_003.html


.. GENERATED FROM PYTHON SOURCE LINES 142-143

Analyze optimization results

.. GENERATED FROM PYTHON SOURCE LINES 143-162

.. code-block:: default

    print("\n=== Optimization Analysis ===")
    print(f"Unique evaluations performed: {opt.n_trials_}")
    print(f"Total individuals in population history: {len(population_df)}")
    print(f"Optimization time: {opt.optimization_time_:.4f} seconds")
    print(f"Time per evaluation: {opt.optimization_time_ / opt.n_trials_:.4f} seconds")
    print(f"Generations completed: {opt.generations}")

    final_gen = population_df[population_df['population'] == opt.generations]
    initial_gen = population_df[population_df['population'] == 1]
    final_avg_fitness = final_gen['fitness'].mean()
    initial_avg_fitness = initial_gen['fitness'].mean()
    improvement = final_avg_fitness - initial_avg_fitness

    print(f"\nFitness progression:")
    print(f" Initial average fitness: {initial_avg_fitness:.4f}")
    print(f" Final average fitness: {final_avg_fitness:.4f}")
    print(f" Average improvement: {improvement:.4f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    === Optimization Analysis ===
    Unique evaluations performed: 28
    Total individuals in population history: 48
    Optimization time: 109.3850 seconds
    Time per evaluation: 3.9066 seconds
    Generations completed: 5

    Fitness progression:
     Initial average fitness: 0.9772
     Final average fitness: 0.9783
     Average improvement: 0.0011


.. GENERATED FROM PYTHON SOURCE LINES 163-164

Access generated files

.. GENERATED FROM PYTHON SOURCE LINES 164-176

.. code-block:: default

    print("\n=== Generated Files ===")
    graphics_path = opt._optimizer_service.optimizer.tracker.graphics_path
    results_path = opt._optimizer_service.optimizer.tracker.results_path

    print(f"Graphics path: {graphics_path}")
    if os.path.exists(graphics_path):
        print(" Graphics files:", [f for f in os.listdir(graphics_path) if f.endswith('.html')])

    print(f"Results path: {results_path}")
    if os.path.exists(results_path):
        print(" Results files:", [f for f in os.listdir(results_path) if f.endswith('.csv')])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    === Generated Files ===
    Graphics path: ./20260406_230846_CatBoostClassifier/graphics
     Graphics files: ['logbook.html', 'search_space.html']
    Results path: ./20260406_230846_CatBoostClassifier/results
     Results files: ['logbook.csv', 'populations.csv']


.. GENERATED FROM PYTHON SOURCE LINES 177-199

CatBoost-specific features
--------------------------
CatBoost offers several unique features:

- **Automatic handling of categorical features**: No need for manual encoding
- **Balanced class weights**: Set via auto_class_weights='Balanced' (included in default space)
- **GPU acceleration**: Add 'task_type': 'GPU' to fixed parameters
- **Feature importance**: Access via ``best_clf.feature_importances_``

Example of adding categorical feature support:

.. code-block:: python

    # If you have categorical features
    cat_features = [0, 2, 4]  # Indices of categorical columns

    hyperparam_space = (HyperparameterSpaceBuilder()
        .add_int_param('depth', 4, 10)
        .add_float_param('learning_rate', 1, 10, scale=100)
        .set_fixed_param('cat_features', cat_features)
        .set_fixed_param('verbose', False)
        .build())


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (1 minutes 51.535 seconds)


.. _sphx_glr_download_auto_examples_plot_catboost_example.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_catboost_example.py <plot_catboost_example.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_catboost_example.ipynb <plot_catboost_example.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_