GeneticSearch
This class provides a genetic algorithm-based optimization framework.
| Method | Description |
|---|---|
| fit(X, y) | Run the genetic algorithm optimization to fit the best model. |
| predict(X) | Make predictions using the best estimator found by the optimization process. |
| score(X, y) | Return the score of the best estimator on the given test data and labels. |
- class mloptimizer.interfaces.GeneticSearch(estimator_class, hyperparam_space, eval_function: callable = None, seed=None, scoring=None, use_parallel=True, cv=None, use_mlflow=False, disable_file_output=True, early_stopping=False, patience=5, min_delta=0.01, generations=20, population_size=20, cxpb=0.5, mutpb=0.8, n_elites=3, tournsize=3, indpb=0.2, initial_params=None, include_default=True, verbose=0)
Bases: sklearn.base.MetaEstimatorMixin, sklearn.base.BaseEstimator
Genetic algorithm-based optimization for hyperparameter tuning.
GeneticSearch provides an interface for optimizing an estimator’s hyperparameters with a genetic algorithm. It supports cross-validation and parallel computation.
- Parameters:
estimator_class (class) – The class of the estimator to be optimized.
hyperparam_space (dict or HyperparameterSpace) – The hyperparameter search space as a dictionary or a HyperparameterSpace object.
eval_function (callable, optional) – [DEPRECATED] Will be removed in v1.0. Use the cv parameter instead for cross-validation configuration.
Deprecated since version 0.5: The eval_function parameter is deprecated.
seed (int, optional (default=None)) – Random seed for reproducibility. If None, a random seed is generated.
scoring (str or callable, optional (default=None)) – Scoring method to evaluate the estimator’s performance. If None, the estimator’s default score method is used.
use_parallel (bool, optional (default=True)) – Whether to run the optimization in parallel. If True, parallel processing is enabled.
cv (int, sklearn.model_selection.BaseCrossValidator, or None) – Cross-validation strategy. Cannot be set simultaneously with eval_function.
  - int: number of splits (StratifiedKFold if the estimator is a classifier, else KFold)
  - CV splitter object: e.g., StratifiedKFold, KFold, TimeSeriesSplit
  - None: default behavior inside the optimizer service (train_score function)
use_mlflow (bool, optional (default=False)) – If True, the optimization process is tracked using MLflow.
disable_file_output (bool, optional (default=True)) – If True, disables all file and directory creation during optimization. This includes:
  - Log files, checkpoint files, progress files
  - Result CSVs (logbook, populations)
  - Visualization plots (HTML, PNG)
  - Output directories
  Note: MLflow tracking (if use_mlflow=True) will still function.
early_stopping (bool, optional (default=False)) – If True, the optimization will stop early if no improvement is observed in the fitness score.
patience (int, optional (default=5)) – Number of generations to wait before stopping if no improvement is observed.
min_delta (float, optional (default=0.01)) – Minimum change in the fitness score to qualify as an improvement.
generations (int, optional (default=20)) – Number of generations to run in the genetic algorithm.
population_size (int, optional (default=20)) – Size of the population in each generation.
cxpb (float, optional (default=0.5)) – Crossover probability, the probability of mating two individuals to produce offspring.
mutpb (float, optional (default=0.8)) – Mutation probability, the probability that an individual undergoes mutation. Higher values (0.8-1.0) ensure most offspring are mutated for better exploration.
n_elites (int, optional (default=3)) – Number of elite individuals to carry over to the next generation without mutation. Should be less than population_size (typically 10-20% of population).
tournsize (int, optional (default=3)) – Tournament size for selection, the number of individuals to compete in each tournament. Should be less than population_size (typically 2-5).
indpb (float, optional (default=0.2)) – Independent probability for each gene to be mutated within a mutated individual. With mutpb=0.8, indpb=0.2, and 5 hyperparams: ~0.8 genes mutate per offspring on average.
initial_params (list of dict, optional (default=None)) – List of hyperparameter dictionaries to seed the initial population with. Example: [{'max_depth': 10, 'n_estimators': 100}, {'max_depth': 20, 'n_estimators': 200}]
include_default (bool, optional (default=True)) – If True, include an individual representing sklearn defaults in the initial population. This helps the GA start from a known good configuration.
verbose (int, optional (default=0)) – Controls the verbosity of logging output:
  - 0: Silent (no logging output)
  - 1: Info level (optimization start/end, generation summaries)
  - 2: Debug level (detailed evaluation info, internal state)
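The arithmetic quoted in the indpb entry can be checked with a short sketch. It assumes the two-stage model that the parameter descriptions suggest: an offspring is selected for mutation with probability mutpb, and each gene of a selected offspring then mutates independently with probability indpb. The function names are illustrative and are not part of mloptimizer.

```python
def expected_mutated_genes(mutpb: float, indpb: float, n_genes: int) -> float:
    """Expected number of genes changed per offspring.

    Assumption: an offspring is picked for mutation with probability mutpb,
    and each of its n_genes genes then mutates independently with
    probability indpb.
    """
    return mutpb * indpb * n_genes


def prob_offspring_changes(mutpb: float, indpb: float, n_genes: int) -> float:
    """Probability that at least one gene of an offspring is mutated."""
    # A selected offspring stays untouched only if every gene escapes mutation.
    return mutpb * (1.0 - (1.0 - indpb) ** n_genes)


# Default mutpb and indpb from the signature, with 5 hyperparameters.
print(round(expected_mutated_genes(0.8, 0.2, 5), 3))
print(round(prob_offspring_changes(0.8, 0.2, 5), 3))
```

Under these assumptions, raising indpb increases how many genes change in a mutated offspring, while raising mutpb increases how many offspring are mutated at all.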
- best_estimator_
The estimator with the best found hyperparameters after fitting.
- Type:
estimator
- best_params_
The hyperparameters that produced the best performance during the optimization.
- Type:
dict
- cv_results_
A log of the optimization progress, containing details such as fitness scores and hyperparameters evaluated during each generation.
- Type:
list of dicts
- n_trials_
Total number of hyperparameter configurations evaluated during optimization. This is useful for comparing computational cost with GridSearch.
- Type:
int
- optimization_time_
Total time (in seconds) spent on the optimization process. This excludes the final refit on the full training set.
- Type:
float
Initialize the GeneticOptimizer with the necessary components.
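To make the cost comparison mentioned under n_trials_ concrete, the sketch below estimates evaluation budgets for an exhaustive grid and for a genetic run. How mloptimizer actually counts n_trials_ is not stated here, so the genetic figure is only an upper bound under the assumption that the initial population and every later generation are each evaluated once in full. The grid and both function names are hypothetical.

```python
from math import prod


def grid_search_trials(grid: dict) -> int:
    """Number of configurations an exhaustive grid search would evaluate."""
    return prod(len(values) for values in grid.values())


def ga_trials_upper_bound(population_size: int, generations: int) -> int:
    """Upper bound on evaluated configurations for a genetic run.

    Assumption: one initial population plus one full population per
    generation, each individual evaluated once. Caching or early stopping
    would lower the real count.
    """
    return population_size * (generations + 1)


# Hypothetical grid: 6 * 5 * 5 * 4 candidate values.
grid = {
    "max_depth": [2, 4, 6, 8, 10, 12],
    "n_estimators": [50, 100, 200, 400, 800],
    "learning_rate": [0.01, 0.03, 0.1, 0.3, 1.0],
    "subsample": [0.5, 0.7, 0.9, 1.0],
}

print(grid_search_trials(grid))
print(ga_trials_upper_bound(population_size=20, generations=20))
```

After fitting, the reported n_trials_ value can be compared against the grid size directly. The bound above is only a planning estimate.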
- fit(X, y)
Run the genetic algorithm optimization to fit the best model.
- Parameters:
X (np.array) – Feature set for the optimization process.
y (np.array) – Label set for the optimization process.
- Returns:
self – Fitted GeneticSearch object.
- Return type:
GeneticSearch
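How the constructor parameters interact during a run can be illustrated with a toy generational loop. This is a minimal sketch in plain Python, not mloptimizer's implementation: the integer encoding, uniform crossover, resampling mutation and the exact early-stopping rule are all assumptions, and every function name is hypothetical. Only the parameter names and default values are taken from the signature above.

```python
import random


def tournament(population, fitnesses, tournsize, rng):
    """Return the fittest of `tournsize` individuals drawn at random."""
    contenders = rng.sample(range(len(population)), tournsize)
    return population[max(contenders, key=lambda i: fitnesses[i])]


def uniform_crossover(a, b, rng):
    """Build a child by taking each gene from either parent with equal chance."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(individual, space, indpb, rng):
    """Resample each gene within its bounds with independent probability indpb."""
    return [rng.randint(lo, hi) if rng.random() < indpb else gene
            for gene, (lo, hi) in zip(individual, space)]


def toy_genetic_search(fitness, space, generations=20, population_size=20,
                       cxpb=0.5, mutpb=0.8, n_elites=3, tournsize=3, indpb=0.2,
                       early_stopping=False, patience=5, min_delta=0.01, seed=None):
    """Maximize `fitness` over integer genes bounded by `space`, a list of (lo, hi)."""
    rng = random.Random(seed)
    population = [[rng.randint(lo, hi) for lo, hi in space]
                  for _ in range(population_size)]
    best, best_fit = None, float("-inf")
    history = []  # best fitness found so far, one entry per generation
    stale = 0     # consecutive generations without a gain above min_delta

    for _ in range(generations):
        fitnesses = [fitness(ind) for ind in population]
        ranked = sorted(zip(fitnesses, population), key=lambda p: p[0], reverse=True)
        gen_fit, gen_best = ranked[0]

        # Early stopping: count generations whose gain does not exceed min_delta.
        stale = 0 if gen_fit > best_fit + min_delta else stale + 1
        if gen_fit > best_fit:
            best, best_fit = list(gen_best), gen_fit
        history.append(best_fit)
        if early_stopping and stale >= patience:
            break

        # Elitism: the top n_elites individuals survive unchanged.
        next_population = [list(ind) for _, ind in ranked[:n_elites]]
        while len(next_population) < population_size:
            child = list(tournament(population, fitnesses, tournsize, rng))
            if rng.random() < cxpb:
                mate = tournament(population, fitnesses, tournsize, rng)
                child = uniform_crossover(child, mate, rng)
            if rng.random() < mutpb:
                child = mutate(child, space, indpb, rng)
            next_population.append(child)
        population = next_population

    return best, best_fit, history


def neg_squared_distance(ind):
    """Toy objective whose maximum (0) lies at the point (7, 3)."""
    return -((ind[0] - 7) ** 2 + (ind[1] - 3) ** 2)


best, best_fit, history = toy_genetic_search(
    neg_squared_distance, space=[(0, 20), (0, 20)], early_stopping=True, seed=42)
print(best, best_fit, len(history))
```

In this sketch the best fitness never decreases from one generation to the next, because the elites are copied over before any crossover or mutation is applied. With early_stopping=True the loop may end before `generations` is reached.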
- get_evolvable_hyperparams()
Get the evolvable hyperparameters from the hyperparameter space.
- Returns:
evolvable_hyperparams – Dictionary of evolvable hyperparameters.
- Return type:
dict
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
- Parameters:
input_features (array-like of str or None, default=None) – Input features.
- Returns:
feature_names_out – Transformed feature names.
- Return type:
ndarray of str objects
- get_genetic_params()
Get the genetic algorithm parameters.
- Returns:
genetic_params – Genetic algorithm parameters.
- Return type:
dict
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this optimizer.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- load_default_hyperparameter_space(estimator_class)
Load a default hyperparameter space for the given estimator using the HyperparameterSpaceService.
- Parameters:
estimator_class (class) – The estimator class for which to load the default hyperparameter space.
- Returns:
The loaded hyperparameter space object.
- Return type:
HyperparameterSpace
- load_hyperparameter_space(file_path)
Load a hyperparameter space from a file using the HyperparameterSpaceService.
- Parameters:
file_path (str) – The path to the file containing the hyperparameter space.
- Returns:
The loaded hyperparameter space object.
- Return type:
HyperparameterSpace
- predict(X)
Make predictions using the best estimator found by the optimization process.
- Parameters:
X (np.array) – Input features to predict labels.
- Returns:
y_pred – Predicted labels.
- Return type:
np.array
- save_hyperparameter_space(file_path, overwrite=False)
Save the current hyperparameter space to a file using the HyperparameterSpaceService.
- score(X, y)
Return the score of the best estimator on the given test data and labels.
- Parameters:
X (np.array) – Test feature set.
y (np.array) – True labels for scoring.
- Returns:
score – Score of the best estimator on the test data.
- Return type:
float
- set_eval_function(eval_function: callable)
Set or update the evaluator function for the optimization process.
- Parameters:
eval_function (callable) – A new evaluation function for the optimization process.
- set_hyperparameter_space(hyperparam_space)
Set or update the hyperparameter space for the optimization process.
- Parameters:
hyperparam_space (HyperparameterSpace) – The hyperparameter space object to be used for optimization.
Gallery examples
See these examples for practical usage of this class:
| Example | Description |
|---|---|
| | Demonstrates how to use GeneticSearch for hyperparameter tuning. |
| XGBoost - Genetic vs Grid Search vs Random Search vs Bayesian Optimization | Compares GeneticSearch with GridSearchCV, RandomizedSearchCV and Bayesian Search. |