GeneticSearch

This class provides a genetic algorithm-based optimization framework.

fit(X, y)

Run the genetic algorithm optimization to fit the best model.

predict(X)

Make predictions using the best estimator found by the optimization process.

score(X, y)

Return the score of the best estimator on the given test data and labels.

class mloptimizer.interfaces.GeneticSearch(estimator_class, hyperparam_space, eval_function: callable = None, seed=None, scoring=None, use_parallel=True, cv=None, use_mlflow=False, disable_file_output=True, early_stopping=False, patience=5, min_delta=0.01, generations=20, population_size=20, cxpb=0.5, mutpb=0.8, n_elites=3, tournsize=3, indpb=0.2, initial_params=None, include_default=True, verbose=0)

Bases: sklearn.base.MetaEstimatorMixin, sklearn.base.BaseEstimator

Genetic algorithm-based optimization for hyperparameter tuning.

GeneticSearch provides an interface for optimizing an estimator’s hyperparameters using a genetic algorithm. It supports cross-validation and parallel computation.

Parameters:
  • estimator_class (class) – The class of the estimator to be optimized.

  • hyperparam_space (dict or HyperparameterSpace) – The hyperparameter search space as a dictionary or a HyperparameterSpace object.

  • eval_function (callable, optional) – Deprecated since version 0.5 and scheduled for removal in v1.0. Use the cv parameter to configure cross-validation instead.

  • seed (int, optional (default=None)) – Random seed for reproducibility. If None, a random seed is generated.

  • scoring (str or callable, optional (default=None)) – Scoring method to evaluate the estimator’s performance. If None, the estimator’s default score method is used.

  • use_parallel (bool, optional (default=True)) – Whether to run the optimization in parallel.

  • cv (int, sklearn.model_selection.BaseCrossValidator, or None) – Cross-validation strategy. Cannot be set simultaneously with eval_function.

      - int: number of splits (StratifiedKFold if the estimator is a classifier, otherwise KFold)
      - CV splitter object: for example StratifiedKFold, KFold, or TimeSeriesSplit
      - None: default behavior inside the optimizer service (train_score function)

  • use_mlflow (bool, optional (default=False)) – If True, the optimization process is tracked using MLflow.

  • disable_file_output (bool, optional (default=True)) – If True, disables all file and directory creation during optimization. This includes:

      - Log files, checkpoint files, progress files
      - Result CSVs (logbook, populations)
      - Visualization plots (HTML, PNG)
      - Output directories

    Note: MLflow tracking (if use_mlflow=True) will still function.

  • early_stopping (bool, optional (default=False)) – If True, the optimization will stop early if no improvement is observed in the fitness score.

  • patience (int, optional (default=5)) – Number of generations to wait before stopping if no improvement is observed.

  • min_delta (float, optional (default=0.01)) – Minimum change in the fitness score to qualify as an improvement.

  • generations (int, optional (default=20)) – Number of generations to run in the genetic algorithm.

  • population_size (int, optional (default=20)) – Size of the population in each generation.

  • cxpb (float, optional (default=0.5)) – Crossover probability, the probability of mating two individuals to produce offspring.

  • mutpb (float, optional (default=0.8)) – Mutation probability, the probability that an individual undergoes mutation. Higher values (0.8-1.0) ensure most offspring are mutated for better exploration.

  • n_elites (int, optional (default=3)) – Number of elite individuals to carry over to the next generation without mutation. Should be less than population_size (typically 10-20% of population).

  • tournsize (int, optional (default=3)) – Tournament size for selection, the number of individuals to compete in each tournament. Should be less than population_size (typically 2-5).

  • indpb (float, optional (default=0.2)) – Independent probability for each gene to be mutated within a mutated individual. For example, with mutpb=0.8, indpb=0.2, and 5 hyperparameters, about 0.8 × 0.2 × 5 = 0.8 genes mutate per offspring on average.

  • initial_params (list of dict, optional (default=None)) – List of hyperparameter dictionaries to seed the initial population with. Example: [{'max_depth': 10, 'n_estimators': 100}, {'max_depth': 20, 'n_estimators': 200}]

  • include_default (bool, optional (default=True)) – If True, include an individual representing sklearn defaults in the initial population. This helps the GA start from a known good configuration.

  • verbose (int, optional (default=0)) – Controls the verbosity of logging output:

      - 0: Silent (no logging output)
      - 1: Info level (optimization start/end, generation summaries)
      - 2: Debug level (detailed evaluation info, internal state)
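The early_stopping, patience, and min_delta parameters can be read as a simple stopping rule. The sketch below is one plausible interpretation, not mloptimizer's actual implementation: it assumes fitness is maximized and that "no improvement" means the best score of the last patience generations exceeds the earlier best by less than min_delta. The function name should_stop and the best_scores list are hypothetical.

```python
def should_stop(best_scores, patience=5, min_delta=0.01):
    """Decide whether to stop, given the best fitness of each generation so far.

    Assumption: higher fitness is better, and an improvement only counts
    if it is at least `min_delta` above the best score seen before the
    last `patience` generations.
    """
    # Not enough history yet to have waited `patience` generations.
    if len(best_scores) <= patience:
        return False

    reference = max(best_scores[:-patience])    # best score before the window
    recent_best = max(best_scores[-patience:])  # best score inside the window
    return recent_best - reference < min_delta


# Hypothetical fitness history: gains after generation 3 stay below min_delta.
history = [0.70, 0.75, 0.80, 0.801, 0.802, 0.803, 0.804, 0.805]
stop_now = should_stop(history, patience=5, min_delta=0.01)
```

With the defaults, a run whose best score creeps up by only 0.001 per generation would stop once five such generations have passed, while a run that keeps gaining at least 0.01 continues.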

best_estimator_

The estimator with the best found hyperparameters after fitting.

Type:

estimator

best_params_

The hyperparameters that produced the best performance during the optimization.

Type:

dict

cv_results_

A log of the optimization progress, containing details such as fitness scores and hyperparameters evaluated during each generation.

Type:

list of dicts

n_trials_

Total number of hyperparameter configurations evaluated during optimization. This is useful for comparing the computational cost with that of an exhaustive grid search.

Type:

int
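To make the cost comparison concrete, the sketch below counts the configurations an exhaustive grid would evaluate and sets that against a rough genetic-search budget. The estimate generations × population_size is an assumption used for illustration only; the count actually evaluated is whatever n_trials_ reports after fitting. The helper names and the example grid are hypothetical.

```python
from math import prod


def grid_size(param_grid):
    """Number of configurations an exhaustive grid search would evaluate."""
    return prod(len(values) for values in param_grid.values())


def ga_budget_estimate(generations=20, population_size=20):
    """Rough budget estimate (assumption): one evaluation per individual
    per generation. The real figure is reported by n_trials_."""
    return generations * population_size


# Hypothetical grid for a tree ensemble.
param_grid = {
    "max_depth": list(range(2, 22)),            # 20 values
    "n_estimators": list(range(50, 550, 50)),   # 10 values
    "min_samples_split": [2, 5, 10],            # 3 values
}

n_grid = grid_size(param_grid)    # 20 * 10 * 3 configurations
n_ga = ga_budget_estimate()       # 20 generations * 20 individuals
```

Each configuration is typically evaluated once per cross-validation fold, so both numbers scale by the same factor when cv is fixed.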

optimization_time_

Total time (in seconds) spent on the optimization process. This excludes the final refit on the full training set.

Type:

float

Initialize the GeneticSearch instance with the necessary components.

fit(X, y)

Run the genetic algorithm optimization to fit the best model.

Parameters:
  • X (np.array) – Feature set for the optimization process.

  • y (np.array) – Label set for the optimization process.

Returns:

self – Fitted GeneticSearch object.

Return type:

object

get_evolvable_hyperparams()

Get the evolvable hyperparameters from the hyperparameter space.

Returns:

evolvable_hyperparams – Dictionary of evolvable hyperparameters.

Return type:

dict

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features (array-like of str or None, default=None) – Input features.

Returns:

feature_names_out – Transformed feature names.

Return type:

ndarray of str objects

get_genetic_params()

Get the genetic algorithm parameters.

Returns:

genetic_params – Genetic algorithm parameters.

Return type:

dict
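The genetic parameters interact: mutpb decides whether an offspring is mutated at all, and indpb then applies to each gene of a mutated offspring. The sketch below checks the resulting average (mutpb × indpb × number of genes) against a small simulation. It uses a plain dictionary with the constructor argument names as keys; the actual keys returned by get_genetic_params() are not documented here, so that dictionary and both helper functions are hypothetical.

```python
import random


def expected_mutated_genes(mutpb, indpb, n_genes):
    """Average number of mutated genes per offspring."""
    return mutpb * indpb * n_genes


def simulate_mutated_genes(mutpb, indpb, n_genes, n_offspring=20_000, seed=0):
    """Monte Carlo estimate of the same quantity, using a seeded generator."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_offspring):
        # The offspring is selected for mutation with probability mutpb...
        if rng.random() < mutpb:
            # ...and each gene then mutates independently with probability indpb.
            total += sum(rng.random() < indpb for _ in range(n_genes))
    return total / n_offspring


# Hypothetical parameter dictionary mirroring the constructor defaults.
genetic_params = {"mutpb": 0.8, "indpb": 0.2}

analytic = expected_mutated_genes(n_genes=5, **genetic_params)
simulated = simulate_mutated_genes(n_genes=5, **genetic_params)
```

With the defaults and five hyperparameters, both values come out close to 0.8, so on average fewer than one gene changes per offspring. Raising indpb or searching over more hyperparameters increases that figure.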

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this optimizer.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

load_default_hyperparameter_space(estimator_class)

Load a default hyperparameter space for the given estimator using the HyperparameterSpaceService.

Parameters:

estimator_class (class) – The estimator class for which to load the default hyperparameter space.

Returns:

The loaded hyperparameter space object.

Return type:

HyperparameterSpace

load_hyperparameter_space(file_path)

Load a hyperparameter space from a file using the HyperparameterSpaceService.

Parameters:

file_path (str) – The path to the file containing the hyperparameter space.

Returns:

The loaded hyperparameter space object.

Return type:

HyperparameterSpace

predict(X)

Make predictions using the best estimator found by the optimization process.

Parameters:

X (np.array) – Input features to predict labels.

Returns:

y_pred – Predicted labels.

Return type:

np.array

save_hyperparameter_space(file_path, overwrite=False)

Save the current hyperparameter space to a file using the HyperparameterSpaceService.

Parameters:
  • file_path (str) – The path to the file where the hyperparameter space will be saved.

  • overwrite (bool, optional (default=False)) – Whether to overwrite the existing file if it exists.

score(X, y)

Return the score of the best estimator on the given test data and labels.

Parameters:
  • X (np.array) – Test feature set.

  • y (np.array) – True labels for scoring.

Returns:

score – Score of the best estimator on the test data.

Return type:

float

set_eval_function(eval_function: callable)

Set or update the evaluator function for the optimization process.

Parameters:

eval_function (callable) – A new evaluation function for the optimization process.

set_hyperparameter_space(hyperparam_space)

Set or update the hyperparameter space for the optimization process.

Parameters:

hyperparam_space (HyperparameterSpace) – The hyperparameter space object to be used for optimization.

set_params(**params)

Set the parameters of this optimizer.

Parameters:

**params (dict) – Estimator parameters to update.

Returns:

self – Updated GeneticSearch object.

Return type:

object