Reproducibility


Reproducibility is a key aspect of scientific research, particularly in machine learning. MLOptimizer provides an input parameter, seed, that sets the random seed for:

  • The optimizer's random number generator, which creates the initial population and the mutations

  • The model's random number generator during training

  • The random number generator used when splitting the data
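The idea of one seed controlling several sources of randomness can be sketched with Python's standard random module. This is not MLOptimizer's implementation: the helper make_generators, the toy integer "hyperparameter", and the choice to derive one independent generator per component from the seed are all assumptions made for illustration.

```python
import random


def make_generators(seed):
    # Hypothetical helper (not part of MLOptimizer): derive one independent
    # generator per source of randomness from a single seed.
    master = random.Random(seed)
    return {
        "optimizer": random.Random(master.randrange(2**32)),
        "model": random.Random(master.randrange(2**32)),
        "split": random.Random(master.randrange(2**32)),
    }


def toy_run(seed):
    rngs = make_generators(seed)
    # Optimizer: initial population of 4 individuals, each a single integer
    # "hyperparameter" in [1, 10], then one +/-1 mutation per individual.
    population = [rngs["optimizer"].randint(1, 10) for _ in range(4)]
    mutated = [gene + rngs["optimizer"].choice([-1, 1]) for gene in population]
    # Model: a stand-in for training-time randomness (random feature order).
    feature_order = list(range(5))
    rngs["model"].shuffle(feature_order)
    # Data: shuffle 10 sample indices and split them 8/2 into train/test.
    indices = list(range(10))
    rngs["split"].shuffle(indices)
    train, test = indices[:8], indices[8:]
    return population, mutated, feature_order, train, test


# Same seed: every source of randomness repeats exactly.
print(toy_run(25) == toy_run(25))  # prints True
```

Giving each component its own generator means that, for example, a change in how many numbers the model consumes does not shift the data split. A single shared generator seeded once is also reproducible, but it is more sensitive to the order of calls, which is the concern behind the warning at the end of this page.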

The following example runs the same optimization twice with the same seed and once with a different seed:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from mloptimizer.core import Optimizer
from mloptimizer.hyperparams import HyperparameterSpace

X, y = load_iris(return_X_y=True)
default_hyperparam_space = HyperparameterSpace.get_default_hyperparameter_space(DecisionTreeClassifier)
population = 2
generations = 2
seed = 25
distinct_seed = 2
# It is important to run the optimization
# right after the creation of the optimizer
optimizer1 = Optimizer(estimator_class=DecisionTreeClassifier, features=X, labels=y,
                       hyperparam_space=default_hyperparam_space, seed=seed)
result1 = optimizer1.optimize_clf(population_size=population,
                                  generations=generations)
# WARNING: if optimizer2 were created before optimizer1.optimize_clf() ran,
# the results would be different
optimizer2 = Optimizer(estimator_class=DecisionTreeClassifier, features=X, labels=y,
                       hyperparam_space=default_hyperparam_space, seed=seed)
result2 = optimizer2.optimize_clf(population_size=population,
                                  generations=generations)

optimizer3 = Optimizer(estimator_class=DecisionTreeClassifier, features=X, labels=y,
                       hyperparam_space=default_hyperparam_space, seed=distinct_seed)
result3 = optimizer3.optimize_clf(population_size=population,
                                  generations=generations)
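# Same seed: the results are expected to match
print(str(result1) == str(result2))
# Different seed: the results are expected to differ
print(str(result1) != str(result3))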

Warning

To ensure reproducibility, run the optimization immediately after creating the optimizer with the seed, so that no other random number generation takes place in the meantime.
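The reason for this warning can be sketched with Python's standard random module. The sketch assumes, purely for illustration, that the seed is applied to a single shared generator; it does not describe MLOptimizer's internals. The hypothetical function seeded_draws seeds the module-level generator and draws five numbers, optionally letting "other code" consume one number first.

```python
import random


def seeded_draws(seed, interleave=False):
    # Seed the shared, module-level generator.
    random.seed(seed)
    if interleave:
        # Some unrelated code consumes one random number in between.
        random.random()
    return [random.random() for _ in range(5)]


same = seeded_draws(25) == seeded_draws(25)
shifted = seeded_draws(25) == seeded_draws(25, interleave=True)
print(same, shifted)  # prints: True False
```

With interleave=True the sequence is shifted by one position, so every subsequent draw changes even though the seed is identical. This is why, in the example above, each optimizer is created and run before the next one is created.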