Evolution (logbook) graph

mloptimizer provides a function, plotly_logbook, to plot how the fitness function evolves over the generations of a genetic search.

from sklearn.tree import DecisionTreeClassifier
from mloptimizer.application.reporting.plots import plotly_logbook
from mloptimizer.domain.evaluation import kfold_stratified_score
import plotly
import os
from sklearn.datasets import load_iris
from mloptimizer.interfaces import HyperparameterSpaceBuilder, GeneticSearch
from sklearn.model_selection import StratifiedKFold

Load the iris dataset to obtain a feature matrix X and a vector of labels y. Another dataset, including a custom one, can be used instead.

X, y = load_iris(return_X_y=True)
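As a minimal sketch of swapping in other data, the snippet below loads scikit-learn's wine dataset and builds a small synthetic one. The names X_wine, y_wine, X_custom and y_custom are illustrative only. It is an assumption that GeneticSearch accepts any feature matrix and label vector shaped like the iris arrays.

```python
import numpy as np
from sklearn.datasets import load_wine

# A different built-in dataset with the same (X, y) layout as iris.
X_wine, y_wine = load_wine(return_X_y=True)

# A custom dataset: any 2-D feature array with a matching 1-D label array.
# The values are random and serve only to show the expected shapes.
rng = np.random.default_rng(0)
X_custom = rng.normal(size=(60, 4))
y_custom = np.repeat([0, 1, 2], 20)

print(X_wine.shape, y_wine.shape)
print(X_custom.shape, y_custom.shape)
```

Either pair could then take the place of X and y in the steps that follow.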

Define the HyperparameterSpace. You can use the default hyperparameter space for the machine learning model that you want to optimize; in this case we use the default space for a DecisionTreeClassifier. A custom hyperparameter space can be used instead.

hyperparam_space = HyperparameterSpaceBuilder.get_default_space(estimator_class=DecisionTreeClassifier)

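The GeneticSearch class is the main wrapper for the optimization of a machine learning model. Here it is configured with 30 generations, a population size of 100 and a 5-fold stratified cross-validation splitter.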

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
opt = GeneticSearch(
    estimator_class=DecisionTreeClassifier,
    hyperparam_space=hyperparam_space,
    generations=30,
    population_size=100,
    # eval_function=kfold_stratified_score,  # Deprecated
    cv=cv,
)
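To illustrate what the cv splitter does on its own, the following sketch iterates over the folds that StratifiedKFold produces for iris and counts the classes in each test fold. It uses scikit-learn only and makes no claim about how GeneticSearch consumes the splitter internally.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Collect the class distribution of every test fold.
fold_class_counts = []
for train_idx, test_idx in cv.split(X, y):
    counts = Counter(y[test_idx].tolist())
    fold_class_counts.append(dict(sorted(counts.items())))
    print(len(train_idx), len(test_idx), fold_class_counts[-1])
```

Because iris has 50 samples per class, each of the five test folds holds 10 samples of every class, so every fold keeps the class balance of the full dataset.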

To optimize the classifier we need to call the fit method.

opt.fit(X, y)
Genetic execution:   0%|          | 0/31 [00:00<?, ?it/s, best fitness=?]
Genetic execution:   3%|▎         | 1/31 [00:00<00:00, 59.47it/s, best fitness=0.96]
Genetic execution:   6%|▋         | 2/31 [00:01<00:15,  1.83it/s, best fitness=0.96]
Genetic execution:  10%|▉         | 3/31 [00:01<00:18,  1.55it/s, best fitness=0.96]
Genetic execution:  13%|█▎        | 4/31 [00:02<00:18,  1.44it/s, best fitness=0.96]
Genetic execution:  16%|█▌        | 5/31 [00:03<00:18,  1.38it/s, best fitness=0.96]
Genetic execution:  19%|█▉        | 6/31 [00:04<00:18,  1.34it/s, best fitness=0.96]
Genetic execution:  23%|██▎       | 7/31 [00:05<00:18,  1.31it/s, best fitness=0.96]
Genetic execution:  26%|██▌       | 8/31 [00:05<00:17,  1.30it/s, best fitness=0.96]
Genetic execution:  29%|██▉       | 9/31 [00:06<00:17,  1.29it/s, best fitness=0.96]
Genetic execution:  32%|███▏      | 10/31 [00:07<00:16,  1.28it/s, best fitness=0.96]
Genetic execution:  35%|███▌      | 11/31 [00:08<00:15,  1.27it/s, best fitness=0.96]
Genetic execution:  39%|███▊      | 12/31 [00:08<00:14,  1.27it/s, best fitness=0.96]
Genetic execution:  42%|████▏     | 13/31 [00:09<00:14,  1.27it/s, best fitness=0.96]
Genetic execution:  45%|████▌     | 14/31 [00:10<00:13,  1.26it/s, best fitness=0.96]
Genetic execution:  48%|████▊     | 15/31 [00:11<00:12,  1.26it/s, best fitness=0.96]
Genetic execution:  52%|█████▏    | 16/31 [00:12<00:11,  1.26it/s, best fitness=0.96]
Genetic execution:  55%|█████▍    | 17/31 [00:12<00:11,  1.26it/s, best fitness=0.96]
Genetic execution:  58%|█████▊    | 18/31 [00:13<00:10,  1.25it/s, best fitness=0.96]
Genetic execution:  61%|██████▏   | 19/31 [00:14<00:09,  1.25it/s, best fitness=0.96]
Genetic execution:  65%|██████▍   | 20/31 [00:15<00:08,  1.24it/s, best fitness=0.96]
Genetic execution:  68%|██████▊   | 21/31 [00:16<00:08,  1.24it/s, best fitness=0.96]
Genetic execution:  71%|███████   | 22/31 [00:17<00:07,  1.24it/s, best fitness=0.96]
Genetic execution:  74%|███████▍  | 23/31 [00:17<00:06,  1.24it/s, best fitness=0.96]
Genetic execution:  77%|███████▋  | 24/31 [00:18<00:05,  1.24it/s, best fitness=0.96]
Genetic execution:  81%|████████  | 25/31 [00:19<00:04,  1.23it/s, best fitness=0.96]
Genetic execution:  84%|████████▍ | 26/31 [00:20<00:04,  1.23it/s, best fitness=0.96]
Genetic execution:  87%|████████▋ | 27/31 [00:21<00:03,  1.23it/s, best fitness=0.96]
Genetic execution:  90%|█████████ | 28/31 [00:21<00:02,  1.23it/s, best fitness=0.96]
Genetic execution:  94%|█████████▎| 29/31 [00:22<00:01,  1.23it/s, best fitness=0.96]
Genetic execution:  97%|█████████▋| 30/31 [00:23<00:00,  1.22it/s, best fitness=0.96]
Genetic execution: 100%|██████████| 31/31 [00:24<00:00,  1.22it/s, best fitness=0.96]
Genetic execution: 100%|██████████| 31/31 [00:25<00:00,  1.23it/s, best fitness=0.96]
GeneticSearch(cv=StratifiedKFold(n_splits=5, random_state=42, shuffle=True),
              estimator_class=<class 'sklearn.tree._classes.DecisionTreeClassifier'>,
              generations=30,
              hyperparam_space=HyperparameterSpace(fixed_hyperparams={'criterion': 'gini'}, evolvable_hyperparams={'min_samples_split': Hyperparam('min_samples_split', 2, 50, 'int'), 'min_samples_leaf': Hyperparam('min_samples_leaf', 1, 20, 'int'), 'max_depth': Hyperparam('max_depth', 2, 20, 'int'), 'min_impurity_decrease': Hyperparam('min_impurity_decrease', 0, 150, 'float', 1000), 'ccp_alpha': Hyperparam('ccp_alpha', 0, 300, 'float', 100000)}),
              population_size=100, seed=905925)

We can plot the evolution of the fitness function. The black lines mark the maximum and minimum fitness values across all generations. The green, red and blue lines are, respectively, the maximum, minimum and average fitness values for each generation. Each grey point in the graph represents an individual.

population_df = opt.populations_
g_logbook = plotly_logbook(opt.logbook_, population_df)
plotly.io.show(g_logbook)
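The quantities drawn in the graph can be sketched in plain Python. The records below are hypothetical (generation, fitness) pairs invented for illustration. They are not the output of opt.populations_ or opt.logbook_, whose exact layout is not shown here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (generation, fitness) pairs, one per individual.
records = [
    (0, 0.90), (0, 0.94), (0, 0.96),
    (1, 0.93), (1, 0.95), (1, 0.96),
    (2, 0.94), (2, 0.96), (2, 0.96),
]

# Group the fitness values by generation.
by_generation = defaultdict(list)
for generation, fitness in records:
    by_generation[generation].append(fitness)

# Per-generation max, min and average (the green, red and blue lines).
per_generation = {
    generation: {"max": max(values), "min": min(values), "avg": mean(values)}
    for generation, values in sorted(by_generation.items())
}

# Global max and min across all generations (the black lines).
overall_max = max(fitness for _, fitness in records)
overall_min = min(fitness for _, fitness in records)

print(per_generation)
print(overall_max, overall_min)
```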

At the end of the evolution the graph is saved as an HTML file in the following directory:

print(opt._optimizer_service.optimizer.tracker.graphics_path)
print(os.listdir(opt._optimizer_service.optimizer.tracker.graphics_path))
./20250808_070338_DecisionTreeClassifier/graphics
['search_space.html', 'logbook.html']

The data used to generate the graph is saved as CSV files in the following directory:

print(opt._optimizer_service.optimizer.tracker.results_path)
print(os.listdir(opt._optimizer_service.optimizer.tracker.results_path))

del opt
./20250808_070338_DecisionTreeClassifier/results
['populations.csv', 'logbook.csv']
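The saved CSV files can be read back with the standard library. The sketch below defines a hypothetical helper, load_result_tables, and runs it on a temporary stand-in directory. The column names in the stand-in files are invented, since the real columns of populations.csv and logbook.csv are not shown here.

```python
import csv
import tempfile
from pathlib import Path


def load_result_tables(results_path):
    """Read every CSV file in results_path into a list of row dicts."""
    tables = {}
    for csv_file in sorted(Path(results_path).glob("*.csv")):
        with csv_file.open(newline="") as handle:
            tables[csv_file.name] = list(csv.DictReader(handle))
    return tables


# Stand-in directory so the sketch runs without an optimization run.
# The column names below are assumptions made for illustration.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "logbook.csv").write_text("gen,max\n0,0.96\n1,0.96\n")
    (Path(tmp) / "populations.csv").write_text("population,fitness\n0,0.94\n")
    tables = load_result_tables(tmp)

print({name: len(rows) for name, rows in tables.items()})
```

In a real run, the directory printed as the results path above would be passed to the helper in place of the temporary one.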

Total running time of the script: (0 minutes 29.069 seconds)
