4.1. Metrics and score functions

Note

This section provides an overview of metrics and score functions and how they are used in the context of genetic optimization. It also provides examples of how to use the score functions provided by the library.

4.1.1. Introduction

Metrics and score functions are used to evaluate the performance of machine learning algorithms. An evaluation combines the true labels or values of the data, the predicted labels or values, a metric (e.g. accuracy, precision, RMSE, MSE) and a scoring strategy (train/test split, k-fold cross-validation, stratified cross-validation, time series split, etc.). The score function then calculates a score that quantifies how well the algorithm performed on the given data. Two main types of metrics are commonly used in machine learning:

Classification metrics: These are used to evaluate the performance of classification algorithms. They include metrics such as accuracy, precision, recall, F1 score, etc.
Regression metrics: These are used to evaluate the performance of regression algorithms. They include metrics such as mean squared error, mean absolute error, R-squared, etc.
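As a small illustration of the two families, the sketch below computes one classification metric (accuracy) and one regression metric (mean squared error) by hand on toy data. The data and the helper functions are made up for this example; in practice the equivalent functions from sklearn.metrics would normally be used.

```python
# Toy classification data: true and predicted class labels.
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]

# Toy regression data: true and predicted continuous values.
y_true_reg = [3.0, 5.0, 2.0]
y_pred_reg = [2.0, 5.0, 4.0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (higher is better)."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values (lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


acc = accuracy(y_true_cls, y_pred_cls)            # 4 of 5 labels match
mse = mean_squared_error(y_true_reg, y_pred_reg)  # (1 + 0 + 4) / 3
print(f"accuracy={acc:.3f}  mse={mse:.3f}")
```

Note the opposite directions: a better classifier has a higher accuracy, while a better regressor has a lower error.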

In the context of genetic optimization, the fitness score is the metric that the optimizer aims to maximize. When a classification or regression estimator is passed, mloptimizer uses the following score functions by default:

Classification: balanced_accuracy_score
Regression: rmse

For regression metrics, where lower values are better, the negative of the metric is maximized, so the optimization proceeds in the same way as in the classification case.

However, the user can pass any score function.
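The sign convention for error metrics can be illustrated with a short sketch. It is a generic illustration and not mloptimizer's internal code: the negated helper and the two candidate prediction lists are hypothetical names introduced only for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: lower is better."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def negated(metric):
    """Wrap an error metric so that a higher value means a better model (hypothetical helper)."""
    def wrapper(y_true, y_pred):
        return -metric(y_true, y_pred)
    return wrapper


y_true = [1.0, 2.0, 3.0]
# Predictions of two hypothetical candidate models.
candidates = {
    "model_a": [1.0, 2.0, 5.0],
    "model_b": [1.5, 2.0, 3.0],
}

fitness_fn = negated(rmse)
fitness = {name: fitness_fn(y_true, pred) for name, pred in candidates.items()}

# Maximizing the negated error selects the candidate with the smallest RMSE.
best = max(fitness, key=fitness.get)
print(best, fitness)
```

With this wrapper the same "pick the maximum" logic works for both accuracy-like and error-like metrics.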

4.1.2. Metrics

The metrics input argument of the Optimizer class is a dictionary that maps a metric name to a metric function. This function can be one of the metrics provided by the sklearn.metrics module, or a custom metric function that should follow the same conventions as the sklearn metric functions (true values first, predicted values second, returning a number).

Here’s an example of how to use the Optimizer class with custom metrics:

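from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
# root_mean_squared_error requires scikit-learn >= 1.4
from sklearn.metrics import mean_squared_error, root_mean_squared_error
from mloptimizer.application import Optimizer
from mloptimizer.domain.hyperspace import HyperparameterSpace

# Example regression data (any feature matrix X and target vector y will do)
X, y = load_diabetes(return_X_y=True)

# Map each metric name to its metric function
regression_metrics = {
    "mse": mean_squared_error,
    "rmse": root_mean_squared_error,
}
evolvable_hyperparams = HyperparameterSpace.get_default_hyperparameter_space(RandomForestRegressor)

# 'rmse' is used as the fitness score; it matches a key in regression_metrics
mlopt = Optimizer(estimator_class=RandomForestRegressor,
                  hyperparam_space=evolvable_hyperparams,
                  fitness_score='rmse', metrics=regression_metrics,
                  features=X, labels=y)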
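A metrics dictionary can also hold hand-written functions. The sketch below assumes only the sklearn convention of metric(y_true, y_pred) returning a number; max_absolute_error and the dictionary name custom_metrics are hypothetical names chosen for this example, and the loop simply shows that every entry can be called in the same way.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def max_absolute_error(y_true, y_pred):
    """Largest single absolute deviation (hypothetical custom metric)."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


# Same shape as the metrics argument: metric name -> metric function.
custom_metrics = {
    "mae": mean_absolute_error,
    "max_abs_error": max_absolute_error,
}

y_true = [10.0, 12.0, 15.0]
y_pred = [11.0, 12.0, 12.0]

# Every metric shares the (y_true, y_pred) signature, so one loop evaluates them all.
results = {name: fn(y_true, y_pred) for name, fn in custom_metrics.items()}
print(results)
```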

4.1.3. Score Functions

Defining the metric is not enough: how the data is used or split to calculate the metric is also important. The model_evaluation.py module provides score functions that can be used to evaluate the performance of machine learning algorithms. These score functions take an estimator, features, labels, and a score metric as input, and return a score that quantifies how well the estimator performed on the given data.

The model_evaluation.py module provides the following score functions:

train_score: This function trains an estimator with the provided features and labels, and then calculates the score over the same training data.
train_test_score: This function splits the provided features and labels into a training set and a test set. Then, it trains an estimator on the training set and calculates a score on the test set using the provided score metric.
kfold_score: This function evaluates an estimator using K-Fold cross-validation. It splits the provided features and labels into K folds, trains an estimator on K-1 folds, and calculates a score on the remaining fold. This process is repeated K times, and the function returns the average score across all folds.
kfold_stratified_score: This function is similar to kfold_score, but it uses stratified K-Fold cross-validation. This means that it preserves the percentage of samples for each class in each fold. For classification problems, this can help ensure that each fold has a representative sample of each class.
temporal_kfold_score: This function is similar to kfold_score, but it uses temporal K-Fold cross-validation. This means that it respects the order of the data, making it suitable for time series data in order to avoid look-ahead bias.
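The difference between plain and temporal K-Fold splitting can be made concrete with index generators. This is a minimal sketch under stated assumptions, not the code in model_evaluation.py: kfold_indices and temporal_indices are hypothetical helpers, the folds are contiguous and unshuffled, and stratification is left out for brevity. scikit-learn's KFold, StratifiedKFold and TimeSeriesSplit classes provide full implementations of these strategies.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def temporal_indices(n_samples, k):
    """Yield expanding-window splits: train on the past, test on the next block."""
    test_size = n_samples // (k + 1)
    first_test = n_samples - k * test_size
    for start in range(first_test, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


kfold_splits = list(kfold_indices(10, 3))
temporal_splits = list(temporal_indices(10, 3))

for train_idx, test_idx in temporal_splits:
    # Every training index precedes every test index, so no future data leaks in.
    print("train", train_idx, "test", test_idx)
```

In the plain K-Fold case each sample is tested exactly once and the training folds may lie after the test fold; in the temporal case the training window only ever grows forward in time.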

Each of these score functions takes an estimator, features, and labels as input. They also take a score metric as input, which is used to calculate the score. The score metric can be any function that takes the true labels or values and the predicted labels or values as input and returns a score. Examples of score metrics include accuracy, precision, recall and F1 score.
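To show how the four inputs (estimator, features, labels, score metric) fit together, here is a minimal sketch of a hold-out score function. It is an illustration only: holdout_score, its test_fraction parameter and the MeanRegressor stand-in are hypothetical and do not reproduce the signatures of the functions in model_evaluation.py. The split is unshuffled to keep the result deterministic.

```python
class MeanRegressor:
    """Tiny stand-in estimator with an sklearn-like fit/predict interface."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def mean_absolute_error(y_true, y_pred):
    """Score metric: average absolute deviation."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def holdout_score(estimator, features, labels, metric, test_fraction=0.25):
    """Fit on the first part of the data and score on the remaining part."""
    n_test = max(1, int(len(labels) * test_fraction))
    split = len(labels) - n_test
    estimator.fit(features[:split], labels[:split])
    predictions = estimator.predict(features[split:])
    return metric(labels[split:], predictions)


features = [[i] for i in range(8)]
labels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Trains on the first six samples (mean 3.5) and scores on the last two.
score = holdout_score(MeanRegressor(), features, labels, mean_absolute_error)
print(score)
```

Swapping the metric argument changes what is measured, while swapping the score function changes which data the measurement is taken on.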

4.1.4. Examples

Here’s an example of how to use the train_score function:

from mloptimizer.domain.evaluation import model_evaluation
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Define features, labels, and classifier
features, labels = load_iris(return_X_y=True)
clf = RandomForestClassifier()

# Use the train_score function
score = model_evaluation.train_score(features, labels, clf, metrics={"accuracy": accuracy_score})

In this example, we first define the features, labels, and classifier. We then use the train_score function to train the classifier and calculate the score on the same data. The accuracy_score function from sklearn.metrics is used as the score metric.
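A score computed on the training data tends to be optimistic, which is one reason to prefer a split-based score function when comparing models. The sketch below makes this visible with a deliberately extreme stand-in: MemorizingClassifier is a hypothetical estimator that only memorizes its training pairs, and the toy data is made up for the example.

```python
class MemorizingClassifier:
    """Stand-in estimator that memorizes training pairs (hypothetical)."""

    def fit(self, X, y):
        self.table_ = {tuple(x): label for x, label in zip(X, y)}
        # Fall back to the most frequent training label for unseen inputs.
        self.default_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.table_.get(tuple(x), self.default_) for x in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]

# Fit on the first four samples, hold out the last two.
X_train, y_train = X[:4], y[:4]
X_test, y_test = X[4:], y[4:]

clf = MemorizingClassifier().fit(X_train, y_train)
train_acc = accuracy(y_train, clf.predict(X_train))  # scored on data it has seen
test_acc = accuracy(y_test, clf.predict(X_test))     # scored on unseen data
print(train_acc, test_acc)
```

The memorizer is perfect on its own training data yet fails on the held-out samples, so only the second number says anything about generalization.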