pyanno4rt.learning_model.frequentist

Frequentist learning models module.


This module provides methods and classes for modeling NTCP (normal tissue complication probability) and TCP (tumor control probability) with frequentist learning models, e.g. logistic regression, neural networks, and support vector machines. It includes individual preprocessing and evaluation pipelines as well as Bayesian hyperparameter optimization with k-fold cross-validation.

Overview

Classes

DecisionTreeModel

Decision tree model class.

KNeighborsModel

K-nearest neighbors model class.

LogisticRegressionModel

Logistic regression model class.

NaiveBayesModel

Naive Bayes model class.

NeuralNetworkModel

Neural network model class.

RandomForestModel

Random forest model class.

SupportVectorMachineModel

Support vector machine model class.

Classes

class pyanno4rt.learning_model.frequentist.DecisionTreeModel(model_label, model_folder_path, dataset, preprocessing_steps, tune_space, tune_evaluations, tune_score, inspect_model, evaluate_model, display_options)[source]

Decision tree model class.

This class enables building an individual preprocessing pipeline, fitting the decision tree model to the input data, inspecting the model, making predictions with the model, and assessing the predictive performance using multiple evaluation metrics.

The training process includes sequential model-based hyperparameter optimization with tree-structured Parzen estimators and stratified k-fold cross-validation for the objective function evaluation. Cross-validation is also applied to (optionally) inspect the validation feature importances and to generate out-of-folds predictions as a full reconstruction of the input labels for generalization assessment.
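
The stratified k-fold splitting mentioned above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the function name `stratified_k_fold`, the round-robin assignment, and the toy labels are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, n_splits, seed=0):
    """Return a list of (train_indices, validation_indices) pairs.

    Indices of each class are shuffled and dealt round-robin to the folds,
    so every fold keeps roughly the class ratio of the full data set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    splits = []
    for k in range(n_splits):
        validation = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, validation))
    return splits


# Toy labels (hypothetical): 8 negatives and 4 positives.
labels = [0] * 8 + [1] * 4
splits = stratified_k_fold(labels, n_splits=4)
for train, validation in splits:
    positives = sum(labels[i] for i in validation)
    print(len(validation), "validation samples,", positives, "positive")
```

Every sample lands in exactly one validation fold, which is what makes a full out-of-folds reconstruction of the labels possible.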

Parameters:
  • model_label (string) – Label for the decision tree model to be used for file naming.

  • dataset (dict) – Dictionary with the raw data set, the label viewpoint, the label bounds, the feature values and names, and the label values and names after modulation. In a compact way, this represents the input data for the decision tree model.

  • preprocessing_steps (tuple) –

    Sequence of labels associated with preprocessing algorithms which make up the preprocessing pipeline for the decision tree model. Currently available algorithm labels are:

    • transformers : ‘Equalizer’, ‘StandardScaler’, ‘Whitening’.

  • tune_space (dict) –

    Search space for the Bayesian hyperparameter optimization, including

    • ‘criterion’ : measure for the quality of a split;

    • ‘splitter’ : splitting strategy at each node;

    • ‘max_depth’ : maximum depth of the tree;

    • ‘min_samples_split’ : minimum number of samples required for splitting an internal node;

    • ‘min_samples_leaf’ : minimum number of samples required at each leaf node;

    • ‘min_weight_fraction_leaf’ : minimum weighted fraction of the total sum of sample weights required at each leaf node;

    • ‘max_features’ : maximum number of features taken into account when looking for the best split at each node;

    • ‘class_weight’ : weights associated with the classes;

    • ‘ccp_alpha’ : complexity parameter for minimal cost-complexity pruning.

  • tune_evaluations (int) – Number of evaluation steps (trials) for the Bayesian hyperparameter optimization.

  • tune_score (string) –

    Scoring function for the evaluation of the hyperparameter set candidates. Currently available scorers are:

    • ‘log_loss’ : negative log-likelihood score;

    • ‘roc_auc_score’ : area under the ROC curve score.

  • tune_splits (int) – Number of splits for the stratified cross-validation within each hyperparameter optimization step.

  • inspect_model (bool) – Indicator for the inspection of the model, e.g. the feature importances.

  • evaluate_model (bool) – Indicator for the evaluation of the model, e.g. the model KPIs.

  • oof_splits (int) – Number of splits for the stratified cross-validation within the out-of-folds evaluation step of the decision tree model.
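
The two scorers listed under tune_score can be written out in a few lines. The sketch below is a plain-Python illustration for binary labels, not the package's code; the function bodies and the toy data are assumptions. Note that the log loss is minimized while the ROC AUC is maximized, so a tuning loop has to treat the two with opposite signs.

```python
import math


def log_loss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def roc_auc_score(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as one half; this is the pairwise definition of the AUC.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical).
labels = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]
loss = log_loss(labels, probabilities)
auc = roc_auc_score(labels, probabilities)
```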

preprocessor

Instance of the class DataPreprocessor, which holds methods to build the preprocessing pipeline, fit it to the input features, transform the features, and derive the gradient of the preprocessing algorithms with respect to the features.

Type:

object of class DataPreprocessor

features

Values of the input features.

Type:

ndarray

labels

Values of the input labels.

Type:

ndarray

configuration

Dictionary with information for the modeling, i.e., the dataset, the preprocessing, and the hyperparameter search space.

Type:

dict

model_path

Path for storing and retrieving the decision tree model.

Type:

string

configuration_path

Path for storing and retrieving the configuration dictionary.

Type:

string

hyperparameter_path

Path for storing and retrieving the hyperparameter dictionary.

Type:

string

updated_model

Indicator for the update status of the model, which triggers recalculation of the model inspection and model evaluation classes.

Type:

bool

prediction_model

Instance of the class DecisionTreeClassifier, which holds methods to make predictions from the decision tree model.

Type:

object of class DecisionTreeClassifier

inspector

Instance of the class ModelInspector, which holds methods to compute model inspection values, e.g. feature importances.

Type:

object of class ModelInspector

training_prediction

Array with the label predictions on the input data.

Type:

ndarray

oof_prediction

Array with the out-of-folds predictions on the input data.

Type:

ndarray

evaluator

Instance of the class ModelEvaluator, which holds methods to compute the evaluation metrics for a given array with label predictions.

Type:

object of class ModelEvaluator

Notes

Currently, the preprocessing pipeline for the model is restricted to transformations of the input feature values, e.g. scaling, dimensionality reduction or feature engineering. Transformations which affect the input labels in the same way, e.g. resampling or outlier removal, are not yet possible.
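
To make the restriction concrete, here is a minimal sketch of a feature-only pipeline. `StandardScalerSketch` and the free function `preprocess` are hypothetical stand-ins written for this example, not the package's DataPreprocessor; the point is only that the transformers receive the features and the labels pass through unchanged.

```python
import math


class StandardScalerSketch:
    """Column-wise standardization of a list of feature rows."""

    def fit(self, features):
        n = len(features)
        columns = list(zip(*features))
        self.means = [sum(c) / n for c in columns]
        # A constant column has zero spread; fall back to 1.0 to avoid
        # dividing by zero.
        self.stds = [
            math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(columns, self.means)
        ]
        return self

    def transform(self, features):
        return [
            [(v - m) / s for v, m, s in zip(row, self.means, self.stds)]
            for row in features
        ]


def preprocess(features, steps):
    """Apply fitted transformers in order; labels are never touched."""
    for step in steps:
        features = step.transform(features)
    return features


# Toy data (hypothetical).
features = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
labels = [0, 1, 1]
pipeline = [StandardScalerSketch().fit(features)]
transformed = preprocess(features, pipeline)
```

A resampling or outlier-removal step would have to drop or duplicate rows of `features` and `labels` together, which this interface cannot express.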

Overview

Methods

preprocess(features)

Preprocess the input feature vector with the built pipeline.

get_model(features, labels)

Get the decision tree outcome prediction model by reading from the model file path, the datahub, or by training.

tune_hyperparameters(features, labels)

Tune the hyperparameters of the decision tree model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

train(features, labels)

Train the decision tree outcome prediction model.

predict(features)

Predict the label values from the feature values.

predict_oof(features, labels)

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

inspect(features, labels, oof_folds)

Inspect the decision tree model, e.g. by computing the feature importances.

evaluate(features, labels)

Evaluate the decision tree model, e.g. by computing the model KPIs.

set_file_paths(base_path)

Set the paths for model, configuration and hyperparameter files.

read_model_from_file()

Read the decision tree outcome prediction model from the model file path.

write_model_to_file(prediction_model)

Write the decision tree outcome prediction model to the model file path.

read_configuration_from_file()

Read the configuration dictionary from the configuration file path.

write_configuration_to_file(configuration)

Write the configuration dictionary to the configuration file path.

read_hyperparameters_from_file()

Read the decision tree outcome prediction model hyperparameters from the hyperparameter file path.

write_hyperparameters_to_file(hyperparameters)

Write the hyperparameter dictionary to the hyperparameter file path.

Members

preprocess(features)[source]

Preprocess the input feature vector with the built pipeline.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Array of transformed feature values.

Return type:

ndarray

get_model(features, labels)[source]

Get the decision tree outcome prediction model by reading from the model file path, the datahub, or by training.

Returns:

Instance of the class DecisionTreeClassifier, which holds methods to make predictions from the decision tree model.

Return type:

object of class DecisionTreeClassifier

tune_hyperparameters(features, labels)[source]

Tune the hyperparameters of the decision tree model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

Returns:

tuned_hyperparameters – Dictionary with the hyperparameter names and values tuned via Bayesian hyperparameter optimization.

Return type:

dict
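
The evaluate-and-keep-best loop behind the tuning step can be sketched as follows. This example deliberately uses uniform random search in place of the tree-structured Parzen estimator, and the search space, the fold functions, and all names are made up for illustration; only the idea of scoring each candidate by its mean cross-validated loss carries over.

```python
import random


def cross_validated_loss(params, folds):
    """Hypothetical objective: mean validation loss over precomputed folds.

    Each fold is a callable mapping params to a loss; in a real setting it
    would fit the model on the training part and score the validation part.
    """
    return sum(fold(params) for fold in folds) / len(folds)


def tune(space, folds, evaluations, seed=0):
    """Random search over `space`, keeping the best-scoring candidate.

    This replaces the tree-structured Parzen estimator by uniform sampling;
    only the evaluate-and-keep-best loop is the same.
    """
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(evaluations):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        loss = cross_validated_loss(candidate, folds)
        if loss < best_loss:
            best_params, best_loss = candidate, loss
    return best_params, best_loss


# Toy search space and toy folds, both made up for illustration.
space = {"max_depth": [1, 2, 3, 4, 5], "min_samples_leaf": [1, 2, 4]}
folds = [
    lambda p: (p["max_depth"] - 3) ** 2 + 0.1 * p["min_samples_leaf"],
    lambda p: (p["max_depth"] - 3) ** 2 + 0.3 * p["min_samples_leaf"],
]
tuned_hyperparameters, tuned_loss = tune(space, folds, evaluations=200)
```

A sequential model-based optimizer differs in how it proposes the next candidate: it uses the losses seen so far instead of sampling uniformly.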

train(features, labels)[source]

Train the decision tree outcome prediction model.

Returns:

prediction_model – Instance of the class DecisionTreeClassifier, which holds methods to make predictions from the decision tree model.

Return type:

object of class DecisionTreeClassifier

predict(features)[source]

Predict the label values from the feature values.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Floating-point label prediction or array of label predictions.

Return type:

float or ndarray

predict_oof(features, labels)[source]

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

Parameters:

oof_splits (int) – Number of splits for the stratified cross-validation.

Returns:

Array with the out-of-folds label predictions.

Return type:

ndarray
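
A minimal sketch of out-of-folds prediction, assuming a deterministic stratified fold assignment and a deliberately trivial stand-in learner (`fit_rate_model`). None of these helper names come from the package, and the explicit `n_splits` argument is an assumption of this sketch.

```python
def stratified_folds(labels, n_splits):
    """Assign each sample a fold number, dealing each class round-robin."""
    fold_of = [0] * len(labels)
    counters = {}
    for index, label in enumerate(labels):
        fold_of[index] = counters.get(label, 0) % n_splits
        counters[label] = counters.get(label, 0) + 1
    return fold_of


def fit_rate_model(features, labels):
    """Stand-in learner: predicts the positive rate among training samples
    whose first feature lies on the same side of the training mean."""
    mean = sum(row[0] for row in features) / len(features)
    high = [y for row, y in zip(features, labels) if row[0] >= mean]
    low = [y for row, y in zip(features, labels) if row[0] < mean]
    rate_high = sum(high) / len(high) if high else 0.5
    rate_low = sum(low) / len(low) if low else 0.5
    return lambda row: rate_high if row[0] >= mean else rate_low


def predict_oof(features, labels, n_splits):
    """Predict every sample with a model that never saw it during fitting."""
    fold_of = stratified_folds(labels, n_splits)
    oof = [None] * len(labels)
    for k in range(n_splits):
        train = [i for i, f in enumerate(fold_of) if f != k]
        model = fit_rate_model([features[i] for i in train],
                               [labels[i] for i in train])
        for i, f in enumerate(fold_of):
            if f == k:
                oof[i] = model(features[i])
    return oof


# Toy data (hypothetical).
features = [[0.1], [0.2], [0.3], [0.4], [0.7], [0.8], [0.9], [1.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
oof_prediction = predict_oof(features, labels, n_splits=2)
```

Because each prediction comes from a model fitted without that sample, comparing `oof_prediction` with `labels` estimates generalization rather than training fit.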

inspect(features, labels, oof_folds)[source]

Inspect the decision tree model, e.g. by computing the feature importances.

evaluate(features, labels)[source]

Evaluate the decision tree model, e.g. by computing the model KPIs.

set_file_paths(base_path)[source]

Set the paths for model, configuration and hyperparameter files.

Parameters:

base_path (string) – Base path from which to access the model files.
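
As a rough sketch, deriving the three paths from one base path might look like this. The file names are placeholders chosen for the example, and the extra `model_label` argument is an assumption; the actual naming scheme is not specified here.

```python
import os


def set_file_paths(base_path, model_label):
    """Build the three file paths below a common base path.

    The file names are placeholders, not the package's own.
    """
    return {
        "model_path": os.path.join(base_path, f"{model_label}_model"),
        "configuration_path": os.path.join(
            base_path, f"{model_label}_configuration"),
        "hyperparameter_path": os.path.join(
            base_path, f"{model_label}_hyperparameters"),
    }


paths = set_file_paths("models", "decision_tree")
```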

read_model_from_file()[source]

Read the decision tree outcome prediction model from the model file path.

Returns:

Instance of the class DecisionTreeClassifier, which holds methods to make predictions from the decision tree model.

Return type:

object of class DecisionTreeClassifier

write_model_to_file(prediction_model)[source]

Write the decision tree outcome prediction model to the model file path.

Parameters:

prediction_model (object of class DecisionTreeClassifier) – Instance of the class DecisionTreeClassifier, which holds methods to make predictions from the decision tree model.

read_configuration_from_file()[source]

Read the configuration dictionary from the configuration file path.

Returns:

Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

Return type:

dict

write_configuration_to_file(configuration)[source]

Write the configuration dictionary to the configuration file path.

Parameters:

configuration (dict) – Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

read_hyperparameters_from_file()[source]

Read the decision tree outcome prediction model hyperparameters from the hyperparameter file path.

Returns:

Dictionary with the hyperparameter names and values for the decision tree outcome prediction model.

Return type:

dict

write_hyperparameters_to_file(hyperparameters)[source]

Write the hyperparameter dictionary to the hyperparameter file path.

Parameters:

hyperparameters (dict) – Dictionary with the hyperparameter names and values for the decision tree outcome prediction model.
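
A write/read round trip for a hyperparameter dictionary could look like the sketch below. JSON is an assumed storage format, the example values are made up, and the functions take an explicit `path` argument for self-containment, whereas the documented methods work with the stored hyperparameter_path.

```python
import json
import os
import tempfile


def write_hyperparameters_to_file(hyperparameters, path):
    """Serialize the hyperparameter dictionary as JSON (an assumed format)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(hyperparameters, handle, indent=2, sort_keys=True)


def read_hyperparameters_from_file(path):
    """Load the hyperparameter dictionary written by the function above."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Example values (hypothetical).
hyperparameters = {"criterion": "gini", "max_depth": 3, "ccp_alpha": 0.01}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hyperparameters.json")
    write_hyperparameters_to_file(hyperparameters, path)
    restored = read_hyperparameters_from_file(path)
```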

class pyanno4rt.learning_model.frequentist.KNeighborsModel(model_label, model_folder_path, dataset, preprocessing_steps, tune_space, tune_evaluations, tune_score, inspect_model, evaluate_model, display_options)[source]

K-nearest neighbors model class.

This class enables building an individual preprocessing pipeline, fitting the k-nearest neighbors model to the input data, inspecting the model, making predictions with the model, and assessing the predictive performance using multiple evaluation metrics.

The training process includes sequential model-based hyperparameter optimization with tree-structured Parzen estimators and stratified k-fold cross-validation for the objective function evaluation. Cross-validation is also applied to (optionally) inspect the validation feature importances and to generate out-of-folds predictions as a full reconstruction of the input labels for generalization assessment.

Parameters:
  • model_label (string) – Label for the k-nearest neighbors model to be used for file naming.

  • dataset (dict) – Dictionary with the raw data set, the label viewpoint, the label bounds, the feature values and names, and the label values and names after modulation. In a compact way, this represents the input data for the k-nearest neighbors model.

  • preprocessing_steps (tuple) –

    Sequence of labels associated with preprocessing algorithms which make up the preprocessing pipeline for the k-nearest neighbors model. Currently available algorithm labels are:

    • transformers : ‘Equalizer’, ‘StandardScaler’, ‘Whitening’.

  • tune_space (dict) –

    Search space for the Bayesian hyperparameter optimization, including

    • ‘n_neighbors’ : number of neighbors (equals k);

    • ‘weights’ : weights function on the neighbors for prediction;

    • ‘leaf_size’ : leaf size for BallTree or KDTree;

    • ‘p’ : power parameter for the Minkowski metric.

  • tune_evaluations (int) – Number of evaluation steps (trials) for the Bayesian hyperparameter optimization.

  • tune_score (string) –

    Scoring function for the evaluation of the hyperparameter set candidates. Currently available scorers are:

    • ‘log_loss’ : negative log-likelihood score;

    • ‘roc_auc_score’ : area under the ROC curve score.

  • tune_splits (int) – Number of splits for the stratified cross-validation within each hyperparameter optimization step.

  • inspect_model (bool) – Indicator for the inspection of the model, e.g. the feature importances.

  • evaluate_model (bool) – Indicator for the evaluation of the model, e.g. the model KPIs.

  • oof_splits (int) – Number of splits for the stratified cross-validation within the out-of-folds evaluation step of the k-nearest neighbors model.
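
How ‘n_neighbors’, ‘weights’ and ‘p’ interact can be shown with a brute-force sketch in plain Python. The function names and toy data are assumptions for this example; ‘leaf_size’ is left out because it only affects the speed of tree-based neighbor lookups, not the prediction.

```python
def minkowski(a, b, p):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_positive_probability(query, features, labels, n_neighbors, p=2,
                             weights="uniform"):
    """Share of (optionally distance-weighted) positive votes among the
    n_neighbors training samples closest to the query."""
    ranked = sorted(
        (minkowski(query, row, p), label) for row, label in zip(features, labels)
    )[:n_neighbors]
    if weights == "distance":
        # An exact match would give an infinite weight, so it decides alone.
        exact = [label for distance, label in ranked if distance == 0.0]
        if exact:
            return sum(exact) / len(exact)
        votes = [(1.0 / distance, label) for distance, label in ranked]
    else:
        votes = [(1.0, label) for _, label in ranked]
    total = sum(weight for weight, _ in votes)
    return sum(weight * label for weight, label in votes) / total


# Toy data (hypothetical).
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
labels = [0, 0, 1, 1]
query = [0.0, 1.0]
uniform = knn_positive_probability(query, features, labels, n_neighbors=3)
weighted = knn_positive_probability(query, features, labels, n_neighbors=3,
                                    weights="distance")
```

With distance weighting, the farther negative neighbor counts for less, so the positive share rises slightly above the uniform one third.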

preprocessor

Instance of the class DataPreprocessor, which holds methods to build the preprocessing pipeline, fit it to the input features, transform the features, and derive the gradient of the preprocessing algorithms with respect to the features.

Type:

object of class DataPreprocessor

features

Values of the input features.

Type:

ndarray

labels

Values of the input labels.

Type:

ndarray

configuration

Dictionary with information for the modeling, i.e., the dataset, the preprocessing, and the hyperparameter search space.

Type:

dict

model_path

Path for storing and retrieving the k-nearest neighbors model.

Type:

string

configuration_path

Path for storing and retrieving the configuration dictionary.

Type:

string

hyperparameter_path

Path for storing and retrieving the hyperparameter dictionary.

Type:

string

updated_model

Indicator for the update status of the model, which triggers recalculation of the model inspection and model evaluation classes.

Type:

bool

prediction_model

Instance of the class KNeighborsClassifier, which holds methods to make predictions from the k-nearest neighbors model.

Type:

object of class KNeighborsClassifier

inspector

Instance of the class ModelInspector, which holds methods to compute model inspection values, e.g. feature importances.

Type:

object of class ModelInspector

training_prediction

Array with the label predictions on the input data.

Type:

ndarray

oof_prediction

Array with the out-of-folds predictions on the input data.

Type:

ndarray

evaluator

Instance of the class ModelEvaluator, which holds methods to compute the evaluation metrics for a given array with label predictions.

Type:

object of class ModelEvaluator

Notes

Currently, the preprocessing pipeline for the model is restricted to transformations of the input feature values, e.g. scaling, dimensionality reduction or feature engineering. Transformations which affect the input labels in the same way, e.g. resampling or outlier removal, are not yet possible.

Overview

Methods

preprocess(features)

Preprocess the input feature vector with the built pipeline.

get_model(features, labels)

Get the k-nearest neighbors outcome prediction model by reading from the model file path, the datahub, or by training.

tune_hyperparameters(features, labels)

Tune the hyperparameters of the k-nearest neighbors model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

train(features, labels)

Train the k-nearest neighbors outcome prediction model.

predict(features)

Predict the label values from the feature values.

predict_oof(features, labels)

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

inspect(features, labels, oof_folds)

Inspect the k-nearest neighbors model, e.g. by computing the feature importances.

evaluate(features, labels)

Evaluate the k-nearest neighbors model, e.g. by computing the model KPIs.

set_file_paths(base_path)

Set the paths for model, configuration and hyperparameter files.

read_model_from_file()

Read the k-nearest neighbors outcome prediction model from the model file path.

write_model_to_file(prediction_model)

Write the k-nearest neighbors outcome prediction model to the model file path.

read_configuration_from_file()

Read the configuration dictionary from the configuration file path.

write_configuration_to_file(configuration)

Write the configuration dictionary to the configuration file path.

read_hyperparameters_from_file()

Read the k-nearest neighbors outcome prediction model hyperparameters from the hyperparameter file path.

write_hyperparameters_to_file(hyperparameters)

Write the hyperparameter dictionary to the hyperparameter file path.

Members

preprocess(features)[source]

Preprocess the input feature vector with the built pipeline.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Array of transformed feature values.

Return type:

ndarray

get_model(features, labels)[source]

Get the k-nearest neighbors outcome prediction model by reading from the model file path, the datahub, or by training.

Returns:

Instance of the class KNeighborsClassifier, which holds methods to make predictions from the k-nearest neighbors model.

Return type:

object of class KNeighborsClassifier

tune_hyperparameters(features, labels)[source]

Tune the hyperparameters of the k-nearest neighbors model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

Returns:

tuned_hyperparameters – Dictionary with the hyperparameter names and values tuned via Bayesian hyperparameter optimization.

Return type:

dict

train(features, labels)[source]

Train the k-nearest neighbors outcome prediction model.

Returns:

prediction_model – Instance of the class KNeighborsClassifier, which holds methods to make predictions from the k-nearest neighbors model.

Return type:

object of class KNeighborsClassifier

predict(features)[source]

Predict the label values from the feature values.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Floating-point label prediction or array of label predictions.

Return type:

float or ndarray

predict_oof(features, labels)[source]

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

Parameters:

oof_splits (int) – Number of splits for the stratified cross-validation.

Returns:

Array with the out-of-folds label predictions.

Return type:

ndarray

inspect(features, labels, oof_folds)[source]

Inspect the k-nearest neighbors model, e.g. by computing the feature importances.

evaluate(features, labels)[source]

Evaluate the k-nearest neighbors model, e.g. by computing the model KPIs.

set_file_paths(base_path)[source]

Set the paths for model, configuration and hyperparameter files.

Parameters:

base_path (string) – Base path from which to access the model files.

read_model_from_file()[source]

Read the k-nearest neighbors outcome prediction model from the model file path.

Returns:

Instance of the class KNeighborsClassifier, which holds methods to make predictions from the k-nearest neighbors model.

Return type:

object of class KNeighborsClassifier

write_model_to_file(prediction_model)[source]

Write the k-nearest neighbors outcome prediction model to the model file path.

Parameters:

prediction_model (object of class KNeighborsClassifier) – Instance of the class KNeighborsClassifier, which holds methods to make predictions from the k-nearest neighbors model.

read_configuration_from_file()[source]

Read the configuration dictionary from the configuration file path.

Returns:

Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

Return type:

dict

write_configuration_to_file(configuration)[source]

Write the configuration dictionary to the configuration file path.

Parameters:

configuration (dict) – Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

read_hyperparameters_from_file()[source]

Read the k-nearest neighbors outcome prediction model hyperparameters from the hyperparameter file path.

Returns:

Dictionary with the hyperparameter names and values for the k-nearest neighbors outcome prediction model.

Return type:

dict

write_hyperparameters_to_file(hyperparameters)[source]

Write the hyperparameter dictionary to the hyperparameter file path.

Parameters:

hyperparameters (dict) – Dictionary with the hyperparameter names and values for the k-nearest neighbors outcome prediction model.

class pyanno4rt.learning_model.frequentist.LogisticRegressionModel(model_label, model_folder_path, dataset, preprocessing_steps, tune_space, tune_evaluations, tune_score, inspect_model, evaluate_model, display_options)[source]

Logistic regression model class.

This class enables building an individual preprocessing pipeline, fitting the logistic regression model to the input data, inspecting the model, making predictions with the model, and assessing the predictive performance using multiple evaluation metrics.

The training process includes sequential model-based hyperparameter optimization with tree-structured Parzen estimators and stratified k-fold cross-validation for the objective function evaluation. Cross-validation is also applied to (optionally) inspect the validation feature importances and to generate out-of-folds predictions as a full reconstruction of the input labels for generalization assessment.

Parameters:
  • model_label (string) – Label for the logistic regression model to be used for file naming.

  • dataset (dict) – Dictionary with the raw data set, the label viewpoint, the label bounds, the feature values and names, and the label values and names after modulation. In a compact way, this represents the input data for the logistic regression model.

  • preprocessing_steps (tuple) –

    Sequence of labels associated with preprocessing algorithms which make up the preprocessing pipeline for the logistic regression model. Currently available algorithm labels are:

    • transformers : ‘Identity’, ‘StandardScaler’, ‘Whitening’.

  • tune_space (dict) –

    Search space for the Bayesian hyperparameter optimization, including

    • ‘C’ : inverse of the regularization strength;

    • ‘penalty’ : norm of the penalty function;

    • ‘tol’ : tolerance for stopping criteria;

    • ‘class_weight’ : weights associated with the classes.

  • tune_evaluations (int) – Number of evaluation steps (trials) for the Bayesian hyperparameter optimization.

  • tune_score (string) –

    Scoring function for the evaluation of the hyperparameter set candidates. Currently available scorers are:

    • ‘log_loss’ : negative log-likelihood score;

    • ‘roc_auc_score’ : area under the ROC curve score.

  • tune_splits (int) – Number of splits for the stratified cross-validation within each hyperparameter optimization step.

  • inspect_model (bool) – Indicator for the inspection of the model, e.g. the feature importances.

  • evaluate_model (bool) – Indicator for the evaluation of the model, e.g. the model KPIs.

  • oof_splits (int) – Number of splits for the stratified cross-validation within the out-of-folds evaluation step of the logistic regression model.
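
The role of ‘C’ as the inverse of the regularization strength can be illustrated with a penalized objective in which the data term is multiplied by C. This is one common formulation, written here as a plain-Python sketch with made-up data; it is not taken from the package, and ‘tol’ and ‘class_weight’ are omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def penalized_objective(weights, bias, features, labels, C, penalty="l2"):
    """Penalty term plus C times the summed negative log-likelihood."""
    if penalty == "l2":
        penalty_term = 0.5 * sum(w * w for w in weights)
    elif penalty == "l1":
        penalty_term = sum(abs(w) for w in weights)
    else:
        raise ValueError(f"unsupported penalty: {penalty}")
    data_term = 0.0
    for row, y in zip(features, labels):
        p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
        data_term += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return penalty_term + C * data_term


# Toy data (hypothetical): one feature, perfectly separable at zero.
features = [[-2.0], [-1.0], [1.0], [2.0]]
labels = [0, 0, 1, 1]
candidates = [0.5, 5.0]


def preferred_weight(C):
    """Return the candidate weight with the lower objective for a given C."""
    return min(
        candidates,
        key=lambda w: penalized_objective([w], 0.0, features, labels, C),
    )
```

In this toy setting a small C (strong regularization) prefers the small weight, while a large C lets the data term dominate and prefers the large one.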

preprocessor

Instance of the class DataPreprocessor, which holds methods to build the preprocessing pipeline, fit it to the input features, transform the features, and derive the gradient of the preprocessing algorithms with respect to the features.

Type:

object of class DataPreprocessor

features

Values of the input features.

Type:

ndarray

labels

Values of the input labels.

Type:

ndarray

configuration

Dictionary with information for the modeling, i.e., the dataset, the preprocessing, and the hyperparameter search space.

Type:

dict

model_path

Path for storing and retrieving the logistic regression model.

Type:

string

configuration_path

Path for storing and retrieving the configuration dictionary.

Type:

string

hyperparameter_path

Path for storing and retrieving the hyperparameter dictionary.

Type:

string

updated_model

Indicator for the update status of the model, which triggers recalculation of the model inspection and model evaluation classes.

Type:

bool

prediction_model

Instance of the class LogisticRegression, which holds methods to make predictions from the logistic regression model.

Type:

object of class LogisticRegression

inspector

Instance of the class ModelInspector, which holds methods to compute model inspection values, e.g. feature importances.

Type:

object of class ModelInspector

training_prediction

Array with the label predictions on the input data.

Type:

ndarray

oof_prediction

Array with the out-of-folds predictions on the input data.

Type:

ndarray

evaluator

Instance of the class ModelEvaluator, which holds methods to compute the evaluation metrics for a given array with label predictions.

Type:

object of class ModelEvaluator

Overview

Methods

preprocess(features)

Preprocess the input feature vector with the built pipeline.

get_model(features, labels)

Get the logistic regression outcome prediction model by reading from the model file path, the datahub, or by training.

tune_hyperparameters(features, labels)

Tune the hyperparameters of the logistic regression model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

train(features, labels)

Train the logistic regression outcome prediction model.

predict(features)

Predict the label values from the feature values.

predict_oof(features, labels)

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

inspect(features, labels, oof_folds)

Inspect the logistic regression model, e.g. by computing the feature importances.

evaluate(features, labels)

Evaluate the logistic regression model, e.g. by computing the model KPIs.

set_file_paths(base_path)

Set the paths for model, configuration and hyperparameter files.

read_model_from_file()

Read the logistic regression outcome prediction model from the model file path.

write_model_to_file(prediction_model)

Write the logistic regression outcome prediction model to the model file path.

read_configuration_from_file()

Read the configuration dictionary from the configuration file path.

write_configuration_to_file(configuration)

Write the configuration dictionary to the configuration file path.

read_hyperparameters_from_file()

Read the logistic regression outcome prediction model hyperparameters from the hyperparameter file path.

write_hyperparameters_to_file(hyperparameters)

Write the hyperparameter dictionary to the hyperparameter file path.

Members

preprocess(features)[source]

Preprocess the input feature vector with the built pipeline.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Array of transformed feature values.

Return type:

ndarray

get_model(features, labels)[source]

Get the logistic regression outcome prediction model by reading from the model file path, the datahub, or by training.

Returns:

Instance of the class LogisticRegression, which holds methods to make predictions from the logistic regression model.

Return type:

object of class LogisticRegression

tune_hyperparameters(features, labels)[source]

Tune the hyperparameters of the logistic regression model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

Returns:

tuned_hyperparameters – Dictionary with the hyperparameter names and values tuned via Bayesian hyperparameter optimization.

Return type:

dict

train(features, labels)[source]

Train the logistic regression outcome prediction model.

Returns:

prediction_model – Instance of the class LogisticRegression, which holds methods to make predictions from the logistic regression model.

Return type:

object of class LogisticRegression

predict(features)[source]

Predict the label values from the feature values.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Floating-point label prediction or array of label predictions.

Return type:

float or ndarray

predict_oof(features, labels)[source]

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

Parameters:

oof_splits (int) – Number of splits for the stratified cross-validation.

Returns:

Array with the out-of-folds label predictions.

Return type:

ndarray

inspect(features, labels, oof_folds)[source]

Inspect the logistic regression model, e.g. by computing the feature importances.

evaluate(features, labels)[source]

Evaluate the logistic regression model, e.g. by computing the model KPIs.

set_file_paths(base_path)[source]

Set the paths for model, configuration and hyperparameter files.

Parameters:

base_path (string) – Base path from which to access the model files.

read_model_from_file()[source]

Read the logistic regression outcome prediction model from the model file path.

Returns:

Instance of the class LogisticRegression, which holds methods to make predictions from the logistic regression model.

Return type:

object of class LogisticRegression

write_model_to_file(prediction_model)[source]

Write the logistic regression outcome prediction model to the model file path.

Parameters:

prediction_model (object of class LogisticRegression) – Instance of the class LogisticRegression, which holds methods to make predictions from the logistic regression model.

read_configuration_from_file()[source]

Read the configuration dictionary from the configuration file path.

Returns:

Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

Return type:

dict

write_configuration_to_file(configuration)[source]

Write the configuration dictionary to the configuration file path.

Parameters:

configuration (dict) – Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

read_hyperparameters_from_file()[source]

Read the logistic regression outcome prediction model hyperparameters from the hyperparameter file path.

Returns:

Dictionary with the hyperparameter names and values for the logistic regression outcome prediction model.

Return type:

dict

write_hyperparameters_to_file(hyperparameters)[source]

Write the hyperparameter dictionary to the hyperparameter file path.

Parameters:

hyperparameters (dict) – Dictionary with the hyperparameter names and values for the logistic regression outcome prediction model.

class pyanno4rt.learning_model.frequentist.NaiveBayesModel(model_label, model_folder_path, dataset, preprocessing_steps, tune_space, tune_evaluations, tune_score, inspect_model, evaluate_model, display_options)[source]

Naive Bayes model class.

This class enables building an individual preprocessing pipeline, fitting the naive Bayes model to the input data, inspecting the model, making predictions with the model, and assessing the predictive performance using multiple evaluation metrics.

The training process includes sequential model-based hyperparameter optimization with tree-structured Parzen estimators and stratified k-fold cross-validation for the objective function evaluation. Cross-validation is also applied to (optionally) inspect the validation feature importances and to generate out-of-folds predictions as a full reconstruction of the input labels for generalization assessment.

Parameters:
  • model_label (string) – Label for the naive Bayes model to be used for file naming.

  • dataset (dict) – Dictionary with the raw data set, the label viewpoint, the label bounds, the feature values and names, and the label values and names after modulation. In a compact way, this represents the input data for the naive Bayes model.

  • preprocessing_steps (tuple) –

    Sequence of labels associated with preprocessing algorithms which make up the preprocessing pipeline for the naive Bayes model. Currently available algorithm labels are:

    • transformers : ‘Equalizer’, ‘StandardScaler’, ‘Whitening’.

  • tune_space (dict) –

    Search space for the Bayesian hyperparameter optimization, including

    • ‘priors’ : prior probabilities of the classes;

    • ‘var_smoothing’ : portion of the largest feature variance added to all variances for calculation stability.

  • tune_evaluations (int) – Number of evaluation steps (trials) for the Bayesian hyperparameter optimization.

  • tune_score (string) –

    Scoring function for the evaluation of the hyperparameter set candidates. Currently available scorers are:

    • ‘log_loss’ : negative log-likelihood score;

    • ‘roc_auc_score’ : area under the ROC curve score.

  • tune_splits (int) – Number of splits for the stratified cross-validation within each hyperparameter optimization step.

  • inspect_model (bool) – Indicator for the inspection of the model, e.g. the feature importances.

  • evaluate_model (bool) – Indicator for the evaluation of the model, e.g. the model KPIs.

  • oof_splits (int) – Number of splits for the stratified cross-validation within the out-of-folds evaluation step of the naive Bayes model.
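The two tune_score options listed above can be illustrated with a minimal pure-Python sketch for binary labels. The function names binary_log_loss and binary_roc_auc are hypothetical and not part of pyanno4rt; whether the package minimizes the loss directly or negates it internally is not stated here.

```python
import math


def binary_log_loss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood of binary labels under the predictions."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def binary_roc_auc(labels, probabilities):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [p for y, p in zip(labels, probabilities) if y == 1]
    negatives = [p for y, p in zip(labels, probabilities) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]
loss = binary_log_loss(labels, probabilities)
auc = binary_roc_auc(labels, probabilities)  # 3 of 4 pairs ordered correctly
```

A lower log loss and a higher AUC both indicate a better hyperparameter candidate, so a tuner has to treat the two scorers with opposite signs.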

preprocessor

Instance of the class DataPreprocessor, which holds methods to build the preprocessing pipeline, fit it to the input features, transform the features, and derive the gradient of the preprocessing algorithms w.r.t. the features.

Type:

object of class DataPreprocessor

features

Values of the input features.

Type:

ndarray

labels

Values of the input labels.

Type:

ndarray

configuration

Dictionary with information for the modeling, i.e., the dataset, the preprocessing, and the hyperparameter search space.

Type:

dict

model_path

Path for storing and retrieving the naive Bayes model.

Type:

string

configuration_path

Path for storing and retrieving the configuration dictionary.

Type:

string

hyperparameter_path

Path for storing and retrieving the hyperparameter dictionary.

Type:

string

updated_model

Indicator for the update status of the model, triggers recalculating the model inspection and model evaluation classes.

Type:

bool

prediction_model

Instance of the class GaussianNB, which holds methods to make predictions from the naive Bayes model.

Type:

object of class GaussianNB

inspector

Instance of the class ModelInspector, which holds methods to compute model inspection values, e.g. feature importances.

Type:

object of class ModelInspector

training_prediction

Array with the label predictions on the input data.

Type:

ndarray

oof_prediction

Array with the out-of-folds predictions on the input data.

Type:

ndarray

evaluator

Instance of the class ModelEvaluator, which holds methods to compute the evaluation metrics for a given array with label predictions.

Type:

object of class ModelEvaluator

Notes

Currently, the preprocessing pipeline for the model is restricted to transformations of the input feature values, e.g. scaling, dimensionality reduction or feature engineering. Transformations which affect the input labels in the same way, e.g. resampling or outlier removal, are not yet possible.

Overview

Methods

preprocess(features)

Preprocess the input feature vector with the built pipeline.

get_model(features, labels)

Get the naive Bayes outcome prediction model by reading from the model file path, the datahub, or by training.

tune_hyperparameters(features, labels)

Tune the hyperparameters of the naive Bayes model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

train(features, labels)

Train the naive Bayes outcome prediction model.

predict(features)

Predict the label values from the feature values.

predict_oof(features, labels)

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

inspect(features, labels, oof_folds)

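Inspect the naive Bayes model, e.g. by computing the feature importances.

evaluate(features, labels)

Evaluate the naive Bayes model with the evaluation metrics.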

set_file_paths(base_path)

Set the paths for model, configuration and hyperparameter files.

read_model_from_file()

Read the naive Bayes outcome prediction model from the model file path.

write_model_to_file(prediction_model)

Write the naive Bayes outcome prediction model to the model file path.

read_configuration_from_file()

Read the configuration dictionary from the configuration file path.

write_configuration_to_file(configuration)

Write the configuration dictionary to the configuration file path.

read_hyperparameters_from_file()

Read the naive Bayes outcome prediction model hyperparameters from the hyperparameter file path.

write_hyperparameters_to_file(hyperparameters)

Write the hyperparameter dictionary to the hyperparameter file path.

Members

preprocess(features)[source]

Preprocess the input feature vector with the built pipeline.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Array of transformed feature values.

Return type:

ndarray
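As a rough illustration of what a preprocessing step such as ‘StandardScaler’ does, and of what a gradient w.r.t. the features means for it, the following sketch standardizes each feature column and reports the derivative of the transformation. StandardScalerSketch and its methods are hypothetical stand-ins written with plain lists; they do not reproduce the DataPreprocessor interface.

```python
import math


class StandardScalerSketch:
    """Column-wise standardization: (x - mean) / std."""

    def fit(self, features):
        n = len(features)
        columns = list(zip(*features))
        self.means = [sum(col) / n for col in columns]
        # Fall back to 1.0 for constant columns to avoid division by zero
        self.stds = [
            math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, self.means)
        ]
        return self

    def transform(self, features):
        return [
            [(v - m) / s for v, m, s in zip(row, self.means, self.stds)]
            for row in features
        ]

    def gradient(self):
        # d/dx of (x - mean) / std is 1 / std for each feature
        return [1.0 / s for s in self.stds]


features = [[1.0, 10.0], [3.0, 30.0]]
scaler = StandardScalerSketch().fit(features)
transformed = scaler.transform(features)
gradient = scaler.gradient()
```

The gradient matters when the model output is differentiated with respect to the raw features, because the chain rule then passes through every preprocessing step.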

get_model(features, labels)[source]

Get the naive Bayes outcome prediction model by reading from the model file path, the datahub, or by training.

Returns:

Instance of the class GaussianNB, which holds methods to make predictions from the naive Bayes model.

Return type:

object of class GaussianNB

tune_hyperparameters(features, labels)[source]

Tune the hyperparameters of the naive Bayes model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

Returns:

tuned_hyperparameters – Dictionary with the hyperparameter names and values tuned via Bayesian hyperparameter optimization.

Return type:

dict
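The shape of such a tuning loop can be sketched as follows. This is not the tree-structured Parzen estimator: plain random search is used as a stand-in sampler, and the model is a toy Laplace-smoothed class-prior predictor with a single hypothetical hyperparameter named smoothing. Only the structure (sample a candidate, score it by stratified k-fold cross-validation, keep the best) mirrors the description above.

```python
import math
import random


def stratified_folds(labels, n_splits):
    """Distribute the indices of every class evenly across the folds."""
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(members):
            folds[position % n_splits].append(index)
    return folds


def cv_log_loss(labels, smoothing, n_splits):
    """Mean validation log loss of the toy class-prior model."""
    losses = []
    for fold in stratified_folds(labels, n_splits):
        held_out = set(fold)
        train = [y for i, y in enumerate(labels) if i not in held_out]
        # Toy model: Laplace-smoothed positive rate of the training part
        p = (sum(train) + smoothing) / (len(train) + 2.0 * smoothing)
        loss = -sum(
            math.log(p) if labels[i] == 1 else math.log(1.0 - p) for i in fold
        )
        losses.append(loss / len(fold))
    return sum(losses) / len(losses)


def tune(labels, n_evaluations, n_splits, seed=0):
    """Random search (stand-in for TPE) over a log-uniform smoothing range."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_evaluations):
        candidate = {"smoothing": 10.0 ** rng.uniform(-3, 1)}
        score = cv_log_loss(labels, candidate["smoothing"], n_splits)
        if best is None or score < best[0]:
            best = (score, candidate)
    return best[1]


labels = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
tuned_hyperparameters = tune(labels, n_evaluations=20, n_splits=2)
```

A sequential model-based optimizer differs from this sketch in how candidates are proposed: it uses the scores of earlier trials to decide where to sample next instead of drawing independently.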

train(features, labels)[source]

Train the naive Bayes outcome prediction model.

Returns:

prediction_model – Instance of the class GaussianNB, which holds methods to make predictions from the naive Bayes model.

Return type:

object of class GaussianNB

predict(features)[source]

Predict the label values from the feature values.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Floating-point label prediction or array of label predictions.

Return type:

float or ndarray

predict_oof(features, labels)[source]

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

Parameters:

oof_splits (int) – Number of splits for the stratified cross-validation.

Returns:

Array with the out-of-folds label predictions.

Return type:

ndarray
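The idea of out-of-folds prediction, where every sample is predicted by a model that never saw it during fitting, can be sketched in plain Python. The helper names and the placeholder model (which simply predicts the positive rate of its training part) are assumptions for illustration; the actual method fits the naive Bayes model on each training part.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold, spreading every class evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


def fit_prior_model(train_labels):
    """Placeholder model: predicts the positive rate seen in training."""
    rate = sum(train_labels) / len(train_labels)
    return lambda feature_row: rate


def predict_oof_sketch(features, labels, n_splits):
    """Fill one prediction per sample from the model that held it out."""
    oof = [None] * len(labels)
    for fold in stratified_folds(labels, n_splits):
        held_out = set(fold)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = fit_prior_model(train_labels)
        for i in fold:
            oof[i] = model(features[i])
    return oof


features = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
labels = [0, 0, 0, 1, 1, 1]
oof_prediction = predict_oof_sketch(features, labels, n_splits=3)
```

Because each index lands in exactly one fold, the result has the same length and order as the input labels, which is what makes it usable as a full reconstruction for generalization assessment.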

inspect(features, labels, oof_folds)[source]

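Inspect the naive Bayes model, e.g. by computing the feature importances.

evaluate(features, labels)[source]

Evaluate the naive Bayes model with the evaluation metrics.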

set_file_paths(base_path)[source]

Set the paths for model, configuration and hyperparameter files.

Parameters:

base_path (string) – Base path from which to access the model files.

read_model_from_file()[source]

Read the naive Bayes outcome prediction model from the model file path.

Returns:

Instance of the class GaussianNB, which holds methods to make predictions from the naive Bayes model.

Return type:

object of class GaussianNB

write_model_to_file(prediction_model)[source]

Write the naive Bayes outcome prediction model to the model file path.

Parameters:

prediction_model (object of class GaussianNB) – Instance of the class GaussianNB, which holds methods to make predictions from the naive Bayes model.

read_configuration_from_file()[source]

Read the configuration dictionary from the configuration file path.

Returns:

Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

Return type:

dict

write_configuration_to_file(configuration)[source]

Write the configuration dictionary to the configuration file path.

Parameters:

configuration (dict) – Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

read_hyperparameters_from_file()[source]

Read the naive Bayes outcome prediction model hyperparameters from the hyperparameter file path.

Returns:

Dictionary with the hyperparameter names and values for the naive Bayes outcome prediction model.

Return type:

dict

write_hyperparameters_to_file(hyperparameters)[source]

Write the hyperparameter dictionary to the hyperparameter file path.

Parameters:

hyperparameters (dict) – Dictionary with the hyperparameter names and values for the naive Bayes outcome prediction model.
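A write/read round trip of a hyperparameter dictionary might look like the sketch below. The storage format (JSON), the file name, and the explicit path argument are assumptions made for illustration; the methods documented above take their path from hyperparameter_path, and the on-disk format is not stated here.

```python
import json
import tempfile
from pathlib import Path


def write_hyperparameters(hyperparameters, path):
    """Serialize the hyperparameter dictionary (assumed JSON format)."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump(hyperparameters, file, indent=2)


def read_hyperparameters(path):
    """Load the hyperparameter dictionary back from the file."""
    with open(path, "r", encoding="utf-8") as file:
        return json.load(file)


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name inside a temporary model folder
    hyperparameter_path = Path(folder) / "hyperparameters.json"
    original = {"var_smoothing": 1e-9, "priors": None}
    write_hyperparameters(original, hyperparameter_path)
    restored = read_hyperparameters(hyperparameter_path)
```

Persisting the tuned values separately from the fitted model allows retraining with the same hyperparameters without repeating the optimization.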

class pyanno4rt.learning_model.frequentist.NeuralNetworkModel(model_label, model_folder_path, dataset, preprocessing_steps, architecture, max_hidden_layers, tune_space, tune_evaluations, tune_score, inspect_model, evaluate_model, display_options)[source]

Neural network model class.

This class enables building an individual preprocessing pipeline, fitting the neural network model to the input data, inspecting the model, making predictions with the model, and assessing the predictive performance using multiple evaluation metrics.

The training process includes sequential model-based hyperparameter optimization with tree-structured Parzen estimators and stratified k-fold cross-validation for the objective function evaluation. Cross-validation is also applied to (optionally) inspect the validation feature importances and to generate out-of-folds predictions as a full reconstruction of the input labels for generalization assessment.

Parameters:
  • model_label (string) – Label for the neural network model to be used for file naming.

  • dataset (dict) – Dictionary with the raw data set, the label viewpoint, the label bounds, the feature values and names, and the label values and names after modulation. In a compact way, this represents the input data for the neural network model.

  • preprocessing_steps (tuple) –

    Sequence of labels associated with preprocessing algorithms which make up the preprocessing pipeline for the neural network model. Currently available algorithm labels are:

    • transformers : ‘Equalizer’, ‘StandardScaler’, ‘Whitening’.

  • architecture ({'input-convex', 'standard'}) –

    Type of architecture for the neural network model. Currently available architectures are:

    • ‘input-convex’ : builds the input-convex network architecture;

    • ‘standard’ : builds the standard feed-forward network architecture.

  • max_hidden_layers (int) – Maximum number of hidden layers for the neural network model.

  • tune_space (dict) –

    Search space for the Bayesian hyperparameter optimization, including

    • ‘input_neuron_number’ : number of neurons for the input layer;

    • ‘input_activation’ : activation function for the input layer (‘elu’, ‘exponential’, ‘gelu’, ‘linear’, ‘leaky_relu’, ‘relu’, ‘softmax’, ‘softplus’, ‘swish’);

    • ‘hidden_neuron_number’ : number of neurons for the hidden layer(s);

    • ‘hidden_activation’ : activation function for the hidden layer(s) (‘elu’, ‘gelu’, ‘linear’, ‘leaky_relu’, ‘relu’, ‘softmax’, ‘softplus’, ‘swish’);

    • ‘input_dropout_rate’ : dropout rate for the input layer;

    • ‘hidden_dropout_rate’ : dropout rate for the hidden layer(s);

    • ‘batch_size’ : batch size;

    • ‘learning_rate’ : learning rate;

    • ‘optimizer’ : algorithm for the optimization of the network (‘Adam’, ‘Ftrl’, ‘SGD’);

    • ‘loss’ : loss function for the optimization of the network (‘BCE’, ‘FocalBCE’, ‘KLD’).

  • tune_evaluations (int) – Number of evaluation steps (trials) for the Bayesian hyperparameter optimization.

  • tune_score (string) –

    Scoring function for the evaluation of the hyperparameter set candidates. Currently available scorers are:

    • ‘log_loss’ : negative log-likelihood score;

    • ‘roc_auc_score’ : area under the ROC curve score.

  • tune_splits (int) – Number of splits for the stratified cross-validation within each hyperparameter optimization step.

  • inspect_model (bool) – Indicator for the inspection of the model, e.g. the feature importances.

  • evaluate_model (bool) – Indicator for the evaluation of the model, e.g. the model KPIs.

  • oof_splits (int) – Number of splits for the stratified cross-validation within the out-of-folds evaluation step of the neural network model.
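To give an intuition for the ‘input-convex’ architecture option, the sketch below shows the general idea behind input-convex networks: weights acting on hidden activations are kept non-negative and the activation is convex and non-decreasing, so the output is a convex function of the input. How pyanno4rt builds its own architecture is not shown here; the parameter names and values are hypothetical.

```python
def relu(value):
    return max(0.0, value)


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


def input_convex_forward(x, params):
    """One hidden layer plus a skip connection; the output is convex in x."""
    # Hidden layer: unconstrained weights on the raw input
    z = [relu(dot(w, x) + b) for w, b in zip(params["W0"], params["b0"])]
    # Output layer: non-negative weights on z, unconstrained skip weights on x
    assert all(w >= 0.0 for w in params["Wz"])
    return dot(params["Wz"], z) + dot(params["Wx"], x) + params["b1"]


params = {
    "W0": [[1.0, -2.0], [-0.5, 0.3]],
    "b0": [0.1, -0.2],
    "Wz": [0.7, 1.5],   # must stay non-negative
    "Wx": [-1.0, 0.4],  # skip connection, any sign
    "b1": 0.05,
}

# Numerical convexity check: f(midpoint) <= average of the endpoint values
a, b = [1.0, 2.0], [-3.0, 0.5]
midpoint = [(u + v) / 2 for u, v in zip(a, b)]
value_at_midpoint = input_convex_forward(midpoint, params)
average_of_endpoints = (
    input_convex_forward(a, params) + input_convex_forward(b, params)
) / 2
```

Convexity in the input is attractive when the network output is later used inside an optimization problem, since it rules out spurious local minima with respect to the features.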

preprocessor

Instance of the class DataPreprocessor, which holds methods to build the preprocessing pipeline, fit it to the input features, transform the features, and derive the gradient of the preprocessing algorithms w.r.t. the features.

Type:

object of class DataPreprocessor

features

Values of the input features.

Type:

ndarray

labels

Values of the input labels.

Type:

ndarray

configuration

Dictionary with information for the modeling, i.e., the dataset, the preprocessing, and the hyperparameter search space.

Type:

dict

model_path

Path for storing and retrieving the neural network model.

Type:

string

configuration_path

Path for storing and retrieving the configuration dictionary.

Type:

string

hyperparameter_path

Path for storing and retrieving the hyperparameter dictionary.

Type:

string

updated_model

Indicator for the update status of the model, triggers recalculating the model inspection and model evaluation classes.

Type:

bool

prediction_model

Instance of the class Functional, which holds methods to make predictions from the neural network model.

Type:

object of class Functional

optimization_model

Instance of the class Functional, equivalent to prediction_model, but skips the sigmoid output activation.

Type:

object of class Functional

inspector

Instance of the class ModelInspector, which holds methods to compute model inspection values, e.g. feature importances.

Type:

object of class ModelInspector

training_prediction

Array with the label predictions on the input data.

Type:

ndarray

oof_prediction

Array with the out-of-folds predictions on the input data.

Type:

ndarray

evaluator

Instance of the class ModelEvaluator, which holds methods to compute the evaluation metrics for a given array with label predictions.

Type:

object of class ModelEvaluator

Notes

Currently, the preprocessing pipeline for the model is restricted to transformations of the input feature values, e.g. scaling, dimensionality reduction or feature engineering. Transformations which affect the input labels in the same way, e.g. resampling or outlier removal, are not yet possible.

Overview

Methods

preprocess(features)

Preprocess the input feature vector with the built pipeline.

get_prediction_model(features, labels)

Get the neural network outcome prediction model by reading from the model file path, the datahub, or by training.

get_optimization_model(features, labels)

Get the neural network outcome optimization model.

build_network(input_shape, output_shape, hyperparameters, squash_output)

Build the neural network architecture with the functional API.

compile_and_fit(prediction_model, features, labels, hyperparameters)

Compile and fit the neural network outcome prediction model to the input data.

tune_hyperparameters(features, labels)

Tune the hyperparameters of the neural network model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

train(features, labels)

Train the neural network outcome prediction model.

predict(features, squash_output)

Predict the label values from the feature values.

predict_oof(features, labels)

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

inspect(features, labels, oof_folds)

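Inspect the neural network model, e.g. by computing the feature importances.

evaluate(features, labels)

Evaluate the neural network model with the evaluation metrics.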

set_file_paths(base_path)

Set the paths for model, configuration and hyperparameter files.

read_model_from_file()

Read the neural network outcome prediction model from the model file path.

write_model_to_file(prediction_model)

Write the neural network outcome prediction model to the model file path.

read_configuration_from_file()

Read the configuration dictionary from the configuration file path.

write_configuration_to_file(configuration)

Write the configuration dictionary to the configuration file path.

read_hyperparameters_from_file()

Read the neural network outcome prediction model hyperparameters from the hyperparameter file path.

write_hyperparameters_to_file(hyperparameters)

Write the hyperparameter dictionary to the hyperparameter file path.

Members

preprocess(features)[source]

Preprocess the input feature vector with the built pipeline.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Array of transformed feature values.

Return type:

ndarray

get_prediction_model(features, labels)[source]

Get the neural network outcome prediction model by reading from the model file path, the datahub, or by training.

Returns:

Instance of the class Functional, which holds methods to make predictions from the neural network model.

Return type:

object of class Functional

get_optimization_model(features, labels)[source]

Get the neural network outcome optimization model.

Returns:

Instance of the class Functional, which holds methods to make predictions from the neural network model.

Return type:

object of class Functional

build_network(input_shape, output_shape, hyperparameters, squash_output)[source]

Build the neural network architecture with the functional API.

Parameters:
  • input_shape (int) – Shape of the input features.

  • output_shape (int) – Shape of the output labels.

  • hyperparameters (dict) – Dictionary with the hyperparameter names and values for the neural network outcome prediction model.

  • squash_output (bool) – Indicator for the use of a sigmoid activation function in the output layer.

Returns:

Instance of the class Functional, which holds methods to make predictions from the neural network model.

Return type:

object of class Functional

compile_and_fit(prediction_model, features, labels, hyperparameters)[source]

Compile and fit the neural network outcome prediction model to the input data.

Parameters:
  • prediction_model (object of class Functional) – Instance for the provision of the neural network architecture.

  • features (tf.float64) – Array of input feature values, cast to tf.float64.

  • labels (tf.float64) – Array of input label values, cast to tf.float64.

  • hyperparameters (dict) – Dictionary with the hyperparameter names and values for the neural network outcome prediction model.

Returns:

prediction_model – Instance of the class Functional, which holds methods to make predictions from the neural network model.

Return type:

object of class Functional

tune_hyperparameters(features, labels)[source]

Tune the hyperparameters of the neural network model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

Returns:

tuned_hyperparameters – Dictionary with the hyperparameter names and values tuned via Bayesian hyperparameter optimization.

Return type:

dict

train(features, labels)[source]

Train the neural network outcome prediction model.

Returns:

prediction_model – Instance of the class Functional, which holds methods to make predictions from the neural network model.

Return type:

object of class Functional

predict(features, squash_output=True)[source]

Predict the label values from the feature values.

Parameters:
  • features (ndarray) – Array of input feature values.

  • squash_output (bool) – Indicator for the use of a sigmoid activation function in the output layer.

Returns:

Floating-point label prediction or array of label predictions.

Return type:

float or ndarray
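The effect of squash_output can be pictured with a scalar stand-in for the network output. In this sketch the function predict_sketch is hypothetical and takes the pre-activation value directly instead of features; it only illustrates the relation between the raw output and the sigmoid-squashed probability.

```python
import math


def sigmoid(logit):
    return 1.0 / (1.0 + math.exp(-logit))


def predict_sketch(logit, squash_output=True):
    """Return a probability if squash_output is True, else the raw value."""
    return sigmoid(logit) if squash_output else logit


raw_value = predict_sketch(0.0, squash_output=False)
probability = predict_sketch(0.0)

# The sigmoid is strictly increasing, so both outputs rank samples identically
ranking_preserved = predict_sketch(2.0) > predict_sketch(1.0)
```

Since the sigmoid is monotonic, skipping it does not change which inputs score higher, but it keeps the output unbounded instead of compressing it into the interval between 0 and 1.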

predict_oof(features, labels)[source]

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

Parameters:

oof_splits (int) – Number of splits for the stratified cross-validation.

Returns:

Array with the out-of-folds label predictions.

Return type:

ndarray

inspect(features, labels, oof_folds)[source]

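Inspect the neural network model, e.g. by computing the feature importances.

evaluate(features, labels)[source]

Evaluate the neural network model with the evaluation metrics.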

set_file_paths(base_path)[source]

Set the paths for model, configuration and hyperparameter files.

Parameters:

base_path (string) – Base path from which to access the model files.

read_model_from_file()[source]

Read the neural network outcome prediction model from the model file path.

Returns:

Instance of the class Functional, which holds methods to make predictions from the neural network model.

Return type:

object of class Functional

write_model_to_file(prediction_model)[source]

Write the neural network outcome prediction model to the model file path.

Parameters:

prediction_model (object of class Functional) – Instance of the class Functional, which holds methods to make predictions from the neural network model.

read_configuration_from_file()[source]

Read the configuration dictionary from the configuration file path.

Returns:

Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

Return type:

dict

write_configuration_to_file(configuration)[source]

Write the configuration dictionary to the configuration file path.

Parameters:

configuration (dict) – Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

read_hyperparameters_from_file()[source]

Read the neural network outcome prediction model hyperparameters from the hyperparameter file path.

Returns:

Dictionary with the hyperparameter names and values for the neural network outcome prediction model.

Return type:

dict

write_hyperparameters_to_file(hyperparameters)[source]

Write the hyperparameter dictionary to the hyperparameter file path.

Parameters:

hyperparameters (dict) – Dictionary with the hyperparameter names and values for the neural network outcome prediction model.

class pyanno4rt.learning_model.frequentist.RandomForestModel(model_label, model_folder_path, dataset, preprocessing_steps, tune_space, tune_evaluations, tune_score, inspect_model, evaluate_model, display_options)[source]

Random forest model class.

This class enables building an individual preprocessing pipeline, fitting the random forest model to the input data, inspecting the model, making predictions with the model, and assessing the predictive performance using multiple evaluation metrics.

The training process includes sequential model-based hyperparameter optimization with tree-structured Parzen estimators and stratified k-fold cross-validation for the objective function evaluation. Cross-validation is also applied to (optionally) inspect the validation feature importances and to generate out-of-folds predictions as a full reconstruction of the input labels for generalization assessment.

Parameters:
  • model_label (string) – Label for the random forest model to be used for file naming.

  • dataset (dict) – Dictionary with the raw data set, the label viewpoint, the label bounds, the feature values and names, and the label values and names after modulation. In a compact way, this represents the input data for the random forest model.

  • preprocessing_steps (tuple) –

    Sequence of labels associated with preprocessing algorithms which make up the preprocessing pipeline for the random forest model. Currently available algorithm labels are:

    • transformers : ‘Equalizer’, ‘StandardScaler’, ‘Whitening’.

  • tune_space (dict) –

    Search space for the Bayesian hyperparameter optimization, including

    • ‘n_estimators’ : number of trees in the forest;

    • ‘criterion’ : measure for the quality of a split;

    • ‘max_depth’ : maximum depth of each tree;

    • ‘min_samples_split’ : minimum number of samples required for splitting each node;

    • ‘min_samples_leaf’ : minimum number of samples required at each leaf node;

    • ‘min_weight_fraction_leaf’ : minimum weighted fraction of the sum of weights required at each leaf node;

    • ‘max_features’ : maximum number of features taken into account when looking for the best split at each node;

    • ‘bootstrap’ : indicator for the use of bootstrap samples to build the trees;

    • ‘class_weight’ : weights associated with the classes;

    • ‘ccp_alpha’ : complexity parameter for minimal cost-complexity pruning.

  • tune_evaluations (int) – Number of evaluation steps (trials) for the Bayesian hyperparameter optimization.

  • tune_score (string) –

    Scoring function for the evaluation of the hyperparameter set candidates. Currently available scorers are:

    • ‘log_loss’ : negative log-likelihood score;

    • ‘roc_auc_score’ : area under the ROC curve score.

  • tune_splits (int) – Number of splits for the stratified cross-validation within each hyperparameter optimization step.

  • inspect_model (bool) – Indicator for the inspection of the model, e.g. the feature importances.

  • evaluate_model (bool) – Indicator for the evaluation of the model, e.g. the model KPIs.

  • oof_splits (int) – Number of splits for the stratified cross-validation within the out-of-folds evaluation step of the random forest model.
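The ‘bootstrap’ entry of the search space refers to drawing training samples with replacement for each tree. A minimal sketch of such a draw, using only the standard library, is given below; the function name and the sample size of ten are illustrative assumptions.

```python
import random


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices uniformly with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(0)  # fixed seed for reproducibility
indices = bootstrap_sample(10, rng)

# Samples never drawn are "out of bag" for this tree
out_of_bag = sorted(set(range(10)) - set(indices))
```

Each tree sees a different resampled view of the data, which decorrelates the trees; without bootstrapping, every tree is built on the full data set and the randomness comes only from the feature subsampling.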

preprocessor

Instance of the class DataPreprocessor, which holds methods to build the preprocessing pipeline, fit it to the input features, transform the features, and derive the gradient of the preprocessing algorithms w.r.t. the features.

Type:

object of class DataPreprocessor

features

Values of the input features.

Type:

ndarray

labels

Values of the input labels.

Type:

ndarray

configuration

Dictionary with information for the modeling, i.e., the dataset, the preprocessing, and the hyperparameter search space.

Type:

dict

model_path

Path for storing and retrieving the random forest model.

Type:

string

configuration_path

Path for storing and retrieving the configuration dictionary.

Type:

string

hyperparameter_path

Path for storing and retrieving the hyperparameter dictionary.

Type:

string

updated_model

Indicator for the update status of the model, triggers recalculating the model inspection and model evaluation classes.

Type:

bool

prediction_model

Instance of the class RandomForestClassifier, which holds methods to make predictions from the random forest model.

Type:

object of class RandomForestClassifier

inspector

Instance of the class ModelInspector, which holds methods to compute model inspection values, e.g. feature importances.

Type:

object of class ModelInspector

training_prediction

Array with the label predictions on the input data.

Type:

ndarray

oof_prediction

Array with the out-of-folds predictions on the input data.

Type:

ndarray

evaluator

Instance of the class ModelEvaluator, which holds methods to compute the evaluation metrics for a given array with label predictions.

Type:

object of class ModelEvaluator

Notes

Currently, the preprocessing pipeline for the model is restricted to transformations of the input feature values, e.g. scaling, dimensionality reduction or feature engineering. Transformations which affect the input labels in the same way, e.g. resampling or outlier removal, are not yet possible.

Overview

Methods

preprocess(features)

Preprocess the input feature vector with the built pipeline.

get_model(features, labels)

Get the random forest outcome prediction model by reading from the model file path, the datahub, or by training.

tune_hyperparameters(features, labels)

Tune the hyperparameters of the random forest model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

train(features, labels)

Train the random forest outcome prediction model.

predict(features)

Predict the label values from the feature values.

predict_oof(features, labels)

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

inspect(features, labels, oof_folds)

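Inspect the random forest model, e.g. by computing the feature importances.

evaluate(features, labels)

Evaluate the random forest model with the evaluation metrics.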

set_file_paths(base_path)

Set the paths for model, configuration and hyperparameter files.

read_model_from_file()

Read the random forest outcome prediction model from the model file path.

write_model_to_file(prediction_model)

Write the random forest outcome prediction model to the model file path.

read_configuration_from_file()

Read the configuration dictionary from the configuration file path.

write_configuration_to_file(configuration)

Write the configuration dictionary to the configuration file path.

read_hyperparameters_from_file()

Read the random forest outcome prediction model hyperparameters from the hyperparameter file path.

write_hyperparameters_to_file(hyperparameters)

Write the hyperparameter dictionary to the hyperparameter file path.

Members

preprocess(features)[source]

Preprocess the input feature vector with the built pipeline.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Array of transformed feature values.

Return type:

ndarray

get_model(features, labels)[source]

Get the random forest outcome prediction model by reading from the model file path, the datahub, or by training.

Returns:

Instance of the class RandomForestClassifier, which holds methods to make predictions from the random forest model.

Return type:

object of class RandomForestClassifier

tune_hyperparameters(features, labels)[source]

Tune the hyperparameters of the random forest model via sequential model-based optimization using the tree-structured Parzen estimator. As a variation, the objective function is evaluated based on a stratified k-fold cross-validation.

Returns:

tuned_hyperparameters – Dictionary with the hyperparameter names and values tuned via Bayesian hyperparameter optimization.

Return type:

dict

train(features, labels)[source]

Train the random forest outcome prediction model.

Returns:

prediction_model – Instance of the class RandomForestClassifier, which holds methods to make predictions from the random forest model.

Return type:

object of class RandomForestClassifier

predict(features)[source]

Predict the label values from the feature values.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Floating-point label prediction or array of label predictions.

Return type:

float or ndarray

predict_oof(features, labels)[source]

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

Parameters:

oof_splits (int) – Number of splits for the stratified cross-validation.

Returns:

Array with the out-of-folds label predictions.

Return type:

ndarray

inspect(features, labels, oof_folds)[source]

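Inspect the random forest model, e.g. by computing the feature importances.

evaluate(features, labels)[source]

Evaluate the random forest model with the evaluation metrics.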

set_file_paths(base_path)[source]

Set the paths for model, configuration and hyperparameter files.

Parameters:

base_path (string) – Base path from which to access the model files.

read_model_from_file()[source]

Read the random forest outcome prediction model from the model file path.

Returns:

Instance of the class RandomForestClassifier, which holds methods to make predictions from the random forest model.

Return type:

object of class RandomForestClassifier

write_model_to_file(prediction_model)[source]

Write the random forest outcome prediction model to the model file path.

Parameters:

prediction_model (object of class RandomForestClassifier) – Instance of the class RandomForestClassifier, which holds methods to make predictions from the random forest model.

read_configuration_from_file()[source]

Read the configuration dictionary from the configuration file path.

Returns:

Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

Return type:

dict

write_configuration_to_file(configuration)[source]

Write the configuration dictionary to the configuration file path.

Parameters:

configuration (dict) – Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

read_hyperparameters_from_file()[source]

Read the random forest outcome prediction model hyperparameters from the hyperparameter file path.

Returns:

Dictionary with the hyperparameter names and values for the random forest outcome prediction model.

Return type:

dict

write_hyperparameters_to_file(hyperparameters)[source]

Write the hyperparameter dictionary to the hyperparameter file path.

Parameters:

hyperparameters (dict) – Dictionary with the hyperparameter names and values for the random forest outcome prediction model.

class pyanno4rt.learning_model.frequentist.SupportVectorMachineModel(model_label, model_folder_path, dataset, preprocessing_steps, tune_space, tune_evaluations, tune_score, inspect_model, evaluate_model, display_options)[source]

Support vector machine model class.

This class enables building an individual preprocessing pipeline, fitting the support vector machine model to the input data, inspecting the model, making predictions with the model, and assessing the predictive performance using multiple evaluation metrics.

The training process includes sequential model-based hyperparameter optimization with tree-structured Parzen estimators and stratified k-fold cross-validation for the objective function evaluation. Cross-validation is also applied to (optionally) inspect the validation feature importances and to generate out-of-folds predictions as a full reconstruction of the input labels for generalization assessment.

Parameters:
  • model_label (string) – Label for the support vector machine model to be used for file naming.

  • dataset (dict) – Dictionary with the raw data set, the label viewpoint, the label bounds, the feature values and names, and the label values and names after modulation. In a compact way, this represents the input data for the support vector machine model.

  • preprocessing_steps (tuple) –

    Sequence of labels associated with preprocessing algorithms which make up the preprocessing pipeline for the support vector machine model. Current available algorithm labels are:

    • transformers : ‘Equalizer’, ‘StandardScaler’, ‘Whitening’.

  • tune_space (dict) –

    Search space for the Bayesian hyperparameter optimization, including

    • ‘C’ : inverse of the regularization strength;

    • ‘kernel’ : kernel type for the support vector machine;

    • ‘degree’ : degree of the polynomial kernel function;

    • ‘gamma’ : kernel coefficient for RBF, polynomial and sigmoid kernel;

    • ‘tol’ : tolerance for stopping criteria;

    • ‘class_weight’ : weights associated with the classes.

  • tune_evaluations (int) – Number of evaluation steps (trials) for the Bayesian hyperparameter optimization.

  • tune_score (string) –

    Scoring function for the evaluation of the hyperparameter set candidates. Current available scorers are:

    • ‘log_loss’ : negative log-likelihood score;

    • ‘roc_auc_score’ : area under the ROC curve score.

  • tune_splits (int) – Number of splits for the stratified cross-validation within each hyperparameter optimization step.

  • inspect_model (bool) – Indicator for the inspection of the model, e.g. the feature importances.

  • evaluate_model (bool) – Indicator for the evaluation of the model, e.g. the model KPIs.

  • oof_splits (int) – Number of splits for the stratified cross-validation within the out-of-folds evaluation step of the support vector machine model.
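
What the two tune_score options measure can be shown with a minimal plain-Python sketch for binary labels. The function names and the toy data are assumptions for illustration; the package's own scorer implementations are not shown in this reference.

```python
import math


def negative_log_likelihood(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood (log loss) of binary labels."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def area_under_roc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for positive in positives:
        for negative in negatives:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]
loss = negative_log_likelihood(labels, probabilities)
auc = area_under_roc(labels, probabilities)
print(loss, auc)
```

A lower log loss and a higher area under the ROC curve both indicate better predictions. The log loss is sensitive to the calibration of the predicted probabilities, while the area under the ROC curve depends only on their ranking.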

preprocessor

Instance of the class DataPreprocessor, which holds methods to build the preprocessing pipeline, fit it to the input features, transform the features, and derive the gradient of the preprocessing algorithms with respect to the features.

Type:

object of class DataPreprocessor

features

Values of the input features.

Type:

ndarray

labels

Values of the input labels.

Type:

ndarray

configuration

Dictionary with information for the modeling, i.e., the dataset, the preprocessing, and the hyperparameter search space.

Type:

dict

model_path

Path for storing and retrieving the support vector machine model.

Type:

string

configuration_path

Path for storing and retrieving the configuration dictionary.

Type:

string

hyperparameter_path

Path for storing and retrieving the hyperparameter dictionary.

Type:

string

updated_model

Indicator for the update status of the model, which triggers recalculation of the model inspection and model evaluation classes.

Type:

bool

prediction_model

Instance of the class SVC, which holds methods to make predictions from the support vector machine model.

Type:

object of class SVC

inspector

Instance of the class ModelInspector, which holds methods to compute model inspection values, e.g. feature importances.

Type:

object of class ModelInspector

training_prediction

Array with the label predictions on the input data.

Type:

ndarray

oof_prediction

Array with the out-of-folds predictions on the input data.

Type:

ndarray

evaluator

Instance of the class ModelEvaluator, which holds methods to compute the evaluation metrics for a given array with label predictions.

Type:

object of class ModelEvaluator

Notes

Currently, the preprocessing pipeline for the model is restricted to transformations of the input feature values, e.g. scaling, dimensionality reduction or feature engineering. Transformations which affect the input labels in the same way, e.g. resampling or outlier removal, are not yet possible.
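
A feature-only transformation of the kind described in this note can be sketched as a standard scaler in plain Python, including the gradient of the transformed features with respect to the inputs. The class name and its methods are hypothetical and only illustrate the concept; they are not the interface of DataPreprocessor.

```python
import math


class StandardScalerSketch:
    """Hypothetical feature-only transformer: z = (x - mean) / std."""

    def fit(self, features):
        n_rows = len(features)
        n_cols = len(features[0])
        self.means = [sum(row[j] for row in features) / n_rows
                      for j in range(n_cols)]
        self.stds = []
        for j in range(n_cols):
            variance = sum((row[j] - self.means[j]) ** 2
                           for row in features) / n_rows
            # Fall back to 1.0 for constant columns to avoid division by zero
            self.stds.append(math.sqrt(variance) or 1.0)
        return self

    def transform(self, features):
        """Scale every column; the labels are never touched."""
        return [[(x - m) / s
                 for x, m, s in zip(row, self.means, self.stds)]
                for row in features]

    def gradient(self):
        """Diagonal of the Jacobian dz/dx, i.e. 1 / std per column."""
        return [1.0 / s for s in self.stds]


features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
labels = [0, 1, 1]  # unchanged by the feature transformation

scaler = StandardScalerSketch().fit(features)
transformed = scaler.transform(features)
jacobian_diagonal = scaler.gradient()
print(transformed, jacobian_diagonal)
```

The number and order of the samples stay the same, so the labels remain aligned with the transformed features. A resampling or outlier removal step would break this alignment unless it also acted on the labels.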

Overview

Methods

preprocess(features)

Preprocess the input feature vector with the built pipeline.

get_model(features, labels)

Get the support vector machine outcome prediction model by reading it from the model file path or the datahub, or by training.

tune_hyperparameters(features, labels)

Tune the hyperparameters of the support vector machine model via sequential model-based optimization using the tree-structured Parzen estimator. The objective function is evaluated with a stratified k-fold cross-validation rather than on a single train-validation split.

train(features, labels)

Train the support vector machine outcome prediction model.

predict(features)

Predict the label values from the feature values.

predict_oof(features, labels)

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

inspect(features, labels, oof_folds)

Inspect the model, e.g. the feature importances.

evaluate(features, labels)

Evaluate the model, e.g. the model KPIs.

set_file_paths(base_path)

Set the paths for model, configuration and hyperparameter files.

read_model_from_file()

Read the support vector machine outcome prediction model from the model file path.

write_model_to_file(prediction_model)

Write the support vector machine outcome prediction model to the model file path.

read_configuration_from_file()

Read the configuration dictionary from the configuration file path.

write_configuration_to_file(configuration)

Write the configuration dictionary to the configuration file path.

read_hyperparameters_from_file()

Read the support vector machine outcome prediction model hyperparameters from the hyperparameter file path.

write_hyperparameters_to_file(hyperparameters)

Write the hyperparameter dictionary to the hyperparameter file path.

Members

preprocess(features)[source]

Preprocess the input feature vector with the built pipeline.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Array of transformed feature values.

Return type:

ndarray

get_model(features, labels)[source]

Get the support vector machine outcome prediction model by reading it from the model file path or the datahub, or by training.

Returns:

Instance of the class SVC, which holds methods to make predictions from the support vector machine model.

Return type:

object of class SVC

tune_hyperparameters(features, labels)[source]

Tune the hyperparameters of the support vector machine model via sequential model-based optimization using the tree-structured Parzen estimator. The objective function is evaluated with a stratified k-fold cross-validation rather than on a single train-validation split.

Returns:

tuned_hyperparameters – Dictionary with the hyperparameter names and values tuned via Bayesian hyperparameter optimization.

Return type:

dict

train(features, labels)[source]

Train the support vector machine outcome prediction model.

Returns:

prediction_model – Instance of the class SVC, which holds methods to make predictions from the support vector machine model.

Return type:

object of class SVC

predict(features)[source]

Predict the label values from the feature values.

Parameters:

features (ndarray) – Array of input feature values.

Returns:

Floating-point label prediction or array of label predictions.

Return type:

float or ndarray

predict_oof(features, labels)[source]

Predict the out-of-folds (OOF) labels using a stratified k-fold cross-validation.

Parameters:

oof_splits (int) – Number of splits for the stratified cross-validation.

Returns:

Array with the out-of-folds label predictions.

Return type:

ndarray

inspect(features, labels, oof_folds)[source]

Inspect the model, e.g. the feature importances.

evaluate(features, labels)[source]

Evaluate the model, e.g. the model KPIs.

set_file_paths(base_path)[source]

Set the paths for model, configuration and hyperparameter files.

Parameters:

base_path (string) – Base path from which to access the model files.

read_model_from_file()[source]

Read the support vector machine outcome prediction model from the model file path.

Returns:

Instance of the class SVC, which holds methods to make predictions from the support vector machine model.

Return type:

object of class SVC

write_model_to_file(prediction_model)[source]

Write the support vector machine outcome prediction model to the model file path.

Parameters:

prediction_model (object of class SVC) – Instance of the class SVC, which holds methods to make predictions from the support vector machine model.

read_configuration_from_file()[source]

Read the configuration dictionary from the configuration file path.

Returns:

Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

Return type:

dict

write_configuration_to_file(configuration)[source]

Write the configuration dictionary to the configuration file path.

Parameters:

configuration (dict) – Dictionary with information for the modeling, i.e., the dataset, the preprocessing steps, and the hyperparameter search space.

read_hyperparameters_from_file()[source]

Read the support vector machine outcome prediction model hyperparameters from the hyperparameter file path.

Returns:

Dictionary with the hyperparameter names and values for the support vector machine outcome prediction model.

Return type:

dict

write_hyperparameters_to_file(hyperparameters)[source]

Write the hyperparameter dictionary to the hyperparameter file path.

Parameters:

hyperparameters (dict) – Dictionary with the hyperparameter names and values for the support vector machine outcome prediction model.
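
The read and write methods above follow a simple round-trip pattern, which the sketch below illustrates for the hyperparameter dictionary. The file format and file names are not stated in this reference, so the JSON format, the path layout, and all helper names here are assumptions. The dictionary keys follow the tune_space parameter, while the values are example values only.

```python
import json
import os
import tempfile


def build_file_paths(base_path, model_label):
    """Hypothetical path layout derived from a base path and model label."""
    return {
        'configuration': os.path.join(
            base_path, model_label + '_configuration.json'),
        'hyperparameters': os.path.join(
            base_path, model_label + '_hyperparameters.json'),
    }


def write_dict_to_file(path, data):
    """Write a dictionary to the given path as JSON (assumed format)."""
    with open(path, 'w', encoding='utf-8') as file:
        json.dump(data, file, indent=2)


def read_dict_from_file(path):
    """Read a dictionary back from a JSON file."""
    with open(path, 'r', encoding='utf-8') as file:
        return json.load(file)


# Example values only; the keys follow the tune_space parameter
hyperparameters = {
    'C': 1.0,
    'kernel': 'rbf',
    'degree': 3,
    'gamma': 'scale',
    'tol': 0.001,
    'class_weight': None,
}

with tempfile.TemporaryDirectory() as base_path:
    paths = build_file_paths(base_path, 'svm_model')
    write_dict_to_file(paths['hyperparameters'], hyperparameters)
    restored = read_dict_from_file(paths['hyperparameters'])

print(restored)
```

Storing the tuned hyperparameters separately from the fitted model makes it possible to retrain with the same settings without repeating the optimization.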