pyanno4rt.learning_model.dataset._tabular_data_generator
Tabular dataset generation.
Overview
TabularDataGenerator – Tabular dataset generation class.
Classes
- class pyanno4rt.learning_model.dataset._tabular_data_generator.TabularDataGenerator(model_label, feature_filter, label_name, label_bounds, time_variable_name, label_viewpoint, tune_splits, oof_splits)[source]
Tabular dataset generation class.
This class provides methods to load, decompose, modulate and binarize a tabular base dataset.
- Parameters:
model_label (str) – Label for the machine learning model.
feature_filter (dict) – Dictionary with a list of feature names and a value from {‘retain’, ‘remove’} as an indicator for retaining/removing the features prior to model fitting.
label_name (str) – Name of the label variable.
label_bounds (list) – Bounds used to binarize the label values: values inside the bounds form the positive class, values outside the bounds form the negative class.
time_variable_name (str) – Name of the time-after-radiotherapy variable (unit should be days).
label_viewpoint ({'early', 'late', 'long-term', 'longitudinal', 'profile'}) – Time of observation for the presence of tumor control and/or normal tissue complication events.
tune_splits (int) – Number of splits for the stratified cross-validation within each model hyperparameter optimization step.
oof_splits (int) – Number of splits for the stratified cross-validation within the out-of-folds model evaluation step.
- model_label
See ‘Parameters’.
- Type:
str
- feature_filter
See ‘Parameters’.
- Type:
dict
- label_name
See ‘Parameters’.
- Type:
str
- label_bounds
See ‘Parameters’.
- Type:
list
- time_variable_name
See ‘Parameters’.
- Type:
str
- label_viewpoint
See ‘Parameters’.
- Type:
{‘early’, ‘late’, ‘long-term’, ‘longitudinal’, ‘profile’}
- tune_splits
See ‘Parameters’.
- Type:
int
- oof_splits
See ‘Parameters’.
- Type:
int
Overview
Methods
generate(data_path) – Generate the data information.
decompose(dataset, feature_filter, label_name, time_variable_name) – Decompose the base tabular dataset.
modulate(data_information, label_viewpoint) – Modulate the data information.
binarize(data_information, label_bounds) – Binarize the data information.
add_fold_numbers(data_information, tune_splits, oof_splits) – Add the stratified cross-validation fold numbers.
Members
- generate(data_path)[source]
Generate the data information.
- Parameters:
data_path (str) – Path to the data set used for fitting the machine learning model.
- Returns:
Dictionary with the decomposed, modulated and binarized data information.
- Return type:
dict
- decompose(dataset, feature_filter, label_name, time_variable_name)[source]
Decompose the base tabular dataset.
- Parameters:
dataset (DataFrame) – DataFrame with the feature and label names/values.
feature_filter (dict) – Dictionary with a list of feature names and a value from {‘retain’, ‘remove’} as an indicator for retaining/removing the features prior to model fitting.
label_name (str) – Name of the label variable.
time_variable_name (str) – Name of the time-after-radiotherapy variable (unit should be days).
- Returns:
Dictionary with the decomposed data information.
- Return type:
dict
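The retain/remove semantics of feature_filter can be illustrated with a minimal sketch. It is not the library's implementation: the helper select_features, the dictionary keys "features" and "filter_mode", and the column names are all hypothetical, since the documentation above only states that the dictionary holds a list of feature names and a mode from {‘retain’, ‘remove’}. The sketch also assumes that the label and time columns are never treated as features.

```python
def select_features(columns, feature_filter, label_name, time_variable_name=None):
    """Return the feature columns that survive the filter (illustrative only)."""
    # Hypothetical key names: the documentation only says the dictionary holds
    # a list of feature names and a mode from {'retain', 'remove'}
    names = set(feature_filter["features"])
    mode = feature_filter["filter_mode"]
    if mode not in ("retain", "remove"):
        raise ValueError(f"Unknown filter mode: {mode!r}")

    # Assumption: label and time columns are excluded from the feature set
    reserved = {label_name, time_variable_name}
    candidates = [c for c in columns if c not in reserved]

    if mode == "retain":
        # Keep only the listed features
        return [c for c in candidates if c in names]
    # Otherwise drop the listed features and keep the rest
    return [c for c in candidates if c not in names]


# Hypothetical column names for demonstration
columns = ["age", "dose_mean", "volume", "time_days", "grade"]
kept = select_features(
    columns,
    {"features": ["volume"], "filter_mode": "remove"},
    label_name="grade",
    time_variable_name="time_days",
)
print(kept)
```

With the mode set to "retain" and the same feature list, only "volume" would remain.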
- modulate(data_information, label_viewpoint)[source]
Modulate the data information.
- Parameters:
data_information (dict) – Dictionary with the decomposed data information.
label_viewpoint ({'early', 'late', 'long-term', 'longitudinal', 'profile'}) – Time of observation for the presence of tumor control and/or normal tissue complication events.
- Returns:
Dictionary with the modulated data information.
- Return type:
dict
- binarize(data_information, label_bounds)[source]
Binarize the data information.
- Parameters:
data_information (dict) – Dictionary with the decomposed data information.
label_bounds (list) – Bounds used to binarize the label values: values inside the bounds form the positive class, values outside the bounds form the negative class.
- Returns:
Dictionary with the binarized data information.
- Return type:
dict
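The bounds-based binarization described above can be sketched in a few lines. The helper binarize_labels is hypothetical and operates on a plain list rather than on the data information dictionary. It also assumes that both bounds are inclusive, which the documentation does not state.

```python
def binarize_labels(values, label_bounds):
    """Map each label value to 1 (inside the bounds) or 0 (outside)."""
    lower, upper = label_bounds
    # Assumption: the interval is closed, so values equal to a bound count
    # as positive
    return [1 if lower <= value <= upper else 0 for value in values]


# Hypothetical toxicity grades; grades 1 and 2 form the positive class
grades = [0, 1, 2, 3, 1]
binary = binarize_labels(grades, [1, 2])
print(binary)
```

Setting both bounds to the same value, for example [1, 1], selects exactly one label value as the positive class under this inclusive reading.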
- add_fold_numbers(data_information, tune_splits, oof_splits)[source]
Add the stratified cross-validation fold numbers.
- Parameters:
data_information (dict) – Dictionary with the preprocessed data information.
tune_splits (int) – Number of splits for the stratified cross-validation within each model hyperparameter optimization step.
oof_splits (int) – Number of splits for the stratified cross-validation within the out-of-folds model evaluation step.
- Returns:
Dictionary with the stratified cross-validation fold numbers.
- Return type:
dict
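To show what a stratified fold number is, the following sketch assigns each sample a fold so that every class is spread evenly across the folds. It is a simplified, deterministic stand-in written with the standard library only. It is not the library's code, and production code would more likely rely on a dedicated splitter such as scikit-learn's StratifiedKFold. The helper stratified_fold_numbers and the example labels are hypothetical.

```python
from collections import defaultdict


def stratified_fold_numbers(labels, n_splits):
    """Assign a fold number to each sample, balancing classes across folds."""
    # Group the sample indices by their (binarized) class label
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal the samples of each class to the folds in round-robin order, so
    # the class proportions are roughly equal in every fold
    folds = [None] * len(labels)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[index] = position % n_splits
    return folds


# Hypothetical binarized labels: four negative and four positive samples
labels = [0, 0, 0, 0, 1, 1, 1, 1]
oof_folds = stratified_fold_numbers(labels, n_splits=2)
print(oof_folds)
```

The same helper could be called once with tune_splits and once with oof_splits to obtain separate fold numbers for hyperparameter tuning and for out-of-folds evaluation.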