domainlab.utils package¶
Submodules¶
domainlab.utils.flows_gen_img_model module¶
domainlab.utils.generate_benchmark_plots module¶
generate the benchmark plots by calling the gen_benchmark_plots(…) function
- domainlab.utils.generate_benchmark_plots.boxplot(dataframe_in, obj, file=None)[source]¶
generate the boxplots
dataframe_in: dataframe containing the data with columns
[param_idx, task, algo, epos, te_d, seed, params, obj1, …, obj2]
obj: objective to be considered in the plot (needs to be contained in dataframe_in)
file: folder name to save the plots (if None, the plot will not be saved)
- domainlab.utils.generate_benchmark_plots.boxplot_stochastic(dataframe_in, obj, file=None)[source]¶
generate boxplot for stochastic variation
dataframe_in: dataframe containing the data with columns
[param_idx, task, algo, epos, te_d, seed, params, obj1, …, obj2]
obj: objective to be considered in the plot (needs to be contained in dataframe_in)
file: folder name to save the plots (if None, the plot will not be saved)
- domainlab.utils.generate_benchmark_plots.boxplot_systematic(dataframe_in, obj, file=None)[source]¶
generate boxplot for systematic variation
dataframe_in: dataframe containing the data with columns
[param_idx, task, algo, epos, te_d, seed, params, obj1, …, obj2]
obj: objective to be considered in the plot (needs to be contained in dataframe_in)
file: folder name to save the plots (if None, the plot will not be saved)
- domainlab.utils.generate_benchmark_plots.gen_benchmark_plots(agg_results: str, output_dir: str, use_param_index: bool = True)[source]¶
generate the benchmark plots from a csv file containing the aggregated results. The csv file must have the columns [param_index, task, algo, epos, te_d, seed, params, …]; all columns after seed are interpreted as objectives of the results, e.g. acc, precision, recall, specificity, f1, auroc.
agg_results: path to the csv file
output_dir: path to a folder which shall contain the results
skip_gen: skips the actual plotting, used to speed up testing
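A minimal usage sketch; the file paths are placeholders and the csv must follow the column layout described above:

    from domainlab.utils.generate_benchmark_plots import gen_benchmark_plots

    gen_benchmark_plots(
        agg_results="results.csv",   # csv with columns [param_index, task, algo, epos, te_d, seed, params, ...]
        output_dir="plots",          # folder that will receive the generated plots
        use_param_index=True,        # label hyperparameter setups by their index instead of exact values
    )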
- domainlab.utils.generate_benchmark_plots.gen_plots(dataframe: DataFrame, output_dir: str, use_param_index: bool)[source]¶
dataframe: dataframe with columns [‘param_index’, ‘task’, ‘algo’, ‘epos’, ‘te_d’, ‘seed’, ‘params’, ‘acc’, ‘precision’, …]
- domainlab.utils.generate_benchmark_plots.radar_plot(dataframe_in, file=None, distinguish_hyperparam=True)[source]¶
- dataframe_in: dataframe containing the data with columns
[algo, epos, te_d, seed, params, obj1, …, obj2]
file: filename to save the plots (if None, the plot will not be saved)
distinguish_param_setups: if True the plot will not only distinguish between models,
but also between the parameter setups
- domainlab.utils.generate_benchmark_plots.round_vals_in_dict(df_column_in, use_param_index)[source]¶
replaces the dictionary by a string containing only the significant digits of the hyperparameters or (if use_param_index = True) by the parameter index
df_column_in: columns of the dataframe containing the param index and the dictionary of
hyperparams in the form [param_index, params]
use_param_index: use the param index instead of the exact values
- domainlab.utils.generate_benchmark_plots.scatterplot(dataframe_in, obj, file=None, kde=True, distinguish_hyperparam=False)[source]¶
- dataframe_in: dataframe containing the data with columns
[algo, epos, te_d, seed, params, obj1, …, obj2]
obj1 & obj2: names of the objectives which shall be plotted against each other
file: filename to save the plots (if None, the plot will not be saved)
kde: if True the distribution of the points will be estimated and plotted as a kde plot
distinguish_param_setups: if True the plot will not only distinguish between models,
but also between the parameter setups
- domainlab.utils.generate_benchmark_plots.scatterplot_matrix(dataframe_in, use_param_index, file=None, kind='reg', distinguish_param_setups=True)[source]¶
- dataframe_in: dataframe containing the data with columns
[algo, epos, te_d, seed, params, obj1, …, obj2]
file: filename to save the plots (if None, the plot will not be saved)
kind: plot kind; with the default ‘reg’ a regression line is plotted over the data
distinguish_param_setups: if True the plot will not only distinguish between models,
but also between the parameter setups
domainlab.utils.get_git_tag module¶
domainlab.utils.hyperparameter_gridsearch module¶
gridsearch for the hyperparameter space
add_next_param_from_list is a recursive function that builds the cartesian product along all the scalar hyper-parameters; this recursive function is used in grid_task
- domainlab.utils.hyperparameter_gridsearch.add_next_param_from_list(param_grid: dict, grid: dict, grid_df: DataFrame)[source]¶
can be used in a recursive fashion to add all combinations of the parameters in param_grid to grid_df
param_grid: dictionary with all possible values for each parameter,
{‘p1’: [1, 2, 3], ‘p2’: [0, 5], …}
grid: a grid which builds itself up during the recursion; start with grid = {},
after one step grid = {p1: 1}
grid_df: dataframe which saves the finished grids
task_name: task name, also the G_MODEL_NA name
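For intuition, the set of grid points this recursion enumerates is equivalent to a plain cartesian product of the per-parameter value lists; a small illustration (not the internal implementation):

    from itertools import product

    # the per-parameter value lists from the docstring example above
    param_grid = {"p1": [1, 2, 3], "p2": [0, 5]}

    # all combinations, i.e. what ends up as rows of the grid:
    # {'p1': 1, 'p2': 0}, {'p1': 1, 'p2': 5}, ..., {'p1': 3, 'p2': 5} -> 6 grid points
    grid_points = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]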
- domainlab.utils.hyperparameter_gridsearch.add_references_and_check_constraints(grid_df_prior, grid_df, referenced_params, config, task_name)[source]¶
in the last step, all parameters which are referenced need to be added to the grid. All grid points not satisfying the constraints are removed afterwards.
use the parameters in the dataframe of shared parameters and add them to the dictionary of parameters for the current task; only the shared parameters specified in the config are respected
shared_df: dataframe of shared hyperparameters
dict_param_grids: dictionary of the parameter grids
config: config for the current task
go back from the dataframe format of the shared hyperparameters to a list format
- domainlab.utils.hyperparameter_gridsearch.grid_task(grid_df: DataFrame, task_name: str, config: dict, shared_df: DataFrame)[source]¶
create grid for one sampling task for a method and add it to the dataframe
- domainlab.utils.hyperparameter_gridsearch.lognormal_grid(param_config)[source]¶
get a lognormally distributed grid given the specifications in the param_config
param_config: config which needs to contain ‘num’, ‘mean’, ‘std’
- domainlab.utils.hyperparameter_gridsearch.loguniform_grid(param_config)[source]¶
get a loguniform distributed grid given the specifications in the param_config
param_config: config which needs to contain ‘num’, ‘max’, ‘min’
- domainlab.utils.hyperparameter_gridsearch.normal_grid(param_config, lognormal=False)[source]¶
get a normally distributed grid given the specifications in the param_config
param_config: config which needs to contain ‘num’, ‘mean’, ‘std’
- domainlab.utils.hyperparameter_gridsearch.rais_error_if_num_not_specified(param_name: str, param_config: dict)[source]¶
for each parameter, a number of grid points needs to be specified; this function raises an error if this is not the case
param_name: parameter name under consideration
param_config: config of this parameter
- domainlab.utils.hyperparameter_gridsearch.round_to_discreate_grid_normal(grid, param_config)[source]¶
round the values of the grid to the grid spacing specified in the config for normal and lognormal grids
- domainlab.utils.hyperparameter_gridsearch.round_to_discreate_grid_uniform(grid, param_config)[source]¶
round the values of the grid to the grid spacing specified in the config for uniform and loguniform grids
- domainlab.utils.hyperparameter_gridsearch.sample_grid(param_config)[source]¶
given the parameter config, this function samples all parameters which are distributed according to the categorical, uniform, loguniform, normal or lognormal distribution.
- domainlab.utils.hyperparameter_gridsearch.sample_gridsearch(config: dict, dest: str | None = None) DataFrame [source]¶
create the hyperparameter grid according to the given config, which should be the dictionary of the full benchmark config yaml. The result is saved to ‘output_dir/hyperparameters.csv’ of the config if dest is not specified explicitly.
Note: Parts of the yaml content are executed. Thus use this only with trusted config files.
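A usage sketch, assuming the benchmark config has already been written as a yaml file (the file name is hypothetical; only use trusted config files, see the note above):

    import yaml
    from domainlab.utils.hyperparameter_gridsearch import sample_gridsearch

    with open("benchmark_config.yaml", "r", encoding="utf8") as stream:
        config = yaml.safe_load(stream)   # dictionary of the full benchmark config yaml

    # with dest=None the result would instead go to output_dir/hyperparameters.csv from the config
    grid_df = sample_gridsearch(config, dest="hyperparameters.csv")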
domainlab.utils.hyperparameter_retrieval module¶
retrieval for hyperparameters
domainlab.utils.hyperparameter_sampling module¶
Samples the hyperparameters according to a benchmark configuration file.
Structure of this file:
- Class Hyperparameter
- Inherited classes
- Functions to sample hyper-parameters and log them into a csv file
- class domainlab.utils.hyperparameter_sampling.CategoricalHyperparameter(name: str, config: dict)[source]¶
Bases: Hyperparameter
A sampled hyperparameter, which is constrained to fixed, user-given values and datatype
- class domainlab.utils.hyperparameter_sampling.Hyperparameter(name: str)[source]¶
Bases: object
Represents a hyperparameter. The datatype of .val is int if step and p1 are integer valued, else float.
p1: min or mean
p2: max or scale
reference: None or name of the referenced hyperparameter
- class domainlab.utils.hyperparameter_sampling.ReferenceHyperparameter(name: str, config: dict)[source]¶
Bases: Hyperparameter
Hyperparameter that references only a different one. Thus, this parameter is not sampled but set after sampling.
- class domainlab.utils.hyperparameter_sampling.SampledHyperparameter(name: str, config: dict)[source]¶
Bases: Hyperparameter
A numeric hyperparameter that shall be sampled
- domainlab.utils.hyperparameter_sampling.check_constraints(params: List[Hyperparameter], constraints) bool [source]¶
Check if the constraints are fulfilled.
add information like task, G_MODEL_NA and constraints to the shared samples
Parameters:
shared_samples: pd DataFrame with columns [G_METHOD_NA, G_MODEL_NA, ‘params’]
config: dataframe with the yaml configuration of the current task
task_name: name of the current task
- domainlab.utils.hyperparameter_sampling.get_hyperparameter(name: str, config: dict) Hyperparameter [source]¶
Factory function. Instantiates the correct Hyperparameter
creates a dataframe with columns [task, G_MODEL_NA, params];
task and G_MODEL_NA are the same for all rows, while params is filled with the shared parameters of shared_samples_full requested by task_config.
creates a shared config containing only information about the
shared hyperparameters requested by the task_config
- domainlab.utils.hyperparameter_sampling.is_dict_with_key(input_dict, key) bool [source]¶
Determines whether the input argument is a dictionary and has the given key
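A small sketch of the expected behaviour (the dictionary contents are arbitrary examples):

    from domainlab.utils.hyperparameter_sampling import is_dict_with_key

    is_dict_with_key({"distribution": "uniform"}, "distribution")   # True
    is_dict_with_key({"distribution": "uniform"}, "num")            # False
    is_dict_with_key("not a dict", "distribution")                  # False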
- domainlab.utils.hyperparameter_sampling.sample_hyperparameters(config: dict, dest: str | None = None, sampling_seed: int | None = None) DataFrame [source]¶
Samples the hyperparameters according to the given config, which should be the dictionary of the full benchmark config yaml. The result is saved to ‘output_dir/hyperparameters.csv’ of the config if dest is not specified explicitly.
Note: Parts of the yaml content are executed. Thus use this only with trusted config files.
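A usage sketch analogous to sample_gridsearch above (file name and seed are placeholders; only use trusted config files, see the note above):

    import yaml
    from domainlab.utils.hyperparameter_sampling import sample_hyperparameters

    with open("benchmark_config.yaml", "r", encoding="utf8") as stream:
        config = yaml.safe_load(stream)   # dictionary of the full benchmark config yaml

    # a fixed sampling_seed makes the drawn hyperparameters reproducible
    samples_df = sample_hyperparameters(config, dest="hyperparameters.csv", sampling_seed=0)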
- domainlab.utils.hyperparameter_sampling.sample_parameters(init_params: List[Hyperparameter], constraints, shared_config=None, shared_samples=None) dict [source]¶
Tries to sample from the hyperparameter list.
Raises an error if no sample complying with the constraints is found within 10_0000 attempts.
- domainlab.utils.hyperparameter_sampling.sample_task(num_samples: int, task_name: str, conf_samp: tuple, shared_conf_samp: tuple)[source]¶
Sample one task and add it to the dataframe
sample one task and add it to the dataframe for task descriptions which only contain shared hyperparameters
domainlab.utils.logger module¶
A logger for our software
- class domainlab.utils.logger.Logger[source]¶
Bases: object
static logger class
- static get_logger(logger_name='logger_6676', loglevel='INFO')[source]¶
returns a logger. If no logger was created yet, it creates one with the name specified in logger_name and the level specified in loglevel. Once the logger has been created, subsequent calls return the same instance and the arguments no longer change its behaviour.
- logger = None¶
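A minimal sketch, assuming the returned object behaves like a standard logging.Logger:

    from domainlab.utils.logger import Logger

    # the first call creates the logger; later calls return the same instance,
    # so these arguments only take effect on the first call
    logger = Logger.get_logger(logger_name="logger_6676", loglevel="INFO")
    logger.info("starting benchmark run")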
domainlab.utils.override_interface module¶
- domainlab.utils.override_interface.override_interface(interface_class)[source]¶
decorator to mark that a method overrides one declared in the interface class. :param interface_class: the interface class; always specify this explicitly, since otherwise interface_class will be the nearest function the decorator is applied to, and the argument “method2override” accepted by the returned function “overrider” will be the current child class
Example:
    class BaseClass():
        def fun(self):
            pass

    class Child(BaseClass):
        @overrides(BaseClass)
        def fun(self):
            pass
domainlab.utils.perf module¶
Classification Performance
- class domainlab.utils.perf.PerfClassif[source]¶
Bases: object
Classification Performance
- classmethod cal_acc(model, loader_te, device)[source]¶
- Parameters:
model –
loader_te –
device – for final test, GPU can be used
domainlab.utils.perf_metrics module¶
Classification Performance
domainlab.utils.sanity_check module¶
This class is used to perform the sanity check on a task description
- class domainlab.utils.sanity_check.SanityCheck(args, task)[source]¶
Bases: object
Performs a sanity check on the given args and the task when running dataset_sanity_check(self)
domainlab.utils.test_img module¶
domainlab.utils.u_import module¶
domainlab.utils.u_import_net_module module¶
import external neural network implementation
- domainlab.utils.u_import_net_module.build_external_obj_net_module_feat_extract(mpath, dim_y, remove_last_layer)[source]¶
The user provides a function to instantiate an object of the neural network, which is fine for training but problematic for persistence of the trained model since it is created externally.
:param mpath: path of the external python file where the neural network architecture is defined
:param dim_y: dimension of features
domainlab.utils.utils_class module¶
domainlab.utils.utils_classif module¶
- domainlab.utils.utils_classif.get_label_na(tensor_ind, list_str_na)[source]¶
given a list of label names as strings, map a tensor of indices to label names
domainlab.utils.utils_cuda module¶
choose devices