Benchmarking with DomainLab

Documentation for Benchmark in Markdown

The package offers the ability to benchmark different user-defined experiments against each other, as well as against different hyperparameter settings and random seeds. The results are collected in a csv file and also visualized in charts.

Within each benchmark, two aspects are considered:

  1. Stochastic variation: variation of the performance with respect to different random seeds.

  2. Sensitivity to selected hyperparameters: by sampling hyperparameters randomly, the performance with respect to different hyperparameter choices is investigated.

Dependencies installation

DomainLab relies on Snakemake for its benchmark functionality.

Unix installation

Snakemake depends on pulp. Due to an upgrade of pulp, Snakemake can become unstable, so we recommend installing the following pinned versions:

pip install snakemake==7.32.0
pip install pulp==2.7.0
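
As an optional sanity check (assuming both packages were installed into the currently active environment), one can verify that the pinned versions are actually picked up:

snakemake --version                               # expected: 7.32.0
python -c "import pulp; print(pulp.__version__)"  # expected: 2.7.0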

Windows installation details

Benchmarking is currently not tested on Windows due to the dependency on Snakemake and datrie. One could, however, try to install a minimal Snakemake via mamba create -c bioconda -c conda-forge -n snakemake snakemake-minimal and check whether the functionality described below still works.

Setting up a benchmark

The benchmark is configured in a yaml file. We refer to doc_benchmark_yaml.md for a documented example.

Running a benchmark

For the execution of a benchmark we provide two scripts in our repository:

  • running the benchmark on a standalone machine (computation node): run_benchmark_standalone.sh

  • launching the benchmark on the login node of a slurm cluster (the benchmark will be dispatched to computation nodes via DomainLab scripts): run_benchmark_slurm.sh

Benchmark on a standalone machine/computation node (with or without GPU)

To run the benchmark with a specific configuration on a standalone machine, one can execute the following inside the DomainLab folder (we assume a machine with 4 or more cores):

# Note: this has only been tested on Linux based systems and may not work on Windows
./run_benchmark_standalone.sh ./examples/benchmark/demo_benchmark.yaml 0  0  2

where the first argument is the benchmark configuration file (mandatory), the second and third arguments are the starting seeds for CUDA and for the hyperparameter sampling (both optional), and the fourth argument is the number of GPUs to use (optional, defaults to one). If your machine does not have a GPU, the CPU is used.
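
For reference, the positional arguments of the call above map as follows (a restatement of the example, no new options):

# 1: benchmark configuration yaml (mandatory)
# 2: starting seed for cuda (optional)
# 3: starting seed for hyperparameter sampling (optional)
# 4: number of GPUs to use (optional, defaults to one)
./run_benchmark_standalone.sh ./examples/benchmark/demo_benchmark.yaml 0 0 2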

In case of a Snakemake error, try rm -r .snakemake/

Benchmark on a HPC cluster with slurm

If you have access to an HPC cluster with slurm support: on a submission node, clone the DomainLab repository, cd into it and execute the following command:

Make sure to use a tool like nohup or tmux to keep the following command active (see the example below)!

It is a good idea to use the standalone script to test whether the yaml file works before submitting to the slurm cluster.

# Note: this has only been tested on Linux based systems and may not work on Windows
./run_benchmark_slurm.sh ./examples/benchmark/demo_benchmark.yaml

Similar to the standalone version explained above, the user can also specify random seeds for hyperparameter sampling and PyTorch.
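
For example, one way to keep the launcher alive after logging out is nohup (tmux works just as well); the log file name benchmark_launch.log below is an arbitrary choice for this sketch:

# launch from inside the cloned DomainLab repository on the submission node
nohup ./run_benchmark_slurm.sh ./examples/benchmark/demo_benchmark.yaml > benchmark_launch.log 2>&1 &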

Check errors for slurm runs

The following script helps to find out which jobs have failed and what the error messages are, so that you can navigate to the specific log file:

bash ./sh_list_error.sh ./zoutput/benchmarks/[output folder of the specified benchmark in the yaml file]/slurm_logs

Map between slurm job id and sampled hyperparameter index

Suppose the slurm job id is 14144163. One can find the corresponding log file in the ./zoutput/benchmarks/[output folder of the specified benchmark in the yaml file]/slurm_logs folder via find . | grep -i "14144163"

The result could be, for instance, run_experiment-index=41-14144163.err, where 41 is the hyperparameter index in zoutput/benchmarks/[name of the benchmark]/hyperparameters.csv.
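
If only the index is needed, a small shell sketch like the following (a hypothetical helper, not part of DomainLab) extracts it directly from the log file names:

# run inside the slurm_logs folder; prints 41 for the example above
find . -name "*14144163*" | sed -E 's/.*index=([0-9]+).*/\1/' | sort -u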

Obtained results

All files created by this benchmark are saved in the given output directory (by default ./zoutput/benchmarks/[name of the benchmark defined in the yaml file]). The sampled hyperparameters can be found in hyperparameters.csv. The yaml file is translated into config.txt, and the corresponding commit information is stored in commit.txt (do not update the code while the benchmark is running, so that results remain reproducible with this commit information). For each line in hyperparameters.csv there will be a csv file in the directory rule_results.

Output folder structure

Via tree -L 2 in zoutput/benchmarks/[name of the benchmark defined in the configuration yaml file], one gets something like the following:

├── commit.txt
├── config.txt
├── [slurm_logs/]
├── graphics
│   ├── diva_fbopt_full
│   ├── radar_dist.png
│   ├── radar.png
│   ├── scatterpl
│   ├── sp_matrix_dist.png
│   ├── sp_matrix_dist_reg.png
│   ├── sp_matrix.png
│   ├── sp_matrix_reg.png
│   └── variational_plots
├── hyperparameters.csv
├── results.csv
└── rule_results
    ├── 0.csv
    ├── 1.csv
    ├── 2.csv
    ├── 3.csv
    ├── 4.csv
    ├── 5.csv
    ├── 6.csv
    └── 7.csv

Here, commit.txt contains the commit information for reproducibility, config.txt is a json version of the configuration yaml file (also for reproducibility), and the graphics folder contains visualizations of the benchmark results in various plots; specifically, we use graphics/variational_plots/acc/stochastic_variation.png. hyperparameters.csv contains all hyperparameters used for each method, and results.csv is an aggregation of the csv files in rule_results, where i.csv corresponds to parameter index i in hyperparameters.csv.

Please do not change anything in the folder rule_results!

The performance of the different runs from the directory rule_results is aggregated into results.csv after all jobs have finished. Moreover, there is the graphics subdirectory, in which the values from results.csv are visualized for interpretation.

In case that the benchmark is not entirely completed, the user can obtain partial results as explained below.

Obtain partial results

If the benchmark is not yet completed (still running, or some jobs failed, e.g. with a BrokenPipeError due to multiprocessing in PIL image reading), the results.csv file containing the aggregated results will not be created. The user can then obtain the aggregated partial results, including plots, from the partially completed benchmark by running the following after cd-ing into the DomainLab directory:

python main_out.py --agg_partial_bm OUTPUT_DIR

where OUTPUT_DIR specifies the benchmark output directory containing the partially completed benchmark, e.g. ./zoutput/benchmarks/demo_benchmark, where demo_benchmark is the name defined in the benchmark yaml file.

Alternatively, one could use

cat ./zoutput/benchmarks/[name of the benchmark]/rule_results/*.csv > result.csv

then clean up the extra csv headers introduced by the concatenation (see the sketch below) and plot the resulting csv using the command in the next section.
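
A minimal clean-up sketch, assuming all files in rule_results share the same header line: keep the header of the first file and skip the repeated header of every following file.

# concatenate rule_results into result.csv, keeping only the first header line
awk 'FNR==1 && NR!=1 {next} {print}' ./zoutput/benchmarks/[name of the benchmark]/rule_results/*.csv > result.csv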

Generate plots from .csv file

If the benchmark is not completed, the graphics subdirectory might not be created. The user can then manually create the graphics from the csv file of the aggregated partial results, which can be obtained as explained above. To do so, the user must cd into the DomainLab directory and run

python main_out.py --gen_plots CSV_FILE --outp_dir OUTPUT_DIR

where CSV_FILE specifies the path of the csv file with the aggregated results (e.g. ./zoutput/benchmarks/demo_benchmark/results.csv) and OUTPUT_DIR specifies the output directory of the partially completed benchmark (e.g. ./zoutput/benchmarks/demo_benchmark). Note that the csv file must have the same form as the one generated by a fully executed benchmark, e.g.

| param_index | method | algo | epos | te_d | seed | params | acc | precision | recall | specificity | f1 | auroc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | … | … | … | … | … | {'param1': p1, …} | … | … | … | … | … | … |
| 1 | … | … | … | … | … | {'param1': p2, …} | … | … | … | … | … | … |
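
Putting the two recovery steps together for a benchmark named demo_benchmark (a sketch using only the commands introduced above; adjust the benchmark name to the one in your own yaml file):

# aggregate the partial results, then generate the plots from the aggregated csv
python main_out.py --agg_partial_bm ./zoutput/benchmarks/demo_benchmark
python main_out.py --gen_plots ./zoutput/benchmarks/demo_benchmark/results.csv --outp_dir ./zoutput/benchmarks/demo_benchmark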