Guillaume Raffy
263474eb5a
- fixes to validate that `clusterbench-submit` works in a real situation (on the alambix cluster):
  - fixed a bug in the hibridon benchmark where the requested RAM was way too small
  - made sure the location of the iprbench hardcoded virtual env path is writable
  - fixed a bug in an exception message
  - updated the `README.md`

  work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
# iprbench

## install iprbench
```
graffy@alambix-frontal:/opt/ipr/cluster/work.local/graffy/bug3372$ python3 -m venv iprbench.venv
graffy@alambix-frontal:/opt/ipr/cluster/work.local/graffy/bug3372$ source ./iprbench.venv/bin/activate
(iprbench.venv) graffy@alambix-frontal:/opt/ipr/cluster/work.local/graffy/bug3372$ pip install ./iprbench.git
```
## run unit tests

```
20241018-16:56:05 graffy@graffy-ws2:~/work/starbench/iprbench.git$ python3 -m unittest
```
## launch a benchmark on the current system

Here's a simple example that runs the benchmark `mamul1` on the current system:
```
(iprbench.venv) graffy@alambix50:/opt/ipr/cluster/work.local/graffy/bug3958/iprbench.git$ iprbench-run --benchmark-id 'mamul1' --config '{"fortran_compiler": "gfortran:<default>", "blas_library": "<default-libblas>:<default>", "matrix_size": 1024, "num_loops":10, "num_cores":2, "launcher": "manual"}' --results-dir /tmp/mamul1_out --target-system-type-id 'debian' --resultsdb-params '{"type": "tsv-files", "tsv_results_dir": "/tmp/mamul1_out/tsv"}'
DEBUG:root:extracting package iprbench.resources.mamul1 resource CMakeLists.txt to /tmp/mamul1_out/mamul1
DEBUG:root:extracting package iprbench.resources.mamul1 resource mamul1.F90 to /tmp/mamul1_out/mamul1
DEBUG:root:shell_command = "starbench --source-tree-provider '{"type": "existing-dir", "dir-path": "/tmp/mamul1_out/mamul1"}' --num-cores 2 --output-dir=/tmp/mamul1_out/output --cmake-path=/usr/bin/cmake --cmake-option=-DCMAKE_BUILD_TYPE=Release --cmake-option=-DCMAKE_Fortran_COMPILER=gfortran --cmake-option=-DBLA_VENDOR=OpenBLAS --benchmark-command='./mamul1 1024 10' --output-measurements=/tmp/mamul1_out/output/measurements.tsv"
creating build directory /tmp/mamul1_out/output/worker<worker_id>
executing the following command in parallel (2 parallel runs) : '['mkdir', '-p', '/tmp/mamul1_out/output/worker<worker_id>/build']'
mean duration : 0.002 s (2 runs)
configuring /tmp/mamul1_out/mamul1 into /tmp/mamul1_out/output/worker<worker_id>/build ...
executing the following command in parallel (2 parallel runs) : '['/usr/bin/cmake', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_Fortran_COMPILER=gfortran', '-DBLA_VENDOR=OpenBLAS', '/tmp/mamul1_out/mamul1']'
mean duration : 0.057 s (2 runs)
building /tmp/mamul1_out/output/worker<worker_id>/build ...
executing the following command in parallel (2 parallel runs) : '['make']'
mean duration : 0.368 s (2 runs)
benchmarking /tmp/mamul1_out/output/worker<worker_id>/build ...
executing the following command in parallel (2 parallel runs) : '['./mamul1', '1024', '10']'
mean duration : 1.933 s (2 runs)
duration : 1.933 s
DEBUG:root:output_measurements_file_path = /tmp/mamul1_out/output/measurements.tsv
DEBUG:root:row = Unnamed: 0    0.000000
run_id        0.000000
duration      1.932536
Name: 0, dtype: float64
DEBUG:root:row = Unnamed: 0    1.000000
run_id        1.000000
duration      1.933324
Name: 1, dtype: float64
total number of cores (including virtual cores) on this host : 24
DEBUG:root:table_file_path=/tmp/mamul1_out/tsv/mamul1.tsv
             measurement_time ipr_bench_version host_id  ... duration_stddev duration_min duration_max
0  2024-11-27 10:51:02.551947            0.0.13 <unknown>  ...        0.000557     1.932536     1.933324

[1 rows x 18 columns]
```
Now that the run has completed successfully, the results can be found in the folder `/tmp/mamul1_out/tsv` that we chose:
```
(iprbench.venv) graffy@alambix50:/opt/ipr/cluster/work.local/graffy/bug3958/iprbench.git$ cat /tmp/mamul1_out/tsv/mamul1.tsv
measurement_time ipr_bench_version host_id host_fqdn user num_cpus cpu_model launcher fortran_compiler blas_library num_cores matrix_size num_loops duration_avg duration_med duration_stddev duration_min duration_max
2024-11-27 10:51:02.551947 0.0.13 <unknown> alambix50.ipr.univ-rennes.fr graffy 2 intel_xeon_x5650 manual gfortran:12.2.0 libopenblas-pthread:0.3.21 2 1024 10 1.93293 1.93293 0.0005572001435750071 1.932536 1.933324
```
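The aggregated columns of `mamul1.tsv` look like plain statistics over the per-run durations that starbench stores in `measurements.tsv`. The following Python sketch recomputes them from the two durations logged above. It is only an illustration, not iprbench's code, and the assumption that `duration_stddev` is the sample standard deviation (n-1 in the denominator) is ours, although it reproduces the logged value 0.0005572001435750071.

```python
import statistics

# Per-run durations (in seconds) logged above from measurements.tsv
# (run_id 0 and run_id 1 of the mamul1 example).
durations = [1.932536, 1.933324]

summary = {
    "duration_avg": statistics.mean(durations),
    "duration_med": statistics.median(durations),
    # assumption: sample standard deviation (n - 1 in the denominator)
    "duration_stddev": statistics.stdev(durations),
    "duration_min": min(durations),
    "duration_max": max(durations),
}

for name, value in summary.items():
    print(f"{name}\t{value:.6f}")
```

With only two runs, the median equals the mean, which is why `duration_avg` and `duration_med` are identical in the table.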
## launch benchmark jobs on the alambix cluster

The following example command submits jobs (one job per CPU architecture) that execute the benchmark `hibridon` on the cluster `alambix` (only for the architecture `intel_xeon_x5650` in this example).
```
(iprbench.venv) graffy@alambix50:/opt/ipr/cluster/work.local/graffy/bug3958/iprbench.git$ results_dir="$GLOBAL_WORK_DIR/graffy/iprbenchs/test_results/clusterbench_submit/$(date --iso-=seconds)"; clusterbench-submit --cluster-id 'alambix' --arch-regexp "intel_xeon_x5650.*" --benchmark-id 'hibridon' --config '{"fortran_compiler": "ifort:<default>", "blas_library": "intelmkl:<default>", "test_id": "arch4_quick", "hibridon_version": "a3bed1c3ccfbca572003020d3e3d3b1ff3934fad", "cmake_path": "cmake", "num_cores": 1, "launcher": "graffy.manual"}' --results-dir "${results_dir}" --resultsdb-params '{"type": "tsv-files", "tsv_results_dir": "'"$results_dir"'"}' --target-system-type-id "fr.univ-rennes.ipr.cluster-node"
INFO:root:available host groups: dict_keys(['intel_core_i5_8350u', 'intel_xeon_x5550', 'intel_xeon_x5650', 'intel_xeon_e5-2660', 'intel_xeon_e5-2660v2', 'intel_xeon_e5-2660v4', 'intel_xeon_gold_6140', 'intel_xeon_gold_6154', 'intel_xeon_gold_5220', 'intel_xeon_gold_6226r', 'intel_xeon_gold_6248r', 'intel_xeon_gold_6348', 'amd_epyc_7282', 'amd_epyc_7452'])
INFO:root:requested host groups: ['intel_xeon_x5650']
DEBUG:root:iprbench_venv_hardcoded_path = /tmp/user/59825/iprbench.venv
INFO:root:creating /opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00/iprbench.venv.tgz (the virtual environment that will be used in this bench by all its jobs at some point)
Collecting virtualenv-clone
  Using cached virtualenv_clone-0.5.7-py3-none-any.whl (6.6 kB)
Installing collected packages: virtualenv-clone
Successfully installed virtualenv-clone-0.5.7
DEBUG:root:type of resultsdb_params = <class 'dict'>
DEBUG:root:resultsdb_params = {'type': 'tsv-files', 'tsv_results_dir': '/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00'}
DEBUG:root:resultsdb_params = {"type": "tsv-files", "tsv_results_dir": "/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00"}
DEBUG:root:tags_dict = {'<benchmark_id>': 'hibridon', '<starbench_job_path>': '/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00/intel_xeon_x5650/starbench.job', '<iprbench_venv_hardcoded_path>': '/tmp/user/59825/iprbench.venv', '<iprbench_venv_archive_path>': '/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00/iprbench.venv.tgz', '<benchmark_config>': '{\\"fortran_compiler\\": \\"ifort:<default>\\", \\"blas_library\\": \\"intelmkl:<default>\\", \\"test_id\\": \\"arch4_quick\\", \\"hibridon_version\\": \\"a3bed1c3ccfbca572003020d3e3d3b1ff3934fad\\", \\"cmake_path\\": \\"cmake\\", \\"num_cores\\": 12, \\"launcher\\": \\"graffy.manual\\"}', '<results_dir>': '/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00', '<resultsdb_params>': '{\\"type\\": \\"tsv-files\\", \\"tsv_results_dir\\": \\"/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00\\"}', '<num_cores>': '12', '<target_system_type_id>': 'fr.univ-rennes.ipr.cluster-node'}
DEBUG:root:ram_per_core = 1.073741824G
DEBUG:root:qsub_args = ['-pe', 'smp', '12', '-l', '"hostname=alambix50.ipr.univ-rennes.fr"', '-S', '/bin/bash', '-cwd', '-m', 'ae', '-l', 'mem_available=1.073741824G', '-j', 'y', '-N', 'hibridon_intel_xeon_x5650']
DEBUG:root:qsub_command = qsub -pe smp 12 -l "hostname=alambix50.ipr.univ-rennes.fr" -S /bin/bash -cwd -m ae -l mem_available=1.073741824G -j y -N hibridon_intel_xeon_x5650 /opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00/intel_xeon_x5650/starbench.job , working_dir=/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00/intel_xeon_x5650
Your job 18879 ("hibridon_intel_xeon_x5650") has been submitted
```
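To make the submitted job easier to understand, here is a Python sketch of a helper that assembles a qsub argument list with the same options as the `qsub_args` logged above. `build_qsub_args` is hypothetical (it is not the actual `clusterbench-submit` implementation), and the list is meant to be passed directly to a process without a shell, which is why the `hostname=...` value carries no extra quotes.

```python
def build_qsub_args(num_cores: int, hostname: str, ram_per_core: str, job_name: str) -> list[str]:
    """Hypothetical helper: builds a qsub argument list shaped like the logged qsub_args."""
    return [
        "-pe", "smp", str(num_cores),           # parallel environment 'smp' with num_cores slots
        "-l", f"hostname={hostname}",           # pin the job to a node of the requested architecture
        "-S", "/bin/bash",                      # shell used to interpret the job script
        "-cwd",                                 # run the job in the current working directory
        "-m", "ae",                             # send mail when the job aborts or ends
        "-l", f"mem_available={ram_per_core}",  # memory request, as logged (site-specific resource)
        "-j", "y",                              # merge stderr into stdout
        "-N", job_name,                         # job name displayed by qstat
    ]


args = build_qsub_args(12, "alambix50.ipr.univ-rennes.fr", "1.073741824G", "hibridon_intel_xeon_x5650")
print(" ".join(["qsub", *args, "starbench.job"]))
```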
The following command shows that the job is running:

```
(iprbench.venv) graffy@alambix50:/opt/ipr/cluster/work.local/graffy/bug3958/iprbench.git$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  18879 0.65000 hibridon_i graffy       r     11/26/2024 18:15:32 short.q@alambix50.ipr.univ-ren    12
```
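When scripting submissions, the job ID can be extracted from the `Your job ... has been submitted` message and then looked up in the `qstat` listing. A minimal sketch, assuming the message format and the two header lines of `qstat` shown above; `parse_job_id` and `job_is_listed` are hypothetical helpers, not part of iprbench.

```python
import re
import subprocess

# matches e.g.: Your job 18879 ("hibridon_intel_xeon_x5650") has been submitted
SUBMIT_RE = re.compile(r'Your job (\d+) \("([^"]*)"\) has been submitted')


def parse_job_id(qsub_output: str) -> int:
    """Extracts the numeric job ID from the message printed by qsub."""
    match = SUBMIT_RE.search(qsub_output)
    if match is None:
        raise ValueError(f"unexpected qsub output: {qsub_output!r}")
    return int(match.group(1))


def job_is_listed(job_id: int) -> bool:
    """Returns True if `qstat` lists the job (for example running or pending)."""
    result = subprocess.run(["qstat"], capture_output=True, text=True, check=True)
    # skip the header line and the dashed separator line
    for line in result.stdout.splitlines()[2:]:
        fields = line.split()
        if fields and fields[0] == str(job_id):
            return True
    return False


job_id = parse_job_id('Your job 18879 ("hibridon_intel_xeon_x5650") has been submitted')
print(job_id)
```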
The configuration of the benchmark (`--config`) is defined to run the test `arch4_quick` using ifort and MKL:

```json
{
    "fortran_compiler": "ifort:<default>",
    "blas_library": "intelmkl:<default>",
    "test_id": "arch4_quick",
    "hibridon_version": "a3bed1c3ccfbca572003020d3e3d3b1ff3934fad",
    "cmake_path": "cmake",
    "num_cores": 1,
    "launcher": "graffy.manual"
}
```

The `<default>` versions cause the benchmark to use the latest versions of ifort and MKL available on the cluster node that runs the benchmark.
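Hand-writing nested JSON inside shell quotes (as in the `--resultsdb-params` argument of the command above) is error-prone. As an alternative, a small Python script can build the JSON with `json.dumps` and quote the whole command with `shlex.join` (Python 3.8+). This sketch only prints the command; the `results_dir` value is a hypothetical placeholder.

```python
import json
import shlex

# benchmark configuration from the example above
config = {
    "fortran_compiler": "ifort:<default>",
    "blas_library": "intelmkl:<default>",
    "test_id": "arch4_quick",
    "hibridon_version": "a3bed1c3ccfbca572003020d3e3d3b1ff3934fad",
    "cmake_path": "cmake",
    "num_cores": 1,
    "launcher": "graffy.manual",
}
results_dir = "/tmp/hibridon_results"  # hypothetical results directory
resultsdb_params = {"type": "tsv-files", "tsv_results_dir": results_dir}

command = [
    "clusterbench-submit",
    "--cluster-id", "alambix",
    "--arch-regexp", "intel_xeon_x5650.*",
    "--benchmark-id", "hibridon",
    "--config", json.dumps(config),
    "--results-dir", results_dir,
    "--resultsdb-params", json.dumps(resultsdb_params),
    "--target-system-type-id", "fr.univ-rennes.ipr.cluster-node",
]

# shlex.join quotes every argument so that the line can be pasted into a shell
print(shlex.join(command))
```

The same `command` list could also be passed to `subprocess.run` directly, which avoids shell quoting altogether.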
Note: the value given to `num_cores` is not important, as `clusterbench-submit` overwrites it with the number of cores of the cluster node that runs the benchmark.
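In the log above, the `num_cores` value 1 passed in `--config` shows up as 12 in the job's `<benchmark_config>`. A sketch of that override, assuming it is a simple key replacement that leaves the user's configuration untouched (`effective_config` is a hypothetical name, not taken from iprbench):

```python
def effective_config(config: dict, node_num_cores: int) -> dict:
    """Hypothetical: returns a copy of config where num_cores is replaced by the node's core count."""
    return {**config, "num_cores": node_num_cores}


user_config = {"test_id": "arch4_quick", "num_cores": 1}
print(effective_config(user_config, 12))
```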
The results database backend used in the benchmark (`--resultsdb-params`) is:

```json
{
    "type": "tsv-files",
    "tsv_results_dir": "/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00"
}
```
This means that the results of the benchmark are registered in the TSV (tab-separated values) file `/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00/hibridon.tsv`. Please note that this results database backend is not really appropriate for `clusterbench-submit`, as it suffers from race conditions when several jobs write to the same file at the same time (`sqlserver-viassh-database` would be a better alternative, but it requires a more complicated setup).
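Judging from the logs (`table_file_path=/tmp/mamul1_out/tsv/mamul1.tsv` in the mamul1 example), the `tsv-files` backend writes one table per benchmark, named after the benchmark ID, in `tsv_results_dir`. This is also why concurrent jobs of the same benchmark can conflict: they all target the same file. A sketch of the path derivation, with a hypothetical helper name:

```python
from pathlib import Path


def tsv_table_path(resultsdb_params: dict, benchmark_id: str) -> Path:
    """Hypothetical helper: path of the TSV table that a 'tsv-files' results database writes to."""
    if resultsdb_params.get("type") != "tsv-files":
        raise ValueError("only the 'tsv-files' backend stores results in a local TSV file")
    return Path(resultsdb_params["tsv_results_dir"]) / f"{benchmark_id}.tsv"


params = {"type": "tsv-files", "tsv_results_dir": "/tmp/mamul1_out/tsv"}
print(tsv_table_path(params, "mamul1"))
```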
When the jobs complete successfully, they write the benchmark results to `$results_dir` (e.g. `/opt/ipr/cluster/work.global/graffy/iprbenchs/test_results/clusterbench_submit/2024-11-26T18:15:14+01:00`):
```
(iprbench.venv) graffy@alambix50:/opt/ipr/cluster/work.local/graffy/bug3958/iprbench.git$ cat $results_dir/hibridon.tsv
measurement_time ipr_bench_version host_id host_fqdn user num_cpus cpu_model launcher num_cores hibridon_version fortran_compiler blas_library test_id cmake_path duration_avg duration_med duration_stddev duration_min duration_max num_threads_per_run
2024-11-26 18:18:18.391137 0.0.13 <unknown> alambix50.ipr.univ-rennes.fr graffy 2 intel_xeon_x5650 graffy.manual 12 a3bed1c3ccfbca572003020d3e3d3b1ff3934fad ifort:2021.13.1 intelmkl:2024.2.1 arch4_quick cmake 3.7509884166666665 3.68795 0.1596150383672931 3.533953 4.036977 1
```
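The resulting table can be loaded for further analysis with the standard `csv` module. This sketch parses a tab-separated table whose first line is the header; to keep it self-contained it uses an in-memory sample built from a subset of the columns shown above, but in practice an open file (opened with `newline=""`) can be passed to `csv.DictReader` instead.

```python
import csv
import io


def read_results(tsv_text: str) -> list[dict]:
    """Parses a results table (tab-separated, header on the first line) into one dict per row."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


# subset of the columns of the hibridon.tsv example above
header = ["measurement_time", "host_fqdn", "cpu_model", "num_cores", "test_id", "duration_avg"]
row = ["2024-11-26 18:18:18.391137", "alambix50.ipr.univ-rennes.fr", "intel_xeon_x5650", "12",
       "arch4_quick", "3.7509884166666665"]
sample = "\t".join(header) + "\n" + "\t".join(row) + "\n"

rows = read_results(sample)
for r in rows:
    print(r["cpu_model"], r["num_cores"], float(r["duration_avg"]))
```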
## graph the results of benchmarks

`showresults` is a command line tool that graphs the results after they have been downloaded from the results directory (for example `/opt/ipr/cluster/work.global/graffy/hibridon/benchmarks/starbench/hibridon/2024-10-10T12:11:44+02:00`) to the path `/home/graffy/work/starbench/starbench.git/usecases/ipr/hibridon/results` (hardcoded at the moment):
```
20241010-16:30:54 graffy@graffy-ws2:~/work/starbench/iprbench.git$ showresults
```
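`showresults` itself is specific to this repository. To give a rough idea of the kind of aggregation involved, this hypothetical sketch averages `duration_avg` per `cpu_model` over result rows (as returned by `csv.DictReader`) and renders a text bar chart; it is not the algorithm used by `showresults`.

```python
from collections import defaultdict
from statistics import mean


def mean_duration_per_cpu_model(rows: list[dict]) -> dict[str, float]:
    """Averages duration_avg over all result rows that share the same cpu_model."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[row["cpu_model"]].append(float(row["duration_avg"]))
    return {cpu_model: mean(values) for cpu_model, values in sorted(groups.items())}


def text_bar_chart(values: dict[str, float], width: int = 40) -> str:
    """Renders one horizontal bar per key, scaled so that the largest value spans `width` characters."""
    longest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * max(1, round(width * value / longest))
        lines.append(f"{label:<24} {bar} {value:.3f} s")
    return "\n".join(lines)


# single row taken from the hibridon.tsv example above
rows = [{"cpu_model": "intel_xeon_x5650", "duration_avg": "3.7509884166666665"}]
print(text_bar_chart(mean_duration_per_cpu_model(rows)))
```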