- fixes to validate that `clusterbench-submit` works in a real situation (on the alambix cluster):
- fixed bug in hibridon benchmark where the requested RAM was far too small
- made sure the hardcoded path of the iprbench virtual env is writable
- fixed bug in exception message
- updated the `README.md`
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
- fixed bug that caused envmodules.py to malfunction in some situations: this version now copes with the fact that the `module` command sometimes writes to stdout and sometimes to stderr
- made test_ipcluster use the `tsv-files` resultsdb backend instead of the more difficult to use `sqlserver-viassh-database` because:
1. `sqlserver-viassh-database` requires a complex setup (keys) that makes it more difficult to use
2. `sqlserver-viassh-database` is already tested in another test focused on it
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
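One common way to cope with a command that writes sometimes to stdout and sometimes to stderr is to merge both streams before parsing. The sketch below is only an illustration of that idea, not the actual code of envmodules.py; the function name `run_and_collect_output` is hypothetical.

```python
import subprocess
from typing import List


def run_and_collect_output(command: List[str]) -> str:
    """Runs command and returns everything it printed, whichever stream it used.

    Hypothetical helper: stderr is redirected into stdout so the caller
    does not need to know which stream the command chose.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stdout
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

The trade-off is that genuine error messages end up mixed with the regular output, so the caller has to rely on the exit code to detect failures.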
- made IprCluster target host work. This target host provides access to intel ifort and intel mkl packages, which are available as environment modules.
- added a unit test that validates the IprCluster target host (only works on an ipr cluster node)
- also improved unit tests to use TMPDIR environment variable if present, in order to avoid permission issues in case the test is run by different users on the same system.
- made the benchmark display an error in case starbench fails
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
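A minimal sketch of the TMPDIR idea, assuming a hypothetical helper `make_test_results_dir` (the real tests may be organised differently): each test run gets its own directory under `TMPDIR` when that variable is set, and under the system default otherwise.

```python
import os
import tempfile
from pathlib import Path


def make_test_results_dir(test_id: str) -> Path:
    """Creates a fresh, uniquely named directory for the results of one test.

    Hypothetical helper: honours TMPDIR if present, otherwise falls back
    to the default temporary directory of the system.
    """
    base_dir = Path(os.environ.get('TMPDIR', tempfile.gettempdir()))
    # mkdtemp creates a directory only accessible by the current user,
    # which avoids clashes when several users run the tests on one host
    return Path(tempfile.mkdtemp(prefix=f'{test_id}-', dir=base_dir))
```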
- the user can now choose the host type id. This mechanism will allow the benchmarks to be run on ipr cluster nodes, taking advantage of the specific use of environment modules to discover and activate packages.
- note: at the moment, the implementation of host type fr.univ-rennes.ipr.cluster-node is not yet finished
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
- added support for common parameters (parameters that are common to all benchmarks but whose values have to be provided by the user)
- added the num_threads_per_run out parameter in hibench, in anticipation of the addition of support for parallel scaling measurements in starbench
- added the autoparameter `user`
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
- added the blas_library parameter to mamul1; for this:
- added support for the `default-<packagetype>` keyword as package_id, which makes the parameter system find the blas flavour of the default blas.
- made the package default version retrieval more generic (replaces gfortran-specific code).
warning: these discovery mechanisms have only been implemented for debian hosts at the moment.
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
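The `default-<packagetype>` keyword can be pictured as a small indirection step, sketched below. This is not the iprbench implementation: `resolve_package_id`, the `default_packages` mapping and the example package ids are hypothetical, and how the defaults are actually discovered on the host is outside this sketch.

```python
from typing import Dict

DEFAULT_PREFIX = 'default-'


def resolve_package_id(package_id: str, default_packages: Dict[str, str]) -> str:
    """Replaces a 'default-<packagetype>' keyword with a concrete package id.

    default_packages maps a package type (eg 'blas') to the id of the
    package that is the default for this type on the host (assumption:
    this mapping has been discovered beforehand).
    """
    if package_id.startswith(DEFAULT_PREFIX):
        package_type = package_id[len(DEFAULT_PREFIX):]
        return default_packages[package_type]
    # already a concrete package id: nothing to resolve
    return package_id
```

For example, with `{'blas': 'openblas'}` as the mapping, `default-blas` resolves to `openblas`, while an explicit package id is returned unchanged.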
- improved hibench: this benchmark no longer requires a password for cloning hibridon, now that the hibridon repository is public (it has actually been public for more than a year).
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
- split tests into 3:
- test_benchmarks.py: tests all benchmarks with the most basic resultsdb backend
- test_resultsdb: tests all resultsdb backends with the most basic benchmark
- test_clusterbench: tests clusterbench_submit
- made tests more robust (the results folder is deleted if it already exists)
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
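The robustness fix above amounts to starting each test from an empty results folder. A minimal sketch, assuming a hypothetical helper named `reset_results_dir`:

```python
import shutil
from pathlib import Path


def reset_results_dir(results_dir: Path) -> Path:
    """Makes sure results_dir exists and is empty before a test starts."""
    if results_dir.exists():
        # leftovers from a previous run would otherwise make the test fail
        shutil.rmtree(results_dir)
    results_dir.mkdir(parents=True)
    return results_dir
```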
note: as the production benchmarks are currently stored at IPR on a database server accessed via ssh, this allows iprbench to store its results in the IPR benchmark database.
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
these parameters are very much needed as they make it possible to identify the system on which the benchmark is run. The result of each performance benchmark depends heavily on the host it is run on.
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
- At the moment, the database backend used is a set of tsv files, but the system is flexible enough to accommodate other database backends (mariadb, sqlite, etc.).
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
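The pluggable backend idea can be sketched as an abstract interface with a tsv implementation. The class names `ResultsTable` and `TsvResultsTable` and the method `add_results` are hypothetical and only illustrate the design; they are not taken from the iprbench code.

```python
import abc
import csv
from pathlib import Path
from typing import Dict


class ResultsTable(abc.ABC):
    """What a benchmark needs from a results backend (hypothetical interface)."""

    @abc.abstractmethod
    def add_results(self, results: Dict[str, str]) -> None:
        """Stores the parameters and measurements of one benchmark run."""


class TsvResultsTable(ResultsTable):
    """Backend that appends each run as one line of a tsv file."""

    def __init__(self, tsv_path: Path):
        self.tsv_path = tsv_path

    def add_results(self, results: Dict[str, str]) -> None:
        write_header = not self.tsv_path.exists()
        with open(self.tsv_path, 'a', newline='') as tsv_file:
            writer = csv.DictWriter(tsv_file, fieldnames=list(results.keys()),
                                    delimiter='\t', lineterminator='\n')
            if write_header:
                writer.writeheader()  # column names, written only once
            writer.writerow(results)
```

Another backend (sqlite, mariadb, etc.) would only need to provide its own `add_results`, leaving the benchmarks untouched.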
This decoupling makes it possible to write benchmarks as modules that can be used in various situations (from a benchmark job or directly by a user), and this design will also allow automatic registering of the benchmark results in a user-selectable form (sql database, stdout, etc.)
- separated `hibenchonphysix.py` into `clusterbench.py` (tool to run a benchmark on a cluster) and `hibench.py` (hibridon benchmark module) so that `clusterbench.py` no longer has any knowledge of hibridon.
- there are currently 2 ways to run a benchmark:
1. as a simple run through `clusterbench-run` command (which will eventually be renamed as iprbench-run since it might be completely independent from the concept of cluster)
2. as cluster jobs through `clusterbench-submit` command
- added unit test
- added another benchmark `mamul1` that is used as a unittest because it has 2 benefits over `hibench` benchmark:
1. it's standalone (no external resources needed)
2. it's quicker to execute
note: this refactoring work is not complete yet, but the proof of concept is complete (the 2 unittests pass):
- still need to provide the user a way to switch between IprCluster and DummyCluster (which is only intended to be used for testing clusterbench)
- still need to run multiple configs of the same benchmark in one run (as hibenchonphysix did)
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958] and [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3372]
- although still hardcoded, made it easier to switch between alambix and physix clusters
- although still hardcoded, made it easier to switch to test mode (quick test)
- removed hardcoded value for all_hosts_groups as it is retrieved from the cluster node database
nb: changes made on 08/10/2024
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3372]