Commit Graph

103 Commits

Author SHA1 Message Date
Guillaume Raffy a8203589b7 added example use of clusterbench_submit for running a production hibridon benchmark on IPR's alambix cluster
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-12-10 18:57:54 +01:00
Guillaume Raffy ccb964bb44 v0.0.16
- added the output_url autoparameter. It automatically registers the location of the benchmark output files in case they're needed and preserved. This will help traceability

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-27 14:10:07 +01:00
Guillaume Raffy b85f381241 v0.0.15
- clusterbench-submit now appends the cluster job number to the launcher. This will improve traceability.

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-27 13:50:38 +01:00
Guillaume Raffy 263474eb5a v0.0.14
- fixes to validate that `clusterbench-submit` works in a real situation (on alambix cluster):
  - fixed bug in hibridon benchmark where the requested ram was way too small
  - made sure the location of iprbench hardcoded virtual env path is writable
  - fixed bug in exception message
- updated the `README.md`

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-27 11:02:54 +01:00
Guillaume Raffy e5fb47ba64 - fixed typo HibrindoBenchmark
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 17:04:12 +01:00
Guillaume Raffy 8d674055bc v0.0.13
- minor change (renamed the example benchmark 'hibench' to 'hibridon')

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 16:56:11 +01:00
Guillaume Raffy e9bbc95632 v0.0.12
- fixed bug that caused envmodules.py to malfunction in some situations: this version now copes with the fact that the module command sometimes outputs to stdout, and sometimes to stderr
- made test_ipcluster use the `tsv-files` resultsdb backend instead of the more difficult to use `sqlserver-viassh-database` because:
1. `sqlserver-viassh-database` requires a complex setup (keys) that makes it more difficult to use
2. `sqlserver-viassh-database` is already tested in another test focused on it

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 16:37:10 +01:00
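Illustration for e9bbc95632: the commit message does not show how envmodules.py reads the output of the module command, so the following is only a minimal sketch of the general idea, assuming it is enough to capture both streams and merge them. The function name `list_output_lines` is hypothetical and is not taken from iprbench.

```python
import subprocess
import sys


def list_output_lines(command):
    """Run a command and return the set of non-empty lines it printed,
    regardless of whether they were written to stdout or to stderr."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        check=False,  # the caller decides how to treat a non-zero exit code
    )
    lines = set()
    # look at both streams, since the tool may use either one
    for stream in (completed.stdout, completed.stderr):
        for line in stream.splitlines():
            line = line.strip()
            if line:
                lines.add(line)
    return lines


if __name__ == "__main__":
    # stand-in for a tool that writes to both streams (hypothetical example)
    demo = [sys.executable, "-c",
            "import sys; print('on-stdout'); print('on-stderr', file=sys.stderr)"]
    print(sorted(list_output_lines(demo)))
```

Returning a set mirrors the `set()` shown in the error messages of the debugging commits below, but this is a guess about the data structure, not a description of the actual code.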
Guillaume Raffy 866f7de27d added even more debug traces to debug a problem that only happens in ci (can't reproduce even when logged in as jenkins-agent)
The error message is:
> ValueError: failed to find compilers/ifort/latest amongst available environment modules (set())

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 15:33:04 +01:00
Guillaume Raffy 4b6b6ad056 added more debug traces to debug a problem that only happens in ci (can't reproduce even when logged in as jenkins-agent)
The error message is:
> ValueError: failed to find compilers/ifort/latest amongst available environment modules (set())

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 15:26:51 +01:00
Guillaume Raffy 06c360fa11 added debug traces to debug a problem that only happens in ci (can't reproduce even when logged in as jenkins-agent)
The error message is:
> ValueError: failed to find compilers/ifort/latest amongst available environment modules (set())

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 15:18:09 +01:00
Guillaume Raffy 60be658312 improved traces to debug a problem that only happens in ci (can't reproduce even when logged in as jenkins-agent)
The error message is:
> ValueError: failed to find compilers/ifort/latest amongst available environment modules (set())

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 15:10:56 +01:00
Guillaume Raffy 95cb65ef35 improved error message to debug a problem that only happens in ci (can't reproduce even when logged in as jenkins-agent)
The error message is:
> ValueError: failed to find compilers/ifort/latest amongst available environment modules

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 15:04:02 +01:00
Guillaume Raffy d1aba97073 made sure the module command is available when running /ci/test_iprcluster.py
this should fix the continuous integration stage('testing (tests that only work on ipr cluster nodes)')

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 14:43:34 +01:00
Guillaume Raffy c39e31a1fc fixed regression introduced in my last commit
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 13:42:54 +01:00
Guillaume Raffy 7d019f74dd v0.0.11
- made IprCluster target host work. This target host provides access to intel ifort and intel mkl packages, which are available as environment modules.
- added a unit test that validates the IprCluster (only works on ipr cluster node)
- also improved unit tests to use TMPDIR environment variable if present, in order to avoid permission issues in case the test is run by different users on the same system.
- made the benchmark display an error in case starbench fails

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-26 13:33:12 +01:00
Guillaume Raffy 2f2ebdcf35 fixed minor styling issue 2024-11-25 14:52:33 +01:00
Guillaume Raffy 4ca2e02381 v0.0.10
- the user can now choose the host type id. This mechanism will allow the benchmarks to be run on ipr cluster nodes, taking advantage of the specific use of environment modules to discover and activate packages.
- note: at the moment, the implementation of host type fr.univ-rennes.ipr.cluster-node is not yet finished

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-25 14:52:08 +01:00
Guillaume Raffy 026c5f6100 v0.0.9
- added output parameters to mamul1 and hibench: in addition to the average duration, now outputs the min, max, median and standard deviation

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-22 16:03:29 +01:00
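Illustration for 026c5f6100: a minimal sketch of how such summary statistics can be computed from a list of run durations with Python's standard `statistics` module. The function name, the dictionary keys and the choice of the population standard deviation are assumptions made for this example; they are not iprbench's actual implementation.

```python
import statistics


def summarize_durations(durations):
    """Return summary statistics for a list of run durations (in seconds)."""
    if not durations:
        raise ValueError("at least one duration is required")
    return {
        "avg_duration": statistics.mean(durations),
        "min_duration": min(durations),
        "max_duration": max(durations),
        "med_duration": statistics.median(durations),
        # population standard deviation: defined even for a single run
        "stddev_duration": statistics.pstdev(durations),
    }


if __name__ == "__main__":
    for name, value in summarize_durations([1.0, 2.0, 3.0, 4.0]).items():
        print(f"{name}: {value}")
```

The sample standard deviation (`statistics.stdev`) would be an equally reasonable choice, but it requires at least two runs.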
Guillaume Raffy ea28ea0293 fixed broken commit 7976d67860 2024-11-22 16:00:18 +01:00
Guillaume Raffy e550ba408f fixed broken commit 7976d67860 2024-11-22 14:25:19 +01:00
Guillaume Raffy eab903a7ae reverted my last commit which was a mistake 2024-11-22 14:24:25 +01:00
Guillaume Raffy 1b30c7f4a2 fix to last commit 2024-11-22 14:13:29 +01:00
Guillaume Raffy b8bb991abd forgot this in my last commit
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-22 13:43:54 +01:00
Guillaume Raffy 7976d67860 v0.0.8
- added support for common parameters (parameters that are common to all benchmarks but whose values have to be provided by the user)
- added the num_threads_per_run out parameter in hibench, in anticipation of the addition of support for parallel scaling measurements in starbench
- added the autoparameter `user`

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-22 13:40:22 +01:00
Guillaume Raffy e3295bf3da forgot this in my last commit
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-21 17:12:29 +01:00
Guillaume Raffy 33133d6828 v0.0.7
- added blas_library parameter to hibench benchmark, thus allowing the user to choose the blas library to use

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-21 17:04:55 +01:00
Guillaume Raffy f7d1946eb7 v0.0.6 (addendum)
- added the blas_library parameter to mamul1. For this:
  - added support for the `default-<packagetype>` keyword as package_id, which makes the parameter system find the blas flavour of the default blas.
  - made the package default version retrieval more generic (replaces gfortran-specific code).

warning: these discovery mechanisms have only been implemented for debian hosts at the moment.

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-21 16:45:22 +01:00
Guillaume Raffy 9d648b4fdc v0.0.6
- added the blas_library parameter to mamul1. For this:
  - added support for the `default-<packagetype>` keyword as package_id, which makes the parameter system find the blas flavour of the default blas.
  - made the package default version retrieval more generic (replaces gfortran-specific code).

warning: these discovery mechanisms have only been implemented for debian hosts at the moment.

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-21 08:29:45 +01:00
Guillaume Raffy 7fd25890ec v0.0.5
- added support for package params. As a result it is now possible to choose the fortran compiler version (implemented in mamul1 and hibench).

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-19 21:37:55 +01:00
Guillaume Raffy a5b5345d39 v0.0.4
- improved hibench: this benchmark no longer requires a password for cloning hibridon, now that the hibridon repository is public (this has actually been the case for more than a year).

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-19 00:09:32 +01:00
Guillaume Raffy 628ac7352a v0.0.3
- added support for all ipr cluster cpus

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-18 23:04:29 +01:00
Guillaume Raffy 862a3911f0 v0.0.2
- added continuous integration using ipr jenkins

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-18 16:42:12 +01:00
Guillaume Raffy 0dd0195d45 made tests pass when using univ-rennes vpn
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-17 11:33:12 +01:00
Guillaume Raffy 6dded4234a refactored tests:
- split tests into 3:
  - test_benchmarks.py: tests all benchmarks with the most basic resultsdb backend
  - test_resultsdb: tests all resultsdb backends with the most basic benchmark
  - test_clusterbench: tests clusterbench_submit
- made tests more robust (deletes the results folder if it already exists)

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-17 11:31:48 +01:00
Guillaume Raffy 3d8a0ff4ad added the following automatic parameters (parameters common to all benchmarks):
- host_id (host serial number)
- num_cpus
- iprbench_version

(took inspiration from [https://git.ipr.univ-rennes.fr/cellinfo/maco/src/branch/main/Libs/Benchmark.bash])

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-13 17:42:01 +01:00
Guillaume Raffy 59a5109ed9 the user can now output the benchmark results to a remote sql database accessed via ssh
note: as the production benchmarks are currently stored at IPR on a database server accessed via ssh, this allows iprbench to store its results in the IPR benchmark database.

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-13 14:57:49 +01:00
Guillaume Raffy 86985d3de4 the user can now output the benchmark results to a remote sql database accessed via ssh
note: as the production benchmarks are currently stored at IPR on a database server accessed via ssh, this allows iprbench to store its results in the IPR benchmark database.

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-13 14:56:45 +01:00
Guillaume Raffy 5337c41645 fixed stupid copy paste error that caused all the fields to have the same type
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-13 14:47:49 +01:00
Guillaume Raffy ca1f4c1a88 added autoparams `host_fqdn` and `cpu_model`
these parameters are very much needed as they make it possible to identify the system on which the benchmark is run. Each performance benchmark highly depends on the host it's run on.

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-08 09:12:51 +01:00
Guillaume Raffy 2ee1988721 the user can now choose to save the benchmark results into a sql database (sqlite at the moment)
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-11-05 15:39:00 +01:00
Guillaume Raffy 2a91af37ff added the ability for the user to choose the resultsdb backend.
This is needed to use a different backend for tests and for production

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-10-25 19:19:28 +02:00
Guillaume Raffy 12cc0c0c8a added a mechanism to record benchmark results into a database.
- At the moment, the database backend used is a set of tsv files, but the system is flexible enough to accommodate other database backends (mariadb, sqlite, etc.).

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-10-24 18:51:24 +02:00
Guillaume Raffy 25d2e489d5 fixed bug with num_cores that caused test_clusterbench to fail on hibench
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-10-23 16:55:17 +02:00
Guillaume Raffy 98940824e4 Made benchmark `hibench` work
- converted --cmake-path argument in clusterbench-submit into a parameter for benchmark hibench, as it's specific to some benchmarks only

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-10-23 14:36:22 +02:00
Guillaume Raffy 6000e20d6b the user of `iprbench-submit` can now choose the cluster to use (previously it was hardcoded to the test cluster `DummyCluster`)
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-10-23 11:18:07 +02:00
Guillaume Raffy fe4d66fb52 moved Singleton class to a better place
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958]
2024-10-23 10:55:37 +02:00
Guillaume Raffy 6f84732cf6 mamul1 benchmark now gets its source files from iprbench resources rather than a hardcoded directory (which was obviously not satisfactory since iprbench only worked if installed in a specific location).
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958] and [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3372]
2024-10-22 17:32:35 +02:00
Guillaume Raffy 011d4eddf9 refactored iprbench to separate ipr benchmark framework from the actual benchmarks
This decoupling makes it possible to write benchmarks as modules that can be used in various situations (from a benchmark job or directly by a user), and this design will also allow automatic registration of the benchmark results in a user-selectable form (sql database, stdout, etc.)

- separated `hibenchonphysix.py` into `clusterbench.py` (tool to run a benchmark on a cluster) and `hibench.py` (hibridon benchmark module) so that `clusterbench.py` no longer has a knowledge about hibridon.
- there are currently 2 ways to run a benchmark:
  1. as a simple run through `clusterbench-run` command (which will eventually be renamed as iprbench-run since it might be completely independent from the concept of cluster)
  2. as cluster jobs through `clusterbench-submit` command
- added unit test
- added another benchmark `mamul1` that is used as a unittest because it has 2 benefits over `hibench` benchmark:
   1. it's standalone (no external resources needed)
   2. it's quicker to execute

note: this refactoring work is not complete yet, but the proof of concept is complete (the 2 unittests pass):
- still need to provide the user a way to switch between IpRCluster and DummyCluster (which is only intended to be used for testing clusterbench)
- still need to run multiple configs of the same benchmark in one run (as hibenchonphysix did)

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958] and [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3372]
2024-10-22 09:16:41 +02:00
Guillaume Raffy fe4a07a67e refactored all iprbench code found in `usecases/ipr/hibench` into a `iprbench` python package
The main motivation for this is to allow the code executed by jobs to benefit from multiple packages (eg iprbench, [stargemm](https://github.com/g-raffy/starbench), cocluto) to perform common missing tasks such as registering the results output in the iprbench database.

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3958] and [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3372]
2024-10-10 18:06:09 +02:00
Guillaume Raffy fb6f926cb1 improvements to hibenchonphysix:
- although still hardcoded, made it easier to switch between alambix and physix clusters
- although still hardcoded, made it easier to switch to test mode (quick test)
- removed hardcoded value for all_hosts_groups as it is retrieved from the cluster node database

nb: changes made on 08/10/2024

work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3372]
2024-10-09 15:16:02 +02:00