compute cluster queue manager: a wrapper for [Son of Grid Engine](https://wiki.archlinux.org/title/Son_of_Grid_Engine)'s
qmod application
queue manager
The goals of this tool are:
- to allow the job scheduler manager to provide a reason when disabling a queue
- to provide a reference counting mechanism that allows the same queue to be disabled for multiple reasons. The queue is re-enabled only once all disabling reasons have been removed. For example, a queue can be disabled for the following two reasons simultaneously:
  1. an automatic update is in progress
  2. the sys admin disables the queue to replace a faulty RAM module

  When the automatic update completes, the update system asks for the queue to be enabled, but because the queue is still disabled for reason 2, it is not actually enabled.
- to provide a scheduler-agnostic abstraction layer able to interface with any job manager (SGE, Slurm, etc.)
As a result, this tool can be used to provide information regarding the reason why a queue is disabled, and it also helps the sys admin remember why a queue was disabled.
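The reference counting goal can be illustrated with a minimal sketch. This is not quman's actual implementation: the class names, the `requester` key used to identify a reason, and the in-memory storage are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DisableReason:
    requester: str  # hypothetical identifier, e.g. "bob@bobland"
    message: str
    timestamp: datetime = field(default_factory=datetime.now)


class DisableReasons:
    """Tracks, per queue, every reason for which it is currently disabled."""

    def __init__(self) -> None:
        # queue name -> {requester -> reason}; assumed in-memory storage
        self._reasons: dict[str, dict[str, DisableReason]] = {}

    def add(self, queue: str, requester: str, message: str) -> bool:
        """Record a reason; return True if the queue must now be disabled."""
        reasons = self._reasons.setdefault(queue, {})
        was_enabled = not reasons
        reasons[requester] = DisableReason(requester, message)
        return was_enabled

    def remove(self, queue: str, requester: str) -> bool:
        """Drop a reason; return True if the queue can now be re-enabled."""
        reasons = self._reasons.get(queue, {})
        if requester not in reasons:
            return False
        del reasons[requester]
        return not reasons

    def is_disabled(self, queue: str) -> bool:
        return bool(self._reasons.get(queue))

    def get(self, queue: str) -> list[DisableReason]:
        return list(self._reasons.get(queue, {}).values())


store = DisableReasons()
queue = "main.q@alambix42.ipr.univ-rennes.fr"
store.add(queue, "auto-update", "automatic update in progress")
store.add(queue, "sysadmin", "faulty RAM replacement")
# The update finishes first: one reason remains, so the queue stays disabled.
print(store.remove(queue, "auto-update"))  # False
print(store.remove(queue, "sysadmin"))     # True
```

In this sketch the scheduler would only be asked to disable the queue when `add` returns `True`, and to enable it when `remove` returns `True`.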
example
bob@bobland~> quman --get-disable-reasons main.q@alambix42.ipr.univ-rennes.fr
bob@bobland~> quman --disable-queue main.q@alambix42.ipr.univ-rennes.fr --message 'requires maintenance for ram replacement'
maco@alambix42~> quman --disable-queue main.q@alambix42.ipr.univ-rennes.fr --message 'requires a security update'
bob@bobland~> quman --get-disable-reasons main.q@alambix42.ipr.univ-rennes.fr
2024-03-13 17:54:18 bob@bobland requires maintenance for ram replacement
2024-03-14 08:42:23 maco@alambix42 requires a security update
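One possible shape for the scheduler-agnostic layer listed in the goals is an abstract base class with one backend per job manager. The sketch below is an assumption, not quman's source: `JobScheduler`, `SgeScheduler` and their method names are hypothetical. The only external facts it relies on are the `-d` (disable) and `-e` (enable) options of Grid Engine's `qmod`. The backend only builds the command line, so nothing is executed here.

```python
from abc import ABC, abstractmethod


class JobScheduler(ABC):
    """Hypothetical interface that each job manager backend would implement."""

    @abstractmethod
    def disable_command(self, queue: str) -> list[str]:
        """Return the command line that disables the given queue."""

    @abstractmethod
    def enable_command(self, queue: str) -> list[str]:
        """Return the command line that enables the given queue."""


class SgeScheduler(JobScheduler):
    """Backend for Son of Grid Engine, built on qmod."""

    def disable_command(self, queue: str) -> list[str]:
        return ["qmod", "-d", queue]  # qmod -d disables the queue

    def enable_command(self, queue: str) -> list[str]:
        return ["qmod", "-e", queue]  # qmod -e enables the queue


scheduler: JobScheduler = SgeScheduler()
print(scheduler.disable_command("main.q@alambix42.ipr.univ-rennes.fr"))
```

In the session above, only the first `--disable-queue` call would need to reach the scheduler, since the queue is already disabled when the second reason is recorded. The returned list could be passed to `subprocess.run` on a host where `qmod` is available.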