compute cluster queue manager: a wrapper for [Son of Grid Engine](https://wiki.archlinux.org/title/Son_of_Grid_Engine)'s qmod application
Guillaume Raffy e4a7f22f4a managed to get a functional toml project 2024-03-29 08:05:14 +01:00


queue manager

compute cluster queue manager: a wrapper for Son of Grid Engine's qmod application

The goals of this tool are:

  • to let the administrator of the job scheduler record a reason when disabling a queue
  • to provide a reference-counting mechanism that allows the same queue to be disabled for multiple reasons. The queue is re-enabled only when all disabling reasons have been removed. For example, a queue can be disabled for both of the following reasons simultaneously:
    1. an automatic update is in progress
    2. the sys admin has disabled the queue to replace faulty RAM
    When the automatic update completes, the update system asks for the queue to be enabled, but because the queue is still disabled for reason 2, it is not actually enabled.
  • to provide a scheduler-agnostic abstraction layer able to interface with any job manager (SGE, Slurm, etc.)
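
The abstraction layer from the last goal could look roughly like the sketch below. It is only an illustration: the class and method names (`JobScheduler`, `SgeScheduler`, `disable_command`, `enable_command`) are hypothetical and are not taken from quman's source. The only assumption about Son of Grid Engine is that `qmod -d` disables a queue and `qmod -e` enables it. The backend returns command lines instead of running them, so the sketch works on a machine without `qmod`.

```python
from abc import ABC, abstractmethod
from typing import List


class JobScheduler(ABC):
    """Hypothetical backend interface; not quman's actual API."""

    @abstractmethod
    def disable_command(self, queue: str) -> List[str]:
        """Return the command line that disables `queue`."""

    @abstractmethod
    def enable_command(self, queue: str) -> List[str]:
        """Return the command line that enables `queue`."""


class SgeScheduler(JobScheduler):
    """Backend for Son of Grid Engine, built on qmod's -d and -e options."""

    def disable_command(self, queue: str) -> List[str]:
        return ["qmod", "-d", queue]

    def enable_command(self, queue: str) -> List[str]:
        return ["qmod", "-e", queue]


if __name__ == "__main__":
    scheduler: JobScheduler = SgeScheduler()
    # Show the command that would be run; nothing is executed here.
    print(scheduler.disable_command("main.q@alambix42.ipr.univ-rennes.fr"))
```

A caller would pass the returned list to `subprocess.run(cmd, check=True)`. Supporting another job manager would then only require a new subclass.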

As a result, this tool can tell users why a queue is disabled, and it also helps the sys admin remember why they disabled it.
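
The reference-counting behaviour described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not quman's implementation: the `DisableReasons` class, its methods, and the `reason_id` key used to identify each reason are all assumptions made for the example.

```python
from typing import Dict


class DisableReasons:
    """Hypothetical in-memory store of the reasons each queue is disabled."""

    def __init__(self) -> None:
        # queue name -> {reason_id: human-readable message}
        self._reasons: Dict[str, Dict[str, str]] = {}

    def disable(self, queue: str, reason_id: str, message: str) -> bool:
        """Record a reason. Return True if the queue was enabled until now,
        i.e. the scheduler must actually disable it."""
        reasons = self._reasons.setdefault(queue, {})
        was_enabled = not reasons
        reasons[reason_id] = message
        return was_enabled

    def enable(self, queue: str, reason_id: str) -> bool:
        """Remove a reason. Return True only if this removed the last
        remaining reason, i.e. the scheduler must actually enable the queue."""
        reasons = self._reasons.get(queue, {})
        had_reason = reason_id in reasons
        reasons.pop(reason_id, None)
        return had_reason and not reasons

    def reasons(self, queue: str) -> Dict[str, str]:
        """Return a copy of the reasons currently disabling `queue`."""
        return dict(self._reasons.get(queue, {}))


if __name__ == "__main__":
    state = DisableReasons()
    queue = "main.q@alambix42.ipr.univ-rennes.fr"
    print(state.disable(queue, "auto-update", "automatic update in progress"))  # True
    print(state.disable(queue, "ram", "faulty RAM replacement"))  # False
    print(state.enable(queue, "auto-update"))  # False: "ram" still holds the queue
    print(state.reasons(queue))  # {'ram': 'faulty RAM replacement'}
    print(state.enable(queue, "ram"))  # True: no reason left
```

The queue is disabled when the first reason arrives and re-enabled only when the last one is removed, which matches the automatic update and RAM replacement scenario.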

example

bob@bobland~> quman --get-disable-reasons main.q@alambix42.ipr.univ-rennes.fr
bob@bobland~> quman --disable-queue main.q@alambix42.ipr.univ-rennes.fr --message 'requires maintenance for ram replacement'
maco@alambix42~> quman --disable-queue main.q@alambix42.ipr.univ-rennes.fr --message 'requires a security update'
bob@bobland~> quman --get-disable-reasons main.q@alambix42.ipr.univ-rennes.fr
2024-03-13 17:54:18 bob@bobland    requires maintenance for ram replacement
2024-03-14 08:42:23 maco@alambix42 requires a security update