quman/README.md

queue manager

compute cluster queue manager: a wrapper for Son of Grid Engine's qmod application

The goals of this tool are:

  • to allow the job scheduler manager to record a reason when disabling a queue
  • to provide a reference counting mechanism that allows the same queue to be disabled for multiple reasons. The queue is re-enabled only when all disabling reasons have been removed. For example, a queue can be disabled for both of the following reasons simultaneously:
    1. an automatic update is in progress
    2. the sys admin decides to disable the queue to replace faulty RAM
    When the automatic update system completes, it requests that the queue be enabled, but because the queue is still disabled for reason 2, it is not actually enabled.
  • to provide a scheduler-agnostic abstraction layer that can interface with any job manager (SGE, Slurm, etc.)

As a result, this tool can report why a queue is disabled, and it also helps the sys admin remember why a queue was disabled.
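The reference counting and the backend abstraction described in the goals above can be illustrated with a minimal Python sketch. This is not quman's actual code: the names `SchedulerBackend`, `RecordingBackend` and `QueueManager` are hypothetical, and keying each reason by the requester who set it is an assumption made for the illustration. The stand-in backend only records calls; a real SGE backend would presumably invoke `qmod` at those points.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SchedulerBackend(Protocol):
    """Hypothetical interface that a job manager backend (SGE, Slurm, ...) would implement."""

    def disable(self, queue: str) -> None: ...

    def enable(self, queue: str) -> None: ...


@dataclass
class RecordingBackend:
    """Stand-in backend for the illustration: it only records the calls it receives."""

    calls: list[tuple[str, str]] = field(default_factory=list)

    def disable(self, queue: str) -> None:
        self.calls.append(("disable", queue))

    def enable(self, queue: str) -> None:
        self.calls.append(("enable", queue))


@dataclass
class QueueManager:
    """Keeps the disable reasons of each queue and talks to the scheduler only when needed."""

    backend: SchedulerBackend
    # queue name -> {requester: reason} (assumption: one reason per requester)
    reasons: dict[str, dict[str, str]] = field(default_factory=dict)

    def disable_queue(self, queue: str, requester: str, reason: str) -> None:
        queue_reasons = self.reasons.setdefault(queue, {})
        if not queue_reasons:
            # First reason for this queue: actually disable it in the scheduler.
            self.backend.disable(queue)
        queue_reasons[requester] = reason

    def enable_queue(self, queue: str, requester: str) -> None:
        queue_reasons = self.reasons.get(queue, {})
        if requester not in queue_reasons:
            return  # this requester never disabled the queue: nothing to remove
        del queue_reasons[requester]
        if not queue_reasons:
            # Last reason removed: the queue can really be enabled again.
            self.backend.enable(queue)

    def get_disable_reasons(self, queue: str) -> dict[str, str]:
        return dict(self.reasons.get(queue, {}))


if __name__ == "__main__":
    backend = RecordingBackend()
    manager = QueueManager(backend)
    queue = "main.q@alambix42.ipr.univ-rennes.fr"

    manager.disable_queue(queue, "auto-update", "automatic update in progress")
    manager.disable_queue(queue, "bob", "requires maintenance for ram replacement")

    # The update finishes first; the queue stays disabled because bob's reason remains.
    manager.enable_queue(queue, "auto-update")
    print(manager.get_disable_reasons(queue))

    # Once bob removes his reason, the backend is finally asked to enable the queue.
    manager.enable_queue(queue, "bob")
    print(backend.calls)
```

In this sketch the scheduler is contacted only on the first disable and on the last enable, which is what keeps the queue disabled while at least one reason remains.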

example

bob@bobland~> quman --get-disable-reasons main.q@alambix42.ipr.univ-rennes.fr
bob@bobland~> quman --disable-queue main.q@alambix42.ipr.univ-rennes.fr --message 'requires maintenance for ram replacement'
maco@alambix42~> quman --disable-queue main.q@alambix42.ipr.univ-rennes.fr --message 'requires a security update'
bob@bobland~> quman --get-disable-reasons main.q@alambix42.ipr.univ-rennes.fr
2024-03-13 17:54:18 bob@bobland    requires maintenance for ram replacement
2024-03-14 08:42:23 maco@alambix42 requires a security update