Commit Graph

104 Commits

Author SHA1 Message Date
Guillaume Raffy f36b2d9d9c cocluto v1.02 : adapted code to the recent addition of the guaranteed_util field in the machines table
fixes [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3792]
2024-03-13 15:50:47 +01:00
Guillaume Raffy 1ddf7270d4 fixed bug that caused the dependency pygraphviz to not be installed while being needed
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3790]
2024-03-13 14:26:41 +01:00
Guillaume Raffy 5e402a2e45 added type hinting to PowerDiagram
work related to [https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3790]
2024-03-12 18:15:04 +01:00
Guillaume Raffy e58f06e14a added missing requirement (pygraphviz is used by cocluto.PowerDiagram)
work related to https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3790
2024-03-12 18:14:07 +01:00
Guillaume Raffy 4853446fa0 fixed code incompatible with python3 in powerdiagram 2024-03-12 18:12:36 +01:00
Guillaume Raffy 891182587d fixed bug: made parseQstatOutput handle domains other than ipr.univ-rennes1.fr properly
before this fix, `parseQstatOutput` created a list of machines with a hardcoded ipr.univ-rennes1.fr domain, resulting in potentially wrong fqdns

work related to https://bugzilla.ipr.univ-rennes.fr/show_bug.cgi?id=3693
2023-11-17 18:28:20 +01:00
Guillaume Raffy 5bf1cbe664 now alambix is also considered as a cluster 2023-11-14 18:39:05 +01:00
Guillaume Raffy 88fba0fdb0 fixed regression from commit 270304f58e
- this bug was found when working on Bug 3575 - the physix status page returns internal server error
- in order to ease the detection of such regressions, added a unittest that covers this code
2023-06-08 11:18:07 +02:00
Guillaume Raffy 14a58dcd0a fixed a missing import error (time) that was detected by one of the applications that used cocluto
in order to detect this kind of error easily, I've added a unit test system in cocluto
2023-06-07 18:23:46 +02:00
Guillaume Raffy 270304f58e fixed pylint errors and cleaned up
work related to Bug 3315 - make simpaweb django app a packageable application
2023-05-23 17:27:12 +02:00
Guillaume Raffy 7a5d32dec0 fixed bug that caused TypeError: '<=' not supported between instances of 'NoneType' and 'int'
work related to Bug 3315 - make simpaweb django app a packageable application
2023-05-23 12:10:29 +02:00
Guillaume Raffy 654e449007 added logging mechanism to ease debugging
work related to Bug 3315 - make simpaweb django app a packageable application
2023-05-23 11:43:29 +02:00
Guillaume Raffy c387b9dce2 fixed warning 'RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used'
work related to Bug 3315 - make simpaweb django app a packageable application
2023-05-23 11:35:15 +02:00
Guillaume Raffy b5c2c886f1 migrated code using dict that was no longer compatible with python3
iteritems() was removed in python3, see https://wiki.python.org/moin/Python3.0#Built-In_Changes

work related to Bug 3315 - make simpaweb django app a packageable application
2023-05-23 11:29:59 +02:00
Guillaume Raffy 66fb58148d made SqlFile cope with older versions of python that don't like Path type as arguments to open 2022-02-16 18:57:39 +01:00
Guillaume Raffy 57fa689015 fixed bug of non closed database file 2022-02-16 17:47:01 +01:00
Guillaume Raffy 45f0a397a3 removed stdout pollution 2022-02-16 17:45:46 +01:00
Guillaume Raffy 428651fa8a fixed bug that caused SvgFigureHandler to fail if the target directory doesn't already exist 2022-02-16 16:26:09 +01:00
Guillaume Raffy 7ba561eb98 fixed bug that caused programs using cluster_stats to fail if the directory /tmp/cluster_stats is missing 2022-02-16 16:13:36 +01:00
Guillaume Raffy 41ef0f3616 minor improvements made while making cluster status page work with dummy cluster data 2022-01-31 21:59:47 +01:00
Guillaume Raffy a65c2fbc3f made cocluto work with simpaweb running on flask 2022-01-28 23:39:39 +01:00
Guillaume Raffy c7fa99064f made cocluto an installable package 2022-01-28 20:29:29 +01:00
Guillaume Raffy a0a7ebe2d8 changed directory layout to comply with https://python-packaging.readthedocs.io/en/latest/minimal.html 2022-01-28 18:14:09 +01:00
Guillaume Raffy ad8f6f383b Bug 3165 - the ClusterEvolution page no longer works (failed to find in table orderings a value for ordering_date where ordering_id is ietr.order20210101)
- fixed bug: it was caused by the fact that the database has no information about the order of the ietr machines. Fixed this by ignoring ietr machines because they're only part of the ipr cluster temporarily and, as such, they shouldn't appear in the stats.

also some improvements:
- improved the colors of the owners of the machines so that they're visually close for the same organization (e.g. ipr owners have close colors). This change was made in june 2020 I think (for the questionnaire about IT needs related to eskemmdata)
- added a graph to show the age of cores
- prevented overlapping of dates in the graph showing the evolution of gflops price
2021-06-07 11:42:03 +00:00
Guillaume Raffy 939ecda69d Bug 3098 - update the graphs showing the evolution of the cluster
- improved readability by adding a color spacing between labs to help to easily identify all owners of the same group
2021-02-24 09:13:12 +00:00
Guillaume Raffy 08ac2506ae Bug 3098 - update the graphs showing the evolution of the cluster
- improved readability by ordering owners; as a result, owners of the same lab are no longer separated
2021-02-23 15:57:54 +00:00
Guillaume Raffy faf6cd5d2a Bug 3098 - update the graphs showing the evolution of the cluster
- improved the handling of colors. As a result, the number of owners is no longer limited, since the color palette is no longer restricted in size
- improved readability by putting the legend on the upper left part of the graphs, where it no longer hides the upper right part of the graph
- fixed clutter of years on the x axis which made it difficult to read the years. Now, there is a limit on the number of years displayed on the x axis
- improved the legend showing the owners, which now displays the lab in addition to the department
2021-02-23 15:40:01 +00:00
Guillaume Raffy 14e3e12571 fixed pep8 styling problems 2021-02-23 13:01:52 +00:00
Guillaume Raffy 78d7e02285 Bug 3072 - power consumption is not accurate enough in the server wiring diagram: now the database contains the measured power consumption of machines and the power diagram uses it instead of the power consumption of a chassis, which is still used as an estimate in case the measured one is not available 2021-01-22 15:21:29 +00:00
Guillaume Raffy 82816e5193 just fixed pylint warnings and errors 2021-01-21 16:14:29 +00:00
Guillaume Raffy 160bc22c3d installed physix92 (bug 2997) and physix93-94 (bug 3000) 2020-10-30 18:15:20 +00:00
Guillaume Raffy 79741a248d Bug 2911 - answer the ur1 questionnaire on IT needs
- added a graph that shows the age pyramid of the machines
2020-06-18 15:30:30 +00:00
Guillaume Raffy 6e35053852 fixed bug that caused the price of gflop to actually be the price of flop 2020-06-08 17:15:26 +00:00
Guillaume Raffy bc29b1186e fixes to ClusterStats while attempting to add a new graph showing the age of machines
- made ClusterStats work on graffy-ws2 : made code compatible with python3 (still compatible with python2 though)
- added support for physix in addition to simpatix
- fixed bug
2020-06-08 17:01:02 +00:00
Guillaume Raffy 23fa150cbe fixed bug that caused the value of the reporting_params attribute to become corrupt.
Fixes : Bug 2812 - the update of physix-master failed (parsing of qconf -sconf global)
2020-03-05 10:50:50 +00:00
Guillaume Raffy 347a0c034b ls-dyna is now installed on physix cluster
Ticket no. 215786 : install the Ls Dyna software on the IPR compute cluster

also fixes Bug 2812 - the update of physix-master failed (parsing of qconf -sconf global)
2020-01-24 11:20:24 +00:00
Guillaume Raffy 98c2b62f78 Now the case where the priority of a job is negative is handled
Bug 2809 - the cluster status page no longer works (assert in QStatParser.py)
2020-01-17 14:54:02 +00:00
Guillaume Raffy 5fda1388a4 fixed bug that was introduced a few months ago that made Util.py incompatible with python2. However, intranet.ipr.univ-rennes1.fr still uses python2 and as a result the web site was no longer functional once it was updated in Bug 2789 - make it possible to update the cluster web site from graffy-ws2 2019-11-21 09:00:57 +00:00
Guillaume Raffy 788798c74c fixes made a few weeks ago to make maco's release mechanism work from a linux machine (graffy-ws2)
- Bug 2664 - put graffy-ws2 into production
2019-11-14 12:08:01 +00:00
Guillaume Raffy e39a65c288 Made clusterconfigurator code comply (mostly) with pylint and mypy best practices while working on bug 2701.
As a result, clusterconfigurator no longer messes with hardcoded sys.path, we now use PYTHONPATH, which is somewhat cleaner.

As mypy doesn't support python2, I had to migrate code from python2 to python3. Because some libraries are shared between all python code, python2 code using these libraries will be broken until all python code is upgraded to python3.

Bug 2701 - set up a mechanism to prevent cpu jobs from blocking gpus
2019-07-10 08:34:42 +00:00
Guillaume Raffy 13d52fd4d4 cleanup 2019-06-27 09:26:25 +00:00
Guillaume Raffy ac76f82b67 Bug 2680 - improve the support of redundant power supplies in PowerDiagram
- after trying to render the powerdiagram in two modes simultaneously (normal and worst_case), I finally opted for an option that selects a single mode at a time. The worst case scenario mode is enabled by default because it is the most important: it makes it possible to check that the wiring is properly sized in the event of an edf power cut or a faulty power supply. Although the calculation method differs, the same consumption on edf is found in both modes.
- improvement: added the power consumption in W of each device (handy for checking for inconsistencies)
2019-06-25 14:23:15 +00:00
Guillaume Raffy 3e0a8ff373 Bug 2514 - update the power wiring plan of the shared servers
- from now on, the sockets of the devices must be specified, otherwise powerdiagram displays the data it is missing. In particular, this made it possible to detect an omission that caused one of the cables to be displayed thicker than the others (16A instead of 10A) for no apparent reason.
2019-06-25 07:13:08 +00:00
Guillaume Raffy 7c7d519eaf Bug 2680 - improve the support of redundant power supplies in PowerDiagram
- improved the code so that redundant power supplies are handled correctly. As a result, more cables turn out to be overloaded.
- took the opportunity to display backed-up power supplies in green and non-backed-up ones in blue.
2019-06-24 17:20:49 +00:00
Guillaume Raffy 9aaefd495f Bug 2514 - update the power wiring plan of the shared servers
improvement: the diagram now clearly highlights the wiring that deserves attention:
- in red, the overloaded cables (whose max amperage may exceed the capacity)
- in black, the cables that carry no load

So that red is reserved for problematic cables, the color conventions for non-overloaded cables have been changed: they are now shown in shades of blue.
2019-06-24 12:54:27 +00:00
Guillaume Raffy 889712b356 Bug 2677 - improve the readability of the powerdiagram by grouping nodes by rack
the powerdiagram is now more readable because nodes are grouped by rack. I would have liked the servers to be ordered as they are in the racks, but graphviz is not good at mixing forced positions with free positions (when I tried to do it, the fixed positions were no longer fixed)
2019-06-21 15:17:13 +00:00
Guillaume Raffy e102d33a9d Bug 2672 - the power diagram contains errors (some servers display a consumption of 0 W)
- fixed the bug (physix.._.. machines were not handled, whereas simpatix.._.. machines were; so I made the code more generic so that it recognizes [a-z]+.._.. machines)
2019-06-19 07:31:41 +00:00
Guillaume Raffy 4b1bb04565 Bug 2453 - simpatix50 fails to retrieve its ipmi ip address via dhcp
disabled ipmi on the r610 machines until the cause of the problem is found.
2018-10-03 10:32:53 +00:00
Guillaume Raffy 991744baf1 Bug 2447 - the "cluster status" and "cluster user guide" pages display wrong memory amounts
- fixed a bug : now, the displayed memory is correct even if the mem_available sge attribute returns the memory in other units than gigabyte

- also, this commit includes an old improvement that makes executeCommandOn handle an optional user, which was previously silently ignored in the case of a local command
2018-09-25 13:48:47 +00:00
Guillaume Raffy a2be664bbe Bug 2412 - the cluster/UserGuide page no longer works
- for the sake of debugging, improved the handling of missing data in SimpaDb.sql : now when data is missing, the exception handler displays a message describing what's missing.
2018-08-28 13:27:29 +00:00