Functionality #17363
merge with the backupninja shared module
50%
Description
i've reviewed the upstream shared module for backupninja and it looks good!
a few things are not upstream:
- squeeze backup version (we should use the class parameter)
- the whole $multiple_backups hack needs to be ported, probably by using an override for the '$home/rdiff-backup/' directory instead of the hack and fixing the monitoring script to avoid having to look in the "multibackups" (but simply look for rdiff-backup-data directories)
- the real_hostname parameter for the rdiff define is dropped (but it seems unused)
- the backup checking script can send directly to the nagios server and have parameters for delays, that would need to be ported as well
- the /srv/backups default is changed
- the $backup_critical_threshold and $backup_warning_threshold class parameters are only in lavamind's refactor branch: https://gitlab.com/shared-puppet-modules-group/backupninja/tree/march2015-refactor
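to illustrate the "simply look for rdiff-backup-data directories" idea and the two thresholds, here is a minimal sketch in python. it is not the actual check script: the function names are hypothetical, and the thresholds are assumed to be ages in seconds, which the ticket does not specify.

```python
import os


def find_rdiff_repos(root):
    """Yield every directory under root that directly contains an
    rdiff-backup-data directory, whatever the layout above it is."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if "rdiff-backup-data" in dirnames:
            # this is a repository: report it and do not descend into it
            dirnames[:] = []
            yield dirpath


def classify_age(age_seconds, warning_threshold, critical_threshold):
    """Map a backup age onto a nagios-style state name.
    Assumption: both thresholds are expressed in seconds."""
    if age_seconds >= critical_threshold:
        return "CRITICAL"
    if age_seconds >= warning_threshold:
        return "WARNING"
    return "OK"
```

with this approach a host with several backup targets needs no special "multibackups" case: each repository is found wherever it sits under the backup root (for example `list(find_rdiff_repos("/srv/backups"))`).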
so in short, next steps for the deployment:
- wait an hour for puppet to run everywhere
- rerun puppet on alexandrie
- rm /etc/nagios3/conf.d/nagios_service.cfg
- rerun puppet on nagios0
- check that we don't have duplicate backup checks
- run /usr/local/checkbackups -s nagios0.koumbit.net -d /srv/backups and check /var/log/nagios3/nagios.log for messages like:
[1429740034] Warning: Passive check result was received for service 'backups' on host 'sciencepresse.koumbit.net', but the host could not be found!
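the log check can be sketched in python as below. this is only an illustration: the regular expression is derived from the warning lines quoted in this ticket, and the function name is hypothetical.

```python
import re

# pattern based on the nagios.log warnings quoted in this ticket
PASSIVE_WARNING = re.compile(
    r"^\[(?P<timestamp>\d+)\] Warning: Passive check result was received "
    r"for service '(?P<service>[^']+)' on host '(?P<host>[^']+)', "
    r"but the (?P<missing>host|service) could not be found!"
)


def parse_passive_warnings(lines):
    """Return (timestamp, host, service, missing) tuples for every line
    that is a "could not be found" passive check warning."""
    results = []
    for line in lines:
        match = PASSIVE_WARNING.match(line.strip())
        if match:
            results.append(
                (
                    int(match["timestamp"]),
                    match["host"],
                    match["service"],
                    match["missing"],
                )
            )
    return results
```

feeding it the lines of /var/log/nagios3/nagios.log gives one tuple per unknown host or service, which makes it easier to compare the list before and after a deployment.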
for the merge, what remains is to review the diff (see summary here), find a way to disable the nagios checks globally (site_backupninja?) and deploy the upstream module directly.
next steps:
- merge shared/master into master
- diff against our old master
- diff against shared/master
- merge lavamind's refactor into master so we can add parameters https://gitlab.com/shared-puppet-modules-group/backupninja/merge_requests/4
- merge our own pull request for good measure
- run this on shell, cache1, osiris, alexandrie, nagios, redo the above tests
yes, this is another deployment, so no fridays.
Related issues
History
#1 Updated by Antoine Beaupré about 11 years ago
- Status changed from New to In progress
i pushed a first attempt in the 'shared-merge' branch here. it still needs to be changed to do all the above.
#2 Updated by Antoine Beaupré about 11 years ago
- Assignee set to Antoine Beaupré
i did a major refactor of the check script in the multi-backup branch, based on upstream. before we make a MR, we need to fix the other issues here...
it's too bad because i didn't start with the tiny cool things we have in our version, so it's all or nothing...
#3 Updated by Antoine Beaupré about 11 years ago
hello
i just deployed a new version of the backupninja module which
reconfigures how backups are done on the servers.
in theory, there should be practically no change: monitoring should
keep working correctly and the backups should run.
in practice, i had to interrupt several backups to be able to do the
deployment and i had a few problems during the deployment.
(in particular, i put an /etc/nologin file on alexandrie to keep the
backups from running, but this kept me from logging in once my desktop
session crashed for unrelated reasons. after failing to log in on the
console, i tried a reboot, but that did not remove the nologin
file. finally, i remembered that "root" could log in, and i managed to
connect on the console and remove the damn file. so this is a practice
to avoid, especially on vservers where there is no root user!
alexandrie is currently resyncing its raid because of the cold
reboot.)
so there it is, the current situation is that the new code is
deployed. i still need to examine the remaining differences with the
shared module and port them. we also need to wait for puppet to run
everywhere so that the right nagios checks are put in place, and delete
the old nagios checks that are now duplicated.
but it's moving forward, in short. we have something we can push. the
big upstream merge request is here:
https://gitlab.com/shared-puppet-modules-group/backupninja/merge_requests/5
a.
PS: while digging through the osiris backups, i found that we were
backing up all the HAG sessions, which were no longer
garbage-collected. so we have a nice directory of over 2 million
entries on our file server. i opened a separate ticket for that:
#6 Updated by Antoine Beaupré almost 11 years ago
- Description updated (diff)
move the last comment into the summary
#7 Updated by Antoine Beaupré almost 11 years ago
- Description updated (diff)
i cleaned up the config on nagios0 again, some cases remain:
- wrong service names ("backups" instead of "backups-something"): ceres, rtr0, rtr1
- duplicate backups: gpc, postgres1
#8 Updated by Antoine Beaupré almost 11 years ago
- wrong service names ("backups" instead of "backups-something"): ceres, rtr0, rtr1 - here the thing is that these are not rdiff-backup backups! the script and the resources should be fixed to use a more descriptive name!
- duplicate backups: gpc, postgres1 - fixed.
#9 Updated by Antoine Beaupré almost 11 years ago
- Description updated (diff)
Antoine Beaupré wrote:
- wrong service names ("backups" instead of "backups-something"): ceres, rtr0, rtr1 - here the thing is that these are not rdiff-backup backups! the script and the resources should be fixed to use a more descriptive name!
in the end, ceres was simply no longer configuring its backups with puppet, fixed.
rtr0 and rtr1 don't run puppet, we'll leave them alone for now and skip the part about different service names for the other backup types.
the deployment is almost finished, only checking nagios remains. i also marked the things to port as done in the summary.
#10 Updated by Antoine Beaupré almost 11 years ago
- Description updated (diff)
here are the remaining warnings:
[1429807219] Warning: Passive check result was received for service 'backups' on host 'alternc3.koumbit.net', but the host could not be found!
[1429807228] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'db02.hahaha.com', but the host could not be found!
[1429807230] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'democracy.aegirvps.net', but the host could not be found!
[1429807231] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'dev0.aegir.koumbit.net', but the host could not be found!
[1429807233] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'filer01.hahaha.com', but the host could not be found!
[1429807234] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'filer2.office.koumbit.net', but the host could not be found!
[1429807245] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'fqccl.koumbit.net', but the host could not be found!
[1429807260] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'gobelet.hahaha.com', but the host could not be found!
[1429807262] Warning: Passive check result was received for service 'backups' on host 'greenboard.koumbit.net', but the host could not be found!
[1429807268] Warning: Passive check result was received for service 'backups' on host 'lodho.koumbit.net', but the host could not be found!
[1429807271] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'micro.hahaha.com', but the host could not be found!
[1429807274] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'netboot1.office.koumbit.net', but the host could not be found!
[1429807283] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'percolab.koumbit.net', but the host could not be found!
[1429807284] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'phap.koumbit.net', but the host could not be found!
[1429807287] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'prod.origineight.aegirvps.net', but the host could not be found!
[1429807292] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'rbo.hahaha.com', but the host could not be found!
[1429807293] Warning: Passive check result was received for service 'backups-rdiff-backup-monthly' on host 'rcabm.koumbit.net', but the service could not be found!
[1429807295] Warning: Passive check result was received for service 'backups' on host 'rhea.office.koumbit.net', but the host could not be found!
[1429807301] Warning: Passive check result was received for service 'backups' on host 'rtr5-canix2.koumbit.net', but the host could not be found!
[1429807301] Warning: Passive check result was received for service 'backups' on host 'rtr7.koumbit.net', but the host could not be found!
[1429807302] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'sciencepresse.koumbit.net', but the host could not be found!
[1429807303] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'soder.koumbit.net', but the service could not be found!
[1429807304] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'sol.hahaha.com', but the host could not be found!
[1429807309] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'voice3.office.koumbit.net', but the host could not be found!
[1429807310] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'web01.hahaha.com', but the host could not be found!
[1429807310] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'web02.hahaha.com', but the host could not be found!
[1429807311] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'webmail0.koumbit.net', but the host could not be found!
[1429807312] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'yvon.hahaha.com', but the host could not be found!
[1429807312] Warning: Passive check result was received for service 'backups-rdiff-backup' on host 'zap.hahaha.com', but the host could not be found!
i reviewed the previous logs and in the logs from ... they are all present except greenboard.koumbit.net, but i don't think that's a regression. so deployment done!
#11 Updated by Antoine Beaupré almost 11 years ago
- Description updated (diff)
redo another deployment plan, almost there!
#12 Updated by Antoine Beaupré almost 11 years ago
the hahaha machines should no longer produce warnings, see #16443
#14 Updated by Gabriel Filion almost 11 years ago
- Related to Bug #17594: NSCA 2.9 (wheezy) no longer supports line endings in the output added
#17 Updated by Kienan Stewart over 5 years ago
- Assignee deleted (Antoine Beaupré)
#18 Updated by Gabriel Filion over 5 years ago
- Status changed from In progress to Rejected
with the move to puppet 4 we now use the shared module.