Count failures and send begin/end notifications - reed-alert - Lightweight agentless alerting system for server
git clone git://bitreich.org/reed-alert/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/reed-alert/
---
commit f352b8458e9b406ce8795bf00c704c260c511cd6
parent 1b2f15bf2974f893f7dd55ff6b4742dd0c0430d2
Author: Solene Rapenne <solene@perso.pw>
Date:   Wed, 17 Jan 2018 20:38:54 +0100

Count failures and send begin/end notifications

Diffstat:
  M README             |  37 ++++++++++++++++++++++++++++---
  M config.lisp.sample |   6 +++---
  M example.lisp       |   8 ++++----
  M functions.lisp     |  53 +++++++++++++++++++++++++++----

4 files changed, 88 insertions(+), 16 deletions(-)
---
diff --git a/README b/README
@@ -63,9 +63,29 @@ The configuration is explained below.
 The Notification System
 =======================
 
-When a check return an error, a previously defined notifier will be
-called. The notifier is a shell command with a name. The shell command
-can contains variables from reed-alert.
+When a check return a failure, a previously defined notifier will be
+called. This will be triggered only after reed-alert find **3**
+failures (not more or less) in a row for this check, this is a default
+value that can be changed per probe with the :try parameter as
+explained later in this document. This is to prevent reed-alert to
+spam notifications for a long time (number of failures very high, like
+a disk space usage that can't be fixed before a long time) OR
+preventing reed-alert to send notifications about a check on the edge
+of the limit like a ping almost working but failing from time to time
+or the load average around the limit.
+
+reed-alert will use the notifier system when it reach its try number
+and when the problem is fixed, so you know when it begins and when it
+ends.
+
+reed-alert keep tracks of the count of failures with one file per
+probe failing in the "states" folder. To ensure unique filenames, the
+following format is used (+ means it's concatenated) :
+
+    alert-name + probe-name + hash of probe parameters
+
+The notifier is a shell command with a name. The shell command can
+contains variables from reed-alert.
 
 + %function% : the name of the probe
 + %date% : the current date with format YYYY/MM/DD hh:mm:ss
@@ -76,6 +96,7 @@ can contains variables from reed-alert.
 + %level% : the type of notification used
 + %os% : the type of operating system (FreeBSD/Linux/OpenBSD)
 + %newline% : a newline character
++ %state% : "start" / "end" when problem happen / is solved
 
 
 Example Probe 1: 'Check For Load Average'
@@ -119,6 +140,16 @@ does. It can be put in every probe.
     :desc "STRING"
 
 
+The :try Parameter
+------------------
+The :try parameter allows you to change how many failure to wait
+before the alert is triggered. By default, it's triggered after 3
+failures. Sometimes, when using ping for example, you want to be
+notified when it fails a few cycles and not at first failure.
+
+    :try INTEGER
+
+
 Overview
 --------
 As of this commit, reed-alert ships with the following probes:
diff --git a/config.lisp.sample b/config.lisp.sample
@@ -1,8 +1,8 @@
 (load "functions.lisp")
 
-(alert mail "echo -n 'Problem with %function% %date% %params%' | mail -s alarm mail@isp.net")
-(alert sms "/home/user/sms.sh '%date% %function% %params% %hostname%")
-(alert available-variables "REMINDER : %function% %params% %date% %hostname% %desc% %level% %os% %newline% %result%")
+(alert mail "echo -n '[%state%] Problem with %function% %date% %params%' | mail -s '[%state%] alarm' mail@isp.net")
+(alert sms "/home/user/sms.sh '%date% %state% %function% %params% %hostname%")
+(alert available-variables "REMINDER : %function% %params% %date% %hostname% %desc% %level% %os% %newline% %result% %state%")
 (alert empty "")
 
 
diff --git a/example.lisp b/example.lisp
@@ -1,9 +1,9 @@
 (load "functions.lisp")
 
-(alert dont-use-it "REMINDER %function% %params% %date% %hostname% %desc% %level% %os% %newline% _ %space% %result%")
+(alert dont-use-it "REMINDER %state% %function% %params% %date% %hostname% %desc% %level% %os% %newline% _ %space% %result%")
 (alert empty "")
 (alert mail "")
-(alert peroket "echo 'problem at %date% with %function% %params%'")
+(alert peroket "echo '%state% problem at %date% with %function% %params% : %result%'")
 (alert sms "echo -n '%date% %function% CRITICAL on %hostname%' | curl http://somewebservice")
 ;(alert mail "echo -n '%date% %hostname% had problem on %function% %newline% %params% values %result% %newline%
 ; %desc%' | mail -s '[Error] %function% - %hostname%' foo@bar.com")
@@ -15,8 +15,8 @@
 (=> peroket disk-usage :path "/tmp" :limit 0) ;; failure
 
 ;; check if :path file exists
-(=> mail file-exists :path "/bsd.rd" :desc "OpenBSD kernel /bsd.rd")
-(=> empty file-exists :path "/non-existant-file") ;; failure file not found
+(=> mail file-exists :path "/bsd.rd" :desc "OpenBSD kernel /bsd.rd")
+(=> empty file-exists :path "/non-existant-file" :try 1) ;; failure file not found
 
 ;; check if :path file exists and has been updated since :limit minutes
 (=> empty file-updated :path "/var/log/messages" :limit 400)
diff --git a/functions.lisp b/functions.lisp
@@ -1,6 +1,8 @@
 (require 'asdf)
 
+(defparameter *tries* 3)
 (defparameter *alerts* '())
+(ensure-directories-exist "states/")
 
 (defun color(num1 num2)
   (format nil "~a[~a;~am" #\Escape num1 num2))
@@ -57,9 +59,10 @@
      (push (list ',name ,string)
            *alerts*)))
 
-(defun trigger-alert(level function params result)
+(defun trigger-alert(level function params result state)
   (let* ((notifier-command (assoc level *alerts*))
          (command-string (cadr notifier-command)))
+    (setf command-string (replace-all command-string "%state%" (if (eql 'error state) "Start" "End")))
     (setf command-string (replace-all command-string "%result%" (format nil "~a" result)))
     (setf command-string (replace-all command-string "%hostname%" (machine-instance)))
     (setf command-string (replace-all command-string "%os%" (software-type)))
@@ -85,15 +88,53 @@
 
 (defun =>(level fonction &rest params)
   (format t "[~a~a ~20A~a] ~45A" *yellow* level fonction *white* (getf params :desc params))
-  (let ((hash (fnv-hash (format nil "~{~a~}" (nconc (list level fonction) (remove-if #'symbolp params)))))
-        (result (funcall fonction params)))
+  (let* ((hash (fnv-hash (format nil "~{~a~}" (remove-if #'symbolp params))))
+         (result (funcall fonction params))
+         (filename (format nil "~a-~a-~a" level fonction hash))
+         (filepath (format nil "states/~a" filename)))
     (if (not (listp result))
         (progn
-          (format t " => ~asuccess~a~%" *green* *white*)
+          (if (probe-file filepath)
+              ;; last time was a failure
+              (progn
+                (uiop:run-program (trigger-alert level fonction params t 'success) :output t)
+                (delete-file filepath)
+                (format t " => ~afailure => success~a~%" *green* *white*))
+              ;; last time was a success
+              (format t " => ~asuccess~a~%" *green* *white*))
+          ;; we return t because it's ok
           t)
+
         (progn
-          (format t " => ~aerror~a~%" *red* *white*)
-          (uiop:run-program (trigger-alert level fonction params (cadr result)) :output t)
+          (if (probe-file filepath)
+              ;; error before
+              ;; but how many ?
+              (with-open-file (stream filepath :direction :input)
+                (let ((tries (parse-integer (read-line stream 0 nil))))
+                  (format t " => ~aerror (~a failures before)~a~%" *red* tries *white*)
+
+                  ;; more error than limit, send alert once
+                  (when (= tries (getf params :try *tries*))
+                    (uiop:run-program (trigger-alert level fonction params (cadr result) 'error) :output t))
+
+                  ;; increment the file
+                  (progn
+                    (with-open-file (stream-out filepath :direction :output
+                                                :if-exists :supersede)
+                      (format stream-out "~a~%~a~%" (+ 1 tries) params)))))
+
+              ;; file doesn't exist
+              (with-open-file (stream-out filepath :direction :output
+                                          :if-exists :supersede)
+                (format t " => ~aerror (first failure)~a~%" *red* *white*)
+
+                ;; maybe we would be warned at first error ?
+                ;; code is duplicated from above because it
+                ;; requires reading the non existent file
+                (when (= 1 (getf params :try *tries*))
+                  (uiop:run-program (trigger-alert level fonction params (cadr result) 'error) :output t))
+
+                (format stream-out "1~%~a~%" params)))
           nil))))
 
 (load "probes.lisp")
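
The branch structure that this commit adds to `=>` can be easier to follow when separated from file and shell I/O. The sketch below is a minimal in-memory model of it, not code from reed-alert: `next-state` and `run-checks` are hypothetical names, the state file is replaced by a plain value (NIL meaning "no state file"), and notifications are returned as the keywords `:start` and `:end` instead of being passed to `trigger-alert`.

```lisp
;; Hypothetical model of the state handling in the `=>` hunk above.
;; PREVIOUS is the count stored in the state file, or NIL if there is no file.
;; Returns two values: the count to store next (NIL = delete the state file)
;; and the notification to send (:start, :end, or NIL for none).
(defun next-state (previous failed-p &optional (try 3))
  (cond
    ;; success and no state file: nothing to report
    ((and (not failed-p) (null previous)) (values nil nil))
    ;; success while a state file exists: report the end, drop the state
    ((not failed-p) (values nil :end))
    ;; first failure: store 1, notify only when TRY is 1
    ((null previous) (values 1 (when (= 1 try) :start)))
    ;; later failure: notify when the stored count equals TRY, then increment
    (t (values (1+ previous) (when (= previous try) :start)))))

(defun run-checks (outcomes &optional (try 3))
  "Feed OUTCOMES (T = failure, NIL = success) through NEXT-STATE and
return the notification produced by each check, in order."
  (let ((state nil)
        (sent '()))
    (dolist (failed-p outcomes (nreverse sent))
      (multiple-value-bind (new-state notification)
          (next-state state failed-p try)
        (setf state new-state)
        (push notification sent)))))

;; (run-checks '(t t t t t nil))  ; => (NIL NIL NIL :START NIL :END)
;; (run-checks '(t nil))          ; => (NIL :END)
```

In this model, with the default threshold, `:start` is produced once, on the check that finds three failures already recorded. A single failure followed by a success still produces `:end`, because the diff sends the recovery notification whenever a state file exists. If the model reflects the diff accurately, `:try 1` would notify on both the first and the second failure, since both branches compare against the same value.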