reed-alert - Lightweight agentless alerting system for server

    git clone git://bitreich.org/reed-alert/
    (also: git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/reed-alert/)


Description
===========

reed-alert is a small and simple monitoring tool for your server,
written in Common LISP.

reed-alert checks the status of various processes on a server and
triggers user defined notifications.

Each triggered message is called an 'alert'.
Each check is called a 'probe'.
Each probe can be customized by different parameters.


Dependencies
============

reed-alert is regularly tested on FreeBSD/OpenBSD/Linux and has been
tested with both **sbcl** and **ecl** - which should be available for
most distributions.

(On OpenBSD you may prefer to use ecl because sbcl needs 'wxallowed'
on the partition where the binary is.)

To make reed-alert's deployment easier I avoid using external
libraries. reed-alert only requires a Common LISP interpreter and
its own files.

An attempt to use quicklisp libraries to write more sophisticated
checks like "does this URL contain a pattern?" was started and then
abandoned. Instead, if you need more elaborate checks, write a shell
command and run it with the **command** probe.


Code-Readability
================

Although the code is very rough for now, I think it's already fairly
understandable by people who need this kind of tool.

I will try to improve the readability of the config file in future
commits. NOTE: declaring notifiers is easier now.


Usage
=====

Install reed-alert
------------------

    $ cd reed-alert
    $ make
    $ sudo make install
    $ /usr/local/bin/reed-alert ~/monitoring/my_config.lisp


Special folder
--------------

reed-alert will create a folder using the following path, in order to
save the probe states between invocations.

    ~/.reed-alert/states/

If you delete it, you will lose the failure states of previous runs.


Reed-alert start automation
---------------------------

You can use cron to start reed-alert every n minutes (or whatever time
range you want). The frequency depends on what you check: if you only
want to check that the daily backup worked, running reed-alert once a
day is fine, but if you need to monitor a critical service then every
minute seems more appropriate.

As always with cron jobs, be sure that either you call the interpreter
using its full path or that $PATH inside the crontab contains it.

A cron job running every five minutes using ecl would look like this:

    */5 * * * * ( cd /opt/reed-alert/ && /usr/local/bin/ecl --shell server.lisp )


Personal Configuration File
---------------------------
You may want to rename **example-simple.lisp** to **config.lisp** in
order to create your own configuration file.

The configuration is explained below.


The Notification System
=======================

When a check returns a failure, a previously defined notifier will be
called.
This is triggered only once reed-alert finds exactly **3** failures
in a row for this check (not more, not less). 3 is the default value;
it can be changed globally by modifying the *tries* variable, or per
probe with the :try parameter as explained later in this document.
Notifying only at that exact count prevents reed-alert from spamming
notifications for a long time (when the number of failures gets very
high, like a disk space usage that can't be fixed for a long time),
and from sending notifications about a check on the edge of the limit,
like a ping almost working but failing from time to time or a load
average hovering around the limit.

reed-alert calls the notifier when a check reaches its try number and
again when the problem is fixed, so you know when it begins and when
it ends.

It is possible to be reminded about a failure every n tries by setting
the keyword :reminder to a number. This is useful if you want to be
reminded from time to time that a problem is not fixed, because some
alerts, like mails, are easily overlooked or lost in a large volume
of mail. :reminder is a per-check setting. For a global reminder
setting, set the *reminder* variable.

reed-alert keeps track of the failure count with one file per failing
probe in the "states" folder.
To ensure unique filenames, the
following format is used (+ means it's concatenated):

    alert-name + probe-name + hash of probe parameters

The notifier is a shell command with a name. The shell command can
contain variables from reed-alert.

+ %function% : the name of the probe
+ %date%     : the current date with format YYYY/MM/DD hh:mm:ss
+ %params%   : the parameters of the probe
+ %hostname% : the hostname of the server
+ %result%   : the error returned (the value exceeding the limit, file not found)
+ %desc%     : an arbitrary description naming a check, defaults to an empty string
+ %level%    : the type of notification used
+ %os%       : the type of operating system (FreeBSD/Linux/OpenBSD)
+ %newline%  : a newline character
+ %state%    : "start" when the problem happens, "end" when it is solved


Example Probe 1: 'Check For Load Average'
-----------------------------------------
If you want to send a mail with a message like:

    "On 2016/10/06 11:11:12 server.foo.com has encountered a problem
    during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"


write the following at the top of the file and use **pretty-mail** in your checks:

    (alert pretty-mail "echo 'On %date% %hostname% has encountered a problem during %function%
    %params% with a value of %result%' | mail yourmail@foo.bar")

Example Probe 2: 'Don't do anything'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you don't want anything to be done when an error occurs, use the following:

    (alert nothing-to-send "")

Example Probe 3: 'Send SMS'
~~~~~~~~~~~~~~~~~~~~~~~~~~~
You may want to use an external service to send an SMS. This is
entirely possible since notifiers are shell commands:

    (alert sms "echo 'error on %hostname% : %function% %result%'
    | curl -u login:pass http://api.sendsms.com/")


The Probes
==========

Probes are written in Common LISP. They are predefined checks.

The :desc Parameter
-------------------
The :desc parameter allows you to describe specifically what your check
does. It can be put in every probe.

    :desc "STRING"


The :try Parameter
------------------
The :try parameter allows you to change how many failures to wait for
before the alert is triggered. By default, it's triggered after 3
failures. Sometimes, when using ping for example, you want to be
notified when it fails a few cycles and not at first failure.

    :try INTEGER


Overview
--------
As of this commit, reed-alert ships with the following probes:

    (1)  number-of-processes
    (2)  pid-running
    (3)  disk-usage
    (4)  check-file-exists
    (5)  file-updated
    (6)  load-average-1
    (7)  load-average-5
    (8)  load-average-15
    (9)  ping
    (10) command
    (11) service
    (12) file-less-than
    (13) curl-http-status
    (14) ssl-expiration
    (15) write-to-file


number-of-processes
-------------------
Check if the actual number of processes of the system exceeds a specific limit.

> Set the limit that will trigger an alert when exceeded.
    :limit INTEGER

Example: `(=> alert number-of-processes :limit 200)`


pid-running
-----------
Check if the PID number found in a .pid file is alive.

> Set the path of the pid file. If $USER doesn't have permission to open it, return "file not found".
    :path "STRING"

Example: `(=> alert pid-running :path "/var/run/nginx.pid")`


disk-usage
----------
Check if the disk usage of a chosen partition exceeds a specific limit.

> Set the mountpoint to check.
    :path "STRING"

> Set the limit (a percentage) that will trigger an alert when exceeded.
    :limit INTEGER

Example: `(=> alert disk-usage :path "/tmp" :limit 50)`


check-file-exists
-----------------
Check if a file exists.

> Set the path of the file to check.
    :path "STRING"

Example: `(=> alert check-file-exists :path "/var/postgresql/standby")`


file-updated
------------
Check if a file exists and has been updated since a defined time.

> Set the path of the file to check.
    :path "STRING"

> Set the limit in minutes since the last modification time before triggering an alert.
    :limit INTEGER

Example: `(=> alert file-updated :path "/var/log/nginx/access.log" :limit 60)`


load-average-1
--------------
Check if the load average during the last minute exceeds a specific limit.

> Set the limit not to exceed.
    :limit INTEGER

Example: `(=> alert load-average-1 :limit 2)`


load-average-5
--------------
Check if the load average during the last five minutes exceeds a specific limit.

> Set the limit not to exceed.
    :limit INTEGER

Example: `(=> alert load-average-5 :limit 2)`


load-average-15
---------------
Check if the load average during the last fifteen minutes exceeds a specific limit.

> Set the limit not to exceed.
    :limit INTEGER

Example: `(=> alert load-average-15 :limit 2)`


ping
----
Check if a remote host answers 2 ICMP pings.

> Set the host to ping. Return an error if the ping command returns non-zero.
    :host "STRING" (can be IP or hostname)

Example: `(=> alert ping :host "8.8.8.8")`


command
-------
Execute an arbitrary command which triggers an alert if it returns a non-zero value.
This may be the most useful probe because it lets the user do any check needed.

> Command to execute, accepts commands with pipes.
    :command "STRING"

Example: `(=> alert command :command "tail -n 10 /var/log/messages | grep -v CRITICAL")`


service
-------
Check if a service is started on the system.

> Set the name of the service to test.
    :name "STRING"

Example: `(=> alert service :name "mysql-server")`


file-less-than
--------------
Check if a file has a size less than a specified limit.

> Set the path of the file to check.
    :path "STRING"

> Set the limit in bytes before triggering an alert.
    :limit INTEGER

Example: `(=> alert file-less-than :path "/var/log/nginx.log" :limit 60)`


curl-http-status
----------------
Do an HTTP request and return an error if the return code isn't
200. Requires curl.

> Set the url to request.
    :url "STRING"

> Set the time to wait before aborting.
    :timeout INTEGER


ssl-expiration
--------------
Check if a remote SSL certificate expires in less than a specified
time. Requires openssl.

> Set the hostname for the request.
    :host "STRING"

> Set the expiration time limit in seconds.
    :seconds INTEGER

> Set the port for the request (OPTIONAL).
    :port INTEGER (defaults to 443)

> Use starttls (OPTIONAL).
    :starttls "STRING"

Example: `(=> alert ssl-expiration :host "domain.local" :seconds (* 7 24 60 60))`
Example: `(=> alert ssl-expiration :host "domain.local" :seconds 86400 :port 6697)`
Example: `(=> alert ssl-expiration :host "smtp.domain.local" :seconds 86400 :starttls "smtp" :port 25)`


write-to-file
-------------
Write content to a file, creating it if it does not exist.

The purpose of this probe is to be used at the end of a reed-alert
script to update the modification time of a file. Using file-updated
on this file at the beginning of the script then lets you monitor
whether reed-alert finished correctly on its last run.

> Set the path of the file.
    :path "STRING"

> Set the content of the file (OPTIONAL).
    :text "STRING" (defaults to the current time in seconds)

Example: `(=> alert write-to-file :path "/tmp/reed-alert.txt")`
Example: `(=> alert write-to-file :path "/tmp/reed-alert.txt" :text "hello world")`


The configuration file
======================

The configuration file is Common LISP code, so it's evaluated. It's
possible to write some logic within it.
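
As a small illustration of such logic, the sketch below keeps a
threshold in a variable and reuses it in several checks. It assumes an
alert named `mail` has already been declared earlier in the file;
`*disk-limit*` is a hypothetical name chosen for this example, not
something reed-alert defines.

```lisp
;; Assumption: (alert mail "...") was declared earlier in the same file.
;; *disk-limit* is a hypothetical variable used only for this example.
(defparameter *disk-limit* 80)

;; Probe parameters are ordinary Lisp expressions (the ssl-expiration
;; example passes (* 7 24 60 60)), so a variable can be used wherever
;; a literal value would be.
(=> mail disk-usage :path "/"    :limit *disk-limit*)
(=> mail disk-usage :path "/var" :limit *disk-limit*)
```

Changing the threshold for every partition then means editing a single
line.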


Loops
-----
It's possible to write loops if you don't want to repeat code:

    (loop for host in '("bitreich.org" "dataswamp.org" "floodgap.com")
          do
          (=> mail ping :host host))

or another example:

    (loop for service in '("smtpd" "nginx" "mysqld" "postgresql")
          do
          (=> mail service :name service))

and another example using rows from a file to check remote hosts:

    (with-open-file (stream "hosts.txt")
      (loop for line = (read-line stream nil)
            while line
            do
            (=> mail ping :host line)))


Conditional
-----------
It is also possible to use conditionals. Two conditional groups are
especially useful.


Dependency
~~~~~~~~~~
Sometimes it is a good idea to skip some probes when an earlier probe
fails. Take the case where you check a path through a network, from
the nearest machine to the remote target: if we can't reach our local
router, probes requiring the router to work will trigger errors, so we
should skip them.

    (stop-if-error
      (=> mail ping :host "192.168.1.1" :desc "My local router")
      (=> mail ping :host "89.89.89.89" :desc "My ISP DNS server")
      (=> mail ping :host "kernel.org" :desc "Remote website"))

Note: stop-if-error is an alias for the **and** function.


Escalation
~~~~~~~~~~
It can be a good idea to use different alerts depending on how
critical a check is, but sometimes the criticality depends on the
value of the error and/or the delay between detecting a problem and
fixing it. You may want to receive a mail when things can be fixed in
your spare time, but notify other people if things get worse past
some level.

    (escalation
      (=> mail-me disk-usage :path "/" :limit 70)
      (=> sms-me disk-usage :path "/" :limit 90)
      (=> buzzer disk-usage :path "/" :limit 98))

In this example, we check the disk usage. I will get a mail through
the "mail-me" alert if the disk usage goes over 70%. Once it goes
that far, reed-alert checks whether the disk usage is over 90%; if so,
I'll receive an SMS through the "sms-me" alert. Then, if it goes over
98%, the "buzzer" alert will make some bad noises in the room to
warn me about this.

Note: escalation is an alias for the **or** function.
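
Since stop-if-error and escalation are described as aliases for
**and** and **or**, their behaviour follows from Common LISP's
short-circuit evaluation. The sketch below is plain Common LISP with
no reed-alert code in it. `fake-probe` is a hypothetical stand-in that
prints its name and returns the given result, on the assumption that a
probe call evaluates to true on success and to nil on failure.

```lisp
;; Hypothetical stand-in for a probe call: prints its name, returns OK.
(defun fake-probe (name ok)
  (format t "running ~a~%" name)
  ok)

;; Like stop-if-error: AND stops at the first NIL, so "remote" never runs.
(and (fake-probe "router" t)
     (fake-probe "isp-dns" nil)
     (fake-probe "remote" t))

;; Like escalation: OR stops at the first non-NIL value, so once the
;; 90% check passes, "buzzer 98%" is never evaluated.
(or (fake-probe "mail-me 70%" nil)
    (fake-probe "sms-me 90%" t)
    (fake-probe "buzzer 98%" t))
```

Evaluating this prints four "running" lines: router, isp-dns,
mail-me 70% and sms-me 90%. The order of the checks inside each group
therefore matters.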


Extend with your own probes
===========================

It is likely that you will want to write your own probes. While the
command probe can be convenient, you may want a probe with more
parameters and better integration than the command probe offers.

There are two methods for adding probes:
- in the configuration file, before using them
- in a separate lisp file that you load from the configuration file

If you want to reuse probes across multiple configuration files or
servers, I recommend a separate file. Otherwise, adding them at the
top of the configuration file can be convenient too.


Using a shell command
---------------------

A minimum of Common LISP knowledge is needed for this. But with the
easiest approach, writing a probe around a shell command, the
declaration can be really simple.

We are going to write a probe that uses curl to fetch a page and
then greps the output for a pattern. The return code of grep
will be the return status of the probe: if grep finds the pattern,
it's a success, if not it's a failure.

In the following code, the "create-probe" part is a macro that will
write most of the code for you.
Then, we use the "command-return-code"
function, which executes the shell command passed as a string (or
as a list) and returns the correct values in case of success or
failure.

    (create-probe
     check-http-pattern
     ;; the pattern is single-quoted so that spaces in it survive the shell
     (command-return-code (format nil "curl ~a | grep -i '~a'"
                                  (getf params :url) (getf params :pattern))))

If you don't know LISP, the "format" function works like "printf",
using "~a" instead of "%s". This is the only thing you need to know
if you want to reuse the previous code.

Then we can call it like this:

    (=> notifier check-http-pattern :url "http://127.0.0.1" :pattern "Powered by cl-yag")


Using plain LISP
----------------

We have seen previously how to create new probes from a shell command,
but you may want to do it in LISP, which gives access to the full
features of the language and even to libraries, for example to check
values in a database. I recommend reading the "probes.lisp" file, it's
the best way to learn how to write a new probe.
But as an example, we will learn
from the easiest probe included: check-file-exists

    (create-probe
     check-file-exists
     (let ((result (probe-file (getf params :path))))
       (if result
           t
           (list nil "file not found"))))

Like before, we use the "create-probe" macro and give a name to the
probe. Then, we have to write some code, in this case a check that
the file exists. Finally, on success we have to return **t**, and on
failure we return a list containing **nil** and a value or a
string. The second element of the list will replace %result% in the
notification command, so you can use something explicit, such as a
message concatenated with the returned value. Parameters
should be read with getf from the **params** variable, which allows
a default value to be used when a parameter is not defined in the
configuration file.
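
The default-value behaviour mentioned above comes from **getf**
itself, which accepts an optional third argument that is returned when
the key is absent. A minimal sketch in plain Common LISP, with a
hand-written `params` list standing in for what a probe would receive
(the keys and values are made up for illustration):

```lisp
;; Hypothetical parameter list, as a probe might receive it.
(let ((params '(:host "domain.local" :seconds 86400)))
  (list (getf params :host)        ; key present: its value is returned
        (getf params :port 443)    ; key absent: falls back to 443
        (getf params :starttls)))  ; key absent, no default: NIL
;; => ("domain.local" 443 NIL)
```

Inside a probe body the same call, for example `(getf params :port 443)`,
lets users omit optional parameters in their configuration file.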