#!/bin/bash

set -e

: << '=cut'

=head1 DESCRIPTION

service_events - Tracks the number of significant event occurrences per service

This plugin is a riff on the loggrep family (\`loggrep\` and my own \`loggrepx_\`).
However, rather than focusing on single log files, it focuses on providing
insight into all "significant events" happening for a given service, which
may be found across several log files.

The idea is that any given service may produce events in various areas of
operation. For example, while a typical web app might log runtime errors
to its app.log file, a filesystem change may prevent the whole app from
even being bootstrapped, and this crucial error may be logged in an Apache
log or in syslog.

This plugin attempts to give visibility into all such "important events"
that may affect the proper functioning of a given service. It attempts to
answer the question, "Is my service running normally?"

Unfortunately, it won't help you track down exactly where the events are
coming from if you happen to be watching a number of different logs, but
it will at least let you know that something is wrong and that action
should be taken. To help with this, the plugin uses the extinfo field to
list which logs produced important events since the previous run.
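
A minimal sketch of what this looks like in the plugin's output: for each
service, it prints a .value line with the summed match count and an .extinfo
line listing the logs that contributed. The helper emit_service and the paths
in the test are hypothetical; in the plugin itself these lines are printed at
the end of fetch().

```shell
#!/bin/bash
# Hypothetical helper, mirroring the tail end of the plugin's fetch():
#   emit_service <field_name> <total> [logfile...]
emit_service() {
    local field="$1" total="$2" logs="" log
    shift 2
    # Like the plugin, append "<logfile>, " for each log that had matches,
    # which leaves a trailing ", " at the end of the list.
    for log in "$@"; do
        logs="${logs}${log}, "
    done
    echo "${field}.value ${total}"
    echo "${field}.extinfo ${logs}"
}
```

A service with no matches in a given run still gets both lines, with a value
of 0 and an empty extinfo.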

The plugin can be enabled multiple times to create graphs for different
kinds of services. For example, you may have both web services and system
cleanup services and want to keep an eye on them in different ways.

You can accomplish this by linking the plugin twice under different names
and providing a separate configuration for each instance. In general, you
should think of a single instance of this plugin as representing a single
class of services.
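
As a minimal sketch, enabling two instances amounts to creating two symlinks
to the same plugin file. The helper link_instance is hypothetical, and the
directories mentioned below (/usr/share/munin/plugins and /etc/munin/plugins)
are typical of Debian-style installs rather than guaranteed; adjust them for
your system.

```shell
#!/bin/bash
# Hypothetical helper:
#   link_instance <plugin_file> <plugins_dir> <instance_name>
link_instance() {
    local src="$1" dir="$2" name="$3"
    # -s creates a symbolic link; -f replaces an existing link of that name
    ln -sf "$src" "$dir/$name"
}
```

For example, link_instance /usr/share/munin/plugins/service_events
/etc/munin/plugins service_events_web, then the same for
service_events_cleanup. Each instance then reads its own [service_events_web]
or [service_events_cleanup] section of the plugin configuration, and
munin-node has to be restarted to pick up the new plugins.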


=head1 CONFIGURATION

Configuration for this plugin is admittedly complicated. What we're doing
here is defining groups of logfiles in which we search for various kinds of
events. It is assumed that the _way_ we search for events in a logfile
depends on the type of logfile; thus, we associate match criteria with
logfile groups. Then we define the services we want to track and, finally,
map logfile paths to those services.
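
The pairing of a logfile group with its match criteria can be sketched as
follows: expand the <type>_logfiles glob and count the lines matching
<type>_regex in each file. This is a simplification: the helper count_events
is hypothetical, and unlike the plugin it counts every line rather than only
the lines added since the previous run (the plugin remembers per-file line
counts in its Munin state file).

```shell
#!/bin/bash
# Hypothetical helper: count_events <type>
# Reads the <type>_logfiles glob and the <type>_regex pattern from the
# environment and prints "<logfile> <matching line count>" per readable file.
count_events() {
    local glob_var="${1}_logfiles" regex_var="${1}_regex" f n
    # An empty IFS disables word splitting (paths may contain spaces) while
    # the unquoted expansion below still performs glob expansion.
    local IFS=
    for f in ${!glob_var}; do
        [ -r "$f" ] || continue
        # grep -c prints 0 and exits 1 when nothing matches; not an error here
        n="$(grep -Ec "${!regex_var}" "$f" || true)"
        echo "$f $n"
    done
}
```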

(Note that most instances will probably work best when run as root, since
log files are usually (or at least should be) protected with strict
permissions.)

Available config options include the following:

Plugin-specific:

    env.<type>_logfiles      - (reqd) Shell glob pattern defining logfiles of
                               type <type>
    env.<type>_regex         - (reqd) egrep pattern for finding events in logs
                               of type <type>
    env.services             - (optl) Space-separated list of service names
    env.services_autoconf    - (optl) Shell glob pattern that expands to paths
                               whose final member is the name of a service
    env.<service>_logbinding - (optl) egrep pattern for binding <service> to
                               a given set of logfiles (based on path)
    env.<service>_warning    - (optl) service-specific warning level override
    env.<service>_critical   - (optl) service-specific critical level override

Munin-standard:

    env.title    - Graph title (default: "Important Events per Service")
    env.vlabel   - Custom label for the vertical axis (default: "events")
    env.warning  - Default warning level
    env.critical - Default critical level
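
How a per-service override interacts with the default level can be sketched
as follows. The helper threshold_line is hypothetical; in the plugin this job
is delegated to print_warning and print_critical from Munin's plugin.sh, which
we expect to behave the same way (per-field variable first, then the global
one), though its exact implementation may differ.

```shell
#!/bin/bash
# Hypothetical helper: threshold_line <field_name> <warning|critical>
# Prints "<field>.<kind> <level>" using <field>_<kind> if set, otherwise the
# global $warning/$critical; prints nothing if neither is set.
threshold_line() {
    local field="$1" kind="$2"
    local specific="${field}_${kind}"
    # Indirect expansion: the per-service variable wins, the global is the fallback
    local level="${!specific:-${!kind}}"
    if [ -n "$level" ]; then
        echo "${field}.${kind} ${level}"
    fi
}
```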

For plugin-specific options, the following rules apply:

    * <type> is any arbitrary string. It just has to match between
      <type>_logfiles and <type>_regex. Common values are "apache", "nginx",
      "apt", "syslog", etc.
    * <service> is a string derived by passing the service name through a
      filter that removes non-alphabetic characters from the beginning and
      replaces each run of non-alphanumeric characters with a single
      underscore (_).
    * Logfiles are bound to services by matching <service>_logbinding (an
      egrep pattern) against the full logfile path. For example, specifying
      my_site_logbinding=my-site would bind both /var/log/my-site/errors.log
      and /srv/www/my-site/logs/app.log to the defined my-site service.
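
The <service> name filter above can be sketched as a one-line bash function
built from the same two sed expressions the plugin uses. The function name
to_var_prefix is hypothetical. It is also a handy way to work out the prefix
to use for per-service variables such as <service>_warning.

```shell
#!/bin/bash
# Hypothetical helper: to_var_prefix <service_name>
# 1) strip leading non-letters, 2) collapse each run of characters outside
# [a-zA-Z0-9] into a single underscore (GNU sed -r, as in the plugin).
to_var_prefix() {
    echo "$1" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g'
}
```

For example, to_var_prefix auth.example.com prints auth_example_com, which is
why the second example config below uses env.auth_example_com_logbinding.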


=head2 SERVICE AUTOCONF

Because services are often dynamic and you don't want to have to manually
update config every time you deploy a new service, you have the option of
defining a glob pattern that resolves to a collection of paths whose final
components are service names. Because of the way services are typically
deployed, such paths often already exist on your system. Most often the
pattern will be something like /srv/*/*, which matches every child of every
directory under /srv (for example, everything in both /srv/www/ and
/srv/local/).

If you choose not to use the autoconf feature, you MUST specify services as a
space-separated list of service names in the \`services\` variable. In that
case the plugin also refuses to run unless at least one <service>_logbinding
variable is set; services without one are still bound by their own names.
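
A minimal sketch of the autoconf expansion: expand the glob and keep only the
last path component of each match, as the plugin does with basename. The
helper list_services is hypothetical, and unlike the plugin it skips the
literal pattern that bash leaves behind when nothing matches (the plugin
would treat that leftover as a service name).

```shell
#!/bin/bash
# Hypothetical helper: list_services <glob>
# Prints one service name (the final path component) per matching path.
list_services() {
    local pattern="$1" path
    # Empty IFS: no word splitting, but the unquoted $pattern is still globbed
    local IFS=
    for path in $pattern; do
        # Skip the unexpanded pattern returned when nothing matches
        [ -e "$path" ] || continue
        basename "$path"
    done
}
```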


=head2 EXAMPLE CONFIGS

This example uses services autoconf:

    [service_events]
    user root
    env.services_autoconf /srv/*/*
    env.cfxsvc_logfiles /srv/*/*/logs/app.log
    env.cfxsvc_regex error|alert|crit|emerg
    env.phpfpm_logfiles /srv/*/*/logs/php-fpm*.log
    env.phpfpm_regex Fatal error
    env.apache_logfiles /srv/*/*/logs/errors.log
    env.apache_regex error|alert|crit|emerg
    env.warning 1
    env.critical 5
    env.my_special_service_warning 100
    env.my_special_service_critical 300

This example DOESN'T use services autoconf:

    [service_events]
    user root
    env.services auth.example.com admin.example.com www.example.com
    env.auth_example_com_logbinding my-custom-binding[0-9]+
    env.cfxsvc_logfiles /srv/*/*/logs/app.log
    env.cfxsvc_regex error|alert|crit|emerg
    env.phpfpm_logfiles /srv/*/*/logs/php-fpm*.log
    env.phpfpm_regex Fatal error
    env.apache_logfiles /srv/*/*/logs/errors.log
    env.apache_regex error|alert|crit|emerg
    env.warning 1
    env.critical 5
    env.auth_example_com_warning 100
    env.auth_example_com_critical 300
    env.www_example_com_warning 50
    env.www_example_com_critical 100

This graph will ONLY ever show values for the three listed services, even
if other services are installed whose logfiles match one of the *_logfiles
globs.

Also notice that in this example, we've only listed a log binding for the
auth service. The plugin uses the service name as the log binding for any
service that doesn't specify one, so in this case auth.example.com has a
custom log binding, while the other services are bound by their own names.
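
The binding lookup described above can be sketched as a small bash function:
each service's binding is an egrep (grep -E) pattern tested against the full
logfile path, the service name stands in when no <service>_logbinding is set,
and the first matching service wins, as in the plugin's fetch(). The helper
find_service is hypothetical. Note that a service name used as a pattern is a
regular expression, so the dots in www.example.com match any character.

```shell
#!/bin/bash
# Hypothetical helper: find_service <logfile> <service>...
# Prints the first service whose binding matches the path; returns 1 if none.
find_service() {
    local logfile="$1" svc prefix binding_var binding
    shift
    for svc in "$@"; do
        # Same name filter the plugin uses to build variable names
        prefix="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        binding_var="${prefix}_logbinding"
        # Fall back to the service name when no custom binding is configured
        binding="${!binding_var:-$svc}"
        if echo "$logfile" | grep -Eq "$binding"; then
            echo "$svc"
            return 0
        fi
    done
    return 1
}
```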


=head1 AUTHOR

Kael Shipman <kael.shipman@gmail.com>


=head1 LICENSE

MIT LICENSE

Copyright 2018 Kael Shipman <kael.shipman@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.


=head1 MAGIC MARKERS

#%# family=manual

=cut


# Get list of all currently set env variables
vars="$(printenv | sed -r "s/^([^=]+).*/\1/g")"

# Certain variables MUST be set; check that they are (using bitmask).
# Note: arithmetic commands like ((n++)) return a non-zero status when the
# expression evaluates to 0, which would abort the script under "set -e";
# prefixing them with "!" keeps them from triggering an exit.
setvars=0
reqvars=(_logfiles _regex)
while read -u 3 -r v; do
    n=0
    while [ $n -lt "${#reqvars[@]}" ]; do
        if echo "$v" | grep -Eq "${reqvars[$n]}$"; then
            !((setvars|=$(( 2 ** $n )) ))
        fi
        !((n++))
    done
done 3< <(echo "$vars")


# Sum all required variables
n=0
allvars=0
while [ $n -lt "${#reqvars[@]}" ]; do
    !((allvars+=$(( 2 ** $n ))))
    !((n++))
done

# And scream if something's not set
if ! [ "$setvars" -eq "$allvars" ]; then
    >&2 echo "E: Missing some required variables:"
    >&2 echo
    n=0
    i=1
    while [ $n -lt "${#reqvars[@]}" ]; do
        if [ $(( $setvars & $i )) -eq 0 ]; then
            >&2 echo " *${reqvars[$n]}"
        fi
        i=$((i<<1))
        !((n++))
    done
    >&2 echo
    >&2 echo "Please read the docs."
    exit 1
fi

# Check for more difficult variables
if [ -z "$services" ] && [ -z "$services_autoconf" ]; then
    >&2 echo "E: You must pass either \$services or \$services_autoconf"
    exit 1
fi
if [ -z "$services_autoconf" ] && ! echo "$vars" | grep -q "_logbinding"; then
    >&2 echo "E: You must pass either \$services_autoconf or at least one \$*_logbinding"
    exit 1
fi


# Now go find all log files.
# LOGFILEMAP[i] holds the <type> of the i-th line of LOGFILES.
LOGFILES=
declare -a LOGFILEMAP
while read -u 3 -r v; do
    if echo "$v" | grep -Eq "_logfiles$"; then
        # Get the name associated with these logfiles
        logfiletype="${v%_logfiles}"
        # This serves to expand globs while preserving spaces (and also appends the necessary newline)
        while IFS= read -u 4 -r -d$'\n' line; do
            LOGFILEMAP+=("$logfiletype")
            LOGFILES="${LOGFILES}$line"$'\n'
        done 4< <(IFS= ; for f in ${!v}; do echo "$f"; done)
    fi
done 3< <(echo "$vars")


# Set some defaults and other values
title="${title:-Important Events per Service}"
vlabel="${vlabel:-events}"

# If services_autoconf is passed, it is assumed to be a shell glob, the leaves of which are the services
# This also autobinds the service, if not already bound
if [ -n "$services_autoconf" ]; then
    declare -a services
    IFS=
    for s in $services_autoconf; do
        s="$(basename "$s")"
        services+=("$s")
    done
    unset IFS
else
    services=($services)
fi


# Import munin functions
. "$MUNIN_LIBDIR/plugins/plugin.sh"


# Now get to the real function definitions

function config() {
    echo "graph_title ${title}"
    echo "graph_args --base 1000 -l 0"
    echo "graph_vlabel ${vlabel}"
    echo "graph_category other"
    echo "graph_info Lists number of matching lines found in various logfiles associated with each service. Extinfo displays currently affected logs."

    local var_prefix
    while read -u 3 -r svc; do
        var_prefix="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        echo "$var_prefix.label $svc"
        print_warning "$var_prefix"
        print_critical "$var_prefix"
        echo "$var_prefix.info Number of event occurrences for $svc"
    done 3< <(IFS=$'\n'; echo "${services[*]}")
}


function fetch() {
    # Load state
    touch "$MUNIN_STATEFILE"
    local curstate="$(cat "$MUNIN_STATEFILE")"
    local nextstate=()

    local n svcnm varnm service svc svc_counter logbinding logfile lognm logmatch prvlines curlines matches extinfo_var

    # Set service counters to 0 and set any logbindings that aren't yet set
    while read -u 3 -r svc; do
        svcnm="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        typeset "${svcnm}_total=0"

        # Default the binding to the service name unless <service>_logbinding
        # was configured (check the environment, not the state file, which
        # only ever stores line counts)
        varnm="${svcnm}_logbinding"
        if [ -z "${!varnm}" ]; then
            typeset "$varnm=$svc"
        fi
    done 3< <(IFS=$'\n'; echo "${services[*]}")

    n=0
    while read -u 3 -r logfile; do
        # Skip the empty line produced by the trailing newline
        if [ -z "$logfile" ]; then
            continue
        fi

        # Look up this logfile's type and advance the index before any
        # "continue" below, so LOGFILEMAP stays aligned with LOGFILES
        logmatch="${LOGFILEMAP[$n]}_regex"
        !((n++))

        # Skip logs that can't be read (e.g. a glob that matched nothing)
        if ! [ -r "$logfile" ]; then
            >&2 echo "W: Can't read log $logfile. Skipping...."
            continue
        fi

        # Find which service this logfile is associated with
        service=
        while read -u 4 -r svc; do
            logbinding="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')_logbinding"
            if echo "$logfile" | grep -Eq "${!logbinding}"; then
                service="$svc"
                break
            fi
        done 4< <(IFS=$'\n'; echo "${services[*]}")

        # Skip this log if it's not associated with any service
        if [ -z "$service" ]; then
            >&2 echo "W: No service associated with log $logfile. Skipping...."
            continue
        fi

        # Get shell-compatible names for service and logfile
        svcnm="$(echo "$service" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        lognm="$(echo "$logfile" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"

        # Get previous line count to determine whether or not the file may have been rotated (defaulting to 0)
        prvlines="$(echo "$curstate" | grep "^${lognm}_lines=" | cut -f 2 -d "=")"
        prvlines="${prvlines:-0}"

        # Get the current number of lines in the file (defaulting to 0)
        curlines="$(wc -l < "$logfile")"
        curlines="${curlines:-0}"

        # If the current line count is less than the previous line count, we've probably rotated.
        # Reset to 0.
        if [ "$curlines" -lt "$prvlines" ]; then
            prvlines=0
        else
            prvlines=$((prvlines + 1))
        fi

        # Get incidents starting at the line after the last line we've seen
        matches="$(tail -n +"$prvlines" "$logfile" | grep -Ec "${!logmatch}" || true)"

        # If there were matches, aggregate them and add this log to the extinfo for the service
        if [ "$matches" -gt 0 ]; then
            # Aggregate and add to the correct service counter
            svc_counter="${svcnm}_total"
            !((matches+=${!svc_counter}))
            typeset "$svc_counter=$matches"

            # Add this log to extinfo for service
            extinfo_var="${svcnm}_extinfo"
            typeset "$extinfo_var=${!extinfo_var}$logfile, "
        fi

        # Push onto next state
        nextstate+=("${lognm}_lines=$curlines")
    done 3< <(echo "$LOGFILES")

    # Write state to munin statefile
    (IFS=$'\n'; echo "${nextstate[*]}" > "$MUNIN_STATEFILE")

    # Now echo values
    while read -u 3 -r svc; do
        svcnm="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        svc_counter="${svcnm}_total"
        extinfo_var="${svcnm}_extinfo"
        echo "${svcnm}.value ${!svc_counter}"
        echo "${svcnm}.extinfo ${!extinfo_var}"
    done 3< <(IFS=$'\n'; echo "${services[*]}")

    return 0
}



case "$1" in
    config) config ;;
    *) fetch ;;
esac
