#!/bin/bash

set -e

: << =cut

=head1 DESCRIPTION

service_events - Tracks the number of significant event occurrences per service

This plugin is a riff on the loggrep family (\`loggrep\` and my own \`loggrepx_\`).
However, rather than focusing on single log files, it focuses on providing
insight into all "significant events" happening for a given service, which
may be found across several log files.

The idea is that any given service may produce events in various areas of
operation. For example, while a typical web app might log runtime errors
to its app.log file, a filesystem change may prevent the whole app from
even being bootstrapped, and this crucial error may be logged in an apache
log or in syslog.

This plugin attempts to give visibility into all such "important events"
that may affect the proper functioning of a given service. It attempts to
answer the question, "Is my service running normally?"

Unfortunately, it won't help you track down exactly where the events are
coming from if you happen to be watching a number of different logs, but
it will at least let you know that something is wrong and that action
should be taken. To help with this, the plugin uses the extinfo
field to list which logs currently have important events in them.

The plugin can be included multiple times to create graphs for various
kinds of services. For example, you may have both webservices
and system cleanup services, and you want to keep an eye on them in
different ways.

You can accomplish this by linking the plugin twice with different names
and providing different configuration for each instance. In general, you
should think of a single instance of this plugin as representing a single
class of services.


=head1 CONFIGURATION

Configuration for this plugin is admittedly complicated. What we're doing
here is defining groups of logfiles that we're searching for various
kinds of events. It is assumed that the _way_ we search for events in the
logfiles is related to the type of logfile; thus, we associate match
criteria with logfile groups. We then define the services that we want to
track, and finally the mappings of logfile paths to those services.

(Note that most instances will probably work best when run as root, since
log files are usually (or at least should be) controlled with strict
permissions.)

Available config options include the following:

 Plugin-specific:

 env.<type>_logfiles      - (reqd) Shell glob pattern defining logfiles of
                            type <type>
 env.<type>_regex         - (reqd) egrep pattern for finding events in logs
                            of type <type>
 env.services             - (optl) Space-separated list of service names
 env.services_autoconf    - (optl) Shell glob pattern that expands to paths
                            whose final member is the name of a service
 env.<service>_logbinding - (optl) egrep pattern for binding <service> to
                            a given set of logfiles (based on path)
 env.<service>_warning    - (optl) service-specific warning level override
 env.<service>_critical   - (optl) service-specific critical level override

 Munin-standard:

 env.title                - Graph title
 env.vlabel               - Custom label for the vertical axis
 env.warning              - Default warning level
 env.critical             - Default critical level

For plugin-specific options, the following rules apply:

* <type> is any arbitrary string. It just has to match between <type>_logfiles
  and <type>_regex. Common values are "apache", "nginx", "apt", "syslog", etc.
* <service> is a string derived by passing the service name through a filter
  that removes non-alphabetic characters from the beginning and replaces each
  run of non-alphanumeric characters with a single underscore (\`_\`).
* logfiles are bound to services by matching <service>_logbinding on the full
  logfile path. For example, specifying my_site_logbinding=my-site would bind
  both /var/log/my-site/errors.log and /srv/www/my-site/logs/app.log to the
  defined my-site service.
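
As an illustration of the <service> naming rule, assume a hypothetical service
called 2fa-gateway.example.com (the name, pattern and paths here are invented
for this sketch). The filter drops the leading "2" and turns each "-" and "."
into an underscore, so its options would use the prefix fa_gateway_example_com:

	env.services 2fa-gateway.example.com
	env.fa_gateway_example_com_logbinding /(2fa|auth)-gateway/
	env.fa_gateway_example_com_warning 10

With that binding, a logfile such as /var/log/2fa-gateway/errors.log should be
counted for this service, because the egrep pattern matches its full path.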
    
    
=head2 SERVICE AUTOCONF

Because services are often dynamic and you don't want to have to manually update
config every time you deploy a new service, you have the option of defining a
glob pattern that resolves to a collection of paths whose final components are
service names. Because of the way services are deployed in real life, it's fairly
common that paths will exist on your system that can accommodate this. Most often
it will be something like /srv/*/*, which would match all children of /srv/www/
and /srv/local/.

If you choose not to use the autoconf feature, you MUST specify services as a
space-separated list of service names in the \`services\` variable.


=head2 EXAMPLE CONFIGS

This example uses services autoconf:

	[service_events]
	user root
	env.services_autoconf /srv/*/*
	env.cfxsvc_logfiles /srv/*/*/logs/app.log
	env.cfxsvc_regex error|alert|crit|emerg
	env.phpfpm_logfiles /srv/*/*/logs/php-fpm*.log
	env.phpfpm_regex Fatal error
	env.apache_logfiles /srv/*/*/logs/errors.log
	env.apache_regex error|alert|crit|emerg
	env.warning 1
	env.critical 5
	env.my_special_service_warning 100
	env.my_special_service_critical 300

This example does NOT use services autoconf:

	[service_events]
	user root
	env.services auth.example.com admin.example.com www.example.com
	env.auth_example_com_logbinding my-custom-binding[0-9]+
	env.cfxsvc_logfiles /srv/*/*/logs/app.log
	env.cfxsvc_regex error|alert|crit|emerg
	env.phpfpm_logfiles /srv/*/*/logs/php-fpm*.log
	env.phpfpm_regex Fatal error
	env.apache_logfiles /srv/*/*/logs/errors.log
	env.apache_regex error|alert|crit|emerg
	env.warning 1
	env.critical 5
	env.auth_example_com_warning 100
	env.auth_example_com_critical 300
	env.www_example_com_warning 50
	env.www_example_com_critical 100

This graph will ONLY ever show values for the three listed services, even
if other services are installed whose logfiles match the logfiles search.

Also notice that in this example, we've only listed a log binding for the
auth service. The plugin will use the service name by default for any
services that don't specify a log binding, so in this case, auth has a
custom log binding, while all other services have log bindings equal to
their names.


=head1 AUTHOR

Kael Shipman <kael.shipman@gmail.com>


=head1 LICENSE

MIT LICENSE

Copyright 2018 Kael Shipman <kael.shipman@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.


=head1 MAGIC MARKERS

 #%# family=manual

=cut


# Get list of all currently set env variables
vars="$(printenv | sed -r "s/^([^=]+).*/\1/g")"

# Certain variables MUST be set; check that they are (using bitmask).
# Note: the leading `!` on the arithmetic commands keeps `set -e` from aborting
# the script when an expression evaluates to 0 (which yields exit status 1).
setvars=0
reqvars=(_logfiles _regex)
while read -u 3 -r v; do
    n=0
    while [ $n -lt "${#reqvars[@]}" ]; do
        if echo "$v" | grep -Eq "${reqvars[$n]}$"; then
            !((setvars|=$(( 2 ** $n )) ))
        fi
        !((n++))
    done
done 3< <(echo "$vars")


# Sum all required variables
n=0
allvars=0
while [ $n -lt "${#reqvars[@]}" ]; do
    !((allvars+=$(( 2 ** $n ))))
    !((n++))
done

# And scream if something's not set
if ! [ "$setvars" -eq "$allvars" ]; then
    >&2 echo "E: Missing some required variables:"
    >&2 echo
    n=0
    i=1
    while [ $n -lt "${#reqvars[@]}" ]; do
        if [ $(( $setvars & $i )) -eq 0 ]; then
            >&2 echo "   *${reqvars[$n]}"
        fi
        i=$((i<<1))
        !((n++))
    done
    >&2 echo
    >&2 echo "Please read the docs."
    exit 1
fi
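
# Illustrative sketch only: the block below sits in a quoted heredoc, so the
# plugin never executes it. It condenses the bitmask check above into a
# hypothetical helper, missing_suffixes (its name and interface are assumptions,
# not part of the plugin): bit i is set once some variable name ends with the
# i-th required suffix, and every suffix whose bit stays clear is reported.
: <<'SKETCH'
```shell
#!/bin/bash
# missing_suffixes NAMES SUFFIX...
#   NAMES  - newline-separated variable names (like $vars in the plugin)
#   SUFFIX - required name endings, e.g. _logfiles _regex
# Prints each suffix that no name ends with.
missing_suffixes() {
    local names="$1" seen=0 i=0 suffix
    shift
    # First pass: set bit i when some name ends with the i-th suffix
    for suffix in "$@"; do
        if echo "$names" | grep -Eq "${suffix}\$"; then
            seen=$(( seen | (1 << i) ))
        fi
        i=$(( i + 1 ))
    done
    # Second pass: report every suffix whose bit is still clear
    i=0
    for suffix in "$@"; do
        if [ $(( seen & (1 << i) )) -eq 0 ]; then
            echo "$suffix"
        fi
        i=$(( i + 1 ))
    done
}

# Only a *_logfiles variable is present here, so _regex is reported
missing_suffixes "$(printf 'apache_logfiles\nHOME\nPATH')" _logfiles _regex
```
SKETCH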
    
# Check for more difficult variables
if [ -z "$services" ] && [ -z "$services_autoconf" ]; then
    >&2 echo "E: You must pass either \$services or \$services_autoconf"
    exit 1
fi
if [ -z "$services_autoconf" ] && ! echo "$vars" | grep -q "_logbinding"; then
    >&2 echo "E: You must pass either \$*_logbinding (for each service) or \$services_autoconf"
    exit 1
fi


# Now go find all log files
LOGFILES=
declare -a LOGFILEMAP
while read -u 3 -r v; do
    if echo "$v" | grep -Eq "_logfiles$"; then
        # Get the name associated with these logfiles
        logfiletype="${v%_logfiles}"
        # This serves to expand globs while preserving spaces (and also appends the necessary newline)
        while IFS= read -u 4 -r -d$'\n' line; do
            LOGFILEMAP+=("$logfiletype")
            LOGFILES="${LOGFILES}$line"$'\n'
        done 4< <(IFS= ; for f in ${!v}; do echo "$f"; done)
    fi
done 3< <(echo "$vars")
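
# Illustrative sketch only (quoted heredoc, never executed by the plugin): the
# glob-expansion trick used in the loop above, isolated in a hypothetical
# helper called expand_glob. The helper name and the demo file names are
# assumptions made for this example. Note that a pattern with no matches is
# printed back literally, since nullglob is not enabled.
: <<'SKETCH'
```shell
#!/bin/bash
# expand_glob PATTERN: print each path matching PATTERN on its own line.
expand_glob() {
    local pattern="$1" f
    # With an empty IFS the unquoted expansion below is not word-split,
    # but pathname expansion (globbing) still takes place.
    local IFS=
    for f in $pattern; do
        echo "$f"
    done
}

# Demo in a throwaway directory; paths containing spaces stay intact
demo_dir="$(mktemp -d)"
touch "$demo_dir/app one.log" "$demo_dir/app two.log"
expand_glob "$demo_dir/*.log"
rm -rf "$demo_dir"
```
SKETCH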
    
    
# Set some defaults and other values
title="${title:-Important Events per Service}"
vlabel="${vlabel:-events}"

# If services_autoconf is passed, it is assumed to be a shell glob, the leaves of which are the services
# This also autobinds the service, if not already bound
if [ -n "$services_autoconf" ]; then
    declare -a services
    IFS=
    for s in $services_autoconf; do
        s="$(basename "$s")"
        services+=("$s")
    done
    unset IFS
else
    services=($services)
fi


# Import munin functions
. "$MUNIN_LIBDIR/plugins/plugin.sh"


# Now get to the real function definitions

function config() {
    echo "graph_title ${title}"
    echo "graph_args --base 1000 -l 0"
    echo "graph_vlabel ${vlabel}"
    echo "graph_category other"
    echo "graph_info Lists number of matching lines found in various logfiles associated with each service. Extinfo displays currently affected logs."

    local var_prefix
    while read -u 3 -r svc; do
        var_prefix="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        echo "$var_prefix.label $svc"
        print_warning "$var_prefix"
        print_critical "$var_prefix"
        echo "$var_prefix.info Number of event occurrences for $svc"
    done 3< <(IFS=$'\n'; echo "${services[*]}")
}
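
# Illustrative sketch only (quoted heredoc, never executed by the plugin): the
# name filter that config() applies to each service, wrapped in a hypothetical
# helper called to_var_prefix. The helper name and the sample inputs are
# assumptions for this example; the two sed expressions are the ones used above.
: <<'SKETCH'
```shell
#!/bin/bash
# to_var_prefix NAME: derive the munin field / env variable prefix for NAME.
to_var_prefix() {
    # 1) drop leading non-letters, 2) squash other runs of symbols into "_"
    echo "$1" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g'
}

to_var_prefix "auth.example.com"   # prints auth_example_com
to_var_prefix "2fa-gateway"        # prints fa_gateway
```
SKETCH
# So thresholds for auth.example.com are read from auth_example_com_warning
# and auth_example_com_critical.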
    
    
function fetch() {
    # Load state
    touch "$MUNIN_STATEFILE"
    local curstate="$(cat "$MUNIN_STATEFILE")"
    local nextstate=()

    local n svcnm varnm service svc svc_counter logbinding logfile lognm logmatch prvlines curlines matches extinfo_var

    # Set service counters to 0 and set any logbindings that aren't yet set
    while read -u 3 -r svc; do
        svcnm="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        typeset "${svcnm}_total=0"

        # Default the binding to the service name unless <service>_logbinding
        # was provided through the plugin configuration (i.e. the environment).
        # The state file only holds line counts, so it cannot be consulted here.
        varnm="${svcnm}_logbinding"
        if [ -z "${!varnm}" ]; then
            typeset "$varnm=$svc"
        fi
    done 3< <(IFS=$'\n'; echo "${services[*]}")

    n=0
    while read -u 3 -r logfile; do
        # Handling trailing newline
        if [ -z "$logfile" ]; then
            continue
        fi

        # Find which service this logfile is associated with
        service=
        while read -u 4 -r svc; do
            logbinding="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')_logbinding"
            if echo "$logfile" | grep -Eq "${!logbinding}"; then
                service="$svc"
                break
            fi
        done 4< <(IFS=$'\n'; echo "${services[*]}")

        # Skip this log if it's not associated with any service
        if [ -z "$service" ]; then
            >&2 echo "W: No service associated with log $logfile. Skipping...."
            # Keep the LOGFILEMAP index in step with the list of logfiles
            !((n++))
            continue
        fi

        # Get shell-compatible names for service and logfile
        svcnm="$(echo "$service" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        lognm="$(echo "$logfile" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"

        # Get previous line count to determine whether or not the file may have been rotated (defaulting to 0)
        prvlines="$(echo "$curstate" | grep "^${lognm}_lines=" | cut -f 2 -d "=")"
        prvlines="${prvlines:-0}"

        # Get the current number of lines in the file (defaulting to 0 on error;
        # the "|| true" stops `set -e` from aborting on an unreadable file)
        curlines="$(wc -l < "$logfile" || true)"
        curlines="${curlines:-0}"

        # If the current line count is less than the previous line count, we've probably rotated.
        # Reset to 0.
        if [ "$curlines" -lt "$prvlines" ]; then
            prvlines=0
        else
            prvlines=$((prvlines + 1))
        fi

        # Get incidents starting at the line after the last line we've seen
        logmatch="${LOGFILEMAP[$n]}_regex"
        matches="$(tail -n +"$prvlines" "$logfile" | grep -Ec "${!logmatch}" || true)"

        # If there were matches, aggregate them and add this log to the extinfo for the service
        if [ "$matches" -gt 0 ]; then
            # Aggregate and add to the correct service counter
            svc_counter="${svcnm}_total"
            !((matches+=${!svc_counter}))
            typeset "$svc_counter=$matches"

            # Add this log to extinfo for service
            extinfo_var="${svcnm}_extinfo"
            typeset "$extinfo_var=${!extinfo_var}$logfile, "
        fi

        # Push onto next state
        nextstate+=("${lognm}_lines=$curlines")

        !((n++))
    done 3< <(echo "$LOGFILES")

    # Write state to munin statefile
    (IFS=$'\n'; echo "${nextstate[*]}" > "$MUNIN_STATEFILE")

    # Now echo values
    while read -u 3 -r svc; do
        svcnm="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        svc_counter="${svcnm}_total"
        extinfo_var="${svcnm}_extinfo"
        echo "${svcnm}.value ${!svc_counter}"
        echo "${svcnm}.extinfo ${!extinfo_var}"
    done 3< <(IFS=$'\n'; echo "${services[*]}")

    return 0
}
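
# Illustrative sketch only (quoted heredoc, never executed by the plugin): the
# rotation-aware counting that fetch() performs per logfile, reduced to a
# hypothetical helper called count_new_matches. The helper name, its argument
# order and the demo log lines are assumptions for this example.
: <<'SKETCH'
```shell
#!/bin/bash
# count_new_matches FILE PREV_LINES REGEX
# Counts lines matching REGEX that were added after line PREV_LINES.
count_new_matches() {
    local file="$1" prev="$2" regex="$3" cur start
    cur="$(wc -l < "$file")"
    if [ "$cur" -lt "$prev" ]; then
        start=1                # file shrank: assume rotation, read from the top
    else
        start=$(( prev + 1 ))  # otherwise skip the lines already counted
    fi
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    tail -n +"$start" "$file" | grep -Ec "$regex" || true
}

log="$(mktemp)"
printf 'ok\nerror: one\n' > "$log"
count_new_matches "$log" 0 'error|crit'   # prints 1
printf 'crit: two\nerror: three\n' >> "$log"
count_new_matches "$log" 2 'error|crit'   # prints 2 (only lines 3-4 are read)
printf 'error: after rotation\n' > "$log"
count_new_matches "$log" 4 'error|crit'   # prints 1 (1 < 4, so start over)
rm -f "$log"
```
SKETCH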
    
    
case "$1" in
    config) config ;;
    *) fetch ;;
esac