#!/bin/bash

: <<=cut

=head1 NAME

 emc_vnx_block_lun_perfdata - Plugin to monitor Block statistics of EMC VNX 5300
 Unified Storage Processors

=head1 AUTHOR

 Evgeny Beysembaev <megabotva@gmail.com>

=head1 LICENSE

 GPLv2

=head1 MAGIC MARKERS

  #%# family=auto
  #%# capabilities=autoconf
    
=head1 DESCRIPTION

 The plugin monitors the LUNs of EMC Unified Storage FLARE SPs. It is probably
 also compatible with other Clariion systems. It connects to the Control
 Stations over SSH, remotely executes /nas/sbin/navicli, and fetches and parses
 its output. The plugin can easily be reconfigured to use a locally installed
 /opt/Navisphere CLI instead of the Control Stations' navicli. It makes no
 difference which Storage Processor is queried, so the plugin tries both and
 uses the first active one. The plugin also automatically chooses the Primary
 Control Station from the list by calling /nasmcd/sbin/getreason and
 /nasmcd/sbin/t2slot.

 Some parts of this plugin are left rudimentary so that it is easy to
 reconfigure it to draw more (or less) data.
    
=head1 COMPATIBILITY

 The plugin was written for the EMC VNX5300 storage system, as this is the
 only EMC storage I have. I am fairly sure it also works with other VNX1
 systems, such as the VNX5100 and VNX5500, and with old-style Clariion systems.
 I do not know whether the plugin works with the VNX2 series; it may need some
 corrections in the command-line backend. The same applies to other EMC
 systems, so I encourage you to try it and fix the plugin where needed.
 
=head1 CONFIGURATION

=head2 Prerequisites

 First of all, make sure that statistics collection is turned on. You can do
 this by running:
 navicli -h spa setstats -on
 on your Control Station, or locally through /opt/Navisphere.

 The plugin also relies heavily on the buggy "cdef" feature of Munin 2.0, so it
 can be hit by the following bugs:
 http://munin-monitoring.org/ticket/1017 - The plugin contains workarounds for
 this one; make sure they are working.
 http://munin-monitoring.org/ticket/1352 - Metric names in this plugin can be
 much longer than 15 characters.
 Without these workarounds "Load" and "Queue Length" would not work.
    
=head2 Installation

 The plugin uses SSH to connect to the Control Stations. It is possible to use
 the 'nasadmin' user, but it is better to create a read-only global user with
 the Unisphere Client. The user should have only the Operator role. I created
 an "operator" user, but because the Control Stations already had an internal
 "operator" user, the new one was named "operator1". So be careful.

 On the munin-node side, choose a user that will be used to connect through
 SSH. Generally the "munin" user is fine. Then run "sudo su munin -s /bin/bash",
 "ssh-keygen" and "ssh-copy-id" against both Control Stations with the newly
 created user.

 Make a link from /usr/share/munin/plugins/emc_vnx_block_lun_perfdata to
 /etc/munin/plugins/emc_vnx_block_lun_perfdata_<NAME>, where <NAME> is an
 arbitrary name for your storage system. The plugin returns <NAME> in its
 answer as the "host_name" field. Assume your storage system is called
 "VNX5300".

 Make a configuration file at
 /etc/munin/plugin-conf.d/emc_vnx_block_lun_perfdata_VNX5300:

 [emc_vnx_block_lun_perfdata_VNX5300]
 user munin
 env.username operator1
 env.cs_addr 192.168.1.1 192.168.1.2

 Where:
 user - local user that runs the SSH client
 env.username - remote user with the Operator role
 env.cs_addr - addresses of the Control Stations
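
 As a minimal sketch of the installation steps above, assuming the example
 names used in this section ("munin", "operator1", "VNX5300" and the two
 Control Station addresses), and assuming that munin-run is available on the
 node for a final check. The link name has to match the name of the
 configuration section:

  sudo su munin -s /bin/bash
  ssh-keygen
  ssh-copy-id operator1@192.168.1.1
  ssh-copy-id operator1@192.168.1.2
  exit
  sudo ln -s /usr/share/munin/plugins/emc_vnx_block_lun_perfdata /etc/munin/plugins/emc_vnx_block_lun_perfdata_VNX5300
  sudo munin-run emc_vnx_block_lun_perfdata_VNX5300 config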
    
=head1 ERRATA

 Queue Length is not calculated in a fully correct way. The counters are taken
 as totals over both SPs, but are then scaled independently by the load of SPA
 and of SPB. Anyway, in most AAA / ALUA cases the formula is correct.

=head1 HISTORY

 09.11.2016 - First Release
 26.12.2016 - Compatibility with Munin coding style

=cut
    
export LANG=C

. "$MUNIN_LIBDIR/plugins/plugin.sh"

TARGET=$(echo "${0##*/}" | cut -d _ -f 6)
# All SPs we have
SPALL="SPA SPB"
NAVICLI="/nas/sbin/navicli"

# Print the state code that /nasmcd/sbin/getreason reports for the slot of the
# Control Station given as $1.
ssh_check_cmd() {
	ssh -q "$username@$1" "/nasmcd/sbin/getreason | grep -w slot_\`/nasmcd/sbin/t2slot\` | cut -d- -f1"
}
    
check_conf () {
	if [ -z "$username" ]; then
		echo "No username ('username' environment variable)!"
		return 1
	fi

	if [ -z "$cs_addr" ]; then
		echo "No control station addresses ('cs_addr' environment variable)!"
		return 1
	fi

	# Choose the primary Control Station. Its state code has to be "10".
	for CS in $cs_addr; do
		if [[ "10" -eq "$(ssh_check_cmd "$CS")" ]]; then
			PRIMARY_CS=$CS
			break
		fi
	done

	if [ -z "$PRIMARY_CS" ]; then
		echo "No alive primary Control Station from list \"$cs_addr\""
		return 1
	fi
	return 0
}
    
if [ "$1" = "autoconf" ]; then
	check_conf_ans=$(check_conf)
	if [ $? -eq 0 ]; then
		echo "yes"
	else
		echo "no ($check_conf_ans)"
	fi
	exit 0
fi

check_conf 1>&2
if [[ $? -eq 1 ]]; then
	exit 1
fi
    
SSH="ssh -q $username@$PRIMARY_CS "
# Print the name of the first Storage Processor that answers navicli.
get_working_sp() {
	local probe_sp
	for probe_sp in $SPALL; do
		if $SSH $NAVICLI -h "$probe_sp" >/dev/null 2>&1; then
			echo "$probe_sp"
			return 0
		fi
	done
}

StorageProcessor=$(get_working_sp)
[ -z "$StorageProcessor" ] && echo "No active Storage Processor found!" >&2 && exit 1

# Run navicli on the primary Control Station against the active SP.
run_remote_navicli() {
	$SSH $NAVICLI -h "$StorageProcessor" "$@"
}

# Get Lun List
LUNLIST=$(run_remote_navicli "lun -list -drivetype | sed -ne 's/^Name:\ *//p'")
    
echo "host_name ${TARGET}"
echo

if [ "$1" = "config" ] ; then
	cat <<-EOF
		multigraph emc_vnx_block_blocks
		graph_category disk
		graph_title EMC VNX 5300 LUN Blocks
		graph_vlabel Blocks Read (-) / Written (+)
		graph_args --base 1000

	EOF

	while read -r LUN ; do
		LUN="$(clean_fieldname "$LUN")"
		cat <<-EOF
			${LUN}_read.label none
			${LUN}_read.graph no
			${LUN}_read.min 0
			${LUN}_read.draw AREA
			${LUN}_read.type COUNTER
			${LUN}_write.label $LUN Blocks
			${LUN}_write.negative ${LUN}_read
			${LUN}_write.type COUNTER
			${LUN}_write.min 0
			${LUN}_write.draw STACK
		EOF
	done <<< "$LUNLIST"

	cat <<-EOF

		multigraph emc_vnx_block_req
		graph_category disk
		graph_title EMC VNX 5300 LUN Requests
		graph_vlabel Requests: Read (-) / Write (+)
		graph_args --base 1000

	EOF
	while read -r LUN ; do
		LUN="$(clean_fieldname "$LUN")"
		cat <<-EOF
			${LUN}_readreq.label none
			${LUN}_readreq.graph no
			${LUN}_readreq.min 0
			${LUN}_readreq.type COUNTER
			${LUN}_writereq.label $LUN Requests
			${LUN}_writereq.negative ${LUN}_readreq
			${LUN}_writereq.type COUNTER
			${LUN}_writereq.min 0
		EOF
	done <<< "$LUNLIST"
    
	cat <<-EOF

		multigraph emc_vnx_block_ticks
		graph_category disk
		graph_title EMC VNX 5300 Counted Load per LUN
		graph_vlabel Load, % * Number of LUNs
		graph_args --base 1000 -l 0 -r
	EOF
	echo -n "graph_order "
	while read -r LUN ; do
		LUN="$(clean_fieldname "$LUN")"
		echo -n "${LUN}_busyticks ${LUN}_idleticks ${LUN}_bta=${LUN}_busyticks_spa ${LUN}_idleticks_spa ${LUN}_btb=${LUN}_busyticks_spb ${LUN}_idleticks_spb "
	done <<< "$LUNLIST"
	echo ""
	while read -r LUN ; do
		LUN="$(clean_fieldname "$LUN")"
		cat <<-EOF
			${LUN}_busyticks_spa.label $LUN Busy Ticks SPA
			${LUN}_busyticks_spa.type COUNTER
			${LUN}_busyticks_spa.graph no
			${LUN}_bta.label $LUN Busy Ticks SPA
			${LUN}_bta.graph no
			${LUN}_idleticks_spa.label $LUN Idle Ticks SPA
			${LUN}_idleticks_spa.type COUNTER
			${LUN}_idleticks_spa.graph no
			${LUN}_busyticks_spb.label $LUN Busy Ticks SPB
			${LUN}_busyticks_spb.type COUNTER
			${LUN}_busyticks_spb.graph no
			${LUN}_btb.label $LUN Busy Ticks SPB
			${LUN}_btb.graph no
			${LUN}_idleticks_spb.label $LUN Idle Ticks SPB
			${LUN}_idleticks_spb.type COUNTER
			${LUN}_idleticks_spb.graph no
			${LUN}_load_spa.label $LUN load SPA
			${LUN}_load_spa.draw AREASTACK
			${LUN}_load_spb.label $LUN load SPB
			${LUN}_load_spb.draw AREASTACK
			${LUN}_load_spa.cdef 100,${LUN}_bta,${LUN}_busyticks_spa,${LUN}_idleticks_spa,+,/,*
			${LUN}_load_spb.cdef 100,${LUN}_btb,${LUN}_busyticks_spb,${LUN}_idleticks_spb,+,/,*
		EOF
	done <<< "$LUNLIST"
    
	cat <<-EOF

		multigraph emc_vnx_block_outstanding
		graph_category disk
		graph_title EMC VNX 5300 Sum of Outstanding Requests
		graph_vlabel Requests
		graph_args --base 1000
	EOF
	while read -r LUN ; do
		LUN="$(clean_fieldname "$LUN")"
		cat <<-EOF
			${LUN}_outstandsum.label $LUN
			${LUN}_outstandsum.type COUNTER
		EOF
	done <<< "$LUNLIST"

	cat <<-EOF

		multigraph emc_vnx_block_nonzeroreq
		graph_category disk
		graph_title EMC VNX 5300 Non-Zero Request Count Arrivals
		graph_vlabel Count Arrivals
		graph_args --base 1000
	EOF
	while read -r LUN ; do
		LUN="$(clean_fieldname "$LUN")"
		cat <<-EOF
			${LUN}_nonzeroreq.label $LUN
			${LUN}_nonzeroreq.type COUNTER
		EOF
	done <<< "$LUNLIST"

	cat <<-EOF

		multigraph emc_vnx_block_trespasses
		graph_category disk
		graph_title EMC VNX 5300 Trespasses
		graph_vlabel Trespasses
	EOF
	while read -r LUN ; do
		LUN="$(clean_fieldname "$LUN")"
		cat <<-EOF
			${LUN}_implic_tr.label ${LUN} Implicit Trespasses
			${LUN}_explic_tr.label ${LUN} Explicit Trespasses
		EOF
	done <<< "$LUNLIST"
    
	cat <<-EOF

		multigraph emc_vnx_block_queue
		graph_category disk
		graph_title EMC VNX 5300 Counted Block Queue Length
		graph_vlabel Length
	EOF
	while read -r LUN ; do
		LUN="$(clean_fieldname "$LUN")"
		cat <<-EOF
			${LUN}_busyticks_spa.label ${LUN}
			${LUN}_busyticks_spa.graph no
			${LUN}_busyticks_spa.type COUNTER
			${LUN}_idleticks_spa.label ${LUN}
			${LUN}_idleticks_spa.graph no
			${LUN}_idleticks_spa.type COUNTER
			${LUN}_busyticks_spb.label ${LUN}
			${LUN}_busyticks_spb.graph no
			${LUN}_busyticks_spb.type COUNTER
			${LUN}_idleticks_spb.label ${LUN}
			${LUN}_idleticks_spb.graph no
			${LUN}_idleticks_spb.type COUNTER
			${LUN}_outstandsum.label ${LUN}
			${LUN}_outstandsum.graph no
			${LUN}_outstandsum.type COUNTER
			${LUN}_nonzeroreq.label ${LUN}
			${LUN}_nonzeroreq.graph no
			${LUN}_nonzeroreq.type COUNTER
			${LUN}_readreq.label ${LUN}
			${LUN}_readreq.graph no
			${LUN}_readreq.type COUNTER
			${LUN}_writereq.label ${LUN}
			${LUN}_writereq.graph no
			${LUN}_writereq.type COUNTER
		EOF
# Queue Length SPA = ((Sum of Outstanding Requests SPA - Non-Zero Request Count Arrivals SPA / 2) / (Host Read Requests SPA + Host Write Requests SPA)) *
# (Busy Ticks SPA / (Busy Ticks SPA + Idle Ticks SPA))
# We count SPA and SPB together, although this is not fully correct
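# A sketch of how the RPN cdef for SPA maps to this formula, read left to right
# (field names are shown without the LUN prefix):
#   outstandsum,nonzeroreq,2,/,-       ->  outstandsum - nonzeroreq / 2
#   readreq,writereq,+,/               ->  ... / (readreq + writereq)
#   busyticks_spa,*                    ->  ... * busyticks_spa
#   busyticks_spa,idleticks_spa,+,/    ->  ... / (busyticks_spa + idleticks_spa)
# The SPB cdef has the same shape, with the _spb tick counters.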
		cat <<-EOF
			${LUN}_ql_l_a.label ${LUN} Queue Length SPA
			${LUN}_ql_l_a.cdef ${LUN}_outstandsum,${LUN}_nonzeroreq,2,/,-,${LUN}_readreq,${LUN}_writereq,+,/,${LUN}_busyticks_spa,*,${LUN}_busyticks_spa,${LUN}_idleticks_spa,+,/
			${LUN}_ql_l_b.label ${LUN} Queue Length SPB
			${LUN}_ql_l_b.cdef ${LUN}_outstandsum,${LUN}_nonzeroreq,2,/,-,${LUN}_readreq,${LUN}_writereq,+,/,${LUN}_busyticks_spb,*,${LUN}_busyticks_spb,${LUN}_idleticks_spb,+,/
		EOF
	done <<< "$LUNLIST"
	exit 0
fi
    
# Prepare one big command for the Control Station so that most of the work is
# done remotely, in a single SSH session.
BIGSSHCMD="$SSH"
while read -r LUN ; do
	FILTERLUN="$(clean_fieldname "$LUN")"
	BIGSSHCMD+="$NAVICLI -h $StorageProcessor lun -list -name $LUN -perfData |
		sed -ne 's/^Blocks Read\:\ */${FILTERLUN}_read.value /p;
		s/^Blocks Written\:\ */${FILTERLUN}_write.value /p;
		s/Read Requests\:\ */${FILTERLUN}_readreq.value /p;
		s/Write Requests\:\ */${FILTERLUN}_writereq.value /p;
		s/Busy Ticks SP A\:\ */${FILTERLUN}_busyticks_spa.value /p;
		s/Idle Ticks SP A\:\ */${FILTERLUN}_idleticks_spa.value /p;
		s/Busy Ticks SP B\:\ */${FILTERLUN}_busyticks_spb.value /p;
		s/Idle Ticks SP B\:\ */${FILTERLUN}_idleticks_spb.value /p;
		s/Sum of Outstanding Requests\:\ */${FILTERLUN}_outstandsum.value /p;
		s/Non-Zero Request Count Arrivals\:\ */${FILTERLUN}_nonzeroreq.value /p;
		s/Implicit Trespasses\:\ */${FILTERLUN}_implic_tr.value /p;
		s/Explicit Trespasses\:\ */${FILTERLUN}_explic_tr.value /p;
		' ;"
done <<< "$LUNLIST"
ANSWER="$($BIGSSHCMD)"
echo "multigraph emc_vnx_block_blocks"
echo "$ANSWER" | grep "read\.\|write\."
echo -e "\nmultigraph emc_vnx_block_req"
echo "$ANSWER" | grep "readreq\.\|writereq\."
    
echo -e "\nmultigraph emc_vnx_block_ticks"
while read -r LUN ; do
	LUN="$(clean_fieldname "$LUN")"
	echo "${LUN}_load_spa.value 0"
	echo "${LUN}_load_spb.value 0"
done <<< "$LUNLIST"
echo "$ANSWER" | grep "busyticks_spa\.\|idleticks_spa\."
echo "$ANSWER" | grep "busyticks_spb\.\|idleticks_spb\."

echo -e "\nmultigraph emc_vnx_block_outstanding"
echo "$ANSWER" | grep "outstandsum\."

echo -e "\nmultigraph emc_vnx_block_nonzeroreq"
echo "$ANSWER" | grep "nonzeroreq\."

echo -e "\nmultigraph emc_vnx_block_trespasses"
echo "$ANSWER" | grep "implic_tr\.\|explic_tr\."

echo -e "\nmultigraph emc_vnx_block_queue"
# Queue Length
echo "$ANSWER" | grep "busyticks"
echo "$ANSWER" | grep "idleticks."
echo "$ANSWER" | grep "outstandsum\."
echo "$ANSWER" | grep "nonzeroreq\."
echo "$ANSWER" | grep "readreq\."
echo "$ANSWER" | grep "writereq\."
while read -r LUN ; do
	LUN="$(clean_fieldname "$LUN")"
	echo "${LUN}_ql_l_a.value 0 "
	echo "${LUN}_ql_l_b.value 0 "
done <<< "$LUNLIST"
exit 0