Revision 10b1de81

ID 10b1de81bbd2c92e9fb9e897e38231ff8903729a
Parent 758ca724
Child c53197ce

Added by Nuno Fachada about 12 years ago

Configurable warning and critical temperatures for GPUs

View differences:

plugins/gpu/amd_gpu_
@@ -9,7 +9,7 @@
 usually bundled with AMD GPU driver, to obtain information. To use this
 plugin you have to make sure aticonfig will run without an active X
 server (i.e. without anyone being logged in via the GUI). For more 
-information on this visit this link: 
+information about this issue visit the link below: 
 http://www.mayankdaga.com/running-opencl-applications-remotely-on-amd-gpus/
 
 =head1 CONFIGURATION
......
@@ -20,8 +20,10 @@
 This plugin uses the following configuration variables:
 
  [amd_gpu_*]
-  env.aticonfexec - Location of aticonfig executable.
   user root
+  env.aticonfexec - Location of aticonfig executable.
+  env.warning - Warning temperature
+  env.critical - Critical temperature
 
 =head2 DEFAULT CONFIGURATION
 
......
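Reviewer note: the documented variables are read from the Munin plugin configuration. The fragment below is a minimal sketch of such an entry, assuming it lives in a file under /etc/munin/plugin-conf.d/. The aticonfig path and the two temperature values are illustrative assumptions, not values taken from this revision.

```
[amd_gpu_*]
user root
env.aticonfexec /usr/bin/aticonfig
env.warning 70
env.critical 90
```

If env.warning or env.critical is left out, the plugin falls back to the values hard-coded in the script (75 and 95 in this revision).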
@@ -105,8 +107,8 @@
 			while [ $nGpusCounter -lt $nGpus ]
 			do
 				gpuName=`echo "$nGpusOutput" | grep "* 0" | cut -f 1,3 --complement -d " "`
-				echo "temp${nGpusCounter}.warning 75"
-				echo "temp${nGpusCounter}.critical 95"
+				echo "temp${nGpusCounter}.warning ${warning:-75}"
+				echo "temp${nGpusCounter}.critical ${critical:-95}"
 				echo "temp${nGpusCounter}.info Temperature information for $gpuName"
 				echo "temp${nGpusCounter}.label Temperature ($gpuName)"
 				: $(( nGpusCounter = $nGpusCounter + 1 ))
......
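Reviewer note: the change relies on the shell's `${name:-default}` expansion, which substitutes the default when the variable is unset or empty. The sketch below isolates that behaviour. `emit_thresholds` is a hypothetical helper written for this illustration and is not part of the plugin; the variables are set by hand here, whereas in production Munin would export them from the env.warning and env.critical settings.

```shell
#!/bin/sh
# Hypothetical helper mirroring the two echo lines added by this revision.
# $1 is the GPU index; warning and critical are read from the environment.
emit_thresholds() {
	echo "temp${1}.warning ${warning:-75}"
	echo "temp${1}.critical ${critical:-95}"
}

# Nothing configured: the built-in defaults (75 and 95) are printed.
unset warning critical
emit_thresholds 0

# Both thresholds configured: the configured values are printed.
warning=70
critical=90
emit_thresholds 1

# An empty value also falls back to the default, because of the colon in ":-".
warning=""
emit_thresholds 2
```

Because of the third case, an empty `env.warning` entry behaves the same as a missing one. The expansion does not check that the value is numeric, so a typo in the configuration is passed through to Munin unchanged.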
@@ -232,7 +234,6 @@
 done
 
 # TODO Follow multigraph suggestion from Flameeyes to look into multigraph plugins http://munin-monitoring.org/wiki/MultigraphSampleOutput, in order to reduce the amount of round trips to get the data.
-# TODO Put warning and critical as vars in config with sensible defaults
 
 
 
plugins/gpu/nvidia_gpu_
@@ -17,6 +17,8 @@
 
  [nvidia_gpu_*]
   env.smiexec - Location of nvidia-smi executable.
+  env.warning - Warning temperature
+  env.critical - Critical temperature
 
 =head2 DEFAULT CONFIGURATION
 
......
@@ -101,8 +103,8 @@
 			while [ $nGpusCounter -lt $nGpus ]
 			do
 				gpuName=`echo "$nGpusOutput" | sed -n $(( $nGpusCounter + 1 ))p | cut -d \( -f 1`
-				echo "temp${nGpusCounter}.warning 75"
-				echo "temp${nGpusCounter}.critical 95"
+				echo "temp${nGpusCounter}.warning ${warning:-75}"
+				echo "temp${nGpusCounter}.critical ${critical:-95}"
 				echo "temp${nGpusCounter}.info Temperature information for $gpuName"
 				: $(( nGpusCounter = $nGpusCounter + 1 ))
 			done 
......
@@ -205,8 +207,6 @@
 done
 
 # TODO Follow multigraph suggestion from Flameeyes to look into multigraph plugins http://munin-monitoring.org/wiki/MultigraphSampleOutput, in order to reduce the amount of round trips to get the data.
-# TODO Put warning and critical as vars in config with sensible defaults
-
 # TODO Nvidia only: Add unsupported output options from nvidia-smi for those who have that option (how to test?). Test if they are supported and put them in suggest (or not) in case they are supported (or not)
 
 
