Revision 10b1de81
Configurable warning and critical temperatures for GPUs
| plugins/gpu/amd_gpu_ | ||
|---|---|---|
| 9 | 9 |
usually bundled with AMD GPU driver, to obtain information. To use this |
| 10 | 10 |
plugin you have to make sure aticonfig will run without an active X |
| 11 | 11 |
server (i.e. without anyone being logged in via the GUI). For more |
| 12 | |
information on this visit this link: |
| | 12 |
information about this issue visit the link below: |
| 13 | 13 |
http://www.mayankdaga.com/running-opencl-applications-remotely-on-amd-gpus/ |
| 14 | 14 |
|
| 15 | 15 |
=head1 CONFIGURATION |
| ... | ... | |
| 20 | 20 |
This plugin uses the following configuration variables: |
| 21 | 21 |
|
| 22 | 22 |
[amd_gpu_*] |
| 23 | |
env.aticonfexec - Location of aticonfig executable. |
| 24 | 23 |
user root |
| | 24 |
env.aticonfexec - Location of aticonfig executable. |
| | 25 |
env.warning - Warning temperature |
| | 26 |
env.critical - Critical temperature |
| 25 | 27 |
|
| 26 | 28 |
=head2 DEFAULT CONFIGURATION |
| 27 | 29 |
|
| ... | ... | |
| 105 | 107 |
while [ $nGpusCounter -lt $nGpus ] |
| 106 | 108 |
do |
| 107 | 109 |
gpuName=`echo "$nGpusOutput" | grep "* 0" | cut -f 1,3 --complement -d " "` |
| 108 | |
echo "temp${nGpusCounter}.warning 75" |
| 109 | |
echo "temp${nGpusCounter}.critical 95" |
| | 110 |
echo "temp${nGpusCounter}.warning ${warning:-75}" |
| | 111 |
echo "temp${nGpusCounter}.critical ${critical:-95}" |
| 110 | 112 |
echo "temp${nGpusCounter}.info Temperature information for $gpuName"
|
| 111 | 113 |
echo "temp${nGpusCounter}.label Temperature ($gpuName)"
|
| 112 | 114 |
: $(( nGpusCounter = $nGpusCounter + 1 )) |
| ... | ... | |
| 232 | 234 |
done |
| 233 | 235 |
|
| 234 | 236 |
# TODO Follow multigraph suggestion from Flameeyes to look into multigraph plugins http://munin-monitoring.org/wiki/MultigraphSampleOutput, in order to reduce the amount of round trips to get the data. |
| 235 | |
# TODO Put warning and critical as vars in config with sensible defaults |
| 236 | 237 |
|
| 237 | 238 |
|
| 238 | 239 |
|
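
The change above replaces the hard-coded thresholds with the shell's `${var:-default}` expansion, so the plugin keeps reporting 75 and 95 unless `warning` or `critical` is set in its environment. A minimal sketch of that behaviour follows; `emit_thresholds` is a hypothetical helper written for illustration and is not part of either plugin.

```shell
#!/bin/sh
# Sketch only: emit_thresholds is a hypothetical helper that mirrors the
# two echo lines from the config loop in the diff.
emit_thresholds() {
    nGpus=$1
    nGpusCounter=0
    while [ "$nGpusCounter" -lt "$nGpus" ]
    do
        # ${warning:-75} expands to 75 when warning is unset OR empty.
        echo "temp${nGpusCounter}.warning ${warning:-75}"
        echo "temp${nGpusCounter}.critical ${critical:-95}"
        nGpusCounter=$(( nGpusCounter + 1 ))
    done
}

# Nothing configured: the defaults from the diff apply.
unset warning critical
emit_thresholds 1

# Only the warning threshold overridden; critical keeps its default.
warning=70
emit_thresholds 1
```

Because `:-` also treats an empty value as unset, a blank `env.warning` setting would fall back to 75 rather than emit an empty threshold.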
| plugins/gpu/nvidia_gpu_ | ||
|---|---|---|
| 17 | 17 |
|
| 18 | 18 |
[nvidia_gpu_*] |
| 19 | 19 |
env.smiexec - Location of nvidia-smi executable. |
| | 20 |
env.warning - Warning temperature |
| | 21 |
env.critical - Critical temperature |
| 20 | 22 |
|
| 21 | 23 |
=head2 DEFAULT CONFIGURATION |
| 22 | 24 |
|
| ... | ... | |
| 101 | 103 |
while [ $nGpusCounter -lt $nGpus ] |
| 102 | 104 |
do |
| 103 | 105 |
gpuName=`echo "$nGpusOutput" | sed -n $(( $nGpusCounter + 1 ))p | cut -d \( -f 1` |
| 104 | |
echo "temp${nGpusCounter}.warning 75" |
| 105 | |
echo "temp${nGpusCounter}.critical 95" |
| | 106 |
echo "temp${nGpusCounter}.warning ${warning:-75}" |
| | 107 |
echo "temp${nGpusCounter}.critical ${critical:-95}" |
| 106 | 108 |
echo "temp${nGpusCounter}.info Temperature information for $gpuName"
|
| 107 | 109 |
: $(( nGpusCounter = $nGpusCounter + 1 )) |
| 108 | 110 |
done |
| ... | ... | |
| 205 | 207 |
done |
| 206 | 208 |
|
| 207 | 209 |
# TODO Follow multigraph suggestion from Flameeyes to look into multigraph plugins http://munin-monitoring.org/wiki/MultigraphSampleOutput, in order to reduce the amount of round trips to get the data. |
| 208 | |
# TODO Put warning and critical as vars in config with sensible defaults |
| 209 | |
|
| 210 | 210 |
# TODO Nvidia only: Add unsupported output options from nvidia-smi for those who have that option (how to test?). Test if they are supported and put them in suggest (or not) in case they are supported (or not) |
| 211 | 211 |
|
| 212 | 212 |
|
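
With both plugins reading `env.warning` and `env.critical`, the thresholds can be overridden from the plugin configuration instead of by editing the scripts. The sketch below writes such a fragment to a temporary directory. The file name `gpu_thresholds`, the `conf_dir` variable, and the values 70 and 90 are assumptions chosen for illustration; on a real node the fragment would normally live in Munin's plugin configuration directory, whose location depends on the installation.

```shell
#!/bin/sh
# Sketch only: the directory, file name, and threshold values are
# illustrative assumptions, not taken from the changeset.
conf_dir=${conf_dir:-$(mktemp -d)}
conf_file="$conf_dir/gpu_thresholds"

# Section names and variable names match the plugins' documented
# configuration; "user root" is carried over from the amd_gpu_ docs.
cat > "$conf_file" <<'EOF'
[nvidia_gpu_*]
env.warning 70
env.critical 90

[amd_gpu_*]
user root
env.warning 70
env.critical 90
EOF

echo "wrote $conf_file"
```

Settings prefixed with `env.` reach the plugin as environment variables, which is what the `${warning:-75}` and `${critical:-95}` expansions in the diff rely on.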