NVIDIA GPU Graphs ALMOST Working (Empty Graphs)

cowboycraig
Posts: 6
Joined: Sun Oct 11, 2020 3:37 pm

NVIDIA GPU Graphs ALMOST Working (Empty Graphs)

Post by cowboycraig »

There is a great project, https://sourceforge.net/projects/nvgpu-smi-snmp/, that pulls NVIDIA GPU data from nvgpu-smi and makes it available to snmpd. It is running and good to go.

I can grab all the NVIDIA OIDs via snmpwalk, both locally and remotely:
craig@pop-os:~/Documents/nvidia/nvgpu-smi-snmp-1.1$ snmpwalk -c public -v2c localhost 1.3.6.1.4.1.2021.13.42.2
iso.3.6.1.4.1.2021.13.42.2.1.1.0 = INTEGER: 0
iso.3.6.1.4.1.2021.13.42.2.1.2.0 = STRING: "GeForce GTX 1060 3GB"
iso.3.6.1.4.1.2021.13.42.2.1.3.0 = STRING: "86.06.59.00.69"
iso.3.6.1.4.1.2021.13.42.2.1.4.0 = STRING: "418.152.00"
iso.3.6.1.4.1.2021.13.42.2.1.8.0 = INTEGER: 3016
iso.3.6.1.4.1.2021.13.42.2.1.10.0 = INTEGER: 54
iso.3.6.1.4.1.2021.13.42.2.1.13.0 = INTEGER: 102
iso.3.6.1.4.1.2021.13.42.2.1.24.0 = INTEGER: 253
iso.3.6.1.4.1.2021.13.42.2.1.25.0 = INTEGER: 405
iso.3.6.1.4.1.2021.13.42.2.1.26.0 = STRING: "GPU-ae167601-2ba0-890c-7aca-aa14ba4deb47"
iso.3.6.1.4.1.2021.13.42.2.1.27.0 = INTEGER: 8
iso.3.6.1.4.1.2021.13.42.2.1.28.0 = INTEGER: 0
iso.3.6.1.4.1.2021.13.42.2.1.29.0 = INTEGER: 0
iso.3.6.1.4.1.2021.13.42.2.1.30.0 = INTEGER: 1815
iso.3.6.1.4.1.2021.13.42.2.1.31.0 = INTEGER: 12
iso.3.6.1.4.1.2021.13.42.2.1.32.0 = INTEGER: 120


Everything is configured in Cacti and it looks like it should be creating graphs, but the graphs are blank.
[Screenshot: Screen Shot 2020-10-11 at 2.09.40 PM.png]
Running Debug on the graphs gets the dreaded "Valid Data X":
[Screenshot: Screen Shot 2020-10-11 at 2.09.59 PM.png]
Looking deeper, the debug says "Data Source returned Bad Results for snmp_oid". But the OID responds fine, so I don't get why the result would be bad:

craig@pop-os:~$ snmpwalk -c public -v2c localhost .1.3.6.1.4.1.2021.13.42.2.1.10
iso.3.6.1.4.1.2021.13.42.2.1.10.0 = INTEGER: 54

craig@pop-os:~$ snmpwalk -c public -v2c localhost NV-CTRL-MIB::nvCtrlGPUCoreTemp
NV-CTRL-MIB::nvCtrlGPUCoreTemp.0 = INTEGER: 56
[Screenshot: Screen Shot 2020-10-11 at 2.10.18 PM.png]
More details:
[Screenshot: Screen Shot 2020-10-11 at 2.11.02 PM.png]
[Screenshot: Screen Shot 2020-10-11 at 2.11.29 PM.png]

Running the debug produced this in cacti.log:
10/11/2020 14:40:03 - DSDEBUG Bad Data Found for Data Source ID 6
10/11/2020 14:40:03 - DSDEBUG Bad Data Found for Data Source ID 7
10/11/2020 14:40:03 - DSDEBUG Bad Data Found for Data Source ID 10
10/11/2020 14:40:03 - DSDEBUG Bad Data Found for Data Source ID 8
10/11/2020 14:40:03 - DSDEBUG Bad Data Found for Data Source ID 9


When the poller runs, this error shows up over and over in cacti.log:
10/11/2020 15:20:01 - POLLER: Poller[1] WARNING: Invalid Response(s), Errors[5] Device[1] Thread[1] DS[6, 7, 8, 9, 10]
10/11/2020 15:20:02 - SYSTEM STATS: Time:1.2265 Method:cmd.php Processes:1 Threads:0 Hosts:1 HostsPerProcess:1 DataSources:10 RRDsProcessed:10
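
To cross-check which Data Sources the poller is complaining about, the two kinds of log lines above can be parsed and compared. This is only a rough sketch: the regular expressions are modeled on the exact lines quoted in this post, and the function names (`parse_warning`, `dsdebug_ids`) are made up for illustration, not part of Cacti.

```python
import re

# Patterns modeled on the cacti.log lines quoted in this thread (assumption:
# other Cacti versions may format these lines differently).
WARNING_RE = re.compile(
    r"WARNING: Invalid Response\(s\), Errors\[(\d+)\] "
    r"Device\[(\d+)\] Thread\[(\d+)\] DS\[([\d, ]+)\]"
)
DSDEBUG_RE = re.compile(r"DSDEBUG Bad Data Found for Data Source ID (\d+)")


def parse_warning(line):
    """Return the fields of an 'Invalid Response(s)' warning, or None."""
    m = WARNING_RE.search(line)
    if m is None:
        return None
    errors, device, thread, ds_list = m.groups()
    return {
        "errors": int(errors),
        "device": int(device),
        "thread": int(thread),
        "ds_ids": [int(x) for x in ds_list.split(",")],
    }


def dsdebug_ids(lines):
    """Collect the Data Source IDs named in DSDEBUG 'Bad Data' lines."""
    ids = set()
    for line in lines:
        m = DSDEBUG_RE.search(line)
        if m:
            ids.add(int(m.group(1)))
    return ids


if __name__ == "__main__":
    warning = (
        "10/11/2020 15:20:01 - POLLER: Poller[1] WARNING: Invalid Response(s), "
        "Errors[5] Device[1] Thread[1] DS[6, 7, 8, 9, 10]"
    )
    debug = [
        "10/11/2020 14:40:03 - DSDEBUG Bad Data Found for Data Source ID 6",
        "10/11/2020 14:40:03 - DSDEBUG Bad Data Found for Data Source ID 10",
    ]
    parsed = parse_warning(warning)
    # True when every DSDEBUG ID is also listed in the poller warning.
    print(dsdebug_ids(debug) <= set(parsed["ds_ids"]))
```

In the logs above, the same five IDs (6 through 10) appear in both places, so every GPU data source is failing, not just one.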


This is all running on the same machine. It would be very cool to get graphs of GPU utilization. Let's rock this, fix it, and share it with the Cacti universe!

Thanks!
Craig
cowboycraig

Re: NVIDIA GPU Graphs ALMOST Working (Empty Graphs)

Post by cowboycraig »

Help! : )
cigamit
Developer
Posts: 3367
Joined: Thu Apr 07, 2005 3:29 pm
Location: B/CS Texas
Contact:

Re: NVIDIA GPU Graphs ALMOST Working (Empty Graphs)

Post by cigamit »

What Cacti version? Everything looks correct. Can you post what your Data Template looks like (instead of the Data Source)?
cowboycraig

Re: NVIDIA GPU Graphs ALMOST Working (Empty Graphs)

Post by cowboycraig »

cigamit wrote: Mon Oct 12, 2020 11:19 pm What Cacti version? Everything looks correct. Can you post what your Data Template looks like (instead of the Data Source)?
Version 1.2.10

Not using Spine.

Thanks for the reply. I hope this is the info you asked for.

All the GPU* graphs I created use the "SNMP - Generic OID Template":
[Screenshot: Screen Shot 2020-10-12 at 10.46.47 PM.png]
The "SNMP - Generic OID Template" is left at its defaults:
[Screenshot: Screen Shot 2020-10-12 at 10.49.17 PM.png]
[Screenshot: Screen Shot 2020-10-12 at 10.49.30 PM.png]
[Screenshot: Screen Shot 2020-10-12 at 10.49.41 PM.png]
cowboycraig

Re: NVIDIA GPU Graphs ALMOST Working (Empty Graphs)

Post by cowboycraig »

From the logs:

10/12/2020 23:50:02 - POLLER: Poller[1] WARNING: Invalid Response(s), Errors[5] Device[1] Thread[1] DS[6, 7, 8, 9, 10]
10/12/2020 23:50:02 - SYSTEM STATS: Time:1.2417 Method:cmd.php Processes:1 Threads:0 Hosts:1 HostsPerProcess:1 DataSources:10 RRDsProcessed:10

I wish there was a way to see what the "Invalid Response" actually is.

When I do an snmpwalk, the response looks valid :-?

craig@pop-os:~$ snmpwalk -c public -v1 localhost .1.3.6.1.4.1.2021.13.42.2.1.10
iso.3.6.1.4.1.2021.13.42.2.1.10.0 = INTEGER: 48
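
One caveat about this test: snmpwalk answers for everything under a subtree, so a successful walk does not by itself show that an SNMP GET on the same OID string would succeed. The walk above was given ...2.1.10 and the agent answered with ...2.1.10.0. The thread does not show which OID string is entered in Cacti, so the following is only a sketch for checking that by hand. The helper names (`normalize_oid`, `classify`) are hypothetical.

```python
def normalize_oid(oid):
    """Turn 'iso.3.6...' or '.1.3.6...' into a plain '1.3.6...' string."""
    oid = oid.strip()
    if oid.startswith("iso."):
        oid = "1." + oid[len("iso."):]
    return oid.lstrip(".")


def parse_walk_line(line):
    """Split 'OID = TYPE: value' (snmpwalk output) into (oid, type, value)."""
    oid, _, rest = line.partition(" = ")
    type_name, _, value = rest.partition(": ")
    return normalize_oid(oid), type_name, value.strip().strip('"')


def classify(configured_oid, walk_lines):
    """Compare a configured OID with the OIDs an snmpwalk actually returned.

    'exact'   -> the OID is a returned instance, so a GET should match it
    'subtree' -> the OID is only a prefix of returned instances
    'missing' -> nothing in the walk output is at or under the OID
    """
    target = normalize_oid(configured_oid)
    oids = [parse_walk_line(l)[0] for l in walk_lines if " = " in l]
    if target in oids:
        return "exact"
    if any(o.startswith(target + ".") for o in oids):
        return "subtree"
    return "missing"


if __name__ == "__main__":
    walk = ["iso.3.6.1.4.1.2021.13.42.2.1.10.0 = INTEGER: 48"]
    print(classify(".1.3.6.1.4.1.2021.13.42.2.1.10", walk))
    print(classify(".1.3.6.1.4.1.2021.13.42.2.1.10.0", walk))
```

If the OID entered in Cacti classifies as "subtree", running snmpget against that exact string would be a quick way to see whether it returns a value or an error.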
cigamit

Re: NVIDIA GPU Graphs ALMOST Working (Empty Graphs)

Post by cigamit »

I seem to recall an issue when using "SNMP - Generic OID Template" but don't remember what version it was in.

What I would do instead, is create a new "Data Source Template" using "Get SNMP Data" as the method and put the OID in there. Then change the Graph Template to use that Data Template instead. Delete your old graphs and recreate them. I just did it this way for some Tripplite PDU graphs without issue on 1.2.14 a few days ago.

Do you have your Logging set to MEDIUM? You should be getting the actual results in the logs if so.
cowboycraig

Re: NVIDIA GPU Graphs ALMOST Working (Empty Graphs)

Post by cowboycraig »

cigamit wrote: Tue Oct 13, 2020 1:30 am I seem to recall an issue when using "SNMP - Generic OID Template" but don't remember what version it was in.

What I would do instead, is create a new "Data Source Template" using "Get SNMP Data" as the method and put the OID in there. Then change the Graph Template to use that Data Template instead. Delete your old graphs and recreate them. I just did it this way for some Tripplite PDU graphs without issue on 1.2.14 a few days ago.

Do you have your Logging set to MEDIUM? You should be getting the actual results in the logs if so.
Thanks, hacking at it now. Will reply when I get something working.
cowboycraig

Re: NVIDIA GPU Graphs ALMOST Working (Empty Graphs)

Post by cowboycraig »

Thanks so much for the help. I have been hacking at this but can't quite get it working. Could I trouble you to list the steps for making graphs this way?

THANKS!
Craig