Two similar hosts, same SNMP query, NaNs on one!

Etzeitet
Posts: 2
Joined: Mon Jan 26, 2015 4:43 am

Post by Etzeitet »

Hello

I have discovered an annoying issue with one of my Cacti hosts.

It is a Cisco Wireless Controller and some of the associated SNMP queries are not behaving. The graphs are showing as NaN. The query being used is identical to that being used by another Cisco WLC and the graphs for that host work fine. Both WLCs are identical models and running identical software versions.

I have tried removing the host and re-adding but the same thing happens.

Running rrdtool info on one of the RRD files gives the following output for the BAD HOST:

filename = "bad_host_apassoc_1413.rrd"
rrd_version = "0003"
step = 300
last_update = 1422266462
header_size = 2040
ds[apassoc].index = 0
ds[apassoc].type = "GAUGE"
ds[apassoc].minimal_heartbeat = 600
ds[apassoc].min = 0.0000000000e+00
ds[apassoc].max = 1.0000000000e+12
ds[apassoc].last_ds = "U"
ds[apassoc].value = NaN
ds[apassoc].unknown_sec = 62
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].cur_row = 577
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[1].cur_row = 210
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 0.0000000000e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].cur_row = 667
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 0.0000000000e+00
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].cur_row = 704
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 0.0000000000e+00
rra[3].cdp_prep[0].unknown_datapoints = 119
rra[4].cf = "MAX"
rra[4].rows = 600
rra[4].cur_row = 430
rra[4].pdp_per_row = 1
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 0
rra[5].cf = "MAX"
rra[5].rows = 700
rra[5].cur_row = 293
rra[5].pdp_per_row = 6
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = -inf
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 775
rra[6].cur_row = 59
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = -inf
rra[6].cdp_prep[0].unknown_datapoints = 0
rra[7].cf = "MAX"
rra[7].rows = 797
rra[7].cur_row = 218
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 0.0000000000e+00
rra[7].cdp_prep[0].unknown_datapoints = 119

The rrdtool info output for a working RRD file on the GOOD HOST:

filename = "good_host_apassoc_964.rrd"
rrd_version = "0003"
step = 300
last_update = 1422266755
header_size = 2040
ds[apassoc].index = 0
ds[apassoc].type = "GAUGE"
ds[apassoc].minimal_heartbeat = 600
ds[apassoc].min = 0.0000000000e+00
ds[apassoc].max = 1.0000000000e+12
ds[apassoc].last_ds = "32"
ds[apassoc].value = 1.7600000000e+03
ds[apassoc].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].cur_row = 402
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[1].cur_row = 491
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 3.1640000000e+01
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].cur_row = 71
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 3.1640000000e+01
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].cur_row = 220
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 4.8022666667e+02
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[4].cf = "MAX"
rra[4].rows = 600
rra[4].cur_row = 493
rra[4].pdp_per_row = 1
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 0
rra[5].cf = "MAX"
rra[5].rows = 700
rra[5].cur_row = 393
rra[5].pdp_per_row = 6
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = 3.1640000000e+01
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 775
rra[6].cur_row = 292
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = 3.1640000000e+01
rra[6].cdp_prep[0].unknown_datapoints = 0
rra[7].cf = "MAX"
rra[7].rows = 797
rra[7].cur_row = 444
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 3.2000000000e+01
rra[7].cdp_prep[0].unknown_datapoints = 0
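
The two rrdtool info dumps above differ mainly in the ds[apassoc] fields. A minimal Python sketch, assuming each dump has been saved as text, pulls out the fields that matter. The helper names parse_rrd_info and summarize_ds are made up for illustration, and the embedded samples are trimmed copies of the output above.

```python
def parse_rrd_info(text):
    """Turn `rrdtool info` text into a dict of key -> str or float."""
    info = {}
    for line in text.splitlines():
        key, sep, raw = line.partition(" = ")
        if not sep:
            continue  # not a "key = value" line
        raw = raw.strip()
        if raw.startswith('"') and raw.endswith('"'):
            info[key.strip()] = raw[1:-1]   # quoted values stay strings
        else:
            info[key.strip()] = float(raw)  # float() also accepts NaN and -inf
    return info


def summarize_ds(info, ds_name):
    """Report whether the last update was unknown and how far into the step we are."""
    step = int(info["step"])
    return {
        "last_update_unknown": info[f"ds[{ds_name}].last_ds"] == "U",
        "unknown_sec": int(info[f"ds[{ds_name}].unknown_sec"]),
        "seconds_into_step": int(info["last_update"]) % step,
    }


# Trimmed copy of the BAD HOST output above.
BAD_SAMPLE = """\
step = 300
last_update = 1422266462
ds[apassoc].last_ds = "U"
ds[apassoc].value = NaN
ds[apassoc].unknown_sec = 62
"""

# Trimmed copy of the GOOD HOST output above.
GOOD_SAMPLE = """\
step = 300
last_update = 1422266755
ds[apassoc].last_ds = "32"
ds[apassoc].value = 1.7600000000e+03
ds[apassoc].unknown_sec = 0
"""

bad = summarize_ds(parse_rrd_info(BAD_SAMPLE), "apassoc")
good = summarize_ds(parse_rrd_info(GOOD_SAMPLE), "apassoc")
print("bad: ", bad)
print("good:", good)
```

For the BAD HOST, unknown_sec (62) equals the number of seconds elapsed in the current step, so everything received in that step so far was unknown. For the GOOD HOST, the accumulated value 1760 is consistent with a reading of 32 held for 55 seconds (32 x 55 = 1760).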

On the BAD HOST RRD file I ran rrdtool fetch <file> AVERAGE. Oddly, there is one good entry (0 at 1422266400), and I am not sure why that update worked:

snip...
1422262500: -nan
1422262800: -nan
1422263100: -nan
1422263400: -nan
1422263700: -nan
1422264000: -nan
1422264300: -nan
1422264600: -nan
1422264900: -nan
1422265200: -nan
1422265500: -nan
1422265800: -nan
1422266100: -nan
1422266400: 0.0000000000e+00
1422266700: -nan
1422267000: -nan
1422267300: -nan
1422267600: -nan
1422267900: -nan
1422268200: -nan
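
With a long fetch listing, the few rows that hold a real value are easy to miss. A small Python sketch, assuming the fetch output has been captured as text, filters them out. parse_fetch and known_rows are hypothetical helper names, and the sample is a trimmed copy of the listing above.

```python
import math


def parse_fetch(text):
    """Parse `rrdtool fetch` rows of the form '<timestamp>: <value>'."""
    rows = []
    for line in text.splitlines():
        ts, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not ts.strip().isdigit() or not fields:
            continue  # header, blank, or "snip..." line
        rows.append((int(ts), float(fields[0])))  # float() accepts "-nan"
    return rows


def known_rows(rows):
    """Keep only rows whose value is not NaN."""
    return [(ts, value) for ts, value in rows if not math.isnan(value)]


# Trimmed copy of the fetch output above.
SAMPLE = """\
snip...
1422265800: -nan
1422266100: -nan
1422266400: 0.0000000000e+00
1422266700: -nan
"""

rows = parse_fetch(SAMPLE)
good = known_rows(rows)
print(f"{len(good)} of {len(rows)} rows hold a value: {good}")
```

Run against the full output, this would show how often a real value reaches the file, which may help tell a one-off from a pattern.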

In the Cacti log I can see the following entries for the BAD HOST:

01/23/2015 04:11:03 PM - CMDPHP: Poller[0] Host[106] DS[1400] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 1.1.1.1 mycommunity 2 get apassoc 28:34:A2:0C:BB:90, output: 10
01/23/2015 04:11:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/bad_host_apassoc_1400.rrd --template apassoc 1422029462:U
01/23/2015 04:11:03 PM - CMDPHP: Poller[0] Host[106] DS[1399] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 1.1.1.1 mycommunity 2 get apassoc 28:34:A2:7E:ED:80, output: 3
01/23/2015 04:11:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/bad_host_apassoc_1399.rrd --template apassoc 1422029462:U
01/23/2015 04:11:03 PM - CMDPHP: Poller[0] Host[106] DS[1398] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 1.1.1.1 mycommunity 2 get apassoc 28:34:A2:4E:E3:80, output: 2
01/23/2015 04:11:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/bad_host_apassoc_1398.rrd --template apassoc 1422029462:U
01/23/2015 04:11:03 PM - CMDPHP: Poller[0] Host[106] DS[1397] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 1.1.1.1 mycommunity 2 get apassoc 28:34:A2:0C:56:60, output: 4
01/23/2015 04:11:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/bad_host_apassoc_1397.rrd --template apassoc 1422029462:U
01/23/2015 04:11:04 PM - CMDPHP: Poller[0] Host[106] DS[1396] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 1.1.1.1 mycommunity 2 get apassoc 28:34:A2:7E:EC:E0, output: 0

And for the GOOD HOST:

01/23/2015 04:10:54 PM - CMDPHP: Poller[0] Host[67] DS[892] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 2.2.2.2 mycommunity 2 get apassoc 34:BD:C8:AC:1D:40, output: 0
01/23/2015 04:10:54 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/good_host_apassoc_892.rrd --template apassoc 1422029454:0
01/23/2015 04:10:54 PM - CMDPHP: Poller[0] Host[67] DS[894] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 2.2.2.2 mycommunity 2 get apassoc 88:75:56:DA:8D:E0, output: 8
01/23/2015 04:10:54 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/good_host_apassoc_894.rrd --template apassoc 1422029454:8
01/23/2015 04:10:55 PM - CMDPHP: Poller[0] Host[67] DS[895] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 2.2.2.2 mycommunity 2 get apassoc 88:75:56:DB:3A:00, output: 1
01/23/2015 04:10:55 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/good_host_apassoc_895.rrd --template apassoc 1422029454:1
01/23/2015 04:10:55 PM - CMDPHP: Poller[0] Host[67] DS[896] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 2.2.2.2 mycommunity 2 get apassoc 88:75:56:DB:3B:80, output: 11
01/23/2015 04:10:55 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/good_host_apassoc_896.rrd --template apassoc 1422029454:11
01/23/2015 04:10:55 PM - CMDPHP: Poller[0] Host[67] DS[897] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 2.2.2.2 mycommunity 2 get apassoc 88:75:56:DB:40:90, output: 0

As you can see, for both hosts the script returns valid values, but when the RRD file is updated, a U (unknown) is sent for the BAD HOST. I can't see anything in the logs to suggest why Cacti is treating the result as U rather than as a valid value. The script is exactly the same for both hosts.
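
The mismatch between what the script returns and what reaches rrdtool update can be checked mechanically across the whole log. The Python sketch below is only an illustration: it assumes the log line format shown above, and it assumes the number at the end of the RRD file name matches the DS[...] id, as it does in these excerpts. find_mismatches is a made-up name.

```python
import re

# Script result line: "... DS[1400] CMD: <command>, output: 10"
CMD_RE = re.compile(r"DS\[(\d+)\] CMD: .*, output: (\S*)\s*$")
# Update line: "... rrdtool update <path>_1400.rrd --template apassoc <ts>:<value>"
UPD_RE = re.compile(r"rrdtool update \S*_(\d+)\.rrd --template \S+ \d+:(\S+)")


def find_mismatches(log_text):
    """Return (ds_id, script_output, stored_value) where the two differ."""
    script_output = {}
    mismatches = []
    for line in log_text.splitlines():
        m = CMD_RE.search(line)
        if m:
            script_output[m.group(1)] = m.group(2)
            continue
        m = UPD_RE.search(line)
        if m:
            ds_id, stored = m.groups()
            returned = script_output.get(ds_id)
            if returned is not None and returned != stored:
                mismatches.append((ds_id, returned, stored))
    return mismatches


# Two line pairs copied from the log excerpts above (one BAD, one GOOD).
SAMPLE_LOG = """\
01/23/2015 04:11:03 PM - CMDPHP: Poller[0] Host[106] DS[1400] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 1.1.1.1 mycommunity 2 get apassoc 28:34:A2:0C:BB:90, output: 10
01/23/2015 04:11:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/bad_host_apassoc_1400.rrd --template apassoc 1422029462:U
01/23/2015 04:10:54 PM - CMDPHP: Poller[0] Host[67] DS[894] CMD: /usr/bin/php -q /var/www/html/cacti/scripts/Cisco_WLC_APs_Assoc.php 2.2.2.2 mycommunity 2 get apassoc 88:75:56:DA:8D:E0, output: 8
01/23/2015 04:10:54 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool/bin/rrdtool update /var/www/html/cacti/rra/good_host_apassoc_894.rrd --template apassoc 1422029454:8
"""

mismatches = find_mismatches(SAMPLE_LOG)
for ds_id, returned, stored in mismatches:
    print(f"DS[{ds_id}]: script returned {returned!r} but rrdtool got {stored!r}")
```

On the sample, only DS[1400] is reported, which matches what the excerpts show by eye: the value is lost somewhere between the script result and the rrdtool update call.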

Running the script manually as the cacti user yields proper results and the OIDs etc are correct for both hosts.

I have worked through the following steps in gandalf's troubleshooting guide:
  1. Check Cacti Log File - no errors regarding SNMP
  2. Check Basic Data Gathering - no problem with the script, as it outputs as expected and is working for GOOD HOST
  3. Check Cacti's Poller - the poller is running the script correctly and the expected results are being returned (see logs above)
  4. Check MySQL Updating - no issue as far as I can see; again, other hosts are fine
  5. Check RRD File Updating - yes, see logs above
  6. Check RRD File Numbers - all values are between MIN and MAX, although it looks like only one value made it into the RRD file (see above)
  7. Check rrdtool Graph Statement - the graph statement looks fine
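
On the "Check RRD File Numbers" step: rrdtool stores a value as unknown when it falls outside the data source's min/max, so it is worth confirming the script outputs against the limits. A rough Python sketch, using the min and max from the rrdtool info output and the script outputs from the BAD HOST log above. within_ds_range is a made-up helper, not an rrdtool function.

```python
DS_MIN = 0.0      # ds[apassoc].min from the rrdtool info output
DS_MAX = 1.0e12   # ds[apassoc].max from the rrdtool info output


def within_ds_range(value, ds_min=DS_MIN, ds_max=DS_MAX):
    """True if value lies inside the data source's accepted range."""
    return ds_min <= value <= ds_max


# Script outputs seen in the BAD HOST log excerpt.
samples = [10, 3, 2, 4, 0]
out_of_range = [v for v in samples if not within_ds_range(v)]
print("out of range:", out_of_range)
```

None of the logged outputs falls outside the range, which supports the point that min/max is not what turns these values into U.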
I am at a bit of a loss as to what is happening. Any help would be greatly appreciated.

Regards
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Two similar hosts, same SNMP query, NaNs on one!

Post by BSOD2600 »

Hmm interesting problem indeed.

Cacti version?
Using 1 or 5 min polling?
Recall any changes that you made to Cacti between the time the good and bad hosts were created (like imported other templates, etc)?
In DS[1400], does it allow you to change the min/max acceptable values? Not sure what the template author set up which might not be standard...
Etzeitet
Posts: 2
Joined: Mon Jan 26, 2015 4:43 am

Re: Two similar hosts, same SNMP query, NaNs on one!

Post by Etzeitet »

BSOD2600 wrote:
Hmm interesting problem indeed.

Cacti version?
0.8.8a

Using 1 or 5 min polling?
5 minute polling (using the PHP poller)

Recall any changes that you made to Cacti between the time the good and bad hosts were created (like imported other templates, etc)?
Nothing I am aware of. I am not the only administrator for Cacti, so there are others who make changes. I asked around, though, and no one recalls any changes to those templates (or any others).

In DS[1400], does it allow you to change the min/max acceptable values? Not sure what the template author set up which might not be standard...
Do you mean changing this via the graph template, or using RRDTOOL directly on the rra file? Any settings/defaults in those templates are exactly as they were when they were installed.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Two similar hosts, same SNMP query, NaNs on one!

Post by BSOD2600 »

Etzeitet wrote:
In DS[1400], does it allow you to change the min/max acceptable values? Not sure what the template author set up which might not be standard...
Do you mean change this via the graph template or using RRDTOOL directly on the rra file? Any settings/defaults in those templates are exactly as they were when they were installed.
I actually meant: go to Data Sources, find data source #1400, and see if there are custom fields (not typically present, but it's hard to say what custom scripts/templates implement).

Well, let's try the shotgun approach, since you've done some good digging already.
- Upgrade to Cacti 0.8.8c
- Clear the poller cache
If it is still broken, try deleting the data source(s)/graph(s) for the BAD HOST and recreating them. Fixed?