Gaps in Cacti graphs


monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Gaps in Cacti graphs

Post by monsoft »

I understand there have been many topics like this one, but I couldn't find any information on how to resolve this problem.

I have one Cacti server and a few of its graphs show gaps, so I installed a second test server to see whether the gaps would occur there too. Unfortunately, the gaps appeared on the test server as well.

According to the poller logs, there is no problem with time synchronisation:

Code: Select all

10/16/2009 10:25:01 AM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '299', Max Runtime '298', Poller Runs: '1'
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[3] CMD: perl /var/www/html/cacti/scripts/linux_memory.pl MemFree:, output: 55788
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[4] CMD: perl /var/www/html/cacti/scripts/linux_memory.pl SwapFree:, output: 1572856
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[5] CMD: perl /var/www/html/cacti/scripts/loadavg_multi.pl, output: 1min:0.08 5min:0.06 10min:0.02
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[6] CMD: perl /var/www/html/cacti/scripts/unix_users.pl , output: 1
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[7] CMD: perl /var/www/html/cacti/scripts/unix_processes.pl, output: 99
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[2] DS[8] SNMP: v2: xxx.xx.xxx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.97, output: 115299285982486
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[2] DS[8] SNMP: v2: xxx.xx.xxx.xxx, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.97, output: 93330752125784
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[2] DS[9] SNMP: v2: xxx.xx.xxx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.103, output: 58357145
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[2] DS[9] SNMP: v2: xxx.xx.xxx.xxx, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.103, output: 233857837
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Time: 0.1500 s, Theads: N/A, Hosts: 2
10/16/2009 10:25:02 AM - SYSTEM STATS: Time:1.1435 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:9 RRDsProcessed:7


So I checked one of my rrd files:

Code: Select all

# rrdtool fetch router-1_traffic_in_8.rrd AVERAGE | tail -n20
1255679400: 6.3712042701e+06 4.7278309463e+06
1255679700: 6.4252280325e+06 4.7115762508e+06
1255680000: 6.6766284956e+06 4.9200634360e+06
1255680300: 6.7634168029e+06 5.0558932932e+06
1255680600: 7.1297393991e+06 5.3736627476e+06
1255680900: 7.0947032333e+06 5.3990274100e+06
1255681200: 7.1372324900e+06 5.4529263985e+06
1255681500: 7.0571054821e+06 5.4114678891e+06
1255681800: 7.1198637382e+06 5.3747711200e+06
1255682100: 7.1973048753e+06 5.4498155756e+06
1255682400: 7.2654043852e+06 5.4870659240e+06
1255682700: 7.1622073983e+06 5.4610208159e+06
1255683000: 7.1273687863e+06 5.4593536919e+06
1255683300: 7.2711121127e+06 5.5019282204e+06
1255683600: 7.7067091980e+06 5.5951818648e+06
1255683900: 7.7956630182e+06 5.6751145764e+06
1255684200: 7.8504757613e+06 5.7860009849e+06
1255684500: 7.8077068073e+06 5.8254050257e+06
1255684800: 7.5986619758e+06 5.6957064397e+06
1255685100: nan nan
I noticed the two "nan" values in the last line, which probably indicate some problem with the data rrdtool received, but I couldn't find anything wrong.
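
To get a feel for how often this happens, I can also count the unknown rows in the default one-day fetch window (just a rough check; the newest row is often nan simply because that 300 second step has not finished yet):

Code: Select all

rrdtool fetch router-1_traffic_in_8.rrd AVERAGE | grep -c nan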

This is the output from the rrdtool info command:

Code: Select all

# rrdtool info router-1_traffic_in_8.rrd
filename = "router-1_traffic_in_8.rrd"
rrd_version = "0003"
step = 300
last_update = 1255685402
ds[traffic_in].type = "COUNTER"
ds[traffic_in].minimal_heartbeat = 600
ds[traffic_in].min = 0.0000000000e+00
ds[traffic_in].max = 1.0000000000e+09
ds[traffic_in].last_ds = "115301587611265"
ds[traffic_in].value = 1.5344191860e+07
ds[traffic_in].unknown_sec = 0
ds[traffic_out].type = "COUNTER"
ds[traffic_out].minimal_heartbeat = 600
ds[traffic_out].min = 0.0000000000e+00
ds[traffic_out].max = 1.0000000000e+09
ds[traffic_out].last_ds = "93332506295063"
ds[traffic_out].value = 1.1694461860e+07
ds[traffic_out].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].cur_row = 464
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[1].cur_row = 79
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 0.0000000000e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 0.0000000000e+00
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].cur_row = 752
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 1.3223543049e+08
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[2].cdp_prep[1].value = 9.9490740540e+07
rra[2].cdp_prep[1].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].cur_row = 774
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 4.9234503971e+08
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[3].cdp_prep[1].value = 3.5266516558e+08
rra[3].cdp_prep[1].unknown_datapoints = 0
rra[4].cf = "MAX"
rra[4].rows = 600
rra[4].cur_row = 274
rra[4].pdp_per_row = 1
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 0
rra[4].cdp_prep[1].value = NaN
rra[4].cdp_prep[1].unknown_datapoints = 0
rra[5].cf = "MAX"
rra[5].rows = 700
rra[5].cur_row = 292
rra[5].pdp_per_row = 6
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = 7.6708096459e+06
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[5].cdp_prep[1].value = 5.8458505951e+06
rra[5].cdp_prep[1].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 775
rra[6].cur_row = 751
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = 7.8504757613e+06
rra[6].cdp_prep[0].unknown_datapoints = 0
rra[6].cdp_prep[1].value = 5.8458505951e+06
rra[6].cdp_prep[1].unknown_datapoints = 0
rra[7].cf = "MAX"
rra[7].rows = 797
rra[7].cur_row = 457
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 7.8504757613e+06
rra[7].cdp_prep[0].unknown_datapoints = 0
rra[7].cdp_prep[1].value = 5.8458505951e+06
rra[7].cdp_prep[1].unknown_datapoints = 0
Do you have any suggestions as to what could be wrong? Maybe I have overlooked something?

Many thanks for any suggestions and help!
jay
Cacti User
Posts: 390
Joined: Wed Aug 31, 2005 8:55 am
Location: Bristol, England

Post by jay »

Hi

Please read this http://docs.cacti.net/manual:087:4_help ... #debugging and re-post.

Are the problem host(s) located on the same network as Cacti? If not, it may be latency across a WAN that is causing no data to be returned.

Can you snmpwalk the device in question using the OIDs in question?
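
Something along these lines, with your community string and the router's address filled in (the OIDs are the ones from your log):

Code: Select all

snmpwalk -v 2c -c <community> <router-ip> .1.3.6.1.2.1.31.1.1.1.6.97
snmpwalk -v 2c -c <community> <router-ip> .1.3.6.1.2.1.31.1.1.1.10.97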

Cheers

Jay
Cacti Version 0.8.7e, Spine 0.8.7e, Apache 2.2.15, Mysql 5.0.88, PHP 5.2.13, RRDTool 1.2.30, NET-SNMP 5.5
Quad Core AMD Opteron Processor 2384, 2.70Ghz, 2GB RAM , 1 CPU used
Windows Server 2003 (X64), VMWARE ESX
Plugins: Aggregate 0.75

SYSTEM STATS: Time:12.5140 Method:spine Processes:2 Threads:15 Hosts:400 HostsPerProcess:200 DataSources:2909 RRDsProcessed:1384
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

So far your cacti log and rrdtool outputs look fine. Random gaps across all graphs are typically a sign of high network latency, Cacti thinking the device is down, or your poller exceeding the default 5 minute polling window.
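
A quick way to rule out the last one is to look at the SYSTEM STATS lines in cacti.log; the total runtime should stay well below 300 seconds. For example (assuming the log lives in the default location under your install directory):

Code: Select all

grep "SYSTEM STATS" /var/www/html/cacti/log/cacti.log | tail -n 5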
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

We thought about network latency as well, so this test Cacti server is connected directly to the router and only asks that router about two interfaces, which keeps network latency to a minimum.
I think the problem is with rrdtool, because I tried another monitoring tool, Cricket, and there were gaps as well. My last try was with MRTG, which does not use rrdtool, and there were no gaps at all in its graphs.
Or it could be some misconfiguration on my side.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

Turn the Cacti logging level to medium or higher and let it run for a bit. When you get gaps, look back in cacti.log to see what was going on at the time that could have caused them.
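
At medium logging the failed polls usually show up as WARNING lines, so something like this (again assuming the default log location) should pull out the interesting entries around the gap times:

Code: Select all

grep -Ei "warning|error" /var/www/html/cacti/log/cacti.log | tail -n 20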

What kind of bandwidth are you dealing with? If these are 1 Gb interfaces and they're actually used a lot, the 32-bit counters could be wrapping. Switch to SNMP v2 and the 64-bit graph templates.
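
For a rough sense of scale: a 32-bit octet counter holds 2^32 bytes, so at a sustained 1 Gbit/s it wraps in about 34 seconds, far inside a 300 second polling interval. The 64-bit ifHC counters don't have this problem.

Code: Select all

# seconds for a 32-bit octet counter to wrap at 1 Gbit/s
echo "2^32 / (10^9 / 8)" | bc -l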
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

Yes, it is a 1 Gb connection, and I have already set up 64-bit counters and SNMP v2 (we have this setup on both Cacti servers).

Thanks for the advice about the log level. I will change it and let it run over the weekend.
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

I looked at the cacti.log file from when the gaps occur, and this is what I found:

Code: Select all

10/19/2009 11:20:04 AM - CMDPHP: Poller[0] Host[2] DS[9] WARNING: Result from SNMP not valid.  Partial Result: U
10/19/2009 11:20:04 AM - CMDPHP: Poller[0] Host[2] DS[9] SNMP: v2: xxx.xxx.xxx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.103, output: U
10/19/2009 11:20:06 AM - CMDPHP: Poller[0] Host[2] DS[9] WARNING: Result from SNMP not valid.  Partial Result: U
10/19/2009 11:20:06 AM - CMDPHP: Poller[0] Host[2] DS[9] SNMP: v2: xxx.xxx.xxx.xxx, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.103, output: U
snmpwalk collects the information without problems.
Is there anything I can do about this? Maybe something that repeats the SNMP query when this problem occurs?
hayden
Posts: 34
Joined: Fri Sep 18, 2009 5:10 pm

Post by hayden »

I'm running into almost the same issue here. I have mostly everything turned off except ping latency, just to minimise the chance that it is a traffic issue. I'll post as well, but for now I just want to follow and see where this is going.
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

If everything else fails, maybe I should install logwatch, watch the cacti.log file for "output: U", and when it appears run another query from a script to update the rrd file.
I don't know if this is a good idea, but it could work.
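
Just to sketch what I mean (the community string, address and rrd path are placeholders, and I'm not sure writing into the rrd alongside the poller is wise, so this would only ever be a stopgap):

Code: Select all

#!/bin/bash
# re-query both counters and push them into the rrd if the replies look sane
RRD=/var/www/html/cacti/rra/router-1_traffic_in_8.rrd
IN=$(snmpget -v 2c -c <community> -Oqv <router-ip> .1.3.6.1.2.1.31.1.1.1.6.103)
OUT=$(snmpget -v 2c -c <community> -Oqv <router-ip> .1.3.6.1.2.1.31.1.1.1.10.103)
# only update if both replies are plain numbers (not "U" or an error message)
if [[ "$IN" =~ ^[0-9]+$ && "$OUT" =~ ^[0-9]+$ ]]; then
    rrdtool update "$RRD" N:"$IN":"$OUT"
fi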
hayden
Posts: 34
Joined: Fri Sep 18, 2009 5:10 pm

Post by hayden »

I'm just running with high logging right now to get an idea. The weird thing is that the majority of my sites are working fine, whether they are pinged internally via VPNs or by external IP, while others have a lot of spotty little gaps. Even the Cacti server itself shows gaps on its ethernet and latency graphs.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

monsoft wrote: snmpwalk collects the information without problems.
Please post the output of the snmpwalk against Host[2] for OID .1.3.6.1.2.1.31.1.1.1.6.103
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

This is the output from Host[2]:

Code: Select all

snmpwalk -v 2c -c xxx xxx.xxx.xxx.xxx .1.3.6.1.2.1.31.1.1.1.6.103
IF-MIB::ifHCInOctets.103 = Counter64: 61347289
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

Hmm, well that looks valid. Try increasing the SNMP timeout for that device in Cacti.
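
You can also test a longer timeout straight from the shell to see whether it makes a difference, e.g. a 5 second timeout with 3 retries (community and address are placeholders):

Code: Select all

snmpget -v 2c -c <community> -t 5 -r 3 <router-ip> .1.3.6.1.2.1.31.1.1.1.6.103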
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

Many thanks.

I will try that.

Do you know what command Cacti uses to send the SNMP query to the device?
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

monsoft wrote: Do you know what command Cacti uses to send the SNMP query to the device?
It's documented in the code ;). Depending on the device, poller, settings, etc., snmpget, snmpgetnext, snmpwalk, and so on can all be used.