Gaps in Cacti graphs


monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Gaps in Cacti graphs

Post by monsoft »

I understand there have been many topics like this one, but I couldn't find any information on how to resolve this problem.

I have one Cacti server and a few of its graphs show gaps, so I installed a second test server to see whether the gaps would occur there too. Unfortunately, the gaps appeared on the test server as well.

According to the poller logs, there is no problem with time synchronisation:

Code: Select all

10/16/2009 10:25:01 AM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '299', Max Runtime '298', Poller Runs: '1'
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[3] CMD: perl /var/www/html/cacti/scripts/linux_memory.pl MemFree:, output: 55788
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[4] CMD: perl /var/www/html/cacti/scripts/linux_memory.pl SwapFree:, output: 1572856
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[5] CMD: perl /var/www/html/cacti/scripts/loadavg_multi.pl, output: 1min:0.08 5min:0.06 10min:0.02
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[6] CMD: perl /var/www/html/cacti/scripts/unix_users.pl , output: 1
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[1] DS[7] CMD: perl /var/www/html/cacti/scripts/unix_processes.pl, output: 99
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[2] DS[8] SNMP: v2: xxx.xx.xxx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.97, output: 115299285982486
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[2] DS[8] SNMP: v2: xxx.xx.xxx.xxx, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.97, output: 93330752125784
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[2] DS[9] SNMP: v2: xxx.xx.xxx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.103, output: 58357145
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Host[2] DS[9] SNMP: v2: xxx.xx.xxx.xxx, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.103, output: 233857837
10/16/2009 10:25:02 AM - CMDPHP: Poller[0] Time: 0.1500 s, Theads: N/A, Hosts: 2
10/16/2009 10:25:02 AM - SYSTEM STATS: Time:1.1435 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:9 RRDsProcessed:7


So I checked one of my rrd files:

Code: Select all

# rrdtool fetch router-1_traffic_in_8.rrd AVERAGE | tail -n20
1255679400: 6.3712042701e+06 4.7278309463e+06
1255679700: 6.4252280325e+06 4.7115762508e+06
1255680000: 6.6766284956e+06 4.9200634360e+06
1255680300: 6.7634168029e+06 5.0558932932e+06
1255680600: 7.1297393991e+06 5.3736627476e+06
1255680900: 7.0947032333e+06 5.3990274100e+06
1255681200: 7.1372324900e+06 5.4529263985e+06
1255681500: 7.0571054821e+06 5.4114678891e+06
1255681800: 7.1198637382e+06 5.3747711200e+06
1255682100: 7.1973048753e+06 5.4498155756e+06
1255682400: 7.2654043852e+06 5.4870659240e+06
1255682700: 7.1622073983e+06 5.4610208159e+06
1255683000: 7.1273687863e+06 5.4593536919e+06
1255683300: 7.2711121127e+06 5.5019282204e+06
1255683600: 7.7067091980e+06 5.5951818648e+06
1255683900: 7.7956630182e+06 5.6751145764e+06
1255684200: 7.8504757613e+06 5.7860009849e+06
1255684500: 7.8077068073e+06 5.8254050257e+06
1255684800: 7.5986619758e+06 5.6957064397e+06
1255685100: nan nan
I noticed the two "nan" values in the last line, which probably indicate some problem with the data rrdtool received, but I couldn't find anything wrong.
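
To get a feel for how often this happens, I can also count the unknown rows in the default one-day fetch window (just a rough check; the newest row is often nan simply because that 300 second step has not finished yet):

Code: Select all

rrdtool fetch router-1_traffic_in_8.rrd AVERAGE | grep -c nan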

This is the output from the rrdtool info command:

Code: Select all

# rrdtool info router-1_traffic_in_8.rrd
filename = "router-1_traffic_in_8.rrd"
rrd_version = "0003"
step = 300
last_update = 1255685402
ds[traffic_in].type = "COUNTER"
ds[traffic_in].minimal_heartbeat = 600
ds[traffic_in].min = 0.0000000000e+00
ds[traffic_in].max = 1.0000000000e+09
ds[traffic_in].last_ds = "115301587611265"
ds[traffic_in].value = 1.5344191860e+07
ds[traffic_in].unknown_sec = 0
ds[traffic_out].type = "COUNTER"
ds[traffic_out].minimal_heartbeat = 600
ds[traffic_out].min = 0.0000000000e+00
ds[traffic_out].max = 1.0000000000e+09
ds[traffic_out].last_ds = "93332506295063"
ds[traffic_out].value = 1.1694461860e+07
ds[traffic_out].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].cur_row = 464
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[1].cur_row = 79
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 0.0000000000e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 0.0000000000e+00
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].cur_row = 752
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 1.3223543049e+08
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[2].cdp_prep[1].value = 9.9490740540e+07
rra[2].cdp_prep[1].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].cur_row = 774
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 4.9234503971e+08
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[3].cdp_prep[1].value = 3.5266516558e+08
rra[3].cdp_prep[1].unknown_datapoints = 0
rra[4].cf = "MAX"
rra[4].rows = 600
rra[4].cur_row = 274
rra[4].pdp_per_row = 1
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 0
rra[4].cdp_prep[1].value = NaN
rra[4].cdp_prep[1].unknown_datapoints = 0
rra[5].cf = "MAX"
rra[5].rows = 700
rra[5].cur_row = 292
rra[5].pdp_per_row = 6
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = 7.6708096459e+06
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[5].cdp_prep[1].value = 5.8458505951e+06
rra[5].cdp_prep[1].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 775
rra[6].cur_row = 751
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = 7.8504757613e+06
rra[6].cdp_prep[0].unknown_datapoints = 0
rra[6].cdp_prep[1].value = 5.8458505951e+06
rra[6].cdp_prep[1].unknown_datapoints = 0
rra[7].cf = "MAX"
rra[7].rows = 797
rra[7].cur_row = 457
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 7.8504757613e+06
rra[7].cdp_prep[0].unknown_datapoints = 0
rra[7].cdp_prep[1].value = 5.8458505951e+06
rra[7].cdp_prep[1].unknown_datapoints = 0
Do you have any suggestions as to what could be wrong? Maybe I have overlooked something?

Many thanks for any suggestions and help!
jay
Cacti User
Posts: 390
Joined: Wed Aug 31, 2005 8:55 am
Location: Bristol, England

Post by jay »

Hi

Please read this http://docs.cacti.net/manual:087:4_help ... #debugging and re-post.

Are the problem host(s) located on the same network as Cacti? If not, it may be latency across a WAN that is causing no data to be returned.

Can you snmpwalk the device in question using the OIDs in question?
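
Something along these lines, with your community string and the router's address filled in (the OIDs are the ones from your log):

Code: Select all

snmpwalk -v 2c -c <community> <router-ip> .1.3.6.1.2.1.31.1.1.1.6.97
snmpwalk -v 2c -c <community> <router-ip> .1.3.6.1.2.1.31.1.1.1.10.97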

Cheers

Jay
Cacti Version 0.8.7e, Spine 0.8.7e, Apache 2.2.15, Mysql 5.0.88, PHP 5.2.13, RRDTool 1.2.30, NET-SNMP 5.5
Quad Core AMD Opteron Processor 2384, 2.70Ghz, 2GB RAM , 1 CPU used
Windows Server 2003 (X64), VMWARE ESX
Plugins: Aggregate 0.75

SYSTEM STATS: Time:12.5140 Method:spine Processes:2 Threads:15 Hosts:400 HostsPerProcess:200 DataSources:2909 RRDsProcessed:1384
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

So far your cacti log and rrdtool outputs look fine. Random gaps across all graphs are typically a sign of high network latency, Cacti thinking the device is down, or your poller exceeding the default 5 minute polling window.
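
A quick way to rule out the last one is to look at the SYSTEM STATS lines in cacti.log; the total runtime should stay well below 300 seconds. For example (assuming the log lives in the default location under your install directory):

Code: Select all

grep "SYSTEM STATS" /var/www/html/cacti/log/cacti.log | tail -n 5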
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

We thought about network latency as well, so this test Cacti server is connected directly to the router and only asks that router about two interfaces, which keeps network latency to a minimum.
I think the problem is with rrdtool, because I tried another monitoring tool, Cricket, and there were gaps as well. My last try was with MRTG, which does not use rrdtool, and there were no gaps at all in its graphs.
Or it could be some misconfiguration on my side.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

Turn the Cacti logging level to medium or higher and let it run for a bit. When you get gaps, look back in cacti.log to see what was going on at the time that could have caused them.
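
At medium logging the failed polls usually show up as WARNING lines, so something like this (again assuming the default log location) should pull out the interesting entries around the gap times:

Code: Select all

grep -Ei "warning|error" /var/www/html/cacti/log/cacti.log | tail -n 20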

What kind of bandwidth are you dealing with? If these are 1 Gb interfaces and they're actually used a lot, the 32-bit counters could be wrapping. Switch to SNMP v2 and the 64-bit graph templates.
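
For a rough sense of scale: a 32-bit octet counter holds 2^32 bytes, so at a sustained 1 Gbit/s it wraps in about 34 seconds, far inside a 300 second polling interval. The 64-bit ifHC counters don't have this problem.

Code: Select all

# seconds for a 32-bit octet counter to wrap at 1 Gbit/s
echo "2^32 / (10^9 / 8)" | bc -l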
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

Yes, it is a 1 Gb connection, and I have already set up 64-bit counters and SNMP v2 (we have this setup on both Cacti servers).

Thanks for the advice about the log level. I will change it and let it run over the weekend.
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

I looked at the cacti.log file from when the gaps occur, and this is what I found:

Code: Select all

10/19/2009 11:20:04 AM - CMDPHP: Poller[0] Host[2] DS[9] WARNING: Result from SNMP not valid.  Partial Result: U
10/19/2009 11:20:04 AM - CMDPHP: Poller[0] Host[2] DS[9] SNMP: v2: xxx.xxx.xxx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.103, output: U
10/19/2009 11:20:06 AM - CMDPHP: Poller[0] Host[2] DS[9] WARNING: Result from SNMP not valid.  Partial Result: U
10/19/2009 11:20:06 AM - CMDPHP: Poller[0] Host[2] DS[9] SNMP: v2: xxx.xxx.xxx.xxx, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.103, output: U
snmpwalk collects the information without problems.
Is there anything I can do about this? Maybe something that repeats the SNMP query when this problem occurs?
hayden
Posts: 34
Joined: Fri Sep 18, 2009 5:10 pm

Post by hayden »

I'm running into almost the same issue here. I have mostly everything turned off except ping latency, just to minimise the chance that it is a traffic issue. I'll post as well, but for now I just want to follow and see where this is going.
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

If everything else fails, maybe I should install logwatch, watch the cacti.log file for "output: U", and when it appears run another query from a script to update the rrd file.
I don't know if this is a good idea, but it could work.
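
Just to sketch what I mean (the community string, address and rrd path are placeholders, and I'm not sure writing into the rrd alongside the poller is wise, so this would only ever be a stopgap):

Code: Select all

#!/bin/bash
# re-query both counters and push them into the rrd if the replies look sane
RRD=/var/www/html/cacti/rra/router-1_traffic_in_8.rrd
IN=$(snmpget -v 2c -c <community> -Oqv <router-ip> .1.3.6.1.2.1.31.1.1.1.6.103)
OUT=$(snmpget -v 2c -c <community> -Oqv <router-ip> .1.3.6.1.2.1.31.1.1.1.10.103)
# only update if both replies are plain numbers (not "U" or an error message)
if [[ "$IN" =~ ^[0-9]+$ && "$OUT" =~ ^[0-9]+$ ]]; then
    rrdtool update "$RRD" N:"$IN":"$OUT"
fi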
hayden
Posts: 34
Joined: Fri Sep 18, 2009 5:10 pm

Post by hayden »

I'm just running with high logging right now to get an idea. The weird thing is that the majority of my sites are working fine, whether they are pinged internally via VPNs or by external IP, while others have a lot of spotty little gaps. Even the Cacti server itself shows gaps on its ethernet and latency graphs.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

monsoft wrote: snmpwalk collects the information without problems.
Please post the output of the snmpwalk against Host[2] for OID .1.3.6.1.2.1.31.1.1.1.6.103
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

This is the output from Host[2]:

Code: Select all

snmpwalk -v 2c -c xxx xxx.xxx.xxx.xxx .1.3.6.1.2.1.31.1.1.1.6.103
IF-MIB::ifHCInOctets.103 = Counter64: 61347289
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

Hmm, well that looks valid. Try increasing the SNMP timeout for that device in Cacti.
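
You can also test a longer timeout straight from the shell to see whether it makes a difference, e.g. a 5 second timeout with 3 retries (community and address are placeholders):

Code: Select all

snmpget -v 2c -c <community> -t 5 -r 3 <router-ip> .1.3.6.1.2.1.31.1.1.1.6.103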
monsoft
Posts: 20
Joined: Fri Oct 16, 2009 4:22 am
Location: London, UK

Post by monsoft »

Many thanks.

I will try that.

Do you know what command Cacti uses to send the SNMP query to the device?
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

monsoft wrote: Do you know what command Cacti uses to send the SNMP query to the device?
It's documented in the code ;). Depending on the device, poller, settings, etc., snmpget, snmpgetnext, snmpwalk, and so on can all be used.