gaps in graphs

mengesb
Posts: 16
Joined: Fri Jun 08, 2007 4:41 pm

Post by mengesb »

I've been running cacti for a while, and it wasn't showing any gaps for either of my two monitored hosts, but now all of a sudden it's gotten really flaky on my virtual server host.

I'm not quite sure what the problem is, because it's never been flaky since setup.

http://sano.cat6wired.net/cacti/graph_v ... &leaf_id=9

NET-SNMP version: 5.3.1
cacti 0.8.6j

/etc/snmp/snmpd.conf of the server that has cacti installed:

Code: Select all

####
# First, map the community name "public" into a "security name"

#       sec.name  source          community
com2sec notConfigUser  default       public

#rocommunity public
#disk /

####
# Second, map the security name into a group name:

#       groupName      securityModel securityName
group   notConfigGroup v1           notConfigUser
group   notConfigGroup v2c           notConfigUser

####
# Third, create a view for us to let the group have rights to:

# Make at least  snmpwalk -v 1 localhost -c public system fast again.
#       name           incl/excl     subtree         mask(optional)
view    systemview    included   .1.3.6.1.2.1.1
view    systemview      included        .1      80
view    systemview      included        .1
view    systemview    included   .1.3.6.1.2.1.25.1.1

####
# Finally, grant the group read-only access to the systemview view.

#       group          context sec.model sec.level prefix read   write  notif
access  notConfigGroup ""      any       noauth    exact  systemview none none

# -----------------------------------------------------------------------------
/etc/snmp/snmpd.conf from the server with the affected graphs:

Code: Select all

####
# First, map the community name "public" into a "security name"

#       sec.name  source          community
com2sec notConfigUser  default       public

####
# Second, map the security name into a group name:

#       groupName      securityModel securityName
group   notConfigGroup v1           notConfigUser
group   notConfigGroup v2c           notConfigUser

####
# Third, create a view for us to let the group have rights to:

# Make at least  snmpwalk -v 1 localhost -c public system fast again.
#       name           incl/excl     subtree         mask(optional)
view    systemview    included   .1.3.6.1.2.1.1
view    systemview      included        .1      80
view    systemview      included        .1
view    systemview    included   .1.3.6.1.2.1.25.1.1

####
# Finally, grant the group read-only access to the systemview view.

#       group          context sec.model sec.level prefix read   write  notif
access  notConfigGroup ""      any       noauth    exact  systemview none none
I have noticed that this line:

Code: Select all

SNMP - Interface Statistics  (Verbose Query)  Uptime Goes Backwards  Success [26 Items, 4 Rows] 
keeps changing the number of items and rows it reports each time I hit the refresh/verbose circle.

Snippet of running the poller and cmd.php:

Code: Select all

[cactiuser@sano cacti]$ php cmd.php
07/20/2007 12:21:05 PM - CMDPHP: Poller[0] Time: 0.4435 s, Theads: N/A, Hosts: 2
[cactiuser@sano cacti]$ php poller.php
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
07/20/2007 12:21:08 PM - SYSTEM STATS: Time:1.0649 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:18 RRDsProcessed:28
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.02
OK u:0.00 s:0.00 r:0.03
OK u:0.00 s:0.00 r:0.03
OK u:0.00 s:0.00 r:0.03
OK u:0.00 s:0.00 r:0.03
OK u:0.00 s:0.00 r:0.03
OK u:0.00 s:0.00 r:0.03
OK u:0.00 s:0.00 r:0.03
OK u:0.00 s:0.00 r:0.03
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

A common VM issue is a glitch with the machine's local time. Please google for possible solutions. To verify that my assumption is correct, use DEBUG mode as given at the second link of my signature
Reinhard
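
A rough complement to the DEBUG-mode check: if the VM's clock is jumping, the timestamps the poller writes should show it. The sketch below assumes the "SYSTEM STATS" line format seen in the poller output earlier in this thread; the function names, the 30-second tolerance and the sample log lines are made up for illustration and are not part of cacti.

```python
from datetime import datetime

# Timestamp format as it appears in the poller output in this thread,
# e.g. "07/20/2007 12:21:08 PM - SYSTEM STATS: ..."
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def poll_intervals(log_lines):
    """Return seconds between consecutive SYSTEM STATS log entries."""
    stamps = []
    for line in log_lines:
        if " - SYSTEM STATS:" not in line:
            continue
        stamp_text = line.split(" - ", 1)[0]
        stamps.append(datetime.strptime(stamp_text, STAMP_FORMAT))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


def suspicious_intervals(intervals, expected=300, tolerance=30):
    """Return intervals that are not positive or are far from the expected step."""
    return [i for i in intervals if i <= 0 or abs(i - expected) > tolerance]


# Hypothetical log excerpt: the third run is stamped earlier than the second,
# which is what a clock stepping backwards would look like.
sample_log = [
    "07/20/2007 12:20:03 PM - SYSTEM STATS: Time:1.0649 Method:cmd.php",
    "07/20/2007 12:25:02 PM - SYSTEM STATS: Time:1.1021 Method:cmd.php",
    "07/20/2007 12:24:10 PM - SYSTEM STATS: Time:0.9950 Method:cmd.php",
    "07/20/2007 12:35:02 PM - SYSTEM STATS: Time:1.0433 Method:cmd.php",
]

intervals = poll_intervals(sample_log)
print(intervals)                        # seconds between consecutive runs
print(suspicious_intervals(intervals))  # runs worth a closer look
```

A negative or wildly long interval does not prove a clock problem by itself (a skipped cron run looks similar), so treat it as a hint for where to look in the DEBUG log.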
mengesb
Posts: 16
Joined: Fri Jun 08, 2007 4:41 pm

Post by mengesb »

gandalf wrote:A common VM issue is a glitch with the machine's local time. Please google for possible solutions. To verify that my assumption is correct, use DEBUG mode as given at the second link of my signature
Reinhard
Doesn't quite explain the history of it being perfect for so long though...
mbhoward
Posts: 20
Joined: Wed Sep 05, 2007 4:31 pm

Post by mbhoward »

not sure if this is the right thread for this but i just fixed my own graph gap issue and i feel compelled to share...

my environment:

linux FC4
cacti-0.8.6j-1 (with all current patches applied)
php-5.0.4-10.5
net-snmp-5.2.1.2-fc4.1
poller: poller.php

all my SNMP graphs were showing up fine, but my script graphs showed intermittent gaps. the script grabs data from a remote host via an ssh connection. the gaps were intermittent but persistent: no detectable pattern, but roughly 1 out of every 3 polling cycles would fail. the rrd held NaNs for those polling cycles, yet switching to debug mode showed that the script was getting the data fine and that the rrdtool update command was being formulated (and presumably executed) correctly.

i tried many of the proposed solutions which i found floating through the forums (bump memory limit in php.ini, modify mysql global options, change to SNMP v2, kill all rogue processes, etc..) but nothing worked. it seemed strange that i was experiencing this problem with only 1 remote host. poller.php executes in less than 2 seconds.

give up? well, i still don't understand why, but the problem went away when i modified my crontab entry (/etc/cron.d/cacti) to include the -q flag to php as follows:

*/5 * * * * cactiuser /usr/bin/php -q /var/www/html/cacti/poller.php > /var/local/log/poller.log 2>&1

the -q flag suppresses HTTP header output but i'm not sure why this was causing a problem. pretty definitive fix though, since i was seeing consistent gaps before the change and now i'm not. after letting it run all night (no gaps), i removed the -q flag, restarted crond and started seeing gaps within 3 polling cycles. i put it back and they went away.

maybe someone who understands this better than me can determine whether or not this "fix" should make its way into the next version and/or documentation? many thanks to gandalf's wonderful document "Debug NaN's in your graphs" for helping me isolate this problem.

mark
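
For a before/after comparison like the one above, it can help to put a number on "how often do gaps occur". This is only a sketch: it assumes you have saved the text output of `rrdtool fetch` for the affected rrd, and the function names and sample values are invented for illustration.

```python
import math


def parse_fetch_output(text):
    """Parse `rrdtool fetch` text output into a list of (timestamp, [values])."""
    rows = []
    for line in text.splitlines():
        stamp, sep, rest = line.partition(":")
        if not sep or not stamp.strip().isdigit():
            continue  # skip the data-source header line and blank lines
        values = [float(v) for v in rest.split()]
        rows.append((int(stamp), values))
    return rows


def gap_rate(rows):
    """Return the fraction of rows in which every value is NaN."""
    if not rows:
        return 0.0
    gaps = sum(
        1 for _, values in rows
        if values and all(math.isnan(v) for v in values)
    )
    return gaps / len(rows)


# Hypothetical fetch output for a single data source, 300-second step.
sample_fetch = """\
                     proc_count

1184948400: 4.2000000000e+01
1184948700: nan
1184949000: 4.3000000000e+01
1184949300: 4.1000000000e+01
1184949600: nan
1184949900: 4.4000000000e+01
"""

rows = parse_fetch_output(sample_fetch)
print(f"{gap_rate(rows):.0%} of polling cycles are gaps")
```

Running this against fetch output captured before and after a change (such as adding -q) gives a comparable figure instead of an eyeballed one.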
mbhoward
Posts: 20
Joined: Wed Sep 05, 2007 4:31 pm

Post by mbhoward »

well, as usual, i spoke too soon. i'm still seeing the occasional gap, although not nearly at the same frequency (about one every 20 polling cycles as opposed to one every 3 polling cycles). polling interval = 300 seconds.

my new theory is that the problem was caused by incorrectly set heartbeat values on the rrd. step was 300, heartbeat was 300. i boosted heartbeat to 600 and i haven't seen the problem since, although still too soon to say for sure.

i still find it strange that adding the -q flag in the crontab entry would result in such a drastic improvement. especially since the -q flag is already specified in the header of poller.php. but seeing is believing. i'm guessing that somehow specifying the -q flag speeds up script execution enough to make a difference to rrd step/heartbeat timing.

mark
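
The step/heartbeat theory can be illustrated with a simplified model. rrdtool treats a data source as unknown when more seconds pass between two updates than its heartbeat allows; the sketch below checks only that one rule and ignores everything else rrdtool does (consolidation, xff). The arrival times are invented, not measured from the setup above.

```python
def unknown_updates(update_times, heartbeat):
    """Return update times whose gap since the previous update exceeds heartbeat."""
    return [
        current
        for previous, current in zip(update_times, update_times[1:])
        if current - previous > heartbeat
    ]


# Hypothetical arrival times (seconds) for a 300-second polling interval.
# The poller starts on schedule, but some updates land a second or two late,
# as might happen when a script has to fetch data over ssh first.
arrivals = [0, 300, 602, 900, 1200, 1501, 1800]

print(unknown_updates(arrivals, heartbeat=300))  # late updates get flagged
print(unknown_updates(arrivals, heartbeat=600))  # same timing, nothing flagged
```

Under this model, anything that shaves a second or so off each poller run would reduce the number of flagged updates without removing them, which would be consistent with the -q observation, though that part remains a guess.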
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

mbhoward wrote:my new theory is that the problem was caused by incorrectly set heartbeat values on the rrd. step was 300, heartbeat was 300. i boosted heartbeat to 600 and i haven't seen the problem since, although still too soon to say for sure.
That's definitely the reason for your problems. If heartbeat = interval, even a slight overrun of one second would cause rrdtool to flag the value as NaN! IMHO, heartbeat should be at least 1.5 times the interval (that would make 450 for the default interval) or more. Please make sure to rrdtool tune all existing rrd files, then.
Reinhard
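
A possible way to prepare the tuning step: the sketch below reads the text output of `rrdtool info` for one file and prints the `rrdtool tune` commands for any data source whose heartbeat is below a chosen multiple of the step. It only builds the command strings and does not run them. The function name, the default factor of 2 (which gives the 600 used above for a 300-second step) and the sample file are assumptions for illustration.

```python
import re
import shlex

# Line shapes as printed by `rrdtool info`.
STEP = re.compile(r"^step = (?P<step>\d+)$")
DS_HEARTBEAT = re.compile(
    r"^ds\[(?P<name>[^\]]+)\]\.minimal_heartbeat = (?P<hb>\d+)$"
)


def tune_commands(rrd_path, info_text, factor=2):
    """Build `rrdtool tune` commands for data sources with a short heartbeat."""
    step = None
    heartbeats = {}
    for raw_line in info_text.splitlines():
        line = raw_line.strip()
        step_match = STEP.match(line)
        if step_match:
            step = int(step_match.group("step"))
            continue
        ds_match = DS_HEARTBEAT.match(line)
        if ds_match:
            heartbeats[ds_match.group("name")] = int(ds_match.group("hb"))
    if step is None:
        raise ValueError("no step found in rrdtool info output")
    target = int(step * factor)
    return [
        f"rrdtool tune {shlex.quote(rrd_path)} --heartbeat {name}:{target}"
        for name, heartbeat in heartbeats.items()
        if heartbeat < target
    ]


# Hypothetical `rrdtool info` excerpt for a file with step = heartbeat = 300.
sample_info = """\
filename = "proc_count.rrd"
rrd_version = "0003"
step = 300
last_update = 1184949900
ds[proc_count].type = "GAUGE"
ds[proc_count].minimal_heartbeat = 300
ds[proc_count].min = 0.0000000000e+00
"""

for command in tune_commands("proc_count.rrd", sample_info):
    print(command)
```

Passing `factor=1.5` would produce the 450 minimum mentioned above. Back up the rrd files before running the printed commands, and remember that new rrd files take their heartbeat from the data template, so that needs adjusting separately.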