Cacti graph gap

star3am · Post by **star3am** » Mon Jun 01, 2009 4:26 am

# Cacti poller to run every minute
        */1     *       *       *       *       /usr/bin/php /var/www/cacti/poller.php --force > /dev/null 2>&1
        */5     *       *       *       *       /usr/bin/php /var/www/cacti/plugins/weathermap/weathermap-cacti-rebuild.php

star3am · Post by **star3am** » Thu Jun 11, 2009 5:28 am

Does the poller have correct access to the log? What does it look like?
L

Hey Larry, honestly, I don't see ANY errors.

I have set up Cacti before and was very impressed. I have tuned MySQL, PHP and the system, and I have now also moved it onto x86_64, but the gaps still remain.

I really need help to solve this, so I have attached the output from the Spine poller.

Thank you!
riaan

Post by **TheWitness** » Thu Jun 11, 2009 5:46 am

Do you have the process balancing turned on? If so, turn it off.

TheWitness

dieselboy · Post by **dieselboy** » Thu Jun 11, 2009 5:46 am

If you have moved to a completely different machine but gaps still appear, then maybe the fault does not lie with the machine (since you swapped it).

Maybe the fault lies with the communication between the Cacti machine and the devices you are trying to poll?

star3am · Post by **star3am** » Thu Jun 11, 2009 8:09 am

Hi all. Larry, thanks for getting back to me. OK, I have disabled the threads:

Code:

06/11/2009 03:10:04 PM - SYSTEM STATS: Time:2.6017 Method:spine Processes:1 Threads:1 Hosts:11 HostsPerProcess:11 DataSources:541 RRDsProcessed:46

I'm actually so disappointed. Normally I'd have this fixed already, but for the life of me, I'm still staring at UGLY graphs with gaps.

@Dieselboy - man, I thought along the same lines, but I never get any errors when I do an snmpwalk :\ nor do I see any errors in the logs.

Two things worry me:

1) I'm using CentOS, with net-snmp version 5.3
2) It's running in a Xen VM

Could this play a part?

Could you perhaps help me see what the log showed when the timeout occurred?

Thanks again, as always. Hoping I can get this fixed and documented.
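To see what the poller was doing around a gap, one approach is to pull every SYSTEM STATS line out of cacti.log and flag runs that started late, ran twice in quick succession (which a duplicate cron entry would likely cause), or took longer than the polling interval. A minimal Python sketch, assuming the log line format shown above and a one-minute poller; `find_poller_problems` and its `slack` tolerance are hypothetical names, not part of Cacti:

```python
import re
from datetime import datetime

# Matches the "SYSTEM STATS" line format quoted in this post; other Cacti
# versions may word this line differently (assumption).
STATS_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) - SYSTEM STATS: "
    r"Time:(?P<runtime>[\d.]+)"
)


def find_poller_problems(lines, interval=60, slack=5):
    """Return (timestamp, reason) pairs for suspicious poller runs.

    interval: expected seconds between runs (60 for a */1 cron entry).
    slack: hypothetical tolerance, in seconds, before a run is reported.
    """
    runs = []
    for line in lines:
        m = STATS_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %I:%M:%S %p")
            runs.append((ts, float(m.group("runtime"))))

    problems = []
    for i, (ts, runtime) in enumerate(runs):
        if runtime > interval:
            problems.append((ts, f"runtime {runtime:.1f}s exceeds the {interval}s interval"))
        if i == 0:
            continue  # no previous run to compare against
        gap = (ts - runs[i - 1][0]).total_seconds()
        if gap > interval + slack:
            problems.append((ts, f"late run: {gap:.0f}s since previous run"))
        elif gap < interval - slack:
            problems.append((ts, f"extra run: only {gap:.0f}s after previous run"))
    return problems
```

You would pass it an open cacti.log (the path depends on your install), then compare the reported timestamps with the times of the gaps on the graphs.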

dieselboy · Post by **dieselboy** » Thu Jun 11, 2009 8:14 am

At what point do the gaps appear? Is it always at the same time, or is it random?

Just out of interest, can you run a continuous ping from the VM to the IP address of the device that has the graph gaps? This will confirm that the communication between the two devices is continuous.
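A continuous ping is most useful if you can tell afterwards when replies went missing, so the losses can be lined up against the times of the gaps. One way, assuming Linux iputils ping: leave something like `ping 10.0.0.5 > ping.log` running (the address and file name are placeholders), then list the missing sequence numbers. With ping's default one-second interval, sequence number N was sent about N seconds after ping started. `missing_replies` is a hypothetical helper for illustration:

```python
import re

# Assumes Linux iputils-style reply lines such as
# "64 bytes from 10.0.0.5: icmp_seq=12 ttl=64 time=0.41 ms".
# Some older iputils builds print "icmp_req=" instead, so both are accepted.
SEQ_RE = re.compile(r"icmp_[rs]eq=(\d+)")


def missing_replies(ping_output):
    """Return sequence numbers with no reply, between the first and last reply seen.

    Losses after the last reply cannot be detected from the replies alone.
    """
    seen = {int(m.group(1)) for m in SEQ_RE.finditer(ping_output)}
    if not seen:
        return []
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]
```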

I looked at this topic because my graphs used to do the same, but that was partly because I had two lines in the crontab, and also because the permissions kept getting messed up for some strange reason that I couldn't figure out.
I now run the crontab line as root and not as cactiuser.
I don't know why I kept having crontab issues; I just left it alone for half a day with only one line in there, and it all just worked, so I have left it like that since.

star3am · Post by **star3am** » Fri Jun 12, 2009 2:30 am

Hey guys, just an update: I found this post, http://forums.cacti.net/viewtopic.php?t ... light=1062, which also relates to gaps in the graphs.

OK. When I installed Cacti via RPM, the install process created a cron entry in /etc/cron.d/cacti.

I also created a cron entry for the cacti user:
Code:

crontab -e -u cacti

This created duplicate cron entries, and I got gaps in the graphs. Strangely, I did not see a single error or message in cacti.log or syslog along the lines of "duplicate insert statement".

Following one of the posts, I looked at the /etc/crontab file, but it did not contain a Cacti cron entry.

So, to all users on openSUSE 10.3 or higher who are getting gaps: please make sure to check the /etc/cron.d/ directory.

Thanks for the great product, for this wonderful forum, and for everyone's help.

I found /etc/cron.d/cacti and have now deleted it. Hoping that will fix my issues; will report back.
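Because the duplicate entries described above live in different places (/etc/crontab, /etc/cron.d/, and the cacti user's own crontab), a quick check is to gather all of them and count the active lines that call poller.php. A minimal Python sketch; `find_poller_entries` is a hypothetical helper, and you would fill the dict yourself, for example from /etc/crontab, each file under /etc/cron.d/, and the output of `crontab -l -u cacti`:

```python
def find_poller_entries(crontabs):
    """Return (source, line) pairs for active cron lines that run poller.php.

    crontabs maps a label (for example "/etc/cron.d/cacti" or
    "crontab -u cacti") to that crontab's text. More than one hit means the
    poller is scheduled more than once.
    """
    hits = []
    for source, text in crontabs.items():
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and commented-out entries
            if "poller.php" in line:
                hits.append((source, line))
    return hits
```

The weathermap line from the first post is not counted, since it calls weathermap-cacti-rebuild.php rather than poller.php.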

star3am · Post by **star3am** » Fri Jun 12, 2009 6:06 am

Just an update ...

Looking at this thread http://forums.cacti.net/about29252.html

If you want to run Cacti in a VM (which is what I do, for both production and testing, with ESX as my base and CentOS 4/5 as the Cacti server), then there are a few easy tips to fix the clock drift, for CentOS at least.

Edit /etc/yum.repos.d/CentOS-Base.repo and add this:
Code:
[testing]
name=CentOS Testing
baseurl=http://dev.centos.org/centos/$releaseve ... $basearch/
enabled=0
gpgcheck=1
gpgkey=http://dev.centos.org/centos/RPM-GPG-KEY-CentOS-testing

You will notice that it's not enabled by default; keep it that way.

Now just run this command:
Code:
yum install kernel-vm --enablerepo=testing

This will install a kernel that uses a 100 Hz timer tick, versus the 1000 Hz used in the normal kernel.

You will then want to edit /etc/grub.conf so that it boots the VM kernel by default. Then just reboot.

You will then want to synchronize the time properly, so run these commands:
Code:
service ntpd stop
ntpdate -u 0.pool.ntp.org
hwclock -w
service ntpd start

Also, looking at this thread, http://forums.cacti.net/about22443.html:

mbhoward wrote:
My new theory is that the problem was caused by incorrectly set heartbeat values on the RRD: step was 300, heartbeat was 300. I boosted the heartbeat to 600 and I haven't seen the problem since, although it's still too soon to say for sure.
That's definitely the reason for your problems. If heartbeat = interval, even an overrun of one second will cause rrdtool to flag the value as NaN! IMHO the heartbeat should be at least 1.5 times the interval (that would make 450 for the default interval of 300) or more. Please make sure to rrdtool tune all existing RRD files, then.
Reinhard
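Checking every RRD by hand is tedious, so here is a hedged Python sketch that reads `rrdtool info` output, compares each data source's `minimal_heartbeat` with the file's `step`, and builds the `rrdtool tune <file> --heartbeat ds-name:value` commands that would raise the short ones. It only generates the commands and does not run them; `tune_commands` and its default factor are assumptions for illustration:

```python
import math
import re
import shlex

STEP_RE = re.compile(r"^step = (\d+)$")
HB_RE = re.compile(r"^ds\[(?P<ds>[^\]]+)\]\.minimal_heartbeat = (?P<hb>\d+)$")


def tune_commands(rrd_path, info_text, factor=2.0):
    """Suggest "rrdtool tune" commands for data sources whose heartbeat is
    shorter than factor * step.

    info_text is the output of "rrdtool info <rrd_path>". factor=2.0 follows
    the 2x rule of thumb; Reinhard's stated minimum above is 1.5.
    """
    step = None
    heartbeats = {}
    for raw in info_text.splitlines():
        line = raw.strip()
        m = STEP_RE.match(line)
        if m:
            step = int(m.group(1))
            continue
        m = HB_RE.match(line)
        if m:
            heartbeats[m.group("ds")] = int(m.group("hb"))
    if step is None:
        raise ValueError("no 'step = N' line found in rrdtool info output")

    wanted = math.ceil(step * factor)  # smallest heartbeat we accept
    return [
        f"rrdtool tune {shlex.quote(rrd_path)} --heartbeat {ds}:{wanted}"
        for ds, hb in sorted(heartbeats.items())
        if hb < wanted
    ]
```

Tuning only changes existing files; new RRDs get their heartbeat from the Cacti data template, so that value needs raising too.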

Will update this post as I chug along... well, at least it's Friday.

dieselboy · Post by **dieselboy** » Fri Jun 12, 2009 6:45 am

The best of luck!

star3am · Post by **star3am** » Fri Jun 12, 2009 8:03 am

@dieselboy, thanks bro!!

OK, it seems that I am getting somewhere. After adding clock=tsc to the kernel line of grub.conf (of the VM itself), I am no longer seeing any gaps. I'm going to leave it for the weekend and report back.

This image is with Spine as the poller. At around 17:30 I changed to poller.php and left it for the night; as you can see, the change had no effect.

This morning I read up and, following the posts above, set my kernel boot parameter, which seems to have made a difference. This graph is after the change, and so far there are no gaps.

Have a great weekend, and thank you for the support thus far. I will let you know the outcome.

PS: the graphs above show the memory usage of the Cacti/Nagios VM itself, so it's monitoring localhost (no iptables involved).
ciao/Riaan

dieselboy · Post by **dieselboy** » Fri Jun 12, 2009 8:12 am

So if it monitors itself it's okay, but if it monitors other devices then you have problems? I really don't know much about Cacti, but to me that would suggest a connection issue between the devices.

Out of interest, do you have the "realtime" plugin installed? I'm wondering whether the gaps also appear when you view the graphs in realtime, or whether those are clean.
Realtime is easy to set up: you just put the plugin in the plugins directory and specify the path where it should save the temporary graphs it creates.

star3am · Post by **star3am** » Fri Jun 12, 2009 8:16 am

Some graphs still have a gap or two, but others have no gaps whatsoever, so progress is being made, and I'm happy about that.

Also a good read, re: Heartbeat and Step:

http://forums.cacti.net/about29905.html

baxford wrote:
gandalf wrote:
You will have to run "rrdtool info" against the new rrd file to make sure.
Cacti 088 will support a new feature to compare cacti defs to the current rrd file defs (yesterday I implemented the base part of it), so your info won't get lost
Reinhard

I ran "rrdtool info" on the new rrd and it does show a step value of 60. But it also shows a heartbeat of 600?? The heartbeat should always be 2x the step value, right?
That's the rule of thumb. But it's not a must. From man rrdcreate (V1.3.current)
Code:
The HEARTBEAT and the STEP
Here is an explanation by Don Baarda on the inner workings of RRDtool. It may help you to sort out why
all this *UNKNOWN* data is popping up in your databases:

RRDtool gets fed samples/updates at arbitrary times. From these it builds Primary Data Points (PDPs) on
every "step" interval. The PDPs are then accumulated into the RRAs.

The "heartbeat" defines the maximum acceptable interval between samples/updates. If the interval between
samples is less than "heartbeat", then an average rate is calculated and applied for that interval. If the
interval between samples is longer than "heartbeat", then that entire interval is considered "unknown".
Note that there are other things that can make a sample interval "unknown", such as the rate exceeding
limits, or a sample that was explicitly marked as unknown.

The known rates during a PDP’s "step" interval are used to calculate an average rate for that PDP. If the
total "unknown" time accounts for more than half the "step", the entire PDP is marked as "unknown". This
means that a mixture of known and "unknown" sample times in a single PDP "step" may or may not add up to
enough "known" time to warrant a known PDP.

The "heartbeat" can be short (unusual) or long (typical) relative to the "step" interval between PDPs. A
short "heartbeat" means you require multiple samples per PDP, and if you don’t get them mark the PDP
unknown. A long heartbeat can span multiple "steps", which means it is acceptable to have multiple PDPs
calculated from a single sample. An extreme example of this might be a "step" of 5 minutes and a "heartbeat"
of one day, in which case a single sample every day will result in all the PDPs for that entire day
period being set to the same average rate. -- Don Baarda <don.baarda@baesystems.com>
   time|
   axis|
begin__|00|
       |01|
      u|02|----* sample1, restart "hb"-timer
      u|03|   /
      u|04|  /
      u|05| /
      u|06|/ "hbt" expired
      u|07|
       |08|----* sample2, restart "hb"
       |09|   /
       |10|  /
      u|11|----* sample3, restart "hb"
      u|12|   /
      u|13|  /
step1_u|14| /
      u|15|/ "swt" expired
      u|16|
       |17|----* sample4, restart "hb", create "pdp" for step1 =
       |18|   / = unknown due to 10 "u" labeled secs > 0.5 * step
       |19|  /
       |20| /
       |21|----* sample5, restart "hb"
       |22|   /
       |23|  /
       |24|----* sample6, restart "hb"
       |25|   /
       |26|  /
       |27|----* sample7, restart "hb"
step2__|28|   /
       |22|  /
       |23|----* sample8, restart "hb", create "pdp" for step1, create "cdp"
       |24|   /
       |25|  /

graphics by vladimir.lavrov@desy.de.

Reinhard
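The two rules quoted from the man page (an interval between samples longer than the heartbeat is unknown, and a PDP with more than half its step unknown becomes unknown) can be tried out in a few lines of Python. This is a simplified model written for this thread, not rrdtool's actual code: step boundaries start at 0, time before the first update counts as unknown, and rate limits and explicitly unknown samples are ignored. `pdp_status` is a hypothetical name:

```python
def pdp_status(sample_times, step, heartbeat):
    """Classify each completed step as "known" or "unknown".

    Simplified model of the rules quoted above (not rrdtool's code):
    - an interval between two updates is known only if it is <= heartbeat;
    - a step whose unknown time exceeds step / 2 becomes an unknown PDP.
    """
    known = [
        (a, b)
        for a, b in zip(sample_times, sample_times[1:])
        if b - a <= heartbeat
    ]
    last = sample_times[-1] if sample_times else 0
    result = []
    for k in range(int(last // step)):  # only steps fully covered by updates
        lo, hi = k * step, (k + 1) * step
        known_secs = sum(max(0, min(hi, b) - max(lo, a)) for a, b in known)
        result.append("unknown" if step - known_secs > step / 2 else "known")
    return result
```

For example, `pdp_status([0, 300, 601, 900, 1200], step=300, heartbeat=300)` returns `["known", "unknown", "known", "known"]`: a poller run that is one second late costs a whole 300-second PDP, while the same samples with `heartbeat=600` give four known PDPs. That matches Reinhard's point about heartbeat = interval.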

star3am · Post by **star3am** » Fri Jun 12, 2009 8:20 am

No, there were gaps regardless of which host was being monitored. I have tested the network, and I'm certain it's not the network... no packet loss ever :\

Thanks for the plugin suggestion, I will look into it.

Thanks a bunch


star3am · Post by **star3am** » Tue Jun 16, 2009 12:34 pm

SOLVED!

Great to come back from a week's holiday and see no gaps.

What fixed it for me, running Cacti in a VM, was adding clock=tsc to the kernel line in grub.conf (of the VM).

Then I also set the time with a cron job like this:

Code:

       30      *       *       *       *       /usr/sbin/ntpdate -s pool.ntp.org

Also very important: make sure that only one poller process gets called, so delete /etc/cron.d/cacti if you are not using it.

This fixed nearly all my gaps, but some graphs still had gaps.

I looked at the graphs' data templates: the RRD step was 60, and for some of the graphs that had gaps the heartbeat was 60 or 120. After changing the heartbeat to 300, I had no more gaps.

Maybe it would be a good idea to mention the step and the heartbeat in the log, for debugging?

Super happy to have this resolved.

Hope it helps other people.

ciao/Riaan
