Cacti graph gap


star3am
Posts: 23
Joined: Mon Aug 04, 2008 5:08 am
Location: Cape Town

Post by star3am »

Code:

# Cacti poller to run every minute
*/1 * * * * /usr/bin/php /var/www/cacti/poller.php --force > /dev/null 2>&1
*/5 * * * * /usr/bin/php /var/www/cacti/plugins/weathermap/weathermap-cacti-rebuild.php

GRRrrrrr

Post by star3am »

Does the poller have correct access to the log? What does it look like?
L

Hey Larry, honestly, I don't see ANY errors :(
I have set up Cacti before, and am so impressed. I have tuned MySQL, PHP and my system, and I have now also moved it onto x86_64, but the gaps still remain.

I really need help to solve this, so I have attached the output from the spine poller

Thank you !
riaan
Attachments
2009-06-11_spine_output.txt
(103.4 KiB)
TheWitness
Developer
Posts: 17047
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Do you have the process balancing turned on? If so, turn it off.

TheWitness
dieselboy
Cacti User
Posts: 135
Joined: Wed May 27, 2009 5:10 pm

Post by dieselboy »

If you have moved to a completely different machine but gaps still appear, then maybe the fault does not lie with the machine (since you swapped it).

Maybe the fault lies with the communication between the Cacti machine and the devices you are trying to poll?

Post by star3am »

Hi all. Larry, thanks for coming back to me. OK, I have disabled the threads:

Code:

06/11/2009 03:10:04 PM - SYSTEM STATS: Time:2.6017 Method:spine Processes:1 Threads:1 Hosts:11 HostsPerProcess:11 DataSources:541 RRDsProcessed:46
I'm actually so disappointed. Normally I'd have this fixed already, but for the life of me, I'm still staring at UGLY graphs with gaps :-?

@Dieselboy - man, I thought along the same lines, but I never get any errors when I do an snmpwalk :\ nor do I see errors in the logs :(

Two things worry me:

1) I'm using CentOS, with net-snmp version 5.3
2) It's running in a Xen VM ..

Could this play a part?

Could you perhaps help me see what the log did when the timeout occurred?

Thanks again, as always. Hoping I can get this fixed and documented :(

Post by dieselboy »

At what point do the gaps appear? Is it always at the same time, or is it random?

Just out of interest, can you run a continuous ping from the VM to the IP address of the device that has the graph gaps? This will confirm that the communication between the two devices is uninterrupted.

I looked at this topic because my graphs used to do the same, but this was partly because I had two lines in the crontab, and also because the permissions kept getting messed up for some reason that I couldn't figure out.
I now run the crontab line as root and not as cactiuser.
I don't know why I kept having crontab issues. I left it alone for half a day with just one line in there and it all just worked, so I have left it that way since.

just an update

Post by star3am »

hey guys just an update, I found this post, http://forums.cacti.net/viewtopic.php?t ... light=1062 also relating to gaps in the graphs ..
Ok. when I installed Cacti via RPM, the install process created a cron entry in /etc/cron.d/cacti

I also created a cron entry for cacti user
Code:

crontab -e -u cacti


This created duplicate cron entries, and I got gaps in graphs. Strangely, I did not see a single error or message in cacti.log or syslog along the lines of "duplicate insert statement".

From one of the posts, I did look at /etc/crontab file, but that did not have cacti cron.

So, to all users using opensuse 10.3 or higher, and getting gaps, please make sure to check /etc/cron.d/ directory...

Thanks for the great product and for this wonderful forum and everyone's help.
I found /etc/cron.d/cacti and have deleted it now. Hoping that will fix my issues, will report back.
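
For anyone wanting to check for the duplicate cron entries described above, here is a minimal sketch. The function name count_poller_entries and the /tmp/cacti.cron dump file are made up for illustration, and it only assumes that the poller is started by a cron line mentioning poller.php, as in the crontab earlier in this thread.

```shell
#!/bin/sh
# Count uncommented cron lines that invoke poller.php in the given files.
# (Hypothetical helper, not part of Cacti.) A total above 1 across all cron
# sources suggests the poller is being started more than once per interval.
count_poller_entries() {
    # Unreadable or missing files are skipped silently; comment lines are
    # dropped before counting. "|| true" keeps the exit status at 0 even
    # when grep counts zero matches.
    cat "$@" 2>/dev/null | grep -v '^[[:space:]]*#' | grep -c 'poller\.php' || true
}

# Example (paths taken from this thread; adjust to your system):
#   crontab -l -u cacti > /tmp/cacti.cron
#   count_poller_entries /etc/crontab /etc/cron.d/cacti /tmp/cacti.cron
```

If this prints anything higher than 1, it is probably worth removing the extra entry so that only one poller run is scheduled.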

Post by star3am »

Just an update ...

Looking at this thread http://forums.cacti.net/about29252.html
If you want to run Cacti as a VM (which is what I do, production and testing. ESX as my base, CentOS 4 / 5 as the Cacti server) then there are a few easy tips to fix the clock drift for CentOS at least.

Edit /etc/yum.repos.d/CentOS-Base.repo
Add this
Code:
[testing]
name=CentOS Testing
baseurl=http://dev.centos.org/centos/$releaseve ... $basearch/
enabled=0
gpgcheck=1
gpgkey=http://dev.centos.org/centos/RPM-GPG-KEY-CentOS-testing

You will notice that it's not enabled by default. Keep it that way.

Now just run this command
Code:
yum install kernel-vm --enablerepo=testing

This will install a kernel that uses a 100 Hz timer, versus the 1000 Hz timer used in the normal kernel.

You will then want to edit /etc/grub.conf to make it boot to the VM Kernel normally. Then just reboot.

You will then want to synchronize the time properly, so run these commands
Code:
service ntpd stop
ntpdate -u 0.pool.ntp.org
hwclock -w
service ntpd start
also looking at this thread, http://forums.cacti.net/about22443.html
mbhoward wrote:
my new theory is that the problem was caused by incorrectly set heartbeat values on the rrd. step was 300, heartbeat was 300. i boosted heartbeat to 600 and i haven't seen the problem since, although still too soon to say for sure.
That's definitely the reason for your problems. If heartbeat = interval, even a slight overrun of one second would cause rrdtool to flag the value as NaN! Heartbeat should IMHO be at least 1.5 times the interval (that would make 450 for the default interval) or more. In that case, please make sure to run rrdtool tune on all existing rrd files.
Reinhard
Will update this post as I tug along... well, at least it's Friday :)
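
A small sketch of the arithmetic in the quote above, plus the rrdtool tune call it refers to. The helper names min_heartbeat and tune_cmd, the file name and the data source name are hypothetical. The command is only printed, not executed, so it can be reviewed first.

```shell
#!/bin/sh
# Smallest heartbeat suggested in the quote: 1.5 x the polling interval.
# $1 = step (polling interval) in seconds
min_heartbeat() {
    echo $(( $1 * 3 / 2 ))
}

# Print (do not run) the rrdtool tune command for one data source.
# $1 = rrd file, $2 = data source name, $3 = new heartbeat in seconds
tune_cmd() {
    echo "rrdtool tune $1 --heartbeat $2:$3"
}

# Example with made-up names:
#   tune_cmd localhost_mem.rrd mem_free "$(min_heartbeat 300)"
```

Each data source in an rrd file has its own heartbeat, so a file with several data sources needs one ds-name:heartbeat pair per source.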

Post by dieselboy »

The best of luck!

Thanks

Post by star3am »

@dieselboy, thanks bro !!

OK, it seems that I am getting somewhere .. after adding clock=tsc to the kernel line of grub.conf (of the VM itself) I am now not seeing any gaps. I'm going to leave it for the weekend and report back.
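
After the reboot it may be worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a Linux guest where /proc/cmdline is readable. The helper name has_kernel_param is made up for illustration.

```shell
#!/bin/sh
# has_kernel_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage on the running VM:
#   has_kernel_param "$(cat /proc/cmdline)" clock=tsc && echo "clock=tsc is active"
```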

This image is with Spine as the poller. At around 17:30 I changed to poller.php and left it for the night. As you can see, the change had no effect.
Image

This morning I read up and, following the posts above, set my kernel boot parameter, which seems to have made a difference. This graph is from after the change, and so far there are no gaps.
Image

Have a great weekend, thank you for the support thus far, will let you know the outcome.

PS: the graphs above are of the memory usage of the Cacti/Nagios VM itself, so it's monitoring localhost, no iptables.
ciao/Riaan
Last edited by star3am on Tue Jun 16, 2009 11:48 am, edited 1 time in total.

Post by dieselboy »

So if it monitors itself it's okay, but if it monitors other devices then you have problems? I really don't know much about Cacti, but to me that would suggest a connection issue between devices.

Out of interest, do you have the "realtime" plugin installed? I'm wondering whether the gaps also appear when you view the graphs in realtime, or whether it is clean.
Realtime is easy to set up. You just put the plugin in the directory and specify the path where it saves the new temp graphs it makes.

also a good read

Post by star3am »

Some graphs still have a gap or two :( but others have no gaps whatsoever .. so progress is being made, and I'm happy about that ..

also a good read Re: Heartbeat and Step ..

http://forums.cacti.net/about29905.html
baxford wrote:
gandalf wrote:
You will have to run "rrdtool info" against the new rrd file to make sure.
Cacti 088 will support a new feature to compare cacti defs to the current rrd file defs (yesterday I implemented the base part of it), so your info won't get lost
Reinhard


I ran "rrdtool info" on the new rrd and it does show a step value of 60. But it also shows a heartbeat of 600?? The heartbeat should always be 2x the step value, right?
That's the rule of thumb. But it's not a must. From man rrdcreate (V1.3.current)
Code:
The HEARTBEAT and the STEP
Here is an explanation by Don Baarda on the inner workings of RRDtool. It may help you to sort out why
all this *UNKNOWN* data is popping up in your databases:

RRDtool gets fed samples/updates at arbitrary times. From these it builds Primary Data Points (PDPs) on
every "step" interval. The PDPs are then accumulated into the RRAs.

The "heartbeat" defines the maximum acceptable interval between samples/updates. If the interval between
samples is less than "heartbeat", then an average rate is calculated and applied for that interval. If the
interval between samples is longer than "heartbeat", then that entire interval is considered "unknown".
Note that there are other things that can make a sample interval "unknown", such as the rate exceeding
limits, or a sample that was explicitly marked as unknown.

The known rates during a PDP’s "step" interval are used to calculate an average rate for that PDP. If the
total "unknown" time accounts for more than half the "step", the entire PDP is marked as "unknown". This
means that a mixture of known and "unknown" sample times in a single PDP "step" may or may not add up to
enough "known" time to warrant a known PDP.

The "heartbeat" can be short (unusual) or long (typical) relative to the "step" interval between PDPs. A
short "heartbeat" means you require multiple samples per PDP, and if you don’t get them mark the PDP
unknown. A long heartbeat can span multiple "steps", which means it is acceptable to have multiple PDPs
calculated from a single sample. An extreme example of this might be a "step" of 5 minutes and a
"heartbeat" of one day, in which case a single sample every day will result in all the PDPs for that entire day
period being set to the same average rate. -- Don Baarda <don.baarda@baesystems.com>
       time|
       axis|
 begin__|00|
        |01|
       u|02|----* sample1, restart "hb"-timer
       u|03|   /
       u|04|  /
       u|05| /
       u|06|/     "hbt" expired
       u|07|
        |08|----* sample2, restart "hb"
        |09|   /
        |10|  /
       u|11|----* sample3, restart "hb"
       u|12|   /
       u|13|  /
 step1_u|14| /
       u|15|/     "swt" expired
       u|16|
        |17|----* sample4, restart "hb", create "pdp" for step1 =
        |18|   /  = unknown due to 10 "u" labeled secs > 0.5 * step
        |19|  /
        |20| /
        |21|----* sample5, restart "hb"
        |22|   /
        |23|  /
        |24|----* sample6, restart "hb"
        |25|   /
        |26|  /
        |27|----* sample7, restart "hb"
 step2__|28|   /
        |22|  /
        |23|----* sample8, restart "hb", create "pdp" for step1, create "cdp"
        |24|   /
        |25|  /

graphics by vladimir.lavrov@desy.de.

Reinhard
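
To make the two rules in the man page excerpt concrete, here is a rough sketch. It is a simplification: it does not split intervals across step boundaries the way RRDtool does, and the function names unknown_seconds and pdp_state are invented for this example.

```shell
#!/bin/sh
# unknown_seconds HEARTBEAT T1 T2 ...
# Sum of the gaps between consecutive sample timestamps (in seconds) that
# are longer than HEARTBEAT. Such intervals count as entirely "unknown".
unknown_seconds() {
    hb=$1; shift
    prev=$1; shift
    total=0
    for t in "$@"; do
        gap=$(( t - prev ))
        if [ "$gap" -gt "$hb" ]; then
            total=$(( total + gap ))
        fi
        prev=$t
    done
    echo "$total"
}

# pdp_state STEP UNKNOWN
# Prints "unknown" when the unknown time is more than half the step,
# otherwise "known".
pdp_state() {
    if [ $(( $2 * 2 )) -gt "$1" ]; then
        echo unknown
    else
        echo known
    fi
}

# Example: step 300, heartbeat 300, second sample arrives one second late.
#   pdp_state 300 "$(unknown_seconds 300 0 301)"
```

With heartbeat equal to step, a sample that arrives one second late turns the whole interval unknown. With a heartbeat of 600 the same late sample is still accepted.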

Post by star3am »

No, there were gaps regardless of which host was being monitored. I have tested the network, and I'm certain it's not the network ... no packet loss ever :\

Thanks for the plugin suggestion, will look into it ;)

Thanks a bunch

SOLVED !

Post by star3am »

SOLVED !

Great to come back from a week holiday and see no gaps :)

What fixed it for me, running Cacti in a VM, was
adding clock=tsc to the kernel line in grub.conf (of the VM).

Then I also set the time from cron once an hour, like this:

Code:

30 * * * * /usr/sbin/ntpdate -s pool.ntp.org
It is also very important to make sure that only one poller process gets called, so delete /etc/cron.d/cacti if you are not using it ...

This fixed nearly all my gaps, but some graphs still had gaps ..

I looked at the graphs' data templates. The step of the rrd was 60, and on some of the graphs that had gaps the heartbeat was 60 and/or 120. After changing the heartbeat to 300 I had no more gaps.

Image

Image

Maybe it would be a good idea to mention the step and the heartbeat in the log for debugging?

Super happy to have this resolved :) Hope it helps other people ;)
ciao/Riaan
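
For others debugging the same thing, a sketch of how the step and heartbeat could be pulled out of "rrdtool info" output and checked. The function name check_heartbeats and the default factor of 2 (the rule of thumb quoted earlier in this thread) are assumptions of this example. It reads the info output on stdin, so it can be tried on saved output as well.

```shell
#!/bin/sh
# check_heartbeats [FACTOR]
# Reads `rrdtool info` output on stdin and prints "name step heartbeat" for
# every data source whose heartbeat is below FACTOR x step (default 2).
check_heartbeats() {
    awk -v factor="${1:-2}" '
        # The "step = N" line appears before the ds[...] lines.
        $1 == "step" { step = $3 }
        $1 ~ /^ds\[.*\]\.minimal_heartbeat$/ {
            name = $1
            sub(/^ds\[/, "", name)
            sub(/\]\.minimal_heartbeat$/, "", name)
            if ($3 < factor * step) print name, step, $3
        }
    '
}

# Usage (the rrd path is a placeholder):
#   rrdtool info /path/to/file.rrd | check_heartbeats
```

Any data source it prints is a candidate for a larger heartbeat in the data template and in the existing rrd file.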