Dropouts in graphs but no snmp-timeouts.

ShejtanVrbaski
Posts: 4
Joined: Mon Oct 01, 2012 4:49 am

Dropouts in graphs but no snmp-timeouts.

Post by ShejtanVrbaski »

Hi guys,
I'm not experienced with Cacti. About a month ago I started graphing all of the servers in my company, about 80 in total. Nothing odd occurred until the past few days, when a few of the database hosts (all on the same VLAN) started showing sporadic dropouts in their graphs: 5-minute gaps every few hours, sometimes three 5-minute gaps in a row. On each affected host, all graphs were affected at the same time (not every host on that VLAN is affected, but quite a few are). My first thought was SNMP timeouts, but I couldn't find a single line in the Cacti logs (which I had set to debug level) indicating that. I'm very confused and would really appreciate any tips, explanations, or theories about what could cause this behaviour in the graphs.

Thanks in advance for any reply.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Dropouts in graphs but no snmp-timeouts.

Post by gandalf »

The first step is to watch SYSTEM STATS for a poller overrun. Then, when running spine 0.8.8a, a verbosity level of 3 prints per-host polling times, which may show single hosts taking too long.
R.
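
A minimal Python sketch of the overrun check suggested above, assuming the Cacti log contains lines of the form `SYSTEM STATS: Time:<seconds> ...` and that the polling interval is 300 seconds. The regular expression, the `find_overruns` helper, the 90% margin, and the sample lines are all illustrative assumptions, so compare them with your own cacti.log before relying on the result.

```python
import re

# Assumed shape of a Cacti "SYSTEM STATS" log line; verify against your own log.
STATS_RE = re.compile(r"SYSTEM STATS: Time:(?P<seconds>[\d.]+)")


def find_overruns(log_lines, interval_seconds=300.0, margin=0.9):
    """Return (line, seconds) pairs for poller runs longer than margin * interval."""
    overruns = []
    for line in log_lines:
        match = STATS_RE.search(line)
        if match is None:
            continue  # not a SYSTEM STATS line
        seconds = float(match.group("seconds"))
        if seconds > interval_seconds * margin:
            overruns.append((line.strip(), seconds))
    return overruns


# Hypothetical sample lines, made up for illustration only.
sample = [
    "11/12/2012 01:55:12 PM - SYSTEM STATS: Time:11.2345 Method:spine Processes:1 Threads:10 Hosts:80",
    "11/12/2012 02:04:58 PM - SYSTEM STATS: Time:296.8812 Method:spine Processes:1 Threads:10 Hosts:80",
]

for line, seconds in find_overruns(sample):
    print(f"{seconds:.1f}s: {line}")
```

A run that takes close to the full interval leaves no time to finish before the next poll starts, which is one way gaps can appear without any SNMP timeout being logged.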
ShejtanVrbaski
Posts: 4
Joined: Mon Oct 01, 2012 4:49 am

Re: Dropouts in graphs but no snmp-timeouts.

Post by ShejtanVrbaski »

gandalf wrote: The first step is to watch SYSTEM STATS for a poller overrun. Then, when running spine 0.8.8a, a verbosity level of 3 prints per-host polling times, which may show single hosts taking too long.
R.
Hello,
Thank you very much for a fast reply.
The log lines you suggested looking for are not in the log.
The Spine version running here is the latest one in the Debian Squeeze repository, 0.8.7e.
I did, however, find the following output in the logs. It corresponds to the data sources of the affected servers where the gaps occur, and the timestamps also suggest it is related.
11/12/2012 02:00:01 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 701, Data Sources: SysContext(DS[3619]), cpu_idle(DS[3620]), cpu_nice(DS[3621]), cpu_system(DS[3622]), cpu_user(DS[3623]), loadavg1(DS[3624]), loadavg15(DS[3625]), loadavg5(DS[3626]), cpu_interrupts(DS[3627]), cpu_nice(DS[3628]), cpu_system(DS[3629]), cpu_user(DS[3630]), cpu_wait(DS[3631]), tcpCurrEstab(DS[3632]), SysInterrupts(DS[3633]), load_1min(DS[3634]), load_15min(DS[3635]), load_5min(DS[3636]), proc(DS[3637]), users(DS[3638]), SwapOut(DS[3639]), Additional Issues Remain. Only showing first 20
I've done some research on this but couldn't find anything that helps. Some people suggest increasing memory_limit in php.ini (/etc/php5/cli/php.ini on Debian) and checking with poller_output_empty.php. There appears to be no data in the poller output table, and memory_limit in php.ini is already set to -1 (unlimited) by default, yet the issue remains.

Thanks in advance for any suggestion.
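
To see which data sources a "Poller Output Table not Empty" warning names, the line can be picked apart with a short script. This is only a sketch: the `name(DS[id])` pattern is inferred from the warning quoted in this post, and `parse_leftover_data_sources` is a hypothetical helper name.

```python
import re

# Matches entries such as "cpu_idle(DS[3620])" as they appear in the quoted warning.
DS_RE = re.compile(r"(?P<name>\w+)\(DS\[(?P<id>\d+)\]\)")


def parse_leftover_data_sources(line):
    """Return (name, id) pairs for the data sources listed in the warning line."""
    return [(m.group("name"), int(m.group("id"))) for m in DS_RE.finditer(line)]


# Shortened copy of the warning quoted in the post.
warning = (
    "11/12/2012 02:00:01 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. "
    "Issues Found: 701, Data Sources: SysContext(DS[3619]), cpu_idle(DS[3620]), "
    "loadavg1(DS[3624])"
)

pairs = parse_leftover_data_sources(warning)
ids = [ds_id for _, ds_id in pairs]
print(ids)
```

Collecting the ids over several warnings makes it easier to check whether the same hosts show up each time or whether the set changes from run to run.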
paulgevers
Cacti Pro User
Posts: 613
Joined: Tue Aug 29, 2006 4:09 pm
Location: NL

Re: Dropouts in graphs but no snmp-timeouts.

Post by paulgevers »

ShejtanVrbaski wrote: I've done some research on this but couldn't find anything that helps. Some people suggest increasing memory_limit in php.ini (/etc/php5/cli/php.ini on Debian) and checking with poller_output_empty.php. There appears to be no data in the poller output table, and memory_limit in php.ini is already set to -1 (unlimited) by default, yet the issue remains.
Not sure whether memory really has anything to do with it, but on Debian there are TWO php.ini files. I think you have the right one, but please check. Also, you don't have suhosin active, right? Maybe have a look at Debian bug 566609.
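
One way to compare the two files is a small script like the sketch below. The cli path comes from this thread; the apache2 path is an assumption about a typical Debian php5 layout, and `read_memory_limit` and `report` are hypothetical helper names.

```python
import re
from pathlib import Path

# Matches uncommented "memory_limit = value" lines (commented lines start with ';').
LIMIT_RE = re.compile(r"^\s*memory_limit\s*=\s*(\S+)", re.MULTILINE)

# The cli path is from the thread; the apache2 path is an assumption.
CANDIDATES = ["/etc/php5/cli/php.ini", "/etc/php5/apache2/php.ini"]


def read_memory_limit(ini_text):
    """Return the last uncommented memory_limit value in php.ini text, or None."""
    values = LIMIT_RE.findall(ini_text)
    return values[-1] if values else None


def report(paths=CANDIDATES):
    """Print the memory_limit found in each candidate php.ini file."""
    for name in paths:
        path = Path(name)
        if path.is_file():
            print(name, "->", read_memory_limit(path.read_text(errors="replace")))
        else:
            print(name, "-> not found")


if __name__ == "__main__":
    report()
```

The poller runs through the command-line PHP, so the value reported for the cli file is the one that matters for polling.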
Maintainer of cacti in Debian (and Ubuntu).
Cacti 1.* is now officially supported on Debian Stretch via Debian backports
ShejtanVrbaski
Posts: 4
Joined: Mon Oct 01, 2012 4:49 am

Re: Dropouts in graphs but no snmp-timeouts.

Post by ShejtanVrbaski »

No suhosin active, no.
Yep, that should be the correct file: the php.ini in the cli sub-directory of /etc/php5 (Debian-based). Either way, it obviously didn't change anything. The problem is still there, which is very frustrating.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Dropouts in graphs but no snmp-timeouts.

Post by gandalf »

Are you able to compile a custom 0.8.8a version of spine on that system? That will give you the per-host timings you want.
A long time ago, I debugged such issues by using "ps" to list active processes and often found a specific script for a specific host taking too long. Still, this is only one possible root cause.
R.
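
The "ps" approach described above could be scripted roughly as follows. This sketch assumes a ps that supports the `etimes` (elapsed seconds) column, which older procps releases may lack. The keywords, the 60-second threshold, and the sample process list (including the script path and host name) are made up for illustration.

```python
import subprocess


def parse_ps(output):
    """Parse 'ps -eo pid,etimes,args' output into (pid, elapsed_seconds, command) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything else that is not "pid seconds command".
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        rows.append((int(parts[0]), int(parts[1]), parts[2]))
    return rows


def long_running(rows, min_seconds=60, keywords=("cacti", "spine", "php")):
    """Keep poller-related processes that have been running at least min_seconds."""
    return [r for r in rows if r[1] >= min_seconds and any(k in r[2] for k in keywords)]


def snapshot():
    """Run ps once and return the long-running poller-related processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etimes,args"], capture_output=True, text=True, check=True
    )
    return long_running(parse_ps(result.stdout))


# Hypothetical ps output, for illustration only.
sample = (
    "  PID ELAPSED COMMAND\n"
    " 1234       5 /usr/sbin/spine 0 80\n"
    " 2345     182 php /usr/share/cacti/site/scripts/example.php dbhost01\n"
    "  999    4000 /usr/sbin/sshd\n"
)

for pid, seconds, command in long_running(parse_ps(sample)):
    print(pid, seconds, command)
```

Calling `snapshot()` a few times during a polling cycle on the Cacti host shows whether one script or one host keeps appearing near the top.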
ShejtanVrbaski
Posts: 4
Joined: Mon Oct 01, 2012 4:49 am

Re: Dropouts in graphs but no snmp-timeouts.

Post by ShejtanVrbaski »

A little update before I start considering compiling a newer version of Spine.

For the past few days I've experimented a bit with resource allocation between machines (this Cacti host is a virtual machine on an ESXi server that also runs several other machines). I added 2 cores (from 2 to 4) and made sure all 4 are dedicated to this machine only, added 2 GB of RAM (from 2 GB to 4 GB), and increased Maximum Threads per Process for the poller to 50. This brought an improvement: the gaps appear less frequently. However, it is no longer the same hosts that are affected; the gaps are now more random. This suggests the issue is at least partly related to hardware performance. Still, we are talking about only some 80 polled hosts, so it feels a little insane to need this many resources. In fact, the Cacti manual says that setting Maximum Threads per Process to more than 50 _is_ insane :lol: So I'm unsure.

Another issue I've recently discovered, which may or may not be related, is that 4 of our hosts (quite unimportant ones, so nobody really looks at them) have not been updated since late October. I could see in the logs that the data was being polled regularly, but the poller didn't update the rrd files. Since these hosts were unimportant, I figured I would just delete them and quickly recreate them. After doing this, no graphs are shown at all, and no new rrd files for these hosts have been created either. As before, I can see that the poller is retrieving the proper data for these hosts.

Thanks in advance for any feedback.