Okay, I have a 1.2GHz machine with 1GB of RAM, running Cacti 0.8.7a with Spine 0.8.7a. At first the issue was just one gap in most of the graphs at one specific time of day, every day. Now it seems to have degraded to several gaps randomly throughout the day. Running "top" shows several php and spine processes running. There's only one cron entry set up, but I do think one run sometimes overlaps into the next. I have several of these in the Cacti log: "WARNING: Result from SNMP not valid. Partial Result: ..." and quite often this: "WARNING: Poller Output Table not Empty. Potential Data Source Issues for Data Sources"
A reboot of the machine kills off the extra processes and lowers the number of gaps, but they never entirely go away.
Here's a line from a successful poll:
SYSTEM STATS: Time:14.4864 Method:spine Processes:1 Threads:15 Hosts:10 HostsPerProcess:10 DataSources:384 RRDsProcessed:165
Turning logging up to HIGH, I get this:
POLLER: Poller[0] NOTE: Cron is configured to run too often! The Poller Interval is '60' seconds, with a minimum Cron period of '60' seconds, but only 180 seconds have passed since the poller last ran.
10/21/2008 11:06:14 AM - POLLER: Poller[0] NOTE: Poller Int: '60', Cron Int: '300', Time Since Last: '180', Max Runtime '298', Poller Runs: '5'
The polling interval is every minute and cron runs every 5 minutes.
(I've read this message can mean there's more than one cron entry, but I've checked /etc/cron.d, the users' crontabs and /etc/crontab, and it's only mentioned in /etc/crontab.)
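For reference, a single standard Cacti polling entry in /etc/crontab (the system crontab, so it includes a user field) looks roughly like the line below; the user name and install path here are assumptions, so adjust them to your box:
Code:
*/5 * * * * cactiuser php /var/www/html/cacti/poller.php > /dev/null 2>&1
If an equivalent line also lives in a user's crontab or under /etc/cron.d, the poller gets launched twice and the runs overlap.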
Something else that seems weird to me: when viewing the poller cache there seem to be duplicate entries, except that one number in the OID differs:
rtr1 - Errors - 208.x.x.x - Gi5/3 SNMP Version: 2, Community: Nagix, OID: .1.3.6.1.2.1.2.2.1.13.51
RRD: /var/www/html/rra/6509-stldist-rtr1_errors_in_470.rrd
rtr1 - Errors - 208.x.x.x - Gi5/3 SNMP Version: 2, Community: Nagix, OID: .1.3.6.1.2.1.2.2.1.19.51
RRD: /var/www/html/rra/6509-stldist-rtr1_errors_in_470.rrd
rtr1 - Errors - 208.x.x.x - Gi5/3 SNMP Version: 2, Community: Nagix, OID: .1.3.6.1.2.1.2.2.1.14.51
RRD: /var/www/html/rra/6509-stldist-rtr1_errors_in_470.rrd
rtr1 - Errors - 208.x.x.x - Gi5/3 SNMP Version: 2, Community: Nagix, OID: .1.3.6.1.2.1.2.2.1.20.51
RRD: /var/www/html/rra/6509-stldist-rtr1_errors_in_470.rrd
I don't know whether this could be part of the problem, but I'm including it in case it means something to somebody.
There are also 327 data sources.
I hope I've included enough information. Hope somebody can help!
Cacti degrading...any hope?
- TheWitness
- Developer
Well, your system is not weak. However, there are a few notes of interest:
1) Those poller items are not duplicates. Look closely at the OIDs (there's a quick snmpget sketch after this list).
2) The cron sync issue was a bug corrected in 0.8.7b. It still happens from time to time if your cron start time varies a lot. This can be remediated and should be. Right now we only allow 5 seconds for cron to launch the process. I have increased it to 10 and that seems to help. The change would be in poller.php (search for the number 5 and you will eventually track it down).
3) If you are using spine, the "Poller Output Table not Empty" warnings may be from an anomaly that just received a bug ticket the other day, where some error counters from a 4-counter set are missing. I am still exploring what to do with this issue. It's a vendor-specific corner case.
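To expand on 1): those OIDs are adjacent columns of the standard IF-MIB ifTable for the same interface index (51). Assuming the standard MIB applies here, .13 is ifInDiscards, .14 is ifInErrors, .19 is ifOutDiscards and .20 is ifOutErrors, so those four poller cache entries are four different counters feeding the same errors graph, not duplicates. A quick manual check with Net-SNMP, using the community and masked address from the listing above, would look something like:
Code:
snmpget -v 2c -c Nagix 208.x.x.x .1.3.6.1.2.1.2.2.1.14.51 .1.3.6.1.2.1.2.2.1.20.51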
TheWitness
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
- oxo-oxo
- Cacti User
Spine ran out of time and exited: a possible cause of the gaps ...
- over to TheWitness ...
Code:
/* get current time and exit program if time limit exceeded */
if (poller_counter >= 20) {
    current_time = get_time_as_double();

    if ((current_time - begin_time + 6) > poller_interval) {
        SPINE_LOG(("ERROR: Spine Timed Out While Processing Hosts Internal\n"));
        canexit = 1;
        break;
    }

    poller_counter = 0;
} else {
    poller_counter++;
}
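In other words, every 20 hosts the loop re-checks the wall clock; once the elapsed time plus a 6-second safety margin exceeds the poller interval, Spine logs that timeout error, sets canexit and breaks out, leaving the rest of the hosts unpolled for that cycle. With a 60-second poller interval that margin is not much, and a handful of SNMP timeouts on one host can be enough to trip it, which would show up as exactly these kinds of gaps.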
Owen Brotherwood, JN Data A/S, Denmark.