I've noticed a problem with some rows getting stuck in poller_output. Having raised the PHP memory limit to 512M and still seeing the problem, I decided to investigate further.
05/28/2009 02:00:34 PM - SPINE: Poller[0] Time: 31.9659 s, Threads: 10, Hosts: 2265
05/28/2009 02:00:34 PM - SYSTEM STATS: Time:32.6916 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:14022 RRDsProcessed:9351
05/28/2009 02:05:02 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 1, Data Sources: (DS[9532])
mysql> select * from poller_output;
+---------------+----------+---------------------+-------------+
| local_data_id | rrd_name | time | output |
+---------------+----------+---------------------+-------------+
| 9532 | | 2009-05-28 14:00:05 | rta:15.9821 |
+---------------+----------+---------------------+-------------+
For the record, it always seems to be the same hosts too.
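To check whether the same data sources really do recur, the warnings can be tallied straight from the poller log. The following is a minimal sketch, not part of Cacti: the function name `stuck_data_sources` is made up for illustration, and the sample lines are copied from the poller log.

```python
import re
from collections import Counter

# Matches each DS[...] id listed in a "Poller Output Table not Empty" warning.
DS_ID = re.compile(r"DS\[(\d+)\]")


def stuck_data_sources(log_lines):
    """Count how often each data source id shows up in poller warnings."""
    counts = Counter()
    for line in log_lines:
        if "Poller Output Table not Empty" not in line:
            continue
        # Only inspect the text after "Data Sources:" so that other
        # bracketed values such as Poller[0] are ignored.
        _, _, tail = line.partition("Data Sources:")
        counts.update(int(ds) for ds in DS_ID.findall(tail))
    return counts


sample = [
    "05/28/2009 02:05:02 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. "
    "Issues Found: 1, Data Sources: (DS[9532])",
    "05/28/2009 04:15:01 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. "
    "Issues Found: 2, Data Sources: (DS[9536]), (DS[9539])",
]

for ds_id, hits in stuck_data_sources(sample).most_common():
    print(ds_id, hits)
```

Run over a full day of cacti.log, the ids with the highest counts are the ones worth tracing back to their hosts.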
I believe I've caught all the logs at the various levels. I also have a somewhat convoluted way of getting data into cacti; hopefully that's not where the problem lies!
I'm using the script server to fetch the data. The 'ss_fetch_nagios_perf_web' retrieves the nagios performance data from a database through a simple web service.
05/28/2009 02:00:05 PM - SPINE: Poller[0] Host[37] DS[507] SS[0] SERVER: /var/www/cacti/scripts/ss_fetch_nagios_perf_web.php ss_fetch_nagios_perf_web ns1 ping, output: rta:3.3261
05/28/2009 02:00:05 PM - SPINE: Poller[0] Host[37] DS[9532] SS[1] SERVER: /var/www/cacti/scripts/ss_fetch_nagios_perf_web.php ss_fetch_nagios_perf_web ns1 dns_stats, output: rta:15.9821
The ping output is correct, however the dns_stats output is wrong. It's actually ping output as well!
I can see the incoming web service requests in the apache logs. The returned data size is spot on for the two different metrics.
127.0.0.1 - - [28/May/2009:14:00:05 +1000] "GET /perl/nagios-perf?host=ns1&stat=ping HTTP/1.1" 200 9 "-" "-"
127.0.0.1 - - [28/May/2009:14:00:05 +1000] "GET /perl/nagios-perf?host=ns1&stat=dns_stats HTTP/1.1" 200 61 "-" "-"
I also enabled the mysql logging and everything is in order there too. It looks like the script server is getting confused while handling all the data it gets back.
The data cacti logged for the dns_stats is actually ping data from a totally different host!
| 470891 | 2009-05-28 13:59:23 | redback | ping | 0.099 | 0.595 | rta=15.982ms;500.000;4000.000;0; pl=0%;80;100;; |
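For reference, the conversion from a Nagios performance-data string like the one in this row to the space-separated `name:value` form that Cacti expects from a multi-field script can be sketched as below. This is an illustration only and is not the web service used here; the function name `perfdata_to_cacti` is hypothetical, and quoted labels containing spaces are not handled.

```python
import re

# Leading numeric part of a perfdata value, e.g. "15.982" in "15.982ms".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def perfdata_to_cacti(perfdata):
    """Turn 'label=value[unit];warn;crit;min;max ...' into 'label:value ...'."""
    fields = []
    for token in perfdata.split():
        label, sep, rest = token.partition("=")
        if not sep:
            continue  # not a label=value item
        # Drop the warn/crit/min/max thresholds, then strip the unit suffix.
        value = rest.split(";", 1)[0]
        match = NUMBER.match(value)
        if match:
            fields.append(f"{label}:{match.group(0)}")
    return " ".join(fields)


print(perfdata_to_cacti("rta=15.982ms;500.000;4000.000;0; pl=0%;80;100;;"))
```

Tagging each response with the host and stat it belongs to before it is handed back would make a mix-up like this one easier to spot in the logs.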
I'll switch back to running my perl script to fetch all this data to see if it improves, although I was enjoying the improved performance of the script server. I don't think it's my dodgy php script: when I test it via the php script server, it works as expected for both ping and dns_stats. The issue is also intermittent, which makes it odd.
For the record, running cacti 0.8.7d and spine 0.8.7c.
Any thoughts welcomed!
Regards,
ajt.
poller_output prob and script server confusing the output
Moderators: Developers, Moderators
-
- Posts: 15
- Joined: Tue Oct 17, 2006 10:27 pm
Spine 0.8.7d is the latest -- it's hidden in the announcement forum. Lots of bug fixes.
Does your problem persist when you use the cmd.php poller?
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
Roger that. Just spotted the new spine after my initial post. I have upgraded spine to the latest version.
I have also now switched to script/server running a perl script instead of using the script server (PHP).
Although the run time doubles, I haven't seen the problem again. Early days, though; I'll let it run a bit longer before I make any big calls!
05/28/2009 03:41:13 PM - SYSTEM STATS: Time:71.6179 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 03:46:11 PM - SYSTEM STATS: Time:69.0398 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 03:51:08 PM - SYSTEM STATS: Time:67.3727 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 03:56:10 PM - SYSTEM STATS: Time:68.5181 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 04:01:15 PM - SYSTEM STATS: Time:73.7860 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 04:06:05 PM - SYSTEM STATS: Time:64.0604 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 04:11:09 PM - SYSTEM STATS: Time:67.2138 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9350
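The "run time doubles" observation can be checked by pulling the `Time:` field out of the SYSTEM STATS lines and comparing the mean with the earlier script server run (32.6916 s). This is a rough sketch; the helper name `runtimes` is made up, and the sample lines are the ones above with the trailing fields trimmed for width.

```python
import re
from statistics import mean

# Captures the total poller run time from a SYSTEM STATS log line.
STATS = re.compile(r"SYSTEM STATS: Time:([\d.]+) Method:(\w+)")


def runtimes(log_lines):
    """Return the poller run times (in seconds) found in the given lines."""
    times = []
    for line in log_lines:
        match = STATS.search(line)
        if match:
            times.append(float(match.group(1)))
    return times


# Log lines from the perl script run, trimmed after "Method:spine".
perl_run = [
    "05/28/2009 03:41:13 PM - SYSTEM STATS: Time:71.6179 Method:spine",
    "05/28/2009 03:46:11 PM - SYSTEM STATS: Time:69.0398 Method:spine",
    "05/28/2009 03:51:08 PM - SYSTEM STATS: Time:67.3727 Method:spine",
    "05/28/2009 03:56:10 PM - SYSTEM STATS: Time:68.5181 Method:spine",
    "05/28/2009 04:01:15 PM - SYSTEM STATS: Time:73.7860 Method:spine",
    "05/28/2009 04:06:05 PM - SYSTEM STATS: Time:64.0604 Method:spine",
    "05/28/2009 04:11:09 PM - SYSTEM STATS: Time:67.2138 Method:spine",
]

script_server_time = 32.6916  # single sample from the 02:00:34 PM run
perl_mean = mean(runtimes(perl_run))
print(f"mean perl run: {perl_mean:.2f} s, ratio: {perl_mean / script_server_time:.2f}x")
```

With only one script server sample to compare against, the ratio is indicative rather than a proper benchmark.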
Regards,
ajt.
Again, spoke too soon!
Just happened.
05/28/2009 04:11:09 PM - SYSTEM STATS: Time:67.2138 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9350
05/28/2009 04:15:01 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 2, Data Sources: (DS[9536]), (DS[9539])
05/28/2009 04:16:07 PM - SYSTEM STATS: Time:66.0896 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
I'll switch back to cmd.php and see how it goes.
Regards,
ajt.
1) You really should use the PHP script server for increased efficiency among multiple servers.
2) If the script only fails occasionally, it sounds like it needs some improvement, since that is the point of failure.
Yeah, I'm back on the php script server as it doesn't seem to be the problem. Much friendlier on my system too!
Anyway, ran with cmd.php overnight and only had one instance of the poller output issue.
05/28/2009 11:15:02 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 2, Data Sources: (DS[9536]), (DS[9539])
I investigated this and found I wasn't handling a restart of named quite right, which explains why I had two problems at 23:15.
| 778012 | 2009-05-28 23:09:01 | dns_stats | suc=-8.370,ref=0.000,nxr=-1.253,nxd=-2.507,rec=0.000,f=-0.027 |
| 779675 | 2009-05-28 23:10:02 | dns_stats | suc=-10.360,ref=0.000,nxr=-1.227,nxd=-3.587,rec=0.000,f=-56.203 |
I have corrected that issue, and I think it's safe to conclude that cmd.php is working a treat.
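The actual fix is not shown here. As one possible approach, assuming the negative values come from counters that start again from zero after a restart, the rate calculation can report the sample as unknown instead of negative. The names `rate` and `format_fields` are hypothetical, and "U" is the unknown-value marker that RRDtool accepts.

```python
def rate(prev, curr, seconds):
    """Per-second rate between two counter samples, or None after a reset."""
    if seconds <= 0 or curr < prev:
        # The counter went backwards (e.g. the daemon restarted), so the
        # difference is meaningless for this interval.
        return None
    return (curr - prev) / seconds


def format_fields(rates):
    """Render name -> rate pairs as 'name:value' fields, using U for unknown."""
    return " ".join(
        f"{name}:{'U' if value is None else format(value, '.3f')}"
        for name, value in rates.items()
    )


# Hypothetical counter samples taken 60 seconds apart.
print(format_fields({
    "suc": rate(1000, 1060, 60),  # normal interval
    "nxd": rate(5000, 12, 60),    # counter reset between samples
}))
```

Reporting unknown for the one interval after a restart leaves a small gap in the graph rather than a negative spike.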
I'll switch back to spine and see if the problem manifests itself again.
Regards,
ajt.