poller_output prob and script server confusing the output

ajtwatching
Posts: 15
Joined: Tue Oct 17, 2006 10:27 pm

Post by ajtwatching »

I've noticed a problem with rows getting stuck in the poller_output table. Having raised the PHP memory limit to 512M and still seeing the problem, I decided to investigate further!

05/28/2009 02:00:34 PM - SPINE: Poller[0] Time: 31.9659 s, Threads: 10, Hosts: 2265
05/28/2009 02:00:34 PM - SYSTEM STATS: Time:32.6916 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:14022 RRDsProcessed:9351
05/28/2009 02:05:02 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 1, Data Sources: (DS[9532])

mysql> select * from poller_output;
+---------------+----------+---------------------+-------------+
| local_data_id | rrd_name | time                | output      |
+---------------+----------+---------------------+-------------+
|          9532 |          | 2009-05-28 14:00:05 | rta:15.9821 |
+---------------+----------+---------------------+-------------+

For the record, it always seems to be the same hosts too.

I believe I've caught all the logs at the various levels. I also have a somewhat convoluted way of getting data into Cacti; hopefully that's not where the problem lies! ;)

I'm using the script server to fetch the data. The 'ss_fetch_nagios_perf_web' function retrieves the Nagios performance data from a database through a simple web service.

05/28/2009 02:00:05 PM - SPINE: Poller[0] Host[37] DS[507] SS[0] SERVER: /var/www/cacti/scripts/ss_fetch_nagios_perf_web.php ss_fetch_nagios_perf_web ns1 ping, output: rta:3.3261

05/28/2009 02:00:05 PM - SPINE: Poller[0] Host[37] DS[9532] SS[1] SERVER: /var/www/cacti/scripts/ss_fetch_nagios_perf_web.php ss_fetch_nagios_perf_web ns1 dns_stats, output: rta:15.9821

The ping output is correct; however, the dns_stats output is wrong. It's actually ping output as well!
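
One way to catch this kind of mix-up before it reaches the RRD files would be to check that the field names in a script's output match what the data source expects. What follows is only a minimal sketch in Python, assuming the space-separated `name:value` output format seen in the log lines above. `EXPECTED_FIELDS`, `parse_output` and `output_matches` are hypothetical names, and the dns_stats field list is an assumption taken from the dns_stats perf data quoted later in this thread.

```python
# Hypothetical mapping of stat name -> field names the data source expects.
# The dns_stats fields are an assumption based on perf data quoted in this thread.
EXPECTED_FIELDS = {
    "ping": {"rta"},
    "dns_stats": {"suc", "ref", "nxr", "nxd", "rec", "f"},
}


def parse_output(output):
    """Split 'name:value name:value' output into a dict of floats."""
    fields = {}
    for pair in output.split():
        name, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"malformed pair: {pair!r}")
        fields[name] = float(value)
    return fields


def output_matches(stat, output):
    """Return True when the output carries exactly the expected field names."""
    return set(parse_output(output)) == EXPECTED_FIELDS[stat]
```

With the two log lines above, the ping output would pass this check, while `rta:15.9821` returned for dns_stats would be flagged as a mismatch.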

I can see the incoming web service requests in the Apache logs. The returned data size is spot on for the two different metrics.

127.0.0.1 - - [28/May/2009:14:00:05 +1000] "GET /perl/nagios-perf?host=ns1&stat=ping HTTP/1.1" 200 9 "-" "-"
127.0.0.1 - - [28/May/2009:14:00:05 +1000] "GET /perl/nagios-perf?host=ns1&stat=dns_stats HTTP/1.1" 200 61 "-" "-"

I also enabled MySQL logging and everything is in order there too. It looks like the script server is getting confused while handling all the data it gets back.

The data cacti logged for the dns_stats is actually ping data from a totally different host!

| 470891 | 2009-05-28 13:59:23 | redback | ping | 0.099 | 0.595 | rta=15.982ms;500.000;4000.000;0; pl=0%;80;100;; |
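
For reference, turning a perf data string like the one in that row into script-server style output could look roughly like the sketch below. This is not the web service or PHP script used here, just an illustration in Python that assumes space-separated `label=value[unit];warn;crit;min;max` items and keeps only the label and the numeric value. `perfdata_to_output` is a hypothetical name, and labels containing spaces or quotes are not handled.

```python
import re

# label=value[unit];warn;crit;min;max -- only the label and numeric value are kept.
_PERF_ITEM = re.compile(r"^([^=\s]+)=(-?\d+(?:\.\d+)?)")


def perfdata_to_output(perfdata):
    """Convert space-separated perf data items to 'label:value' pairs."""
    parts = []
    for item in perfdata.split():
        match = _PERF_ITEM.match(item)
        if match:
            label, value = match.groups()
            parts.append(f"{label}:{value}")
    return " ".join(parts)


print(perfdata_to_output("rta=15.982ms;500.000;4000.000;0; pl=0%;80;100;;"))
# prints "rta:15.982 pl:0"
```

Units such as `ms` and `%` and the threshold fields are simply dropped, so the caller would need to know which unit each label uses.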

I'll switch back to running my Perl script to fetch all this data to see if things improve, although I was enjoying the better performance of the script server. I don't think it's my dodgy PHP script: when I test it via the PHP script server it works as expected for both ping and dns_stats. The issue also only happens intermittently, which makes it odd.

For the record, running cacti 0.8.7d and spine 0.8.7c.

Any thoughts welcomed!

Regards,

ajt.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

Spine 0.8.7d is the latest -- it's hidden in the announcement forum. Lots of bug fixes.

Does your problem persist when you use the cmd.php poller?
ajtwatching
Posts: 15
Joined: Tue Oct 17, 2006 10:27 pm

Post by ajtwatching »

Roger that. Just spotted the new spine after my initial post. I have upgraded spine to the latest version.

I have also now switched to script/server running a perl script instead of using the script server (PHP).

Although the run time doubles, I haven't seen the problem again. It's early days, so I'll let it run a bit longer before I make any big calls!

05/28/2009 03:41:13 PM - SYSTEM STATS: Time:71.6179 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 03:46:11 PM - SYSTEM STATS: Time:69.0398 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 03:51:08 PM - SYSTEM STATS: Time:67.3727 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 03:56:10 PM - SYSTEM STATS: Time:68.5181 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 04:01:15 PM - SYSTEM STATS: Time:73.7860 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 04:06:05 PM - SYSTEM STATS: Time:64.0604 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352
05/28/2009 04:11:09 PM - SYSTEM STATS: Time:67.2138 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9350

Regards,

ajt.
ajtwatching
Posts: 15
Joined: Tue Oct 17, 2006 10:27 pm

Post by ajtwatching »

Again, spoke too soon!

Just happened.

05/28/2009 04:11:09 PM - SYSTEM STATS: Time:67.2138 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9350
05/28/2009 04:15:01 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 2, Data Sources: (DS[9536]), (DS[9539])
05/28/2009 04:16:07 PM - SYSTEM STATS: Time:66.0896 Method:spine Processes:1 Threads:10 Hosts:2265 HostsPerProcess:2265 DataSources:13975 RRDsProcessed:9352

I'll switch back to cmd.php and see how it goes.

Regards,

ajt.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

1) You really should use the PHP script server for increased efficiency among multiple servers.
2) If the script only occasionally fails, it sounds like it needs some improvement, since that is the point of failure.
ajtwatching
Posts: 15
Joined: Tue Oct 17, 2006 10:27 pm

Post by ajtwatching »

Yeah, I'm back on the php script server as it doesn't seem to be the problem. Much friendlier on my system too!

Anyway, ran with cmd.php overnight and only had one instance of the poller output issue.

05/28/2009 11:15:02 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 2, Data Sources: (DS[9536]), (DS[9539])

I investigated this and found I wasn't handling a named restart quite right, which explains why I had two problems at 23:15.

| 778012 | 2009-05-28 23:09:01 | dns_stats | suc=-8.370,ref=0.000,nxr=-1.253,nxd=-2.507,rec=0.000,f=-0.027 |
| 779675 | 2009-05-28 23:10:02 | dns_stats | suc=-10.360,ref=0.000,nxr=-1.227,nxd=-3.587,rec=0.000,f=-56.203 |

I have corrected that issue, and I think it's safe to conclude that cmd.php is working a treat.
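
For anyone hitting the same negative rates, restart handling of this kind might look something like the sketch below. It is written in Python for illustration and is not the actual fix applied here. It assumes the stats are derived from counters that only increase and drop back to zero when named restarts. `rate_per_second` is a hypothetical helper.

```python
def rate_per_second(prev, curr, elapsed_seconds):
    """Per-second rate between two counter samples, or None when unusable."""
    if elapsed_seconds <= 0:
        # No usable time delta between the two samples.
        return None
    if curr < prev:
        # Counter went backwards: assume the daemon restarted and skip this sample.
        return None
    return (curr - prev) / elapsed_seconds
```

Returning nothing for the interval that spans a restart leaves a small gap in the graph, which is arguably better than storing a large negative value.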

I'll switch back to spine and see if the problem manifests itself again.

Regards,

ajt.
ajtwatching
Posts: 15
Joined: Tue Oct 17, 2006 10:27 pm

Post by ajtwatching »

Well, I'm back on spine and back on the PHP script server, and all seems to be well again. I made no obvious changes that would explain this, so I'll have to see how it tracks over time!

ajt.