Host Disk query failure - 8.6j on Red Hat querying WinXP box

briaread
Posts: 4
Joined: Tue Jan 15, 2008 12:48 pm

Host Disk query failure - 8.6j on Red Hat querying WinXP box

Post by briaread »

Hello,

We have two installations of Cacti: one is an upgrade, the other a new installation. Like others, we're having issues with the host disk partition queries; in our case the problem is on the box running the upgraded Cacti. Queries from the upgraded box to a WinXP server fail, whereas queries from the freshly installed box to the same server work fine. Here are the specs on the two systems:

Upgraded Cacti:
Operating System: Red Hat 3.4.6-3.1
Webserver: Apache 2.2.4
Cacti: 0.8.6j
MySQL: 5.0.27
PHP: PHP 5.2.3 (cli) (built: Aug 30 2007 15:51:09)
RRDTool: 1.2.11
Net-SNMP: 5.1.2-11
Poller type: cmd.php

Freshly installed Cacti:
Operating System: OpenSuse 10.3
Webserver: Apache 2-2.2.4-70.2
Cacti: 0.8.6j-64.2
MySQL: 5.0.45-22
PHP: PHP 5.2.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Dec 12 2007 03:47:43)
RRDTool: 1.2.23-47
Net-SNMP: 5.4.1-19
Poller type: cmd.php

The problem symptoms on the upgraded system are as follows:
1) Existing hosts report/graph disk used and disk total correctly.
2) Newly added hosts report either nan, 0 or only partial information. This varied as I deleted and re-created the Device. Currently reporting only Disk Total on 1 of 4 drives.
3) Increasing the SNMP timeout to 5 secs did nothing.

As suggested by a moderator, I set the log on the upgraded system to DEBUG and was able to capture the following:

01) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5117] SERVER: /usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get used 6, output: 26565885952
02) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "insert into poller_output (local_data_id, rrd_name, time, output) values (5117, 'hdd_used', '2008-01-14 21:40:55', '26565885952')"
03) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5117] WARNING: Result from SERVER not valid. Partial Result: 01/14/2008 09:40:56
04) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5117] SERVER: /usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get total 6, output: U
05) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "insert into poller_output (local_data_id, rrd_name, time, output) values (5117, 'hdd_total', '2008-01-14 21:40:55', 'U')"
06) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5116] WARNING: Result from SERVER not valid. Partial Result:
07) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5116] SERVER: /usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get total 5, output: U
08) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "insert into poller_output (local_data_id, rrd_name, time, output) values (5116, 'hdd_total', '2008-01-14 21:40:55', 'U')"
09) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5116] WARNING: Result from SERVER not valid. Partial Result:
10) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5116] SERVER: /usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get used 5, output: U
11) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "insert into poller_output (local_data_id, rrd_name, time, output) values (5116, 'hdd_used', '2008-01-14 21:40:55', 'U')"
12) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5114] WARNING: Result from SERVER not valid. Partial Result:
13) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5114] SERVER: /usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get used 2, output: U
14) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "insert into poller_output (local_data_id, rrd_name, time, output) values (5114, 'hdd_used', '2008-01-14 21:40:55', 'U')"
15) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5114] WARNING: Result from SERVER not valid. Partial Result:
16) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5114] SERVER: /usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get total 2, output: U
17) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "insert into poller_output (local_data_id, rrd_name, time, output) values (5114, 'hdd_total', '2008-01-14 21:40:55', 'U')"
18) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5115] WARNING: Result from SERVER not valid. Partial Result:
19) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5115] SERVER: /usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get used 3, output: U
20) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "insert into poller_output (local_data_id, rrd_name, time, output) values (5115, 'hdd_used', '2008-01-14 21:40:55', 'U')"
21) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5115] WARNING: Result from SERVER not valid. Partial Result:
22) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5115] SERVER: /usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get total 3, output: U
23) 01/14/2008 09:40:56 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "insert into poller_output (local_data_id, rrd_name, time, output) values (5115, 'hdd_total', '2008-01-14 21:40:55', 'U')"


The first run of ss_host_disk.php (line 01) returned a correct value (which is providing the Disk Total being graphed successfully), so I'm wondering why the warning on line 03 reports that the result wasn't valid. As can be seen, no results afterward are considered valid. I see in cmd.php where $output is set to "U" (as shown in the debug output above), but as a novice PHP programmer, this is as far as I've come in deciphering what's going on at that point.
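
For anyone wanting to summarize a capture like the one above, here is a minimal Python sketch that tallies, per data source, how many script server results came back with a value and how many came back as U. It is not part of Cacti; the regular expression is an assumption based only on the SERVER lines shown in this log, and tally_results is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches only the "SERVER:" result lines from the DEBUG capture, e.g.
# "... Host[286] DS[5117] SERVER: ... get used 6, output: 26565885952".
# Assumption: the pattern is derived from this log excerpt, not from Cacti docs.
RESULT_RE = re.compile(
    r"DS\[(?P<ds>\d+)\] SERVER: .* get (?P<field>\w+) (?P<index>\d+), "
    r"output: (?P<output>\S+)"
)


def tally_results(lines):
    """Return {ds_id: {"valid": n, "unknown": n}} for script server results."""
    counts = defaultdict(lambda: {"valid": 0, "unknown": 0})
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # WARNING and SQL Exec lines carry no result value
        key = "unknown" if match.group("output") == "U" else "valid"
        counts[int(match.group("ds"))][key] += 1
    return dict(counts)


# Lines 01, 03 and 04 from the capture above (leading counters removed).
sample = [
    "01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5117] SERVER: "
    "/usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk "
    "SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get used 6, "
    "output: 26565885952",
    "01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5117] WARNING: "
    "Result from SERVER not valid. Partial Result: 01/14/2008 09:40:56",
    "01/14/2008 09:40:56 PM - CMDPHP: Poller[0] Host[286] DS[5117] SERVER: "
    "/usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk "
    "SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get total 6, "
    "output: U",
]

print(tally_results(sample))
```

Feeding the whole log file through tally_results would show at a glance which data sources never received a usable value during a polling cycle.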

Running ss_host_disk.php within php script_server.php works fine:
$ php script_server.php
PHP Script Server has Started - Parent is cmd
/usr/local/apache2/htdocs/cacti/scripts/ss_host_disk.php ss_host_disk SERVERNAME 286 1:161:500:comm-string:::MD5::[None] get total 6
53678141440
quit
$
I can run manual queries like this on all the disks for both Total and Used without problem.

I've seen the postings that there's a problem with this query in 0.8.7, but I haven't seen confirmation that the problem exists in 0.8.6j. If anyone could suggest where else to look or what else to try, I'd appreciate it.

Thanks
Brian Read
:D
briaread
Posts: 4
Joined: Tue Jan 15, 2008 12:48 pm

Host Disk query failure - 8.6j on Red Hat querying WinXP box

Post by briaread »

Does anyone have any suggestions, ideas, etc pertaining to this problem? I'd be happy with any input at this point.... :(

TIA,
BR
bsc
Posts: 8
Joined: Wed Jan 23, 2008 2:40 pm

Same

Post by bsc »

I have been having the exact same problem, to a T. I am trying to research it right now, but I haven't found anything useful yet. I have 0.8.7a and I am still having the problem.

I rebuilt my poller cache for all 3 hosts that were having problems as well, no change. :cry:
briaread
Posts: 4
Joined: Tue Jan 15, 2008 12:48 pm

Host Disk query failure - 8.6j on Red Hat querying WinXP box

Post by briaread »

I've been digging through Cacti code to try and understand what functions do what, what calls what, etc - still haven't figured out the problem. Just now starting to get an idea of how the overall system works. I'll have to punt soon as other priorities will take over, but if I find anything I'll post it here.

Good to know I'm not alone in this. :)
BR
bsc
Posts: 8
Joined: Wed Jan 23, 2008 2:40 pm

Post by bsc »

Well, I turned out to be a liar. The Poller Cache rebuild seems to have improved it a lot. I still get a couple of errors, but it's one in 45 minutes on one host as opposed to many every few minutes on 4 hosts.

Have you tried to rebuild your Poller Cache at all?
briaread
Posts: 4
Joined: Tue Jan 15, 2008 12:48 pm

Host Disk query failure - 8.6j on Red Hat querying WinXP box

Post by briaread »

I think I've figured it out! I noticed that the poller wasn't completing within the 5 minute window. So I got to wondering if the system simply wasn't able to handle overlapping polling. I checked and Maximum Concurrent Poller Processes was set to a single process of cmd.php! I reset it to 3 and voila! Polling is completing in around 3:30. The queries are working and the graphs are populating! Yee haw! :P

Nothin' in the code, nothing with the cache, no problem anywhere except that Cacti was hamstringing itself with a single poller process. I've added 5 more systems to be monitored - all working OK.
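
To illustrate why the process count matters, here is a rough Python sketch of the arithmetic. It assumes hosts are dealt round-robin to the poller processes and that the processes run fully in parallel, which is a simplification and not a description of how Cacti actually schedules work. The function names and the workload numbers are hypothetical, not measurements from my system.

```python
POLL_INTERVAL_SECONDS = 300  # the 5 minute polling window


def estimated_cycle_seconds(host_seconds, processes):
    """Estimate wall-clock time for one polling cycle.

    Assumption: hosts are dealt round-robin to the processes, each process
    polls its hosts one after another, and the slowest process sets the total.
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")
    per_process = [0.0] * processes
    for i, seconds in enumerate(host_seconds):
        per_process[i % processes] += seconds
    return max(per_process)


def fits_in_window(host_seconds, processes, window=POLL_INTERVAL_SECONDS):
    """True if the estimated cycle finishes before the next one is due."""
    return estimated_cycle_seconds(host_seconds, processes) < window


# Hypothetical workload: 60 hosts taking 7 seconds each (not measured values).
hosts = [7.0] * 60

for count in (1, 3):
    print(count, estimated_cycle_seconds(hosts, count), fits_in_window(hosts, count))
```

With these made-up numbers a single process needs 420 seconds and overruns the window, while three processes need 140 seconds each and finish comfortably.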

Hopefully my experiences will serve to help others - sometimes you can drill deep and be way off base. Chalk it up as a Cacti noobie learning experience. :lol:

BR
bsc
Posts: 8
Joined: Wed Jan 23, 2008 2:40 pm

Post by bsc »

Yeah, my overnight graphs started to look almost as bad as before.

After looking at the cacti stats graph, I have a feeling my poller might not be finishing either. I just changed my spine to 6 threads. I will post the results later today.