Cacti 0.8.7: after switching from cmd.php to spine, several hosts show as down

frankfegert
Posts: 28
Joined: Fri Nov 16, 2007 1:55 pm
Location: Stuttgart, Germany

Post by frankfegert

nvetro wrote: ... DID do something is changing the host detection method for each host to Ping & SNMP, it has the default ms now, default port (23) and default protocol (udp). All coming up now, I'll report back in a bit.
Well, of course that would work. AFAICT from the spine debug output, if you set "Ping&SNMP" and the ping works, SNMP is never tried. You wrote that an snmpwalk from the command line for those particular hosts also times out? That really sounds like a network problem or an issue with snmpd on the target hosts, and it might be worth digging deeper into. Are your target hosts protected by host-based firewalls like IPF or iptables? Look into the firewall logs to see whether packets from/to your Cacti server are being denied. Are the target hosts under heavy load or heavy network traffic?
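Frank's reading of the "Ping&SNMP" detection mode can be pictured with a toy model. The Python below is a hypothetical sketch of the behaviour he infers from the debug output, not spine's actual C code, and the function name `host_is_up` is made up for illustration. It shows why switching hosts to Ping & SNMP can make them come up even while SNMP itself is still failing.

```python
from typing import Callable


def host_is_up(ping_ok: bool, snmp_check: Callable[[], bool]) -> bool:
    """Toy model of 'Ping and SNMP' availability checking as read from the
    spine debug output: a successful ping marks the host up on its own,
    and the SNMP check only runs as a fallback when the ping fails."""
    if ping_ok:
        return True  # SNMP is not consulted at all in this branch
    return snmp_check()  # the host is down only if SNMP fails as well
```

In this model a host whose snmpd never answers still shows as up as long as it answers pings, so the availability check hides the SNMP problem instead of fixing it.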

As TheWitness already said, the next step to get to the bottom of this is to run a sniffer (tcpdump) on both the Cacti server and the target host and capture one full run of the spine command posted earlier. If you can't install tcpdump, the Solaris "snoop" command can write trace files that Wireshark/Ethereal can read.
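Alongside sniffing, it can help to time a single SNMP request from the Cacti server with a fixed timeout, independent of both spine and snmpwalk. The following is a minimal, hypothetical Python sketch: it hand-encodes an SNMPv1 GetRequest for sysUpTime.0 (1.3.6.1.2.1.1.3.0) and assumes community `public` on UDP port 161, so adjust both to your hosts. It only checks that some UDP reply arrives within the timeout; it does not decode the response.

```python
import socket
import time
from typing import Optional, Tuple


def encode_length(n: int) -> bytes:
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Wrap a value in a BER tag-length-value triple."""
    return bytes([tag]) + encode_length(len(value)) + value


def encode_int(v: int) -> bytes:
    """BER INTEGER for non-negative values (extra byte keeps the sign bit clear)."""
    return tlv(0x02, v.to_bytes(v.bit_length() // 8 + 1, "big"))


def encode_oid(oid: str) -> bytes:
    """BER OBJECT IDENTIFIER: first two arcs combined, the rest in base 128."""
    arcs = [int(a) for a in oid.strip(".").split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))
            arc >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def build_get_request(oid: str, community: str = "public", request_id: int = 1) -> bytes:
    """SNMPv1 message (version 0) carrying a GetRequest-PDU with one NULL varbind."""
    varbind = tlv(0x30, encode_oid(oid) + b"\x05\x00")
    pdu = tlv(0xA0, encode_int(request_id) + encode_int(0) + encode_int(0)
              + tlv(0x30, varbind))
    return tlv(0x30, encode_int(0) + tlv(0x04, community.encode()) + pdu)


def probe(host: str, oid: str = "1.3.6.1.2.1.1.3.0", community: str = "public",
          port: int = 161, timeout_ms: int = 1000,
          retries: int = 3) -> Optional[Tuple[int, float]]:
    """Send the request up to retries + 1 times; return (attempt, elapsed_ms)
    for the first attempt that gets any UDP reply, or None if all time out.
    The reply is not decoded, so a late answer to an earlier attempt also counts."""
    packet = build_get_request(oid, community)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_ms / 1000.0)
        for attempt in range(1, retries + 2):
            start = time.monotonic()
            sock.sendto(packet, (host, port))
            try:
                sock.recvfrom(65535)
            except socket.timeout:
                continue
            return attempt, (time.monotonic() - start) * 1000.0
    return None
```

For example, `probe("192.168.71.51", timeout_ms=1500)` returns the attempt number and the round-trip time in milliseconds, or `None` when every attempt times out; comparing that with the SNMP timeout configured for the host shows whether the agent is simply too slow. The address is just one of the hosts from the logs in this thread.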

Regards,

Frank
nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro

It seems like the hosts are up: some are polling fine with data on the graphs, and some are showing very broken data on the graphs. What is so different between cmd.php and spine? cmd.php worked flawlessly... Frank, can you tell me a few settings for the Poller tab that will make spine optimal, to rule out that issue? Also, if you can tell me the ping timeout, SNMP timeout and max OIDs I should set at the host level, that would be great.
nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro

Can someone please explain to me why turning "debug" logging on for one pass instantly downs all my hosts? I get error messages for every host:

Code:

02/14/2008 09:30:27 AM - SPINE: Poller[0] Host[66] Hostname[192,168.1.66] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP, UDP: Ping timed out
02/14/2008 09:30:27 AM - SPINE: Poller[0] Host[66] SNMP Result: Host did not respond to SNMP
02/14/2008 09:30:27 AM - SPINE: Poller[0] Host[66] PING Result: UDP: Ping timed out
The ping timeout value is set to 1500 ms for each host, with a ping retry count of 1 and UDP ping on port 23. It works fine without "debug" on, but marks everything down once it is on.
frankfegert
Posts: 28
Joined: Fri Nov 16, 2007 1:55 pm
Location: Stuttgart, Germany

Post by frankfegert

nvetro wrote: Frank, can you tell me a few settings for the poller tab that will make this spine optimal to rule out that issue...also if you can tell me the ping timeout and snmp timeout, as well as max oid on the host level I should set also that would be great.
No problem, but mind that those values depend heavily on how your network/environment is set up. E.g., we have some hosts behind lots of router hops and crappy connections; those needed timeout values of up to 5000 ms.

Maximum Concurrent Poller Processes: 2 (it's a dual-CPU machine)
Maximum Threads per Process: 10
The Maximum SNMP OID's Per SNMP Get Request: 10
Downed Host Detection: Ping and SNMP
Ping Type: ICMP
Ping Timeout Value: 400 ms (some exceptions up to 5000 ms)
Ping Retry Count: 1
SNMP Timeout: 1000 ms (some exceptions up to 5000 ms)
SNMP Retries: 3
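When picking timeouts it helps to know how long one unanswered request can stall a poller thread. A minimal Python sketch, assuming net-snmp style semantics (one initial attempt plus "SNMP Retries" re-sends, each waiting the full timeout); the function names and the 25-OID host are hypothetical. In nvetro's logs later in the thread, the gap between the two timeout warnings for the same data source is about 2 s at 500 ms, 6 s at 1500 ms, 12 s at 3000 ms and 20 s at 5000 ms, roughly four times the timeout, which would fit this model with 3 retries.

```python
import math


def worst_case_wait_ms(timeout_ms: int, retries: int) -> int:
    """Longest an unanswered SNMP request can block, assuming one initial
    attempt plus `retries` re-sends, each waiting up to `timeout_ms`."""
    return timeout_ms * (retries + 1)


def get_requests(oid_count: int, max_oids_per_get: int) -> int:
    """Number of GET requests needed when OIDs are batched per request."""
    return math.ceil(oid_count / max_oids_per_get)


if __name__ == "__main__":
    # Frank's default timeout and his slow-link exception, with SNMP Retries: 3
    for timeout in (1000, 5000):
        print(timeout, worst_case_wait_ms(timeout, retries=3))
    # prints "1000 4000" then "5000 20000"
    print(get_requests(oid_count=25, max_oids_per_get=10))  # hypothetical host: 3
```

With these numbers, a single dead agent at 5000 ms costs each affected request up to 20 s, which is why the timeout has to be chosen per host rather than raised globally.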
nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro

How many PHP script servers?
nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro

Also, I decided to download and install the newest version of Cacti, run it with cmd.php for a while (it didn't error once in 3 hours) and then install spine. This is a completely NEW install; the only thing I copied over was the scripts directory. When I switch to spine, I still randomly get this:

Code:

2/14/2008 03:10:53 PM - SYSTEM STATS: Time:53.5418 Method:spine Processes:2 Threads:10 Hosts:36 HostsPerProcess:18 DataSources:682 RRDsProcessed:415
02/14/2008 03:06:25 PM - SYSTEM STATS: Time:84.0902 Method:spine Processes:2 Threads:10 Hosts:36 HostsPerProcess:18 DataSources:682 RRDsProcessed:427
02/14/2008 03:06:20 PM - SPINE: Poller[0] Host[64] DS[1207] SS[2] WARNING: Result from SERVER not valid. Partial Result: ...
02/14/2008 03:05:58 PM - SPINE: Poller[0] Host[64] DS[1206] SS[6] WARNING: Result from SERVER not valid. Partial Result: ...
02/14/2008 03:05:57 PM - SPINE: Poller[0] WARNING: SS[6] The PHP Script Server did not respond in time and will therefore be restarted
02/14/2008 03:01:41 PM - SYSTEM STATS: Time:100.9895 Method:spine Processes:1 Threads:25 Hosts:36 HostsPerProcess:36 DataSources:682 RRDsProcessed:434
02/14/2008 03:01:41 PM - SPINE: Poller[0] Host[64] DS[1207] SS[3] WARNING: Result from SERVER not valid. Partial Result: ...
02/14/2008 03:01:41 PM - SPINE: Poller[0] WARNING: SS[3] The PHP Script Server did not respond in time and will therefore be restarted
02/14/2008 03:01:30 PM - SPINE: Poller[0] Host[14] DS[1832] WARNING: SNMP timeout detected [5000 ms], ignoring host '192.168.71.51'
02/14/2008 03:01:16 PM - SPINE: Poller[0] Host[64] DS[1206] SS[7] WARNING: Result from SERVER not valid. Partial Result: ...
02/14/2008 03:01:10 PM - SPINE: Poller[0] Host[14] DS[1832] WARNING: SNMP timeout detected [5000 ms], ignoring host '192.168.71.51'
02/14/2008 02:55:53 PM - SYSTEM STATS: Time:52.3846 Method:spine Processes:1 Threads:25 Hosts:36 HostsPerProcess:36 DataSources:682 RRDsProcessed:392
02/14/2008 02:51:07 PM - SYSTEM STATS: Time:66.4636 Method:spine Processes:1 Threads:25 Hosts:36 HostsPerProcess:36 DataSources:682 RRDsProcessed:404
02/14/2008 02:51:07 PM - SPINE: Poller[0] Host[14] DS[1832] WARNING: SNMP timeout detected [3000 ms], ignoring host '192.168.71.51'
02/14/2008 02:50:55 PM - SPINE: Poller[0] Host[14] DS[1832] WARNING: SNMP timeout detected [3000 ms], ignoring host '192.168.71.51'

What do you all think? When I snmpwalk the OIDs, they all return data. I'm out of ideas.
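With dozens of hosts, a pattern is easier to spot if the log is tallied. A minimal Python sketch, assuming the cacti.log line formats shown above (the regular expressions are written against those lines only and may not match other Cacti versions): it collects the SYSTEM STATS runs, counts SNMP timeouts per (host, data source) and counts PHP Script Server restarts.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

STATS_RE = re.compile(
    r"SYSTEM STATS: Time:(?P<time>[\d.]+) Method:(?P<method>\S+) .*?Hosts:(?P<hosts>\d+) ")
TIMEOUT_RE = re.compile(
    r"DS\[(?P<ds>\d+)\] WARNING: SNMP timeout detected \[(?P<ms>\d+) ms\], "
    r"ignoring host '(?P<host>[^']+)'")
SCRIPT_SERVER_RESTART = "PHP Script Server did not respond in time"


def summarize(lines: Iterable[str]) -> Tuple[List[Tuple[str, float, int]], Counter, int]:
    """Return (poller runs as (method, seconds, hosts),
    SNMP timeouts per (host, DS), number of script-server restarts)."""
    runs: List[Tuple[str, float, int]] = []
    timeouts: Counter = Counter()
    restarts = 0
    for line in lines:
        stats = STATS_RE.search(line)
        if stats:
            runs.append((stats["method"], float(stats["time"]), int(stats["hosts"])))
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            timeouts[(timeout["host"], int(timeout["ds"]))] += 1
        elif SCRIPT_SERVER_RESTART in line:
            restarts += 1
    return runs, timeouts, restarts
```

Feeding it the lines of your cacti.log (the path depends on the install) shows at a glance whether the timeouts always hit the same data source, as they do for DS[1832] on 192.168.71.51 in the log above.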
nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro

OK, I went a step further: new Cacti install, new spine install AND a FRESH MySQL database, starting out with 0 hosts, base install and all that jazz. I added 2 hosts, and they were polling fine with cmd.php; 2 hosts with several graphs take 6-7 seconds. I let her run a few cycles with cmd.php, then switched over to spine. Spine runs fine on the first poll, BUT for some odd reason it takes the same amount of time as cmd.php. I let her run again and get an SNMP timeout at 500 ms. I'm setting it to 1500 ms to see if that helps. Here is the log, what do you think?

Code:

02/14/2008 08:15:08 PM - SYSTEM STATS: Time:7.2120 Method:spine Processes:1 Threads:25 Hosts:3 HostsPerProcess:3 DataSources:45 RRDsProcessed:18  
02/14/2008 08:15:07 PM - SPINE: Poller[0] Host[2] DS[12] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.71.51'  
02/14/2008 08:15:05 PM - SPINE: Poller[0] Host[2] DS[12] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.71.51'  
02/14/2008 08:10:07 PM - SYSTEM STATS: Time:6.5433 Method:spine Processes:10 Threads:5 Hosts:3 HostsPerProcess:1 DataSources:45 RRDsProcessed:29  
02/14/2008 08:05:07 PM - SYSTEM STATS: Time:6.5418 Method:cmd.php Processes:10 Threads:N/A Hosts:3 HostsPerProcess:1 DataSources:45 RRDsProcessed:29  
02/14/2008 08:00:07 PM - SYSTEM STATS: Time:6.7296 Method:cmd.php Processes:10 Threads:N/A Hosts:3 HostsPerProcess:1 DataSources:45 RRDsProcessed:29  
02/14/2008 07:55:07 PM - SYSTEM STATS: Time:7.3199 Method:cmd.php Processes:10 Threads:N/A Hosts:2 HostsPerProcess:1 DataSources:21 RRDsProcessed:13  
02/14/2008 07:50:08 PM - SYSTEM STATS: Time:7.3233 Method:cmd.php Processes:10 Threads:N/A Hosts:2 HostsPerProcess:1 DataSources:21 RRDsProcessed:13  
02/14/2008 07:45:06 PM - SYSTEM STATS: Time:6.3199 Method:cmd.php Processes:10 Threads:N/A Hosts:2 HostsPerProcess:1 DataSources:21 RRDsProcessed:13  
02/14/2008 07:40:08 PM - SYSTEM STATS: Time:7.3201 Method:cmd.php Processes:10 Threads:N/A Hosts:2 HostsPerProcess:1 DataSources:21 RRDsProcessed:13  
02/14/2008 07:35:07 PM - SYSTEM STATS: Time:7.3192 Method:cmd.php Processes:10 Threads:N/A Hosts:2 HostsPerProcess:1 DataSources:21 RRDsProcessed:13  
02/14/2008 07:30:07 PM - SYSTEM STATS: Time:7.3097 Method:cmd.php Processes:10 Threads:N/A Hosts:2 HostsPerProcess:1 DataSources:21 RRDsProcessed:13  
02/14/2008 07:25:07 PM - SYSTEM STATS: Time:6.4091 Method:cmd.php Processes:10 Threads:N/A Hosts:2 HostsPerProcess:1 DataSources:21 RRDsProcessed:13  
02/14/2008 07:20:07 PM - SYSTEM STATS: Time:7.4163 Method:cmd.php Processes:10 Threads:N/A Hosts:2 HostsPerProcess:1 DataSources:21 RRDsProcessed:13  
EDIT:

No luck with the 1500 ms SNMP timeout either, check it out:

Code:

02/14/2008 08:20:14 PM - SYSTEM STATS: Time:14.2821 Method:spine Processes:1 Threads:25 Hosts:3 HostsPerProcess:3 DataSources:45 RRDsProcessed:18  
02/14/2008 08:20:14 PM - SPINE: Poller[0] Host[2] DS[12] WARNING: SNMP timeout detected [1500 ms], ignoring host '192.168.71.51'  
02/14/2008 08:20:08 PM - SPINE: Poller[0] Host[2] DS[12] WARNING: SNMP timeout detected [1500 ms], ignoring host '192.168.71.51'  
Edit #2:

I'm sure you're wondering what that data source is: it's the standard 'interface traffic' script that comes with Cacti, nothing special.

Edit #3:

The time keeps going higher and higher, and I have a new error to add to the mix; this DS is 'physical memory':

Code:

02/14/2008 08:25:27 PM - SYSTEM STATS: Time:26.4027 Method:spine Processes:1 Threads:25 Hosts:3 HostsPerProcess:3 DataSources:45 RRDsProcessed:18  
02/14/2008 08:25:27 PM - SPINE: Poller[0] Host[2] DS[12] WARNING: SNMP timeout detected [3000 ms], ignoring host '192.168.71.51'  
02/14/2008 08:25:15 PM - SPINE: Poller[0] Host[2] DS[12] WARNING: SNMP timeout detected [3000 ms], ignoring host '192.168.71.51'  
02/14/2008 08:25:00 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Potential Data Source Issues for Data Sources: hdd_used(DS[13]) 
I'm out of ideas, guys. The only common factor is that I'm running on Solaris 10 with dual SPARC CPUs. I changed servers not too long ago, so it wasn't that.
nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro

OK, so I add 1 host only and all is fine; add 2, 3, 4, 5, etc. and she starts timing out on SNMP for a specific data source (interface stats). If I disable ALL the other hosts and keep just that one, it polls fine in spine. I even tried setting the SNMP timeout higher and higher, which didn't work, and max OIDs to 1, which didn't work either. Cacti with spine works with 1 host only... I'm getting frustrated. What the heck does spine do so differently from cmd.php? cmd.php works fine, but it takes a while and I want to use spine.

Let me give you guys more information. Here is the data source that is timing out on SNMP, in debug mode:

Code:

Data Source Debug

/usr/local/rrdtool-1.2.19/bin/rrdtool create \
/data/www/cacti-0.8.7b/rra/celebweb2_traffic_in_48.rrd \
--step 300  \
DS:traffic_in:COUNTER:600:0:1000000000 \
DS:traffic_out:COUNTER:600:0:1000000000 \
RRA:AVERAGE:0.5:1:600 \
RRA:AVERAGE:0.5:6:700 \
RRA:AVERAGE:0.5:24:775 \
RRA:AVERAGE:0.5:288:797 \
RRA:MAX:0.5:1:600 \
RRA:MAX:0.5:6:700 \
RRA:MAX:0.5:24:775 \
RRA:MAX:0.5:288:797
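For reference, the create command above is itself unremarkable. With `--step 300`, each RRA holds rows of `steps per row × 300 s`, and per rrdtool's heartbeat rule a data source that gets no update for longer than its heartbeat (600 s here, i.e. two polling intervals) is stored as unknown, which is what shows up as gaps in the graphs. A small hypothetical Python helper to work out how much history each RRA covers:

```python
STEP_SECONDS = 300  # --step 300 in the create command above

# (consolidation function, steps per row, rows) copied from the RRA:AVERAGE lines
RRAS = [
    ("AVERAGE", 1, 600),
    ("AVERAGE", 6, 700),
    ("AVERAGE", 24, 775),
    ("AVERAGE", 288, 797),
]


def rra_span(step: int, steps_per_row: int, rows: int) -> tuple:
    """Return (seconds per row, seconds of history) for one RRA."""
    resolution = step * steps_per_row
    return resolution, resolution * rows


if __name__ == "__main__":
    for cf, steps, rows in RRAS:
        resolution, span = rra_span(STEP_SECONDS, steps, rows)
        print(f"{cf} 1 row = {resolution // 60} min, history = {span / 86400:.1f} days")
```

This gives about 2.1 days at 5-minute resolution, 14.6 days at 30 minutes, 64.6 days at 2 hours and 797 days at 1 day; the MAX RRAs use the same row counts.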
All seems fine, right? Here is the cacti.log with all my timeouts and the transitions between hosts and from cmd.php to spine, etc.:

Code:

 02/15/2008 10:35:06 AM - SYSTEM STATS: Time:6.1943 Method:spine Processes:1 Threads:25 Hosts:2 HostsPerProcess:2 DataSources:23 RRDsProcessed:14
02/15/2008 10:30:07 AM - SYSTEM STATS: Time:7.2239 Method:spine Processes:1 Threads:25 Hosts:2 HostsPerProcess:2 DataSources:23 RRDsProcessed:14
02/15/2008 10:25:30 AM - SYSTEM STATS: Time:30.5049 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:27
02/15/2008 10:25:30 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [3000 ms], ignoring host '216.105.160.34'
02/15/2008 10:25:18 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [3000 ms], ignoring host '216.105.160.34'
02/15/2008 10:25:00 AM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Potential Data Source Issues for Data Sources: hdd_total(DS[44])
02/15/2008 10:20:16 AM - SYSTEM STATS: Time:15.2916 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:18
02/15/2008 10:20:15 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [1500 ms], ignoring host '216.105.160.34'
02/15/2008 10:20:09 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [1500 ms], ignoring host '216.105.160.34'
02/15/2008 10:15:08 AM - SYSTEM STATS: Time:8.2173 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:16
02/15/2008 10:15:00 AM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Potential Data Source Issues for Data Sources: hdd_total(DS[44])
02/15/2008 10:10:16 AM - SYSTEM STATS: Time:15.2897 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:18
02/15/2008 10:10:15 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [1500 ms], ignoring host '216.105.160.34'
02/15/2008 10:10:09 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [1500 ms], ignoring host '216.105.160.34'
02/15/2008 10:05:22 AM - SYSTEM STATS: Time:21.4855 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:40
02/15/2008 10:05:22 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [500 ms], ignoring host '216.105.160.34'
02/15/2008 10:05:20 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [500 ms], ignoring host '216.105.160.34'
02/15/2008 10:05:13 AM - SPINE: Poller[0] Host[8] DS[64] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.70.81'
02/15/2008 10:05:11 AM - SPINE: Poller[0] Host[8] DS[64] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.70.81'
02/15/2008 10:00:06 AM - SYSTEM STATS: Time:6.2090 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:16
02/15/2008 09:55:07 AM - SYSTEM STATS: Time:6.2050 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:16
02/15/2008 09:55:01 AM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Potential Data Source Issues for Data Sources: hdd_total(DS[44])
02/15/2008 09:50:27 AM - SYSTEM STATS: Time:27.4117 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:18
02/15/2008 09:50:26 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [3000 ms], ignoring host '216.105.160.34'
02/15/2008 09:50:14 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [3000 ms], ignoring host '216.105.160.34'
02/15/2008 09:45:15 AM - SYSTEM STATS: Time:14.2789 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:16
02/15/2008 09:40:11 AM - SYSTEM STATS: Time:10.3724 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:28
02/15/2008 09:40:11 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [500 ms], ignoring host '216.105.160.34'
02/15/2008 09:40:09 AM - SPINE: Poller[0] Host[6] DS[48] WARNING: SNMP timeout detected [500 ms], ignoring host '216.105.160.34'
02/15/2008 09:35:06 AM - SYSTEM STATS: Time:6.1982 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:89 RRDsProcessed:16
02/15/2008 09:30:06 AM - SYSTEM STATS: Time:6.1999 Method:spine Processes:1 Threads:25 Hosts:4 HostsPerProcess:4 DataSources:68 RRDsProcessed:16
02/15/2008 09:25:07 AM - SYSTEM STATS: Time:7.2205 Method:spine Processes:1 Threads:25 Hosts:3 HostsPerProcess:3 DataSources:47 RRDsProcessed:16
02/15/2008 09:20:06 AM - SYSTEM STATS: Time:6.2009 Method:spine Processes:1 Threads:25 Hosts:3 HostsPerProcess:3 DataSources:47 RRDsProcessed:16
02/15/2008 09:15:06 AM - SYSTEM STATS: Time:6.2005 Method:spine Processes:1 Threads:25 Hosts:3 HostsPerProcess:3 DataSources:47 RRDsProcessed:16
02/15/2008 09:10:06 AM - SYSTEM STATS: Time:6.2009 Method:spine Processes:1 Threads:25 Hosts:3 HostsPerProcess:3 DataSources:47 RRDsProcessed:16
02/15/2008 09:05:43 AM - SYSTEM STATS: Time:42.5380 Method:spine Processes:1 Threads:25 Hosts:5 HostsPerProcess:5 DataSources:45 RRDsProcessed:0
02/15/2008 09:00:43 AM - SYSTEM STATS: Time:42.5587 Method:spine Processes:1 Threads:25 Hosts:4 HostsPerProcess:4 DataSources:45 RRDsProcessed:16
When I snmpwalk the OIDs for that data source, I get a response for all of them. Help, please :-?
nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro

Anyone?
zonekiller
Posts: 25
Joined: Mon Feb 18, 2008 2:38 am
Location: Denmark

Did you get it fixed?

Post by zonekiller

Hello nvetro

I'm just curious: did you get the spine poller working? My spine polls my hosts fine, but from time to time 3 polls in a row will not succeed, so I have choppy graphs :s

Best regards
Jan Madsen
nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro

No sir, I have yet to get spine working at all; no one knows the answer.
zonekiller
Posts: 25
Joined: Mon Feb 18, 2008 2:38 am
Location: Denmark

Post by zonekiller

Nvetro, have you entered the location of your spine binary on the Settings -> Paths page?

At the bottom you need to specify the path to your spine binary, e.g.
/opt/cacti-spine-0.8.7a/bin/spine
That is where I have compiled my spine to :)

/Jan Madsen
frankfegert
Posts: 28
Joined: Fri Nov 16, 2007 1:55 pm
Location: Stuttgart, Germany

Post by frankfegert

nvetro wrote: How many PHP script servers?
Number of PHP Script Servers: 2
nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro

zonekiller: my spine path is correct :D

ANYONE have any ideas?
frankfegert
Posts: 28
Joined: Fri Nov 16, 2007 1:55 pm
Location: Stuttgart, Germany

Post by frankfegert

nvetro wrote: zonekiller: my spine path is correct :D

ANYONE have any ideas?
Well, no offense, but instead of asking the same question over and over, why don't you provide some more debugging information? So again: run spine under truss and run snoop during a failing spine run. Narrowing it down as much as possible, meaning only the failing host or service, would be very helpful.

Regards,

Frank