Cacti 0.8.7: after switching from cmd.php to spine, several hosts show as down

nvetro
Cacti User
Posts: 72
Joined: Tue Dec 18, 2007 11:31 am

Post by nvetro »

OK, I set it to 4. It's on a dual-CPU SPARC machine; I think the CPUs are dual-core or maybe quad-core. I'm guessing dual, so I set it to 4. I'll let it run a bit and report back shortly.

Post by nvetro »

No go: it still shows a bunch of hosts down and only a few up. What else have you guys got? :D

Post by nvetro »

Just for laughs, here is a screenshot of the settings on a "downed" device that shows as down when using spine but is fine using cmd.php:
Attachment: downhost.JPG

Post by nvetro »

Anyone? :D
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Run an Ethereal/Wireshark capture during polling and send it to me by PM.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?

Post by nvetro »

TheWitness, I'm working on getting that to you ASAP using Wireshark.

Post by TheWitness »

Thank you. I will be back in Detroit on Saturday and will likely be working on Cacti 0.8.8 and a few other things on Sunday.

frankfegert
Posts: 28
Joined: Fri Nov 16, 2007 1:55 pm
Location: Stuttgart, Germany

Post by frankfegert »

nvetro wrote: Anyone? :D
Ah, another SPARC/Solaris fellow? Thank god, I almost believed I was alone with that :wink:
Did you upgrade from an earlier version of Cacti, or is this a fresh install? If it's an upgrade: I had some problems with the SNMP security level. Although I'm using "authNoPriv", some hosts had "snmp_priv_protocol" != "[None]" in the cacti.host MySQL table, which produces an invalid command line for snmpwalk/snmpget. To verify, connect to the DB with:

mysql -u <yourcactiuser> -p cacti
and verify that "snmp_priv_protocol" is set according to your needs:

select id,hostname,snmp_version,snmp_priv_protocol from host;
For me there were some "MD5" entries where there should have been "[None]".
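A minimal Python sketch of the same check, assuming you have pulled the rows returned by the query above into Python. The HostRow type, the suspicious_hosts helper, the sample hostnames and the set of accepted priv protocols are all hypothetical, chosen for illustration only. It flags SNMPv3 hosts whose snmp_priv_protocol is not "[None]" when every host is meant to be authNoPriv. Note that MD5 is an SNMPv3 authentication protocol, not a privacy protocol, so "MD5" in this column can never be a valid priv setting.

```python
from typing import Iterable, List, NamedTuple

# Assumption: priv protocol values accepted by this Cacti install.
# "[None]" is what an authNoPriv host should carry, per the thread.
VALID_PRIV_PROTOCOLS = {"[None]", "DES", "AES128"}


class HostRow(NamedTuple):
    """One row of: select id,hostname,snmp_version,snmp_priv_protocol from host;"""
    id: int
    hostname: str
    snmp_version: int
    snmp_priv_protocol: str


def suspicious_hosts(rows: Iterable[HostRow], expect_no_priv: bool = True) -> List[HostRow]:
    """Return SNMPv3 rows whose snmp_priv_protocol looks wrong.

    expect_no_priv=True: every v3 host should be authNoPriv, so anything
    other than "[None]" is flagged. expect_no_priv=False: only values outside
    VALID_PRIV_PROTOCOLS (e.g. the auth protocol "MD5") are flagged.
    """
    flagged = []
    for row in rows:
        if row.snmp_version != 3:
            continue  # the priv protocol is only used by SNMPv3
        proto = row.snmp_priv_protocol
        if expect_no_priv and proto != "[None]":
            flagged.append(row)
        elif proto not in VALID_PRIV_PROTOCOLS:
            flagged.append(row)
    return flagged


if __name__ == "__main__":
    # Hypothetical rows standing in for the query output.
    rows = [
        HostRow(1, "router-a", 3, "[None]"),
        HostRow(2, "router-b", 3, "MD5"),
        HostRow(3, "switch-c", 2, "MD5"),
    ]
    print([r.hostname for r in suspicious_hosts(rows)])  # ['router-b']
```

The v2 host is skipped on purpose: the priv protocol column is only consulted for SNMPv3.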

Regards,

Frank

Post by nvetro »

Frank,

Thanks for the reply! I ran the SQL query and everything looks to be in order: snmp_version is 3 and snmp_priv_protocol is [None] for all hosts. In the query output, the details of a "down" host also match the details of an "up" host. What else have you got? I really don't want to install that packet monitoring app on this box, as it's a production machine and we don't have another box to accomplish this.

TheWitness, how else can I provide you with more information? Also, for the record, several hosts that are "up" are on the same subnet as several hosts that are "down". I don't think it's a network issue, as we previously moved this box to a different network location to rule that out.

Post by frankfegert »

nvetro wrote: What else ya got?
Please pick the ID of a host shown as down. This is the number in the "... Host[62] ..." log file output. If you're running spine under its own user, switch to that user via 'su - <cactiuser>'. Then run spine on the command line:

spine -f <number> -l <number> -R -S -V 5
and post the output. Also the output of:

truss -f spine -f <number> -l <number> -R -S -V 5
could be helpful, but is usually very verbose.
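If you need to repeat these runs for several hosts, a small Python wrapper can build the commands and save the output. This is a sketch under assumptions: spine_debug_cmd and run_and_save are hypothetical helpers, and the flags are exactly the ones given above. One thing worth knowing when capturing truss output: truss writes its trace to standard error unless -o is given, so a capture that keeps only stdout will contain just spine's own output. That is one plausible way to end up with two files that look the same.

```python
import shlex
import subprocess
from typing import List


def spine_debug_cmd(host_id: int, spine: str = "spine", use_truss: bool = False) -> List[str]:
    """Build the single-host spine debug command from this thread.

    -f/-l: first and last host ID (the same ID polls only that host)
    -R: read-only, no database writes
    -S: log to standard output
    -V 5: highest verbosity
    """
    cmd = [spine, "-f", str(host_id), "-l", str(host_id), "-R", "-S", "-V", "5"]
    if use_truss:
        # Solaris truss; -f follows forked children
        cmd = ["truss", "-f"] + cmd
    return cmd


def run_and_save(cmd: List[str], outfile: str) -> int:
    """Run cmd with stdout and stderr both written to outfile; return the exit code.

    truss writes its trace to stderr by default, so merging stderr matters here.
    """
    with open(outfile, "w") as fh:
        result = subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # 62 is only the example ID from the log format above; use one shown as down.
    print(shlex.join(spine_debug_cmd(62)))
    print(shlex.join(spine_debug_cmd(62, use_truss=True)))
```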

Post by nvetro »

I sent you a PM with the outputs of those two commands in an attachment. Let me know if you need anything else, this issue has really stumped me.
crimsonstone
Posts: 19
Joined: Tue Feb 27, 2007 10:30 pm

Post by crimsonstone »

I've had this issue in the past, and while I never found the root cause, I noticed that changing the downed host detection on the down hosts from "SNMP" to "Ping and SNMP" (or vice versa) cleared up the issue.

/shrug :-?

Post by frankfegert »

nvetro wrote: I sent you a PM with the outputs of those two commands in an attachment. Let me know if you need anything else, this issue has really stumped me.
Sorry, the truss output file and the spine debugging output file are basically identical. Something went wrong with your truss run.

But this is odd:

...
DEBUG: SQL:'SELECT id, hostname, snmp_community, snmp_version, snmp_username, snmp_password, snmp_auth_protocol, snmp_priv_passphrase, snmp_priv_protocol, snmp_context, snmp_port, snmp_timeout, max_oids, availability_method, ping_method, ping_port, ping_timeout, ping_retries, status, status_event_count, status_fail_date, status_rec_date, status_last_error, min_time, max_time, cur_time, avg_time, total_polls, failed_polls, availability  FROM host WHERE id=72'
DEBUG: The Value of Active Threads is 1
Host[72] SNMP Result: Host responded to SNMP
DEBUG: SQL:'UPDATE host SET status='2', status_event_count='1', status_fail_date='2008-02-06 13:05:00', status_rec_date='2008-02-12 12:19', status_last_error='Host did not respond to SNMP', min_time='0.492100', max_time='2194.970000', cur_time='3.013850', avg_time='742.754782', total_polls='17159', failed_polls='1930', availability='88.7523' WHERE id='72''
DEBUG: SQL:'SELECT data_query_id, action, op, assert_value, arg1 FROM poller_reindex WHERE host_id=72'
Host[72] Host has no information for recache.
DEBUG: SQL:'SELECT snmp_port, count(snmp_port) FROM poller_item WHERE host_id=72 AND rrd_next_step < 0 GROUP BY snmp_port'
DEBUG: SQL:'SELECT action, hostname, snmp_community, snmp_version, snmp_username, snmp_password, rrd_name, rrd_path, arg1, arg2, arg3, local_data_id, rrd_num, snmp_port, snmp_timeout, snmp_auth_protocol, snmp_priv_passphrase, snmp_priv_protocol, snmp_context  FROM poller_item WHERE host_id=72 and rrd_next_step <=0 ORDER by snmp_port'
DEBUG: SQL:'UPDATE poller_item SET rrd_next_step=rrd_next_step-300 WHERE host_id=72'
DEBUG: SQL:'UPDATE poller_item SET rrd_next_step=rrd_step-300 WHERE rrd_next_step < 0 and host_id=72'
...
Host[72] DS[1436] SNMP: v3: 216.105.160.80, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 267631185
Host[72] DS[1436] SNMP: v3: 216.105.160.80, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 1600107098
So you're getting SNMP results, but the host status is set to "2", which is "recovering". Notice the values of "min_time" (0.4921) and "max_time" (2194.97). Can you please make sure this isn't a timeout issue in the SNMP ping by setting the "Ping Timeout Value" higher?
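To read that UPDATE line at a glance, here is a minimal Python sketch that pulls out the column='value' pairs, decodes the status code and compares the slowest response with a timeout. Assumptions: parse_host_update and summarize are hypothetical helpers, the status mapping is Cacti's host status constants (0 unknown, 1 down, 2 recovering, 3 up), and min_time/max_time are taken to be milliseconds. Recomputing availability as the share of polls that did not fail reproduces the logged value, which is a handy sanity check on the parse.

```python
import re
from typing import Dict

# Cacti host status codes, as in the thread ("2" = recovering).
HOST_STATUS = {0: "unknown", 1: "down", 2: "recovering", 3: "up"}

# column='value' pairs inside spine's "UPDATE host SET ..." debug line
PAIR_RE = re.compile(r"(\w+)='([^']*)'")


def parse_host_update(line: str) -> Dict[str, str]:
    """Map each column in the UPDATE statement to its (string) value."""
    return dict(PAIR_RE.findall(line))


def summarize(fields: Dict[str, str], timeout_ms: float) -> Dict[str, object]:
    """Decode the status and compare the slowest response against timeout_ms.

    Assumes min_time/max_time are in milliseconds.
    """
    total = int(fields["total_polls"])
    failed = int(fields["failed_polls"])
    max_time = float(fields["max_time"])
    return {
        "status": HOST_STATUS.get(int(fields["status"]), "?"),
        "last_error": fields["status_last_error"],
        "max_time_ms": max_time,
        # percentage of polls that did not fail; should match "availability"
        "availability": 100.0 * (total - failed) / total,
        "slowest_exceeds_timeout": max_time > timeout_ms,
    }
```

Fed the UPDATE line for Host[72] above with a 400 ms timeout, this reports "recovering", an availability of about 88.75 (the log says 88.7523, from 1930 failed out of 17159 polls) and a slowest response well beyond the timeout.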

Regards,

Frank

Post by nvetro »

OK, I raised the SNMP timeout from 500 ms (default) and the ping timeout value from 400 ms (default) to 1500 ms (1.5 seconds) for 8 hosts that were down; let's see if this fixes the issue. I don't think it will, because if I do an snmpwalk from the CLI it times out. That points to something on the host side rather than Cacti, wouldn't you say?
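To tell a consistently slow agent from an intermittent one, it can help to time the same CLI request several times with an outer wall-clock limit. A minimal Python sketch; time_command is a hypothetical helper, and its limit is independent of whatever timeout and retry settings the SNMP tool applies on its own.

```python
import subprocess
import time
from typing import List, Optional


def time_command(cmd: List[str], timeout_s: float) -> Optional[float]:
    """Run cmd and return elapsed wall-clock seconds, or None if it exceeded timeout_s.

    Output is discarded; only completion and timing matter here.
    subprocess.run kills the child itself when the timeout expires.
    """
    start = time.monotonic()
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start
```

For example, assuming an authNoPriv setup with MD5 authentication, you might pass ["snmpwalk", "-v", "3", "-l", "authNoPriv", "-u", USER, "-a", "MD5", "-A", PASSPHRASE, HOST, ".1.3.6.1.2.1.1"], where USER, PASSPHRASE and HOST are placeholders for your own values. Repeated None results, or times in the same range as the max_time values in the log, would point at the agent or the path to it rather than at spine.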

Post by nvetro »

OK, I THINK it's fixed: all hosts are currently "recovering". Here is what I did. Increasing the SNMP timeout for each host didn't do anything, and increasing the ping timeout for each host didn't do anything either. What DID make a difference was changing the host detection method for each host to Ping & SNMP, with the default timeout (ms), default port (23) and default protocol (UDP). All hosts are coming up now; I'll report back in a bit.