Cacti 0.8.7 cmd.php to spine, several hosts show down.
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
Run an Ethereal/Wireshark capture during polling and send it to me via PM.
TheWitness
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
- TheWitness
Thank you. I will be back in Detroit on Saturday and likely working on Cacti 0.8.8, and a few other things on Sunday.
TheWitness
-
- Posts: 28
- Joined: Fri Nov 16, 2007 1:55 pm
- Location: Stuttgart, Germany
nvetro wrote: Anyone?
Ah, another SPARC/Solaris fellow? Thank god, I almost believed I was alone with that.
Did you upgrade from an earlier version of Cacti, or is this a fresh install? If it's an upgrade: I had some problems with the SNMP security level. Although I'm using "authNoPriv", some hosts had "snmp_priv_protocol" != "[None]" in the cacti.host MySQL table, which causes an invalid command line for the snmpwalk/snmpget command. To verify, connect to the DB with:
Code: Select all
mysql -u <yourcactiuser> -p cacti
Code: Select all
select id,hostname,snmp_version,snmp_priv_protocol from host;
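If the host list is long, the comparison Frank describes can be done mechanically once the rows are exported. The following is only a minimal Python sketch: the function name and the sample rows are made up for illustration, and it assumes the four columns of the query above have been read into dictionaries.

```python
def hosts_with_priv_protocol(rows):
    """Return SNMPv3 rows whose snmp_priv_protocol is not the literal '[None]'."""
    return [
        row for row in rows
        if str(row["snmp_version"]) == "3"
        and row["snmp_priv_protocol"] != "[None]"
    ]


# Hypothetical rows shaped like the output of the SELECT above (not real data).
sample_rows = [
    {"id": 1, "hostname": "host-a.example", "snmp_version": 3, "snmp_priv_protocol": "[None]"},
    {"id": 2, "hostname": "host-b.example", "snmp_version": 3, "snmp_priv_protocol": "DES"},
    {"id": 3, "hostname": "host-c.example", "snmp_version": 2, "snmp_priv_protocol": "DES"},
]

for row in hosts_with_priv_protocol(sample_rows):
    print(row["id"], row["hostname"], row["snmp_priv_protocol"])
```

Only SNMPv3 hosts are flagged, since the privacy protocol only matters for the v3 command line.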
Regards,
Frank
Frank,
Thanks for the reply! So I ran the SQL query and everything looks to be in order: snmp_version is 3 and snmp_priv_protocol is [None] for all hosts. The details of a "down" host match the details of an "up" host in the query output as well. What else ya got? I really don't want to install that packet monitoring app on this box, as it's a production machine and we do not have another box to accomplish this.
TheWitness, how else can I provide you with more information? Also, for the record, several hosts that are "up" are on the same subnet as several hosts that are "down". I do not think it's a network issue, as we changed the network location of this box before to rule that out.
-
nvetro wrote: What else ya got?
Please pick a host-ID shown as down. This is the number in the "... Host[62] ..." logfile output. If you're running the spine under its own user, switch to that user via 'su - <cactiuser>'. Run spine on the command line:
Code: Select all
spine -f <number> -l <number> -R -S -V 5
Code: Select all
truss -f spine -f <number> -l <number> -R -S -V 5
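To keep the plain run and the truss run in separate files, the two command lines can be built from one host ID. This is a rough Python sketch, not an official wrapper: the function name is made up, and it assumes both <number> placeholders take the same host ID so that only the one host is polled. The flags themselves are copied from the commands above.

```python
import shlex


def build_spine_command(host_id, trace=False):
    """Build the argument list for a single-host spine debug run.

    Assumption: -f and -l both receive the same host ID.
    With trace=True the command is prefixed with 'truss -f', as in the post.
    """
    cmd = ["spine", "-f", str(host_id), "-l", str(host_id), "-R", "-S", "-V", "5"]
    if trace:
        cmd = ["truss", "-f"] + cmd
    return cmd


# Print both variants for host 62, the ID used as an example in the post.
print(shlex.join(build_spine_command(62)))
print(shlex.join(build_spine_command(62, trace=True)))
```

The lists can be handed to subprocess.run with stdout and stderr redirected to two differently named files, which makes it easy to tell afterwards whether the truss output was actually captured.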
-
- Posts: 19
- Joined: Tue Feb 27, 2007 10:30 pm
-
nvetro wrote: I sent you a PM with the outputs of those two commands in an attachment. Let me know if you need anything else, this issue has really stumped me.
Sorry, the truss'd output file and spine debugging output file are basically the same. Something went wrong with your truss run.
But this is odd:
Code: Select all
...
DEBUG: SQL:'SELECT id, hostname, snmp_community, snmp_version, snmp_username, snmp_password, snmp_auth_protocol, snmp_priv_passphrase, snmp_priv_protocol, snmp_context, snmp_port, snmp_timeout, max_oids, availability_method, ping_method, ping_port, ping_timeout, ping_retries, status, status_event_count, status_fail_date, status_rec_date, status_last_error, min_time, max_time, cur_time, avg_time, total_polls, failed_polls, availability FROM host WHERE id=72'
DEBUG: The Value of Active Threads is 1
Host[72] SNMP Result: Host responded to SNMP
DEBUG: SQL:'UPDATE host SET status='2', status_event_count='1', status_fail_date='2008-02-06 13:05:00', status_rec_date='2008-02-12 12:19', status_last_error='Host did not respond to SNMP', min_time='0.492100', max_time='2194.970000', cur_time='3.013850', avg_time='742.754782', total_polls='17159', failed_polls='1930', availability='88.7523' WHERE id='72''
DEBUG: SQL:'SELECT data_query_id, action, op, assert_value, arg1 FROM poller_reindex WHERE host_id=72'
Host[72] Host has no information for recache.
DEBUG: SQL:'SELECT snmp_port, count(snmp_port) FROM poller_item WHERE host_id=72 AND rrd_next_step < 0 GROUP BY snmp_port'
DEBUG: SQL:'SELECT action, hostname, snmp_community, snmp_version, snmp_username, snmp_password, rrd_name, rrd_path, arg1, arg2, arg3, local_data_id, rrd_num, snmp_port, snmp_timeout, snmp_auth_protocol, snmp_priv_passphrase, snmp_priv_protocol, snmp_context FROM poller_item WHERE host_id=72 and rrd_next_step <=0 ORDER by snmp_port'
DEBUG: SQL:'UPDATE poller_item SET rrd_next_step=rrd_next_step-300 WHERE host_id=72'
DEBUG: SQL:'UPDATE poller_item SET rrd_next_step=rrd_step-300 WHERE rrd_next_step < 0 and host_id=72'
...
Host[72] DS[1436] SNMP: v3: 216.105.160.80, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 267631185
Host[72] DS[1436] SNMP: v3: 216.105.160.80, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 1600107098
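With many hosts in one debug run, the pattern in the excerpt above (the "SNMP Result" line says the host responded, while the following UPDATE still carries 'Host did not respond to SNMP') is easier to find with a small script. Below is a minimal Python sketch under a few assumptions: the regular expressions are fitted to the two line shapes shown above and nothing else, the function name is made up, and the availability formula (total_polls minus failed_polls, divided by total_polls) is a guess that happens to reproduce the logged value. Whether status_last_error is simply left over from the earlier failure is not something the script can tell.

```python
import re

SNMP_RESULT = re.compile(r"^Host\[(\d+)\] SNMP Result: (.*)$")
UPDATE_HOST = re.compile(r"^DEBUG: SQL:'UPDATE host SET (.*) WHERE id='(\d+)''$")
FIELD = re.compile(r"(\w+)='([^']*)'")


def parse_host_updates(lines):
    """Pair each 'UPDATE host' statement with the last SNMP result seen for that host."""
    results = {}
    report = []
    for line in lines:
        line = line.strip()
        match = SNMP_RESULT.match(line)
        if match:
            results[match.group(1)] = match.group(2)
            continue
        match = UPDATE_HOST.match(line)
        if match:
            fields = dict(FIELD.findall(match.group(1)))
            host_id = match.group(2)
            total = int(fields["total_polls"])
            failed = int(fields["failed_polls"])
            report.append({
                "host_id": host_id,
                "snmp_result": results.get(host_id),
                "status": fields["status"],
                "status_last_error": fields["status_last_error"],
                # Assumed formula, see the note above.
                "computed_availability": round(100.0 * (total - failed) / total, 4),
                "logged_availability": float(fields["availability"]),
            })
    return report


# The two relevant lines, copied from the debug output above.
sample_log = [
    "Host[72] SNMP Result: Host responded to SNMP",
    "DEBUG: SQL:'UPDATE host SET status='2', status_event_count='1', "
    "status_fail_date='2008-02-06 13:05:00', status_rec_date='2008-02-12 12:19', "
    "status_last_error='Host did not respond to SNMP', min_time='0.492100', "
    "max_time='2194.970000', cur_time='3.013850', avg_time='742.754782', "
    "total_polls='17159', failed_polls='1930', availability='88.7523' WHERE id='72''",
]

report = parse_host_updates(sample_log)
for entry in report:
    print(entry)
```

For Host[72], (17159 - 1930) / 17159 comes to roughly 88.7523 %, which agrees with the availability value in the UPDATE statement.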
Regards,
Frank
OK, I raised the SNMP timeout from 500 ms (default) and the ping timeout from 400 ms (default) to 1500 ms (1.5 seconds) for 8 hosts which were down. Let's see if this fixes the issue. I don't think it will, because if I do an snmpwalk from the CLI it will time out. That's something up on the host site and not Cacti, wouldn't you say?
OK, I THINK it's fixed: all hosts are currently "recovering". Here is what I did. Increasing the SNMP timeout for each host didn't do anything, and increasing the ping timeout for each host didn't do anything either. What DID do something was changing the host detection method for each host to Ping & SNMP; it now has the default ms, default port (23) and default protocol (udp). All are coming up now, I'll report back in a bit.