spine 500 ms timeout

micoots
Posts: 5
Joined: Mon Apr 09, 2007 8:27 pm

Post by micoots »

Hi,

The cmd.php poller takes a long time these days to go through my scripts, so I've started looking at spine (cactid).

I've installed and configured it, but when running it, it continually complains about:

02/10/2008 02:30:07 PM - SPINE: Poller[0] Host[4] DS[32] WARNING: SNMP timeout detected [500 ms], ignoring host 'hostname.example.com'

I've increased the timeout in the Cacti settings from 500 ms to 5000 ms, but I found that this does nothing, as spine doesn't read that setting.

I tracked down the 500 ms for spine in its poller.c file. I changed all references to 500 ms into 5000 ms and recompiled spine, but spine continues to give me this 500 ms timeout.

I've also reduced the number of SNMP OIDs per request from the previous value of 10 to 1, to make sure it doesn't try to make too many SNMP connections to the host, but all to no avail: it continues to time out after 500 ms.

I'm using cacti 0.8.7a with plugin architecture 1.4 and spine 0.8.7a

Any ideas what else I could try, or why changing the timeouts in spine's poller.c doesn't have any effect?

Thanks.

Michael.
Ateo
Posts: 26
Joined: Fri Sep 26, 2008 1:07 pm
Location: http://reno.nevada.u$

Post by Ateo »

I am bumping this nearly year-old thread because I'm running into the same issue.

My situation is much the same. However, the cmd.php poller works fine and does not time out. Spine does.

Changing "SNMP Timeout" to any value has no effect with spine.

Spine debug output:

Code:

boron bin # spine --verbosity=5 34 34
SPINE: Using spine config file [/etc/spine.conf]
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'path_webroot''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'path_cactilog''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The path_php_server variable is /var/www/cacti/htdocs/cacti/script_server.php
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The path_cactilog variable is /var/log/cacti/cacti.log
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'log_destination''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The log_destination variable is 1 (FILE)
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'path_php_binary''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The path_php variable is /usr/bin/php
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'availability_method''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The availability_method variable is 3
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'ping_recovery_count''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The ping_recovery_count variable is 2
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'ping_failure_count''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The ping_failure_count variable is 2
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'ping_method''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The ping_method variable is 2
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'ping_retries''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The ping_retries variable is 1
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'ping_timeout''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The ping_timeout variable is 400
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'log_perror''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The log_perror variable is 1
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'log_pwarn''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The log_pwarn variable is 0
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'log_pstats''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The log_pstats variable is 0
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'max_threads''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The threads variable is 10
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'poller_interval''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The polling interval is 60 seconds
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'concurrent_processes''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The number of concurrent processes is 10
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'script_timeout''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The script timeout is 10
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'php_servers''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The number of php script servers to run is 5
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT action FROM poller_item WHERE action=2 AND host_id BETWEEN 34 AND 34 LIMIT 1'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: StartHost='34', EndHost='34', TotalPHPScripts='0'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The PHP Script Server is Not Required
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT value FROM settings WHERE name = 'max_get_size''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The Maximum SNMP OID Get Size is 1
09/29/2008 12:04:50 PM - SPINE: Poller[0] Version 0.8.7b starting
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: MySQL is Thread Safe!
09/29/2008 12:04:50 PM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SNMP Header Version is 5.4.1.1
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SNMP Library Version is 5.4.1.1
09/29/2008 12:04:50 PM - SPINE: Poller[0] SPINE: Initializing PHP Script Server(s)
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT id FROM host WHERE disabled='' AND id BETWEEN 34 AND 34 ORDER BY id'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: Initial Value of Active Threads is 0
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT snmp_port, count(snmp_port) FROM poller_item WHERE host_id=0 AND rrd_next_step < 0 GROUP BY snmp_port'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT action, hostname, snmp_community, snmp_version, snmp_username, snmp_password, rrd_name, rrd_path, arg1, arg2, arg3, local_data_id, rrd_num, snmp_port, snmp_timeout, snmp_auth_protocol, snmp_priv_passphrase, snmp_priv_protocol, snmp_context  FROM poller_item WHERE host_id=0 and rrd_next_step <=0 ORDER by snmp_port'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'UPDATE poller_item SET rrd_next_step=rrd_next_step-60 WHERE host_id=0'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'UPDATE poller_item SET rrd_next_step=rrd_step-60 WHERE rrd_next_step < 0 and host_id=0'
09/29/2008 12:04:50 PM - SPINE: Poller[0] Host[0] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 2
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT id, hostname, snmp_community, snmp_version, snmp_username, snmp_password, snmp_auth_protocol, snmp_priv_passphrase, snmp_priv_protocol, snmp_context, snmp_port, snmp_timeout, max_oids, availability_method, ping_method, ping_port, ping_timeout, ping_retries, status, status_event_count, status_fail_date, status_rec_date, status_last_error, min_time, max_time, cur_time, avg_time, total_polls, failed_polls, availability  FROM host WHERE id=34'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: UDP Ping return_code was -1, errno was 111, total_time was 21.9345
09/29/2008 12:04:50 PM - SPINE: Poller[0] Host[34] PING: Result UDP: Host is Alive
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'UPDATE host SET status='3', status_event_count='0', status_fail_date='0000-00-00 00:00:00', status_rec_date='0000-00-00 00:00:00', status_last_error='', min_time='0.018840', max_time='0.098940', cur_time='0.021930', avg_time='0.028319', total_polls='5420', failed_polls='0', availability='100.0000' WHERE id='34''
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT data_query_id, action, op, assert_value, arg1 FROM poller_reindex WHERE host_id=34'
09/29/2008 12:04:50 PM - SPINE: Poller[0] Host[34] Host has no information for recache.
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT snmp_port, count(snmp_port) FROM poller_item WHERE host_id=34 AND rrd_next_step < 0 GROUP BY snmp_port'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'SELECT action, hostname, snmp_community, snmp_version, snmp_username, snmp_password, rrd_name, rrd_path, arg1, arg2, arg3, local_data_id, rrd_num, snmp_port, snmp_timeout, snmp_auth_protocol, snmp_priv_passphrase, snmp_priv_protocol, snmp_context  FROM poller_item WHERE host_id=34 and rrd_next_step <=0 ORDER by snmp_port'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'UPDATE poller_item SET rrd_next_step=rrd_next_step-60 WHERE host_id=34'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: SQL:'UPDATE poller_item SET rrd_next_step=rrd_step-60 WHERE rrd_next_step < 0 and host_id=34'
09/29/2008 12:04:50 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[953] SNMP: v1: 127.0.0.1, dsname: pop3d_total, oid: .1.3.6.1.2.1.42.2.146.7, value: 5
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[952] SNMP: v1: 127.0.0.1, dsname: pop3d, oid: .1.3.6.1.2.1.42.2.146.6, value: 1
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[954] SNMP: v1: 127.0.0.1, dsname: pop3dssl, oid: .1.3.6.1.2.1.42.2.146.4, value: 0
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[955] SNMP: v1: 127.0.0.1, dsname: pop3dssl_total, oid: .1.3.6.1.2.1.42.2.146.5, value: 0
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[956] SNMP: v1: 127.0.0.1, dsname: policyd, oid: .1.3.6.1.2.1.42.2.145.12, value: 0
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[957] SNMP: v1: 127.0.0.1, dsname: policyd_total, oid: .1.3.6.1.2.1.42.2.145.13, value: 3
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[958] SNMP: v1: 127.0.0.1, dsname: postfix_spam, oid: .1.3.6.1.2.1.42.2.145.8, value: 0
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[959] SNMP: v1: 127.0.0.1, dsname: postfix_spam_total, oid: .1.3.6.1.2.1.42.2.145.9, value: 0
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[951] SNMP: v1: 127.0.0.1, dsname: imapd_ssl_total, oid: .1.3.6.1.2.1.42.2.146.1, value: 5
09/29/2008 12:04:51 PM - SPINE: Poller[0] Host[34] DS[950] SNMP: v1: 127.0.0.1, dsname: imapd_ssl, oid: .1.3.6.1.2.1.42.2.146.0, value: 1
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[949] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[949] SNMP: v1: 127.0.0.1, dsname: imapd_total, oid: .1.3.6.1.2.1.42.2.146.3, value: U
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[948] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[948] SNMP: v1: 127.0.0.1, dsname: imapd, oid: .1.3.6.1.2.1.42.2.146.2, value: U
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[964] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[964] SNMP: v1: 127.0.0.1, dsname: postfix_recv, oid: .1.3.6.1.2.1.42.2.145.2, value: U
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[965] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[965] SNMP: v1: 127.0.0.1, dsname: postfix_recv_total, oid: .1.3.6.1.2.1.42.2.145.3, value: U
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[966] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[966] SNMP: v1: 127.0.0.1, dsname: postfix_reject, oid: .1.3.6.1.2.1.42.2.145.6, value: U
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[969] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[969] SNMP: v1: 127.0.0.1, dsname: postfix_sent_total, oid: .1.3.6.1.2.1.42.2.145.1, value: U
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[968] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[968] SNMP: v1: 127.0.0.1, dsname: postfix_sent, oid: .1.3.6.1.2.1.42.2.145.0, value: U
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[967] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[967] SNMP: v1: 127.0.0.1, dsname: postfix_rejct_total, oid: .1.3.6.1.2.1.42.2.145.7, value: U
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[961] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[961] SNMP: v1: 127.0.0.1, dsname: postfix_virus_total, oid: .1.3.6.1.2.1.42.2.145.11, value: U
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[962] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:53 PM - SPINE: Poller[0] Host[34] DS[962] SNMP: v1: 127.0.0.1, dsname: postfix_bounced, oid: .1.3.6.1.2.1.42.2.145.4, value: U
09/29/2008 12:04:55 PM - SPINE: Poller[0] Host[34] DS[963] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
09/29/2008 12:04:55 PM - SPINE: Poller[0] Host[34] DS[963] SNMP: v1: 127.0.0.1, dsname: postfix_bncd_total, oid: .1.3.6.1.2.1.42.2.145.5, value: U
09/29/2008 12:04:55 PM - SPINE: Poller[0] DEBUG: SQL:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (953,'pop3d_total','2008-09-29 12:04:50','5'),(952,'pop3d','2008-09-29 12:04:50','1'),(954,'pop3dssl','2008-09-29 12:04:50','0'),(955,'pop3dssl_total','2008-09-29 12:04:50','0'),(956,'policyd','2008-09-29 12:04:50','0'),(957,'policyd_total','2008-09-29 12:04:50','3'),(958,'postfix_spam','2008-09-29 12:04:50','0'),(959,'postfix_spam_total','2008-09-29 12:04:50','0'),(951,'imapd_ssl_total','2008-09-29 12:04:50','5'),(950,'imapd_ssl','2008-09-29 12:04:50','1'),(949,'imapd_total','2008-09-29 12:04:50','U'),(948,'imapd','2008-09-29 12:04:50','U'),(964,'postfix_recv','2008-09-29 12:04:50','U'),(965,'postfix_recv_total','2008-09-29 12:04:50','U'),(966,'postfix_reject','2008-09-29 12:04:50','U'),(969,'postfix_sent_total','2008-09-29 12:04:50','U'),(968,'postfix_sent','2008-09-29 12:04:50','U'),(967,'postfix_rejct_total','2008-09-29 12:04:50','U'),(961,'postfix_virus_total','2008-09-29 12:04:50','U'),(962,'postfix_bounced','2008-09-29 12:04:50','
09/29/2008 12:04:55 PM - SPINE: Poller[0] Host[34] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/29/2008 12:04:55 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
09/29/2008 12:04:55 PM - SPINE: Poller[0] DEBUG: SQL:'replace into settings (name,value) values ('date',NOW())'
09/29/2008 12:04:55 PM - SPINE: Poller[0] DEBUG: SQL:'insert into poller_time (poller_id, start_time, end_time) values (0, NOW(), NOW())'
09/29/2008 12:04:55 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
09/29/2008 12:04:55 PM - SPINE: Poller[0] DEBUG: PHP Script Server Pipes Closed
09/29/2008 12:04:55 PM - SPINE: Poller[0] DEBUG: Allocated Variable Memory Freed
09/29/2008 12:04:55 PM - SPINE: Poller[0] SPINE: Net-SNMP API Shutdown Completed
09/29/2008 12:04:55 PM - SPINE: Poller[0] DEBUG: MYSQL Free & Close Completed
09/29/2008 12:04:55 PM - SPINE: Poller[0] Time: 5.6568 s, Threads: 10, Hosts: 2

cmd.php works fine, so I'll skip posting its debug info.

I can snmp(get|walk) my script so I know it works:

Code:

boron bin # snmpwalk -v 1 -c public 127.0.0.1 .1.3.6.1.2.1.42.2.146
SNMPv2-SMI::mib-2.42.2.146.0 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.146.1 = INTEGER: 7
SNMPv2-SMI::mib-2.42.2.146.2 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.146.3 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.146.4 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.146.5 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.146.6 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.146.7 = INTEGER: 7
boron bin # snmpwalk -v 1 -c public 127.0.0.1 .1.3.6.1.2.1.42.2.145
SNMPv2-SMI::mib-2.42.2.145.0 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.1 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.2 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.3 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.4 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.5 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.6 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.7 = INTEGER: 4
SNMPv2-SMI::mib-2.42.2.145.8 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.9 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.10 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.11 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.12 = INTEGER: 0
SNMPv2-SMI::mib-2.42.2.145.13 = INTEGER: 4

Either way, polling using cmd.php verifies that my scripts are functioning.

This is a Core 2 Duo server with oodles of memory, so it should function fine with "Maximum SNMP OID's Per SNMP Get Request" set to 10. However, I have set it to 1, 2, 3, 4, 5, etc. without success.

Package versions (Gentoo):

Code:

net-analyzer/cacti-0.8.7b-r2  
net-analyzer/cacti-spine-0.8.7a

I decided to download the source. I also modified poller.c to change the SNMP timeout and recompiled. Still getting SNMP timeouts [500 ms]...

Any ideas?

Thanks
rtorti19
Posts: 48
Joined: Wed May 07, 2008 1:20 pm

Post by rtorti19 »

Are you changing the SNMP timeout at the per-host level? Also, I don't think you should have to increase the SNMP timeout, especially if you're only querying localhost, as indicated in your debugging output.
What are you using for your host-down detection?
When you walk .1.3.6.1.2.1.42.2.146 or .145, how quickly do you get results? Is there any lag?
Ateo
Posts: 26
Joined: Fri Sep 26, 2008 1:07 pm
Location: http://reno.nevada.u$

Post by Ateo »

rtorti19 wrote:Are you changing the SNMP timeout at the per-host level? Also, I don't think you should have to increase the SNMP timeout, especially if you're only querying localhost, as indicated in your debugging output.
What are you using for your host-down detection?
When you walk .1.3.6.1.2.1.42.2.146 or .145, how quickly do you get results? Is there any lag?

The only SNMP timeout value I've changed is the one found in Settings -> General. So, I'm changing it at the global scope.

For host-down detection I am using UDP ping, and when I walk I get a quick response. No lag.

Anything else I can try?
Ateo
Posts: 26
Joined: Fri Sep 26, 2008 1:07 pm
Location: http://reno.nevada.u$

Post by Ateo »

This seems to be a spine bug (otherwise it wouldn't work with cmd.php, right?). Should I file a bug report? I don't like submitting bug reports unless needed. I have found neither documentation nor patches addressing this issue.

Currently, cmd.php will work. However, when I deploy cacti (replacing the old and busted graphs), I will need spine to process LOTS of hosts.

Maybe it's the conversion from summer to winter that's messing things up.. =P
TheWitness
Developer
Posts: 16997
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Please use the latest SVN spine. It is officially 0.8.7c-beta2. You will find that this issue will go away.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
Ateo
Posts: 26
Joined: Fri Sep 26, 2008 1:07 pm
Location: http://reno.nevada.u$

Post by Ateo »

Code:

# make
if gcc -DHAVE_CONFIG_H -I. -I. -I./config     -I/usr/include/net-snmp/ -I/usr/include/net-snmp//.. -I/usr/include/mysql -g -O2 -MT sql.o -MD -MP -MF ".deps/sql.Tpo" -c -o sql.o sql.c; \
        then mv -f ".deps/sql.Tpo" ".deps/sql.Po"; else rm -f ".deps/sql.Tpo"; exit 1; fi
In file included from sql.c:35:
spine.h:379: error: 'RESULTS_BUFFER' undeclared here (not in a function)
make: *** [sql.o] Error 1

I configured with:

Code:

# ./configure --prefix=/usr/local --with-mysql=/usr/include/mysql --with-snmp=/usr/include/net-snmp

El stucko... hehe

Sorry for my ignorance, but I'm not sure which directory in the tree I should be in. Am I in the right directory?
I've tried compiling:
1. ...cacti/spine/0.8.7
2. ...cacti/spine/main

For the record, I can compile, make, install 0.8.7a.

Thanks!
TheWitness
Developer
Posts: 16997
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

You must also download configure.ac and then run "autoreconf".

TheWitness
Ateo
Posts: 26
Joined: Fri Sep 26, 2008 1:07 pm
Location: http://reno.nevada.u$

Post by Ateo »

Ok. Thanks for the reply.

I successfully compiled the source in "main". I had to run "autoreconf" and "libtoolize --force" to get configure to do its thing.

However, I regret to report that the SVN spine does not solve the SNMP timeout issue.
Ateo
Posts: 26
Joined: Fri Sep 26, 2008 1:07 pm
Location: http://reno.nevada.u$

Post by Ateo »

OK, again: I recompiled the SVN version, this time tagging the warning with the poller.c line that emits it.

Here's the output:

Code:

boron main # tail -f /var/log/cacti/cacti.log 
10/02/2008 11:04:58 AM - WEBUI: Cacti Log Cleared from Web Management Interface
10/02/2008 11:05:10 AM - SYSTEM STATS: Time:9.9935 Method:cmd.php Processes:10 Threads:N/A Hosts:8 HostsPerProcess:1 DataSources:36 RRDsProcessed:31
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1108] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1109] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1105] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1103] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1104] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1107] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1106] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1111] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1110] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[34] DS[1097] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[35] DS[1113] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 928 (poller.c)
10/02/2008 11:06:03 AM - SPINE: Poller[0] Host[35] DS[1114] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 928 (poller.c)
10/02/2008 11:06:05 AM - SPINE: Poller[0] Host[34] DS[1098] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 928 (poller.c)
10/02/2008 11:06:06 AM - SYSTEM STATS: Time:6.0405 Method:spine Processes:10 Threads:10 Hosts:8 HostsPerProcess:1 DataSources:32 RRDsProcessed:21

So spine is setting host->ignore_host to TRUE. I added custom debug output at each and every host->ignore_host = TRUE; statement (at lines 394, 424, 428 and 706), but none of them is hit.

Any ideas?

I'm not a C expert. I probably know just enough to be dangerous. I do know Perl, Bash and PHP, so I'm not totally clueless. =) Actually, I don't really know C; I'm just applying my practical knowledge from the other languages...
Ateo
Posts: 26
Joined: Fri Sep 26, 2008 1:07 pm
Location: http://reno.nevada.u$

Post by Ateo »

OK. I *think* I nailed it. Spine is failing in the function snmp_get_multi in snmp.c. For some reason, it can't talk to the host. Oddly, Cacti runs on the same machine as the SNMP host.

Code:

10/02/2008 11:50:10 AM - SYSTEM STATS: Time:9.9961 Method:cmd.php Processes:10 Threads:N/A Hosts:8 HostsPerProcess:1 DataSources:35 RRDsProcessed:30
10/02/2008 11:51:01 AM - SPINE: Poller[0] CUSTOM DEBUG OUTPUT: response->errstat == SNMP_ERR_NOERROR at Line 535 (snmp.c)
10/02/2008 11:51:01 AM - SPINE: Poller[0] CUSTOM DEBUG OUTPUT: response->errstat == SNMP_ERR_NOERROR at Line 535 (snmp.c)
10/02/2008 11:51:01 AM - SPINE: Poller[0] CUSTOM DEBUG OUTPUT: response->errstat == SNMP_ERR_NOERROR at Line 535 (snmp.c)
10/02/2008 11:51:02 AM - SPINE: Poller[0] CUSTOM DEBUG OUTPUT: response->errstat == SNMP_ERR_NOERROR at Line 535 (snmp.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] CUSTOM DEBUG OUTPUT: Line 593 (snmp.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1108] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1109] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1105] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1103] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1104] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1107] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1106] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1111] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1110] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[34] DS[1097] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 788 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] CUSTOM DEBUG OUTPUT: Line 364 (snmp.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] CUSTOM DEBUG OUTPUT: Line 593 (snmp.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[35] DS[1113] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 928 (poller.c)
10/02/2008 11:51:03 AM - SPINE: Poller[0] Host[35] DS[1114] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 928 (poller.c)
10/02/2008 11:51:05 AM - SPINE: Poller[0] CUSTOM DEBUG OUTPUT: Line 593 (snmp.c)
10/02/2008 11:51:05 AM - SPINE: Poller[0] Host[34] DS[1098] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1' at Line 928 (poller.c)
10/02/2008 11:51:06 AM - SYSTEM STATS: Time:6.0088 Method:spine Processes:10 Threads:10 Hosts:8 HostsPerProcess:1 DataSources:32 RRDsProcessed:21

When this variable is set...

Code:

status = snmp_sess_synch_response(current_host->snmp_session, pdu, &response);

... it apparently returns the response object, since &response is passed to the function. Right (or wrong)?

Anyway, I can't find the function snmp_sess_synch_response. I grepped the directory, which returned:

Code:

boron main # grep "snmp_sess_synch_response" *
snmp.c:         status = snmp_sess_synch_response(current_host->snmp_session, pdu, &response);
snmp.c:         status = snmp_sess_synch_response(current_host->snmp_session, pdu, &response);
snmp.c: status = snmp_sess_synch_response(current_host->snmp_session, pdu, &response);
Binary file snmp.o matches
Binary file spine matches


I guess I'm really stuck now....

//edit: might be best to split off the last few svn related threads and place them in the unstable forum?

Thanks
Ateo
Posts: 26
Joined: Fri Sep 26, 2008 1:07 pm
Location: http://reno.nevada.u$

Post by Ateo »

Oh. Also, for the record, it is also timing out on remote snmp requests...

Code:

10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[45] DS[1280] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.4.240'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[45] DS[1281] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.4.240'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[45] DS[1282] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.4.240'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[45] DS[1275] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.4.240'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[45] DS[1276] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.4.240'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[45] DS[1277] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.4.240'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[45] DS[1278] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.4.240'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[45] DS[1279] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.4.240'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[45] DS[1283] WARNING: SNMP timeout detected [500 ms], ignoring host '192.168.4.240'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[46] DS[1313] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[46] DS[1317] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[46] DS[1318] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[46] DS[1312] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[46] DS[1316] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[46] DS[1321] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[46] DS[1320] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
10/03/2008 10:09:03 AM - SPINE: Poller[0] Host[46] DS[1319] WARNING: SNMP timeout detected [500 ms], ignoring host '127.0.0.1'
TheWitness
Developer
Posts: 16997
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Post by TheWitness »

I would like to troubleshoot this problem online. Are you available and at what times? (EDT GMT-5)

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
Ateo
Posts: 26
Joined: Fri Sep 26, 2008 1:07 pm
Location: http://reno.nevada.u$

Post by Ateo »

TheWitness wrote:I would like to troubleshoot this problem online. Are you available and at what times? (EDT GMT-5)

TheWitness
Hey. OK, sorry I got back so late. I am available via IM; however, I'll be gone (unable to troubleshoot) from the 8th to the 12th. After that, yes, let's troubleshoot. I'm on the west coast (GMT-8)...

Also, I have some more details. This is very interesting...

Spine works well until it has to query larger OID blocks. By larger, I mean blocks with 9 or more OIDs.

I currently have my SNMP daemon pass queries for 3 OID blocks to the exact same script. The only thing that changes is the -c option I've implemented (-c for configuration file).

Spine fails ONLY with the .1.3.6.1.4.100 block. This block has 12 OIDs (0-11). The next largest block is Courier with 8 OIDs, and this polls fine. My firewall poller only has 2 OIDs....

Here's my snmpd conf and other details (so you can get an idea):

Code: Select all

# SNMP Management Information Block Object Identifier for Postfix polling statistics
pass .1.3.6.1.4.100 /usr/local/bin/cola.pl -c /usr/local/etc/cola/postfix.cf

# SNMP Management Information Block Object Identifier for IP Tables polling statistics
pass .1.3.6.1.4.400 /usr/local/bin/cola.pl -c /usr/local/etc/cola/wormhole.cf

# SNMP Management Information Block Object Identifier for Courier-IMAP polling statistics
pass .1.3.6.1.4.200 /usr/local/bin/cola.pl -c /usr/local/etc/cola/courier.cf
Postfix OIDs:

Code: Select all

SNMP Management Information Block for Postfix.
  Object Identifier (OID) Block: .1.3.6.1.4.100

    * .1.3.6.1.4.100.0 is reserved for 'sent_new'
    * .1.3.6.1.4.100.1 is reserved for 'sent_total'
    * .1.3.6.1.4.100.2 is reserved for 'received_new'
    * .1.3.6.1.4.100.3 is reserved for 'received_total'
    * .1.3.6.1.4.100.4 is reserved for 'rejected_new'
    * .1.3.6.1.4.100.5 is reserved for 'rejected_total'
    * .1.3.6.1.4.100.6 is reserved for 'bounced_new'
    * .1.3.6.1.4.100.7 is reserved for 'bounced_total'
    * .1.3.6.1.4.100.8 is reserved for 'spam_new'
    * .1.3.6.1.4.100.9 is reserved for 'spam_total'
    * .1.3.6.1.4.100.10 is reserved for 'virus_new'
    * .1.3.6.1.4.100.11 is reserved for 'virus_total'
Courier OIDs:

Code: Select all

SNMP Management Information Block for Courier-IMAP.
  Object Identifier (OID) Block: .1.3.6.1.4.200

    * .1.3.6.1.4.200.0 is reserved for 'imapdssl_new'
    * .1.3.6.1.4.200.1 is reserved for 'imapdssl_total'
    * .1.3.6.1.4.200.2 is reserved for 'imapd_new'
    * .1.3.6.1.4.200.3 is reserved for 'imapd_total'
    * .1.3.6.1.4.200.4 is reserved for 'pop3dssl_new'
    * .1.3.6.1.4.200.5 is reserved for 'pop3dssl_total'
    * .1.3.6.1.4.200.6 is reserved for 'pop3d_new'
    * .1.3.6.1.4.200.7 is reserved for 'pop3d_total'
So, since my Courier polls return results as expected but Postfix does not, I'm assuming it's because of the OID block size. There's no other explanation.
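
One way to reason about the block-size theory is a rough timing model. This is only a sketch under assumptions that are not confirmed anywhere in this thread: that snmpd answers each OID in a request by running the pass script once, one after another, and that each run costs about the same. The 50 ms per-OID figure is purely hypothetical; only the 500 ms timeout and the OID counts come from the posts above.

```python
# Rough model: total response time grows with the number of OIDs per request.
# ASSUMPTION: one script invocation per OID, answered serially by snmpd.

TIMEOUT_MS = 500   # timeout reported in the spine warnings
PER_OID_MS = 50    # HYPOTHETICAL cost of one cola.pl invocation


def estimated_response_ms(oid_count, per_oid_ms=PER_OID_MS):
    """Estimate how long a single request for oid_count OIDs would take."""
    return oid_count * per_oid_ms


# OID counts taken from the posts: Postfix 12, Courier 8, firewall 2.
blocks = {"postfix": 12, "courier": 8, "firewall": 2}

for name, count in blocks.items():
    total = estimated_response_ms(count)
    status = "over timeout" if total > TIMEOUT_MS else "ok"
    print(f"{name}: {count} OIDs, ~{total} ms ({status})")
```

If the real per-OID cost were in that range, only the 12-OID block would cross 500 ms, which would match the symptoms. Timing the script by hand is the way to check whether the assumed figure is anywhere near reality.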

What is the "Maximum Threads per Process" limit for Spine? I've set it as high as 60. Should I go higher?

//edit. I can troubleshoot today if you're available....
TheWitness
Developer
Posts: 16997
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Post by TheWitness »

Well, you can set the maximum OID get size on a per host basis. See if that has any impact.
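
To illustrate what a per-host maximum OID get size changes, here is a minimal sketch. It assumes the poller groups a host's OIDs into requests of at most N OIDs each; `chunk_oids` is a hypothetical helper written for this example, not spine's actual code.

```python
def chunk_oids(oids, max_oids):
    """Split a host's OID list into requests of at most max_oids each."""
    if max_oids < 1:
        raise ValueError("max_oids must be at least 1")
    return [oids[i:i + max_oids] for i in range(0, len(oids), max_oids)]


# The 12 Postfix OIDs from the post above (.1.3.6.1.4.100.0 to .11).
postfix_oids = [f".1.3.6.1.4.100.{n}" for n in range(12)]

# With a limit of 8, the block is fetched as two smaller requests.
batches = chunk_oids(postfix_oids, 8)
print([len(b) for b in batches])  # [8, 4]
```

A lower limit means more round trips but less work per request, so each individual response has a better chance of arriving inside the timeout.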

TheWitness