Spine timeouts
-
- Posts: 23
- Joined: Fri Jul 20, 2007 10:24 am
Hi community,
I am using Cacti 0.8.7 with spine.
I am occasionally getting SNMP timeouts. While debugging with spine on the command line, I discovered some strange behavior:
If I do:
cacti@machine:~$ spine -V=4 3 7
SPINE: Using spine config file [/etc/spine.conf]
SPINE: Version 0.8.7 starting
11/06/2007 11:43:36 AM - SPINE: Poller[0] Host[3] SNMP Result: Host responded to SNMP
11/06/2007 11:43:38 AM - SPINE: Poller[0] Host[3] DS[362] SCRIPT: /usr/bin/php -q /var/www/html/cacti/scripts/netsnmp_memory_usage.php 192.xx.xx.11, public, 3, cacti, c4ct1us3r, 161, 500, output: totalReal:2075116 availReal:1273948 totalSwap:2031608 availSwap:2031608 memBuffer:187048 memCached:544180 usedReal:69940 usedSwap:0
11/06/2007 11:43:38 AM - SPINE: Poller[0] Host[3] DS[31] SNMP: v3: 192.xx.xx.11, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.4, value: 2540237459
11/06/2007 11:43:39 AM - SPINE: Poller[0] Host[3] DS[31] SNMP: v3: 192.xx.xx.11, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.4, value: 853418218
11/06/2007 11:43:39 AM - SPINE: Poller[0] Host[3] DS[30] SNMP: v3: 192.xx.xx.11, dsname: load_5min, oid: .1.3.6.1.4.1.2021.10.1.3.2, value: 1.14
11/06/2007 11:43:39 AM - SPINE: Poller[0] Host[3] DS[29] SNMP: v3: 192.xx.xx.11, dsname: load_15min, oid: .1.3.6.1.4.1.2021.10.1.3.3, value: 1.09
11/06/2007 11:43:39 AM - SPINE: Poller[0] Host[3] DS[28] SNMP: v3: 192.xx.xx.11, dsname: load_1min, oid: .1.3.6.1.4.1.2021.10.1.3.1, value: 1.21
11/06/2007 11:43:39 AM - SPINE: Poller[0] Host[3] DS[27] SNMP: v3: 192.xx.xx.11, dsname: cpu_user, oid: .1.3.6.1.4.1.2021.11.50.0, value: 54822114
11/06/2007 11:43:41 AM - SPINE: Poller[0] Host[3] DS[24] SCRIPT: /usr/bin/perl /var/www/html/cacti/scripts/lvm_netstat_tcp.pl 192.xx.xx.11 3 public 161 500, output: established:305 listen:0 timewait:177 timeclose:0 finwait1:0 finwait2:1 synsent:0 synrecv:0 closewait:0
11/06/2007 11:43:41 AM - SPINE: Poller[0] Host[3] DS[25] SNMP: v3: 192.xx.xx.11, dsname: cpu_nice, oid: .1.3.6.1.4.1.2021.11.51.0, value: 20664
11/06/2007 11:43:41 AM - SPINE: Poller[0] Host[3] DS[23] SNMP: v3: 192.xx.xx.11, dsname: proc, oid: .1.3.6.1.2.1.25.1.6.0, value: 133
11/06/2007 11:43:41 AM - SPINE: Poller[0] Host[3] DS[26] SNMP: v3: 192.xx.xx.11, dsname: cpu_system, oid: .1.3.6.1.4.1.2021.11.52.0, value: 146812722
11/06/2007 11:43:41 AM - SPINE: Poller[0] Host[3] DS[22] SNMP: v3: 192.xx.xx.11, dsname: users, oid: .1.3.6.1.2.1.25.1.5.0, value: 0
11/06/2007 11:43:43 AM - SPINE: Poller[0] Host[4] SNMP Result: Host did not respond to SNMP
11/06/2007 11:43:47 AM - SPINE: Poller[0] Host[5] SNMP Result: Host did not respond to SNMP
11/06/2007 11:43:51 AM - SPINE: Poller[0] Host[6] SNMP Result: Host did not respond to SNMP
11/06/2007 11:43:55 AM - SPINE: Poller[0] Host[7] SNMP Result: Host did not respond to SNMP
11/06/2007 11:43:57 AM - SPINE: Poller[0] Time: 22.2791 s, Threads: 1, Hosts: 6
In the example above, host 3 was fine and all the rest timed out.
If I do:
spine -V=4 4 5
host 4 is fine but host 5 times out, even though I can poll host 5 on its own with the "spine -V=4 5 5" command. To use spine in production, I have to set Maximum Concurrent Poller Processes to a value larger than the number of hosts so that each process polls only one host; then it works fine.
Does anyone have an idea what might be causing this problem?
Regards,
Jacek Nykis
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
Please use 0.8.7a, which has been posted in the Announcements forum. Let me know how it goes.
Larry
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
-
- Posts: 23
- Joined: Fri Jul 20, 2007 10:24 am
Thank you for your answer.
Unfortunately, upgrading made no difference. I am using a few custom scripts that use SNMP, and they take around 1-2 seconds to complete. Do you think this could make a difference?
Regards
Jacek Nykis
-
- Posts: 23
- Joined: Fri Jul 20, 2007 10:24 am
Hi again,
I removed the custom scripts, but there is still no difference; the second host times out again:
cacti@machine:~$ spine -V=4 4 5
SPINE: Using spine config file [/etc/spine.conf]
SPINE: Version 0.8.7a starting
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] SNMP Result: Host responded to SNMP
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[41] SNMP: v3: 192.xx.xx.12, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.4, value: 418765000
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[35] SNMP: v3: 192.xx.xx.12, dsname: cpu_nice, oid: .1.3.6.1.4.1.2021.11.51.0, value: 20158
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[33] SNMP: v3: 192.xx.xx.12, dsname: proc, oid: .1.3.6.1.2.1.25.1.6.0, value: 145
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[32] SNMP: v3: 192.xx.xx.12, dsname: users, oid: .1.3.6.1.2.1.25.1.5.0, value: 0
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[36] SNMP: v3: 192.xx.xx.12, dsname: cpu_system, oid: .1.3.6.1.4.1.2021.11.52.0, value: 114053186
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[37] SNMP: v3: 192.xx.xx.12, dsname: cpu_user, oid: .1.3.6.1.4.1.2021.11.50.0, value: 46976849
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[38] SNMP: v3: 192.xx.xx.12, dsname: load_1min, oid: .1.3.6.1.4.1.2021.10.1.3.1, value: 1.30
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[39] SNMP: v3: 192.xx.xx.12, dsname: load_15min, oid: .1.3.6.1.4.1.2021.10.1.3.3, value: 0.79
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[40] SNMP: v3: 192.xx.xx.12, dsname: load_5min, oid: .1.3.6.1.4.1.2021.10.1.3.2, value: 0.90
11/06/2007 05:26:49 PM - SPINE: Poller[0] Host[4] DS[41] SNMP: v3: 192.xx.xx.12, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.4, value: 1093234323
11/06/2007 05:26:51 PM - SPINE: Poller[0] Host[5] SNMP Result: Host did not respond to SNMP
11/06/2007 05:26:53 PM - SPINE: Poller[0] Time: 4.9327 s, Threads: 1, Hosts: 3
Please note that I polled two hosts, yet the output says "Hosts: 3".
Thank you for any advice.
Regards
Jacek Nykis
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
It could just be a bad data source. You might consider reducing the max_oid count for the host.
TheWitness
-
- Posts: 23
- Joined: Fri Jul 20, 2007 10:24 am
Thank you for your answer.
Unfortunately, changing the maximum number of OIDs does not change anything.
I also excluded the bad data source for the second host. If I poll it on its own, all is fine. The problem occurs when I poll two hosts in one go: the first one is fine, but the second one times out.
For example,
"spine -V=4 4 5"
causes host 5 to time out, but
"spine -V=4 5 6"
works fine for host 5, while host 6 times out.
Do you have any other ideas?
I also wanted to mention that as soon as I switch the poller to cmd.php, all graphs are fine.
Regards
Jacek Nykis
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
./spine --help
Please post the output. If it is not 0.8.7a, please upgrade.
TheWitness
-
- Posts: 23
- Joined: Fri Jul 20, 2007 10:24 am
Hi,
It is 0.8.7a:
cacti@watcher:~$ spine --help
SPINE 0.8.7a Copyright 2002-2007 by The Cacti Group
....................
I already upgraded yesterday.
If you have any advice on where to look (for example, which parts of the strace output are important), I could perform a more detailed investigation.
Regards
Jacek Nykis
-
- Posts: 23
- Joined: Fri Jul 20, 2007 10:24 am
Hi,
Thank you very much for your help.
I compiled the SVN version, but it did not fix my problem.
I used the source from:
spine/branches/main/, and I also tried the one from spine/branches/0.8.7.
Strangely, there is no entry in the ChangeLog except the one for 0.8.7a.
I will monitor SVN state and try to compile new versions when they appear and let you know if any of them fixes the problem.
-
- Posts: 23
- Joined: Fri Jul 20, 2007 10:24 am
Hi,
Unfortunately, the SVN code still does not work for me. I did some debugging.
It is possible that my problem is caused by a wrong result from the snmp_host_init function.
In poller.c, close to line 483, I can see:
poll_result = snmp_get(host, reindex->arg1);
This function returns a value (like "960328001") for the first host checked, but for the rest of them I get "U", in which case the host->ignore_host flag gets set.
I checked the host->snmp_session pointer values, and here is how they are set:
135095600 for the first host, which works fine
135150640 for the second, which fails
135150640 for the third, which fails as well
I found that this value is initialized at line 351 by the snmp_host_init function, so possibly this function is misbehaving.
Can you have a look at this problem or give me some advice on how to debug it further?
Regards
Jacek