SNMP Timeout
Win2K3
IIS 6
Cacti 0.8.6f
PHP 5.0.4
mySQL 4.0.24
ActivePerl 5.8.0
RRDTool 1.2.9
Recently upgraded from Win2K server to Win2K3 (on new hardware) and imported across Cacti settings/dbase. All graphs working OK. Running poller.php from cmd line gives no errors (74 hosts, 2103 DS's & 1054 RRDs).
Problem arises when querying existing devices/adding new devices.
When trying either reloading the data query or using the verbose method to check ifNames (etc), the resultant web page is blank (ie rather than displaying the data query debug info, the status bar of the web page displays 'Done' and the page displayed is blank). The strange thing is that this works fine with devices that have a low RTT (ie under 100 ms). So I can successfully reload the query on a device in London or New York, but not in Hong Kong for example.
I have increased the SNMP timeout value for the device, and also max_execution_time in php.ini, but neither seemed to make a difference.
Are there any other settings that can influence the snmp functions?
1) What type of ping averages are you experiencing to these remote sites which are timing out?
2) What are the timeout values you've set in Cacti and php?
3) Does using a tool like Getif or snmpget work?
1) Anything over 250ms
2) 10000 ms on Cacti SNMP timeout, 300 s on php.ini max_execution_time
3) yes (getif works OK)
There seems to be a correlation between latency & the no. of interfaces the router returns, and whether or not it succeeds in returning snmp data.
ie I can successfully enter into Cacti a new router in Hong Kong that has almost no interfaces, and receive/view all snmp data.
However, a device in Hong Kong that has many interfaces (say a switch or voice router with many dial-peers) fails. This can be consistently replicated - high latency/low # of i/f is OK; high latency/high # of i/f is not.
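One way to reason about this latency/interface-count correlation is to estimate how long a walk takes if every returned item costs one network round trip, as it does in a GETNEXT-style walk. The following is only a rough sketch under that assumption, not a description of how Cacti's data query code works internally; the item counts and RTTs are illustrative, and `estimate_walk_seconds` and `exceeds_limit` are made-up helper names.

```python
def estimate_walk_seconds(items: int, rtt_ms: float) -> float:
    """Lower bound on walk time, assuming one request/response round trip per item."""
    return items * rtt_ms / 1000.0


def exceeds_limit(items: int, rtt_ms: float, limit_s: float) -> bool:
    """True when the estimated walk time is longer than a given time limit."""
    return estimate_walk_seconds(items, rtt_ms) > limit_s


if __name__ == "__main__":
    # Illustrative values only: (label, items returned, RTT in ms).
    cases = [
        ("few interfaces, high latency", 100, 250),
        ("many interfaces, low latency", 3500, 10),
        ("many interfaces, high latency", 3500, 250),
    ]
    for label, items, rtt_ms in cases:
        secs = estimate_walk_seconds(items, rtt_ms)
        over = exceeds_limit(items, rtt_ms, 300)
        print(f"{label}: ~{secs:.0f} s, over a 300 s limit: {over}")
```

Under these assumptions only the high latency/many items case runs past a 300 s limit, which would fit the pattern described, but it is an estimate and not a confirmed cause.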
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
BSOD,
This is at least the third time I have seen this exact behavior. I did some online troubleshooting with one user but could not find the source. The last time I talked with one of the users, he was migrating to Apache to solve it. If you figure this one out, we have to post a sticky.
Larry
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
Did you also increase the max_input_time in php.ini ?
Heh, 10000ms seems a little excessive too.
When you turn on debug in Cacti and view the log, does it say the host is unreachable?
I think this definitely has something to do with the number of SNMP interfaces a device runs. I have another device with an RTT of 10 ms which also shows this issue: verbose/reload query returns a blank web page.
Interestingly, the returned items/rows vary each time you load the query. In the above example, I received 3546 items in 591 rows, and the next time 3898 items in 591 rows. (This router runs voice services with Cisco SRST, hence the large no. of i/f's.)
I have now updated max_execution_time and max_input_time in php.ini to 300 seconds.
I agree that 10000ms is a little excessive, but hey!
Also changed Settings | poller | Script & Script server timeout value to 300s
Debug shows zero errors. We are getting graphs from this same host with no problems.
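Since the php.ini directives are spelled with underscores (max_execution_time, max_input_time), it can be worth confirming that the file really contains uncommented lines with those exact names. Below is a minimal sketch that scans php.ini text for them; `read_php_ini_values` is a hypothetical helper, the sample text is invented, and whether the file you check is the one PHP actually loads under IIS is not verified here.

```python
import re
from typing import Dict, List


def read_php_ini_values(ini_text: str, names: List[str]) -> Dict[str, str]:
    """Return the last uncommented value found for each requested directive name."""
    found: Dict[str, str] = {}
    for raw in ini_text.splitlines():
        line = raw.strip()
        # Skip blanks, comment lines and section headers such as [PHP].
        if not line or line.startswith(";") or line.startswith("["):
            continue
        match = re.match(r"([A-Za-z0-9_.\-]+)\s*=\s*(.*)", line)
        if match and match.group(1) in names:
            # Drop any trailing inline comment after the value.
            found[match.group(1)] = match.group(2).split(";", 1)[0].strip()
    return found


if __name__ == "__main__":
    # Invented sample; a hyphenated name is not one of the requested directives.
    sample = """
[PHP]
; max_execution_time = 30
max_execution_time = 300
max_input_time = 300   ; seconds
max-execution-time = 300
"""
    print(read_php_ini_values(sample, ["max_execution_time", "max_input_time"]))
```

A directive missing from the result would suggest the edit was made under a different name or on a commented-out line.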
I've read reports where excessive SNMP queries on a Win32 host can cause snmp to crap out. Maybe something similar is also happening on the Cisco devices? Can you turn on any SNMP debug on a Cisco router while doing the verbose queries and see if anything useful/interesting turns up?
Can you successfully walk the interface OID with something like snmpwalk (net-snmp)? That would help determine whether it's a PHP/Cacti issue or a device issue.
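A walk like that can also be timed from a script, which makes it easier to compare a slow device with a fast one and to see whether the row count changes between runs. This is a sketch only: it assumes net-snmp's `snmpwalk` is on the PATH, and the host address, community string and helper names (`build_snmpwalk_cmd`, `time_walk`) are placeholders, not anything from the thread.

```python
import subprocess
import time
from typing import List, Tuple

IF_TABLE_OID = "1.3.6.1.2.1.2.2"  # IF-MIB ifTable, given numerically so no MIB files are needed


def build_snmpwalk_cmd(host: str, community: str, oid: str = IF_TABLE_OID,
                       timeout_s: int = 10, retries: int = 1,
                       version: str = "2c") -> List[str]:
    """Build a net-snmp snmpwalk command line with an explicit timeout and retry count."""
    return ["snmpwalk", "-v", version, "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, oid]


def time_walk(cmd: List[str]) -> Tuple[int, float, int]:
    """Run the walk and return (exit code, elapsed seconds, number of output lines)."""
    start = time.monotonic()
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    elapsed = time.monotonic() - start
    return proc.returncode, elapsed, len(proc.stdout.splitlines())


if __name__ == "__main__":
    # 192.0.2.10 and "public" are placeholder values; substitute a real device.
    cmd = build_snmpwalk_cmd("192.0.2.10", "public")
    print(" ".join(cmd))
    # Uncomment to run against a real host:
    # code, secs, rows = time_walk(cmd)
    # print(f"exit={code} elapsed={secs:.1f}s rows={rows}")
```

If the walk completes from the command line in a time that the web page cannot tolerate, that points toward the PHP/web server side and away from the device.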
Well, I have to say I'm not sure what's happened, but this is now working OK! I did reboot the server, but can't believe this caused the change...mind you, this is MS!!
I'm wondering whether the reboot is coincidental, and the real culprit is contention on the leased lines we have going to these remote sites.
Anyway, thanks for your suggestions and help BSOD2600.
I may start a new post, re: your Win32 process monitoring templates which I've got loaded, but not graphing!
I am considering that as well as sending a large PDU to the host. Bulek, to your knowledge, this all requires v2c right?
Larry
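If "sending a large PDU" here means something like a GETBULK request, then yes: GETBULK was introduced with SNMPv2, so it needs v2c or v3. The sketch below compares round trips for the two approaches. It assumes one item per GETNEXT request and that the agent fills every GETBULK response up to max-repetitions, which real agents may not do; the helper names are made up, and 3546 is the item count reported earlier in the thread.

```python
import math


def getnext_round_trips(items: int) -> int:
    """One request/response pair per item (single varbind per GETNEXT assumed)."""
    return items


def getbulk_round_trips(items: int, max_repetitions: int) -> int:
    """Round trips if every GETBULK response carries max_repetitions items."""
    return math.ceil(items / max_repetitions)


if __name__ == "__main__":
    items = 3546          # item count mentioned earlier in the thread
    max_repetitions = 25  # illustrative value, not a Cacti setting
    print("GETNEXT round trips:", getnext_round_trips(items))
    print("GETBULK round trips:", getbulk_round_trips(items, max_repetitions))
```

On a high-latency link the walk time scales with the number of round trips, so fewer, larger responses would shorten it under these assumptions.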
Hi,
Didn't solve the issue, but it all appears to be related to SNMP timeouts from the devices - doing an snmpwalk takes an age from a dodgy device, yet another identical switch that responds a lot quicker is fine...
I have changed all the timeout values I can think of and am thinking that there is something that is still using a lower value...does rrdtool have a timeout setting?
Thanks,
Lard