SNMP Timeout

Post support questions that relate to the Windows 2003/2000/XP operating systems.

mikebruns
Posts: 4
Joined: Mon Jul 04, 2005 8:17 am
Location: London

SNMP Timeout

Post by mikebruns »

Win2K3
IIS 6
Cacti 0.8.6f
PHP 5.0.4
mySQL 4.0.24
ActivePerl 5.8.0
RRDTool 1.2.9

Recently upgraded from Win2K server to Win2K3 (on new hardware) and imported across Cacti settings/dbase. All graphs working OK. Running poller.php from cmd line gives no errors (74 hosts, 2103 DS's & 1054 RRDs).

Problem arises when querying existing devices/adding new devices.

When I either reload the data query or use the verbose method to check ifNames (etc), the resulting web page is blank (ie rather than displaying the data query debug info, the browser's status bar shows 'Done' and the page itself is empty). The strange thing is that this works fine with devices that have a low RTT (ie under 100 ms). So I can successfully reload the query on a device in London or New York, but not on one in Hong Kong, for example.

I have increased the SNMP timeout value for the device, and also max_execution_time in php.ini, but neither seemed to make a difference.

Are there any other settings that can influence the snmp functions?
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

1) What ping averages are you seeing to the remote sites that are timing out?
2) What are the timeout values you've set in Cacti and php?
3) Does using a tool like Getif or snmpget work?
mikebruns
Posts: 4
Joined: Mon Jul 04, 2005 8:17 am
Location: London

Post by mikebruns »

1) Anything over 250ms
2) 10000 ms on Cacti SNMP timeout, 300s on php.ini max_execution_time
3) yes (getif works OK)

There seems to be a correlation between latency & the no. of interfaces the router returns, and whether or not it succeeds in returning snmp data.

ie I can successfully enter into Cacti a new router in Hong Kong that has almost no interfaces, and receive/view all snmp data.

However, a device in Hong Kong that has many interfaces (say a switch or voice router with many dial-peers) fails. This can be consistently replicated - high latency/low # of i/f is OK; high latency/high # of i/f is not.
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Post by TheWitness »

BSOD,

This is at least the third time I have seen this exact behavior. I did some online troubleshooting with one user but could not find the source. The last time I talked with one of the users, he was migrating to Apache to solve it. If you figure this one out, we have to post a sticky.

Larry
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

Did you also increase the max_input_time in php.ini ?

Heh, 10000ms seems a little excessive too.

When you turn on debug in Cacti and view the log, does it say anywhere that the host is unreachable?
mikebruns
Posts: 4
Joined: Mon Jul 04, 2005 8:17 am
Location: London

Post by mikebruns »

I think this is definitely something to do with the # of snmp interfaces a device runs. I have another device with an RTT of 10ms which also displays this issue - verbose/reload query returns a blank web page.

Interestingly, the number of returned items varies each time you load the query. In the above example, I received 3546 items in 591 rows, and the next time 3898 items in 591 rows. (This router runs voice services with Cisco SRST, hence the large no. of i/f's.)

I have now updated php.ini max_execution_time and max_input_time to 300 seconds.
I agree that 10000ms is a little excessive, but hey!
Also changed Settings | poller | Script & Script server timeout value to 300s
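
For anyone checking the same settings: the php.ini directives are spelled with underscores (max_execution_time, max_input_time). A minimal Python sketch for confirming what a given php.ini actually sets is below. The helper name read_php_limits and the sample excerpt are made up for illustration, and it only reads the text you hand it, so make sure that is the php.ini your web server really loads.

```python
import re

# Directive names as PHP spells them (underscores, not hyphens).
DIRECTIVES = ("max_execution_time", "max_input_time")


def read_php_limits(ini_text):
    """Return the last uncommented integer value of each directive (seconds)."""
    limits = {}
    for line in ini_text.splitlines():
        # Everything after ';' is a comment in php.ini.
        line = line.split(";", 1)[0].strip()
        match = re.match(r"^(\w+)\s*=\s*(-?\d+)$", line)
        if match and match.group(1) in DIRECTIVES:
            limits[match.group(1)] = int(match.group(2))
    return limits


# Hypothetical php.ini excerpt, for illustration only.
SAMPLE = """
[PHP]
;max_execution_time = 30
max_execution_time = 300   ; seconds
max_input_time = 300
"""

if __name__ == "__main__":
    print(read_php_limits(SAMPLE))
```

A directive that is missing from the returned dict is either commented out or absent, in which case PHP falls back to its built-in default.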

Debug shows zero errors. We are getting graphs from this same host with no problems.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

I've read reports where excessive SNMP queries on a Win32 host can cause snmp to crap out. Maybe something similar is also happening on the Cisco devices? Can you turn on any debug for snmp on a Cisco router while doing the verbose queries, and see if anything useful / interesting turns up?

Can you successfully walk the interface OID with something like snmpwalk (net-snmp)? That would help to determine whether it's a php/cacti issue or a device issue.
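
A rough sketch of that test, written as a small Python wrapper that builds the net-snmp snmpwalk command line. The host address, the community string, the choice of the ifDescr column and the v2c default are all placeholders/assumptions, not values from this thread. Note that snmpwalk's -t option is in seconds, whereas the Cacti SNMP timeout is in milliseconds.

```python
import subprocess

# IF-MIB ifDescr column; one reasonable choice of "interface OID" to walk.
IF_DESCR_OID = "1.3.6.1.2.1.2.2.1.2"


def snmpwalk_cmd(host, community, timeout_s=10, retries=1,
                 version="2c", oid=IF_DESCR_OID):
    """Build the argument list for net-snmp's snmpwalk."""
    return ["snmpwalk", "-v", version, "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, oid]


def walk(host, community, **kwargs):
    """Run the walk and return its output lines (needs net-snmp installed)."""
    result = subprocess.run(snmpwalk_cmd(host, community, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # 192.0.2.10 and "public" are placeholders; substitute your own device.
    print(" ".join(snmpwalk_cmd("192.0.2.10", "public")))
```

Calling walk() twice against the same device and comparing the number of lines returned is a quick way to see whether the device itself returns a different number of rows from one walk to the next, independent of PHP and Cacti.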
mikebruns
Posts: 4
Joined: Mon Jul 04, 2005 8:17 am
Location: London

Post by mikebruns »

Well, I have to say I'm not sure what's happened, but this is now working OK! I did reboot the server, but can't believe this caused the change... mind you, this is MS!!

I'm wondering whether the reboot is coincidental, and the real culprit is contention on the leased lines we have going to these remote sites.

Anyway, thanks for your suggestions and help BSOD2600.

I may start a new post, re: your Win32 process monitoring templates which I've got loaded, but not graphing!
bulek
Cacti Pro User
Posts: 854
Joined: Mon May 20, 2002 2:07 am
Location: Poland
Contact:

Post by bulek »

Well... at 250ms, doing 4000 snmp requests one after another will take at least 1000 sec... right? This is another good reason to let Cacti use (read: implement :) ) the snmpbulkwalk command, which in the case of high latency can download any snmp table 10x faster than snmpwalk.
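
The arithmetic above can be sketched in a few lines of Python. This is only a lower bound: it assumes strictly sequential requests, one round trip each, and ignores device processing time, retries and packet loss. The per_request value of 10 is an assumption picked to match the "10x" figure; the real gain depends on how many values the device packs into each bulk response.

```python
import math


def walk_seconds(num_objects, rtt_s, per_request=1):
    """Lower bound on walk time: one round trip per sequential request."""
    requests = math.ceil(num_objects / per_request)
    return requests * rtt_s


if __name__ == "__main__":
    # One object per request, as with a plain walk at 250 ms RTT.
    print(walk_seconds(4000, 0.25))        # 1000.0
    # Assumed 10 objects per bulk request.
    print(walk_seconds(4000, 0.25, 10))    # 100.0
```

Either figure is well beyond a 300 second script limit in the first case and comfortably inside it in the second, which fits the "many interfaces plus high latency" pattern reported earlier in the thread.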

- Piotr
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Post by TheWitness »

I am considering that as well as sending a large PDU to the host. Bulek, to your knowledge, this all requires v2c right?

Larry
bulek
Cacti Pro User
Posts: 854
Joined: Mon May 20, 2002 2:07 am
Location: Poland
Contact:

Post by bulek »

Unfortunately yes... and it also requires the snmp service on the device to handle "bulk" requests. Sometimes you can find a 2c implementation without "bulk" commands. It would be nice anyway to have it as an option. It's quite easy to implement since the output is (afaik) compatible with snmpwalk.
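
As a sketch of what such an option could look like (this is not how Cacti does it; the function name and the use_bulk switch are hypothetical), the Python below picks between the net-snmp snmpbulkwalk and snmpwalk command lines. Bulk is only chosen for v2c, and -Cr sets snmpbulkwalk's max-repetitions.

```python
def walk_command(host, community, oid, version="2c",
                 use_bulk=True, max_repetitions=10):
    """Return an snmpbulkwalk command for v2c when allowed, else snmpwalk."""
    if use_bulk and version == "2c":
        return ["snmpbulkwalk", "-v", version, "-c", community,
                f"-Cr{max_repetitions}", host, oid]
    # v1 devices, or agents without working bulk support, get a plain walk.
    return ["snmpwalk", "-v", version, "-c", community, host, oid]


if __name__ == "__main__":
    # Placeholder host, community and OID, for illustration only.
    print(walk_command("192.0.2.10", "public", "1.3.6.1.2.1.2.2.1.2"))
    print(walk_command("192.0.2.10", "public", "1.3.6.1.2.1.2.2.1.2",
                       use_bulk=False))
```

Keeping use_bulk as a per-device switch covers the case mentioned above, where a device speaks 2c but does not answer bulk requests properly.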

- Piotr
kdsm
Posts: 41
Joined: Thu Mar 17, 2005 5:00 pm

Post by kdsm »

I was having the same problem. I would build a device (Cisco Router) and would not get anything. But others worked fine.
But I made the changes you outlined in your post for the Cacti SNMP timeout and max_execution_time, and that may be the fix.

thanks..
lard
Cacti User
Posts: 165
Joined: Wed Jul 20, 2005 10:48 am
Location: UK - Cambridge

Post by lard »

Hi,

Experiencing the same problem...changed the variables and have scheduled a reboot of the Cacti server tonight so fingers crossed!

Thanks for the pointers,

Lard
---- lard007skype ----
lard
Cacti User
Posts: 165
Joined: Wed Jul 20, 2005 10:48 am
Location: UK - Cambridge

Post by lard »

Hi,

Didn't solve the issue, but it all appears to be related to SNMP timeouts from the devices - doing an snmpwalk takes an age from a dodgy device, yet another identical switch that responds a lot quicker is fine...

I have changed all the timeout values I can think of and am thinking that there is something that is still using a lower value... does rrdtool have a timeout setting?

Thanks,

Lard