Strange behaviour of poller when switching to snmpv3

calvinII
Posts: 4
Joined: Sat Apr 16, 2011 3:42 pm
Location: Karlsruhe, Germany

Strange behaviour of poller when switching to snmpv3

Post by calvinII »

Hi,
I am seeing strange timeouts after switching some hosts from SNMP v1/v2 to SNMP v3. Further searching and debugging gave the following results:

- After enabling some of these hosts (4-5 hosts), the poller time increased from a few (sub)seconds to minutes
- In the log I see this for some of these hosts:
...
04/16/2011 10:35:26 PM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'hostname.invalid', and OID:'.1.3.6.1.4.1.2021.10.1.3.1'
The same appears for every OID (at the moment, for debugging, only 3 OIDs are polled...)
...
- if there are too many of these "timeouts", the poller reaches its maximum runtime before getting all values
- different poller settings tested: 1 Minute Cron / 20 Seconds Spine | 5 Minute Cron / 1 Minute poller | 1 Minute Cron / 1 Minute poller
- no problems with SNMPv2
- I fiddled with the SNMP timeouts and other SNMP settings -> no change in result
- this doesn't happen with every SNMPv3-enabled host; only some of them show this phenomenon
- the hosts are virtualized using Xen
- the hosts are easily reachable in the network
- the monitored hosts are installed identically ("script based")
- snmpwalk, snmpget and snmpbulk* on the command line work perfectly
- cacti also gets the host info via SNMP on the device page (so the SNMP parameters are set correctly)
- if I disable all other SNMPv3 hosts and enable only one of the hosts with these timeouts, cacti successfully receives all queried data
- the result is independent of the poller used (tested with cmd.php and spine)
- the result is independent of the SNMPv3 setting (authNoPriv / authPriv), with either MD5/DES or SHA/AES
- Setup is Debian squeeze cacti 0.8.7g with "Ping Response Fix" and "One Minute Polling Interval Fix"
- tested with poller: cmd.php / spine (0.8.7f Debian squeeze) / spine (0.8.7g-1+b1 Debian wheezy compiled against squeeze)

On the command line I tested spine directly (logs attached):
/usr/sbin/spine --verbosity=5 -H "93, 90, 91, 92" &> spine_three.log
# Host-id 93 timed out, so a single shot:
/usr/sbin/spine --verbosity=5 -H "93" &> spine_one.log
#

So, can someone give me some hints on how to get this setup running?
My goals for this setup:
- Poller interval down to 20 seconds for critical values (using Spine)
- Use SNMPv3 authPriv exclusively, for normal hosts and for devices that support it...

Cheers,
Ulli
Attachments
spine_three.log.txt
(8.85 KiB) Downloaded 132 times
spine_one.log.txt
One Host timed out
(5.12 KiB) Downloaded 145 times
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Strange behaviour of poller when switching to snmpv3

Post by gandalf »

This is quite strange.
The timeout can easily be explained: if cacti's SNMP requests fail for each OID, polling takes ages to complete (that's why we do the "downed host detection" check).
But in your case, the UDP test is OK, and SNMP v3 then fails on host 93. I would ask why you perform a UDP check when using SNMP afterwards. This setup is valid and should work, but it does not make much sense.

And then, from your findings, it depends on whether the failing host is polled alone (then it works) or along with other hosts (then it fails). I have never seen this before. What makes it even stranger is that cmd.php and spine behave the same way.
For a spine-only test, I'd like you to download the latest spine from the 087 trunk and compile it on your machine.
And please provide the SNMP v3 parameter set (I don't need the exact credentials) used for that host.
R.
calvinII
Posts: 4
Joined: Sat Apr 16, 2011 3:42 pm
Location: Karlsruhe, Germany

Re: Strange behaviour of poller when switching to snmpv3

Post by calvinII »

Hi gandalf,

I changed the "host alive test" on this machine to "ping or snmp", only for testing, to see whether just the SNMP ping fails ;-/ And yes, the SNMP ping fails in the same way as the data queries...
One possibly interesting note: all SNMPv3 hosts are cloned from the same image, and I have problems on some of these hosts, but not on all. So if the installation of net-snmp's snmpd creates a unique ID somewhere at installation time, this ID is now shared by all monitored hosts...

Here are my SNMPv3 settings (I changed them many times :-| ):
Downed Detection: snmp (tried ping or snmp in testing scenario)
Ping Timeout Value: 5000 (changed this value between 1000 and 5000)
Ping retry count: 1
SNMP-Version: 3
Username (v3): admin
Password: *****
Auth-Protocol: SHA
Privacy-Passphrase: *****
Privacy-Protocol: DES
SNMP context: <empty>
SNMP Port: 161
SNMP-Timeout: 5000 (this was my initial value, changed this between 1000 and 5000)
Maximum OID's per Get Request: 40 (Changed between 1 and 50)

I'll try Spine from trunk in the evening...

Cheers,
Ulli
calvinII
Posts: 4
Joined: Sat Apr 16, 2011 3:42 pm
Location: Karlsruhe, Germany

Re: Strange behaviour of poller when switching to snmpv3

Post by calvinII »

Hi,
I think the problem lies in some sort of "unique ID" that is generated when snmpd is installed.
After reinstalling snmpd on all "buggy" hosts, polling now works perfectly:

On every host:
aptitude purge snmpd
aptitude install snmpd
/etc/init.d/snmpd stop
net-snmp-config --create-snmpv3-user -ro -A $a_pass -a SHA -X $p_pass -x DES admin
echo -ne "$SNMPD_CONTENT" > /etc/snmp/snmpd.conf
/etc/init.d/snmpd start

So my next question is:
When is this ID generated, where is it stored, and how can I change it "manually" for my cloning process?

Cheers,
Ulli
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Strange behaviour of poller when switching to snmpv3

Post by gandalf »

Thanks for your last posting.
This was new to me. Would you mind starting a discussion on the net-snmp-users mailing list about this topic? I'm quite sure that system cloning is quite common these days ...
R.
calvinII
Posts: 4
Joined: Sat Apr 16, 2011 3:42 pm
Location: Karlsruhe, Germany

Re: Strange behaviour of poller when switching to snmpv3

Post by calvinII »

Hi Gandalf,

My remaining questions are now answered as well:
The ID that is generated is the "engine ID". It must be unique for every agent and is used in communication with the agent.
If you regenerate this ID, you must also recreate all SNMPv3 users.
How to regenerate the engine ID and users:

Stop snmpd.

Edit /var/lib/snmp/snmpd.conf (on some other distributions you will find this file in /usr/share/snmpd or /var/ucd-snmp/):
- remove all lines beginning with usmUser
- remove the line beginning with oldEngineID
- insert a line for every user with its credentials:

# replace SHA and DES with the authentication and privacy protocols you use
createUser $username SHA $auth_pass DES $priv_pass

Start snmpd again.

I put this file into my image, and after a clone starts up, everything necessary is generated uniquely at snmpd startup :D
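
The file edits above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: reset_snmpv3_state is a hypothetical helper name, it handles a single user with SHA/DES as in the example line, and it expects passphrases without whitespace. snmpd has to be stopped before it runs and started again afterwards.

```shell
#!/bin/sh
# Sketch of the cleanup steps for a persistent snmpd.conf.
# reset_snmpv3_state is a hypothetical helper name.
# Usage: reset_snmpv3_state FILE USER AUTH_PASS PRIV_PASS
reset_snmpv3_state() {
    conf=$1 user=$2 auth_pass=$3 priv_pass=$4
    tmp=$(mktemp) || return 1
    # Drop the stored users and the old engine ID, keep everything else.
    grep -v -E '^(usmUser|oldEngineID)' "$conf" > "$tmp"
    # Append the createUser line so the user is created again on the next start.
    printf 'createUser %s SHA %s DES %s\n' \
        "$user" "$auth_pass" "$priv_pass" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

For example, between stopping and starting snmpd: reset_snmpv3_state /var/lib/snmp/snmpd.conf admin "$a_pass" "$p_pass"
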

Cheers,
Ulli

Information gathered from:
* RFC 3414
* http://docstore.mik.ua/orelly/networkin ... ppf_02.htm (mainly from chapter F.2.2)