Hello.
I've read as much as I can find here regarding CPU stats accuracy, and I have a few questions. I started all of this this morning in an effort to include IOWait on my CPU graphs. My apologies if I've missed something that makes the following banter redundant.
First off, thanks to Mr. Heisenberg, the notion of measuring CPU usage on the cacti box itself is suspect since the polling operation is going to introduce some cpu noise.
Also, measuring a 2-CPU box via SNMP generates values that are roughly double, as ssCPURawXXXX doesn't handle multiple CPUs effectively (the simple solution is to divide the SNMP CPU values by 2 via a CDEF).
I've also noticed, unfortunately, that my particular build (CentOS 3) doesn't seem to support SNMP gathering of IOWait and IRQ values. Unfortunate, since IOWait is what I was going for. I suppose I could graph what I have available (which includes CPUIdle) and assume the difference is IOWait/IRQ. I played with that a bit, but it's an ugly hack.
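For what it's worth, the "assume the difference" hack can be sketched roughly as below. This is only an illustration in Python, not anything Cacti does internally. The function name and the sample numbers are made up, and it assumes the inputs are already percentages of total CPU time (that is, already divided by the number of CPUs).

```python
def residual_percent(user, nice, system, idle):
    """Return the share of CPU time not covered by the four known states.

    Assumes every input is a percentage of total CPU time (0-100).
    The result lumps IOWait, IRQ and anything else together.
    """
    known = user + nice + system + idle
    # Rounding in the agent can push the sum slightly above 100, so clamp at 0.
    return max(0.0, 100.0 - known)


# Hypothetical sample: 12% user, 0% nice, 5% system, 70% idle.
print(residual_percent(12.0, 0.0, 5.0, 70.0))
```

The weakness is the same one mentioned above: the leftover cannot tell IOWait apart from IRQ time or from plain measurement error.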
I began playing with sar results and graphing them. I like what I see, though I noticed that sar -u 1 is much, much more sensitive to measurement influence than the SNMP values. I would assume that is because the values retrieved through SNMP queries are actually recorded at a different time and are not impacted by the polling operation, while my sar call runs in the midst of the polling battle. I adjusted it to sar -u 5 1, which seems to give the system a little time to settle down before my value is generated (I have single-threaded all of my data requests).
I suppose I could just use the trailing line from sar itself, though I currently only have that gathering every 10 minutes and am loath to double my sa1 logs.
Just curious what other people have done and if I'm being an idiot about anything in terms of my exploration and assumptions.
For what it's worth, here's what my SAR-based graph is looking like.
CPU statistics accuracy, part XVII
- Attachments
- screen 2006-08-21 13_29_49.jpg (35.17 KiB)
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
I'm using much the same approach with iostat. I take 16 cycles and average the last 15 via some sh/awk magic. Due to the long runtime, I have to run those commands from crontab on the target machines, write the results to some magic files, and access them via HTTP.
It's very cruel, and I do not like it very much.
Reinhard
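The averaging step described above might look something like this in Python. It is a sketch only: the function name and sample values are invented, and the real setup uses sh/awk. Dropping the first cycle is presumably done because on many systems the first iostat report covers the time since boot rather than the last interval.

```python
def average_after_warmup(samples, skip=1):
    """Average a list of readings after discarding the first `skip` of them."""
    if len(samples) <= skip:
        raise ValueError("need more samples than the number skipped")
    kept = samples[skip:]
    return sum(kept) / len(kept)


# Hypothetical run: 16 cycles, the first one is an outlier and is dropped.
readings = [40.0] + [10.0] * 15
print(average_after_warmup(readings))
```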
Comparing SNMP counters to sar -u 5 1
I have compared the HP-UX SNMP agent (the default agent shipped with HP-UX) with sar -u 5 1.
sar -u 5 1 gives a 5-second average at every poll,
while the SNMP values are a 5-minute average at every poll.
See the attached images for the differences.
- Attachments
- sar -u 5 1 CPU utilization Host 1: SAR_1.png (23.82 KiB)
- SNMP CPU utilization Host 1: SNMP_1.png (15.04 KiB)
- sar -u 5 1 CPU utilization Host 2: SAR_2.png (19.08 KiB)
- SNMP CPU utilization Host 2: SNMP_2.png (16.26 KiB)
- gandalf
Well, you are comparing very different things, even if that is not obvious.
If done well (as with cacti), SNMP data gathering should access the RawCPU OIDs. They increase with each processor "tick" that is executed. So it captures even the cycles between 2 polls (5 min apart), as the raw counter keeps increasing even when cacti is not looking at it.
AFAIK, the sar command looks at the "current" load of the CPU and hence will not see what's going on in between.
just my 2 cents
Reinhard
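To illustrate the counter idea, here is a rough Python sketch of the arithmetic. It is not Cacti's or RRDtool's actual code. The function name and the tick values are made up, and it assumes the counters did not wrap or reset between the two polls.

```python
def cpu_percent_from_counters(prev, curr):
    """Turn two snapshots of raw tick counters into percentages per state.

    `prev` and `curr` map a state name (e.g. "user") to its raw counter value.
    Assumes no counter wrapped or was reset between the two snapshots.
    """
    deltas = {state: curr[state] - prev[state] for state in curr}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("counters did not advance between polls")
    # Dividing by the total number of ticks seen makes the result add up to 100.
    return {state: 100.0 * delta / total for state, delta in deltas.items()}


# Hypothetical polls taken 5 minutes apart.
earlier = {"user": 1000, "system": 500, "idle": 8500}
later = {"user": 7000, "system": 2500, "idle": 60500}
print(cpu_percent_from_counters(earlier, later))
```

Every tick between the two polls is counted, so short bursts are included in the 5-minute figure. Dividing by the sum of all deltas also keeps the result at 100% regardless of CPU count, but only if every state the agent counts is included in the snapshots.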
I just posted a link to a way of getting IO stats from SNMP - have you looked into doing all of that? It looks like a lot of work for us on over 100 AIX machines, but it might well be worth it in the end.
gandalf wrote:
I'm using much the same approach with iostat. I take 16 cycles and average the last 15 via some sh/awk magic. Due to the long runtime, I have to run those commands from crontab on the target machines, write the results to some magic files, and access them via HTTP.
It's very cruel, and I do not like it very much.
Reinhard
http://forums.cacti.net/viewtopic.php?t=6072&start=75
You mentioned that on a 2-CPU system the stats were all doubled and you had to divide by two. I've found that this depends heavily on what platform net-snmp is running on and what OS is on that platform. As I recall, on multi-CPU Solaris machines the net-snmp CPU stats all added up to 100%, on most (all?) Linux boxes they added up to Num_Cpus*100%, and on some other systems it was Num_Cpus*10*100%. I think there was even one other odd variation (a really old AIX or HP-UX maybe?) where I had to multiply by 2 for a single-processor machine, multiply by 4 for a 4-processor machine, etc. The more processors, the smaller the fraction of 100% the values would add up to. Weird.
Another thing I found was that I needed to monitor different CPU stats on different platforms to get everything to add up to 100%. On some systems, for instance, net-snmp returns separate values for System and Kernel. On some systems, as I recall, some of the returned values are the sum of 2 or 3 other values you can query (e.g., System being the sum of Kernel and IOWait).
Unfortunately, all the templates I did for the different platforms were at a previous employer. I'll see if I can get 'em to give me copies for public release -- or you can just do what I did and run a few snmpwalks on each platform 'n see what adds up to 100% and what values are returned on which platforms.
Brent
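If anyone wants to automate the "see what adds up to 100%" check, something along these lines might help. It is just a Python sketch: the function name, the tolerance and the sample numbers are all made up, and it assumes you have already turned the walk output into percentages per stat.

```python
from itertools import combinations


def subsets_summing_to(values, target=100.0, tolerance=0.5):
    """List every combination of stat names whose values add up to `target`.

    `values` maps a stat name to its percentage. The search is brute force,
    which is fine for the handful of CPU stats an agent returns.
    """
    names = sorted(values)
    matches = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if abs(sum(values[name] for name in combo) - target) <= tolerance:
                matches.append(combo)
    return matches


# Hypothetical platform where System is the sum of Kernel and IOWait.
sample = {"user": 10.0, "system": 8.0, "kernel": 5.0, "iowait": 3.0, "idle": 82.0}
for combo in subsets_summing_to(sample):
    print(combo)
```

On a platform where the total is Num_Cpus*100%, pass that figure as `target` instead.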