CPU statistics accuracy, part XVII

zman818
Posts: 10
Joined: Sun Aug 13, 2006 5:30 pm

Post by zman818 »

Hello.

I've read as much as I can find here regarding CPU stats accuracy, and I have a few questions of sorts. I began this entire episode this morning in an effort to include IOWait on my CPU graphs. My apologies if I've missed something that makes the following banter redundant.


First off, thanks to Mr. Heisenberg, the notion of measuring CPU usage on the Cacti box itself is suspect, since the polling operation is going to introduce some CPU noise.

Also, measuring a 2-CPU box via SNMP generates values that are roughly double, as ssCPURawXXXX doesn't handle multiple CPUs effectively (the simple solution is to divide the SNMP CPU values by 2 via a CDEF).

I've also noticed that my particular build (CentOS 3) doesn't seem to support SNMP gathering of IOWait and IRQ values. Unfortunate, since IOWait is what I was going for. I suppose I could graph what I have available (which includes CPUIdle) and assume that whatever is left over from 100% is IOWait/IRQ. I played with that a bit, but it's an ugly hack.
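To make the two workarounds above concrete, here is a minimal Python sketch of the arithmetic: divide each category by the CPU count, then treat whatever is missing from 100% as the unmeasured IOWait/IRQ share. The function names and the sample readings are made up for illustration; they are not Cacti or net-snmp names.

```python
def normalize_cpu(values, num_cpus):
    """Scale per-category CPU percentages so a fully busy box reads 100."""
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    return {name: value / num_cpus for name, value in values.items()}


def unaccounted(values):
    """Return the share of 100% not covered by the given categories."""
    return max(0.0, 100.0 - sum(values.values()))


# Hypothetical readings from a 2-CPU box whose categories sum to about 200.
raw = {"user": 60.0, "nice": 0.0, "system": 20.0, "idle": 110.0}
scaled = normalize_cpu(raw, num_cpus=2)
print(scaled)
print(unaccounted(scaled))  # the residual that would be blamed on IOWait/IRQ
```

The residual absorbs every measurement error as well as the real IOWait/IRQ time, which is one reason it remains a hack.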

I began playing with sar results and graphing them. I like what I see, though I noticed that sar -u 1 is much, much more sensitive to measurement influence than the SNMP values. I would assume that is because the values retrieved through SNMP queries are actually recorded at a different time and are not affected by the polling operation, while my sar call runs in the midst of the polling battle. I changed it to sar -u 5 1, which seems to give the system a little time to settle down before my value is generated (I have single-threaded all of my data requests).
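For anyone feeding sar -u 5 1 into a script, a rough Python sketch of pulling the values out of its output might look like the following. It assumes the sysstat-style layout (a header line with a CPU column followed by % columns, and a final "Average:" line for "all" CPUs) and a locale that prints decimal points; other platforms may format things differently. parse_sar_average is a name made up for this sketch.

```python
def parse_sar_average(text):
    """Return {column: value} from the 'Average:' line of `sar -u` output.

    Assumes a header line containing a 'CPU' column followed by percentage
    columns, and an 'Average:' line whose values follow the 'all' token.
    """
    header = None
    for line in text.splitlines():
        tokens = line.split()
        if "CPU" in tokens and any(t.startswith("%") for t in tokens):
            # Remember the column names that follow the CPU column.
            header = tokens[tokens.index("CPU") + 1:]
        elif tokens and tokens[0] == "Average:" and header is not None:
            if "all" not in tokens:
                continue
            values = tokens[tokens.index("all") + 1:]
            return {name.lstrip("%"): float(v) for name, v in zip(header, values)}
    raise ValueError("no 'Average:' line found after a CPU header")
```

The text could come from something like subprocess.run(["sar", "-u", "5", "1"], capture_output=True, text=True).stdout. Mapping values by header name rather than by position means an extra column such as %iowait shows up automatically where the platform provides it.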

I suppose I could just use the trailing line from sar itself, though I currently only have that gathering every 10 minutes and am loath to double my sa1 logs.

Just curious what other people have done and if I'm being an idiot about anything in terms of my exploration and assumptions.

For what it's worth, here's what my SAR-based graph is looking like.
Attachments
screen 2006-08-21 13_29_49.jpg (35.17 KiB) Viewed 3877 times
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

I'm using much the same approach with iostat. I take 16 cycles and average the last 15 via some sh/awk magic. Because of the long runtime, I have to run those commands from crontabs on the target machines, write the results into some magic files, and access them via HTTP.
It's very cruel, and I do not like it very much.
Reinhard
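As an illustration of the "16 cycles, average the last 15" idea, here is a small Python sketch rather than the sh/awk actually used. One plausible reason for dropping the first cycle is that the first iostat report covers the time since boot instead of the sampling interval, though the post does not say so. The function name and the readings are hypothetical.

```python
def average_after_warmup(samples, skip=1):
    """Average the samples that remain after discarding the first `skip`."""
    kept = list(samples)[skip:]
    if not kept:
        raise ValueError("no samples left after discarding the warm-up")
    return sum(kept) / len(kept)


# Hypothetical: 16 readings, the first one an outlier that gets discarded.
readings = [42.0] + [10.0] * 15
print(average_after_warmup(readings))
```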
rr
Posts: 19
Joined: Fri Oct 06, 2006 9:17 am

Comparing SNMP counters to sar -u 5 1

Post by rr »

I have compared the HP-UX SNMP agent (the default agent shipped with HP-UX)
with sar -u 5 1.

sar -u 5 1 gives a 5-second average at every poll;
the SNMP values give a 5-minute average at every poll.

See attachment images for differences.
Attachments
sar -u 5 1 CPU utilization Host 1
SAR_1.png (23.82 KiB) Viewed 3572 times
SNMP CPU utilization Host 1
SNMP_1.png (15.04 KiB) Viewed 3572 times
sar -u 5 1 CPU utilization Host 2
SAR_2.png (19.08 KiB) Viewed 3572 times
SNMP CPU utilization Host 2
SNMP_2.png (16.26 KiB) Viewed 3572 times
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

Well, you are comparing very different things, even if that does not seem obvious.

If done well (as with Cacti), SNMP data gathering should access the raw CPU OIDs. These increase with each processor "tick" that is executed, so the cycles between two polls (5 minutes apart) are captured too: the raw counter keeps increasing even when Cacti is not looking at it.
AFAIK, the sar command looks at the "current" load of the CPU and hence will not see what's going on in between.

just my 2 cents
Reinhard
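The counter-based calculation described above can be sketched in Python as follows. This is an illustration of the general delta technique, not Cacti's actual code; it assumes 32-bit counters that wrap at most once between polls, and the category names and readings are hypothetical.

```python
COUNTER_MODULUS = 2 ** 32  # assumption: the agent exposes 32-bit counters


def counter_delta(previous, current, modulus=COUNTER_MODULUS):
    """Difference between two counter readings, allowing for one wrap."""
    return (current - previous) % modulus


def cpu_shares(previous, current):
    """Percentage of elapsed ticks per category between two polls."""
    deltas = {name: counter_delta(previous[name], current[name]) for name in current}
    total = sum(deltas.values())
    if total == 0:
        raise ValueError("counters did not advance between polls")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical readings taken 300 seconds apart; the idle counter wraps.
poll_1 = {"user": 1_000, "nice": 0, "system": 500, "idle": 4_294_960_000}
poll_2 = {"user": 7_000, "nice": 0, "system": 2_500, "idle": 14_704}
print(cpu_shares(poll_1, poll_2))
```

Because each share is a fraction of all ticks counted in the interval, the result covers the whole 5 minutes and does not depend on the number of CPUs, provided every category the agent counts is included. If a category such as IOWait is not exposed, the remaining shares are overstated.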
knobdy
Cacti User
Posts: 495
Joined: Wed Sep 28, 2005 1:39 pm

Post by knobdy »

gandalf wrote: I'm using much the same approach with iostat. I take 16 cycles and average the last 15 via some sh/awk magic. Because of the long runtime, I have to run those commands from crontabs on the target machines, write the results into some magic files, and access them via HTTP.
It's very cruel, and I do not like it very much.
Reinhard

I just posted a link to a way of getting I/O stats from SNMP. Have you looked into doing all of that? It looks like a lot of work for us on over 100 AIX machines, but it might well be worth it in the end.
http://forums.cacti.net/viewtopic.php?t=6072&start=75
bbice
Cacti User
Posts: 71
Joined: Mon May 13, 2002 6:53 pm

Post by bbice »

You mentioned that on a 2-CPU system the stats were all doubled and you had to divide by two. I've found that this depends heavily on what platform net-snmp is running on and what OS is on that platform. As I recall, on multi-CPU Solaris machines the net-snmp CPU stats all added up to 100%, on most (all?) Linux boxes they added up to Num_Cpus*100%, and on some other systems it was Num_Cpus*10*100%. I think there was even one other odd variation (a really old AIX or HP-UX maybe?) where I had to multiply by 2 for a single-processor machine, multiply by 4 for a 4-processor machine, etc. The more processors, the smaller the fraction of 100% the values would add up to. Weird.

Another thing I found was that I needed to monitor different CPU stats on different platforms to get everything to add up to 100%. On some systems, for instance, net-snmp returns separate values for System and Kernel. On some systems, as I recall, some of the values returned are the sum of 2 or 3 other values you can query (i.e., System being the sum of Kernel and IOWait).

Unfortunately, all the templates I did for the different platforms were at a previous employer. I'll see if I can get 'em to give me copies for public release -- or you can just do what I did and run a few snmpwalks on each platform 'n see what adds up to 100% and what values are returned on which platforms. :-)

Brent
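The "see what adds up to 100%" step can be automated with a brute-force search. Below is a minimal Python sketch, assuming the values from an snmpwalk have already been converted to percentages; the function name, the tolerance, and the sample numbers are made up for illustration.

```python
from itertools import combinations


def subsets_summing_to(values, target=100.0, tolerance=1.0):
    """Return every combination of names whose values sum close to target."""
    names = sorted(values)
    matches = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if abs(sum(values[n] for n in combo) - target) <= tolerance:
                matches.append(combo)
    return matches


# Hypothetical percentages where "system" already includes kernel and wait.
observed = {"user": 20.0, "system": 15.0, "kernel": 10.0, "wait": 5.0, "idle": 65.0}
for combo in subsets_summing_to(observed):
    print(combo)
```

When one value is itself the sum of others, more than one combination matches, which is a hint about how that platform groups its categories. The search is exponential in the number of values, which is fine for the handful of CPU OIDs involved here.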