CPU utilisation - user procs disappeared?


stucky101
Posts: 20
Joined: Sun Dec 02, 2007 12:31 am

Post by stucky101 »

Guys

The graph below is from an Oracle database on a PE 6850 that serves as the backend for Siebel.
Look at the area between 8am and 10am. We had a major problem with some queries that were causing the clients to hang.
The problem is this: usually you get a high count of user procs, since all Oracle procs run under the user 'oracle'.
Assuming that "system procs" are the ones running under root, I don't understand the graph.
When I logged on at around 8:30 and ran top, I saw tons of Oracle procs using up most of the CPU.

Q1 - Why do I see an increase in system procs instead of user procs?
Q2 - Why are all the user procs completely gone in that time period (no blue graph at all)?

How can there be no user procs at all within this time period?

I saw this once before, 2 years ago, on 2 nodes of an Oracle RAC system. Exactly the same thing, and after a reboot the blue came back.

I'm trying to interpret this for the DBAs, but I'm not sure I can.
Can anyone help me?

The DB runs Oracle 10g on RHEL4-U5 and is polled via net-snmp. The PE 6850 has 4 sockets, each dual-core. HT per core is turned off, so Linux sees 8 procs.

Thanks

--stucky
Attachment: cpustats.JPG (CPU util)
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

Please zoom into the graph on exactly the timespan of the problem and re-post it.
Reinhard

Post by stucky101 »

Gandalf

D'oh, I totally forgot about Cacti's zooming capabilities. Thx for reminding me.
Attached are 2 zooms:
1. The weird one.
2. A normal one from a day later.
I'm not sure it reveals anything more than that something was very different on day one.
Could it be that if system procs were higher than user procs, the red would overshadow the blue?
Not that this could ever happen on a busy database with 300-plus Oracle procs running, but if it did, I wonder how Cacti would graph it.
I've been poking through other graphs and I can't find any that don't show at least a little blue, except the one we're talking about.
Attachment: cacti_sbl.JPG

Post by stucky101 »

OK, I looked at the values and they don't make sense to me.

According to my last post, the system procs were in fact higher than the user procs.
1. I don't see how that could be when top showed Oracle totally hogging the CPU.
2. I attached another graph from when we had bad queries before, and it looks exactly like I'd expect.
What I don't understand here, though, is that all 3 values (current, average and maximum) for system and user procs are nearly identical, yet the red area tops out just under 40% and the blue reaches the 80% range.
I'm probably not reading this right. Hope this info helps.
Attachment: cacti_sbl_high_normal.JPG

Post by gandalf »

Please open that graph in Graph Management and switch to DEBUG mode, then post the whole rrdtool graph statement.
Reinhard

Post by stucky101 »

RRDTool Command:

/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--end=-300 \
--title="SBLPRD 2 - CPU Usage" \
--rigid \
--base=1000 \
--height=120 \
--width=500 \
--alt-autoscale-max \
--lower-limit=0 \
--vertical-label="percent" \
--slope-mode \
DEF:a="/var/www/html/cacti-0.8.6j/rra/sblprd_2_cpu_system_875.rrd":cpu_system:AVERAGE \
DEF:b="/var/www/html/cacti-0.8.6j/rra/sblprd_2_cpu_user_876.rrd":cpu_user:AVERAGE \
DEF:c="/var/www/html/cacti-0.8.6j/rra/sblprd_2_cpu_nice_874.rrd":cpu_nice:AVERAGE \
CDEF:cdefbc=TIME,1196707863,GT,a,a,UN,0,a,IF,IF,TIME,1196707863,GT,b,b,UN,0,b,IF,IF,TIME,1196707863,GT,c,c,UN,0,c,IF,IF,+,+ \
AREA:a#FF0000:"System" \
GPRINT:a:LAST:"Current\:%8.2lf %s" \
GPRINT:a:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:a:MAX:"Maximum\:%8.2lf %s\n" \
AREA:b#0000FF:"User":STACK \
GPRINT:b:LAST:" Current\:%8.2lf %s" \
GPRINT:b:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:b:MAX:"Maximum\:%8.2lf %s\n" \
AREA:c#00FF00:"Nice":STACK \
GPRINT:c:LAST:" Current\:%8.2lf %s" \
GPRINT:c:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:c:MAX:"Maximum\:%8.2lf %s\n" \
LINE1:cdefbc#000000:"Total" \
GPRINT:cdefbc:LAST:" Current\:%8.2lf %s" \
GPRINT:cdefbc:AVERAGE:"Average\:%8.2lf %s" \
GPRINT:cdefbc:MAX:"Maximum\:%8.2lf %s"

RRDTool Says:

OK
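For anyone puzzling over the `cdefbc` line in the command above, here is a small sketch that evaluates that CDEF's RPN for a single sample, to show what the black "Total" line does with unknown values. Reading the RPN literally, `TIME,1196707863,GT,a,a,UN,0,a,IF,IF` passes `a` through unchanged after timestamp 1196707863, and at or before it replaces an unknown `a` with 0. The three results are then added, and in rrdtool any sum involving an unknown is itself unknown. The function name `cdefbc` and the "U" notation for an unknown value are conventions of this sketch only, not something Cacti or rrdtool provides.

```shell
#!/bin/sh
# Sketch: evaluate the cdefbc RPN from the graph command for one sample.
# "U" stands for an UNKNOWN value (hypothetical notation for this sketch).
# Args: timestamp, cpu_system (a), cpu_user (b), cpu_nice (c)
cdefbc() {
    awk -v t="$1" -v a="$2" -v b="$3" -v c="$4" 'BEGIN {
        cutoff = 1196707863          # the timestamp hard-coded in the CDEF
        n = split(a " " b " " c, v, " ")
        sum = 0
        for (i = 1; i <= n; i++) {
            if (v[i] == "U") {
                # After the cutoff an UNKNOWN passes through unchanged, and
                # a sum involving UNKNOWN is UNKNOWN, so no Total is drawn.
                if (t > cutoff) { print "U"; exit }
                v[i] = 0             # at or before the cutoff: UNKNOWN -> 0
            }
            sum += v[i]
        }
        print sum
    }'
}
```

In other words, for samples before that timestamp an unknown user value silently counts as 0 in the Total.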

Post by gandalf »

OK, those are AREA/STACKs, as they should be. No error here. You may of course dump the rrd file's contents (e.g. using dataquery, or else rrdtool fetch) to make sure the rrd file does not contain different numbers.
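Following the rrdtool fetch suggestion, a minimal sketch for spotting gaps in the user-CPU RRD: it filters `rrdtool fetch` output and prints the timestamps whose value is unknown. It assumes the usual "timestamp: value" output layout with unknown values printed as `nan` (some platforms print `-nan` or `NaN`, which the case-insensitive match also catches). The function name `unknown_samples` is hypothetical; the file path in the example comes from the graph command.

```shell
#!/bin/sh
# Sketch: print the timestamps of UNKNOWN samples in `rrdtool fetch` output.
# Assumes data lines like "1196640000: 4.5000000000e+01" or "1196640300: nan";
# the header line (data source name) and blank lines are skipped.
unknown_samples() {
    awk -F': *' '/^[0-9]+:/ && tolower($2) ~ /nan/ { print $1 }'
}

# Example, run on the Cacti server:
#   rrdtool fetch /var/www/html/cacti-0.8.6j/rra/sblprd_2_cpu_user_876.rrd \
#       AVERAGE --start -86400 --end -300 | unknown_samples
```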
Aah, stop, wait, ...
Is this a multi-core CPU? In that case, the "user" value may have exceeded 100, which is the (wrong) default MAXIMUM of the CPU data source, and rrdtool stores any value above the MAXIMUM as unknown, so nothing gets drawn. Raise the MAXIMUM to "number of cores * 100" and apply the same change to all existing rrd files of this type using "rrdtool tune".
Reinhard
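A sketch of the "rrdtool tune" step described above, assuming the file naming and data source names visible in the graph command (`sblprd_2_cpu_system_875.rrd` with DS `cpu_system`, and likewise for `cpu_user` and `cpu_nice`) and the rra directory from those DEF lines. The function name `tune_cpu_max` and the `DRY_RUN` switch are hypothetical. Pass the CPU count of the monitored host, not of the Cacti server, and back up the .rrd files before running it for real.

```shell
#!/bin/sh
# Sketch: raise the MAXIMUM of the CPU data sources to (CPUs x 100).
# Args: number of CPUs on the monitored host, rra directory, file prefix.
# Set DRY_RUN=1 to print the rrdtool commands instead of running them.
tune_cpu_max() {
    ncpu=$1
    rra_dir=$2
    prefix=$3
    max=$((ncpu * 100))
    for ds in cpu_system cpu_user cpu_nice; do
        for f in "$rra_dir/${prefix}_${ds}_"*.rrd; do
            [ -e "$f" ] || continue    # pattern matched no file
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "rrdtool tune $f --maximum $ds:$max"
            else
                rrdtool tune "$f" --maximum "$ds:$max"
            fi
        done
    done
}

# Example for the 16-CPU database host discussed in this thread:
#   tune_cpu_max 16 /var/www/html/cacti-0.8.6j/rra sblprd_2
```

This only changes existing files: samples already stored as unknown stay unknown, and the MAXIMUM in the Cacti data template still has to be raised so that newly created RRDs get the right limit.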
fmangeant
Cacti Guru User
Posts: 2345
Joined: Fri Sep 19, 2003 8:36 am
Location: Sophia-Antipolis, France

Post by fmangeant »

gandalf wrote:
Aah, stop, wait, ...
Is this a multi core CPU? In this case, "user" proc may have exceeded the value of 100 which is the (wrong) default MAXIMUM of the proc data source. Upper the MAXIMUM to "number of cores * 100" and apply same change to all existing rrd files of this type using "rrdtool tune"
Hi

Reinhard is right: with Net-SNMP < 5.4 on Linux boxes, CPU usage goes from 0 to 100 x the number of CPUs.
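To see where a "100 x number of CPUs" ceiling can come from, here is a sketch working from the kernel's aggregate counters in /proc/stat, which, as far as I know, is what Net-SNMP reads its raw CPU counters from on Linux. Field 2 of the first ("cpu") line is user time in USER_HZ ticks summed over every CPU the kernel sees; assuming USER_HZ is 100, a fully busy 16-CPU box accumulates 1600 ticks per second, so the per-second rate tops out at 100 x CPUs rather than 100. The function names `user_rate` and `user_pct` are hypothetical.

```shell
#!/bin/sh
# Sketch: per-second rate of the aggregate "user" counter between two
# samples of the "cpu" summary line of /proc/stat, and that rate scaled
# to 0-100 % by dividing by the CPU count. Assumes USER_HZ = 100.
user_rate() {
    # $1, $2: two samples of the "cpu ..." line; $3: seconds between them
    u1=$(printf '%s\n' "$1" | awk '{ print $2 }')
    u2=$(printf '%s\n' "$2" | awk '{ print $2 }')
    echo $(( (u2 - u1) / $3 ))
}

user_pct() {
    # $1: rate from user_rate; $2: number of CPUs the kernel sees
    echo $(( $1 / $2 ))
}

# On the database host itself:
#   s1=$(head -n 1 /proc/stat); sleep 10; s2=$(head -n 1 /proc/stat)
#   r=$(user_rate "$s1" "$s2" 10)
#   echo "raw: $r  normalised: $(user_pct "$r" "$(grep -c ^processor /proc/cpuinfo)") %"
```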

You can use the template in my signature for 2-, 4- and 8-CPU systems.
HOWTOs:
- Install and configure the Net-SNMP agent for Unix: http://forums.cacti.net/viewtopic.php?t=15353
- Install and configure the Net-SNMP agent for Windows: http://forums.cacti.net/viewtopic.php?t=26151
- Graph multiple servers using an SNMP proxy: http://forums.cacti.net/viewtopic.php?t=28175
Templates:
- Multiple CPU usage for Linux: http://forums.cacti.net/viewtopic.php?t=15412
- Memory & swap usage for Unix: http://forums.cacti.net/viewtopic.php?p=125152

Post by stucky101 »

Thanks, everybody. I'm trying the 2/4/8-way templates right now.
The PE 6850 has 8 cores, but HT can be turned on per core, so Linux sees 16 CPUs. Your templates refer to cores only, right?
I know HT is not the same thing as a core, but then again Linux doesn't make a distinction: it sees each one as a CPU.
Thoughts?

PS: Maybe it's time to update Cacti's default templates? I mean, who runs single-core systems these days?

Post by fmangeant »

To find out how many CPUs your Linux system sees, run this:

$ grep -c ^processor /proc/cpuinfo
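Since the question of HT siblings versus cores comes up in this thread, here is a sketch that reports both numbers from /proc/cpuinfo: logical CPUs (what `grep -c ^processor` counts, and what the kernel sums into its aggregate CPU counters, so presumably the figure the MAXIMUM has to cover) and physical cores, counted as distinct "physical id"/"core id" pairs. Those two field names are the x86 ones; older kernels, possibly including RHEL4's, may not print "core id", in which case the core count comes out as 0. The function name `cpu_counts` is hypothetical.

```shell
#!/bin/sh
# Sketch: print "<logical CPUs> <physical cores>" for a cpuinfo file.
# Logical CPUs are what the kernel schedules on (HT siblings included).
# Physical cores are distinct "physical id"/"core id" pairs (x86 fields;
# prints 0 cores if the kernel does not report them).
cpu_counts() {
    f=${1:-/proc/cpuinfo}
    logical=$(grep -c '^processor' "$f")
    cores=$(awk -F': *' '/^physical id/ { p = $2 } /^core id/ { print p "-" $2 }' "$f" \
        | sort -u | wc -l | tr -d ' ')
    echo "$logical $cores"
}

# Usage: cpu_counts    (reads /proc/cpuinfo)
```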

Post by stucky101 »

Well, if you really go by what Linux sees as a CPU, then it's 16. I already knew that, because when you run 'top' and press '1' it shows 16 CPUs there (your test confirms it too), but I thought you were referring to cores only, and this box has 8. I always thought a core is real, whereas an HT sibling is not quite the same as a core. Then again, Linux can't seem to distinguish between them.
This means I need another template, then, right?

Post by fmangeant »

stucky101 wrote:
This means I need another template then right ?
Yes :(

I'll try to post a template tomorrow.

Post by stucky101 »

Guys, what do you think about this thread?

http://forums.cacti.net/viewtopic.php?t ... &start=120

It's sure tempting to get each CPU graphed separately as well...

Post by stucky101 »

Nah, never mind. I tried it and it's a pain. Some of those don't import at all, and the others require too much manual mangling.
Besides, they look messy on a 16-way box anyway.
I'd rather wait for your 16-way template.

thanks

--stucky

Post by stucky101 »

Guys

Any news on the 16-way template?
Also, I've found that the new templates don't graph quite the way the old ones did.
The old graph used to adjust the top of the scale based on the average utilisation.
Since I applied the new graphs, they always show the 100% mark, even if utilisation is very low.
That doesn't make for as good a graph, in my opinion. Can I adjust that so it graphs like before, except with the correct number of cores?
Attachment: cpu-8way.png