Total posts in this thread: 156
This topic has been viewed 461324 times and has 155 replies
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Poor crunching on this project

The trick to stop switching is to set the switch interval to longer than the longest run time, which has been possible on web device profiles and in clients for quite some time. It has been mentioned a few times and is incorporated in one of the FAQs. The 999 number (seconds) limitation makes me think of the old checkpoint maximum. That limitation has been gone for a while and the field certainly stores 999999 seconds **. You can set it so a task never checkpoints (if the science app is compiled to ask, which I tested and CEP2 is not... it writes all the 16 jobs regardless). If set to longer than the longest run time, the only disk I/O during a run would be to the VM (swapfile) and at the end when the result is stored.

** Only takes effect at start of new task or after client restart.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Oct 13, 2010 12:46:58 PM]
RaymondFO
Veteran Cruncher
USA
Joined: Nov 30, 2004
Post Count: 561
Status: Offline
Re: Poor crunching on this project

RaymondFO, I guess you are running CEP2 on an i7, with all 8 threads.
What version of Linux are you using, is it a hard drive install or USB install, and how much RAM do you have?

I noticed that GPUGrid tasks use lots of CPU on Kubuntu, and other Linux versions, so I just use 3 CPU cores and get reasonable performance. If I just used 2 cores it would be 95%+ for CPU time/Elapsed time. As it is, with 3 cores, I get about 92% runtime efficiency.
Just going by 2 complete but unreported tasks I have CPU time 6.10, elapsed time 6.43 and CPU time 6.11, elapsed time 6.44. Q6600, 4GB, HDD install, Kubuntu x64.

I noticed that 3 running tasks seem to be less efficient: at around 18 to 20% complete, their CPU time/Elapsed time efficiency was between 81% and 89%. This suggests that the inefficiencies occur more at the early stages.

Although these tasks have a VM size of 307MB, their working set is only 61MB. However, if a task is suspended to RAM and you have many partially completed tasks, memory could be an issue. I generally use a low cache to avoid this.
[edited this last bit; I sometimes use a high number to avoid switching when running some non-WCG tasks, that reset/fail on system restarts in order to force them to complete]


I was running CEP2 on Ubuntu 10.04 LTS and for a short time 10.10 RC, both 32 bit. This project was run on an i7 with and without hyper threading, and on an Intel Q9400 (775 socket 2.66 quad) chip. I ran CEP2 by itself, and with other project WU's (for the i7, hyper threading was both on and off) with no change in results. The OS was running on a hard drive solely dedicated to Linux and was not physically attached to the Windows 7 hard drive. I tried everything to make this work, including upgrading the OS to 10.10 and BOINC 6.10.58, installing a fresh clean hard drive, installing the 10.04 64bit version, and upgrading the processor from the Q9400 to a more powerful CPU. As for RAM, 4 gigs is the norm for each box, and these were originally crunched with the CEP2 WU's LAM setting off and later on, and no change in results occurred. Nothing has worked. Wall clock times are between 6 and 11 hours, with CPU time between 1.7 and just under 3 hours. I have already posted these huge gaps and provided other examples, my favorite being CPU time greater than wall clock time. That occurred after the OS crashed, but please note that this OS crash did not cause the poor crunching results. The poor crunching results were already in effect before the crash.

The one common denominator the computers shared (with the exception of one computer) before the WU's became problematic is that the OS crashed. The results were never the same for CEP2 afterwards, but for any other WU project, no problem. That is why I used another clean hard drive to see if the OS or BOINC client were corrupted. Did not work.

If anyone has other ideas or wishes to try an experiment with CEP2 work units, I am open, as I still have an extra hard drive. What I do find interesting is that initially the CEP2 crunching works fine for about one hour, then at some point thereafter the CPU and wall clock times diverge.
[Oct 13, 2010 12:54:55 PM]
sk..
Master Cruncher
http://s17.rimg.info/ccb5d62bd3e856cc0d1df9b0ee2f7f6a.gif
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Poor crunching on this project

You could try Kubuntu.
[Oct 13, 2010 1:10:21 PM]
RaymondFO
Veteran Cruncher
USA
Joined: Nov 30, 2004
Post Count: 561
Status: Offline
Re: Poor crunching on this project

Done. I will download and install Kubuntu 32bit version tomorrow night (very busy tonight). I will put this on a non-hyper threading computer and run CEP2. I will report back as the results occur.
----------------------------------------
[Edit 2 times, last edit by RaymondFO at Oct 13, 2010 2:20:37 PM]
[Oct 13, 2010 1:19:01 PM]
yose-ue
Cruncher
Joined: Dec 27, 2008
Post Count: 21
Status: Offline
Re: Poor crunching on this project

The problem is not a constant loss of cpu time. The cpu time decreases at or around the time when the job checkpoints.

I know, I don't have enough to do: I was checking progress every ten minutes.

I hope that may be helpful in locating the problem.
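
For anyone who wants to repeat this kind of periodic check, here is a minimal sketch in Python. It assumes you note down (elapsed seconds, CPU seconds) pairs by hand, for example every ten minutes. The names `find_stalls` and `min_ratio` and the sample readings are hypothetical and are not part of BOINC or CEP2.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (elapsed_seconds, cpu_seconds)


def find_stalls(samples: List[Sample],
                min_ratio: float = 0.5) -> List[Tuple[float, float, float]]:
    """Return (start, end, ratio) for each interval in which CPU time
    grew by less than min_ratio of the elapsed time."""
    stalls = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        wall = t1 - t0
        if wall <= 0:
            continue  # skip duplicate or out-of-order readings
        ratio = (c1 - c0) / wall
        if ratio < min_ratio:
            stalls.append((t0, t1, ratio))
    return stalls


if __name__ == "__main__":
    # Made-up readings taken every 600 seconds, for illustration only.
    readings = [(0, 0), (600, 590), (1200, 1180), (1800, 1200), (2400, 1210)]
    for start, end, ratio in find_stalls(readings):
        print(f"{start:.0f}s to {end:.0f}s: CPU/elapsed = {ratio:.1%}")
```

Lining up the flagged intervals with the times a task checkpoints would show whether the two really coincide.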
[Oct 14, 2010 12:35:33 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Poor crunching on this project

Yup, that is the known point where this seems to happen. Wish we knew why. Is it because some BOINC clients are not quick enough to register the cumulative CPU time and pass that to the next job in a task? It's almost as if only the job setup time is being recorded.
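
To make that guess concrete, here is a toy model in Python. It is only an illustration of the hypothesis and is not BOINC or CEP2 code: the function `reported_cpu`, the `carry_over` flag, and the job durations are all made up.

```python
from typing import List


def reported_cpu(job_cpu_seconds: List[float], carry_over: bool) -> List[float]:
    """Return the CPU total that would be shown after each job finishes.

    job_cpu_seconds: CPU seconds used by each job, in order.
    carry_over: True if the running total survives each job boundary,
                False if the counter restarts with every new job.
    """
    totals = []
    running = 0.0
    for used in job_cpu_seconds:
        running = running + used if carry_over else used
        totals.append(running)
    return totals


if __name__ == "__main__":
    jobs = [1800.0, 2400.0, 600.0]  # hypothetical per-job CPU seconds
    print("with carry-over:   ", reported_cpu(jobs, carry_over=True))
    print("without carry-over:", reported_cpu(jobs, carry_over=False))
```

In this model, losing the carry-over at a job boundary makes the displayed CPU time fall back while wall clock time keeps rising, which is the pattern described in this thread. It does not show that this is what actually happens in the client.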
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Oct 14, 2010 1:01:11 PM]
sk..
Master Cruncher
http://s17.rimg.info/ccb5d62bd3e856cc0d1df9b0ee2f7f6a.gif
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Poor crunching on this project

My take on this problem is that a CPU polling process loses track, stops running, or more likely a separate process used to log the CPU usage times out. If so, could the priority of the polling process be raised or could this timeout be increased? Perchance is setting process affinity (the slot) a culprit here?

Whatever the cause, different operating systems seem to perform quite differently using the same hardware, so moving to a different Linux version might be a workaround for some. Kubuntu 10.10 was released on Oct 10th, and it is easy to install and set up for BOINC & WCG, so that might be worth trying, but there are plenty of other versions.
[Oct 14, 2010 8:00:13 PM]
X-Files 27
Senior Cruncher
Canada
Joined: May 21, 2007
Post Count: 391
Status: Offline
Re: Poor crunching on this project

Whatever the cause, different operating systems seem to perform quite differently using the same hardware, so moving to a different Linux version might be a workaround for some. Kubuntu 10.10 was released on Oct 10th, and it is easy to install and set up for BOINC & WCG, so that might be worth trying, but there are plenty of other versions.

Different versions, or do you mean different distributions? :D

Every Linux kernel differs in how it handles things such as CPU and I/O scheduling.

Right now I have 3 test kernels on 3 rigs to determine which one is better.
1) Ubuntu 10.10 server, using the 2.6.35.7 kernel (customized kernel, AMD CPU features removed) and no X Window
2) Ubuntu 10.10 server, using the 2.6.35.7 kernel (stripped-down kernel, 9MB, built using localmodconfig) and no X Window
3) Ubuntu 10.04.1 desktop, using the default kernel and with X Window.

Now it's a matter of waiting to gather results on which one is better.
----------------------------------------

[Oct 14, 2010 10:15:41 PM]
sk..
Master Cruncher
http://s17.rimg.info/ccb5d62bd3e856cc0d1df9b0ee2f7f6a.gif
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Poor crunching on this project

Yeah, spotted. The distribution would be more important than the version.
[Oct 14, 2010 10:47:01 PM]
RaymondFO
Veteran Cruncher
USA
Joined: Nov 30, 2004
Post Count: 561
Status: Offline
Re: Poor crunching on this project

You could try Kubuntu.


Update:

I successfully installed Kubuntu 10.10 32bit version and I am now running the following WU's:

E200454_ 062_ A.24.C20H13NS3.189.0.set1d06_ 1-- Catseye In Progress 10/15/10 01:43:37 10/25/10 01:43:37 0.00 0.0 / 0.0
E200454_ 101_ A.24.C18H13N3S2Si.178.1.set1d06_ 0-- Catseye In Progress 10/15/10 01:43:37 10/25/10 01:43:37 0.00 0.0 / 0.0
E200454_ 003_ A.24.C19H13NOS2Si.189.4.set1d06_ 0-- Catseye In Progress 10/15/10 01:23:00 10/25/10 01:23:00 0.00 0.0 / 0.0
HFCC_ n1_ 02420243_ n1_ 0001_ 0-- Catseye In Progress 10/15/10 01:43:54 10/25/10 01:43:54 0.00 0.0 / 0.0

I will report the results of this experiment tomorrow.

Edit: CPU time: 27:05 min, elapsed time 1:33:21. The CPU time of this WU was not far behind the elapsed time until the WU apparently reached a checkpoint. I will run all CEP2 downloaded WU's, but this is not encouraging.
Please note the HFCC WU: CPU time: 1:36:02, elapsed time: 1:37:17.
All WU's running concurrently.
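
To put the two results side by side, here is a small Python sketch that turns the times quoted above into a CPU time/elapsed time ratio. The helper names `to_seconds` and `efficiency` are made up for this example; only the four time values come from the post.

```python
def to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' into whole seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def efficiency(cpu: str, elapsed: str) -> float:
    """Return CPU time divided by elapsed (wall clock) time."""
    return to_seconds(cpu) / to_seconds(elapsed)


if __name__ == "__main__":
    # Figures quoted in the post above.
    print(f"CEP2: {efficiency('27:05', '1:33:21'):.1%}")    # prints "CEP2: 29.0%"
    print(f"HFCC: {efficiency('1:36:02', '1:37:17'):.1%}")  # prints "HFCC: 98.7%"
```

So the CEP2 WU is credited with about 29% of the elapsed time as CPU time, while the HFCC WU running alongside it gets about 98.7%.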
----------------------------------------
[Edit 1 times, last edit by RaymondFO at Oct 15, 2010 3:23:52 AM]
[Oct 15, 2010 1:55:51 AM]