Thread Status: Active
Total posts in this thread: 27
This topic has been viewed 3006 times and has 26 replies.
Dataman
Ace Cruncher
Joined: Nov 16, 2004
Post Count: 4865
Re: CEP2 Job Running Very Slow

Quote: ...CEP2 would do on a Pentium D ... it did not do well. I will be recycled.

Quote: you are forgiven this time, but next time there will be no reprieve

Quote: You will be assimilated. Resistance is futile.


Hahaha... what a difference a "t" makes... but come to think of it... one day "I" will be recycled.
----------------------------------------


[Sep 26, 2010 9:18:23 PM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187
Re: CEP2 Job Running Very Slow

[I prepared a post yesterday, but I must not have clicked the POST button]

The WU eventually completed, but it stopped in the middle of job 13 after 12 hours of CPU time. It ran for between 36 and 48 wall-clock hours. Such a large disparity between CPU time and wall-clock time is very unusual in my experience. I don't use any throttling, and the computer is not used often.

The wingman's WU also quit in the middle of job 13 after 12 hours of cpu time.

Cheers
----------------------------------------

[Sep 27, 2010 11:33:29 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: CEP2 Job Running Very Slow

Make sure "Leave Application in Memory is Preempted/Suspended" is ticked as recommended by uplinger for CEP2.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Sep 28, 2010 7:33:35 AM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187
Re: CEP2 Job Running Very Slow

Quote: Make sure "Leave Application in Memory is Preempted/Suspended" is ticked as recommended by uplinger for CEP2.

Would that be the cause of the large difference between CPU time and wall-clock hours?

Cheers
----------------------------------------

[Sep 28, 2010 2:51:43 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: CEP2 Job Running Very Slow

Don't know for sure... you'll soon find out if you select the option (it is now, or will soon be, a footnote on the System Requirements page).
----------------------------------------
[Sep 28, 2010 4:06:29 PM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187
Re: CEP2 Job Running Very Slow

Thanks for responding, Sekerob.

I have now enabled the option on all my profiles, though that may not have been the problem. My other desktop got 5 beta WUs and I did not notice such a long wall-clock time, nor did those WUs time out. The computer with the slow WU got only 1 WU this time around, so I have nothing else to compare it to. I'll watch to see if it gets any more CEP2 beta WUs.

I did notice that the wingman's WU also timed out in the same part of the job, if that means anything.

Cheers
----------------------------------------

[Sep 29, 2010 2:32:41 AM]
Jason1478963
Senior Cruncher
United States
Joined: Sep 18, 2005
Post Count: 295
Re: CEP2 Job Running Very Slow

I'm pulling some of my Linux machines from this project because of this same issue. They are showing an unacceptable difference between wall-clock time and CPU time, and these are dedicated crunchers. I don't like the idea of losing up to 25% of the CPU time on a dedicated cruncher.
----------------------------------------

[Sep 29, 2010 2:58:43 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: CEP2 Job Running Very Slow

Quote: I'm pulling some of my linux machines from this project because of this same issue. They are having an unacceptable difference between wall clock time and cpu time. These are dedicated crunchers that are seeing these issues. I don't like the idea of losing up to 25% of the cpu time on a dedicated cruncher.

It so happens I found a September discussion thread among the BOINC developers about limiting the number of concurrent instances per science or per overall project (a long-wished-for feature, with major scheduling implications for people who crunch for several grids). For umbrella grids such as WCG it would of course have to work at the science level, but we've noted many times here on the WCG forums that running a mix improves efficiency. My Linux quad is not allowed to run more than 2 at once, and the 'overhead' that shows is in the 30-40 minute range. Mind you, it is used during the daytime. The techs are aware, but BOINC simply does not record disk I/O wait as CPU time. Then there is the case of the "lost" time, suspected to correspond to whole jobs inside the task. That is being investigated too, and will be remedied should it also turn out to be the case in the Windows implementation.

6.19 cep2 E200379_176_A.24.C20H14N2S2.208.1.set1d06_2 07:46:44 (07:09:52) 29-09-2010 05:18 29-09-2010 06:23 Reported: OK
6.19 cep2 E200348_398_A.24.C16H9N5S3.77.2.set1d06_4 09:03:02 (08:30:28) 29-09-2010 03:28 29-09-2010 03:33 Reported: OK (u)
6.19 cep2 E200379_343_A.24.C20H14N2S2.239.2.set1d06_2 07:24:59 (06:45:10) 29-09-2010 00:16 29-09-2010 00:20 Reported: OK (u)
6.19 cep2 E200379_264_A.24.C20H14N2S2.224.3.set1d06_2 08:05:49 (07:23:02) 28-09-2010 18:51 28-09-2010 18:56 Reported: OK (u)
6.19 cep2 E200379_328_A.24.C20H14N2S2.236.2.set1d06_2 08:01:17 (07:16:21) 28-09-2010 16:51 28-09-2010 16:56 Reported: OK
6.19 cep2 E200385_737_A.24.C21H15NS2.56.3.set1d06_0 08:02:23 (07:23:28) 28-09-2010 08:55 28-09-2010 08:59 Reported: OK (u)
6.19 cep2 E200385_534_A.24.C21H15NS2.19.1.set1d06_1 08:17:33 (07:36:32) 27-09-2010 18:48 27-09-2010 18:53 Reported: OK
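
Each row above gives elapsed time followed by CPU time in brackets, so elapsed minus CPU is the time that was not recorded as CPU time (per the discussion, largely waiting on disk I/O, though the rows themselves don't say why). A minimal Python sketch that parses those pairs and reports per-task CPU efficiency and the median gap; the function names are illustrative and are not part of BOINC or any WCG tool.

```python
import re
from statistics import median

# Rows copied from the CEP2 results above. Assumption (per the thread): the
# first time is elapsed (wall-clock) time, the bracketed time is CPU time.
ROWS = [
    ("E200379_176_A.24.C20H14N2S2.208.1.set1d06_2", "07:46:44 (07:09:52)"),
    ("E200348_398_A.24.C16H9N5S3.77.2.set1d06_4", "09:03:02 (08:30:28)"),
    ("E200379_343_A.24.C20H14N2S2.239.2.set1d06_2", "07:24:59 (06:45:10)"),
    ("E200379_264_A.24.C20H14N2S2.224.3.set1d06_2", "08:05:49 (07:23:02)"),
    ("E200379_328_A.24.C20H14N2S2.236.2.set1d06_2", "08:01:17 (07:16:21)"),
    ("E200385_737_A.24.C21H15NS2.56.3.set1d06_0", "08:02:23 (07:23:28)"),
    ("E200385_534_A.24.C21H15NS2.19.1.set1d06_1", "08:17:33 (07:36:32)"),
]

# Matches "hh:mm:ss (hh:mm:ss)" and captures the six numbers.
TIME_PAIR = re.compile(r"(\d+):(\d{2}):(\d{2}) \((\d+):(\d{2}):(\d{2})\)")


def to_seconds(h: str, m: str, s: str) -> int:
    """Convert hour, minute and second strings to a number of seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def efficiency(pair: str) -> tuple:
    """Return (elapsed_s, cpu_s, cpu_s / elapsed_s) for one time pair."""
    match = TIME_PAIR.fullmatch(pair.strip())
    if match is None:
        raise ValueError(f"unrecognised time pair: {pair!r}")
    groups = match.groups()
    elapsed = to_seconds(*groups[:3])
    cpu = to_seconds(*groups[3:])
    return elapsed, cpu, cpu / elapsed


if __name__ == "__main__":
    gaps = []
    for name, pair in ROWS:
        elapsed, cpu, eff = efficiency(pair)
        gaps.append(elapsed - cpu)
        print(f"{name}: {eff:.1%} CPU efficiency, {(elapsed - cpu) / 60:.1f} min gap")
    print(f"median gap: {median(gaps) / 60:.1f} min")
```

For these seven rows every task lands between 90% and 95% CPU efficiency, and the median gap works out to 2389 s, about 40 minutes.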

My meager duo had 6 betas. It was restricted to, and later could only get, n / 2 cores, so it did not run them concurrently for the last few. This is what came out (CPU time in brackets):

6.25 beta11 BETA_E200499_938_A.25.C18H10N4S2Se.178.1.set1d06_1 12:02:20 (10:59:11) 23-09-2010 07:54 23-09-2010 08:00 Reported: OK
6.25 beta11 BETA_E200495_123_A.25.C21H14N2OS.82.3.set1d06_0 12:48:53 (12:00:00) 22-09-2010 15:56 22-09-2010 16:02 Reported: OK
6.25 beta11 BETA_E200368_482_A.24.C19H12N2S3.47.3.set1d06_0 06:36:45 (06:09:56) 16-09-2010 23:46 16-09-2010 23:51 Reported: OK (u)
6.25 beta11 BETA_E200366_555_A.24.C19H12N2OS2.261.1.set1d06_1 10:31:55 (09:25:30) 16-09-2010 17:02 16-09-2010 17:07 Reported: OK (u)
6.25 beta11 BETA_E200360_771_A.24.C18H12N4S2.30.2.set1d06_1 11:09:00 (09:50:39) 16-09-2010 04:10 16-09-2010 04:16 Reported: OK (u)
beta11 BETA_E200360_593_A.24.C18H12N4S2.129.0.set1d06_1 13:27:52 (11:22:53) 15-09-2010 22:44 15-09-2010 22:50 Reported: OK (u)

The time difference on the first 2 is huge, so I've already decided that this machine won't be running this science, because it's my main working tool. Only 1 of the 6 betas hit the 12-hour mark.

To underline: in none of these cases have I noticed any material impact on me as a user, apart from the occasional momentary lapse when a new job starts. Funnily enough, I feel HFCC running more. To be fair, Clean Water can also show quite a bit of elapsed-time loss, but not when the system is left alone; then I get 99.8% efficiency, with barely a minute lost on about 2.6 hours of runtime under 64-bit Linux.

In summary, from the above I'm not seeing a 25% loss AT ALL during dedicated crunch time, and I only see material gaps when the system is in use. Left alone, half an hour is probably the median, and that's after 556 of these. Ideally, if the techs could get that "wait state" time recorded (BOINC is, after all, working during it), no one would have a problem from the reporting end, or even notice. It's a big model.

The manual OPT-IN was not for nothing :)

PS: the Linux machine has no particularly state-of-the-art HD, but it does use ext4 and a separate swap partition. Running it off a USB stick... I've got my doubts about that solution for any bigger science. I wonder what improvement Btrfs will bring; seemingly it won't appear until Ubuntu 11.

How I manage presently: buffer a few days of work, excluding CEP2. Then briefly switch profiles to fetch a day's worth of CEP2. Then, through suspend/resume actions, have them process a few at a time. I wish WCG could control the feeder to give a true mix (4 sciences, 4 tasks, 1 of each received in that order), but due to the varying run times of the different sciences that is very hard to do, so I think it needs to be something inside the client itself. Without managing, you receive large swaths of CEP2 interspersed with a few of the rest... I think the weighting cannot be controlled at platform level. I'd be okay with n / 2 cores, but it would not be okay for those who only wish to do Clean Energy exclusively... that is lost time to WCG, so the routine needs to fill all cores if that is explicitly selected, maybe with a warning on the System Requirements page.
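
The rule Sekerob proposes (cap CEP2 at n / 2 cores, but fill every core when Clean Energy is the only science in the queue) can be illustrated as a simple selection function. This is a hypothetical Python sketch, not how the BOINC client or the WCG feeder actually schedules work; the function name `pick_tasks`, the `(name, science)` queue format and the science labels are assumptions for illustration.

```python
from collections import Counter


def pick_tasks(queue, cores, capped_science="cep2"):
    """Hypothetical rule: choose up to `cores` tasks from `queue`, a list of
    (task_name, science) tuples in queue order. At most cores // 2 tasks of
    `capped_science` (minimum 1) run at once, unless it is the only science
    in the queue, in which case it may fill every core."""
    sciences = {science for _, science in queue}
    only_capped = sciences == {capped_science}
    cap = cores if only_capped else max(1, cores // 2)

    running = []
    counts = Counter()
    for name, science in queue:
        if len(running) == cores:
            break  # every core is busy
        if science == capped_science and counts[science] >= cap:
            continue  # leave it queued until a capped slot frees up
        running.append(name)
        counts[science] += 1
    return running
```

With 4 cores and a queue mixing CEP2 with other sciences, at most 2 CEP2 tasks are picked and the remaining slots go to the next non-CEP2 tasks in queue order.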

Early morning ramblings (and surely little that's new compared with what's been said before in these forums).
----------------------------------------
[Sep 29, 2010 5:50:56 AM]
Jason1478963
Senior Cruncher
United States
Joined: Sep 18, 2005
Post Count: 295
Re: CEP2 Job Running Very Slow

Thanks for the great reply, Sekerob.

I have also tried to get a mix of work units, but I seem to get more Clean Energy than a mix. I imagine this will change when the Windows client is released and Windows machines can help balance the mix.

Why do we need to keep track of CPU time vs. wall-clock time? Is it really worth the extra overhead? Did it solve more problems than it created? I'm sure it's a small amount of CPU time, but it is more overhead, and BOINC Manager seems to wait longer than with the old clients when trying to connect to the client. I watch most of my machines with BoincView and like to see CPU efficiency above 95%, and with some projects that is easier to achieve than with others.
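
The 95% rule of thumb (CPU time divided by elapsed time) can be expressed as a simple check. A minimal Python sketch, assuming the elapsed and CPU times are available as "hh:mm:ss" strings; the function names are illustrative and this is not a BoincView feature. The sample data are Sekerob's six beta results from earlier in the thread.

```python
def seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def below_threshold(results, threshold=0.95):
    """Return (task, efficiency) for every (task, elapsed, cpu) result
    whose CPU/elapsed ratio is under `threshold`."""
    flagged = []
    for task, elapsed, cpu in results:
        eff = seconds(cpu) / seconds(elapsed)
        if eff < threshold:
            flagged.append((task, eff))
    return flagged


# Elapsed and CPU times of the six beta results posted earlier in the thread.
BETA = [
    ("BETA_E200499_938_A.25.C18H10N4S2Se.178.1.set1d06_1", "12:02:20", "10:59:11"),
    ("BETA_E200495_123_A.25.C21H14N2OS.82.3.set1d06_0", "12:48:53", "12:00:00"),
    ("BETA_E200368_482_A.24.C19H12N2S3.47.3.set1d06_0", "06:36:45", "06:09:56"),
    ("BETA_E200366_555_A.24.C19H12N2OS2.261.1.set1d06_1", "10:31:55", "09:25:30"),
    ("BETA_E200360_771_A.24.C18H12N4S2.30.2.set1d06_1", "11:09:00", "09:50:39"),
    ("BETA_E200360_593_A.24.C18H12N4S2.129.0.set1d06_1", "13:27:52", "11:22:53"),
]

if __name__ == "__main__":
    for task, eff in below_threshold(BETA):
        print(f"{task}: {eff:.1%}")
```

All six of those beta results fall under 95%, and the last three fall under 90%.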

It seems my older client on 64-bit Ubuntu 7.10 (octo-core) gives me fewer problems running 8 Clean Energy tasks than my quad-cores running 4 with the newer clients and Ubuntu 10.04.

Just my observations and thoughts.
----------------------------------------

[Sep 29, 2010 5:08:37 PM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187
Re: CEP2 Job Running Very Slow

The computer that I was having problems with started freezing up the same night that I first got a CEP2 beta. That, combined with having received a Windows update, had me wondering about the cause. Since the freeze-ups continued after the beta WUs had completed, I knew it was not the beta WUs. I even ran deep scans with MS Security Essentials and Lavasoft Ad-Aware to rule out those types of baddies. At first it seemed that the SATA controller for my 2nd hard disk had failed, since the freezing problem worsened to the point where the computer wouldn't boot at all until the 2nd hard disk was unplugged.

Even after I unplugged the 2nd hard disk, the problem got worse, and it is now at the point where it won't boot at all from just the primary PATA drive. I've taken that drive out and am testing it on another computer. So far no problems have been detected, so I think the failure is the disk controller on the mobo. The initial soft failures of the controller were most likely responsible for the slowdown I was experiencing with the beta, especially since CEP2 heavily accesses the hard disk.

Final conclusion: never mind.

Cheers
----------------------------------------

[Oct 9, 2010 7:21:29 PM]