World Community Grid Forums
Category: Completed Research Forum: AfricanClimate@Home Thread: I aborted one of these work units today |
Thread Status: Active Total posts in this thread: 40
|
Author |
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I aborted task ach1_8_62_10 (I think this is correct) today. The task had accumulated over 15 hours of CPU time, and the 'To completion' time for the task was increasing all day long instead of decreasing.
-Brenda Helminen, Michigan Tech |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I don't know what went wrong, but I think Abort is the right thing to do in such cases. It immediately tells the server to try another computer with a fresh copy.
|
||
|
David_L6
Senior Cruncher USA Joined: Aug 24, 2006 Post Count: 296 Status: Offline Project Badges: |
I just aborted 3 work units today that were doing that. I left one running just to see what the result is; it has now been running 23.5 hours, has 2 hours remaining, and is 91% complete.
----------------------------------------
These were aborted:
ach1_16_26 (11.54 hrs CPU time)
ach1_18_44 (16.99 hrs CPU time)
ach1_18_52 (10.34 hrs CPU time)
This one is still running:
ach1_18_43
Hope some of this info helps........
BTW, the computer these are/were running on has a QX6700 CPU, 3GB RAM, and 200GB hard drive space dedicated to WCG.
Memory Settings:
Use no more than: 60 % of memory while computer in use
Use no more than: 80 % of memory while computer idle
Use no more than: 80 % of virtual memory |
||
|
David_L6
Senior Cruncher USA Joined: Aug 24, 2006 Post Count: 296 Status: Offline Project Badges: |
This isn't looking good at all....... Now up to 25.5 hrs run time, 94% progress, and 1.5 hrs till completion.
---------------------------------------- |
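For anyone wanting to sanity-check figures like these, here is a minimal Python sketch that extrapolates the remaining time from run time and percent complete. It assumes progress is roughly linear in run time, which may not hold for these work units, and the function name `extrapolate_remaining` is made up for illustration; it is not how the client itself computes the figure.

```python
def extrapolate_remaining(run_hours, percent_done):
    """Estimate hours remaining, assuming progress is linear in run time."""
    if not 0.0 < percent_done <= 100.0:
        raise ValueError("percent_done must be in (0, 100]")
    implied_total = run_hours * 100.0 / percent_done
    return implied_total - run_hours


if __name__ == "__main__":
    # (run time in hours, percent complete) as reported in the posts above
    reports = [(23.5, 91.0), (25.5, 94.0)]
    for run_hours, percent_done in reports:
        remaining = extrapolate_remaining(run_hours, percent_done)
        total = run_hours + remaining
        print(f"{run_hours:5.1f} h at {percent_done:4.1f}% -> "
              f"about {remaining:.2f} h left, {total:.2f} h in total")
```

Under that assumption the two reports imply roughly 2.3 and 1.6 hours remaining, close to the 2 and 1.5 hours shown by the client. The implied total grows from about 25.8 to about 27.1 hours between the two reports, which is what a slowing rate of progress would look like.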
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
There is a possibility that these work units are just really, really huge.
If they are larger than predicted, then the behaviour with the estimated time increasing and the progress increasing very slowly is exactly what we would expect to see. |
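To illustrate why an oversized work unit could show a rising 'To completion' time, here is a toy Python sketch. It assumes the remaining-time figure is a blend of an up-front prediction and an extrapolation from progress so far, weighted by the fraction done. That blending rule, the function names, and the 10 hour predicted / 28 hour actual figures are all assumptions made for illustration, not a description of what the client actually does.

```python
def remaining_estimate(elapsed_h, fraction_done, predicted_total_h):
    """Blend the up-front prediction with a progress-based extrapolation.

    Assumed model: the more of the task is done, the more weight goes to
    the extrapolation from observed progress.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    from_prediction = max(predicted_total_h - elapsed_h, 0.0)
    from_progress = elapsed_h / fraction_done - elapsed_h
    return (fraction_done * from_progress
            + (1.0 - fraction_done) * from_prediction)


def simulate(predicted_total_h, actual_total_h, step_h=2.0):
    """Return (elapsed, fraction_done, estimate) rows for a task that
    progresses linearly but takes actual_total_h instead of the prediction."""
    rows = []
    elapsed = step_h
    while elapsed <= actual_total_h:
        fraction = elapsed / actual_total_h
        rows.append((elapsed, fraction,
                     remaining_estimate(elapsed, fraction, predicted_total_h)))
        elapsed += step_h
    return rows


if __name__ == "__main__":
    # Hypothetical numbers: predicted 10 h, really needs 28 h.
    for elapsed, fraction, estimate in simulate(10.0, 28.0):
        print(f"{elapsed:5.1f} h elapsed, {fraction:6.1%} done, "
              f"about {estimate:.2f} h to completion")
```

In this toy run the estimate falls while the predicted 10 hours are being used up, then climbs again until the task is half done, and only after that does it count down to zero. Progress meanwhile creeps up slowly the whole time, which matches the behaviour reported in this thread.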
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
See the other thread regarding this project and errors incurred. The longest one here took 20:48 hours on a P4 2.53GHz and yes, the remaining time was increasing until it stabilised at 21 hours. When I was using the system the predicted times went longer; when I stopped using it, they slowly crept down. It's an intense job, hungry for lots of CPU L2 cache, and the less there is the more it slows down. Do 14 of these and you get the sunshine badge to show off. To compare, on the Q6600 it was done in 7 hours.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Jan 25, 2008 5:18:34 PM] |
||
|
David_L6
Senior Cruncher USA Joined: Aug 24, 2006 Post Count: 296 Status: Offline Project Badges: |
"There is a possibility that these work units are just really, really huge. If they are larger than predicted, then the behaviour with the estimated time increasing and the progress increasing very slowly is exactly what we would expect to see."
If the problem is that they are that huge and not something else, they never should have been sent out in the first place. Just think of the time it would take to run one sent to a computer that is slower than mine.
Now at 26.5 hrs run time, 95.193% completed, 1 hr 17+ minutes to go..... |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Did WCG not warn that the reference machine took 14 hours to do them and they were tough to chew? Just consider the accomplishment
|
||
|
David_L6
Senior Cruncher USA Joined: Aug 24, 2006 Post Count: 296 Status: Offline Project Badges: |
"See the other thread regarding this project and errors incurred. The longest one here took 20:48 hours on a P4 2.53GHz and yes, the remaining time was increasing until it stabilised at 21 hours. When I was using the system the predicted times went longer; when I stopped using it, they slowly crept down. It's an intense job, hungry for lots of CPU L2 cache, and the less there is the more it slows down. Do 14 of these and you get the sunshine badge to show off. To compare, on the Q6600 it was done in 7 hours."
A P4 @ 2.53GHz doesn't even come close to comparing to a QX6700 @ 3.5GHz. It appears to me that there's a problem with some of these work units. Maybe there isn't? I don't know for sure. That's why I posted the information that I posted. I'm not complaining - just posting my experiences with this project with the hope that it will help.
I have two of these work units that were completed on a machine with a Q6700, 3GB RAM, and ~100GB of hard drive space. Those two are: ach1_17_63 (6.2 hours) and ach1_17_45 (6.19 hours) |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
WCG aim for 10 hours per work unit, but it's not always possible. Even in a batch of normal work units, there are sometimes monster units.
I don't know enough about ACH to say whether this work unit is unusual for the batch or not. Sometimes, the only way we can find out is by trying the work units, and watching for problems. |
||
|
|