World Community Grid Forums
Category: Completed Research Forum: Help Cure Muscular Dystrophy - Phase 2 Forum Thread: Monster WU on the loose... |
Thread Status: Active Total posts in this thread: 98
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
For your information, I have had two WUs complete on one system where Vista is the Windows operating system. I bring this up due to the name of each WU.
CMD2_ 0002-RADIA.clustersOccur-RADIA.clustersOccur_ 2383_ 0-- DAVETHOMPSON-PC Pending Validation 5/27/09 20:56:56 5/28/09 11:41:48 3.36 53.1 / 0.0
CMD2_ 0002-RADIA.clustersOccur-RADIA.clustersOccur_ 788_ 0-- DAVETHOMPSON-PC Pending Validation 5/27/09 19:13:18 5/28/09 03:37:25 1.25 19.7 / 0.0 |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: |
bono_vox,
I am also running a P4 HT at 2.40 GHz (Northwood according to Everest) with two tasks, and my advice would be to run yours with two tasks, whichever percentage you choose to use. A 2-core processor runs at a lower temperature if you run only one task because half of the processor is really not used, but on a P4 HT the whole processor is used anyway, so the difference in temperature is very small or nil. I have just suspended my second task and, after a while, the temperature dropped by no more than 1 °C, from 55 °C to 54 °C. Before deciding to run two tasks I ran a few tests and found that the overall throughput is about 25-30 % higher in dual mode than in single mode.
By the way, unless your type of processor is known to run hot, you might consider removing the dust from your processor heatsink.
Cheers. Jean. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, for now I'll keep running 2 threads @ 50% on my HT P4s
|
||
|
TXR13
Cruncher Canada Joined: Dec 5, 2005 Post Count: 36 Status: Offline Project Badges: |
I've been gone for the weekend, so here's the update on my monster workunit.
CPU Time: 3 days, 20 hours, 20 minutes, 55 seconds
Percent complete: 16.352%
Current time: 19:30
Last checkpointed: 18:02:36 May 31
Previous checkpoints at: 13:52:54 May 31; 09:45:09; 05:37:48; 01:48:21; 21:19:33 May 30; 17:09:47; 13:01:05; 12:59:53; 12:58:19; 08:51:29; 04:42:09...
Obviously, there are some short-running positions that are handled very nicely. Just as obviously, many of these positions take some time to compute. This does not appear to be typical, as the other CMD2 unit I have running on this machine is checkpointing nicely every 20 minutes, like clockwork.
Also for interest, from the client state file on the monster WU...
<rsc_fpops_est>11145059248785.000000</rsc_fpops_est>
<rsc_fpops_bound>1114505924878500.000000</rsc_fpops_bound>
I'm really scratching my head on this one. I mean, I thought the techs had set a time limit so that huge monsters like this wouldn't be an issue. Is this just a weird fluke error, or what? Does anybody even know? It's not an issue for me, I'm just really curious why this particular unit seems so loaded down with tough positions, while the other one keeps clicking along and incrementing its progress every 20 minutes. |
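A note on the two client state values quoted above: as far as I understand BOINC, rsc_fpops_est is the estimated number of floating-point operations for the task and rsc_fpops_bound is the upper limit, and the client aborts a task once its CPU time exceeds roughly the bound divided by the host's benchmarked speed. The sketch below only illustrates that arithmetic. The host speed of 1 GFLOPS is a made-up figure, not this machine's benchmark, and the function name is hypothetical.

```python
def cpu_time_limit_hours(rsc_fpops_bound: float, host_flops: float) -> float:
    """Approximate CPU-time ceiling in hours: operations allowed / operations per second."""
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return rsc_fpops_bound / host_flops / 3600.0


# Values copied from the client state file quoted in the post.
RSC_FPOPS_EST = 11145059248785.0
RSC_FPOPS_BOUND = 1114505924878500.0

# ASSUMPTION: a hypothetical benchmark of 1e9 floating-point ops per second.
ASSUMED_HOST_FLOPS = 1.0e9

bound_to_est_ratio = RSC_FPOPS_BOUND / RSC_FPOPS_EST
limit_hours = cpu_time_limit_hours(RSC_FPOPS_BOUND, ASSUMED_HOST_FLOPS)

print(f"bound / estimate = {bound_to_est_ratio:.0f}x")
print(f"CPU time limit at the assumed speed: {limit_hours:.1f} h ({limit_hours / 24:.1f} days)")
```

With these values the bound is exactly 100 times the estimate, so on a host of the assumed speed the limit would be about 310 hours (nearly 13 days). A slower benchmark gives a proportionally longer limit.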
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello TXR13,
HCMD2 6.13 allows an optional time limit to be added to the work unit. A few days ago I think knreed said that only a small number (20+?) of old work units were left without time limits.
Lawrence |
||
|
TXR13
Cruncher Canada Joined: Dec 5, 2005 Post Count: 36 Status: Offline Project Badges: |
Ha! That would just be my luck, pulling one of those units without the time limit... |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have one of these bad boyz and mine has been crunching for 1.5 hours at 0%.
Edit: 2.25 hours, still 0%. I gave up on it...
CMD2_ 0002-RADIA.clustersOccur-TPM1A.clustersOccur_ 4259_ 1-- 613 Aborted 5/28/09 20:53:21 6/1/09 13:18:59 86.80 2,586.6 / 0.0 |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: |
TXR13,
The machine you mention seems rather slow. Mine usually checkpoints about every minute (if I don't limit it via the "write to disk every xxx seconds" parameter), and when it hits a tough position it stays on it for about 4 hours. Since yours checkpoints every 20 minutes, that translates to about 80 hours in your case.
I hope I am crunching the last one of these monster WUs, and it will be at about 23 hours when it passes the current position at 91.200 %. If you have such a WU, that would mean about 20 days of crunching... Maybe you should consider aborting it if it encounters another tough position after its current one.
Cheers. Jean. |
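The estimate in this post can be reproduced with a simple proportional scaling. This is only a sketch of that reasoning: it assumes the ratio of normal checkpoint intervals (about 1 minute on Jean's machine, about 20 minutes on TXR13's) is a fair proxy for the relative speed of the two machines, and the function name is made up for illustration.

```python
def scale_duration(reference_hours: float,
                   reference_checkpoint_min: float,
                   target_checkpoint_min: float) -> float:
    """Scale a duration by the ratio of checkpoint intervals (a rough speed proxy)."""
    if reference_checkpoint_min <= 0:
        raise ValueError("reference_checkpoint_min must be positive")
    return reference_hours * (target_checkpoint_min / reference_checkpoint_min)


# Figures taken from the thread: 1 min vs 20 min checkpoints,
# 4 h per tough position and about 23 h for a whole monster WU on the faster machine.
tough_position_hours = scale_duration(4.0, 1.0, 20.0)
whole_wu_days = scale_duration(23.0, 1.0, 20.0) / 24.0

print(f"One tough position: about {tough_position_hours:.0f} h")
print(f"Whole work unit: about {whole_wu_days:.1f} days")
```

The second figure comes out slightly under the "about 20 days" quoted above, which is as close as a rule of thumb like this can be expected to get.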
||
|
eviltoad
Senior Cruncher Australia Joined: Nov 5, 2005 Post Count: 190 Status: Offline Project Badges: |
I've just had a monster WU error out after nearly 78 hours:
CMD2_ 0001-1HCI_ A.clustersOccur-2O72_ A.clustersOccur_ 2386_ 2-- 613 Error 5/29/09 13:58:44 6/1/09 23:22:18 77.87 1,320.9 / 0.0
because the maximum CPU time was exceeded. The system in question is built around a Q9400 (2.66GHz) with 3GB RAM and runs Ubuntu Jaunty.
I have another such WU on the same system:
CMD2_ 0002-RADIA.clustersOccur-U2AF1A.clustersOccur_ 44_ 0--
still in progress at 91% after 68+ hours. Is it worth keeping going? |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello eviltoad,
Why not keep going? Either it will time out or it will return successfully in another half day.
Lawrence |
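For a rough sanity check on "keep going", one can extrapolate linearly from the reported progress. This is a sketch only: it assumes progress advances evenly with CPU time, which the tough positions discussed in this thread clearly violate, so the result is best read as an optimistic lower bound. The function name is hypothetical.

```python
def projected_hours(elapsed_hours: float, fraction_done: float) -> tuple[float, float]:
    """Return (projected total hours, remaining hours) by linear extrapolation."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_hours / fraction_done
    return total, total - elapsed_hours


# Figures from eviltoad's post: 91% complete after 68+ hours.
total, remaining = projected_hours(68.0, 0.91)

print(f"Projected total: {total:.1f} h, remaining: {remaining:.1f} h")
```

The linear projection lands at roughly 75 hours in total, a little under the 77.87 hours at which the other WU on the same system was stopped. Whether this WU carries the same limit is not stated in the thread, so the margin is not guaranteed.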
||
|
|