World Community Grid - View Thread - Monster WU on the loose...

World Community Grid Forums

Category: Completed Research

Forum: Help Cure Muscular Dystrophy - Phase 2 Forum

Thread: Monster WU on the loose...

Thread Status: Active
Total posts in this thread: 98

This topic has been viewed 15624 times and has 97 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Monster WU on the loose...

For your information, two WUs have completed on one of my systems, which runs Windows Vista. I bring this up because of the name of each WU.

CMD2_ 0002-RADIA.clustersOccur-RADIA.clustersOccur_ 2383_ 0-- DAVETHOMPSON-PC Pending Validation 5/27/09 20:56:56 5/28/09 11:41:48 3.36 53.1 / 0.0
CMD2_ 0002-RADIA.clustersOccur-RADIA.clustersOccur_ 788_ 0-- DAVETHOMPSON-PC Pending Validation 5/27/09 19:13:18 5/28/09 03:37:25 1.25 19.7 / 0.0

[May 30, 2009 11:44:33 PM]

JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline


Re: Monster WU on the loose...

bono_vox,
I am also running a P4 HT at 2.40 GHz (a Northwood, according to Everest) with two tasks, and my advice would be to run yours with two tasks as well, whichever CPU percentage you choose to use. A 2-core processor runs at a lower temperature if you run only one task because half of the processor is effectively unused, but on a P4 HT the whole processor is used anyway, so the difference in temperature is very small or nil. I have just suspended the second task on mine and, after a while, the temperature seems to have dropped by 1 °C at most, from 55 °C to 54 °C.

Before deciding to run two tasks I ran a few tests and found that the overall throughput is about 25-30 % higher with two tasks than with one.
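
To put that figure in per-task terms, here is a minimal sketch of the arithmetic. It assumes the 25-30 % gain applies to the combined output of both tasks and that the two tasks share the processor equally; the function name is made up for illustration.

```python
# Illustrative arithmetic only: the 25-30 % figure comes from the post,
# and per_task_slowdown is a hypothetical helper name.

def per_task_slowdown(throughput_gain, tasks=2):
    """Return how many times longer each task takes when `tasks` run together.

    throughput_gain is the combined gain over one task alone, e.g. 0.25 for +25 %.
    """
    combined_speed = 1.0 + throughput_gain    # total work per hour vs. one task alone
    speed_per_task = combined_speed / tasks   # assumed equal share for each task
    return 1.0 / speed_per_task


for gain in (0.25, 0.30):
    slowdown = per_task_slowdown(gain)
    print(f"+{gain:.0%} throughput: each task takes {slowdown:.2f}x as long")
```

Under those assumptions each individual task takes roughly 1.5 to 1.6 times as long as it would alone, while the pair together still finishes more work per hour.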

By the way, unless your type of processor is known to run hot, you might consider removing the dust from your processor heatsink.

Cheers. Jean.

----------------------------------------

Team--> Decrypthon -->Statistics/Join -->Thread

[May 31, 2009 1:43:19 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Monster WU on the loose...

Well, for now I'll keep running 2 threads @ 50% on my HT P4s.

[May 31, 2009 3:46:17 PM]

TXR13
Cruncher
Canada
Joined: Dec 5, 2005
Post Count: 36
Status: Offline


Re: Monster WU on the loose...

I've been gone for the weekend, so here's the update on my monster workunit.

CPU Time: 3 days, 20 hours, 20 minutes, 55 seconds
Percent complete: 16.352%
Current time: 19:30
Last checkpointed: 18:02:36 May 31
Previous checkpoints at: 13:52:54 May 31; 09:45:09; 05:37:48; 01:48:21; 21:19:33 May 30; 17:09:47; 13:01:05; 12:59:53; 12:58:19; 08:51:29; 04:42:09...

Obviously, there are some short-running positions that are handled very nicely. Just as obviously, many of these positions take some time to compute.

This does not appear to be typical, as the other CMD2 unit I have running on this machine is checkpointing nicely every 20 minutes, like clockwork.
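
For anyone who wants to see the spacing between those checkpoints, a quick sketch along these lines would do it. The timestamps are copied from the list above; the year 2009 is taken from the forum post dates, and entries listed without a date are assumed to belong to the most recently named day.

```python
from datetime import datetime

# Checkpoint times from the post, newest first.
# Assumptions: year 2009 (from the forum timestamps); undated entries fall on
# the same day as the nearest dated entry before them in the list.
CHECKPOINTS = [
    "2009-05-31 18:02:36",
    "2009-05-31 13:52:54",
    "2009-05-31 09:45:09",
    "2009-05-31 05:37:48",
    "2009-05-31 01:48:21",
    "2009-05-30 21:19:33",
    "2009-05-30 17:09:47",
    "2009-05-30 13:01:05",
    "2009-05-30 12:59:53",
    "2009-05-30 12:58:19",
    "2009-05-30 08:51:29",
    "2009-05-30 04:42:09",
]


def checkpoint_gaps(stamps):
    """Return the time between consecutive checkpoints, newest gap first."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [newer - older for newer, older in zip(times, times[1:])]


for gap in checkpoint_gaps(CHECKPOINTS):
    print(gap)
```

On the times listed, nine of the eleven gaps come out at around four hours and the other two at under two minutes.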

Also for interest, from the client state file on the monster WU...
<rsc_fpops_est>11145059248785.000000</rsc_fpops_est>
<rsc_fpops_bound>1114505924878500.000000</rsc_fpops_bound>
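
A small sketch for comparing those two values. As far as I understand BOINC, rsc_fpops_est is the estimated number of floating-point operations for the task and rsc_fpops_bound is the cap beyond which the client aborts it. The wrapping root element and the host speed below are assumptions made for the example, not values from the state file.

```python
import xml.etree.ElementTree as ET

# The two elements quoted above, wrapped in a root element so that they
# parse as a single XML document (the wrapper is added for this sketch).
FRAGMENT = """
<workunit>
<rsc_fpops_est>11145059248785.000000</rsc_fpops_est>
<rsc_fpops_bound>1114505924878500.000000</rsc_fpops_bound>
</workunit>
"""

# Hypothetical host speed in floating-point operations per second.
# NOT taken from the thread; substitute the benchmark of the real machine.
ASSUMED_FLOPS = 1.5e9


def fpops_limits(xml_text):
    """Return (estimate, bound) in floating-point operations."""
    root = ET.fromstring(xml_text)
    est = float(root.findtext("rsc_fpops_est"))
    bound = float(root.findtext("rsc_fpops_bound"))
    return est, bound


est, bound = fpops_limits(FRAGMENT)
print(f"bound / estimate = {bound / est:.0f}")
print(f"estimate at assumed speed: {est / ASSUMED_FLOPS / 3600:.1f} h")
print(f"bound at assumed speed:    {bound / ASSUMED_FLOPS / 3600:.1f} h")
```

The bound here is 100 times the estimate. The hours printed depend entirely on the assumed host speed, so treat them as an order-of-magnitude check only.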

I'm really scratching my head on this one. I mean, I thought the techs had set a time limit so that huge monsters like this wouldn't be an issue. Is this just a weird fluke error, or what? Does anybody even know? It's not an issue for me, I'm just really curious why this particular unit seems so loaded down with tough positions, while the other one keeps clicking along and incrementing its progress every 20 minutes.

[Jun 1, 2009 2:43:53 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Monster WU on the loose...

Hello TXR13,
HCMD2 6.13 allows an optional time limit to be added to the work unit. A few days ago, I think knreed said that there were only a small number (20+?) of old work units without time limits.

Lawrence

[Jun 1, 2009 3:18:51 AM]

TXR13
Cruncher
Canada
Joined: Dec 5, 2005
Post Count: 36
Status: Offline


Re: Monster WU on the loose...

Ha! That would just be my luck, pulling one of those units without the time limit...

[Jun 1, 2009 6:00:04 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Monster WU on the loose...

I have one of these bad boyz and mine has been crunching for 1.5 hours at 0%.

Edit: 2.25 hours still 0%

I gave up on it...
CMD2_ 0002-RADIA.clustersOccur-TPM1A.clustersOccur_ 4259_ 1-- 613 Aborted 5/28/09 20:53:21 6/1/09 13:18:59 86.80 2,586.6 / 0.0

[Jun 1, 2009 1:53:58 PM]

JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline


Re: Monster WU on the loose...

TXR13,
The machine you mention seems rather slow. Mine usually checkpoints about every minute (if I don't limit it via the "write to disk every xxx seconds" parameter), and when it hits a tough position it stays on it for about 4 hours, which translates to 80 hours in your case. I hope I am crunching the last of these monster WUs; it will be at about 23 hours when it passes its current position at 91.200 %. If you have a WU like that, it would mean about 20 days of crunching...
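
The scaling behind those numbers can be sketched as follows. It assumes that the ratio of checkpoint intervals on ordinary positions (about 1 minute here versus the 20 minutes reported for the other CMD2 unit) is a fair proxy for the speed difference between the two machines, which is a rough guess rather than a measurement. The constant and function names are made up for the example.

```python
# Figures quoted in the thread.
FAST_CHECKPOINT_MIN = 1      # checkpoint spacing on the faster machine
SLOW_CHECKPOINT_MIN = 20     # checkpoint spacing reported for the other CMD2 unit
TOUGH_POSITION_HOURS = 4     # time spent on one tough position (faster machine)
MONSTER_ELAPSED_HOURS = 23   # expected run time at 91.200 % (faster machine)


def scale_hours(hours, fast_interval_min, slow_interval_min):
    """Scale a duration by the ratio of checkpoint intervals (a crude speed proxy)."""
    return hours * slow_interval_min / fast_interval_min


tough = scale_hours(TOUGH_POSITION_HOURS, FAST_CHECKPOINT_MIN, SLOW_CHECKPOINT_MIN)
whole = scale_hours(MONSTER_ELAPSED_HOURS, FAST_CHECKPOINT_MIN, SLOW_CHECKPOINT_MIN)

print(f"one tough position: about {tough:.0f} h")
print(f"comparable monster WU: about {whole:.0f} h, or {whole / 24:.1f} days")
```

With a factor of 20, the 4 hours become 80 hours and the 23 hours become 460 hours, a little over 19 days.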

Maybe you should consider aborting it if it encounters another tough position after its current one.

Cheers. Jean.


[Jun 1, 2009 2:32:39 PM]

eviltoad
Senior Cruncher
Australia
Joined: Nov 5, 2005
Post Count: 190
Status: Offline


Re: Monster WU on the loose...

I've just had a monster WU error out after nearly 78 hours:
CMD2_ 0001-1HCI_ A.clustersOccur-2O72_ A.clustersOccur_ 2386_ 2-- 613 Error 5/29/09 13:58:44 6/1/09 23:22:18 77.87 1,320.9 / 0.0

because the maximum CPU time was exceeded.

The system in question is built around a Q9400 (2.66GHz) with 3GB RAM and runs Ubuntu Jaunty.

I have another such WU on the same system:
CMD2_ 0002-RADIA.clustersOccur-U2AF1A.clustersOccur_ 44_ 0--
still in progress at 91% after 68+ hours.

Is it worth keeping going?
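
For a rough feel of what is left, here is a naive linear projection from the figures above (91% after 68 hours). It assumes progress advances at a constant rate, which these WUs clearly do not always do when they hit a tough position, so it is a best-case guess only. The function name is made up for the example.

```python
def project_total_hours(elapsed_hours, percent_complete):
    """Linearly extrapolate total run time from elapsed time and progress.

    Assumes a constant rate of progress, which is optimistic for work units
    that can stall for hours on a single position.
    """
    if not 0 < percent_complete <= 100:
        raise ValueError("percent_complete must be in the range (0, 100]")
    return elapsed_hours * 100.0 / percent_complete


elapsed, percent = 68.0, 91.0   # figures quoted in the post
total = project_total_hours(elapsed, percent)
print(f"projected total: {total:.1f} h, remaining: {total - elapsed:.1f} h")
```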

[Jun 2, 2009 1:28:03 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Monster WU on the loose...

Hello eviltoad,
Why not keep going? Either it will time out or it will return successfully in another half day.

Lawrence

[Jun 2, 2009 1:33:52 AM]
