World Community Grid Forums
Thread Status: Active | Total posts in this thread: 6
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1322 | Status: Offline
Since the GPU crunching is still in gamma, WCG has disabled the use of the Duration Correction Factor by setting <dont_use_dcf/> on the server.

I understand that you have to do this because of the discrepancy you would otherwise get between GPU and CPU tasks, flooding machines with too many CPU tasks when the buffer is even slightly oversized. Is the server learning and adjusting the estimated run times on the GPUs?

My 3 cards:
- GT240: estimated 8m28s -> runtime 14m15s
- HD 7770: estimated 1hr10m -> runtime 0.02hr
- A6 4400M APU: estimated 4hr55m -> runtime 0.27hr
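For readers unfamiliar with DCF, here is a minimal sketch of how a single host-wide correction factor behaves when fast GPU results arrive. The function names and update constants are illustrative assumptions, not the actual BOINC client code:

```python
# Minimal sketch of a host-wide Duration Correction Factor (DCF).
# Function names and update constants are illustrative, not the
# actual BOINC client code.

def estimated_runtime(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """Client-side estimate: raw FPOPS estimate over host speed, scaled by DCF."""
    return rsc_fpops_est / flops * dcf

def update_dcf(dcf: float, estimate_s: float, actual_s: float) -> float:
    """One shared DCF per project: a short GPU task drags it down,
    shrinking the estimates for every CPU task on the same host too."""
    ratio = actual_s / estimate_s
    if ratio < 1.0:
        # Early finishes pull the DCF down gradually.
        return dcf + 0.1 * (ratio - 1.0) * dcf
    # Overruns push it up immediately.
    return dcf * ratio

# An HD 7770 task estimated at 1h10m that really takes 0.02h (see above)
# keeps deflating the single shared factor:
dcf = 1.0
for _ in range(20):
    dcf = update_dcf(dcf, estimate_s=70 * 60, actual_s=0.02 * 3600)
print(f"DCF after 20 fast GPU results: {dcf:.3f}")
# The client then believes CPU tasks are roughly 8x shorter than they
# really are and fetches far too much CPU work for the buffer.
```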
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Really?... Well, you'd know, since you can see it in the control files. The discussion on this stop-gap solution over at the developers' mailing list has started up again, and the specific cases were brought forward again, for the mixed CPU/GPU environment in particular.

Unfortunately, IIRC, the chief developer has had a strong antipathy against DCF for years, which to me is the main reason a *per-science* DCF was never implemented [someone did it in a private build and has it working, so that the large runtime variability of one science does not affect the buffering of another]. It looks like the hand was forced for the moment for WCG, though per a post by Ingleside in the last 24 hours, this flooding is supposedly fixed in client 7.0.36 and up (I am running 7.0.42 now). The trouble is, even if those affected could upgrade, it would have to be individual advice [same as the advice in the FAQs to go to v7 to gain per-card GPU control on multi-card devices]. Not setting <dont_use_dcf/> would also affect those on v6 and earlier clients, and of those there are hundreds of thousands, so as you said: "I understand you have to do this".

The server learns mean run times and puts those in the task headers, but I do not know whether this is done for CPU and GPU separately. Assuming they are not [yet] separate feeds, the mean run time for HCC will drop quite a bit in the coming days. It was 1.52 hours just before launch, so we'll watch. It should be visible in the next stats run, if not already in the last midnight run.

P.S. I can't speak for WCG, but I think the techs were loath to use this <dont_use_dcf/> and were looking for a different [better] solution.

edit: On the run time average: indeed, the few hours and the fact that not many know yet were enough to drop the mean runtime outside of trend from 1.52 to 1.48. Noon will give a firmer indicator.

[Edit 1 times, last edit by Former Member at Oct 11, 2012 9:47:16 AM]
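To illustrate the *per-science* DCF idea mentioned above (the private build), here is a hypothetical sketch; the class and update rule are my own illustration, not WCG or BOINC code:

```python
# Hypothetical sketch of a *per-science* DCF, keyed by application,
# so one science's runtime variability cannot distort another's
# buffering. My own illustration, not WCG or BOINC code.
from collections import defaultdict

class PerScienceDCF:
    def __init__(self) -> None:
        self.dcf = defaultdict(lambda: 1.0)  # one factor per science

    def estimate_s(self, app: str, raw_estimate_s: float) -> float:
        return raw_estimate_s * self.dcf[app]

    def record(self, app: str, estimate_s: float, actual_s: float) -> None:
        ratio = actual_s / estimate_s
        # Smooth update; only this science's buffering is affected.
        self.dcf[app] += 0.1 * (ratio - 1.0) * self.dcf[app]

dcfs = PerScienceDCF()
dcfs.record("hcc_gpu", estimate_s=70 * 60, actual_s=72)  # fast GPU result
print(dcfs.estimate_s("sn2s", 2 * 3600) / 3600, "h")     # still 2.0 h
```

Running it shows the fast HCC GPU result leaving the SN2S estimate untouched, which is exactly the isolation the single host-wide factor cannot provide.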
mikey
Veteran Cruncher | Joined: May 10, 2009 | Post Count: 822 | Status: Offline
Former Member wrote:
> Really?... Well, you'd know, since you can see it in the control files. [...]

I thought that several months ago the conversation was about stopping the CPU units for HCC and going strictly to GPU units after a period of time. The time frame was never discussed here on the boards that I saw, though. If that happens, wouldn't this become a moot point?
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Whoever launched the idea on the forums that HCC would go GPU-only was baselessly speculating. If HCC went GPU-only, then eventually the projected FPOPS [or whatever they are called for a GPU WU] would settle, and the DCF would too, *but* DCF is a general problem for WCG. DCF was designed for 1 grid, 1 project, and most of our projects are non-deterministic in nature: you can't accurately predict an individual WU's runtime. Case in point: in the last week I've had SN2S tasks run from 17 minutes to 650 minutes on the same machine. So WCG has an algorithm running that follows the science's mean runtime and places the latest value in the headers of new work sent out. That is what's blowing the fuses now that the GPU tasks have joined. Run HCC on GPU only and the other sciences on CPU, and maybe a stable DCF comes out after a while. I don't know. I will watch closely, but since <dont_use_dcf/> was put in place [I found it in one of the scheduler_ files], DCF is presumably now dead for WCG.
[Edit 1 times, last edit by Former Member at Oct 11, 2012 1:46:33 PM]
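A sketch of the kind of running-mean update described in the post above; the exponential moving average and its weight are assumptions for illustration, since the actual WCG algorithm is not public:

```python
# Illustrative sketch of a server-side running mean of a science's
# runtime, stamped into the headers of newly issued work. The moving
# average and its weight are assumptions; the real WCG algorithm
# is not public.

def update_mean_runtime(mean_h: float, result_h: float,
                        weight: float = 0.01) -> float:
    """Each reported result nudges the published mean runtime."""
    return mean_h + weight * (result_h - mean_h)

mean_h = 1.52                      # HCC mean just before GPU launch
for _ in range(3):                 # a first handful of ~5-minute GPU results
    mean_h = update_mean_runtime(mean_h, 5 / 60)
print(f"published mean runtime: ~{mean_h:.2f} h")   # drifts toward 1.48
```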
Ingleside
Veteran Cruncher | Norway | Joined: Nov 19, 2005 | Post Count: 974 | Status: Offline
Former Member wrote:
> So WCG has an algorithm running that follows the science's mean runtime and places the latest value in the headers of new work sent out. [...]

It looks like WCG's auto-adjusting interferes with the per-computer, per-application, per-plan-class adjusting that is part of the v7 server code: the GPU tasks are still stuck on the severely overinflated 1+ hour estimate. It has actually increased a little, instead of the huge drop that should have happened after 10 validated results.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

[Edit 1 times, last edit by Ingleside at Oct 11, 2012 4:21:22 PM]
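For context, a sketch of the v7-server-style per-host, per-application, per-plan-class estimate Ingleside describes, including a fallback before enough validated results accumulate. The 10-result threshold comes from the post above; field names and fallback logic are illustrative:

```python
# Sketch of v7-server-style runtime estimation kept per host, per
# application, per plan class. The 10-result threshold comes from the
# post above; field names and fallback logic are illustrative.
from dataclasses import dataclass, field

@dataclass
class HostAppVersionStats:
    MIN_SAMPLES = 10                 # before this, use the project-wide mean
    samples: list = field(default_factory=list)

    def record_validated(self, runtime_s: float) -> None:
        self.samples.append(runtime_s)

    def estimate_s(self, project_mean_s: float) -> float:
        if len(self.samples) < self.MIN_SAMPLES:
            return project_mean_s    # stuck on the inflated 1+ hour value
        return sum(self.samples) / len(self.samples)

stats = HostAppVersionStats()
for _ in range(10):
    stats.record_validated(72)       # ten validated ~72 s GPU results
# With 10 validated results the estimate drops to ~72 s; if a
# project-side adjustment keeps overriding it, the inflated value stays.
print(stats.estimate_s(project_mean_s=1.52 * 3600))
```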
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
Ingleside wrote:
> It looks like WCG's auto-adjusting interferes with the per-computer, per-application, per-plan-class adjusting that is part of the v7 server code [...]

Our 'auto-adjusting' mechanism has no impact on what has happened in the past 48 hours. This is pure BOINC code playing out here.