World Community Grid Forums
Thread Status: Active | Total posts in this thread: 6
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1322 | Status: Offline
Since the GPU crunching is still in gamma, WCG has disabled the use of the Duration Correction Factor by setting <dont_use_dcf/> on the server.

I understand that you have to do this because of the discrepancy you would otherwise get between GPU and CPU tasks, flooding machines with too many CPU tasks when the buffer is even slightly oversized. Is the server learning and adjusting the estimated run times on the GPUs?

My 3 cards:
- GT240: estimated 8m28s -> runtime 14m15s
- HD 7770: estimated 1hr10m -> runtime 0.02hr
- A6 4400M APU: estimated 4hr55m -> runtime 0.27hr
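For readers unfamiliar with DCF, here is a minimal sketch of how a single host-wide correction factor behaves when fast GPU results arrive. The function names and update constants are illustrative assumptions, not the actual BOINC client code:

```python
# Minimal sketch of a host-wide Duration Correction Factor (DCF).
# Function names and update constants are illustrative, not the
# actual BOINC client code.

def estimated_runtime(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """Client-side estimate: raw FPOPS estimate over host speed, scaled by DCF."""
    return rsc_fpops_est / flops * dcf

def update_dcf(dcf: float, estimate_s: float, actual_s: float) -> float:
    """One shared DCF per project: a short GPU task drags it down,
    shrinking the estimates for every CPU task on the same host too."""
    ratio = actual_s / estimate_s
    if ratio < 1.0:
        # Early finishes pull the DCF down gradually.
        return dcf + 0.1 * (ratio - 1.0) * dcf
    # Overruns push it up immediately.
    return dcf * ratio

# An HD 7770 task estimated at 1h10m that really takes 0.02h (see above)
# keeps deflating the single shared factor:
dcf = 1.0
for _ in range(20):
    dcf = update_dcf(dcf, estimate_s=70 * 60, actual_s=0.02 * 3600)
print(f"DCF after 20 fast GPU results: {dcf:.3f}")
# The client then believes CPU tasks are roughly 8x shorter than they
# really are and fetches far too much CPU work for the buffer.
```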
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Really?... Well, you'd know, since you can see it in the control files. The discussion on this stop-gap solution over at the developers' mailing list has started up again, and the specific cases were brought forward again, for the mixed CPU/GPU environment in particular.

Unfortunately, IIRC, the chief developer has had a strong antipathy against DCF for years, which to me is the main reason a *per-science* DCF was never implemented [someone did it in a private build and has it working, so that the large runtime variability of one science does not affect the buffering of another]. It looks like the hand was forced for the moment for WCG, though per a post by Ingleside in the last 24 hours, this flooding is supposedly fixed in client 7.0.36 and up (I am running 7.0.42 now). The trouble is, even if those affected could upgrade, it would have to be individual advice [same as the advice in the FAQs to go to v7 to gain per-card GPU control on multi-card devices]. Not setting <dont_use_dcf/> would also affect those on v6 and earlier clients, and of those there are hundreds of thousands, so as you said: "I understand you have to do this".

The server learns mean run times and puts those in the task headers, but I do not know whether this is done for CPU and GPU separately. Assuming they are not [yet] separate feeds, the mean run time for HCC will drop quite a bit in the coming days. It was 1.52 hours just before launch, so we'll watch. It should be visible in the next stats run, if not already in the last midnight run.

P.S. I can't speak for WCG, but I think the techs were loath to use this <dont_use_dcf/> and were looking for a different [better] solution.

edit: On the run time average: indeed, the few hours and the fact that not many know yet were enough to drop the mean runtime outside of trend from 1.52 to 1.48. Noon will give a firmer indicator.

[Edit 1 times, last edit by Former Member at Oct 11, 2012 9:47:16 AM]
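To illustrate the *per-science* DCF idea mentioned above (the private build), here is a hypothetical sketch; the class and update rule are my own illustration, not WCG or BOINC code:

```python
# Hypothetical sketch of a *per-science* DCF, keyed by application,
# so one science's runtime variability cannot distort another's
# buffering. My own illustration, not WCG or BOINC code.
from collections import defaultdict

class PerScienceDCF:
    def __init__(self) -> None:
        self.dcf = defaultdict(lambda: 1.0)  # one factor per science

    def estimate_s(self, app: str, raw_estimate_s: float) -> float:
        return raw_estimate_s * self.dcf[app]

    def record(self, app: str, estimate_s: float, actual_s: float) -> None:
        ratio = actual_s / estimate_s
        # Smooth update; only this science's buffering is affected.
        self.dcf[app] += 0.1 * (ratio - 1.0) * self.dcf[app]

dcfs = PerScienceDCF()
dcfs.record("hcc_gpu", estimate_s=70 * 60, actual_s=72)  # fast GPU result
print(dcfs.estimate_s("sn2s", 2 * 3600) / 3600, "h")     # still 2.0 h
```

Running it shows the fast HCC GPU result leaving the SN2S estimate untouched, which is exactly the isolation the single host-wide factor cannot provide.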
mikey
Veteran Cruncher | Joined: May 10, 2009 | Post Count: 822 | Status: Offline
Former Member wrote:
> Really?... Well, you'd know, since you can see it in the control files. [...]

I thought that several months ago the conversation was about stopping the CPU units for HCC and going strictly to GPU units after a period of time. The time frame was never discussed here on the boards that I saw, though. If that happens, wouldn't this become a moot point?
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Whoever launched the idea on the forums that HCC would go GPU-only was baselessly speculating. If HCC went GPU-only, then eventually the projected FPOPS [or whatever they are called for a GPU WU] would settle, and the DCF would too, *but* DCF is a general problem for WCG. DCF was designed for 1 grid, 1 project, and most of our projects are non-deterministic in nature: you can't accurately predict an individual WU's runtime. Case in point: in the last week I've had SN2S tasks run from 17 minutes to 650 minutes on the same machine. So WCG has an algorithm running that follows the science's mean runtime and places the latest value in the headers of new work sent out. That is what's blowing the fuses now that the GPU tasks have joined. Run HCC on GPU only and the other sciences on CPU, and maybe a stable DCF comes out after a while. I don't know. I will watch closely, but since <dont_use_dcf/> was put in place [I found it in one of the scheduler_ files], DCF is presumably now dead for WCG.
[Edit 1 times, last edit by Former Member at Oct 11, 2012 1:46:33 PM]
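A sketch of the kind of running-mean update described in the post above; the exponential moving average and its weight are assumptions for illustration, since the actual WCG algorithm is not public:

```python
# Illustrative sketch of a server-side running mean of a science's
# runtime, stamped into the headers of newly issued work. The moving
# average and its weight are assumptions; the real WCG algorithm
# is not public.

def update_mean_runtime(mean_h: float, result_h: float,
                        weight: float = 0.01) -> float:
    """Each reported result nudges the published mean runtime."""
    return mean_h + weight * (result_h - mean_h)

mean_h = 1.52                      # HCC mean just before GPU launch
for _ in range(3):                 # a first handful of ~5-minute GPU results
    mean_h = update_mean_runtime(mean_h, 5 / 60)
print(f"published mean runtime: ~{mean_h:.2f} h")   # drifts toward 1.48
```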
Ingleside
Veteran Cruncher | Norway | Joined: Nov 19, 2005 | Post Count: 974 | Status: Offline
Former Member wrote:
> So WCG has an algorithm running that follows the science's mean runtime and places the latest value in the headers of new work sent out. [...]

It looks like WCG's auto-adjusting interferes with the per-computer, per-application, per-plan-class adjusting that is part of the v7 server code: the GPU tasks are still stuck on the severely overinflated 1+ hour estimate. It has actually increased a little, instead of the huge drop that should have happened after 10 validated results.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

[Edit 1 times, last edit by Ingleside at Oct 11, 2012 4:21:22 PM]
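For context, a sketch of the v7-server-style per-host, per-application, per-plan-class estimate Ingleside describes, including a fallback before enough validated results accumulate. The 10-result threshold comes from the post above; field names and fallback logic are illustrative:

```python
# Sketch of v7-server-style runtime estimation kept per host, per
# application, per plan class. The 10-result threshold comes from the
# post above; field names and fallback logic are illustrative.
from dataclasses import dataclass, field

@dataclass
class HostAppVersionStats:
    MIN_SAMPLES = 10                 # before this, use the project-wide mean
    samples: list = field(default_factory=list)

    def record_validated(self, runtime_s: float) -> None:
        self.samples.append(runtime_s)

    def estimate_s(self, project_mean_s: float) -> float:
        if len(self.samples) < self.MIN_SAMPLES:
            return project_mean_s    # stuck on the inflated 1+ hour value
        return sum(self.samples) / len(self.samples)

stats = HostAppVersionStats()
for _ in range(10):
    stats.record_validated(72)       # ten validated ~72 s GPU results
# With 10 validated results the estimate drops to ~72 s; if a
# project-side adjustment keeps overriding it, the inflated value stays.
print(stats.estimate_s(project_mean_s=1.52 * 3600))
```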
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
Ingleside wrote:
> It looks like WCG's auto-adjusting interferes with the per-computer, per-application, per-plan-class adjusting that is part of the v7 server code [...]

Our 'auto-adjusting' mechanism has no impact on what has happened in the past 48 hours. This is pure BOINC code playing out here.