World Community Grid Forums
Thread Status: Active. Total posts in this thread: 18
KSMooney
Cruncher Joined: Jun 10, 2007 Post Count: 1 Status: Offline Project Badges:
Not all cards finish in under 2 minutes. If the application checkpointed every 2 minutes, or even every minute, that would be wonderful.
thebestjaspreet
Cruncher Canada Joined: Jun 16, 2011 Post Count: 10 Status: Offline Project Badges:
Do you have LAIM (Leave Application In Memory) active? AFAIK the GPU app has no checkpoints, so it goes back to the beginning when stopped. I believe you can check "Computation allowed while computer is in use" as well as "Use GPU while computer is in use" in the BOINC client preferences, and use something like TThrottle (http://efmer.eu/boinc/) to control the temperatures. This will eliminate the problem with checkpointing and will also let you keep using the computer. Hope it helps.
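For background on what is being asked for: a BOINC science app decides for itself when to checkpoint, typically by testing elapsed time against an interval and then cooperating with the client (via calls like boinc_time_to_checkpoint() / boinc_checkpoint_completed() in the BOINC API). A minimal sketch of just the interval test, in plain C; this is illustrative only, not WCG's actual code:

```c
#include <stdbool.h>

/* Decide whether enough wall time has passed since the last checkpoint.
 * The 120-second interval matches the "every 2 minutes" requested in
 * this thread. A real BOINC app would combine this with
 * boinc_time_to_checkpoint() and, after writing its state file,
 * boinc_checkpoint_completed(). */
bool should_checkpoint(double now, double last_checkpoint,
                       double interval_secs)
{
    return (now - last_checkpoint) >= interval_secs;
}
```

A task using this would call it in its main compute loop and write its state file whenever it returns true.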
cristipurdel
Senior Cruncher Joined: Dec 13, 2008 Post Count: 158 Status: Offline Project Badges:
> Do you have LAIM (Leave Application In Memory) active? AFAIK the GPU app has no checkpoints, so it goes back to the beginning when stopped. [...] Hope it helps.

Tried it, and it is not working with TThrottle, since it does not have an "exclude GPU app" option. Tip of the hat to KSMooney: it would be nice if the application checkpointed every 2 or 4 minutes, so the faster cards would not be bothered by it while the slower cards are helped.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Given the moans about lag, I somehow perceive this wanted checkpointing would only add to that problem for the affected users. Probability of implementation, unless the Techs put multiple CPU WUs in a single GPU WU: small, more than small.
cristipurdel
Senior Cruncher Joined: Dec 13, 2008 Post Count: 158 Status: Offline Project Badges:
> With the moans on lag, somehow perceive this wanted checkpointing to only add to that problem for the affected users. [...]

I do not think it is hard to implement something that is already done in other projects; it doesn't require reinventing the wheel. If this is too cumbersome to do, then I suggest that future apps should run one WU on each Compute Unit, so that every card runs them in "almost" the same amount of time. If the fastest cards can run one task in under 2 minutes, then wrap 16 of them and do a checkpoint every 2 minutes, so that everybody is happy.

P.S. If memory serves me right, back when there were some hints about GPUs being used in WCG, you were saying that it could not be done so easily, or that the speedup would not be as significant.
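The "wrap 16 of them" idea above amounts to a loop that checkpoints at each sub-WU boundary, so an interrupted task resumes at the next sub-WU instead of starting from zero. A sketch of that shape; all names here are hypothetical stand-ins, not WCG code:

```c
/* Process SUB_WU_COUNT fast sub-WUs inside one wrapped task, recording
 * the index of the last completed sub-WU as the checkpoint. On restart,
 * the task resumes from that index rather than from the beginning. */
#define SUB_WU_COUNT 16

static int checkpointed = 0;        /* would be restored from a state file */

static void process_sub_wu(int i)   /* stand-in for the real GPU work */
{
    (void)i;
}

static void save_checkpoint(int i)  /* stand-in: persist i to disk */
{
    checkpointed = i;
}

int run_wrapped_task(int resume_from)
{
    for (int i = resume_from; i < SUB_WU_COUNT; i++) {
        process_sub_wu(i);
        save_checkpoint(i + 1);     /* natural checkpoint boundary */
    }
    return checkpointed;            /* SUB_WU_COUNT when fully done */
}
```

With ~2-minute sub-WUs this gives the fast cards a checkpoint they never need and the slow cards one every couple of minutes.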
cristipurdel
Senior Cruncher Joined: Dec 13, 2008 Post Count: 158 Status: Offline Project Badges:
I may have found a partial solution:
----------------------------------------
1. From BOINC: set "Use GPU always".
2. Install TThrottle:
- From Programs: set the temperatures for CPU & GPU very low, around 20-30 degrees.
- From Preferences: check "If the computer is not used for" (I put 120 seconds, so that crunching resumes if there is no activity) and set the temperatures as high as you would trust your hardware.
- From Expert: put 10 seconds for "Rebuild list after 10 seconds".

The disadvantages are:
1. If you are watching a movie with VLC, MOC or Flash, the computer will think that you are away and TThrottle is stopped.
2. There is a small lag of 3-10 seconds after you return from idle, but it is not that critical. Also, while using the computer there is a small lag from time to time, about 0.5 seconds, but it is not that annoying.

Possible complete solutions:
1. HCC GPU uses checkpoints, if the developers want this.
2. BOINC re-enables LAIM for GPUs (not wanted at the moment).
3. TThrottle gets an option like: continue crunching if the following processes are running.

What is nice about TThrottle is that it does not suspend the tasks like BOINC does, but "keeps them alive" until throttling is stopped.
----------------------------------------
[Edit 2 times, last edit by cristipurdel at Oct 20, 2012 12:24:28 PM]
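The idle rule described above ("if the computer is not used for 120 seconds, resume crunching") boils down to a simple grace-period test on the time since the last user input. A sketch of that behaviour, illustrative only and not TThrottle's implementation:

```c
#include <stdbool.h>

/* Allow full-speed crunching only once the machine has seen no
 * keyboard/mouse input for a grace period (120 seconds in the post).
 * Any fresh input resets seconds_since_input to 0, which throttles
 * the tasks again until the grace period elapses. */
bool allow_full_speed(double seconds_since_input, double grace_secs)
{
    return seconds_since_input >= grace_secs;
}
```

The first disadvantage listed above follows directly: a video player generates no input events, so seconds_since_input keeps growing and the rule treats the user as away.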
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
> I do not think it is hard to implement something that it is already done in other projects [...] you were saying that it cannot be done so easily or the speedup will not be as significant.

The checkpointing requested is on the single-WU-per-GPU-task setup, which is what I'm talking about [think my post was clear on that]. If the plan (yes, there was a plan) to package multiple WUs in a GPU job went ahead, then checkpointing becomes sensible [also noted in my post]. What I said a few years ago and what I'm saying now? I was talking, as said, about a single HCC WU per task, and about some members having a lag issue where checkpointing at arbitrary points is likely to worsen the situation. And YES, it depends on which WCG project is brought to the GPU. HCC is integer-intense [little to no floating point, dealt with I guess in the CPU phases], and integer is easy peasy for a GPU card, yet to get it to production took how long, and what all needed to be put in place? Research done by the scientists has shown that little to nothing would be gained by porting the other sciences to the GPU [search posts on the forums]. If there were a gain, and there were enough time to make a "bring it to the grid" effort viable, it would have been done.

For Rice, a GPU development was done as a post-processing project, and it was not deemed big enough, or was too involved, to bring to WCG, so the scientists are doing it in-house. As for the future: one expectation is that anything GPU-able in this type of research will largely be moved off-grid. It is much easier to have a homogeneous set of [latest] hardware and let it churn through the work in a few months than to try to cater for a zillion different volunteer configurations and afterwards have to deal with just enough statistical variance that your set of results is not optimal. This is the major concern with public, distributed computing.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Just to share a pretty good answer to a question over at the Berkeley forum on why paused GPU tasks *can't* resume where they were suspended:

> It appears that Leave Applications in Memory is not working with GPU tasks. Any plans to introduce the feature also for GPU applications?

No, it was deliberately taken out. CPU (main system) memory is routinely swapped out to a pagefile by the operating system if things get tight, so LAIM has minimal effect on system performance. But GPU memory has no swapfile system, so anything left in memory stays in memory. That bit of BOINC was written when 512 MB GPUs were common, and many projects (including Einstein, and SETI with CUDA 2.3 and above) can only fit one task in that little VRAM. Leaving an app behind in VRAM would prevent any other GPU task from running.