World Community Grid Forums
Thread Status: Active | Total posts in this thread: 781
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline

> > I'm experiencing some strange behaviour after modifying the app_config file. I forced BOINC to run up to 8 GPU work units in parallel:
> >
> >     <gpu_usage>0.125</gpu_usage>
> >     <cpu_usage>0.25</cpu_usage>
> >
> > This works absolutely fine. I run both GPU and CPU work units, and my GPU and CPU are able to process that many in parallel, which has a dramatic effect on throughput. However, the BOINC client is no longer able to fetch GPU work units. It tries to fetch both CPU and GPU work units but only receives CPU work units. Has anybody experienced the same?
>
> That is very likely the old BOINC problem where the scheduler gets confused when you try to run both CPU and GPU work units from the same project. It has something to do with the "duration correction factor" (DCF), as I recall. You have the same problem on Einstein or MilkyWay when you try to run both CPU and GPU. It is as old as the hills. Maybe a BOINC expert (are you there, Richard?) can illuminate it further.

I thought DCF was turned off at WCG and it is handled using an algorithm on the server.

From 2017: As DCF is locked to 1.000000 by WCG on standard clients, meaning the client does not adapt/adjust runtime to real-time throughput, the only messing happening is server-driven. Combined with the lapse rate between work generation, the point where fpops are slotted in, and the current average runtime used as the base for setting those fpops at science level, this makes for chaos on any science that has large variability in its runtime durations; HST1 is no stranger to the issue.

[Edit 1 times, last edit by Former Member at Apr 29, 2021 5:01:37 PM]
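For reference, the two options quoted above live inside an `<app>` block of an `app_config.xml` file in the project's directory. A minimal sketch follows; the app name `opng` is an assumption here, so verify it against the app names your own client reports before using it:

```xml
<!-- Sketch of an app_config.xml that runs 8 tasks per GPU.
     The app name "opng" is an assumption: check the app names
     shown in your BOINC client before relying on it. -->
<app_config>
  <app>
    <name>opng</name>
    <gpu_versions>
      <gpu_usage>0.125</gpu_usage>  <!-- 1/8 of a GPU per task, i.e. 8 tasks per GPU -->
      <cpu_usage>0.25</cpu_usage>   <!-- 1/4 of a CPU core reserved per GPU task -->
    </gpu_versions>
  </app>
</app_config>
```

After editing the file, the client needs an "Options → Read config files" (or a restart) to pick the change up.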
Jim1348
Veteran Cruncher | USA | Joined: Jul 13, 2009 | Post Count: 1066 | Status: Offline

> I thought DCF was turned off at WCG and it is handled using an algorithm on the server.

That looks to be the case. All my DCF values show 1.00000000000. But I don't think that prevents the server from creating the problem, does it? It may not be a problem here, as I noted above, due to the different task names. But I run only GPU for OPN, and see no point in using the CPU.

EDIT: Then of course I can't run any WCG CPU projects, since I have to set CPU to "off". But there are plenty of other worthwhile projects. For COVID-19, there is always Rosetta and SiDock. And plenty of non-COVID projects.

[Edit 2 times, last edit by Jim1348 at Apr 29, 2021 5:26:07 PM]
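As a back-of-the-envelope illustration of what a locked DCF means: the client's runtime estimate is roughly the task's floating-point-operation estimate divided by the device's projected speed, scaled by DCF. A sketch, with illustrative names rather than the actual client symbols:

```python
# Rough sketch of a BOINC-style task runtime estimate.
# Function and parameter names are illustrative, not the client's own.
def estimated_runtime(rsc_fpops_est: float, device_flops: float, dcf: float = 1.0) -> float:
    """Estimated wall-clock seconds for a task.

    rsc_fpops_est: server-side estimate of floating-point ops in the task
    device_flops:  projected speed of the device in FLOPS
    dcf:           duration correction factor; WCG locks this at 1.0,
                   so the client never adapts the estimate locally
    """
    return rsc_fpops_est / device_flops * dcf

# With DCF pinned at 1.0, any error in the server's fpops estimate
# passes straight through to the runtime estimate:
est = estimated_runtime(1.2e13, 4.0e9)  # 3000 seconds
```

The point of locking DCF to 1.0 is that correction happens server-side; the trade-off, as the 2017 quote above describes, is that a stale fpops estimate is not corrected on the client at all.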
uplinger
Former World Community Grid Tech | Joined: May 23, 2005 | Post Count: 3952 | Status: Offline

Good afternoon,

We are going to be making some changes to the work units being sent out, to help prevent a storage issue on the backend. Without these changes, we would more than likely have to stop the stress test before all 30k batches are complete.

The change we are making is setting the deadline to 3 days instead of the 7 days used previously. All new work downloaded will have the 3-day deadline.

Also, because we would like to hit the plateau of packaged work sooner, we are going to over-schedule about 7,000 work units that are preventing about 2,000 batches from completing. This lets us start seeing where a steady state with a 3-day deadline lands, and starts the later stages of the pipeline, which send results back to the researchers, running at a consistent pace.

Note: For this to happen, I will be turning off validation and the feeder for a few minutes.

Thanks,
-Uplinger
uplinger
Former World Community Grid Tech | Joined: May 23, 2005 | Post Count: 3952 | Status: Offline

Hello again,

The feeder and validators have been re-enabled.

Thanks,
-Uplinger
kittyman
Advanced Cruncher | Joined: May 14, 2020 | Post Count: 140 | Status: Offline

You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first...

----------------------------------------
Just meowin'. Meow
Richard Haselgrove
Senior Cruncher | United Kingdom | Joined: Feb 19, 2021 | Post Count: 360 | Status: Offline

> > > I'm experiencing some strange behaviour after modifying the app_config file. I forced BOINC to run up to 8 GPU work units in parallel:
> > >
> > >     <gpu_usage>0.125</gpu_usage>
> > >     <cpu_usage>0.25</cpu_usage>
> > >
> > > This works absolutely fine. I run both GPU and CPU work units, and my GPU and CPU are able to process that many in parallel, which has a dramatic effect on throughput. However, the BOINC client is no longer able to fetch GPU work units. It tries to fetch both CPU and GPU work units but only receives CPU work units. Has anybody experienced the same?
> >
> > That is very likely the old BOINC problem where the scheduler gets confused when you try to run both CPU and GPU work units from the same project. It has something to do with the "duration correction factor" (DCF), as I recall. You have the same problem on Einstein or MilkyWay when you try to run both CPU and GPU. It is as old as the hills. Maybe a BOINC expert (are you there, Richard?) can illuminate it further.
>
> I thought DCF was turned off at WCG and it is handled using an algorithm on the server.
>
> From 2017: As DCF is locked to 1.000000 by WCG on standard clients, meaning the client does not adapt/adjust runtime to real-time throughput, the only messing happening is server-driven. Combined with the lapse rate between work generation, the point where fpops are slotted in, and the current average runtime used as the base for setting those fpops at science level, this makes for chaos on any science that has large variability in its runtime durations; HST1 is no stranger to the issue.
Richard Haselgrove
Senior Cruncher | United Kingdom | Joined: Feb 19, 2021 | Post Count: 360 | Status: Offline

> You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first...

I think we're fairly safe on that score. These tasks are so short that they make it from the back of the cache to the front in about 2.5 hours.
Pandelta
Advanced Cruncher | Joined: Jun 24, 2012 | Post Count: 55 | Status: Offline

> > > I hope you all can greatly increase GPU units after the stress test and keep this going. I am highly tempted to go buy an overpriced card.
> >
> > From the numbers I have seen, the higher-end cards don't get you much more performance. Maybe someone here with an RTX, for example, could show what they are getting.
>
> After fine-tuning my card, I got 17M points yesterday with my RTX 3080. I might be able to get it to 20M. There's still headroom, because it's not running at 100% all the time.

Holy smokes! I thought I was doing good, lol. That's awesome!
kittyman
Advanced Cruncher | Joined: May 14, 2020 | Post Count: 140 | Status: Offline

> > You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first...
>
> I think we're fairly safe on that score. These tasks are so short that they make it from the back of the cache to the front in about 2.5 hours.

Granted. But there are some awfully slow GPUs out there..... LOL. Meow
Grumpy Swede
Master Cruncher | Svíþjóð | Joined: Apr 10, 2020 | Post Count: 2508 | Status: Recently Active

> > > You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first...
> >
> > I think we're fairly safe on that score. These tasks are so short that they make it from the back of the cache to the front in about 2.5 hours.
>
> Granted. But there are some awfully slow GPUs out there..... LOL. Meow

Exactly. My slow GTX 660M had a cache of 18 WUs with deadlines of May 5th and 6th. It then got a new WU with a deadline of May 2, went into big panic mode, and immediately started running the one with the May 2 deadline. That was really unnecessary, because those 18 cached WUs would have been finished by tomorrow. BOINC is not especially smart when it comes to things like this.

[Edit 1 times, last edit by Grumpy Swede at Apr 29, 2021 6:00:04 PM]
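The behaviour described here is BOINC's deadline-pressure ("panic") mode: when the client fears a deadline miss, it stops running the cache in arrival order and picks the task with the earliest deadline instead. A toy sketch of that selection rule, illustrative only and not actual client code:

```python
# Toy model of why one short-deadline WU jumps a full cache:
# under deadline pressure the scheduler switches from FIFO to
# earliest-deadline-first (EDF). Names here are made up.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: str  # ISO dates compare correctly as strings

# 18 cached WUs with the old 7-day deadlines, plus one new 3-day WU.
cache = (
    [Task(f"wu_old_{i}", "2021-05-05") for i in range(9)]
    + [Task(f"wu_old_{i}", "2021-05-06") for i in range(9, 18)]
    + [Task("wu_new", "2021-05-02")]
)

# EDF: the new WU runs first, even though the cached work
# would have finished comfortably within its deadlines.
next_task = min(cache, key=lambda t: t.deadline)
```

The rule itself is blunt: it looks only at deadlines, not at whether the older work was actually at risk, which is exactly the "not especially smart" behaviour complained about above.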