Total posts in this thread: 781
Posts: 781   Pages: 79   [ Previous Page | 41 42 43 44 45 46 47 48 49 50 | Next Page ]
This topic has been viewed 945861 times and has 780 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: OpenPandemics - GPU Stress Test

I'm experiencing some strange behaviour after modifying the app_config file.

I forced BOINC to run up to 8 GPU workunits in parallel:

<gpu_usage>0.125</gpu_usage>
<cpu_usage>0.25</cpu_usage>


This works absolutely fine. I run both GPU and CPU workunits and my GPU and CPU are able to process that many in parallel. This obviously has a dramatic effect on throughput.

However, the BOINC client is no longer able to fetch GPU workunits. It requests both CPU and GPU workunits but only receives CPU ones. Has anybody experienced the same?
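
For context, those two values live inside a full app_config.xml in the project's directory. A minimal sketch might look like the following; note that the app name shown is an assumption (check client_state.xml or the project's app list for the real GPU app name):

```xml
<!-- app_config.xml, placed in e.g. .../projects/www.worldcommunitygrid.org/ -->
<app_config>
    <app>
        <!-- "opng" is assumed here as the OpenPandemics GPU app name -->
        <name>opng</name>
        <gpu_versions>
            <!-- 1/8 of a GPU per task, so up to 8 GPU tasks run in parallel -->
            <gpu_usage>0.125</gpu_usage>
            <!-- each GPU task also reserves a quarter of a CPU core -->
            <cpu_usage>0.25</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

After editing the file, Options > Read config files in the BOINC Manager applies it without a client restart.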

That is very likely the old BOINC problem that the scheduler gets confused when you try to run both CPU and GPU work units from the same project. It has something to do with the "duration correction factor" (DCF) as I recall. You have the same problem on Einstein or MilkyWay when you try to run both CPU and GPU.

It is as old as the hills. Maybe a BOINC expert (are you there Richard?) can illuminate it further.

I thought DCF was turned off at WCG and it is handled using an algorithm on the server

From 2017:
As DCF is locked to 1.000000 by WCG on standard clients, meaning the client does not adapt/adjust runtime to real-time throughput, the only messing happening is server driven. Combined with the lapse rate between work generation, the point where fpops are slotted in, and current average runtime used as base for setting those fpops, at science level, makes for chaos on any science that has large variability in their runtime durations, HST1 neither a stranger to the issue.
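
As a rough sketch of why DCF matters here: the client estimates a task's runtime from the server-supplied fpops estimate, the host's effective speed, and the project's DCF, so a DCF pinned at 1.0 means the estimate never adapts to observed runtimes. A toy calculation with illustrative numbers (not the actual client code):

```python
def estimated_runtime(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """BOINC-style runtime estimate: fpops estimate / effective speed, scaled by DCF."""
    return rsc_fpops_est / flops * dcf

# Illustrative: a 1e13-fpops task on a device benchmarked at 1e10 flops.
locked = estimated_runtime(1e13, 1e10, dcf=1.0)    # estimate stays fixed at WCG
adapted = estimated_runtime(1e13, 1e10, dcf=2.5)   # what an adapting client might use
print(locked, adapted)
```

With DCF locked, any correction between estimated and real runtimes has to happen server-side, which is the "server driven" messing the 2017 quote refers to.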
----------------------------------------
[Edit 1 times, last edit by Former Member at Apr 29, 2021 5:01:37 PM]
[Apr 29, 2021 4:58:02 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: OpenPandemics - GPU Stress Test

I thought DCF was turned off at WCG and it is handled using an algorithm on the server

That looks to be the case. All my DCF values show 1.00000000000.
But I don't think that prevents the server from creating the problem, does it?

It may not be a problem here, as I noted above, due to the different task names. But I run only GPU for OPN, and see no point using the CPU.

EDIT: Then of course I can't run any WCG CPU projects, since I have to set CPU to "off". But there are plenty of other worthwhile projects. For COVID-19, there is always Rosetta and SiDock. And plenty of non-COVID projects.
----------------------------------------
[Edit 2 times, last edit by Jim1348 at Apr 29, 2021 5:26:07 PM]
[Apr 29, 2021 5:23:03 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: OpenPandemics - GPU Stress Test

Good afternoon,

We are going to be making some changes to the work units being sent out. This is to head off a storage issue on the backend. Without these changes, we would more than likely have to stop the stress test before all 30k batches are complete.

The change we are making is setting deadlines to 3 days instead of the 7 days used previously. All new work downloaded will have the 3-day deadline. Also, because we would like to reach the plateau of packaged work sooner, we are going to over-schedule about 7,000 work units that are currently preventing about 2,000 batches from completing. This lets us start seeing where a steady state with a 3-day deadline settles, and it starts the later stages of the pipeline, sending results back to the researchers, at a consistent pace.

Note: For this to happen, I will be turning off validation and the feeder for a few minutes.

Thanks,
-Uplinger
[Apr 29, 2021 5:30:51 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: OpenPandemics - GPU Stress Test

Hello again,

Feeder and validators have been re-enabled.

Thanks,
-Uplinger
[Apr 29, 2021 5:33:36 PM]
kittyman
Advanced Cruncher
Joined: May 14, 2020
Post Count: 140
Status: Offline
Re: OpenPandemics - GPU Stress Test

You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first...
Just meowin'.

Meow
----------------------------------------

[Apr 29, 2021 5:41:02 PM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Status: Offline
Re: OpenPandemics - GPU Stress Test

I'm experiencing some strange behaviour after modifying the app_config file.

I forced BOINC to run up to 8 GPU workunits in parallel:

<gpu_usage>0.125</gpu_usage>
<cpu_usage>0.25</cpu_usage>


This works absolutely fine. I run both GPU and CPU workunits and my GPU and CPU are able to process that many in parallel. This obviously has a dramatic effect on throughput.

However, the BOINC client is no longer able to fetch GPU workunits. It requests both CPU and GPU workunits but only receives CPU ones. Has anybody experienced the same?

That is very likely the old BOINC problem that the scheduler gets confused when you try to run both CPU and GPU work units from the same project. It has something to do with the "duration correction factor" (DCF) as I recall. You have the same problem on Einstein or MilkyWay when you try to run both CPU and GPU.

It is as old as the hills. Maybe a BOINC expert (are you there Richard?) can illuminate it further.

I thought DCF was turned off at WCG and it is handled using an algorithm on the server

From 2017:
As DCF is locked to 1.000000 by WCG on standard clients, meaning the client does not adapt/adjust runtime to real-time throughput, the only messing happening is server driven. Combined with the lapse rate between work generation, the point where fpops are slotted in, and current average runtime used as base for setting those fpops, at science level, makes for chaos on any science that has large variability in their runtime durations, HST1 neither a stranger to the issue.
I could take a look, but I'd need to find a quiet space to think about it. I'd feel more comfortable in https://boinc.berkeley.edu/forum_forum.php?id=10 where there's a bit more space to move and we won't get buried in the flood of posts here. Could someone come across there, please, and explain the problem from the beginning?
[Apr 29, 2021 5:43:11 PM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Status: Offline
Re: OpenPandemics - GPU Stress Test

You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first...
Just meowin'.

Meow
I think we're fairly safe on that score. These tasks are so short that they make it from the back of the cache to the front in about 2.5 hours.
[Apr 29, 2021 5:46:32 PM]
Pandelta
Advanced Cruncher
Joined: Jun 24, 2012
Post Count: 55
Status: Offline
Re: OpenPandemics - GPU Stress Test

I hope you all can greatly increase GPU units after the stress test and keep this going. I am highly tempted to go buy an overpriced card.

From the numbers I have seen, the higher-end cards don't get you much more performance. Maybe someone here with an RTX, for example, could show what they are getting.


After fine tuning my card, I got 17M points yesterday with my RTX 3080. I might be able to get it to 20M. There's still headroom, because it's not running at 100% all the time.


Holy smokes! I thought I was doing well, lol. That's awesome!
[Apr 29, 2021 5:47:00 PM]
kittyman
Advanced Cruncher
Joined: May 14, 2020
Post Count: 140
Status: Offline
Re: OpenPandemics - GPU Stress Test

You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first...
Just meowin'.

Meow
I think we're fairly safe on that score. These tasks are so short that they make it from the back of the cache to the front in about 2.5 hours.

Granted. But there are some awfully slow GPUs out there... LOL.

Meow
----------------------------------------

[Apr 29, 2021 5:49:06 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2508
Status: Recently Active
Re: OpenPandemics - GPU Stress Test

You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first...
Just meowin'.

Meow
I think we're fairly safe on that score. These tasks are so short that they make it from the back of the cache to the front in about 2.5 hours.

Granted. But there are some awfully slow GPUs out there.....LOL.

Meow

Exactly. My slow GTX 660M had a cache of 18 WUs with deadlines of May 5th and 6th. It just got a new WU with a deadline of May 2. Big panic mode: it immediately started running the one with the May 2 deadline. That was really unnecessary, because those 18 cached WUs would have been finished by tomorrow. BOINC is not especially smart about things like this.
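
The behaviour described above is consistent with BOINC's earliest-deadline-first "panic" scheduling: when the client fears a deadline miss, it sorts runnable tasks by deadline instead of running them in arrival order. A simplified model of that selection (a sketch, not the actual client logic):

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: float  # seconds from now

def pick_next(tasks: list[Task], panic: bool) -> Task:
    """Arrival order normally; earliest deadline first in panic mode (toy model)."""
    if panic:
        return min(tasks, key=lambda t: t.deadline)
    return tasks[0]

# 18 cached WUs with ~7-day deadlines, plus one newcomer with a 3-day deadline.
cache = [Task(f"wu{i}", deadline=7 * 86400) for i in range(18)]
cache.append(Task("new_wu", deadline=3 * 86400))
print(pick_next(cache, panic=True).name)  # the short-deadline newcomer jumps the queue
```

Whether the real client enters panic mode depends on its estimate of whether the cache can finish in time, which is why a slow GPU with a deep cache is more likely to trigger it.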
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Apr 29, 2021 6:00:04 PM]
[Apr 29, 2021 5:59:25 PM]