Thread Status: Active
This topic has been viewed 946406 times and has 780 replies.
Posts: 781   Pages: 79   [ Previous Page | 40 41 42 43 44 45 46 47 48 49 | Next Page ]
kittyman
Advanced Cruncher
Joined: May 14, 2020
Post Count: 140
Status: Offline
Re: OpenPandemics - GPU Stress Test

Is it just the kitties' imagination, or are a fair number more completed kibbles going into pending-verification status today than before? Sure seems like I'm seeing fewer of them go straight to valid status right after completion now.

Meow?
[Apr 29, 2021 3:20:54 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2511
Status: Offline
Re: OpenPandemics - GPU Stress Test

I think the validator is falling behind now, Kittyman.
[Apr 29, 2021 3:28:39 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: OpenPandemics - GPU Stress Test

@nanoprobe - How many concurrent wu's are you running with your AMD cards?
I'm currently only running 1 wu at a time with my R9 280x.

I've done a fair bit of trawling through the stats, and about 98% of the results I looked at were from NVIDIA cards.
Curious to hear what other Radeon cards are being used and the results seen.

For the smaller 4-digit tasks I ran 8 concurrently on my R9 280x. With these larger 5-digit tasks I dialed it back one task at a time to see what was optimal, and found that 4 concurrent tasks worked best for me. The card would still run more, but there was no run-time advantage beyond 4. I will adjust accordingly if necessary once the stress test is done and we go back to the previous distribution mode.
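
(Side note: running several tasks on one card like this is normally done with an app_config.xml in the project's data directory. A minimal sketch for 4 concurrent tasks, assuming the GPU application is named opng — check client_state.xml for the real name:

<app_config>
    <app>
        <name>opng</name>                 <!-- hypothetical app name; confirm in client_state.xml -->
        <max_concurrent>4</max_concurrent>
        <gpu_versions>
            <gpu_usage>0.25</gpu_usage>   <!-- four tasks share one GPU -->
            <cpu_usage>0.25</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

After editing, use "Options > Read config files" in the BOINC Manager or restart the client so the new limits take effect.)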
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


----------------------------------------
[Edited 1 time, last edit by nanoprobe at Apr 29, 2021 3:34:16 PM]
[Apr 29, 2021 3:30:35 PM]
kittyman
Advanced Cruncher
Joined: May 14, 2020
Post Count: 140
Status: Offline
Re: OpenPandemics - GPU Stress Test

I think the validator is falling behind now, Kittyman.

And if I recall correctly, somebody said that WCG doesn't have a server status page like SETI did, so there's no way to check up on that?

Meow?
[Apr 29, 2021 3:33:40 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: OpenPandemics - GPU Stress Test

Is this a record? 1253 jobs, ran for over an hour on an RTX 2080, and after all that it errored out because the file size was too big.

OpenCL device: GeForce RTX 2080
INFO:[05:02:51] End AutoDock...
INFO:[05:02:52] Start AutoDock for OB3ZINC000020085894--7jji_002_mgltools--TYR380_inert.dpf(Job #1247)...
OpenCL device: GeForce RTX 2080
INFO:[05:02:55] End AutoDock...
INFO:[05:02:56] Start AutoDock for OB3ZINC000027705450--7jji_002_mgltools--TYR380_inert.dpf(Job #1248)...
OpenCL device: GeForce RTX 2080
INFO:[05:02:58] End AutoDock...
INFO:[05:02:59] Start AutoDock for OB3ZINC000001396771--7jji_002_mgltools--TYR380_inert.dpf(Job #1249)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:01] End AutoDock...
INFO:[05:03:02] Start AutoDock for OB3ZINC000002483285--7jji_002_mgltools--TYR380_inert.dpf(Job #1250)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:05] End AutoDock...
INFO:[05:03:06] Start AutoDock for OB3ZINC000307946267--7jji_002_mgltools--TYR380_inert.dpf(Job #1251)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:09] End AutoDock...
INFO:[05:03:10] Start AutoDock for OB3ZINC000100229739--7jji_002_mgltools--TYR380_inert.dpf(Job #1252)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:12] End AutoDock...
INFO:[05:03:13] Start AutoDock for OB3ZINC000064503714_1--7jji_002_mgltools--TYR380_inert.dpf(Job #1253)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:16] End AutoDock...
INFO:Cpu time = 4109.314626
05:03:18 (1245982): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>OPNG_0013370_00004_1_r303777678_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>


Kind of sucks to waste over an hour on this and get nothing due to a technicality (file size too big). Maybe don't send out jobs this big? You'll never get the results back.

As a comparison, I had another one of these run on the same system, but with only 1029 jobs; it ran for just under an hour, wasn't "too big", and still only got the ~1700 credit reward.
OPNG_0013370_00005

So tasks that take ~1 hour to run aren't worth more credit than tasks that run for ~1-2 minutes? Something definitely needs to be looked at in terms of effort vs. reward: either fix the slow-running tasks or up the reward.


Sorry about this... I'm going to increase the allowed size of the file that is sent back; the current limit was set assuming roughly 1,000 ligands in a single work unit. I will fix this up for future batches, but I can't fix what has already happened. We are also going to discuss what else we can do going forward. The options include limiting a work unit to 500 ligands, or having the researchers randomize the ligands that are packaged into a batch, so that individual batches don't swing the expected run time so much between them.

Thanks,
-Uplinger
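
(For reference: error -131 is the BOINC client's "file size too big" check, which fires when a result file exceeds the <max_nbytes> cap in the project's output file template. A rough sketch of what that part of a template typically looks like — the file names and the limit here are purely illustrative:

<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <max_nbytes>10000000</max_nbytes>      <!-- per-file upload cap; illustrative value -->
    <url><UPLOAD_URL/></url>
</file_info>
<result>
    <file_ref>
        <file_name><OUTFILE_0/></file_name>
        <open_name>results.zip</open_name> <!-- hypothetical logical name -->
    </file_ref>
</result>

Raising that cap, or keeping the ligand count per work unit down as described above, is presumably the fix Uplinger is referring to.)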
[Apr 29, 2021 3:53:37 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: OpenPandemics - GPU Stress Test

I'm experiencing some strange behaviour after modifying the app_config file.

I forced BOINC to run up to 8 GPU workunits in parallel:

<gpu_usage>0.125</gpu_usage>
<cpu_usage>0.25</cpu_usage>


This works absolutely fine. I run both GPU and CPU workunits and my GPU and CPU are able to process that many in parallel. This obviously has a dramatic effect on throughput.

However, the BOINC client is no longer able to fetch GPU workunits. It requests both CPU and GPU work but only receives CPU workunits. Has anybody experienced the same?

That is very likely the old BOINC problem where the scheduler gets confused when you try to run both CPU and GPU work units from the same project. As I recall it has something to do with the "duration correction factor" (DCF): the client keeps only one DCF per project, so very fast GPU completions drag it down and skew the run-time estimates used for work fetch. You have the same problem on Einstein or MilkyWay when you try to run both CPU and GPU.

It is as old as the hills. Maybe a BOINC expert (are you there Richard?) can illuminate it further.
----------------------------------------
[Edited 1 time, last edit by Jim1348 at Apr 29, 2021 4:08:42 PM]
[Apr 29, 2021 4:08:03 PM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 300
Status: Offline
Re: OpenPandemics - GPU Stress Test

... I will fix this up for future batches, but I can't fix what has already happened. We are also going to discuss what else we can do going forward. The options include limiting a work unit to 500 ligands, or having the researchers randomize the ligands that are packaged into a batch, so that individual batches don't swing the expected run time so much between them.
Thanks,
-Uplinger

Will you also address the problem of excessive SSD writes that was discussed here a few days ago? I believe the suggestion was to use more RAM instead.
[Apr 29, 2021 4:18:01 PM]
m0320174
Cruncher
Joined: Feb 13, 2021
Post Count: 11
Status: Offline
Re: OpenPandemics - GPU Stress Test

In the meantime the client downloaded a huge bunch of GPU workunits in one single shot.

I could imagine that this is caused by the way BOINC distributes work:
- initially I only processed X GPU workunits per hour;
- after modifying the settings I processed many more;
- maybe it just takes some time before the scheduler realizes that I can actually process much more than before.
[Apr 29, 2021 4:20:34 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: OpenPandemics - GPU Stress Test

In the meantime the client downloaded a huge bunch of GPU workunits in one single shot.

I could imagine that this is caused by the way BOINC distributes work:
- initially I only processed X GPU workunits per hour;
- after modifying the settings I processed many more;
- maybe it just takes some time before the scheduler realizes that I can actually process much more than before.

I think Uplinger said that they used different work unit names (e.g., OPNG) for the GPU work units to avoid the problem, so maybe you can do both CPU and GPU without interference, as though they were from two separate projects.
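
(If the GPU tasks really do run under their own application name, the two apps can also be tuned independently in app_config.xml. A sketch with hypothetical app names — opn1 for the CPU app and opng for the GPU app; confirm the real names in client_state.xml:

<app_config>
    <app>
        <name>opn1</name>                 <!-- hypothetical CPU app name -->
        <max_concurrent>6</max_concurrent>
    </app>
    <app>
        <name>opng</name>                 <!-- hypothetical GPU app name -->
        <gpu_versions>
            <gpu_usage>0.125</gpu_usage>  <!-- 8 tasks per GPU, as in the quoted post -->
            <cpu_usage>0.25</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

Whether that also fixes the work-fetch side is another question, since the scheduler still treats everything as one project.)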
[Apr 29, 2021 4:29:28 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 280
Status: Offline
Re: OpenPandemics - GPU Stress Test

Is it just the kitties' imagination, or are a fair number more completed kibbles going into pending-verification status today than before? Sure seems like I'm seeing fewer of them go straight to valid status right after completion now.


I'm seeing more, too, but they're wingman units.
[Apr 29, 2021 4:37:27 PM]