Thread Status: Active
This topic has been viewed 946406 times and has 780 replies.
Posts: 781   Pages: 79   [ Previous Page | 40 41 42 43 44 45 46 47 48 49 | Next Page ]
kittyman
Advanced Cruncher
Joined: May 14, 2020
Post Count: 140
Status: Offline
Re: OpenPandemics - GPU Stress Test

Is it just the kitties' imagination, or are a fair number more completed kibbles going into pending-verification status today than before? Sure seems like I'm seeing fewer of them go straight to valid status right after completion now.

Meow?
[Apr 29, 2021 3:20:54 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2511
Status: Offline
Re: OpenPandemics - GPU Stress Test

I think the validator is falling behind now, Kittyman.
[Apr 29, 2021 3:28:39 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: OpenPandemics - GPU Stress Test

@nanoprobe - How many concurrent wu's are you running with your AMD cards?
I'm currently only running 1 wu at a time with my R9 280x.

I've done a fair bit of trawling through the stats, and about 98% of the results I looked at were from NVIDIA cards.
Curious to hear what other Radeon cards are being used and the results seen.

For the smaller 4-digit tasks I ran 8 concurrently on my R9 280x. With these larger 5-digit tasks I dialed it back one task at a time to see what was optimal, and found that 4 concurrent tasks worked best for me. The card would still run more, but there was no run-time advantage beyond 4. I will adjust accordingly if necessary once the stress test is done and we go back to the previous distribution mode.
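
(Side note: running several tasks on one card like this is normally done with an app_config.xml in the project's data directory. A minimal sketch for 4 concurrent tasks, assuming the GPU application is named opng — check client_state.xml for the real name:

<app_config>
    <app>
        <name>opng</name>                 <!-- hypothetical app name; confirm in client_state.xml -->
        <max_concurrent>4</max_concurrent>
        <gpu_versions>
            <gpu_usage>0.25</gpu_usage>   <!-- four tasks share one GPU -->
            <cpu_usage>0.25</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

After editing, use "Options > Read config files" in the BOINC Manager or restart the client so the new limits take effect.)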
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


----------------------------------------
[Edited 1 time, last edit by nanoprobe at Apr 29, 2021 3:34:16 PM]
[Apr 29, 2021 3:30:35 PM]
kittyman
Advanced Cruncher
Joined: May 14, 2020
Post Count: 140
Status: Offline
Re: OpenPandemics - GPU Stress Test

I think the validator is falling behind now, Kittyman.

And if I recall correctly, somebody said that WCG doesn't have a server status page like SETI did, so there's no way to check up on that?

Meow?
[Apr 29, 2021 3:33:40 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: OpenPandemics - GPU Stress Test

Is this a record? 1253 jobs, ran for over an hour on an RTX 2080, and after all that it errored out because the file size was too big.

OpenCL device: GeForce RTX 2080
INFO:[05:02:51] End AutoDock...
INFO:[05:02:52] Start AutoDock for OB3ZINC000020085894--7jji_002_mgltools--TYR380_inert.dpf(Job #1247)...
OpenCL device: GeForce RTX 2080
INFO:[05:02:55] End AutoDock...
INFO:[05:02:56] Start AutoDock for OB3ZINC000027705450--7jji_002_mgltools--TYR380_inert.dpf(Job #1248)...
OpenCL device: GeForce RTX 2080
INFO:[05:02:58] End AutoDock...
INFO:[05:02:59] Start AutoDock for OB3ZINC000001396771--7jji_002_mgltools--TYR380_inert.dpf(Job #1249)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:01] End AutoDock...
INFO:[05:03:02] Start AutoDock for OB3ZINC000002483285--7jji_002_mgltools--TYR380_inert.dpf(Job #1250)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:05] End AutoDock...
INFO:[05:03:06] Start AutoDock for OB3ZINC000307946267--7jji_002_mgltools--TYR380_inert.dpf(Job #1251)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:09] End AutoDock...
INFO:[05:03:10] Start AutoDock for OB3ZINC000100229739--7jji_002_mgltools--TYR380_inert.dpf(Job #1252)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:12] End AutoDock...
INFO:[05:03:13] Start AutoDock for OB3ZINC000064503714_1--7jji_002_mgltools--TYR380_inert.dpf(Job #1253)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:16] End AutoDock...
INFO:Cpu time = 4109.314626
05:03:18 (1245982): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>OPNG_0013370_00004_1_r303777678_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>


Kind of sucks to waste over an hour on this and get nothing due to a technicality (file size too big). Maybe don't send out jobs this big? You'll never get the results back.

As a comparison, I had another one of these run on the same system, but with only 1029 jobs; it ran for just under an hour, wasn't "too big", and still only got the ~1700 credit reward.
OPNG_0013370_00005

So tasks that take ~1 hour to run aren't worth more credit than tasks that run for ~1-2 minutes? Something definitely needs to be looked at in terms of effort vs. reward: either fix the slow-running tasks or up the reward.


Sorry about this... I'm going to increase the allowed size of the file that is sent back; the current limit was set assuming roughly 1,000 ligands in a single work unit. I will fix this up for future batches, but I can't fix what has already happened. We are also going to discuss what else we can do going forward. The options include limiting a work unit to 500 ligands, or having the researchers randomize the ligands that are packaged into a batch, so that individual batches don't swing the expected run time so much between them.

Thanks,
-Uplinger
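
(For reference: error -131 is the BOINC client's "file size too big" check, which fires when a result file exceeds the <max_nbytes> cap in the project's output file template. A rough sketch of what that part of a template typically looks like — the file names and the limit here are purely illustrative:

<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <max_nbytes>10000000</max_nbytes>      <!-- per-file upload cap; illustrative value -->
    <url><UPLOAD_URL/></url>
</file_info>
<result>
    <file_ref>
        <file_name><OUTFILE_0/></file_name>
        <open_name>results.zip</open_name> <!-- hypothetical logical name -->
    </file_ref>
</result>

Raising that cap, or keeping the ligand count per work unit down as described above, is presumably the fix Uplinger is referring to.)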
[Apr 29, 2021 3:53:37 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: OpenPandemics - GPU Stress Test

I'm experiencing some strange behaviour after modifying the app_config file.

I forced BOINC to run up to 8 GPU workunits in parallel:

<gpu_usage>0.125</gpu_usage>
<cpu_usage>0.25</cpu_usage>


This works absolutely fine. I run both GPU and CPU workunits and my GPU and CPU are able to process that many in parallel. This obviously has a dramatic effect on throughput.

However, the BOINC client is no longer able to fetch GPU workunits. It requests both CPU and GPU work but only receives CPU workunits. Has anybody experienced the same?

That is very likely the old BOINC problem where the scheduler gets confused when you try to run both CPU and GPU work units from the same project. As I recall it has something to do with the "duration correction factor" (DCF): the client keeps only one DCF per project, so very fast GPU completions drag it down and skew the run-time estimates used for work fetch. You have the same problem on Einstein or MilkyWay when you try to run both CPU and GPU.

It is as old as the hills. Maybe a BOINC expert (are you there Richard?) can illuminate it further.
----------------------------------------
[Edited 1 time, last edit by Jim1348 at Apr 29, 2021 4:08:42 PM]
[Apr 29, 2021 4:08:03 PM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 300
Status: Offline
Re: OpenPandemics - GPU Stress Test

... I will fix this up for future batches, but I can't fix what has already happened. We are also going to discuss what else we can do going forward. The options include limiting a work unit to 500 ligands, or having the researchers randomize the ligands that are packaged into a batch, so that individual batches don't swing the expected run time so much between them.
Thanks,
-Uplinger

Will you also address the problem of excessive SSD writes that was discussed here a few days ago? I believe the suggestion was to use more RAM instead.
[Apr 29, 2021 4:18:01 PM]
m0320174
Cruncher
Joined: Feb 13, 2021
Post Count: 11
Status: Offline
Re: OpenPandemics - GPU Stress Test

In the meantime the client downloaded a huge bunch of GPU workunits in one single shot.

I could imagine that this is caused by the way BOINC distributes work:
- initially I only processed X GPU workunits per hour;
- after modifying the settings I processed many more;
- maybe it just takes some time before the scheduler realizes that I can actually process much more than before.
[Apr 29, 2021 4:20:34 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: OpenPandemics - GPU Stress Test

In the meantime the client downloaded a huge bunch of GPU workunits in one single shot.

I could imagine that this is caused by the way BOINC distributes work:
- initially I only processed X GPU workunits per hour;
- after modifying the settings I processed many more;
- maybe it just takes some time before the scheduler realizes that I can actually process much more than before.

I think Uplinger said that they used different work unit names (e.g., OPNG) for the GPU work units to avoid the problem, so maybe you can do both CPU and GPU without interference, as though they were from two separate projects.
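
(If the GPU tasks really do run under their own application name, the two apps can also be tuned independently in app_config.xml. A sketch with hypothetical app names — opn1 for the CPU app and opng for the GPU app; confirm the real names in client_state.xml:

<app_config>
    <app>
        <name>opn1</name>                 <!-- hypothetical CPU app name -->
        <max_concurrent>6</max_concurrent>
    </app>
    <app>
        <name>opng</name>                 <!-- hypothetical GPU app name -->
        <gpu_versions>
            <gpu_usage>0.125</gpu_usage>  <!-- 8 tasks per GPU, as in the quoted post -->
            <cpu_usage>0.25</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

Whether that also fixes the work-fetch side is another question, since the scheduler still treats everything as one project.)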
[Apr 29, 2021 4:29:28 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 280
Status: Offline
Re: OpenPandemics - GPU Stress Test

Is it just the kitties' imagination, or are a fair number more completed kibbles going into pending-verification status today than before? Sure seems like I'm seeing fewer of them go straight to valid status right after completion now.


I'm seeing more, too, but they're wingman units.
[Apr 29, 2021 4:37:27 PM]