World Community Grid Forums
Thread Status: Active. Total posts in this thread: 781.
kittyman
Advanced Cruncher Joined: May 14, 2020 Post Count: 140 Status: Offline
Is it just the kitties' imagination, or are there a fair number more completed kibbles going into pending verification status today than before? It sure seems like I am seeing fewer results go valid right after completion now.
----------------------------------------
Meow?
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2511 Status: Offline
I think the validator is falling behind now, Kittyman.
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline
> @nanoprobe - How many concurrent WUs are you running with your AMD cards? I'm currently only running 1 WU at a time with my R9 280x. I've done a fair bit of trawling through the stats, and about 98% of those I looked at were NVIDIA cards. Curious to hear what other Radeon cards are being used and the results seen.

For the smaller 4-digit tasks I ran 8 concurrently on my R9 280x. With these larger tasks I dialed it back one task at a time to see what was optimal. On the larger 5-digit tasks I found that 4 concurrent tasks worked best for me. It would still run more, but there was no run-time advantage beyond 4. I will adjust accordingly if necessary after the stress test is done and we go back to the previous distribution mode.

In 1969 I took an oath to defend and protect the U.S. Constitution against all enemies, both foreign and domestic. There was no expiration date.
----------------------------------------
[Edit 1 times, last edit by nanoprobe at Apr 29, 2021 3:34:16 PM]
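For anyone who wants to try the same concurrency tuning: the number of GPU tasks run at once is set client-side with an app_config.xml in the World Community Grid project data directory. Below is a minimal sketch, assuming the GPU application's short name is opng (an assumption; verify the actual name in BOINC Manager or client_state.xml before using it). A gpu_usage of 0.25 yields four concurrent tasks per GPU.

```xml
<!-- app_config.xml, placed in the WCG project data directory.
     Assumption: the GPU app's short name is "opng"; check client_state.xml for the real name. -->
<app_config>
  <app>
    <name>opng</name>
    <gpu_versions>
      <!-- each task claims 1/4 of a GPU, so up to 4 run concurrently -->
      <gpu_usage>0.25</gpu_usage>
      <!-- CPU share reserved per GPU task to feed the GPU -->
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After saving the file, pick Options > Read config files in the BOINC Manager Advanced view (or restart the client) for the change to take effect.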
kittyman
Advanced Cruncher Joined: May 14, 2020 Post Count: 140 Status: Offline
> I think the validator is falling behind now, Kittyman.

And if I recall, somebody said that WCG doesn't have a server status page like SETI@home did, so there is no way to check up on that?

Meow?
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
> Is this a record? 1253 jobs, ran for over an hour on an RTX 2080, and after all that it errored because the file size was too big.
>
> OpenCL device: GeForce RTX 2080
> INFO:[05:02:51] End AutoDock...
> INFO:[05:02:52] Start AutoDock for OB3ZINC000020085894--7jji_002_mgltools--TYR380_inert.dpf(Job #1247)...
> OpenCL device: GeForce RTX 2080
> INFO:[05:02:55] End AutoDock...
> INFO:[05:02:56] Start AutoDock for OB3ZINC000027705450--7jji_002_mgltools--TYR380_inert.dpf(Job #1248)...
> OpenCL device: GeForce RTX 2080
> INFO:[05:02:58] End AutoDock...
> INFO:[05:02:59] Start AutoDock for OB3ZINC000001396771--7jji_002_mgltools--TYR380_inert.dpf(Job #1249)...
> OpenCL device: GeForce RTX 2080
> INFO:[05:03:01] End AutoDock...
> INFO:[05:03:02] Start AutoDock for OB3ZINC000002483285--7jji_002_mgltools--TYR380_inert.dpf(Job #1250)...
> OpenCL device: GeForce RTX 2080
> INFO:[05:03:05] End AutoDock...
> INFO:[05:03:06] Start AutoDock for OB3ZINC000307946267--7jji_002_mgltools--TYR380_inert.dpf(Job #1251)...
> OpenCL device: GeForce RTX 2080
> INFO:[05:03:09] End AutoDock...
> INFO:[05:03:10] Start AutoDock for OB3ZINC000100229739--7jji_002_mgltools--TYR380_inert.dpf(Job #1252)...
> OpenCL device: GeForce RTX 2080
> INFO:[05:03:12] End AutoDock...
> INFO:[05:03:13] Start AutoDock for OB3ZINC000064503714_1--7jji_002_mgltools--TYR380_inert.dpf(Job #1253)...
> OpenCL device: GeForce RTX 2080
> INFO:[05:03:16] End AutoDock...
> INFO:Cpu time = 4109.314626
> 05:03:18 (1245982): called boinc_finish(0)
> </stderr_txt>
> <message>
> upload failure: <file_xfer_error>
>   <file_name>OPNG_0013370_00004_1_r303777678_0</file_name>
>   <error_code>-131 (file size too big)</error_code>
> </file_xfer_error>
> </message>
>
> Kind of sucks to waste over an hour on this and get nothing due to a technicality (file size too big). Maybe don't send out jobs this big? You'll never get the results. As a comparison, I had another one of these run on the same system, but with only 1029 jobs; it ran for just under an hour, wasn't "too big", and still only got the ~1700 credit reward. OPNG_0013370_00005
>
> So tasks that take ~1 hr to run aren't worth more credits than tasks that run for ~1-2 minutes? Something definitely needs to be looked at in terms of effort vs. reward. Either fix the slow-running tasks or up the reward.

Sorry about this... I'm going to increase the maximum size of the file that is sent back. It was sized for a limit of roughly 1,000 ligands in a single work unit. I will fix this for future batches, but cannot fix what has already happened. We are going to discuss what we can do going forward. The options include limiting a work unit to 500 ligands, or having the researchers randomize the ligands that are packaged into a batch, so that individual batches don't swing the expected performance between them.

Thanks,
-Uplinger
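For context on the -131 error above: in a standard BOINC project, the upload size cap comes from the <max_nbytes> field of the server-side output (result) template, and a result file larger than that value is rejected at upload time exactly as shown in the log. Below is a minimal sketch of such a template; the byte limit and output file name are illustrative, not WCG's actual values.

```xml
<!-- Sketch of a BOINC output (result) template; values are illustrative.
     A result file larger than max_nbytes triggers error -131 (file size too big). -->
<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <max_nbytes>10000000</max_nbytes>
    <url><UPLOAD_URL/></url>
</file_info>
<result>
    <file_ref>
        <file_name><OUTFILE_0/></file_name>
        <!-- hypothetical logical name for the returned archive -->
        <open_name>docking_results.zip</open_name>
    </file_ref>
</result>
```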
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline
> I'm experiencing some strange behaviour after modifying the app_config file. I forced BOINC to run up to 8 GPU workunits in parallel:
>
> <gpu_usage>0.125</gpu_usage>
> <cpu_usage>0.25</cpu_usage>
>
> This works absolutely fine. I run both GPU and CPU workunits, and my GPU and CPU are able to process that many in parallel. This obviously has a dramatic effect on throughput. However, the BOINC client is no longer able to fetch GPU workunits. It tries to fetch both CPU and GPU workunits but only receives CPU workunits. Has anybody experienced the same?

That is very likely the old BOINC problem where the scheduler gets confused when you try to run both CPU and GPU work units from the same project. It has something to do with the "duration correction factor" (DCF), as I recall. You have the same problem on Einstein or MilkyWay when you try to run both CPU and GPU. It is as old as the hills. Maybe a BOINC expert (are you there, Richard?) can illuminate it further.
[Edit 1 times, last edit by Jim1348 at Apr 29, 2021 4:08:42 PM]
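For reference, the DCF Jim mentions is stored per project in the client's client_state.xml, so you can check whether it has drifted far from 1.0. Below is a rough sketch of the relevant fragment with an illustrative value; view the file read-only and do not edit it while the client is running.

```xml
<!-- Fragment of client_state.xml; the DCF value shown is illustrative.
     A DCF far from 1.0 skews runtime estimates and therefore work fetch. -->
<project>
    <master_url>http://www.worldcommunitygrid.org/</master_url>
    <project_name>World Community Grid</project_name>
    <duration_correction_factor>3.482160</duration_correction_factor>
    <!-- other project fields omitted -->
</project>
```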
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 300 Status: Offline
> ... I will fix this for future batches, but cannot fix what has already happened. We are going to discuss what we can do going forward. The options include limiting a work unit to 500 ligands, or having the researchers randomize the ligands that are packaged into a batch, so that individual batches don't swing the expected performance between them. Thanks, -Uplinger

Will you also address the topic of excessive SSD use, as discussed here a few days ago? I guess the suggestion was to use more RAM instead.
m0320174
Cruncher Joined: Feb 13, 2021 Post Count: 11 Status: Offline
In the meantime the client has downloaded a huge bunch of GPU workunits in a single shot.

I could imagine that this is caused by the way BOINC distributes work:
- Initially I only processed X GPU workunits per hour.
- After modifying the settings I processed many more.
- Maybe it takes some time before the scheduler realizes that I can actually process much more than before.
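If the client seems slow to ramp up after a settings change, you can prompt an immediate scheduler contact instead of waiting for the next automatic request. A minimal example with the standard boinccmd tool; use the project URL exactly as it appears in BOINC Manager.

```sh
# Ask the client to contact the WCG scheduler right away (URL as shown in BOINC Manager)
boinccmd --project http://www.worldcommunitygrid.org/ update
```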
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline
> In the meantime the client has downloaded a huge bunch of GPU workunits in a single shot. I could imagine that this is caused by the way BOINC distributes work: initially I only processed X GPU workunits per hour; after modifying the settings I processed many more. Maybe it takes some time before the scheduler realizes that I can actually process much more than before.

I think Uplinger said that they used different work unit names (e.g., OPNG) for the GPU work units to avoid that problem, so maybe you can do both CPU and GPU without interference, as though they were from two separate projects.
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 280 Status: Offline
> Is it just the kitties' imagination, or are there a fair number more completed kibbles going into pending verification status today than before? Sure seems like I am seeing fewer results go valid right after completion now.

I'm seeing more, too, but they're wingman units.