World Community Grid Forums

Thread Status: Active | Total posts in this thread: 781

----------------------------------------
kittyman
Advanced Cruncher | Joined: May 14, 2020 | Post Count: 140 | Status: Offline
You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first.

I think we're fairly safe on that score. These tasks are so short that they make it from the back of the cache to the front in about 2.5 hours.

Just meowin'. Meow

Granted. But there are some awfully slow GPUs out there... LOL. Meow

Exactly. My slow GTX 660M had a cache of 18 WUs with deadlines of May 5th and 6th. It just got a new WU with a deadline of May 2. Big panic mode, and it immediately started running the one with the May 2 deadline. That was really unnecessary, because those 18 cached WUs would have been finished by tomorrow. BOINC is not especially smart when it comes to things like this.

Looks like the kitties called that one, eh? Pretty smart, them kitties. Meow!
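The "panic mode" described above is BOINC's earliest-deadline-first fallback: when the client estimates that some cached task might miss its deadline, it abandons its normal ordering and runs the tightest deadline first. Below is a minimal Python sketch of that ordering logic. It is illustrative only, not the actual client scheduler; the task fields, the 3-hour runtime estimates, and the backlog test are assumptions chosen to echo the GTX 660M anecdote.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline_hours: float   # hours until the report deadline
    est_hours: float        # the client's (often pessimistic) runtime estimate

def next_task(cache):
    """Toy model of the deadline-pressure check: if the estimated backlog
    exceeds some task's deadline, fall back to earliest-deadline-first
    ("panic mode"); otherwise keep first-in, first-out order."""
    backlog = sum(t.est_hours for t in cache)
    if any(t.deadline_hours < backlog for t in cache):
        return min(cache, key=lambda t: t.deadline_hours)
    return cache[0]

# Hypothetical numbers echoing the anecdote: 18 cached WUs with roughly
# five-day deadlines, then one new WU with a two-day deadline arrives.
cache = [Task(f"old_wu_{i}", deadline_hours=120, est_hours=3.0) for i in range(18)]
cache.append(Task("new_wu_may2", deadline_hours=48, est_hours=3.0))
print(next_task(cache).name)  # -> new_wu_may2 jumps the queue
```

With pessimistic per-task estimates the reorder triggers even though the real runtimes would have cleared the whole cache well before either deadline, which is exactly why the reshuffle feels unnecessary.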
----------------------------------------
Ian-n-Steve C.
Senior Cruncher | United States | Joined: May 15, 2020 | Post Count: 180 | Status: Offline
Is this a record? 1253 jobs, running for over an hour on an RTX 2080, and after all that it errored out because the result file was too big:

OpenCL device: GeForce RTX 2080
INFO:[05:02:51] End AutoDock...
INFO:[05:02:52] Start AutoDock for OB3ZINC000020085894--7jji_002_mgltools--TYR380_inert.dpf(Job #1247)...
OpenCL device: GeForce RTX 2080
INFO:[05:02:55] End AutoDock...
INFO:[05:02:56] Start AutoDock for OB3ZINC000027705450--7jji_002_mgltools--TYR380_inert.dpf(Job #1248)...
OpenCL device: GeForce RTX 2080
INFO:[05:02:58] End AutoDock...
INFO:[05:02:59] Start AutoDock for OB3ZINC000001396771--7jji_002_mgltools--TYR380_inert.dpf(Job #1249)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:01] End AutoDock...
INFO:[05:03:02] Start AutoDock for OB3ZINC000002483285--7jji_002_mgltools--TYR380_inert.dpf(Job #1250)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:05] End AutoDock...
INFO:[05:03:06] Start AutoDock for OB3ZINC000307946267--7jji_002_mgltools--TYR380_inert.dpf(Job #1251)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:09] End AutoDock...
INFO:[05:03:10] Start AutoDock for OB3ZINC000100229739--7jji_002_mgltools--TYR380_inert.dpf(Job #1252)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:12] End AutoDock...
INFO:[05:03:13] Start AutoDock for OB3ZINC000064503714_1--7jji_002_mgltools--TYR380_inert.dpf(Job #1253)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:16] End AutoDock...
INFO:Cpu time = 4109.314626
05:03:18 (1245982): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>OPNG_0013370_00004_1_r303777678_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>

It kind of sucks to waste over an hour on this and get nothing due to a technicality (file size too big). Maybe don't send out jobs this big? You'll never get the results back. As a comparison, I had another one of these run on the same system, but with only 1029 jobs; it ran for just under an hour, wasn't "too big", and still only got the ~1700 credit reward: OPNG_0013370_00005. So tasks that take ~1 hour to run aren't worth more credit than tasks that run for ~1-2 minutes? Something definitely needs to be looked at in terms of effort vs. reward: either fix the slow-running tasks or raise the reward.

[uplinger replied:] Sorry about this... I'm going to increase the maximum size of the file that is sent back. It was sized for a limit of around 1000 ligands in a single work unit. I will fix this for the future, but I cannot fix what has already happened. We are going to discuss what we can do going forward, including limiting a work unit to 500 ligands or having the researchers randomize the ligands that are packaged in a batch, so that single batches don't swing the expected performance between them. Thanks, -Uplinger

Thanks for the response. I'm not overly concerned with the credit awarded for this; I just wanted to bring it to your attention as a problematic edge case that should be addressed. I also don't really care how much credit you award these tasks, I just think it should be standardized. A task with 100 ligands that runs for 5 minutes really shouldn't get the same credit reward as a task with 1000 ligands that runs for an hour. If you don't want variable credit based on runtime, then standardize the WU size to some average number of ligands (with a small standard deviation) to avoid these massive swings in effort vs. reward.

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
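The two mitigations mentioned in the reply, capping the number of ligands per work unit and randomizing which ligands land in each batch, are easy to sketch. The Python below is illustrative only and is not the actual WCG packaging code; the function name, the 500-ligand cap, and the example ligand IDs are assumptions taken from the discussion.

```python
import random

def package_work_units(ligands, max_per_wu=500, seed=42):
    """Shuffle the ligand list so slow and fast dockings mix evenly,
    then cut it into work units of at most max_per_wu ligands."""
    rng = random.Random(seed)
    shuffled = list(ligands)
    rng.shuffle(shuffled)                   # randomize across batches
    return [shuffled[i:i + max_per_wu]      # fixed-size chunks
            for i in range(0, len(shuffled), max_per_wu)]

# Hypothetical ligand IDs: the 1253-job batch above would become 500/500/253.
ligand_ids = [f"ZINC{i:012d}" for i in range(1253)]
work_units = package_work_units(ligand_ids)
print([len(wu) for wu in work_units])       # -> [500, 500, 253]
```

With shuffled, fixed-size batches, runtimes cluster around a common average, so a flat credit award per work unit tracks actual effort much more closely.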
----------------------------------------
erich56
Senior Cruncher | Austria | Joined: Feb 24, 2007 | Post Count: 300 | Status: Offline
"Big panic mode, and it immediately started running the one with the May 2 deadline. That was really unnecessary, because those 18 cached WUs would have been finished by tomorrow. BOINC is not especially smart when it comes to things like this."

The same thing happened here, on all of my machines.
----------------------------------------
davidBAM
Cruncher | Joined: Aug 14, 2018 | Post Count: 10 | Status: Offline
Can I clarify, please: are all existing WUs with 7-day deadlines still good for the full 7 days?
Asking for a friend.
----------------------------------------
kittyman
Advanced Cruncher | Joined: May 14, 2020 | Post Count: 140 | Status: Offline
None of the deadlines for WUs in my cache before the change appear to have been affected, so I am sure that they are still good as issued.
----------------------------------------
Meow
----------------------------------------
Pandelta
Advanced Cruncher | Joined: Jun 24, 2012 | Post Count: 55 | Status: Offline
"I hope you all can greatly increase GPU units after the stress test and keep this going. I am highly tempted to go buy an overpriced card. From the numbers I have seen, the higher-end cards don't get you much more performance. Maybe someone here with an RTX, for example, could show what they are getting."

I have an RTX 2080. I can only get to 100% GPU load if I run 16 tasks concurrently and use 16 vCPUs to support them; that takes both to 100% virtually non-stop. I tried 12, 8, 4, 2 and 1, and 16 seems to be the sweet spot, but since both are almost always at 100%, I don't think I can get more out of it.
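Running several tasks per GPU, as described above, is normally done with a BOINC app_config.xml in the project's data directory. The Python sketch below just writes such a file; it is illustrative only, the application name "opng" is an assumption (check the exact name in your client's event log), and 16 concurrent tasks will not suit every card.

```python
# Illustrative sketch: write a BOINC app_config.xml that splits one GPU
# across several concurrent tasks. gpu_usage is the GPU fraction each task
# claims; cpu_usage is the CPU reserved to feed each task.
N_CONCURRENT = 16  # assumption: the sweet spot reported above for an RTX 2080

app_config = f"""<app_config>
  <app>
    <name>opng</name>  <!-- assumed app name; verify it in the BOINC event log -->
    <gpu_versions>
      <gpu_usage>{1.0 / N_CONCURRENT}</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""

with open("app_config.xml", "w") as f:
    f.write(app_config)
print(app_config)
```

After copying the file into the World Community Grid project folder, "Options > Read config files" in the BOINC Manager (Advanced view) should apply it without restarting the client.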
----------------------------------------
spRocket
Senior Cruncher | Joined: Mar 25, 2020 | Post Count: 280 | Status: Offline
I'm seeing a ton of the new short-deadline units in my queue now, but there are only eight of the old units remaining for me, and it didn't go into panic mode. I'm using 0.1 days for my "store at least" and "store up to an additional" queue sizes.
----------------------------------------
uplinger
Former World Community Grid Tech | Joined: May 23, 2005 | Post Count: 3952 | Status: Offline
Yes, existing work units keep their 7-day deadlines. The shorter deadline applies only to new work units sent out from now on.
Thanks, -Uplinger |
----------------------------------------
davidBAM
Cruncher | Joined: Aug 14, 2018 | Post Count: 10 | Status: Offline
Thank you
----------------------------------------
Ian-n-Steve C.
Senior Cruncher | United States | Joined: May 15, 2020 | Post Count: 180 | Status: Offline
"I hope you all can greatly increase GPU units after the stress test and keep this going. I am highly tempted to go buy an overpriced card. From the numbers I have seen, the higher-end cards don't get you much more performance. Maybe someone here with an RTX, for example, could show what they are getting."

"I have an RTX 2080. I can only get to 100% GPU load if I run 16 tasks concurrently and use 16 vCPUs to support them; that takes both to 100% virtually non-stop. I tried 12, 8, 4, 2 and 1, and 16 seems to be the sweet spot, but since both are almost always at 100%, I don't think I can get more out of it."

What a waste of resources.

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti