Thread Status: Active
Total posts in this thread: 781
This topic has been viewed 946633 times and has 780 replies.
kittyman
Advanced Cruncher
Joined: May 14, 2020
Post Count: 140
Re: OpenPandemics - GPU Stress Test

You just might toss some BOINC clients into panic mode, and they will start processing the new, shorter-deadline WUs first...
Just meowin'.

Meow
I think we're fairly safe on that score. These tasks are so short that they make it from the back of the cache to the front in about 2.5 hours.

Granted. But there are some awfully slow GPUs out there.....LOL.

Meow

Exactly. My slow GTX 660M had a cache of 18 WUs with deadlines of May 5th and 6th. It just got a new WU with a deadline of May 2, went into big panic mode, and immediately started running the one with the May 2 deadline. That was really unnecessary, because those 18 cached WUs would have been finished by tomorrow. BOINC is not especially smart when it comes to things like this.

Looks like the kitties called that one, eh?
Pretty smart, them kitties.

Meow!
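
For anyone curious what "panic mode" amounts to, here is a minimal sketch in Python (not BOINC's actual scheduler code, just an illustration under the assumption that the client falls back to earliest-deadline-first when it thinks a deadline is at risk):

from datetime import datetime

# Hypothetical cache: 18 older WUs due May 5th plus one new short-deadline WU due May 2nd.
cache = [{"name": f"old_wu_{i}", "deadline": datetime(2021, 5, 5)} for i in range(18)]
cache.append({"name": "new_short_wu", "deadline": datetime(2021, 5, 2)})

# Earliest-deadline-first: the WU with the nearest deadline runs next,
# regardless of the order in which the work arrived.
next_wu = min(cache, key=lambda wu: wu["deadline"])
print(next_wu["name"])  # -> new_short_wu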
[Apr 29, 2021 6:04:14 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Re: OpenPandemics - GPU Stress Test

Is this a record? 1253 jobs, over an hour of runtime on an RTX 2080, and after all that it errored out because the result file size was too big.

OpenCL device: GeForce RTX 2080
INFO:[05:02:51] End AutoDock...
INFO:[05:02:52] Start AutoDock for OB3ZINC000020085894--7jji_002_mgltools--TYR380_inert.dpf(Job #1247)...
OpenCL device: GeForce RTX 2080
INFO:[05:02:55] End AutoDock...
INFO:[05:02:56] Start AutoDock for OB3ZINC000027705450--7jji_002_mgltools--TYR380_inert.dpf(Job #1248)...
OpenCL device: GeForce RTX 2080
INFO:[05:02:58] End AutoDock...
INFO:[05:02:59] Start AutoDock for OB3ZINC000001396771--7jji_002_mgltools--TYR380_inert.dpf(Job #1249)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:01] End AutoDock...
INFO:[05:03:02] Start AutoDock for OB3ZINC000002483285--7jji_002_mgltools--TYR380_inert.dpf(Job #1250)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:05] End AutoDock...
INFO:[05:03:06] Start AutoDock for OB3ZINC000307946267--7jji_002_mgltools--TYR380_inert.dpf(Job #1251)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:09] End AutoDock...
INFO:[05:03:10] Start AutoDock for OB3ZINC000100229739--7jji_002_mgltools--TYR380_inert.dpf(Job #1252)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:12] End AutoDock...
INFO:[05:03:13] Start AutoDock for OB3ZINC000064503714_1--7jji_002_mgltools--TYR380_inert.dpf(Job #1253)...
OpenCL device: GeForce RTX 2080
INFO:[05:03:16] End AutoDock...
INFO:Cpu time = 4109.314626
05:03:18 (1245982): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>OPNG_0013370_00004_1_r303777678_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>


Kind of sucks to waste over an hour on this and get nothing because of a technicality (file size too big). Maybe don't send out jobs this big? You'll never get the results.

As a comparison, I had another one of these run on the same system, but with only 1029 jobs; it ran for just under an hour, wasn't "too big", and still only got the ~1700 credit reward.
OPNG_0013370_00005

So tasks that take ~1 hour to run aren't worth more credit than tasks that run for ~1-2 minutes? Something definitely needs to be looked at in terms of effort vs. reward: either fix the slow-running tasks or up the reward.


Sorry about this... I'm going to increase the maximum size of the result file that is sent back; it was sized for a limit of roughly 1000 ligands in a single work unit. I will fix this for future work units, but I cannot fix what has already happened. We are also going to discuss what we can do going forward. The options include limiting a work unit to 500 ligands, or having the researchers randomize the ligands that are packaged into each batch so that individual batches don't swing the expected performance so much.

Thanks,
-Uplinger
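
Not the project's actual packaging code, but a minimal Python sketch of the "randomize the ligands across batches" idea mentioned above, assuming a flat list of ligand IDs and a fixed batch size:

import random

def package_batches(ligand_ids, batch_size=500, seed=None):
    """Shuffle ligands before slicing them into work units so that heavy and
    light ligands are spread evenly, instead of one batch getting 1253 long jobs."""
    ids = list(ligand_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

# Example: 10,000 made-up ligand IDs split into 20 randomized work units of 500.
batches = package_batches([f"ZINC{i:012d}" for i in range(10_000)], batch_size=500, seed=42)
print(len(batches), len(batches[0]))  # -> 20 500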


Thanks for the response. I'm not overly concerned with the credit awarded for this; I just wanted to bring it to your attention as a problematic edge case that should be addressed.

I also don't really care how much credit you award these tasks; I just think it should be standardized. A task with 100 ligands that runs for 5 minutes really shouldn't get the same credit reward as a task with 1000 ligands that runs for an hour. If you don't want variable credit based on runtime, then standardize the WU size to some average number of ligands (with a small standard deviation) to avoid these massive swings in effort vs. reward.
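
To put rough numbers on that swing (the ~1700 credits is from the post above; the runtimes are ballpark figures, not measured values), a quick back-of-the-envelope in Python:

# A flat ~1700 credits regardless of runtime makes the effective rate swing wildly.
credit = 1700
for minutes in (2, 5, 60):
    print(f"{minutes:>3} min task -> {credit * 60 / minutes:>8.0f} credits/hour")
#   2 min task ->    51000 credits/hour
#   5 min task ->    20400 credits/hour
#  60 min task ->     1700 credits/hour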
----------------------------------------

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
[Apr 29, 2021 6:10:34 PM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 300
Re: OpenPandemics - GPU Stress Test

Big panic mode, and it immediately started running the one with the May 2 deadline. That was really unnecessary, because those 18 cached would have been finished by tomorrow. Boinc is not especially smart when it comes to things like this.
Same thing happened here, on all of my machines.
[Apr 29, 2021 6:11:10 PM]
davidBAM
Cruncher
Joined: Aug 14, 2018
Post Count: 10
Re: OpenPandemics - GPU Stress Test

Can I clarify, please: are all existing WUs with the 7-day deadline still good for the full 7 days?

Asking for a friend
[Apr 29, 2021 6:19:12 PM]
kittyman
Advanced Cruncher
Joined: May 14, 2020
Post Count: 140
Re: OpenPandemics - GPU Stress Test

None of the deadlines for WUs in my cache before the change appear to have been affected, so I am sure that they are still good as issued.

Meow
[Apr 29, 2021 6:22:34 PM]
Pandelta
Advanced Cruncher
Joined: Jun 24, 2012
Post Count: 55
Re: OpenPandemics - GPU Stress Test

I hope you all can greatly increase GPU units after the stress test and keep this going. I am highly tempted to go buy an overpriced card.

From the numbers I have seen, the higher-end cards don't get you much more performance. Maybe someone here with an RTX, for example, could show what they are getting.


I have an RTX 2080. I can only get it to 100% GPU load if I run 16 tasks concurrently and use 16 vCPUs to support them; that takes both to 100% virtually non-stop. I tried 12, 8, 4, 2, and 1, and 16 seems to be the sweet spot, but since both are almost always at 100%, I don't think I can get more out of it.
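
For anyone wanting to reproduce that setup, the usual way to run several OpenPandemics GPU tasks per card is an app_config.xml in the World Community Grid project directory. Below is a sketch for 16 concurrent tasks per GPU; the app name is an assumption, so check the <app_name> entries in your client_state.xml before using it:

<app_config>
  <app>
    <!-- app name is an assumption; verify it against client_state.xml -->
    <name>opng</name>
    <gpu_versions>
      <!-- 1/16 of a GPU per task, so 16 tasks share one card -->
      <gpu_usage>0.0625</gpu_usage>
      <!-- reserve one CPU thread per task, matching the 16 vCPUs mentioned above -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

After saving the file, Options -> Read config files in the BOINC Manager (Advanced view) applies it without restarting the client.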
[Apr 29, 2021 6:39:31 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 280
Re: OpenPandemics - GPU Stress Test

I'm seeing a ton of the new short-deadline units in my queue now, but there are only eight of the old units remaining for me, and it didn't go into panic mode. I'm using 0.1 days for my "store at least" and "store up to an additional" queue sizes.
[Apr 29, 2021 6:52:52 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: OpenPandemics - GPU Stress Test

Yes, existing work units with the 7-day deadline are still good. The shorter deadline applies only to new work units being sent out.

Thanks,
-Uplinger
[Apr 29, 2021 6:53:48 PM]
davidBAM
Cruncher
Joined: Aug 14, 2018
Post Count: 10
Re: OpenPandemics - GPU Stress Test

Thank you
[Apr 29, 2021 6:58:48 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Re: OpenPandemics - GPU Stress Test

I hope you all can greatly increase GPU units after the stress test and keep this going. I am highly tempted to go buy an overpriced card.

From the numbers I have seen, the higher-end cards don't get you much more performance. Maybe someone here with an RTX, for example, could show what they are getting.


I have a RTX 2080. I can only get to 100% GPU load if I run 16 concurrently and use 16 vCPUs to support it. Takes both to 100% virtually non-stop. I tried 12, 8, 4, 2 and 1. 16 seems to be the sweet spot but being both are almost always 100% I don't think I can get more out of it.


what a waste of resources.
----------------------------------------

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
[Apr 29, 2021 7:03:18 PM]