| World Community Grid Forums
Thread Status: Active
Total posts in this thread: 781
Chooka
Cruncher Australia Joined: Jan 25, 2017 Post Count: 49
|
I wonder if changing my project weight to 0 might keep the work flowing and not run into this limit?
The downside is, if there's not a lot of GPU work, I'll have idle PCs. My 3950X now has 100 GPU WUs, and I guess that's because I have 2 GPUs in this system. The 1950X still has no GPU work. I've just culled a heap of CPU WUs. Let's see if that adds more GPU tasks.

*edit - Culling the CPU tasks has allowed more GPU work. Hmm... now how to keep the GPU tasks constant.

@MindCrimeZ - I've got a Radeon VII but I've limited it to 3 WUs concurrently. I'm not keen on giving up too many CPU threads to support it.

@Uplinger - Thank you for the testing. We all appreciate your time and effort and fully understand this is in its infancy.

[Edit 1 times, last edit by Chooka at Apr 30, 2021 3:26:31 AM]
||
|
|
Chooka
Cruncher Australia Joined: Jan 25, 2017 Post Count: 49
|
Wow... one of these GPU tasks has been running for 47 min while running 3 concurrent tasks on a Radeon VII. Sounds a bit much?
||
|
|
Ian-n-Steve C.
Senior Cruncher United States Joined: May 15, 2020 Post Count: 180
|
> Wow... one of these GPU tasks has been running for 47 min while running 3 concurrent tasks on a Radeon VII. Sounds a bit much?

It happens. Some of them are just really long. I have one that ran for over an hour on an RTX 2080, running only 1 task on the GPU.

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
||
|
|
flynryan
Senior Cruncher United States Joined: Aug 15, 2006 Post Count: 235
|
> Wow... one of these GPU tasks has been running for 47 min while running 3 concurrent tasks on a Radeon VII. Sounds a bit much?

Yep, are you running any other workloads on the GPU?
||
|
|
zdnko
Senior Cruncher Joined: Dec 1, 2005 Post Count: 235
|
First error since the beginning of the stress test:
https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1662839852
- Unhandled Exception Record -
||
|
|
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 300
|
Yesterday evening, I noticed that quite a number of tasks ran for the full time, but the "Status" finally showed "error". Clicking on "error" shows this:

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3765269347 (0xe06d7363)</message>
<stderr_txt>
projects/www.worldcommunitygrid.org/wcgrid_opng_autodockgpu_7.28_windows_x86_64__opencl_nvidia_102 -jobs OPNG_0019629_00019.job -input OPNG_0019629_00019.zip -seed 40279370 -wcgruns 4550 -wcgdpf 91
INFO: Using gpu device from app init data 0

Does anyone have any idea what kind of error code this is?
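
As a rough aid for reading the exit code in the log above, here is a minimal Python sketch that converts the decimal value to hex and decodes its low three bytes as ASCII. The function name `describe_exit_code` is hypothetical and only for illustration. The value 0xE06D7363 is the exception code the Microsoft Visual C++ runtime uses for C++ exceptions (the low bytes spell "msc"), so this exit code suggests the application ended on an unhandled C++ exception. That would be consistent with the "Unhandled Exception Record" mentioned earlier in the thread, but the code alone does not say what threw the exception or why.

```python
# Hypothetical helper for illustration; not part of BOINC or the WCG application.
MSVC_CPP_EXCEPTION = 0xE06D7363  # code used by the MSVC runtime for C++ exceptions


def describe_exit_code(code: int) -> dict:
    """Return the hex form of a Windows exit code and the ASCII tag in its low 3 bytes."""
    hex_form = f"0x{code:08x}"
    # Mask off the top byte and decode the remaining three bytes as ASCII.
    low_bytes = (code & 0x00FFFFFF).to_bytes(3, "big")
    tag = low_bytes.decode("ascii", errors="replace")
    return {
        "hex": hex_form,
        "tag": tag,
        "is_msvc_cpp_exception": code == MSVC_CPP_EXCEPTION,
    }


if __name__ == "__main__":
    # Exit code copied from the task log quoted in the post.
    print(describe_exit_code(3765269347))
```

The check only identifies the category of failure. Finding the actual cause would still need the rest of the stderr output, which is cut off in the post.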
||
|
|
Chooka
Cruncher Australia Joined: Jan 25, 2017 Post Count: 49
|
>> Wow... one of these GPU tasks has been running for 47 min while running 3 concurrent tasks on a Radeon VII. Sounds a bit much?
> Yep, are you running any other workloads on the GPU?

Nope, only WCG tasks. It might be a hardware issue. I got an AMD pop-up saying a device was hanging or something. Never mind.
||
|
|
zdnko
Senior Cruncher Joined: Dec 1, 2005 Post Count: 235
|
Is it normal for a WU to be sent to a second wingman 4 days before the deadline?
Project Name: OpenPandemics - COVID-19 - GPU
Maybe the deadline shown on the WU has remained unchanged, but the server applies the new 3-day limit?
||
|
|
goben_2003
Advanced Cruncher Joined: Jun 16, 2006 Post Count: 146
|
> The statement about keeping the GPU tasks as close as possible to the CPU tasks is correct. This helps in multiple ways. It allows us to verify that things are working as they should without adding too many variables to the mix. These work units use the same method of starting and stopping each job (ligand) in the workunit. All that was modified in the way they were generated is that I said to assume it's allowed to run 20x longer than CPU. Not much else changed beyond that. Keeping the pipelines from the researchers to us and then to you similar allows us to decrease the number of variables that we introduce into the equation of differences. Yes, there are differences between the GPU code and the CPU code, but these were vetted and tested by the researchers before we took the application to grid enable it.
>
> There are multiple options that we are in discussions with the researchers about. How long it'll take to get those implemented from the WCG end is unknown. I cannot promise when an updated version will be released. We have heard members commenting on the GPU version using too much IO and other complaints, such as the polar opposite of it causing them to have issues on their displays... Some members commenting on bandwidth usage, etc...
>
> The purpose of this stress test was to determine where some of the bottlenecks were in the system. We have heard the comments and suggestions about the application. We have made changes to our load balancer to help handle a lot more work units. We have identified that the small ligand files cause issues with the inodes of the filesystem filling up. All of these are stresses on the system. Some may be easily addressed, others take lots of time and effort. Releasing a new science application does not come as easily or quickly as you would hope; it is distributed to thousands of people and needs to be properly vetted and tested. All of that has to happen while supporting and running other applications and trying to get some sleep in there.
>
> This stress test has been very exciting for us and our team. We are in constant communication with the researchers and they are also very excited about the test so far. Thank you to everyone for your help in making this a successful test. Please try to keep comments positive and helpful towards everyone in the forums and not combative. We try to make things run as best as they can, but we do not have unlimited resources.
>
> Thanks,
> -Uplinger

This stress test has been exciting for me too! Thank you! I am grateful for all the work that you and the rest of the team have put into it!
||
|
|
|