World Community Grid Forums
Category: Support | Forum: GPU Support Forum | Thread: GPU Work Units - Post Your Tech Support Questions Here
Thread Status: Active | Thread Type: Sticky Thread | Total posts in this thread: 290
jjch
Cruncher | Joined: Nov 15, 2013 | Post Count: 7
OPNG jobs are failing on NVIDIA Grid K2 cards.
I have a few older HP servers with NVIDIA Grid K2 cards that I was hoping to use for WCG. I just set them up with Windows Server 2019, and the jobs are failing with an error status: https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1912531250
The Grid K2 cards seem to support OpenCL 1.2, but the latest driver I could find was version 370.41 from January 2020. The device is listed as:
NVIDIA GRID K2 (4095MB) driver: 370.41 OpenCL: 1.2
Is this driver too old for WCG GPU tasks, or is there another problem I could address?
Note: The cards work fine with MilkyWay@home jobs, so I was hoping something can be done to get them working with WCG OPNG.
Acibant
Advanced Cruncher | USA | Joined: Apr 15, 2020 | Post Count: 126
The work unit information behind that link is removed after a while, so it's best to paste the important parts into the thread. I'll paste yours here right away so it isn't lost.
Result Name: OPNG_0084929_00245_0
jjch
Cruncher | Joined: Nov 15, 2013 | Post Count: 7
Good to know the links expire. Here is another one from today; as you did, I copied everything down to the start of the debugger output.
Result Name: OPNG_0085389_00135_0
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
projects/www.worldcommunitygrid.org/wcgrid_opng_autodockgpu_7.28_windows_x86_64__opencl_nvidia_102 -jobs OPNG_0085389_00135.job -input OPNG_0085389_00135.zip -seed 1074566228 -wcgruns 1250 -wcgdpf 25
INFO: Using gpu device from app init data 0
INFO:[11:07:42] Start AutoGrid...
autogrid4: Successful Completion.
INFO:[11:10:31] End AutoGrid...
INFO:[11:10:31] Start AutoDock for OB3ZINC000465878275--7jji_001--ALYS417_inert.dpf(Job #0)...
OpenCL device: GRID K2
INFO:[11:12:49] End AutoDock...
INFO:[11:12:49] Start AutoDock for OB3ZINC000494195197--7jji_001--ALYS417_inert.dpf(Job #1)...
OpenCL device: GRID K2
INFO:[11:14:48] End AutoDock...
INFO:[11:14:49] Start AutoDock for OB3ZINC000438113644--7jji_001--ALYS417_inert.dpf(Job #2)...
OpenCL device: GRID K2
INFO:[11:15:41] End AutoDock...
INFO:[11:15:41] Start AutoDock for OB3ZINC000544961887--7jji_001--ALYS417_inert.dpf(Job #3)...
OpenCL device: GRID K2
INFO:[11:17:45] End AutoDock...
INFO:[11:17:46] Start AutoDock for OB3ZINC000062343614--7jji_001--ALYS417_inert.dpf(Job #4)...
OpenCL device: GRID K2
INFO:[11:22:25] End AutoDock...
INFO:[11:22:25] Start AutoDock for OB3ZINC000447625606--7jji_001--ALYS417_inert.dpf(Job #5)...
OpenCL device: GRID K2
INFO:[11:24:45] End AutoDock...
INFO:[11:24:46] Start AutoDock for OB3ZINC000544963834--7jji_001--ALYS417_inert.dpf(Job #6)...
OpenCL device: GRID K2
INFO:[11:24:57] End AutoDock...
INFO:[11:24:57] Start AutoDock for OB3ZINC000555430956--7jji_001--ALYS417_inert.dpf(Job #7)...
OpenCL device: GRID K2
INFO:[11:27:16] End AutoDock...
INFO:[11:27:16] Start AutoDock for OB3ZINC000538161709--7jji_001--ALYS417_inert.dpf(Job #8)...
OpenCL device: GRID K2
INFO:[11:28:16] End AutoDock...
INFO:[11:28:16] Start AutoDock for OB3ZINC000486359533--7jji_001--ALYS417_inert.dpf(Job #9)...
OpenCL device: GRID K2
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF61D36DEE0 read attempt to address 0x00000000
This one appears to have run on GPU 0, and it got as far as Job #9 before failing. In case it matters, each Grid K2 card contains two GPUs, and there are two cards in the system, for a total of four GPUs.
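The decimal exit code BOINC reports and the hexadecimal code in the exception record are the same value: 3221225477 is 0xC0000005, the Windows access-violation status. A read attempt to address 0x00000000 usually indicates a null-pointer dereference somewhere in the process, although the log alone doesn't show which component caused it. The short Python sketch below only confirms the conversion; the one-entry lookup table and the function name describe_exit_code are hypothetical, not a complete list of Windows status codes.

```python
# Minimal sketch: relate BOINC's decimal exit code to the Windows status code
# shown in the exception record. The table holds only the single code seen in
# this log; it is illustrative, not a complete list.

EXIT_CODE = 3221225477  # from "(unknown error) - exit code 3221225477"

KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def describe_exit_code(code: int) -> str:
    """Format an exit code as 8-digit hex plus its name, if known."""
    name = KNOWN_STATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(EXIT_CODE))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```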
jjch
Cruncher | Joined: Nov 15, 2013 | Post Count: 7
I found a note on the GPU info page that may be a clue to why the Grid cards don't work.
"Which GPUs can participate in OpenPandemics - COVID-19?
GPU work units for OpenPandemics - COVID-19 are designed to run on OpenCL version 1.2 and above. However, there are certain cards that still have issues due to having GPU drivers that aren't 100% compatible with OpenCL 1.2. Most of the issues are with cards that were released before 2016. Please check our GPU forum for a list of GPUs that are known to not work."
The K2 cards were released in 2013, so they may simply be too old. I couldn't find the list of GPUs on the forum, so if anyone knows where it is, please let me know. Meanwhile, I'll see if I can work out the configuration needed to exclude them from WCG while still letting them run MilkyWay@home.
jjch
Cruncher | Joined: Nov 15, 2013 | Post Count: 7
FYI, in case anyone is interested:
Here are the cc_config.xml lines needed to make BOINC ignore the NVIDIA Grid K2 GPUs for OPNG work units:
<cc_config>
	<options>
		<exclude_gpu>
			<url>http://www.worldcommunitygrid.org/</url>
			<app>opng</app>
		</exclude_gpu>
	</options>
</cc_config>
Note: The original file uses tabs to line up the tags for clarity. For more information, refer to the BOINC client configuration documentation: https://boinc.berkeley.edu/wiki/Client_configuration
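For anyone who prefers to generate the file, here is a minimal Python sketch (assuming Python 3.9+ for ET.indent) that produces the same cc_config.xml using only the standard library. The function name build_cc_config is hypothetical. It also shows the optional device_num element from BOINC's documented exclude_gpu syntax: without it, the exclusion applies to every GPU in the machine, which is fine here because all four GPUs are Grid K2s; with it, only the GPU with that number (as listed in the BOINC event log) is excluded.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_cc_config(url: str, app: str, device_num: Optional[int] = None) -> str:
    """Return cc_config.xml text containing one <exclude_gpu> entry.

    With device_num=None the exclusion covers every GPU in the host.
    Passing a number adds <device_num>, limiting the exclusion to that
    one GPU (numbered as in the BOINC event log).
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = url
    if device_num is not None:
        ET.SubElement(exclude, "device_num").text = str(device_num)
    ET.SubElement(exclude, "app").text = app
    ET.indent(root, space="\t")  # Python 3.9+: tab indentation, like the original file
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Skip OPNG on all GPUs; projects other than WCG (e.g. MilkyWay@home) are unaffected.
    print(build_cc_config("http://www.worldcommunitygrid.org/", "opng"))
```

Save the output as cc_config.xml in the BOINC data directory; restarting the BOINC client is the safest way to make sure the change is picked up.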
leloft
Cruncher | Joined: Jun 8, 2017 | Post Count: 23
Quoting jjch: "I couldn't find the list of GPU's on the forum so if anyone knows where it is let me know."
That would be helpful. I am getting 'Too Late' results for units returned after 0.25 to 0.41 h (that is between 15 and 25 minutes, correct?), but I am also getting 'Valid' results after 0.34 to 0.38 h. The 'Too Late' deadline seems overly restrictive: the machine in question is headless, so its graphics card (Quadro K5200) is always free and could return any available OPNG units 24/7, even if each one takes 25 minutes.
How does this compare with a newer card that processes an OPNG unit in under 10 minutes but is set not to work while the computer is in use? Would every OPNG unit processed on that machine end up 'Too Late' because the computer was used in the meantime?
Could anyone explain the difference between a 15-minute 'Too Late' and a 23-minute 'Valid', so that I can deploy the GPU more effectively? Many thanks.
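The hour-to-minute conversion in the question checks out: 0.25 to 0.41 h is 15 to about 24.6 minutes, and 0.34 to 0.38 h is about 20.4 to 22.8 minutes. A trivial Python sketch of the arithmetic, using only the values quoted in the post (the function name hours_to_minutes is hypothetical):

```python
# Convert elapsed times shown on the results page (in hours) to minutes.
# The values below are the ones quoted in the post.


def hours_to_minutes(hours: float) -> float:
    """Convert hours to minutes, rounded to one decimal place."""
    return round(hours * 60, 1)


if __name__ == "__main__":
    samples = [
        ("Too Late, fastest", 0.25),
        ("Too Late, slowest", 0.41),
        ("Valid, fastest", 0.34),
        ("Valid, slowest", 0.38),
    ]
    for label, hours in samples:
        print(f"{label}: {hours} h = {hours_to_minutes(hours)} min")
```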
ca05065
Senior Cruncher | Joined: Dec 4, 2007 | Post Count: 325
"Too Late" is a catch-all status that is used when no other status applies.
The definition provided is:
"Too Late - The result was returned to the server a longer time after it was due. Occasionally a result previously marked Pending Validation has the distribution stopped due to too many errors, without a complete quorum [max errors varies per science]. The non-error results are then converted to the status Too Late. Credit is granted as claimed [with delay]. Internally these task results are moved to a take-out list, for later review. Also see the Pending Validation status."
In simple terms, if BOINC decides that a quorum can never be satisfied, the status is set to 'Too Late'.
leloft
Cruncher | Joined: Jun 8, 2017 | Post Count: 23
Thank you. That is something that I could never have guessed at!
biscotto
Cruncher | Italy | Joined: Apr 11, 2020 | Post Count: 27
I haven't received a new GPU task in a while, and the event log now only says "Requesting new tasks for CPU", not for GPU as it did before. BOINC detects the GPU correctly, because earlier in the log I get:
OpenCL: AMD/ATI GPU 1: Radeon RX 560 Series (driver version ..., driver version OpenCL 2.0 AMD-APP, ...)
I have made sure all the options for receiving GPU work are enabled in the preferences (BOINC reports that the web preferences were applied correctly). What can I do? Thanks in advance.
----------------------------------------
Papa Ryzen 5 3600 / Mama Radeon RX 560
Grumpy Swede
Master Cruncher | Svíþjóð | Joined: Apr 10, 2020 | Post Count: 2094
Quoting biscotto: "I haven't received a new gpu task in a while [...] What can i do?"
GPU crunching is suspended, and no GPU work is being sent out. See: https://www.worldcommunitygrid.org/forums/wcg...d,43361_offset,140#665509