Thread Status: Active
Thread Type: Sticky Thread
Total posts in this thread: 290
This topic has been viewed 555784 times and has 289 replies
jjch
Cruncher
Joined: Nov 15, 2013
Post Count: 7
Re: GPU Work Units - Post Your Tech Support Questions Here

OPNG jobs are failing on NVIDIA Grid K2 cards.

I have a few older HP servers with the Grid K2 cards that I was hoping to use for WCG.

I just set them up with Windows Server 2019 and the jobs are failing with an error status: https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1912531250

The Grid K2 cards seem to support OpenCL 1.2, but the latest driver I could find was version 370.41 from January 2020.

NVIDIA GRID K2 (4095MB) driver: 370.41 OpenCL: 1.2

Is this driver too old for WCG GPU tasks or is there another problem I could address?

Note: The cards work fine with MilkyWay@home jobs, so I was hoping there is something that can be done for WCG OPNG.
[Sep 12, 2021 11:11:53 PM]
Acibant
Advanced Cruncher
USA
Joined: Apr 15, 2020
Post Count: 126
Re: GPU Work Units - Post Your Tech Support Questions Here

That work unit information will go away after a period of time, so it's best to paste the important bits into the thread. Here are yours:
Result Name: OPNG_ 0084929_ 00245_ 0--

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
projects/www.worldcommunitygrid.org/wcgrid_opng_autodockgpu_7.28_windows_x86_64__opencl_nvidia_102 -jobs OPNG_0084929_00245.job -input OPNG_0084929_00245.zip -seed 875643505 -wcgruns 1200 -wcgdpf 24
INFO: Using gpu device from app init data 1
INFO:[15:45:28] Start AutoGrid...

autogrid4: Successful Completion.
INFO:[15:48:19] End AutoGrid...
INFO:[15:48:20] Start AutoDock for OB3ZINC000533432178--7jji_001--ALYS417_inert.dpf(Job #0)...
OpenCL device: GRID K2
INFO:[15:49:35] End AutoDock...
INFO:[15:49:36] Start AutoDock for OB3ZINC000532243319--7jji_001--ALYS417_inert.dpf(Job #1)...
OpenCL device: GRID K2
INFO:[15:51:41] End AutoDock...
INFO:[15:51:41] Start AutoDock for OB3ZINC000027000823--7jji_001--ALYS417_inert.dpf(Job #2)...
OpenCL device: GRID K2
INFO:[15:54:07] End AutoDock...
INFO:[15:54:08] Start AutoDock for OB3ZINC000486393381--7jji_001--ALYS417_inert.dpf(Job #3)...
OpenCL device: GRID K2


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF61D36DEE0 read attempt to address 0x00000000
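
For anyone matching the two numbers in the log: the decimal exit code in the message line and the hexadecimal code in the exception record are the same value. A minimal sketch of the conversion follows; the lookup table is only an illustration and lists just the one code seen here.

```python
# Convert the decimal exit code reported by BOINC into the hexadecimal
# form that Windows uses for NTSTATUS values.
exit_code = 3221225477  # taken from the <message> line in the log

hex_code = f"0x{exit_code:08X}"

# Illustrative lookup table; only the code seen in this thread is listed.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

name = KNOWN_NTSTATUS.get(exit_code, "unknown")
print(hex_code, name)
```

The read attempt to address 0x00000000 in the exception record suggests a null pointer dereference inside the application, but the log alone does not say what caused it.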

----------------------------------------

[Sep 13, 2021 12:29:19 PM]
jjch
Cruncher
Joined: Nov 15, 2013
Post Count: 7
Re: GPU Work Units - Post Your Tech Support Questions Here

Good to know the links expire. Here is another one from today. I just copied down to the beginning of the debugger as you did.



Result Name: OPNG_ 0085389_ 00135_ 0--


<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
projects/www.worldcommunitygrid.org/wcgrid_opng_autodockgpu_7.28_windows_x86_64__opencl_nvidia_102 -jobs OPNG_0085389_00135.job -input OPNG_0085389_00135.zip -seed 1074566228 -wcgruns 1250 -wcgdpf 25
INFO: Using gpu device from app init data 0
INFO:[11:07:42] Start AutoGrid...

autogrid4: Successful Completion.
INFO:[11:10:31] End AutoGrid...
INFO:[11:10:31] Start AutoDock for OB3ZINC000465878275--7jji_001--ALYS417_inert.dpf(Job #0)...
OpenCL device: GRID K2
INFO:[11:12:49] End AutoDock...
INFO:[11:12:49] Start AutoDock for OB3ZINC000494195197--7jji_001--ALYS417_inert.dpf(Job #1)...
OpenCL device: GRID K2
INFO:[11:14:48] End AutoDock...
INFO:[11:14:49] Start AutoDock for OB3ZINC000438113644--7jji_001--ALYS417_inert.dpf(Job #2)...
OpenCL device: GRID K2
INFO:[11:15:41] End AutoDock...
INFO:[11:15:41] Start AutoDock for OB3ZINC000544961887--7jji_001--ALYS417_inert.dpf(Job #3)...
OpenCL device: GRID K2
INFO:[11:17:45] End AutoDock...
INFO:[11:17:46] Start AutoDock for OB3ZINC000062343614--7jji_001--ALYS417_inert.dpf(Job #4)...
OpenCL device: GRID K2
INFO:[11:22:25] End AutoDock...
INFO:[11:22:25] Start AutoDock for OB3ZINC000447625606--7jji_001--ALYS417_inert.dpf(Job #5)...
OpenCL device: GRID K2
INFO:[11:24:45] End AutoDock...
INFO:[11:24:46] Start AutoDock for OB3ZINC000544963834--7jji_001--ALYS417_inert.dpf(Job #6)...
OpenCL device: GRID K2
INFO:[11:24:57] End AutoDock...
INFO:[11:24:57] Start AutoDock for OB3ZINC000555430956--7jji_001--ALYS417_inert.dpf(Job #7)...
OpenCL device: GRID K2
INFO:[11:27:16] End AutoDock...
INFO:[11:27:16] Start AutoDock for OB3ZINC000538161709--7jji_001--ALYS417_inert.dpf(Job #8)...
OpenCL device: GRID K2
INFO:[11:28:16] End AutoDock...
INFO:[11:28:16] Start AutoDock for OB3ZINC000486359533--7jji_001--ALYS417_inert.dpf(Job #9)...
OpenCL device: GRID K2


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF61D36DEE0 read attempt to address 0x00000000

This one appears to have run on GPU 0 and it ran up to Job 9 before failing.

Just in case it matters, each of the Grid K2 cards contains two GPUs and there are two cards in the system, for a total of 4 GPUs.
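
To compare failures across work units without reading each log by eye, the Start/End lines can be counted with a short script. This is only a sketch: the function name `summarize_jobs` is made up for illustration, and the sample text is a shortened excerpt in the same format as the log above.

```python
import re

# Shortened excerpt in the same format as the stderr output above.
stderr_txt = """\
INFO:[11:27:16] Start AutoDock for OB3ZINC000538161709--7jji_001--ALYS417_inert.dpf(Job #8)...
OpenCL device: GRID K2
INFO:[11:28:16] End AutoDock...
INFO:[11:28:16] Start AutoDock for OB3ZINC000486359533--7jji_001--ALYS417_inert.dpf(Job #9)...
OpenCL device: GRID K2
"""

def summarize_jobs(text):
    """Return (started, finished): job numbers that started, and those that also ended."""
    started, finished = [], []
    current = None
    for line in text.splitlines():
        match = re.search(r"Start AutoDock for .*\(Job #(\d+)\)", line)
        if match:
            current = int(match.group(1))
            started.append(current)
        elif "End AutoDock" in line and current is not None:
            finished.append(current)
            current = None
    return started, finished

started, finished = summarize_jobs(stderr_txt)
# A job that started but never logged an End line is where the crash happened.
crashed_in = [job for job in started if job not in finished]
print("finished:", finished, "crashed in:", crashed_in)
```

Run against the two full logs in this thread, it should report a crash in Job #3 for the first and Job #9 for the second, so the failure does not seem tied to a fixed job index.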
[Sep 14, 2021 2:59:56 AM]
jjch
Cruncher
Joined: Nov 15, 2013
Post Count: 7
Re: GPU Work Units - Post Your Tech Support Questions Here

I found a note on the GPU info page that may be a clue to why the Grid cards don't work.

Which GPUs can participate in OpenPandemics - COVID-19?
GPU work units for OpenPandemics - COVID-19 are designed to run on OpenCL version 1.2 and above. However, there are certain cards that still have issues due to having GPU drivers that aren't 100% compatible with OpenCL 1.2. Most of the issues are with cards that were released before 2016. Please check our GPU forum for a list of GPUs that are known to not work.

The K2 cards were released in 2013 so they may just be too old.
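
As a quick sanity check, the OpenCL version that BOINC reports for the card can be compared against the stated 1.2 minimum. This is a sketch that assumes the summary line format posted earlier in the thread; `opencl_version` is a hypothetical helper, not part of BOINC.

```python
import re

def opencl_version(line):
    """Extract (major, minor) from a summary line such as '... OpenCL: 1.2'."""
    match = re.search(r"OpenCL:\s*(\d+)\.(\d+)", line)
    return (int(match.group(1)), int(match.group(2))) if match else None

# Device summary line as posted earlier in the thread.
line = "NVIDIA GRID K2 (4095MB) driver: 370.41 OpenCL: 1.2"

version = opencl_version(line)
# Tuples compare element by element, so (1, 2) >= (1, 2) is True.
meets_minimum = version is not None and version >= (1, 2)
print(version, meets_minimum)
```

The K2 passes this check, which fits the note above: a reported version of 1.2 does not guarantee that the driver is fully compatible.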

I couldn't find the list of GPUs on the forum, so if anyone knows where it is, let me know.

Meanwhile, I will see if I can figure out the configuration to exclude them from WCG but let them run MilkyWay@home.
[Sep 15, 2021 3:31:37 AM]
jjch
Cruncher
Joined: Nov 15, 2013
Post Count: 7
Re: GPU Work Units - Post Your Tech Support Questions Here

FYI - If anyone is interested.

Here are the cc_config.xml lines needed to ignore the NVIDIA Grid K2 GPUs for the OPNG work units.

<cc_config>
    <options>
        <exclude_gpu>
            <url>http://www.worldcommunitygrid.org/</url>
            <app>opng</app>
        </exclude_gpu>
    </options>
</cc_config>

Note: The original file contains tabs to line up the tags for clarity.

For additional info refer to the BOINC client configuration document here: https://boinc.berkeley.edu/wiki/Client_configuration
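
If only some GPUs in a mixed system should be excluded, the BOINC client configuration page also describes optional `<device_num>` and `<type>` elements inside `<exclude_gpu>`, as far as I can tell. The sketch below generates such a file; `build_cc_config` is a made-up helper name, and the device number and type in the usage line are example values, not settings tested on the Grid K2 hosts.

```python
import xml.etree.ElementTree as ET

def build_cc_config(url, app=None, device_num=None, gpu_type=None):
    """Build a cc_config.xml string containing a single <exclude_gpu> entry."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = url
    # Optional elements: leaving them out excludes all GPUs for the project.
    if device_num is not None:
        ET.SubElement(exclude, "device_num").text = str(device_num)
    if gpu_type is not None:
        ET.SubElement(exclude, "type").text = gpu_type
    if app is not None:
        ET.SubElement(exclude, "app").text = app
    ET.indent(root, space="\t")  # requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")

# Example values only: exclude NVIDIA device 0 from the opng application.
print(build_cc_config("http://www.worldcommunitygrid.org/",
                      app="opng", device_num=0, gpu_type="NVIDIA"))
```

The client has to re-read its config files (or be restarted) before a change to cc_config.xml takes effect.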
[Sep 16, 2021 3:56:27 AM]
leloft
Cruncher
Joined: Jun 8, 2017
Post Count: 23
Re: GPU Work Units - Post Your Tech Support Questions Here

Quote:
I couldn't find the list of GPUs on the forum, so if anyone knows where it is, let me know.

This would be helpful. I am getting 'Too Late' failures on results returned after between 0.25 h and 0.41 h (that is between 15 and 25 minutes, correct?), but I am also getting 'Valid' results after 0.34-0.38 h. The 'Too Late' deadline seems overly restrictive: the machine in question is headless, so the graphics card (Quadro K5200) is always free and could return any available OPNG units 24/7, even if it does take 25 minutes per unit. How does this compare with a newer card that processes an OPNG unit in under 10 minutes but does not allow work while the computer is in use? Does that result in a situation where any OPNG unit being processed on that machine gets a 'Too Late' because the computer was used during that time?
Could anyone explain the difference between a 15-minute 'Too Late' result and a 23-minute 'Valid' one, so that I can deploy the GPU more effectively?
Many thanks
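
For reference, the conversion asked about above can be checked with a few lines of Python. The run times are the ones quoted in the post; rounding to one decimal place is just a presentation choice.

```python
def hours_to_minutes(hours):
    """Convert fractional hours, as shown on the results page, to minutes."""
    return round(hours * 60, 1)

# Run times quoted in the post, in hours.
for hours in (0.25, 0.34, 0.38, 0.41):
    print(f"{hours:.2f} h = {hours_to_minutes(hours)} min")
```

So 0.25 h to 0.41 h is roughly 15 to 25 minutes, and the 'Valid' results at 0.34-0.38 h fall inside that same range.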
[Sep 19, 2021 10:20:30 AM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 325
Re: GPU Work Units - Post Your Tech Support Questions Here

"Too Late" is a catch-all status that is used when no other status applies.

The definition provided is:
Too Late - The result was returned to the server a longer time after it was due. Occasionally a result previously marked Pending Validation has the distribution stopped due to too many errors, without a complete quorum [max errors varies per science]. The non-error results are then converted to the status Too Late. Credit is granted as claimed [with delay]. Internally these task results are moved to a take-out list, for later review. Also see the Pending Validation status.

In simple terms, if BOINC decides that a quorum can never be satisfied, the status is set to 'Too Late'.
[Sep 19, 2021 10:55:53 AM]
leloft
Cruncher
Joined: Jun 8, 2017
Post Count: 23
Re: GPU Work Units - Post Your Tech Support Questions Here

Thank you. That is something that I could never have guessed at!
[Sep 19, 2021 11:39:36 AM]
biscotto
Cruncher
Italy
Joined: Apr 11, 2020
Post Count: 27
Re: GPU Work Units - Post Your Tech Support Questions Here

I haven't received a new GPU task in a while, and the event log says only
Requesting new tasks for CPU
and not for GPU as it did before. The GPU is correctly detected by BOINC, because before that I get
OpenCL: AMD/ATI GPU 1: Radeon RX 560 Series (driver version ..., driver version OpenCL 2.0 AMD-APP, ...)
I have made sure all the options to get GPU work in the preferences are on (BOINC says the web preferences are correctly applied). What can I do? Thanks in advance.
----------------------------------------
Papa Ryzen 5 3600 / Mama Radeon RX 560

[Sep 20, 2021 6:25:58 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2094
Re: GPU Work Units - Post Your Tech Support Questions Here

Quote:
I haven't received a new GPU task in a while, and the event log says only
Requesting new tasks for CPU
and not for GPU as it did before. The GPU is correctly detected by BOINC, because before that I get
OpenCL: AMD/ATI GPU 1: Radeon RX 560 Series (driver version ..., driver version OpenCL 2.0 AMD-APP, ...)
I have made sure all the options to get GPU work in the preferences are on (BOINC says the web preferences are correctly applied). What can I do? Thanks in advance.

GPU crunching is suspended. No GPU work is being sent out.
See: https://www.worldcommunitygrid.org/forums/wcg...d,43361_offset,140#665509
[Sep 20, 2021 6:32:04 PM]