World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]
Thread Status: Active. Total posts in this thread: 162
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1258 Status: Offline
I currently have no work, like the majority of people. I'd be interested to know whether anybody else has noticed the GPU load %. On my RTX 2070 under Windows, my tasks seem to jump between 30%, 50% and 100%. I assume this is because they process the jobs so quickly that the load cannot stay at any level for long before the task moves on to the next job within it. I measured the load percentages with GPU-Z 2.36.
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline
My experience with two i7-3770K machines using the HD4000 iGPU (Win 7 x64, HD4000 driver 10.18.10.5161):
All 3 GPU beta WUs that started ran the initial AutoGrid part (on the CPU?). One then ran just one GPU sub-task and hung; the other 2 hung without executing any GPU work, according to the WCG online log files. One WU was server-aborted with a time-limit error, while the other 2 were user-aborted. This behaviour may be caused by a problem similar to the one Crystal Pellet found with his old AMD laptop.
------
More details: WUs that started and pretended to run on the i7-3770K HD4000 iGPU (Win 7 x64, HD4000 driver 10.18.10.5161), on 2 machines:
  BETA_OPN1_0020015_00040_3 ==> error - exceeded time limit - elapsed time (ET) 7.36h - Device: Deimos
    This WU ran one GPU sub-job, taking 1m55s, then the task hung
    until it got a breakpoint error, probably when resuming from device sleep.
  BETA_OPNG_0021028_00090_3 ==> user-aborted after ET = 18h - Device: Callisto
  BETA_OPNG_0021025_00285_3 ==> user-aborted after ET = 17.7h - Device: Deimos
Other received beta WUs were user-aborted before they started.
------
Here is the main part of the log file from BETA_OPNG_0021028_00090_3:
  <core_client_version>7.16.11</core_client_version>
  <![CDATA[
  <message>
  aborted by user</message>
  <stderr_txt>
  projects/www.worldcommunitygrid.org/wcgrid_beta29_autodockgpu_7.25_windows_x86_64__opencl_intel_gpu_102 -jobs OPNG_0021028_00090.job -input OPNG_0021028_00090.zip -seed 44376090 -wcgruns 1600 -wcgdpf 32
  INFO: Using gpu device from app init data 0
  INFO:[20:29:34] Start AutoGrid...
  autogrid4: Successful Completion.
  INFO:[20:30:29] End AutoGrid...
  INFO:[20:30:30] Start AutoDock for ZINC001297736733_RX1--6y84_001_gln110-rot--CYS156.dpf(Job #0)...
  OpenCL device: Intel(R) HD Graphics 4000
  </stderr_txt>
------
So in the log file above, no AutoDock runs completed. The WU ran the AutoGrid part, but then just sat in memory, apparently doing nothing.
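The "started but never ended" pattern in this log can be checked mechanically across many result logs. Below is a minimal Python sketch, assuming the stderr lines look exactly like the snippets quoted in this post (`INFO:[hh:mm:ss] Start AutoDock for <file>(Job #n)...` later followed by `INFO:[hh:mm:ss] End AutoDock...`); the function name `unfinished_autodock_jobs` is hypothetical and not part of any WCG or BOINC tool. It only flags a job whose Start line has no matching End line, so a task that hangs between jobs may not be caught.

```python
import re

# Line formats ASSUMED from the stderr snippets quoted in this thread; other
# app versions may log differently, so treat these patterns as a starting point.
START_RE = re.compile(r"INFO:\[(\d\d:\d\d:\d\d)\] Start AutoDock for (\S+?)\(Job #(\d+)\)")
END_RE = re.compile(r"INFO:\[\d\d:\d\d:\d\d\] End AutoDock")


def unfinished_autodock_jobs(stderr_text):
    """Return (job_number, dpf_file, start_time) for every AutoDock job whose
    'Start AutoDock' line is not followed by an 'End AutoDock' line."""
    unfinished = []
    open_job = None  # job that has logged Start but not yet End
    for line in stderr_text.splitlines():
        start = START_RE.search(line)
        if start:
            if open_job is not None:
                # A new job started before the previous one logged an End line.
                unfinished.append(open_job)
            open_job = (int(start.group(3)), start.group(2), start.group(1))
        elif END_RE.search(line):
            open_job = None
    if open_job is not None:
        # The log stops while a job is still open: the pattern seen in the hung tasks.
        unfinished.append(open_job)
    return unfinished


if __name__ == "__main__":
    # Excerpt of the BETA_OPNG_0021028_00090_3 stderr quoted above.
    sample = (
        "INFO:[20:29:34] Start AutoGrid...\n"
        "autogrid4: Successful Completion.\n"
        "INFO:[20:30:29] End AutoGrid...\n"
        "INFO:[20:30:30] Start AutoDock for "
        "ZINC001297736733_RX1--6y84_001_gln110-rot--CYS156.dpf(Job #0)...\n"
        "OpenCL device: Intel(R) HD Graphics 4000\n"
    )
    print(unfinished_autodock_jobs(sample))
```

For that excerpt the sketch reports job #0 (the CYS156 ligand, started at 20:30:30) as unfinished, while the BETA_OPN1_0020015_00040_3 snippet, which has a matching End line, yields an empty list.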
The snippet of the log file for the GPU run in BETA_OPN1_0020015_00040_3 is:
  INFO:[03:53:22] Start AutoDock for ZINC000255623610_RX1--6lu7_001--CYS145_wcgsplit2.dpf(Job #0)...
  OpenCL device: Intel(R) HD Graphics 4000
  INFO:[03:55:17] End AutoDock...
------
General information that other members may find useful:
BOINC client startup messages when the Intel HD4000 GPU is detected:
  OpenCL: Intel GPU 0: Intel(R) HD Graphics 4000 (driver version 10.18.10.5161, device version OpenCL 1.2, 1195MB, 1195MB available, 147 GFLOPS peak)
  02-Mar-2021 22:38:34 [---] OpenCL CPU: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 3.0.1.10891, device version OpenCL 1.2 (Build 76427))
If your setup is recognised by BOINC and your WCG Device Profile allows GPU work on your GPU(s), each time your client tries to fetch new work you should see:
  Requesting new tasks for CPU and Intel GPU
If you have the cc_config.xml flag coproc_debug set, these 2 lines will appear in your event log (stdoutdae.txt) once per minute while the GPU task exists:
  [coproc] intel_gpu instance 0; 1.000000 pending for BETA_OPN1_0020015_00040_3
  [coproc] intel_gpu instance 0: confirming 1.000000 instance for BETA_OPN1_0020015_00040_3
------
Questions arising:
The log file snippet above, showing the single completed AutoDock job, indicates that the HD4000 is capable of running AutoDock4, but only sometimes.
 * Why does it not run for every instance? Is a higher GPU RAM allocation and/or a higher GPU voltage needed?
 * Will the debug messages in the upcoming v7.26 betas give more info on why these WUs hung?
----------
Other Info: The HD4000 iGPUs successfully run all of the OpenCL demo programs accessible via GPU Caps Viewer >> 3D Demos.
----------
Conclusion: The HD4000 GPU may be too slow to warrant putting much effort into getting it to crunch OPN1 GPU WUs. However, getting it actually working may provide insights into solving problems with other GPUs.
The information I gained in getting it this far will be useful when setting up and running a discrete GPU.
HTH - Rick
[Edit 1 times, last edit by Rickjb at Mar 5, 2021 6:51:05 AM]
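The `coproc_debug` flag Rick mentions is a BOINC log flag that lives in the `<log_flags>` section of `cc_config.xml` in the BOINC data directory. Editing the file by hand is the usual route; as an alternative, here is a small Python sketch using only the standard library that sets the flag while keeping any other settings already in the file. The function name `enable_coproc_debug` is hypothetical, and note that `ElementTree` discards XML comments when it rewrites an existing file.

```python
import os
import xml.etree.ElementTree as ET


def enable_coproc_debug(path):
    """Set <cc_config><log_flags><coproc_debug>1</coproc_debug> in the
    cc_config.xml at `path`, creating the file or missing sections as needed.
    Other elements already in the file are kept (XML comments are not)."""
    if os.path.exists(path):
        tree = ET.parse(path)
        root = tree.getroot()
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)

    # Reuse existing <log_flags>/<coproc_debug> elements rather than duplicating them.
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    flag = log_flags.find("coproc_debug")
    if flag is None:
        flag = ET.SubElement(log_flags, "coproc_debug")
    flag.text = "1"

    tree.write(path, encoding="utf-8")
```

On Windows the data directory is typically `C:\ProgramData\BOINC`, so a call might look like `enable_coproc_debug(r"C:\ProgramData\BOINC\cc_config.xml")`. After changing the file, the client has to re-read it, for example via Options > Read config files in BOINC Manager, or by restarting the client.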
bozz4science
Advanced Cruncher Germany Joined: May 3, 2020 Post Count: 104 Status: Offline
Speedy51 wrote: "On my RTX 2070 under Windows my tasks seem to jump between 30% 50% and 100%. I am assuming this is because they process the jobs at such a speed that it is impossible to stay at any% for very long before moving on to the next job within the task."
I noticed something similar, and my intuition for interpreting these sudden jumps in GPU utilization is very much the same as yours. See parts of one of my earlier posts on 2nd March:
Has anyone tried so far running multiple GPU WUs concurrently on the same GPU? I was wondering if you can increase WU output by forcing the GPU to hold its load more constantly at a high level, instead of these short bursts up to 100% and then back to 0%. [...] However, due to the inherent nature of these WUs, the GPUs' VRMs are getting kicked hard. They continuously have to adjust the voltage of the GPU chip up and down according to the short, intensive bursts of the computations. For the 1660 Super, the voltage was all over the place.
AMD Ryzen 3700X @ 4.0 GHz / GTX1660S
Intel i5-4278U CPU @ 2.60GHz
[Edit 1 times, last edit by bozz4science at Mar 5, 2021 10:37:15 AM]
koschi
Cruncher Joined: Dec 16, 2007 Post Count: 5 Status: Offline
Put this into an app_config.xml in your WCG project directory:
<app_config>
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Good morning/evening to everyone.
I am moving batches up to production now and moving version 7.26 to the production environment. I am planning on starting the next round of beta in the next few hours. Note: it will have a new thread, as it is an updated application.
Thanks,
-Uplinger
Grumpy Swede
Master Cruncher Sweden Joined: Apr 10, 2020 Post Count: 2068 Status: Recently Active
Thanks, Uplinger. I'm starting my GPU cruncher (iGPU HD4600 + GTX980 Strix).
Not the most modern computer, but when you're retired you can't upgrade all the time.
Richard Haselgrove
Senior Cruncher United Kingdom Joined: Feb 19, 2021 Post Count: 360 Status: Offline
@ Uplinger,
Are you moving all variants to production at the same time? I think there are still some issues with the intel_gpu OpenCL variant which haven't been answered yet. Rickjb's comments earlier today match my experience with a pair of HD 4600s right at the beginning. Other users have described excessive runtimes and progress displays that are consistent with the pseudo-progress BOINC generates before the first checkpoint. I'd suggest holding the intel_gpu version back for further examination.
Grumpy Swede
Master Cruncher Sweden Joined: Apr 10, 2020 Post Count: 2068 Status: Recently Active
All tasks for my Intel HD4600 worked perfectly. The only strange thing was that the elapsed time since the last checkpoint climbed to several hours during the run (much higher than the actual runtime), even though the task checkpointed (and the checkpoints were counted) for each ligand.
I watched this through BoincTasks, which shows more information at a glance than BOINC Manager. No excessive runtimes either for my HD4600.
Edit: Example here: https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1532147303
Edit: I run the HD4600 with a pretty old driver: 10.18.10.3907 (if it ain't broke, don't fix it).
[Edit 3 times, last edit by Grumpy Swede at Mar 5, 2021 5:33:38 PM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
----CENSORSHIP ALERT----
User Peter Hucker is posting to this forum and his posts are not appearing. They are visible to anyone not logged in, or to himself, but not to any logged-in user, including me. Cease this puerile, childish nonsense or all machines will be removed from your project.
tegz
Cruncher Joined: Mar 10, 2021 Post Count: 3 Status: Offline
I'm new to this work, so I'm not sure if I'm running the latest/beta version, but I did get work for my GTX 950 yesterday and it completed OK. Later I noticed some files downloading whose names weren't like the work files shown later, viz.:
  2705 World Community Grid 07-04-2021 11:26 Started download of 919d0f5d75e85fc2f4ddc66820555e8e.gpf
  2704 World Community Grid 07-04-2021 11:26 Finished download of 89d93b7b5839cec1ff7ab63638cac930.pdbqt
  2703 World Community Grid 07-04-2021 11:26 Started download of 89d93b7b5839cec1ff7ab63638cac930.pdbqt
  2702 World Community Grid 07-04-2021 11:26 Started download of d8d69c0223b76abbf75e161ba6380318.pdbqt
  2701 World Community Grid 07-04-2021 11:26 Project requested delay of 121 seconds
  2700 World Community Grid 07-04-2021 11:26 Scheduler request completed: got 1 new tasks
  2699 World Community Grid 07-04-2021 11:26 Requesting new tasks for CPU and NVIDIA GPU
  2698 World Community Grid 07-04-2021 11:26 Sending scheduler request: To fetch work.
  2697 World Community Grid 07-04-2021 10:49 Finished download of mip1.MIP1_00331854.2
  2696 World Community Grid 07-04-2021 10:48 Finished download of mip1.MIP1_00331854.cst
  2695 World Community Grid 07-04-2021 10:48 Started download of mip1.MIP1_00331854.cst
The last 3 MIP1 files are showing in new work, but not the previous ones. I wondered whether the zip files were for work or were updates to the client.