Thread Status: Locked
Total posts in this thread: 511
This topic has been viewed 736559 times and has 510 replies
goben_2003
Advanced Cruncher
Joined: Jun 16, 2006
Post Count: 146
Status: Offline
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Edit: I wonder if some people here are running those "hacked" BOINC versions, with which you can trick the server into thinking that you have as many GPUs as you wish, and thereby download tons of GPU WUs?


Well, I am definitely running an official BOINC release, not some hacked version. Of the 3 computers with GPU(s) that I have, 1 is getting extra work units. It has 15 at the moment. It is a 4-core/8-thread i5 with an Intel UHD 620 iGPU. I have 2 laptops of that series, but the other one has a 4-core/8-thread i7 with the same iGPU. The second one is not getting extra units.

As some have noticed, my theory on why some are getting more than others is incorrect. I'm not worried about some getting more than their threads at the moment, but it does introduce something I should look into for the mechanism which schedules beta work units.

Thanks,
-Uplinger

Hello Keith,

I noticed that the number of sent tasks only exceeds the number of CPU threads when the GPU is of the integrated type (APU or iGPU).

Good luck!
CP

Only 1 of my 3 computers, one with an iGPU, has extra. The other one with only an iGPU did not get extra. The one with an RTX 2060 and an iGPU did not either.

Thanks for all the updates, Keith. I'm roughly on the other side of the world, which is why I tend to post during a different part of the day.
----------------------------------------

[Mar 27, 2021 8:02:47 AM]
Mamajuanauk
Master Cruncher
United Kingdom
Joined: Dec 15, 2012
Post Count: 1900
Status: Offline
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Hi Uplinger
I've been seeing tasks running for a short time (0.03 seconds) with a Valid result and NO credit!
    BETA_OPNG_0000096_00184_0 | Azalea | Valid | 3/26/21 20:29:25 | 3/26/21 20:52:30 | 0.03 / 0.37 | 0.0 / 0.0

Is this correct/to be expected as the time is so short?

Thanks


So the points are still an issue, and I am working towards fixing that. I am planning to credit based on elapsed time and not CPU time, as that makes more sense for the GPU project. That would be the 0.37 hours you see on your report.

Note: I am not going to hold up the release of GPU over points...it is on my priority list near the top and I would like to have it fixed before launching.

Thanks,
-Uplinger
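
As a rough cross-check of the numbers in the result row quoted above, here is a minimal Python sketch. It assumes the two timestamps are the sent and returned times and that the "0.03 / 0.37" pair is CPU hours / elapsed hours, as the reply suggests. The variable names are illustrative only and nothing here reflects how the project actually computes credit.

```python
from datetime import datetime

# Values copied from the result row quoted in the post.
# Assumption: first timestamp = sent, second = returned.
FMT = "%m/%d/%y %H:%M:%S"
sent = datetime.strptime("3/26/21 20:29:25", FMT)
returned = datetime.strptime("3/26/21 20:52:30", FMT)

# Assumption: the pair is "CPU hours / elapsed hours".
cpu_hours, elapsed_hours = (float(x) for x in "0.03 / 0.37".split("/"))

# Wall-clock time between sending and returning the task.
turnaround_hours = (returned - sent).total_seconds() / 3600

print(f"CPU time:     {cpu_hours * 60:.1f} min")
print(f"Elapsed time: {elapsed_hours * 60:.1f} min")
print(f"Turnaround:   {turnaround_hours * 60:.1f} min")
```

Under those assumptions the elapsed figure is far larger than the CPU figure and fits inside the send-to-return window, which is consistent with elapsed time being the more meaningful measure for a GPU task.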
Thanks for the update, Uplinger.

Edit: Will there be credit for the Beta tasks completed?
----------------------------------------
Mamajuanauk is the Name! Crunching is the Game!



----------------------------------------
[Edit 1 times, last edit by Mamajuanauk at Mar 27, 2021 8:04:50 AM]
[Mar 27, 2021 8:03:24 AM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Status: Offline
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Studying Intel iGPU results. Interesting case:

https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=590484762

Five of us have tried it, four have failed.
Mine was replication _3. It got to 37.5% progress (16 AutoDock runs), but then stalled. The workunit display shows 0.01 hours CPU, but the host page shows 8.21 hours elapsed.

The next two (Windows 10) machines completed fewer tasks - 4 and 1 respectively - but recorded ~ 3 hours CPU time. I can't see the elapsed time. We were all killed off by the timeout error.

The final user - replication _5 - completed the tasks in 157 seconds CPU, 31 minutes elapsed, but has status 'Too Late'.

I have three other iGPU tasks still shown as 'Running', but in practice doing nothing. They have been active for around 12 hours, and I expect them to time out after about another four hours. I'll gather what further information I can, and write them up later.
[Mar 27, 2021 8:40:35 AM]
goben_2003
Advanced Cruncher
Joined: Jun 16, 2006
Post Count: 146
Status: Offline
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

The next two (Windows 10) machines completed fewer tasks - 4 and 1 respectively - but recorded ~ 3 hours CPU time. I can't see the elapsed time. We were all killed off by the timeout error.


This is an interesting case. Just a note that yours was the only one that timed out after completing some of the jobs (15/40). Note the lack of "End AutoGrid...". BETA_OPNG_0000072_00175_2 did not complete 4 tasks; it appears to have been starting Job 0 over and over.

The one that did complete it is a far more powerful Iris iGPU. Two of the others are 1 gen newer though.

Good luck on the rest of your beta units!
----------------------------------------

[Mar 27, 2021 9:18:52 AM]
TLD
Veteran Cruncher
USA
Joined: Jul 22, 2005
Post Count: 810
Status: Offline
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

I ended up getting 11 WUs on the ATI Radeon HD 5770. 1 errored out with the exceeded elapsed time limit: https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1588786117
and 1 was server aborted: https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1588882096

The GT 1030 got 4 WUs all validated fine.
----------------------------------------

----------------------------------------
[Edit 1 times, last edit by TLD at Mar 27, 2021 9:22:59 AM]
[Mar 27, 2021 9:22:00 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2223
Status: Recently Active
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

It's a very interesting problem with the iGPUs. My iGPU (HD 4600) has performed flawlessly through all the different Beta runs, except for one task in the previous run, which errored out due to a too-low "max time" setting ("exceeded elapsed time limit").

In this last Beta run, it completed 3 WUs in 25:10, 30:56, and 32:33 minutes.
[Mar 27, 2021 9:33:54 AM]
goben_2003
Advanced Cruncher
Joined: Jun 16, 2006
Post Count: 146
Status: Offline
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

I ended up getting 11 WUs on the ATI Radeon HD 5770. 1 errored out with the exceeded elapsed time limit: https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1588786117
and 1 was server aborted: https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1588882096

The GT 1030 got 4 WUs all validated fine.

That is too bad. It was so close to finishing too! This is different than the other timeout ones with 7.28 that I have seen. This one has the timeout exception only 4min and 23s after starting the final job.
----------------------------------------

[Mar 27, 2021 9:39:39 AM]
bozz4science
Advanced Cruncher
Germany
Joined: May 3, 2020
Post Count: 104
Status: Offline
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Noticed this WU that got "stuck" again overnight, blocking a GPU slot for roughly a quarter of a day (5.8 hrs) without doing any computational work. All "stuck" issues so far have only ever occurred on a 1660 Super card. The proportion of WUs that got stuck so far is rather low, I'd estimate one in 20 WUs or ~5%, but the impact is rather large: within a "locked" period of roughly 6 hrs, I could have computed orders of magnitude more WUs than the one that got stuck [BETA_OPNG_0000107_00041]. Both wingmen did just fine...

Kind of bothersome, as I obviously cannot check the runtime status of WUs at night. I will probably only let GPU WUs compute during the day.
----------------------------------------

AMD Ryzen 3700X @ 4.0 GHz / GTX1660S
Intel i5-4278U CPU @ 2.60GHz
----------------------------------------
[Edit 1 times, last edit by bozz4science at Mar 27, 2021 9:56:55 AM]
[Mar 27, 2021 9:56:08 AM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Status: Offline
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

This is an interesting case. Just a note that yours was the only one that timed out after completing some of the jobs (15/40). Note the lack of "End AutoGrid...". BETA_OPNG_0000072_00175_2 did not complete 4 tasks; it appears to have been starting Job 0 over and over.
Thanks for picking that up. I've preserved the two big output files, in case Keith wants to have a look at them, but no error messages are obvious to my eye.
Good luck on the rest of your beta units!
Sadly, I already know they're going to fail. Here's the complete stderr output from an example:
projects/www.worldcommunitygrid.org/wcgrid_beta29_autodockgpu_7.28_windows_x86_64__opencl_intel_gpu_102 -jobs OPNG_0000089_00015.job -input OPNG_0000089_00015.zip -seed 1482125208 -wcgruns 1700 -wcgdpf 34 
INFO: Using gpu device from app init data 0
INFO:[20:18:03] Start AutoGrid...

autogrid4: Successful Completion.
INFO:[20:18:08] End AutoGrid...
INFO:[20:18:09] Start AutoDock for ZINC000904329721_2-ACR2.14_RX1--fr2266benz_001--CYS114.dpf(Job #0)...
OpenCL device: Intel(R) HD Graphics 4600
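
For anyone wanting to check their own stderr for the "starting Job 0 over and over" pattern mentioned earlier in the thread, a minimal Python sketch follows. It only assumes the "Start AutoDock ... (Job #N)" line format visible in the output above. The function names are hypothetical, and the second Job #0 line in the sample is made up to mimic a restarting task.

```python
import re
from collections import Counter

# Matches lines such as:
#   INFO:[20:18:09] Start AutoDock for <name>.dpf(Job #0)...
START_RE = re.compile(
    r"^INFO:\[(\d{2}:\d{2}:\d{2})\] Start AutoDock for .*\(Job #(\d+)\)"
)

def count_job_starts(stderr_text):
    """Return a Counter mapping job number -> number of 'Start AutoDock' lines."""
    starts = Counter()
    for line in stderr_text.splitlines():
        match = START_RE.match(line.strip())
        if match:
            starts[int(match.group(2))] += 1
    return starts

def restarted_jobs(stderr_text):
    """Return the sorted job numbers that were started more than once."""
    return sorted(job for job, n in count_job_starts(stderr_text).items() if n > 1)

# Illustrative input: the first four lines come from the stderr above;
# the final, repeated Job #0 line is invented for this example.
sample = """\
INFO: Using gpu device from app init data 0
INFO:[20:18:03] Start AutoGrid...
INFO:[20:18:08] End AutoGrid...
INFO:[20:18:09] Start AutoDock for ZINC000904329721_2-ACR2.14_RX1--fr2266benz_001--CYS114.dpf(Job #0)...
INFO:[20:25:41] Start AutoDock for ZINC000904329721_2-ACR2.14_RX1--fr2266benz_001--CYS114.dpf(Job #0)...
"""

print(restarted_jobs(sample))
```

A job number that shows up in the result list was started more than once, which is the symptom described for the looping task. A task that hangs inside its first job without restarting, as in the output above, would not be flagged by this check.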

It's been pegged at the placebo limit of 100% since I went to bed last night. The last sign of life was
<active_task>
<project_master_url>http://www.worldcommunitygrid.org/</project_master_url>
<result_name>BETA_OPNG_0000089_00015_1</result_name>
<checkpoint_cpu_time>5.335234</checkpoint_cpu_time>
<checkpoint_elapsed_time>8.230011</checkpoint_elapsed_time>
<fraction_done>0.000000</fraction_done>
<peak_working_set_size>93335552</peak_working_set_size>
<peak_swap_size>90198016</peak_swap_size>
<peak_disk_usage>245937</peak_disk_usage>
</active_task>
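
A snippet like the one above can be read programmatically. The following is a minimal Python sketch using the standard library XML parser. It treats the checkpoint times as seconds, and `looks_stalled` with its one-hour grace period is a hypothetical heuristic for illustration, not anything BOINC itself provides.

```python
import xml.etree.ElementTree as ET

# The <active_task> element quoted above, copied verbatim.
snippet = """<active_task>
<project_master_url>http://www.worldcommunitygrid.org/</project_master_url>
<result_name>BETA_OPNG_0000089_00015_1</result_name>
<checkpoint_cpu_time>5.335234</checkpoint_cpu_time>
<checkpoint_elapsed_time>8.230011</checkpoint_elapsed_time>
<fraction_done>0.000000</fraction_done>
<peak_working_set_size>93335552</peak_working_set_size>
<peak_swap_size>90198016</peak_swap_size>
<peak_disk_usage>245937</peak_disk_usage>
</active_task>"""

def read_active_task(xml_text):
    """Extract the progress-related fields from one <active_task> element."""
    node = ET.fromstring(xml_text)
    return {
        "name": node.findtext("result_name"),
        "checkpoint_cpu_s": float(node.findtext("checkpoint_cpu_time")),
        "checkpoint_elapsed_s": float(node.findtext("checkpoint_elapsed_time")),
        "fraction_done": float(node.findtext("fraction_done")),
    }

def looks_stalled(task, wall_elapsed_s, grace_s=3600):
    """Hypothetical heuristic: no progress reported, and the last checkpoint
    is more than grace_s seconds older than the current wall-clock run time."""
    no_progress = task["fraction_done"] == 0.0
    stale = wall_elapsed_s - task["checkpoint_elapsed_s"] > grace_s
    return no_progress and stale

task = read_active_task(snippet)
print(task)
# 12 hours of wall-clock time is used here purely as an illustrative input.
print(looks_stalled(task, wall_elapsed_s=12 * 3600))
```

The wall-clock run time has to come from elsewhere, such as the BOINC Manager display, since the element itself only records the state at the last checkpoint.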

Process Explorer shows something's happening, but not a lot. The highlighted line flashes back to zero every other second.


For comparison, here's what a similar machine looks like when it's working on its day job at Einstein@Home:


And I think that's about all I can say. I've preserved (I think) enough of the entrails to construct an offline test run, but I think for that to be useful we would need a binary with better debug instrumentation.
[Mar 27, 2021 11:33:35 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2223
Status: Recently Active
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

@Richard

Do the two different "State" values in Process Explorer for the WCG and Einstein tasks give any clue?

The "State" for the WCG tasks in your screen dump is "Wait:UserRequest", but the "State" for the Einstein tasks is "Wait:DelayExecution".

Edit: Also maybe worth mentioning, because my iGPU seems to crunch these WUs well: I'm only running one project (WCG), with at most 1-2 WCG tasks on the CPU (i7-4790K) and 1 task on the GTX980.
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Mar 27, 2021 12:14:12 PM]
[Mar 27, 2021 12:01:58 PM]