Thread Status: Locked
Total posts in this thread: 511
This topic has been viewed 498650 times and has 510 replies
mdxi
Advanced Cruncher
Joined: Dec 6, 2017
Post Count: 109
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Since the last beta I had a very unexpected exchange with an AMD engineer, which led to a bit of research, and long story short: I now have the OpenCL portions of the AMDGPU-PRO drivers deployed on my nodes with AMD GPUs, and this means that those nodes are now OCL 1.2 compliant. Which, in turn, means that OPNG now runs successfully on those nodes (well, only one of the three got WUs, but they're all using the same GPU series, so this result should apply to all of them). 8 WUs completed, all in very reasonable times:

WUs matching 'OPNG' for World Community Grid in past 24 hours: 8
Total CPU time used: 00h 09min 52s
Min runtime: 00h 00min 57s
Max runtime: 00h 01min 25s
Avg runtime: 00h 01min 14s
WUs by quintile:
<= 00h 01min 02s 1 (12.5%)
<= 00h 01min 07s 0 (00.0%)
<= 00h 01min 12s 1 (12.5%)
<= 00h 01min 17s 2 (25.0%)
<= 00h 01min 25s 4 (50.0%)

All validated already. No infinite hangs or other weirdness. Woo!
----------------------------------------

[Mar 31, 2021 3:17:19 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Keith,
FWIW the 4 resend tasks that I had validated before you released the latest 20 batches of betas all received between roughly 1100 & 1300 points each. The betas after those 4 received half or less the amount of points per task. I'm guessing that's because those 4 resends were more involved tasks. I'm not at all complaining. Just giving you a heads up in case there is an issue that needs addressed. Thanks for all the hard work you do here.


After my last post I looked into the points and found a bug in the granting of credit. The amount of credit should be higher, and I am working to push that change out now. Results getting closer to 1100 and 1300 points means the jobs ran, on average, to about 90% of their maximum, with some probably hitting 100%.

Do you have an example of the work unit name for the 1100 and 1300 point results that you encountered? I can use those to verify on my end what you're seeing.

Thanks,
-Uplinger
[Mar 31, 2021 3:25:32 AM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1264
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Thanks for the details
From your comment of getting 600, that sounds like lots of the jobs stopped early. I will be monitoring the values to make sure things look correct over the next few days and early when we go to production.

The task in question (0000162_00228) had 30 jobs inside it. Time spent on the task was 0.03/0.04. It looks like between 2 and 4 seconds were spent on each job within the task.

I noticed I had another task (0030007_00530_0) that had 18 jobs inside and granted me 0.3/820.7 with a runtime of 0.03/0.03. That does seem excessively high, considering my other task had 30 jobs. It looks like around 7 seconds were spent on each job in this task.
I have included the task names above for you. My device name is DESKTOP-FVL1L8F, if you would like to look into it.

I am sure I speak for more than myself when I say that we appreciate the time you spend making sure things work correctly.
----------------------------------------

[Mar 31, 2021 3:27:12 AM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Here is the one that was granted the most points. Mine is the Win 7 machine and is the slowest of the 4 I have running.


Project Name: BETA - OpenPandemics - COVID-19 - GPU
Created: 03/26/2021 19:50:39
Name: BETA_OPNG_0000089_00236
Minimum Quorum: 2
Replication: 2


Result Name | OS type | OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
BETA_OPNG_0000089_00236_2 | Microsoft Windows | 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) | 728 | Valid | 3/30/21 20:16:22 | 3/30/21 20:37:49 | 0.03 | 0.0 / 1,321.9
BETA_OPNG_0000089_00236_1 | Microsoft Windows | 10 Professional x64 Edition, (10.00.19042.00) | 728 | Valid | 3/26/21 20:15:44 | 3/26/21 20:37:45 | 0.04 | 0.0 / 1,285.1
BETA_OPNG_0000089_00236_0 | Microsoft Windows | 10 Core x64 Edition, (10.00.19042.00) | - | No Reply | 3/26/21 20:15:35 | 3/30/21 20:15:35 | 0.00 | 0.0 / 0.0
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


----------------------------------------
[Edit 1 times, last edit by nanoprobe at Mar 31, 2021 3:49:47 AM]
[Mar 31, 2021 3:47:13 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Thanks Speedy,

Those credits look in line with what I'm expecting to see. For a CPU work unit, we estimate they can run X jobs based on what each job has inside of it. This is based off how many atoms are in a given ligand.

( 0.0000000122 * Atoms^2 + 0.0000000751 * Atoms + 0.0000105946 ) * ga_num_evals * ga_run = how long we estimate it'll take for an average cpu.

Each job has a different number of atoms and a different structure, which changes the result: the number of evals differs per job and is generally higher for ligands with more atoms. This is 100% just an estimate, but it gets us a pretty good average runtime on similar processors.
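
As a rough illustration of the estimate above, here is a minimal Python sketch. The coefficients are copied from the formula in this post; the function name, the sample inputs, and the idea of calling it per job are assumptions for illustration only, and the post does not state the unit of the result.

```python
def estimate_cpu_runtime(atoms, ga_num_evals, ga_run):
    """Estimated runtime of one job on an average CPU.

    Coefficients come from the formula in the post. The unit of the
    result is not stated there, so none is assumed here.
    """
    per_eval = 0.0000000122 * atoms**2 + 0.0000000751 * atoms + 0.0000105946
    return per_eval * ga_num_evals * ga_run


# Hypothetical inputs, chosen only to show how the estimate grows with ligand size.
for atoms in (10, 30, 60):
    print(atoms, estimate_cpu_runtime(atoms, ga_num_evals=2_500_000, ga_run=20))
```

Because of the Atoms^2 term, the estimate grows faster than linearly with ligand size, which matches the remark that larger ligands make for harder jobs.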

When a work unit is created, we package multiple jobs together or split them up based on how difficult they are. We try to target, say, 3 hours per CPU work unit. For the GPU version, we create them with 20 times the difficulty of the CPU version. These are split the exact same way, thus they get 20 times more points because they were originally created 20 times harder.

If we ran one of the GPU work units on a CPU, it would take on average 60 hours to complete the same task.
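
The packaging idea can be sketched as follows. This is only an illustration: the 3 hour target and the factor of 20 come from the post, but the greedy grouping rule, the function name, and the sample job estimates are assumptions, since the actual packaging rules are not described here.

```python
CPU_TARGET_HOURS = 3.0      # "say 3 hours per CPU work unit" in the post
GPU_DIFFICULTY_FACTOR = 20  # GPU work units are created 20 times harder
GPU_TARGET_HOURS = CPU_TARGET_HOURS * GPU_DIFFICULTY_FACTOR  # in CPU-equivalent hours


def pack_jobs(job_hours, target_hours):
    """Greedily group estimated job runtimes into work units.

    Assumption: jobs are taken in order, and a work unit is closed as soon
    as adding the next job would exceed the target.
    """
    work_units, current, total = [], [], 0.0
    for hours in job_hours:
        if current and total + hours > target_hours:
            work_units.append(current)
            current, total = [], 0.0
        current.append(hours)
        total += hours
    if current:
        work_units.append(current)
    return work_units


# Hypothetical per-job estimates in CPU hours.
jobs = [1.0, 1.5, 0.5, 2.0, 2.5, 0.5]
print(GPU_TARGET_HOURS)                   # 60.0
print(pack_jobs(jobs, CPU_TARGET_HOURS))  # [[1.0, 1.5, 0.5], [2.0], [2.5, 0.5]]
```

With the GPU target of 60 CPU-equivalent hours, the same six jobs would all fit into a single work unit.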

This is the basis for why points are granted the way they are for this application. One thing that is different as I have mentioned just above, is that when a job finds a good answer, it does not need to continue and stops early. This is why you are granted a percentage of the total max points allowed.
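
A minimal sketch of the "percentage of the max points" idea, under a loudly stated assumption: every job in a work unit counts equally, and the granted credit is the maximum credit scaled by the average share of its work each job actually ran. The post does not say how the percentage is computed, so the function and the numbers below are hypothetical.

```python
def granted_credit(max_credit, fractions_completed):
    """Scale the maximum credit by the average share of work each job ran.

    Assumption: all jobs are weighted equally. The post only states that
    a percentage of the maximum points is granted.
    """
    average = sum(fractions_completed) / len(fractions_completed)
    return max_credit * average


# Hypothetical work unit: maximum of 1000 points, four jobs,
# two of which stopped early at half of their evaluations.
print(granted_credit(1000.0, [1.0, 1.0, 0.5, 0.5]))  # 750.0
```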

Thanks,
-Uplinger
[Mar 31, 2021 4:05:23 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Project Badges:
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

The no reply person has 3 total results that all went no reply. For your workunit, yes, you had to basically do all the calculations in each job. Unfortunately I can not tell the future on these jobs to know what they will do. But this particular work unit was very difficult on every job.

Thanks,
-Uplinger
[Mar 31, 2021 4:22:56 AM]
bozz4science
Advanced Cruncher
Germany
Joined: May 3, 2020
Post Count: 104
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Looking good so far on my side as well. Just the same issue that has been repeating for me since the start of the first beta test: my 1660S always suddenly stops after a certain amount of time. If I don't babysit these tasks and suspend/unsuspend them, they run into runtime exceeded errors, whereas applying this strategy sets the runtime back by a few minutes and lets them finish within minutes.

No issues on a 970. I'll try a clean driver install before going live to hopefully fix that issue.
----------------------------------------

AMD Ryzen 3700X @ 4.0 GHz / GTX1660S
Intel i5-4278U CPU @ 2.60GHz
[Mar 31, 2021 9:01:18 AM]
maeax
Advanced Cruncher
Joined: May 2, 2007
Post Count: 142
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

24 Calculations https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=596793907
0.03 / 0.94 0.3 / 891.7
29 Calculations https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=596792236
0.04 / 1.11 0.3 / 1,023.1
The difference seems ok.
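
A quick way to compare the two results above is to divide the granted credit by the number of calculations. The figures are the ones quoted in this post; the function name is made up for the sketch.

```python
def credit_per_calculation(credit, calculations):
    """Granted credit divided by the number of calculations in the work unit."""
    return credit / calculations


# (calculations, granted credit) for the two work units quoted above
results = [(24, 891.7), (29, 1023.1)]

for calculations, credit in results:
    print(calculations, round(credit_per_calculation(credit, calculations), 2))
```

Both work units come out at roughly 35 to 37 points per calculation.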
----------------------------------------
AMD Ryzen Threadripper PRO 3995WX 64-Cores/ AMD Radeon (TM) Pro W6600. OS Win11pro
[Mar 31, 2021 10:01:10 AM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Thanks Keith. Glad to see all seems well. Just one more question. When do you sleep?
[Edit 1 times, last edit by nanoprobe at Mar 31, 2021 12:18:46 PM]
[Mar 31, 2021 12:18:15 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Sleep is overrated...We are putting in a few extra hours to help push the release of this on my aggressive timeline :)

On a side note, I am planning on releasing about 200 batches of Beta today. I am scheduling them to build here in a few minutes and then after my 9am meeting, I'll work towards loading them into boinc.

Also, it looks like the fix to the validator to award points is working as expected.

Thanks,
-Uplinger
[Mar 31, 2021 1:22:54 PM]