World Community Grid Forums
Category: Active Research Forum: Africa Rainfall Project Thread: Ridiculously short runtime, ridiculously low credit |
Thread Status: Active Total posts in this thread: 11
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2089 Status: Offline Project Badges: |
Noticed a wingman with a very, very short runtime, something like only 8 minutes, and still Valid according to the stats, and when you take a closer look it appears that the wingman's task really ran for about 9¼ hours in total. Also, their BOINC version appears to be 7.17.0, while mine is 7.16.6 on Linux Ubuntu; 7.17.0 seems to be a pre-release (see here and here).
workunit 754153404: ($ wcgstats -wIQ -m "20" -a "ARP1" -s "V" -l1+"1")
ARP1_0028662_079_0-- Linux Ubuntu 727 Valid 7/22/21 21:28:43 7/22/21 21:36:42 0.13 13.1 / 259.7
Details: ($ wcgstats -wwHQ -m "20" -a "ARP1" -s "V" -l1+"1")
Project Name: Africa Rainfall Project
(Source 'wcgstats')
[Edit 1 times, last edit by adriverhoef at Jul 24, 2021 4:52:23 PM]
||
|
Acibant
Advanced Cruncher USA Joined: Apr 15, 2020 Post Count: 126 Status: Offline Project Badges: |
Technically, 7.16.6 is also a development version, as they describe it on the downloads page. They don't even list 7.17.0 for download there. It is perplexing, however, that the recommended version is from 2014. I don't see a 7.17 listed for any operating system in their download directory either, so it seems clear that your wingman compiled BOINC from the source files. Just my observations.
||
|
sam6861
Advanced Cruncher Joined: Mar 31, 2020 Post Count: 107 Status: Offline Project Badges: |
There are two ways this could happen:
1. The stderr_txt log could be fake, possibly produced by a modified app on the client computer, and the client really is crunching this fast, perhaps through multithreading or GPU processing. I would be very interested to know whether there is a GPU version of this app.
2. Possibly a server had a wrong date/time, or the server's database made an error. The sent time and return time cannot be modified by the client. The only way for the return time to be this close to the sent time is for the client to really return the result this fast, or for the server to have made some error.
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2089 Status: Offline Project Badges: |
The second one; again a wingman with BOINC version 7.17.0 that chalked up a very, very short runtime, namely under three minutes (wingman's task is labelled _1, mine is the blue coloured one, labelled _2):
workunit 746686391: ($ wcgstats -wIQ -m '20' -a 'ARP1' -s 'V' -p '1' -l1+'0')
ARP1_0017095_077_2-- Linux Ubuntu 727 Valid 7/24/21 09:25:53 7/25/21 19:52:05 11.75 702.3 / 353.4
Details: ($ wcgstats -wwHQ -m '20' -a 'ARP1' -s 'V' -p '1' -l1+'0')
Project Name: Africa Rainfall Project
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12146 Status: Offline Project Badges: |
The other consequence of this problem is that they only claimed a tiny credit which meant that your reasonable claim got halved.
Mike |
||
|
Dayle Diamond
Senior Cruncher Joined: Jan 31, 2013 Post Count: 450 Status: Offline Project Badges: |
Sam6861, there IS a GPU version of the ARP app, and WCG isn't using it.
In a prior post I found a company in Colorado that uses the same app on GPUs, with a sevenfold increase in efficiency. My observation went nowhere; the techs haven't commented on it or anything. |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2089 Status: Offline Project Badges: |
"Sam6861, there IS a GPU version of the ARP app, and WCG isn't using it. In a prior post I found a company in Colorado that uses the same app on GPUs, with a sevenfold increase in efficiency. My observation went nowhere; the techs haven't commented on it or anything."
To recall, your post - I found it - had one comment from member entity: "Excellent! Your points are well taken but it isn't going to be worth it for this particular project. Maybe a follow-on project perhaps."
Of course, it isn't really a GPU version of the ARP app run here by WCG; it's a GPU version of WRF (Weather Research and Forecasting Model), the model that ARP1 uses.
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2089 Status: Offline Project Badges: |
The third one, with a wingman (_2) who returned their result within ten minutes of receipt:
workunit 758154914: ($ wcgstats -wIQ -m '20' -a 'ARP1' -s 'V' -p '1' -l1+'0')
ARP1_0007371_078_2-- Linux Ubuntu 727 Valid 7/25/21 13:14:19 7/25/21 13:23:38 0.16 10.1 / 314.2
Details: ($ wcgstats -wwHQ -m '20' -a 'ARP1' -s 'V' -p '1' -l1+'0')
Project Name: Africa Rainfall Project
(Source 'wcgstats')
[Edit 1 times, last edit by adriverhoef at Jul 26, 2021 10:06:25 AM]
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2089 Status: Offline Project Badges: |
After investigating three of these results more closely, this is my conclusion.
First, the facts. An ARP1 workunit usually consists of two tasks (quorum 2) and they are sent to clients around the same time. Let's have a closer look at the three workunits.
(a) The following two wingmen should have received their copy around the same time, but WCG reports a time difference of nearly 9½ hours for the first workunit.
ARP1_0028662_079_0-- Linux Ubuntu 727 Valid 7/22/21 21:28:43 7/22/21 21:36:42 0.13 13.1 / 259.7
(b) The "No Reply" from wingman _1 in the second workunit was determined at 7/24/21 09:07:57 and a new copy was sent out by WCG - 18 minutes later - to wingman _2 at 7/24/21 09:25:53. The first two wingmen (_0 and _1) should have received their copy around the same time, but WCG reports a time difference of about 10 hours.
ARP1_0017095_077_2-- Linux Ubuntu 727 Valid 7/24/21 09:25:53 7/25/21 19:52:05 11.75 702.3 / 353.4
(c) The "Error" from wingman _1 in the third workunit was determined at 7/24/21 21:35:18 and a new copy was sent out by WCG, recorded - according to the stats - at 7/25/21 13:14:19. (More than fifteen hours later. That doesn't sound right, right?)
ARP1_0007371_078_2-- Linux Ubuntu 727 Valid 7/25/21 13:14:19 7/25/21 13:23:38 0.16 10.1 / 314.2
In the logs of the tasks that were recorded by the wingmen a normal pattern is seen, for BOINC client version 7.16.6 as well as for version 7.17.0, with runtimes ranging around 10 hours on the clock.
Careful conclusion: if the tasks from a workunit really get sent to clients around the same time, then it seems that the WCG server isn't reporting this in a proper way for a certain BOINC client version, namely 7.17.0.
[Edit 2 times, last edit by adriverhoef at Aug 7, 2021 12:20:31 PM]
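As a rough cross-check of the timestamps above, here is a minimal Python sketch that recomputes the turnaround (return time minus sent time) for the three quoted result lines. It assumes that the two timestamps on each line are the sent and return times and that the figure right after them is the reported runtime in hours; the function name turnaround_hours is made up for this illustration.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted result lines (month/day/year).
FMT = "%m/%d/%y %H:%M:%S"


def turnaround_hours(sent: str, returned: str) -> float:
    """Hours between the server-recorded sent time and return time."""
    delta = datetime.strptime(returned, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / 3600


# (task name, sent time, return time, reported runtime in hours),
# copied from the three result lines quoted in this thread.
results = [
    ("ARP1_0028662_079_0", "7/22/21 21:28:43", "7/22/21 21:36:42", 0.13),
    ("ARP1_0017095_077_2", "7/24/21 09:25:53", "7/25/21 19:52:05", 11.75),
    ("ARP1_0007371_078_2", "7/25/21 13:14:19", "7/25/21 13:23:38", 0.16),
]

for name, sent, returned, reported in results:
    hours = turnaround_hours(sent, returned)
    print(f"{name}: turnaround {hours:.2f} h, reported runtime {reported:.2f} h")
```

For the two short results, the recomputed turnaround rounds to the reported runtime, which would be consistent with the server limiting the runtime to the turnaround it has on record.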
||
|
sam6861
Advanced Cruncher Joined: Mar 31, 2020 Post Count: 107 Status: Offline Project Badges: |
I guess I agree that this is a server-side problem with a wrong sent time. This causes a low CPU time (the server reduced this number) and low credit. I took a look at some of the BOINC source code.
Step 1: Somehow the sent time is wrong; I'm unsure how this happened.
Step 2: Boinc/sched/sched_result.cpp, line 362: drops the elapsed time and CPU time to a tiny amount ("impossible elapsed time" and "impossible CPU time"). All this dropping of CPU time is caused by the wrong sent time. (edit: sched_result, not eched_result)
Step 3: I guess somewhere in the code, possibly in credit_test.cpp, there is claimed_credit = cpu_time / wu.rsc_fpops_est. When the cpu_time is very low, the credit also goes very low.
[Edit 1 times, last edit by sam6861 at Jul 27, 2021 1:09:51 AM]
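To illustrate Steps 2 and 3, here is a minimal Python sketch of the kind of sanity check described: reported times that exceed the turnaround known to the server get capped. This is not the BOINC code, only an illustration under assumptions. The names clamp_reported_times and ncpus are hypothetical, and the example numbers come from the first workunit in this thread (roughly 9¼ hours of real runtime, 479 seconds between the recorded sent and return times).

```python
def clamp_reported_times(elapsed_s, cpu_s, sent_ts, received_ts, ncpus=1):
    """Cap client-reported times at the server-side turnaround.

    Assumption for illustration: a result cannot have run longer than the
    time between sending and receiving it, so larger values are treated
    as "impossible" and reduced. ncpus is a hypothetical parameter that
    allows CPU time to exceed wall-clock time on multi-core hosts.
    """
    turnaround = received_ts - sent_ts
    clamped_elapsed = min(max(elapsed_s, 0.0), turnaround)
    clamped_cpu = min(cpu_s, turnaround * ncpus)
    return clamped_elapsed, clamped_cpu


# The task really ran for about 9.25 hours, but the server believes it
# was sent only 479 seconds before it came back.
real_runtime_s = 9.25 * 3600
elapsed, cpu = clamp_reported_times(real_runtime_s, real_runtime_s, 0.0, 479.0)

# If claimed credit scales linearly with CPU time (as guessed in Step 3),
# the claim shrinks by the same factor as the CPU time.
reduction = real_runtime_s / cpu
print(f"elapsed {elapsed:.0f} s, cpu {cpu:.0f} s, claim reduced {reduction:.1f}x")
```

With a wrong sent time, a perfectly normal runtime ends up looking like a few minutes, and any credit claim proportional to CPU time drops along with it.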
||
|
|