adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2089
Ridiculously short runtime, ridiculously low credit

I noticed a wingman with a very short runtime of only about 8 minutes whose result was still marked Valid in the stats. A closer look at the log shows that the wingman's task really ran for about 9¼ hours in total. Their BOINC version also appears to be 7.17.0, while mine is 7.16.6 on Linux Ubuntu; 7.17.0 seems to be a pre-release (see here and here).

workunit 754153404: ($ wcgstats -wIQ -m "20" -a "ARP1" -s "V" -l1+"1")
ARP1_0028662_079_0--   Linux Ubuntu   727   Valid   7/22/21 21:28:43   7/22/21 21:36:42    0.13    13.1 / 259.7
ARP1_0028662_079_1--   Linux Ubuntu   727   Valid   7/22/21 12:03:33   7/23/21 22:43:25   11.54   506.4 / 259.7
----------------------------------------------------------------------------------------------------------------------------------
Details: ($ wcgstats -wwHQ -m "20" -a "ARP1" -s "V" -l1+"1")
Project Name: Africa Rainfall Project
Created: 07/20/2021 20:25:34
Name: ARP1_0028662_079
Minimum Quorum: 2
Replication: 2
ARP1_0028662_079_0-- Linux Ubuntu 727 Valid 7/22/21 21:28:43 7/22/21 21:36:42 0.13 13.1 / 259.7
<core_client_version>7.17.0</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[08:23:38] INFO: Checkpoint taken at 2018-12-06_06:00:00
[09:36:01] INFO: Checkpoint taken at 2018-12-06_12:00:00
[10:49:19] INFO: Checkpoint taken at 2018-12-06_18:00:00
[11:37:06] INFO: Checkpoint taken at 2018-12-07_00:00:00
[12:59:14] INFO: Checkpoint taken at 2018-12-07_06:00:00
[14:05:40] INFO: Checkpoint taken at 2018-12-07_12:00:00
[15:13:33] INFO: Checkpoint taken at 2018-12-07_18:00:00
[16:22:13] INFO: Checkpoint taken at 2018-12-08_00:00:00
INFO: Simulation complete compressing output.
16:23:38 (2299655): called boinc_finish(0)

</stderr_txt>
]]>
ARP1_0028662_079_1-- Linux Ubuntu 727 Valid 7/22/21 12:03:33 7/23/21 22:43:25 11.54 506.4 / 259.7
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[14:31:02] INFO: Checkpoint taken at 2018-12-06_06:00:00
[16:13:28] INFO: Checkpoint taken at 2018-12-06_12:00:00
[17:52:16] INFO: Checkpoint taken at 2018-12-06_18:00:00
[19:03:25] INFO: Checkpoint taken at 2018-12-07_00:00:00
[20:19:18] INFO: Checkpoint taken at 2018-12-07_06:00:00
[21:54:25] INFO: Checkpoint taken at 2018-12-07_12:00:00
[23:31:12] INFO: Checkpoint taken at 2018-12-07_18:00:00
[00:40:19] INFO: Checkpoint taken at 2018-12-08_00:00:00
INFO: Simulation complete compressing output.
00:42:02 (88975): called boinc_finish(0)

</stderr_txt>
]]>
(Source 'wcgstats')
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jul 24, 2021 4:52:23 PM]
[Jul 24, 2021 11:03:46 AM]
Acibant
Advanced Cruncher
USA
Joined: Apr 15, 2020
Post Count: 126
Re: Ridiculously short runtime, ridiculously low credit

Technically, 7.16.6 is also a development version, as they describe it on the downloads page, and 7.17.0 isn't even listed for download there. It is perplexing, however, that the recommended version is from 2014. I don't see a 7.17 listed for any operating system in their download directory either, so it looks like your wingman compiled BOINC from the source files. Just my observations.
----------------------------------------

[Jul 24, 2021 1:23:38 PM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Re: Ridiculously short runtime, ridiculously low credit

There are two ways this could happen:

1. The stderr_txt log could be fake, produced by a modified app on the client computer that really is crunching this fast, possibly through multithreading or GPU processing. I would be very interested to know whether there is a GPU version of this app.

2. Possibly the server had a wrong date/time, or the server's database made an error. Sent time and return time cannot be modified by the client. The only way for the return time to follow the sent time this quickly is for the client to really return this fast, or for the server to have made some error.
[Jul 24, 2021 9:47:34 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2089
Re: Ridiculously short runtime, ridiculously low credit

The second one: again a wingman with BOINC version 7.17.0 that chalked up a very short runtime, namely under three minutes (the wingman's task is labelled _0, mine is labelled _2):
workunit 746686391: ($ wcgstats -wIQ -m '20' -a 'ARP1' -s 'V' -p '1' -l1+'0')
ARP1_0017095_077_2--   Linux Ubuntu   727   Valid      7/24/21 09:25:53   7/25/21 19:52:05   11.75   702.3 / 353.4
ARP1_0017095_077_0--   Linux Ubuntu   727   Valid      7/16/21 19:09:22   7/16/21 19:12:19    0.05     4.5 / 353.4
ARP1_0017095_077_1--   Linux Ubuntu   -     No Reply   7/16/21 09:07:57   7/24/21 09:07:57    0.00     0.0 / 0.0
----------------------------------------------------------------------------------------------------------------------------------
Details: ($ wcgstats -wwHQ -m '20' -a 'ARP1' -s 'V' -p '1' -l1+'0')
Project Name: Africa Rainfall Project
Created: 07/13/2021 04:25:43
Name: ARP1_0017095_077
Minimum Quorum: 2
Replication: 2
ARP1_0017095_077_2-- Linux Ubuntu 727 Valid 7/24/21 09:25:53 7/25/21 19:52:05 11.75 702.3 / 353.4
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[11:19:35] INFO: Checkpoint taken at 2018-12-02_06:00:00
[13:01:42] INFO: Checkpoint taken at 2018-12-02_12:00:00
[14:40:40] INFO: Checkpoint taken at 2018-12-02_18:00:00
[16:00:10] INFO: Checkpoint taken at 2018-12-03_00:00:00
[17:18:49] INFO: Checkpoint taken at 2018-12-03_06:00:00
[18:59:28] INFO: Checkpoint taken at 2018-12-03_12:00:00
[20:33:21] INFO: Checkpoint taken at 2018-12-03_18:00:00
[21:48:44] INFO: Checkpoint taken at 2018-12-04_00:00:00
INFO: Simulation complete compressing output.
21:50:30 (103525): called boinc_finish(0)

</stderr_txt>
]]>
ARP1_0017095_077_0-- Linux Ubuntu 727 Valid 7/16/21 19:09:22 7/16/21 19:12:19 0.05 4.5 / 353.4
<core_client_version>7.17.0</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[05:34:17] INFO: Checkpoint taken at 2018-12-02_06:00:00
[06:45:05] INFO: Checkpoint taken at 2018-12-02_12:00:00
[08:26:57] INFO: Checkpoint taken at 2018-12-02_18:00:00
[09:23:09] INFO: Checkpoint taken at 2018-12-03_00:00:00
[10:18:16] INFO: Checkpoint taken at 2018-12-03_06:00:00
[11:27:44] INFO: Checkpoint taken at 2018-12-03_12:00:00
[13:04:42] INFO: Checkpoint taken at 2018-12-03_18:00:00
[13:56:54] INFO: Checkpoint taken at 2018-12-04_00:00:00
INFO: Simulation complete compressing output.
13:58:18 (2199218): called boinc_finish(0)

</stderr_txt>
]]>
ARP1_0017095_077_1-- Linux Ubuntu - No Reply 7/16/21 09:07:57 7/24/21 09:07:57 0.00 0.0 / 0.0

[Jul 25, 2021 8:18:18 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12146
Re: Ridiculously short runtime, ridiculously low credit

The other consequence of this problem is that they claimed only a tiny credit, which meant that your reasonable claim got roughly halved.

Mike
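The claimed / granted figures posted in this thread are consistent with the granted credit being the plain average of the two valid claims. The short sketch below checks that against the three workunits; the averaging rule is an inference from these numbers, not something confirmed by WCG, and the names PAIRS and mean_claim are made up for the illustration.

```python
# Claimed credit of the short-runtime result, claimed credit of the normal
# result, and the granted credit shown for both (figures from this thread).
PAIRS = [
    (13.1, 506.4, 259.7),   # ARP1_0028662_079
    (4.5, 702.3, 353.4),    # ARP1_0017095_077
    (10.1, 618.4, 314.2),   # ARP1_0007371_078
]

def mean_claim(low: float, high: float) -> float:
    """Average of the two claims (assumed granting rule, see lead-in)."""
    return (low + high) / 2

for low, high, granted in PAIRS:
    print(f"mean of claims {mean_claim(low, high):.2f}, granted {granted}")
```

In all three cases the average lands within 0.05 of the granted figure, which is why a normal claim ends up at about half its value when the wingman's claim is close to zero.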
[Jul 25, 2021 8:52:09 PM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 450
Re: Ridiculously short runtime, ridiculously low credit

Sam6861, there IS a GPU version of the ARP app, and WCG isn't using it.
In a prior post I found a company in Colorado that uses the same app on GPUs, with a sevenfold increase in efficiency.

My observation went nowhere; the techs haven't commented on it or anything.
[Jul 26, 2021 7:20:23 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2089
Re: Ridiculously short runtime, ridiculously low credit

Dayle Diamond wrote:
"Sam6861, there IS a GPU version of the ARP app, and WCG isn't using it.
In a prior post I found a company in Colorado that uses the same app on GPUs, with a sevenfold increase in efficiency.

My observation went nowhere; the techs haven't commented on it or anything."

To recall, your post (I found it) had one comment, from member entity: "Excellent! Your points are well taken but it isn't going to be worth it for this particular project. Maybe a follow-on project perhaps."

Of course, it isn't really a GPU version of the ARP app that WCG runs here; it's a GPU version of WRF (Weather Research and Forecasting Model), the model that ARP1 uses.
[Jul 26, 2021 9:25:38 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2089
Re: Ridiculously short runtime, ridiculously low credit

The third one: a wingman (_2) who returned their result within ten minutes of receipt:
workunit 758154914: ($ wcgstats -wIQ -m '20' -a 'ARP1' -s 'V' -p '1' -l1+'0')
ARP1_0007371_078_2--   Linux Ubuntu   727   Valid   7/25/21 13:14:19   7/25/21 13:23:38    0.16    10.1 / 314.2
ARP1_0007371_078_0--   Linux Ubuntu   727   Valid   7/24/21 20:30:59   7/26/21 09:36:16   13.65   618.4 / 314.2
ARP1_0007371_078_1--   FreeBSD        727   Error   7/24/21 20:10:51   7/24/21 21:35:18    0.00   486.7 / 0.0
---------------------------------------------------------------------------------------------------------------------------------------
Details: ($ wcgstats -wwHQ -m '20' -a 'ARP1' -s 'V' -p '1' -l1+'0')
Project Name: Africa Rainfall Project
Created: 07/24/2021 19:25:33
Name: ARP1_0007371_078
Minimum Quorum: 2
Replication: 2
ARP1_0007371_078_2-- Linux Ubuntu 727 Valid 7/25/21 13:14:19 7/25/21 13:23:38 0.16 10.1 / 314.2
<core_client_version>7.17.0</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[21:41:44] INFO: Checkpoint taken at 2018-12-04_06:00:00
[23:24:03] INFO: Checkpoint taken at 2018-12-04_12:00:00
[00:56:55] INFO: Checkpoint taken at 2018-12-04_18:00:00
[02:06:49] INFO: Checkpoint taken at 2018-12-05_00:00:00
[03:31:34] INFO: Checkpoint taken at 2018-12-05_06:00:00
[05:19:13] INFO: Checkpoint taken at 2018-12-05_12:00:00
[06:54:08] INFO: Checkpoint taken at 2018-12-05_18:00:00
[08:07:40] INFO: Checkpoint taken at 2018-12-06_00:00:00
INFO: Simulation complete compressing output.
08:09:11 (174691): called boinc_finish(0)

</stderr_txt>
]]>
ARP1_0007371_078_0-- Linux Ubuntu 727 Valid 7/24/21 20:30:59 7/26/21 09:36:16 13.65 618.4 / 314.2
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[23:25:42] INFO: Checkpoint taken at 2018-12-04_06:00:00
[01:22:19] INFO: Checkpoint taken at 2018-12-04_12:00:00
[03:04:40] INFO: Checkpoint taken at 2018-12-04_18:00:00
[04:26:55] INFO: Checkpoint taken at 2018-12-05_00:00:00
[06:04:18] INFO: Checkpoint taken at 2018-12-05_06:00:00
[08:11:35] INFO: Checkpoint taken at 2018-12-05_12:00:00
[10:04:49] INFO: Checkpoint taken at 2018-12-05_18:00:00
[11:33:05] INFO: Checkpoint taken at 2018-12-06_00:00:00
INFO: Simulation complete compressing output.
11:34:51 (106754): called boinc_finish(0)

</stderr_txt>
]]>
ARP1_0007371_078_1-- FreeBSD 727 Error 7/24/21 20:10:51 7/24/21 21:35:18 0.00 486.7 / 0.0
<core_client_version>7.8.6</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>

</stderr_txt>
]]>

(Source 'wcgstats')
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jul 26, 2021 10:06:25 AM]
[Jul 26, 2021 10:05:44 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2089
Re: Ridiculously short runtime, ridiculously low credit

After investigating three of these results more deeply, this is my conclusion.

First, the facts.
An ARP1 workunit usually consists of two tasks (quorum 2) and they are sent to clients around the same time.

Let's have a closer look at the three workunits.

(a)
The following two wingmen should have received their copy around the same time,
but WCG reports a time difference of nearly 9½ hours for the first workunit.
ARP1_0028662_079_0--   Linux Ubuntu   727   Valid   7/22/21 21:28:43   7/22/21 21:36:42    0.13    13.1 / 259.7
ARP1_0028662_079_1--   Linux Ubuntu   727   Valid   7/22/21 12:03:33   7/23/21 22:43:25   11.54   506.4 / 259.7


(b)
The "No Reply" from wingman _1 in the second workunit was determined at 7/24/21 09:07:57
and a new copy was sent out by WCG - 18 minutes later - to wingman _2 at 7/24/21 09:25:53.
The first two wingmen (_0 and _1) should have received their copy around the same time,
but WCG reports a time difference of about 10 hours.
ARP1_0017095_077_2--   Linux Ubuntu   727   Valid      7/24/21 09:25:53   7/25/21 19:52:05   11.75   702.3 / 353.4
ARP1_0017095_077_0--   Linux Ubuntu   727   Valid      7/16/21 19:09:22   7/16/21 19:12:19    0.05     4.5 / 353.4
ARP1_0017095_077_1--   Linux Ubuntu   -     No Reply   7/16/21 09:07:57   7/24/21 09:07:57    0.00     0.0 / 0.0


(c)
The "Error" from wingman _1 in the third workunit was determined at 7/24/21 21:35:18
and a new copy was sent out by WCG, recorded - according to the stats - at 7/25/21 13:14:19.
(More than fifteen hours later. That doesn't sound right, right?)
ARP1_0007371_078_2--   Linux Ubuntu   727   Valid   7/25/21 13:14:19   7/25/21 13:23:38    0.16    10.1 / 314.2
ARP1_0007371_078_0--   Linux Ubuntu   727   Valid   7/24/21 20:30:59   7/26/21 09:36:16   13.65   618.4 / 314.2
ARP1_0007371_078_1--   FreeBSD        727   Error   7/24/21 20:10:51   7/24/21 21:35:18    0.00   486.7 / 0.0
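
The intervals quoted in (a), (b) and (c) can be rechecked with a few lines of Python. This is only a convenience sketch: the helper name gap and the format string FMT are made up here, and the timestamps are copied from the result rows above.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M:%S"  # timestamp layout used in the result rows

def gap(earlier: str, later: str) -> timedelta:
    """Return the interval between two timestamps from the result rows."""
    return datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)

# (a) recorded sent times of _1 and _0
print("(a)", gap("7/22/21 12:03:33", "7/22/21 21:28:43"))
# (b) "No Reply" deadline of _1 until the resend as _2
print("(b) resend", gap("7/24/21 09:07:57", "7/24/21 09:25:53"))
# (b) recorded sent times of _1 and _0
print("(b) _1 vs _0", gap("7/16/21 09:07:57", "7/16/21 19:09:22"))
# (c) error return of _1 until the recorded sent time of _2
print("(c)", gap("7/24/21 21:35:18", "7/25/21 13:14:19"))
```

This gives 9:25:10 for (a), 0:17:56 and 10:01:25 for (b), and 15:39:01 for (c), in line with the figures mentioned in the text.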


The logs recorded by the wingmen's tasks show a normal pattern,
for BOINC client version 7.16.6 as well as for version 7.17.0, with runtimes of around 10 hours on the clock.
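
As a rough check on those clock times, the stderr log of a task can be measured from its first checkpoint line to the boinc_finish line. The sketch below does that for the 7.17.0 task of the third workunit. It is an illustration only: log_span and SAMPLE are names made up here, the log does not record the start time (so the result is a lower bound on the real runtime), and it assumes that consecutive log lines are less than 24 hours apart.

```python
import re
from datetime import datetime, timedelta

# Timestamped lines copied from the stderr of ARP1_0007371_078_2 (client 7.17.0).
SAMPLE = """\
[21:41:44] INFO: Checkpoint taken at 2018-12-04_06:00:00
[23:24:03] INFO: Checkpoint taken at 2018-12-04_12:00:00
[00:56:55] INFO: Checkpoint taken at 2018-12-04_18:00:00
[02:06:49] INFO: Checkpoint taken at 2018-12-05_00:00:00
[03:31:34] INFO: Checkpoint taken at 2018-12-05_06:00:00
[05:19:13] INFO: Checkpoint taken at 2018-12-05_12:00:00
[06:54:08] INFO: Checkpoint taken at 2018-12-05_18:00:00
[08:07:40] INFO: Checkpoint taken at 2018-12-06_00:00:00
08:09:11 (174691): called boinc_finish(0)
"""

def log_span(stderr_text: str) -> timedelta:
    """Wall-clock time from the first timestamped line to the last one."""
    stamps = re.findall(r"^\[?(\d{2}:\d{2}:\d{2})\]?", stderr_text, flags=re.MULTILINE)
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    total = timedelta()
    for earlier, later in zip(times, times[1:]):
        step = later - earlier
        if step < timedelta(0):      # the clock passed midnight between lines
            step += timedelta(days=1)
        total += step
    return total

print(log_span(SAMPLE))
```

For this log the span is 10:27:27, even though the stats page shows a runtime of 0.16 hours for the same result.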

Cautious conclusion: if the tasks of a workunit really are sent to clients around the same time,
then it seems that the WCG server isn't recording the sent time properly for one particular BOINC client version, namely 7.17.0.
----------------------------------------
[Edit 2 times, last edit by adriverhoef at Aug 7, 2021 12:20:31 PM]
[Jul 26, 2021 11:51:54 AM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Re: Ridiculously short runtime, ridiculously low credit

I agree that this looks like a server-side problem with a wrong sent time. That causes the low CPU time (the server reduced this number) and the low credit. I took a look at some of the BOINC source code.

Step 1: Somehow a wrong sent time gets recorded; I'm unsure how this happens.

Step 2: Boinc/sched/sched_result.cpp, line 362: drops the elapsed time and CPU time to a tiny amount ("impossible elapsed time" and "impossible CPU time"). All of this reduction is caused by the wrong sent time. (edit: sched_result, not eched_result)

Step 3: I guess that somewhere in the code, possibly in credit_test.cpp, there is something like claimed_credit = cpu_time / wu.rsc_fpops_est. When cpu_time is very low, the credit also goes very low.
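
The numbers in this thread fit Step 2 quite well: for each of the three short results, the runtime shown in the stats equals the interval between the recorded sent time and the return time. The sketch below illustrates that. It is not the BOINC source; turnaround_hours and clamp_elapsed are hypothetical names, the clamping rule is a simplified reading of the step above, and the 10.0 hours passed to clamp_elapsed is an assumed client-reported value, not a figure from the thread.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M:%S"  # timestamp layout used in the result rows

def turnaround_hours(sent: str, returned: str) -> float:
    """Hours between the recorded sent time and the return time."""
    delta = datetime.strptime(returned, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / 3600

def clamp_elapsed(reported_hours: float, sent: str, returned: str) -> float:
    """Hypothetical rule: a runtime cannot exceed the turnaround time."""
    return min(reported_hours, turnaround_hours(sent, returned))

# Result name, sent time, return time, runtime in hours as shown in the stats.
SHORT_RESULTS = [
    ("ARP1_0028662_079_0", "7/22/21 21:28:43", "7/22/21 21:36:42", 0.13),
    ("ARP1_0017095_077_0", "7/16/21 19:09:22", "7/16/21 19:12:19", 0.05),
    ("ARP1_0007371_078_2", "7/25/21 13:14:19", "7/25/21 13:23:38", 0.16),
]

for name, sent, returned, shown in SHORT_RESULTS:
    print(name, round(turnaround_hours(sent, returned), 2), "shown:", shown)

# Assumed example: a client reporting 10 hours would be cut to the turnaround.
print(clamp_elapsed(10.0, "7/22/21 21:28:43", "7/22/21 21:36:42"))
```

Rounded to two decimals, the three turnaround times come out at 0.13, 0.05 and 0.16 hours, the same as the runtimes in the stats, which is what one would expect if the server capped the reported time at the turnaround.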
----------------------------------------
[Edit 1 times, last edit by sam6861 at Jul 27, 2021 1:09:51 AM]
[Jul 27, 2021 12:02:55 AM]