Total posts in this thread: 45
Posts: 45   Pages: 5   [ Previous Page | 1 2 3 4 5 | Next Page ]
This topic has been viewed 10665 times and has 44 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

My remaining WUs all finished with RC = 0x1 in Job #3, taking between 6.3 and 9.6 hours.
[Jul 26, 2016 6:53:59 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

One BETA arrived on the other computer; it isn't running yet, remaining time is 6 hours and 10 minutes for BETA_E236441_216_S.400.C56F1H25N2S1.UHXUUTGPCSCBNJ-UHFFFAOYSA-N.10_s1_14.
Edit:
One BETA just came in on this computer, too; not running yet, remaining time is also 6 hours and 10 minutes (this time for BETA_E236441_619_S.492.C55H21N3O3S6.ZSZKYKRHLLYRCH-UHFFFAOYSA-N.10_s1_14).

(Slowly working my way towards a bronze BETA badge ... keep 'em coming! hugs )

Edit:
Too bad, the first one wasn't very successful: it reported "process got signal 11" after 16 minutes. That's odd. I see 160 Valids for all other WCG projects on that machine at this moment and only one Error. Here's the result log for that WU:

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[07:09:02] Number of jobs = 5
[07:09:02] Starting job 0,CPU time has been restored to 0.000000.
[07:09:06] Starting new Job
[07:09:06] Qink name = fldman
[07:09:08] Qink name = gesman
[07:09:10] Qink name = scfman

</stderr_txt>
]]>

Edit: First wingman got "Killing job because cpu time limit has been exceeded."
Edit: Second wingman got "process exited with code 195 (0xc3, -61)".
Edit: Third and fourth wingman also got "Killing job because cpu time limit has been exceeded."
BETA_E236441_216_S.400.C56F1H25N2S1.UHXUUTGPCSCBNJ-UHFFFAOYSA-N.10_s1_14_4-- Linux 3.13.0-48-generic 700 Error 29/07/16 01:33:54 29/07/16 20:10:59 18.00 534.2 / 534.2
BETA_E236441_216_S.400.C56F1H25N2S1.UHXUUTGPCSCBNJ-UHFFFAOYSA-N.10_s1_14_3-- Linux 4.4.0-31-generic 700 Too Late 28/07/16 06:47:55 29/07/16 01:33:47 18.00 404.0 / 404.0
BETA_E236441_216_S.400.C56F1H25N2S1.UHXUUTGPCSCBNJ-UHFFFAOYSA-N.10_s1_14_2-- Linux 3.13.0-88-generic 700 Error 27/07/16 06:32:19 28/07/16 06:47:52 15.90 594.0 / 594.0
BETA_E236441_216_S.400.C56F1H25N2S1.UHXUUTGPCSCBNJ-UHFFFAOYSA-N.10_s1_14_1-- Linux 4.4.0-31-generic 700 Error 25/07/16 13:45:00 27/07/16 06:23:28 18.00 205.0 / 205.0
BETA_E236441_216_S.400.C56F1H25N2S1.UHXUUTGPCSCBNJ-UHFFFAOYSA-N.10_s1_14_0-- Linux 4.5.7-202.fc23.x86_64 700 Error 25/07/16 13:42:48 27/07/16 06:31:58 0.28 10.5 / 10.5

So everybody got rewarded, in spite of the "Error" message.
----------------------------------------
[Edit 9 times, last edit by adriverhoef at Jul 30, 2016 9:26:07 AM]
[Jul 26, 2016 11:41:04 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

Running at 1.8-1.9 GHz [temps are up, speed down], the log times speak for themselves:

Result Name: BETA_E236441_639_S.486.C47H15N11O4S5.SRPWYKLTNMKEDP-UHFFFAOYSA-N.11_s1_14_0--
<core_client_version>7.6.29</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[15:45:55] Number of jobs = 5
[15:45:55] Starting job 0,CPU time has been restored to 0.000000.
[07:58:26] Finished Job #0
[07:58:26] Starting job 1,CPU time has been restored to 49909.306329.
[09:31:11] Finished Job #1
[09:31:11] Starting job 2,CPU time has been restored to 55082.127888.
[10:50:09] Finished Job #2
[10:50:09] Starting job 3,CPU time has been restored to 59240.069342.
Application exited with RC = 0x1
[12:34:34] Finished Job #3
[12:34:34] Starting job 4,CPU time has been restored to 64544.056541.
[12:34:35] Skipping Job #4
12:35:10 (15224): called boinc_finish

</stderr_txt>
]]>

17:55 CPU hours at closing.
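
For anyone who wants the per-job split out of a log like this, here is a minimal Python sketch. It assumes that the "CPU time has been restored to" value is the cumulative CPU time in seconds at the start of each job, so the difference between two consecutive values approximates the CPU time of the earlier job. The function name and the sample text are only illustrative, and restarts (a repeated "Starting job 0") are not handled. On that assumption the last value in the log above, 64544 s, is roughly 17 h 55 min, which matches the figure at closing.

```python
import re

# Matches lines such as:
# [07:58:26] Starting job 1,CPU time has been restored to 49909.306329.
RESTORE_RE = re.compile(
    r"Starting job (\d+),CPU time has been restored to (\d+\.\d+)"
)

# Excerpt of the stderr text quoted in this post.
SAMPLE = """\
[15:45:55] Starting job 0,CPU time has been restored to 0.000000.
[07:58:26] Starting job 1,CPU time has been restored to 49909.306329.
[09:31:11] Starting job 2,CPU time has been restored to 55082.127888.
[10:50:09] Starting job 3,CPU time has been restored to 59240.069342.
[12:34:34] Starting job 4,CPU time has been restored to 64544.056541.
"""


def cpu_seconds_per_job(stderr_text):
    """Return {job_index: cpu_seconds} for every job followed by another job.

    Assumption: the restored value is cumulative CPU seconds at job start.
    """
    marks = [(int(m.group(1)), float(m.group(2)))
             for m in RESTORE_RE.finditer(stderr_text)]
    per_job = {}
    # Pair each job start with the next one and take the difference.
    for (job, start), (_next_job, next_start) in zip(marks, marks[1:]):
        per_job[job] = next_start - start
    return per_job


if __name__ == "__main__":
    for job, seconds in sorted(cpu_seconds_per_job(SAMPLE).items()):
        print(f"job {job}: {seconds / 3600:.2f} CPU hours")
```

The last job in a log never gets an entry, because there is no later value to subtract from.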
[Jul 26, 2016 12:07:52 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

We are releasing the rest of the workunits now.

Thanks,
armstrdj
[Jul 26, 2016 12:59:05 PM]
Gurra
Cruncher
Joined: Sep 11, 2006
Post Count: 33
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

Look through the result error log and the client event log to see which file was affected [this event already happened while the task parts were being downloaded!]. There's a thread on the matter, where the -119 MD5 error was resolved by taking the .dms files off the CDN distribution. Both -119 and -120 fall under -186 [the failed-download exit code].


The log file has registered the following events (abbreviated) on this machine:
25-Jul-2016 14:42:40 [WCG] Started download of wcgrid_beta11_qchem_prod_win32.exe.7.00
25-Jul-2016 14:42:44 [WCG] Started download of cep2_image02_7.00.tga
25-Jul-2016 14:44:33 [WCG] Temporarily failed download of cep2_image02_7.00.tga: transient HTTP error
25-Jul-2016 14:44:07 [WCG] Temporarily failed download of wcgrid_beta11_qchem_prod_win32.exe.7.00: transient HTTP error
25-Jul-2016 14:44:07 [WCG] Started download of 0bdae771d662bb34ad72c68d70bbb2d9.zip
25-Jul-2016 14:44:08 [WCG] Finished download of 0bdae771d662bb34ad72c68d70bbb2d9.zip
25-Jul-2016 14:44:42 [WCG] Started download of wcgrid_beta11_qchem_prod_win32.exe.7.00
25-Jul-2016 14:46:30 [WCG] Started download of cep2_image02_7.00.tga
25-Jul-2016 14:46:31 [WCG] Finished download of cep2_image02_7.00.tga
25-Jul-2016 14:46:47 [WCG] Finished download of wcgrid_beta11_qchem_prod_win32.exe.7.00

It looks to me as if cep2_image02_7.00.tga has downloaded correctly on the second try.
I did a binary file compare of this cep2_image02_7.00.tga with the same file on another machine that has completed 2 beta WUs successfully. The file size is the same in both cases but there are major differences in file contents.
The file cep2_image01_7.00.tga is equivalent on both machines.

Any idea what brought on this problem?
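
In case it helps others repeat the comparison, here is a rough Python sketch of the same check: an MD5 digest per file plus the offset of the first differing byte. The function names are my own, and the file paths in the usage note are placeholders; nothing here is part of BOINC or the WCG client.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b, chunk_size=1 << 16):
    """Return the byte offset where two files first differ, or None if equal."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file is a prefix
                # of the other, so they differ where the shorter one ends.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended with no difference found
            offset += len(a)
```

Usage would be something like `md5_of_file("cep2_image02_7.00.tga")` on each machine and comparing the two strings, or `first_difference(copy_a, copy_b)` when both copies are on one disk.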
----------------------------------------

[Jul 26, 2016 1:51:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

If you set additional log flags you will see where the first and second download attempts came from: possibly one from the cloud and the other directly from the grid server. My guess is more CDN pollution.
[Jul 26, 2016 1:59:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

Hehe, this is what you get when the time to completion on these beta units is way over-estimated:
26/07/2016 21:58:13 | World Community Grid | No tasks are available for the applications you have selected.
26/07/2016 21:58:13 | World Community Grid | Tasks won't finish in time: BOINC runs 100.0% of the time; computation is enabled 99.8% of that
PS Whoever has stolen the missing 0.2%, I'd like it back please.
[Jul 26, 2016 9:09:16 PM]
yoro42
Ace Cruncher
United States
Joined: Feb 19, 2011
Post Count: 8979
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

Current Status:

BETA_E236441_998_S.488.C48H14N6O6S6.SLPQWXOGQDVNQJ-UHFFFAOYSA-N.12_s1_14_0-- Ella In Progress 7/27/16 00:52:05 7/31/16 00:52:05 0.00 / 0.00 0.0 / 0.0
BETA_E236441_284_S.484.C49H14N8O3S6.OYGPFFGZYNTKSW-UHFFFAOYSA-N.17_s1_14_0-- Zoot Valid 7/26/16 20:23:17 7/27/16 00:02:26 1.77 / 1.81 68.7 / 68.7
BETA_E236441_287_S.488.C51H18N4O5S6.CMVFTFJCJRSESD-UHFFFAOYSA-N.1_s1_14_2-- Zoot Valid 7/25/16 13:47:41 7/26/16 09:40:47 9.60 / 9.81 377.0 / 379.0
BETA_E236441_646_S.486.C47H15N11O4S5.SRPWYKLTNMKEDP-UHFFFAOYSA-N.18_s1_14_1-- Miles In Progress 7/26/16 20:59:24 7/30/16 20:59:24 0.00 / 0.00 0.0 / 0.0
BETA_E236441_490_S.400.C51H25N3O4S1.MZLYLDQNOUJPBQ-UHFFFAOYSA-N.5_s1_14_1-- StanGetz In Progress 7/26/16 20:20:41 7/30/16 20:20:41 0.00 / 0.00 0.0 / 0.0
BETA_E236441_248_S.388.C38F1H13N6O2S5.UXKLNPGIPZQNCL-UHFFFAOYSA-N.4_s1_14_1-- Lester-Young In Progress 7/26/16 13:01:06 7/30/16 13:01:06 0.00 / 0.00 0.0 / 0.0

BETA_E236441_834_S.400.C50H24N4O4S1.XRMGMKMCKGIJEQ-UHFFFAOYSA-N.19_s1_14_0-- Coltrane Invalid 7/25/16 13:42:50 7/26/16 12:36:22 12.21 / 12.63 251.9 / 251.9
Result Log
Result Name: BETA_E236441_834_S.400.C50H24N4O4S1.XRMGMKMCKGIJEQ-UHFFFAOYSA-N.19_s1_14_0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[16:49:56] Number of jobs = 5
[16:49:56] Starting job 0,CPU time has been restored to 0.000000.
[02:57:16] Finished Job #0
[02:57:16] Starting job 1,CPU time has been restored to 34732.249841.
[03:51:09] Finished Job #1
[03:51:09] Starting job 2,CPU time has been restored to 37892.876902.
[04:26:58] Finished Job #2
[04:26:58] Starting job 3,CPU time has been restored to 39991.589555.
Application exited with RC = 0x1
[05:34:45] Finished Job #3
[05:34:45] Starting job 4,CPU time has been restored to 43974.295085.
[05:34:45] Skipping Job #4
05:34:49 (7180): called boinc_finish
</stderr_txt>
]]>
----------------------------------------

[Jul 27, 2016 6:28:16 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

A few repair jobs have appeared already.

With this one, _0 exited with RC = 0x1 in Job #0, _1 exited with RC = 0x1 in Job #3. So it seems that the verifier still doesn't accept both as being valid. I suspect that one of them will turn Invalid.
BETA_E236441_317_S.488.C50H18N6O4S6.HLJAKKOLHWECLK-UHFFFAOYSA-N.12_s1_14_2-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) - In Progress 27/07/16 05:43:08 31/07/16 05:43:08 0.00 0.0 / 0.0
BETA_E236441_317_S.488.C50H18N6O4S6.HLJAKKOLHWECLK-UHFFFAOYSA-N.12_s1_14_1-- Microsoft Windows 10 x64 Edition, (10.00.10586.00) 700 Pending Verification 26/07/16 21:02:59 27/07/16 05:42:56 7.89 264.9 / 0.0
BETA_E236441_317_S.488.C50H18N6O4S6.HLJAKKOLHWECLK-UHFFFAOYSA-N.12_s1_14_0-- Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) 700 Pending Verification 26/07/16 21:02:21 27/07/16 03:32:37 1.57 44.7 / 0.0

Then here's a -119 error:
BETA_E236441_40_S.400.C56F1H25N2S1.LOUQHRXRTVLJRV-UHFFFAOYSA-N.2_s1_14_1--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>beta11.CleanEnergyProjectLogo_2.tga</file_name>
<error_code>-119 (md5 checksum failed for file)</error_code>
<error_message>MD5 check failed</error_message>
</file_xfer_error>

and a -120 error:
BETA_E236441_854_S.396.C44F2H20N2S5.WYRHUGMFRDHYMU-UHFFFAOYSA-N.1_s1_14_1--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta11_qchem_prod_win32.exe.7.00</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>cep2_image02_7.00.tga</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
</file_xfer_error>
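
If you have a pile of these result logs to sift through, a small Python sketch like the one below can pull out which file failed with which code. It assumes the `<file_name>` / `<error_code>` layout shown in the two excerpts above; the function name is made up for illustration. A regular expression is used instead of an XML parser because the pasted logs are cut off before their closing tags.

```python
import re

# One <file_name> followed by its <error_code>, as in the excerpts above.
XFER_RE = re.compile(
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+) \((?P<reason>[^)]*)\)</error_code>"
)

# Excerpt of the -120 log quoted in this post.
SAMPLE = """\
<file_xfer_error>
<file_name>wcgrid_beta11_qchem_prod_win32.exe.7.00</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>cep2_image02_7.00.tga</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
</file_xfer_error>
"""


def failed_downloads(log_text):
    """Return a list of (file_name, error_code, reason) tuples."""
    return [
        (m.group("name"), int(m.group("code")), m.group("reason"))
        for m in XFER_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    for name, code, reason in failed_downloads(SAMPLE):
        print(f"{code}: {name} ({reason})")
```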
[Jul 27, 2016 7:57:07 AM]
TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 286
Status: Offline
Re: CEP2 beta test July 25, 2016 [Issues Thread]

It would have been good to take the opportunity in this Beta to base the CPU time limit on the CPU capability: 18 hrs is not enough for a slower CPU...

Result Name: BETA_E236441_658_S.486.C47H15N11O4S5.PWDYCDIXXPPBQP-UHFFFAOYSA-N.11_s1_14_0--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[23:43:46] Number of jobs = 5
[23:43:46] Starting job 0,CPU time has been restored to 0.000000.
[23:43:50] Starting new Job
[23:43:50] Qink name = fldman
[23:44:31] Qink name = gesman
[23:45:20] Qink name = scfman
Killing job because cpu time limit has been exceeded. 0.000000||64800.452833||0.000000
[18:19:41] Finished Job #0
18:19:47 (27630): called boinc_finish

</stderr_txt>
]]>

and two more heading for the same fate...

I know it's to catch a divergent case - but surely the time limit should not be fixed but a function of the CPU speed... one size does not fit all :-)
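
To make the idea concrete, here is a purely hypothetical Python sketch of a limit that scales with host speed. It is not how the CEP2 application or BOINC works; the reference speed, the unit of "speed" and the function name are all assumptions for illustration. The only number taken from the logs is the fixed limit of 64800 seconds (18 hours).

```python
# 18 hours, the limit reported in the "Killing job" log lines.
FIXED_LIMIT_SECONDS = 64800.0


def scaled_cpu_limit(host_speed, reference_speed,
                     base_limit=FIXED_LIMIT_SECONDS):
    """Scale the CPU time limit inversely with host speed (hypothetical).

    host_speed and reference_speed can be any benchmark figure, as long as
    both use the same unit. A host at half the reference speed gets twice
    the base limit; a host at the reference speed gets the base limit.
    """
    if host_speed <= 0 or reference_speed <= 0:
        raise ValueError("speeds must be positive")
    return base_limit * reference_speed / host_speed


if __name__ == "__main__":
    # Hypothetical figures: reference host 4.0, slow host 2.0 (same unit).
    print(scaled_cpu_limit(host_speed=2.0, reference_speed=4.0) / 3600, "hours")
```

A real implementation would probably also cap the result, so that a divergent case still gets caught on very slow hosts.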
EDIT:
Sure enough - another 18 hour limit hit...
Result Name: BETA_E236441_258_S.388.C38F1H13N6O2S5.UXKLNPGIPZQNCL-UHFFFAOYSA-N.14_s1_14_1--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[23:03:49] Number of jobs = 5
[23:03:49] Starting job 0,CPU time has been restored to 0.000000.
[23:03:52] Starting new Job
[23:03:53] Qink name = fldman
[23:04:13] Qink name = gesman
[23:04:35] Qink name = scfman
[13:37:50] Number of jobs = 5
[13:37:50] Starting job 0,CPU time has been restored to 0.000000.
[13:37:56] Starting new Job
[13:37:57] Qink name = fldman
[13:38:18] Qink name = gesman
[13:38:40] Qink name = scfman
[04:20:38] Qink name = anlman
[04:20:39] Qink name = drvman
[04:40:09] Qink name = optman
[04:40:14] Qink name = fldman
[04:40:14] Qink name = gesman
[04:40:35] Qink name = scfman
[06:55:45] Qink name = anlman
[06:55:46] Qink name = drvman
[07:13:57] Qink name = optman
[07:13:59] Qink name = fldman
[07:13:59] Qink name = gesman
[07:14:20] Qink name = scfman
Killing job because cpu time limit has been exceeded. 0.000000||64800.022898||0.000000
[12:54:00] Finished Job #0
12:54:03 (4454): called boinc_finish

</stderr_txt>
]]>

EDIT2:
and the next...
Result Name: BETA_E236441_1_S.482.C50H14N8O2S6.NZGFEWVPGNNTIL-UHFFFAOYSA-N.12_s1_14_1--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[06:23:52] Number of jobs = 5
[06:23:52] Starting job 0,CPU time has been restored to 0.000000.
[06:23:55] Starting new Job
[06:23:56] Qink name = fldman
[06:24:32] Qink name = gesman
[06:25:16] Qink name = scfman
[13:37:50] Number of jobs = 5
[13:37:50] Starting job 0,CPU time has been restored to 0.000000.
[13:37:56] Starting new Job
[13:37:57] Qink name = fldman
[13:38:33] Qink name = gesman
[13:39:17] Qink name = scfman
[03:01:25] Qink name = anlman
[03:01:26] Qink name = drvman
[03:23:59] Qink name = optman
[03:24:07] Qink name = fldman
[03:24:07] Qink name = gesman
[03:24:43] Qink name = scfman
[06:14:02] Qink name = anlman
[06:14:03] Qink name = drvman
[06:36:25] Qink name = optman
[06:36:31] Qink name = fldman
[06:36:31] Qink name = gesman
[06:37:11] Qink name = scfman
Killing job because cpu time limit has been exceeded. 0.000000||64800.078890||0.000000
[12:56:10] Finished Job #0
12:56:15 (4456): called boinc_finish

</stderr_txt>
]]>
----------------------------------------
[Edit 2 times, last edit by TonyEllis at Jul 28, 2016 3:45:57 AM]
[Jul 27, 2016 8:11:49 AM]