| World Community Grid Forums
Thread Status: Active Total posts in this thread: 45
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
My remaining WUs all finished with RC = 0x1 in Job #3, taking between 6.3 and 9.6 hours.
|
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
One BETA arrived on the other computer; it isn't running yet, remaining time is 6 hours and 10 minutes for BETA_E236441_216_S.400.C56F1H25N2S1.UHXUUTGPCSCBNJ-UHFFFAOYSA-N.10_s1_14.
Edit: One BETA just came in on this computer, too; not running yet, remaining time is also 6 hours and 10 minutes (this time for BETA_E236441_619_S.492.C55H21N3O3S6.ZSZKYKRHLLYRCH-UHFFFAOYSA-N.10_s1_14). (Slowly working my way towards a bronze BETA badge ... keep 'em coming!)
Edit: Too bad, the first one wasn't very successful, uttering "process got signal 11" after 16 minutes. That's odd. I see 160 Valids for all other WCG-projects on that machine at this moment and only one Error. Here's the Event Log for that WU:
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[07:09:02] Number of jobs = 5
[07:09:02] Starting job 0,CPU time has been restored to 0.000000.
[07:09:06] Starting new Job
[07:09:06] Qink name = fldman
[07:09:08] Qink name = gesman
[07:09:10] Qink name = scfman
</stderr_txt>
]]>
Edit: First wingman got "Killing job because cpu time limit has been exceeded."
Edit: Second wingman got "process exited with code 195 (0xc3, -61)".
Edit: Third and fourth wingman also got "Killing job because cpu time limit has been exceeded."
BETA_E236441_216_S.400.C56F1H25N2S1.UHXUUTGPCSCBNJ-UHFFFAOYSA-N.10_s1_14_4-- Linux 3.13.0-48-generic
So everybody got rewarded, in spite of the "Error" message.
----------------------------------------
[Edit 9 times, last edit by adriverhoef at Jul 30, 2016 9:26:07 AM] |
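The three numbers in "code 195 (0xc3, -61)" appear to be one exit status shown three ways: decimal, hexadecimal, and reinterpreted as a signed 8-bit value. A minimal Python sketch of that arithmetic follows; the function name `exit_code_views` is made up for illustration and is not part of BOINC.

```python
from typing import Tuple


def exit_code_views(code: int) -> Tuple[int, str, int]:
    """Return (unsigned, hex string, signed 8-bit) views of an exit status."""
    unsigned = code & 0xFF  # exit statuses are treated here as a single byte
    # Values of 128 and above wrap around to negative numbers in 8 bits.
    signed = unsigned - 256 if unsigned >= 128 else unsigned
    return unsigned, hex(unsigned), signed


print(exit_code_views(195))  # (195, '0xc3', -61)
```

This only explains how the three figures relate to each other; it says nothing about why the application exited with that status.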
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Running at 1.8-1.9 GHz [temps are up, speed down], the log times speak for themselves:
Result Name: BETA_E236441_639_S.486.C47H15N11O4S5.SRPWYKLTNMKEDP-UHFFFAOYSA-N.11_s1_14_0--
<core_client_version>7.6.29</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[15:45:55] Number of jobs = 5
[15:45:55] Starting job 0,CPU time has been restored to 0.000000.
[07:58:26] Finished Job #0
[07:58:26] Starting job 1,CPU time has been restored to 49909.306329.
[09:31:11] Finished Job #1
[09:31:11] Starting job 2,CPU time has been restored to 55082.127888.
[10:50:09] Finished Job #2
[10:50:09] Starting job 3,CPU time has been restored to 59240.069342.
Application exited with RC = 0x1
[12:34:34] Finished Job #3
[12:34:34] Starting job 4,CPU time has been restored to 64544.056541.
[12:34:35] Skipping Job #4
12:35:10 (15224): called boinc_finish
</stderr_txt>
]]>
17:55 CPU hours at closing. |
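The "CPU time has been restored to" values in the log above are cumulative, so the CPU time each job used is the difference between consecutive values. A rough Python sketch of that subtraction, using the lines from this log; the regular expression and the helper name `cpu_seconds_per_job` are assumptions for illustration, not anything the application provides.

```python
import re

# Lines copied from the stderr log above.
LOG = """\
[15:45:55] Starting job 0,CPU time has been restored to 0.000000.
[07:58:26] Starting job 1,CPU time has been restored to 49909.306329.
[09:31:11] Starting job 2,CPU time has been restored to 55082.127888.
[10:50:09] Starting job 3,CPU time has been restored to 59240.069342.
[12:34:34] Starting job 4,CPU time has been restored to 64544.056541.
"""

# Capture the job number and the cumulative CPU seconds; the sentence-ending
# period after the number is deliberately left out of the capture group.
RESTORE = re.compile(r"Starting job (\d+),CPU time has been restored to (\d+\.\d+)")


def cpu_seconds_per_job(log: str) -> dict:
    """Map job number -> CPU seconds spent in that job (difference of marks)."""
    marks = [(int(job), float(secs)) for job, secs in RESTORE.findall(log)]
    return {
        job: round(next_secs - secs, 6)
        for (job, secs), (_, next_secs) in zip(marks, marks[1:])
    }


for job, secs in cpu_seconds_per_job(LOG).items():
    print(f"job {job}: {secs / 3600:.2f} h")
```

On these figures Job #0 accounts for most of the run, and the final mark of 64544 seconds is where the roughly 17:55 CPU hours comes from.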
||
|
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline Project Badges:
|
We are releasing the rest of the workunits now.
Thanks, armstrdj |
||
|
|
Gurra
Cruncher Joined: Sep 11, 2006 Post Count: 33 Status: Offline Project Badges:
|
Look through the result error log and the client event log as to what file had this [event already happened during task parts download!]. There's a thread on the matter, where the -119 MD5 was resolved by taking the .dms files off the CDN distribution. Both -119 and -120 fall under the -186 [failed download exit code].

The log file has registered the following events (abbreviated) on this machine:
25-Jul-2016 14:42:40 [WCG] Started download of wcgrid_beta11_qchem_prod_win32.exe.7.00
25-Jul-2016 14:42:44 [WCG] Started download of cep2_image02_7.00.tga
25-Jul-2016 14:44:33 [WCG] Temporarily failed download of cep2_image02_7.00.tga: transient HTTP error
25-Jul-2016 14:44:07 [WCG] Temporarily failed download of wcgrid_beta11_qchem_prod_win32.exe.7.00: transient HTTP error
25-Jul-2016 14:44:07 [WCG] Started download of 0bdae771d662bb34ad72c68d70bbb2d9.zip
25-Jul-2016 14:44:08 [WCG] Finished download of 0bdae771d662bb34ad72c68d70bbb2d9.zip
25-Jul-2016 14:44:42 [WCG] Started download of wcgrid_beta11_qchem_prod_win32.exe.7.00
25-Jul-2016 14:46:30 [WCG] Started download of cep2_image02_7.00.tga
25-Jul-2016 14:46:31 [WCG] Finished download of cep2_image02_7.00.tga
25-Jul-2016 14:46:47 [WCG] Finished download of wcgrid_beta11_qchem_prod_win32.exe.7.00

It looks to me as if cep2_image02_7.00.tga has downloaded correctly on the second try. I did a binary file compare of this cep2_image02_7.00.tga with the same file on another machine that has completed 2 beta WUs successfully. The file size is the same in both cases but there are major differences in file contents. The file cep2_image01_7.00.tga is equivalent on both machines.

Any idea what brought on this problem? |
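For anyone wanting to repeat the binary compare described above, here is a minimal Python sketch that hashes two copies of a file and reports the first byte offset where they differ. The function names and the stand-in file names in the demo are hypothetical; on a real host the two paths would point at the copies of cep2_image02_7.00.tga from each machine.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b):
    """Return the first byte offset where the files differ, or None if identical."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        data_a, data_b = a.read(), b.read()
    for offset, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            return offset
    if len(data_a) != len(data_b):
        # One file is a prefix of the other; they diverge where the shorter ends.
        return min(len(data_a), len(data_b))
    return None


if __name__ == "__main__":
    # Stand-in data: same size, different contents, like the case in the post.
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good_copy.tga")
        bad = os.path.join(tmp, "bad_copy.tga")
        with open(good, "wb") as f:
            f.write(b"\x00\x01\x02\x03")
        with open(bad, "wb") as f:
            f.write(b"\x00\x01\xff\x03")
        print("same md5:", md5_of(good) == md5_of(bad))
        print("first differing offset:", first_difference(good, bad))
```

Equal size with a different digest is the same situation that a -119 (md5 checksum failed) error reports, so this is a quick way to confirm which copy is the damaged one.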
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
If you set additional log flags you will see where the first and second download attempt came from, possibly one from the cloud, the other directly from the grid server. Guess more CDN pollution.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hehe, this is what you get when the time to completion on these beta units is way over-estimated:
26/07/2016 21:58:13 | World Community Grid | No tasks are available for the applications you have selected.
26/07/2016 21:58:13 | World Community Grid | Tasks won't finish in time: BOINC runs 100.0% of the time; computation is enabled 99.8% of that
PS Whoever has stolen the missing 0.2%, I'd like it back please. |
||
|
|
yoro42
Ace Cruncher United States Joined: Feb 19, 2011 Post Count: 8979 Status: Offline Project Badges:
|
Current Status:
BETA_E236441_998_S.488.C48H14N6O6S6.SLPQWXOGQDVNQJ-UHFFFAOYSA-N.12_s1_14_0-- | Ella | In Progress | 7/27/16 00:52:05 | 7/31/16 00:52:05 | 0.00 / 0.00 | 0.0 / 0.0
BETA_E236441_284_S.484.C49H14N8O3S6.OYGPFFGZYNTKSW-UHFFFAOYSA-N.17_s1_14_0-- | Zoot | Valid | 7/26/16 20:23:17 | 7/27/16 00:02:26 | 1.77 / 1.81 | 68.7 / 68.7
BETA_E236441_287_S.488.C51H18N4O5S6.CMVFTFJCJRSESD-UHFFFAOYSA-N.1_s1_14_2-- | Zoot | Valid | 7/25/16 13:47:41 | 7/26/16 09:40:47 | 9.60 / 9.81 | 377.0 / 379.0
BETA_E236441_646_S.486.C47H15N11O4S5.SRPWYKLTNMKEDP-UHFFFAOYSA-N.18_s1_14_1-- | Miles | In Progress | 7/26/16 20:59:24 | 7/30/16 20:59:24 | 0.00 / 0.00 | 0.0 / 0.0
BETA_E236441_490_S.400.C51H25N3O4S1.MZLYLDQNOUJPBQ-UHFFFAOYSA-N.5_s1_14_1-- | StanGetz | In Progress | 7/26/16 20:20:41 | 7/30/16 20:20:41 | 0.00 / 0.00 | 0.0 / 0.0
BETA_E236441_248_S.388.C38F1H13N6O2S5.UXKLNPGIPZQNCL-UHFFFAOYSA-N.4_s1_14_1-- | Lester-Young | In Progress | 7/26/16 13:01:06 | 7/30/16 13:01:06 | 0.00 / 0.00 | 0.0 / 0.0
BETA_E236441_834_S.400.C50H24N4O4S1.XRMGMKMCKGIJEQ-UHFFFAOYSA-N.19_s1_14_0-- | Coltrane | Invalid | 7/25/16 13:42:50 | 7/26/16 12:36:22 | 12.21 / 12.63 | 251.9 / 251.9

Result Log
Result Name: BETA_E236441_834_S.400.C50H24N4O4S1.XRMGMKMCKGIJEQ-UHFFFAOYSA-N.19_s1_14_0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[16:49:56] Number of jobs = 5
[16:49:56] Starting job 0,CPU time has been restored to 0.000000.
[02:57:16] Finished Job #0
[02:57:16] Starting job 1,CPU time has been restored to 34732.249841.
[03:51:09] Finished Job #1
[03:51:09] Starting job 2,CPU time has been restored to 37892.876902.
[04:26:58] Finished Job #2
[04:26:58] Starting job 3,CPU time has been restored to 39991.589555.
Application exited with RC = 0x1
[05:34:45] Finished Job #3
[05:34:45] Starting job 4,CPU time has been restored to 43974.295085.
[05:34:45] Skipping Job #4
05:34:49 (7180): called boinc_finish
</stderr_txt>
]]> |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
A few repair jobs have appeared already.
With this one, _0 exited with RC = 0x1 in Job #0, _1 exited with RC = 0x1 in Job #3. So it seems that the verifier still doesn't accept both as being valid. I suspect that one of them will turn Invalid.
BETA_E236441_317_S.488.C50H18N6O4S6.HLJAKKOLHWECLK-UHFFFAOYSA-N.12_s1_14_2-- | Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) | - | In Progress | 27/07/16 05:43:08 | 31/07/16 05:43:08 | 0.00 | 0.0 / 0.0
BETA_E236441_317_S.488.C50H18N6O4S6.HLJAKKOLHWECLK-UHFFFAOYSA-N.12_s1_14_1-- | Microsoft Windows 10 x64 Edition, (10.00.10586.00) | 700 | Pending Verification | 26/07/16 21:02:59 | 27/07/16 05:42:56 | 7.89 | 264.9 / 0.0
BETA_E236441_317_S.488.C50H18N6O4S6.HLJAKKOLHWECLK-UHFFFAOYSA-N.12_s1_14_0-- | Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) | 700 | Pending Verification | 26/07/16 21:02:21 | 27/07/16 03:32:37 | 1.57 | 44.7 / 0.0

Then here's a -119 error:
BETA_E236441_40_S.400.C56F1H25N2S1.LOUQHRXRTVLJRV-UHFFFAOYSA-N.2_s1_14_1--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>beta11.CleanEnergyProjectLogo_2.tga</file_name>
<error_code>-119 (md5 checksum failed for file)</error_code>
<error_message>MD5 check failed</error_message>
</file_xfer_error>

and a -120 error:
BETA_E236441_854_S.396.C44F2H20N2S5.WYRHUGMFRDHYMU-UHFFFAOYSA-N.1_s1_14_1--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta11_qchem_prod_win32.exe.7.00</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>cep2_image02_7.00.tga</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
</file_xfer_error> |
||
|
|
TonyEllis
Senior Cruncher Australia Joined: Jul 9, 2008 Post Count: 286 Status: Offline Project Badges:
|
Would have been good in this Beta to also have taken the opportunity to base the CPU time limit on the CPU capability - 18 hrs is not enough for a slower CPU...
Result Name: BETA_E236441_658_S.486.C47H15N11O4S5.PWDYCDIXXPPBQP-UHFFFAOYSA-N.11_s1_14_0--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[23:43:46] Number of jobs = 5
[23:43:46] Starting job 0,CPU time has been restored to 0.000000.
[23:43:50] Starting new Job
[23:43:50] Qink name = fldman
[23:44:31] Qink name = gesman
[23:45:20] Qink name = scfman
Killing job because cpu time limit has been exceeded.
0.000000||64800.452833||0.000000
[18:19:41] Finished Job #0
18:19:47 (27630): called boinc_finish
</stderr_txt>
]]>
and two more heading for the same fate... I know it's to catch a divergent case - but surely the time limit should not be fixed but a function of the CPU speed... one size does not fit all :-)

EDIT: Sure enough - another 18 hour limit hit...
Result Name: BETA_E236441_258_S.388.C38F1H13N6O2S5.UXKLNPGIPZQNCL-UHFFFAOYSA-N.14_s1_14_1--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[23:03:49] Number of jobs = 5
[23:03:49] Starting job 0,CPU time has been restored to 0.000000.
[23:03:52] Starting new Job
[23:03:53] Qink name = fldman
[23:04:13] Qink name = gesman
[23:04:35] Qink name = scfman
[13:37:50] Number of jobs = 5
[13:37:50] Starting job 0,CPU time has been restored to 0.000000.
[13:37:56] Starting new Job
[13:37:57] Qink name = fldman
[13:38:18] Qink name = gesman
[13:38:40] Qink name = scfman
[04:20:38] Qink name = anlman
[04:20:39] Qink name = drvman
[04:40:09] Qink name = optman
[04:40:14] Qink name = fldman
[04:40:14] Qink name = gesman
[04:40:35] Qink name = scfman
[06:55:45] Qink name = anlman
[06:55:46] Qink name = drvman
[07:13:57] Qink name = optman
[07:13:59] Qink name = fldman
[07:13:59] Qink name = gesman
[07:14:20] Qink name = scfman
Killing job because cpu time limit has been exceeded.
0.000000||64800.022898||0.000000
[12:54:00] Finished Job #0
12:54:03 (4454): called boinc_finish
</stderr_txt>
]]>

EDIT2: and the next...
Result Name: BETA_E236441_1_S.482.C50H14N8O2S6.NZGFEWVPGNNTIL-UHFFFAOYSA-N.12_s1_14_1--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[06:23:52] Number of jobs = 5
[06:23:52] Starting job 0,CPU time has been restored to 0.000000.
[06:23:55] Starting new Job
[06:23:56] Qink name = fldman
[06:24:32] Qink name = gesman
[06:25:16] Qink name = scfman
[13:37:50] Number of jobs = 5
[13:37:50] Starting job 0,CPU time has been restored to 0.000000.
[13:37:56] Starting new Job
[13:37:57] Qink name = fldman
[13:38:33] Qink name = gesman
[13:39:17] Qink name = scfman
[03:01:25] Qink name = anlman
[03:01:26] Qink name = drvman
[03:23:59] Qink name = optman
[03:24:07] Qink name = fldman
[03:24:07] Qink name = gesman
[03:24:43] Qink name = scfman
[06:14:02] Qink name = anlman
[06:14:03] Qink name = drvman
[06:36:25] Qink name = optman
[06:36:31] Qink name = fldman
[06:36:31] Qink name = gesman
[06:37:11] Qink name = scfman
Killing job because cpu time limit has been exceeded.
0.000000||64800.078890||0.000000
[12:56:10] Finished Job #0
12:56:15 (4456): called boinc_finish
</stderr_txt>
]]>
----------------------------------------
Run Time Stats https://grassmere-productions.no-ip.biz/
----------------------------------------
[Edit 2 times, last edit by TonyEllis at Jul 28, 2016 3:45:57 AM] |
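To make the suggestion above concrete: the logs show the job being killed at about 64800 CPU seconds, which is 18 hours. One way a limit could follow CPU speed is to scale that figure by the ratio of a reference speed to the host's speed. The Python sketch below is purely illustrative; `host_speed`, `reference_speed`, and the function itself are hypothetical, and nothing here reflects how the project actually sets its limit.

```python
FIXED_LIMIT_S = 64800  # 18 hours, the cutoff seen in the logs above


def scaled_cpu_limit(host_speed, reference_speed, base_limit_s=FIXED_LIMIT_S):
    """Scale the CPU time limit inversely with host speed.

    Both speeds are hypothetical benchmark scores in the same unit; a host
    half as fast as the reference gets twice the time.
    """
    if host_speed <= 0 or reference_speed <= 0:
        raise ValueError("speeds must be positive")
    return base_limit_s * reference_speed / host_speed


# A host at half the reference speed would get 36 hours instead of 18.
print(scaled_cpu_limit(host_speed=1.0, reference_speed=2.0) / 3600)  # 36.0
```

A divergent job would still be caught under such a rule, only later on slower machines.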
||
|
|
|