Thread Status: Active
Total posts in this thread: 18
This topic has been viewed 4550 times and has 17 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

I've had error -120 (RSA key check failed for file) again on a repair job for the 17 June beta. Both my _3 and the subsequent _4 suffered it. There were different errors for _0 and _2, but _1 finished successfully. This time, I had the file_xfer_debug log flag set; I hope I've extracted the relevant lines, as there was an upload in the middle of it as well.

BETA_ HST1_ 004073_ 000084_ AC0018_ T325_ F00008_ S00005_ 4-- Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) 721 Error 21/06/16 17:22:52 21/06/16 17:24:59 0.00 296.9 / 0.0
BETA_ HST1_ 004073_ 000084_ AC0018_ T325_ F00008_ S00005_ 3-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) 721 Error 21/06/16 17:20:20 21/06/16 17:22:50 0.00 296.9 / 0.0
BETA_ HST1_ 004073_ 000084_ AC0018_ T325_ F00008_ S00005_ 2-- Microsoft Windows 10 Professional x86 Edition, (10.00.10586.00) 721 Error 17/06/16 20:35:54 21/06/16 17:20:18 0.00 0.0 / 0.0
BETA_ HST1_ 004073_ 000084_ AC0018_ T325_ F00008_ S00005_ 1-- Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) 721 Too Late 17/06/16 20:25:35 18/06/16 20:16:22 22.67 296.9 / 296.9
BETA_ HST1_ 004073_ 000084_ AC0018_ T325_ F00008_ S00005_ 0-- Microsoft Windows XP Home x86 Edition, Service Pack 3, (05.01.2600.00) 721 Error 17/06/16 20:25:25 17/06/16 20:35:47 0.00 0.0 / 0.0

21-Jun-2016 18:20:20 [World Community Grid] Scheduler request completed: got 1 new tasks
21-Jun-2016 18:20:23 [World Community Grid] Started download of wcgrid_beta22_gromacs_7.21_windows_intelx86
21-Jun-2016 18:20:23 [World Community Grid] [file_xfer] URL: http://bdd7.http.cdn.softlayer.net/80BDD7/gri...acs_7.21_windows_intelx86
21-Jun-2016 18:20:23 [World Community Grid] Started download of wcgrid_HST1_graphics_prod_32.exe.7.21
21-Jun-2016 18:20:23 [World Community Grid] [file_xfer] URL: http://bdd7.http.cdn.softlayer.net/80BDD7/gri...graphics_prod_32.exe.7.21
21-Jun-2016 18:20:23 [World Community Grid] Started download of fad3ccac0a95d86a2381b3e218f17644.tpr
21-Jun-2016 18:20:23 [World Community Grid] [file_xfer] URL: https://grid.worldcommunitygrid.org/boinc/dow...5d86a2381b3e218f17644.tpr
21-Jun-2016 18:20:28 [World Community Grid] [file_xfer] http op done; retval 0 (Success)
21-Jun-2016 18:20:28 [World Community Grid] [file_xfer] file transfer status 0 (Success)
21-Jun-2016 18:20:28 [World Community Grid] Finished download of wcgrid_HST1_graphics_prod_32.exe.7.21
21-Jun-2016 18:20:28 [World Community Grid] [file_xfer] Throughput 84422 bytes/sec
21-Jun-2016 18:21:19 [---] Project communication failed: attempting access to reference site
21-Jun-2016 18:21:19 [World Community Grid] [file_xfer] http op done; retval -184 (transient HTTP error)
21-Jun-2016 18:21:19 [World Community Grid] [file_xfer] file transfer status -184 (transient HTTP error)
21-Jun-2016 18:21:19 [World Community Grid] Temporarily failed download of wcgrid_beta22_gromacs_7.21_windows_intelx86: transient HTTP error
21-Jun-2016 18:21:21 [---] Internet access OK - project servers may be temporarily down.
21-Jun-2016 18:21:21 [World Community Grid] Started download of wcgrid_beta22_gromacs_7.21_windows_intelx86
21-Jun-2016 18:21:21 [World Community Grid] [file_xfer] URL: https://grid.worldcommunitygrid.org/boinc/dow...acs_7.21_windows_intelx86
21-Jun-2016 18:21:23 [World Community Grid] [file_xfer] http op done; retval 0 (Success)
21-Jun-2016 18:21:23 [World Community Grid] [file_xfer] file transfer status 0 (Success)
21-Jun-2016 18:21:23 [World Community Grid] Finished download of wcgrid_beta22_gromacs_7.21_windows_intelx86
21-Jun-2016 18:21:23 [World Community Grid] [file_xfer] Throughput 12580 bytes/sec
21-Jun-2016 18:21:40 [World Community Grid] [file_xfer] http op done; retval 0 (Success)
21-Jun-2016 18:21:40 [World Community Grid] [file_xfer] file transfer status 0 (Success)
21-Jun-2016 18:21:40 [World Community Grid] Finished download of fad3ccac0a95d86a2381b3e218f17644.tpr
21-Jun-2016 18:21:40 [World Community Grid] [file_xfer] Throughput 62526 bytes/sec

So there was the "usual" transient HTTP error for one of the files, followed by apparent success ("file transfer status 0 (Success)"), but despite that the result log reports a download error:

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta22_gromacs_7.21_windows_intelx86</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
</file_xfer_error>
</message>
]]>

The one thing I do notice is that the download URLs are different for the 2 attempts at downloading wcgrid_beta22_gromacs_7.21_windows_intelx86. The first is via the CDN

URL: http://bdd7.http.cdn.softlayer.net/80BDD7/gri...acs_7.21_windows_intelx86

whereas the second is direct

URL: https://grid.worldcommunitygrid.org/boinc/dow...acs_7.21_windows_intelx86

Could this changeover during the download be causing the error?
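
To make that comparison easier on a longer log, the file_xfer lines can be grouped per file. This is only a rough sketch: it assumes the message-log layout shown above (a "Started download of NAME" line followed by a "[file_xfer] URL:" line), and the function name `hosts_per_file` and the two regexes are my own, not anything from BOINC.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Patterns are assumptions based on the log excerpt quoted in this post.
STARTED = re.compile(r"Started download of (\S+)")
URL = re.compile(r"\[file_xfer\] URL: (\S+)")

def hosts_per_file(log_lines):
    """Map each downloaded file name to the list of hosts it was requested from."""
    hosts = defaultdict(list)
    current = None  # file named by the most recent "Started download" line
    for line in log_lines:
        m = STARTED.search(line)
        if m:
            current = m.group(1)
            continue
        m = URL.search(line)
        if m and current is not None:
            hosts[current].append(urlparse(m.group(1)).netloc)
            current = None
    return dict(hosts)

log = """\
21-Jun-2016 18:20:23 [World Community Grid] Started download of wcgrid_beta22_gromacs_7.21_windows_intelx86
21-Jun-2016 18:20:23 [World Community Grid] [file_xfer] URL: http://bdd7.http.cdn.softlayer.net/80BDD7/gri...acs_7.21_windows_intelx86
21-Jun-2016 18:21:21 [World Community Grid] Started download of wcgrid_beta22_gromacs_7.21_windows_intelx86
21-Jun-2016 18:21:21 [World Community Grid] [file_xfer] URL: https://grid.worldcommunitygrid.org/boinc/dow...acs_7.21_windows_intelx86
""".splitlines()

# Report any file that was requested from more than one host.
for name, hosts in hosts_per_file(log).items():
    if len(set(hosts)) > 1:
        print(name, "was fetched from more than one host:", hosts)
```

It only shows which files were requested from more than one host; it says nothing about whether the switch is actually what breaks the signature check.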
[Jun 22, 2016 9:49:57 AM]
RTS48
Veteran Cruncher
Bolivia
Joined: Aug 2, 2009
Post Count: 1350
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

Sorry - I don't quite understand how you can get an Invalid result for a Beta Test (let alone two). Once again the logs (which I will not bore you with) show a clean exit and a full completion.

I'm assuming these two WUs were declared invalid when exiting the PV Jail, so you still haven't got this issue sorted yet.

Invalids were
BETA_ HST1_ 003931_ 000011_ AC0031_ T325_ F00035_ S00005_ 2--
and
BETA_ HST1_ 004074_ 000003_ AC0030_ T300_ F00099_ S00006_ 2--

Others are awaiting Parole from the PV Jail.
----------------------------------------
Rod Peel
Santa Cruz
Bolivia
South America

[Jun 22, 2016 10:29:25 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

You mean like this recent example (mine is the _2 repair job)?

BETA_ HST1_ 004076_ 000064_ AT0008_ T300_ F00015_ S00006_ 2-- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) 721 Valid 22/06/16 19:48:06 23/06/16 00:11:10 4.19 160.8 / 203.9
BETA_ HST1_ 004076_ 000064_ AT0008_ T300_ F00015_ S00006_ 1-- Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) 721 Valid 17/06/16 20:40:01 19/06/16 19:39:03 13.79 247.1 / 203.9
BETA_ HST1_ 004076_ 000064_ AT0008_ T300_ F00015_ S00006_ 0-- Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) 721 Invalid 17/06/16 20:39:57 22/06/16 19:47:59 7.70 210.0 / 203.9

I don't understand your first sentence, though. Beta tests need to run the same validation software as the intended production will, otherwise it's not a comprehensive test. As always, an Invalid means a result different from the majority.

In my example above, _0 and _1 were both PVer initially.

Yes, there's still an issue to be resolved.
[Jun 23, 2016 6:53:35 AM]
yoro42
Ace Cruncher
United States
Joined: Feb 19, 2011
Post Count: 8976
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

BETA_ HST1_ 004072_ 000089_ MT0007_ T350_ F00089_ S00005_ 1-- Darwin 15.5.0 721 Invalid 6/17/16 20:26:42 6/19/16 08:27:14 12.95 421.6 / 421.6
Result Log:

Result Name: BETA_ HST1_ 004072_ 000089_ MT0007_ T350_ F00089_ S00005_ 1--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
INFO: result number = 1
INFO: No state to restore. Start from the beginning.
[09:57:44] INFO: Running initial simulation
Writing checkpoint at step 250.
Writing checkpoint at step 500.
Writing checkpoint at step 760.
Writing checkpoint at step 1010.
Writing checkpoint at step 1270.
Writing checkpoint at step 1530.
Writing checkpoint at step 1790.
[10:37:18] INFO: Completed step 2000 of initial simulation
Writing checkpoint at step 2040.
Writing checkpoint at step 2270.
Writing checkpoint at step 2520.
Writing checkpoint at step 2720.
Writing checkpoint at step 2880.
INFO: result number = 1
[11:02:01] INFO: Running initial simulation

Reading checkpoint file state.cpt generated: Sat Jun 18 10:58:01 2016


Writing checkpoint at step 3130.
Writing checkpoint at step 3320.
Writing checkpoint at step 3560.
Writing checkpoint at step 3870.
[11:24:20] INFO: Completed step 4000 of initial simulation
Writing checkpoint at step 4130.
Writing checkpoint at step 4340.
Writing checkpoint at step 4570.
Writing checkpoint at step 4790.
Writing checkpoint at step 5080.
Writing checkpoint at step 5320.
Writing checkpoint at step 5580.
Writing checkpoint at step 5830.
INFO: result number = 1
[12:14:59] INFO: Running initial simulation

Reading checkpoint file state.cpt generated: Sat Jun 18 12:03:15 2016


[12:16:19] INFO: Completed step 6000 of initial simulation
Writing checkpoint at step 6490.
Writing checkpoint at step 7150.
Writing checkpoint at step 7810.
[12:31:32] INFO: Completed step 8000 of initial simulation
Writing checkpoint at step 8470.
Writing checkpoint at step 9130.
Writing checkpoint at step 9790.
[12:46:42] INFO: Completed step 10000 of initial simulation
Writing checkpoint at step 10450.
Writing checkpoint at step 11100.
Writing checkpoint at step 11760.
[13:01:57] INFO: Completed step 12000 of initial simulation
Writing checkpoint at step 12410.
Writing checkpoint at step 13060.
Writing checkpoint at step 13720.
[13:17:15] INFO: Completed step 14000 of initial simulation
Writing checkpoint at step 14370.
Writing checkpoint at step 15020.
Writing checkpoint at step 15680.
[13:32:33] INFO: Completed step 16000 of initial simulation
Writing checkpoint at step 16330.
Writing checkpoint at step 16980.
Writing checkpoint at step 17630.
[13:47:54] INFO: Completed step 18000 of initial simulation
Writing checkpoint at step 18280.
Writing checkpoint at step 18930.
Writing checkpoint at step 19580.
[14:03:18] INFO: Completed step 20000 of initial simulation
Writing checkpoint at step 20230.
Writing checkpoint at step 20880.
Writing checkpoint at step 21520.
[14:18:48] INFO: Completed step 22000 of initial simulation
Writing checkpoint at step 22170.
Writing checkpoint at step 22820.
Writing checkpoint at step 23470.
[14:34:09] INFO: Completed step 24000 of initial simulation
Writing checkpoint at step 24120.
Writing checkpoint at step 24770.
Writing checkpoint at step 25420.
[14:49:35] INFO: Completed step 26000 of initial simulation
Writing checkpoint at step 26070.
Writing checkpoint at step 26710.
Writing checkpoint at step 27370.
[15:05:01] INFO: Completed step 28000 of initial simulation
Writing checkpoint at step 28010.
Writing checkpoint at step 28650.
Writing checkpoint at step 29300.
Writing checkpoint at step 29940.
[15:20:36] INFO: Completed step 30000 of initial simulation
Writing checkpoint at step 30570.
Writing checkpoint at step 31230.
Writing checkpoint at step 31880.
[15:36:03] INFO: Completed step 32000 of initial simulation
Writing checkpoint at step 32530.
Writing checkpoint at step 33180.
Writing checkpoint at step 33840.
[15:51:20] INFO: Completed step 34000 of initial simulation
Writing checkpoint at step 34490.
Writing checkpoint at step 35150.
Writing checkpoint at step 35810.
[16:06:35] INFO: Completed step 36000 of initial simulation
Writing checkpoint at step 36460.
Writing checkpoint at step 37110.
Writing checkpoint at step 37760.
[16:21:54] INFO: Completed step 38000 of initial simulation
Writing checkpoint at step 38420.
Writing checkpoint at step 39070.
Writing checkpoint at step 39730.
[16:37:09] INFO: Completed step 40000 of initial simulation
Writing checkpoint at step 40390.
Writing checkpoint at step 41040.
Writing checkpoint at step 41690.
[16:52:28] INFO: Completed step 42000 of initial simulation
Writing checkpoint at step 42340.
Writing checkpoint at step 43000.
Writing checkpoint at step 43660.
[17:07:42] INFO: Completed step 44000 of initial simulation
Writing checkpoint at step 44310.
Writing checkpoint at step 44970.
Writing checkpoint at step 45620.
[17:23:01] INFO: Completed step 46000 of initial simulation
Writing checkpoint at step 46270.
Writing checkpoint at step 46910.
Writing checkpoint at step 47560.
[17:38:28] INFO: Completed step 48000 of initial simulation
Writing checkpoint at step 48210.
Writing checkpoint at step 48860.
Writing checkpoint at step 49510.
[17:53:52] INFO: Completed step 50000 of initial simulation
Writing checkpoint at step 50160.
Writing checkpoint at step 50810.
Writing checkpoint at step 51460.
[18:09:15] INFO: Completed step 52000 of initial simulation
Writing checkpoint at step 52110.
Writing checkpoint at step 52760.
Writing checkpoint at step 53410.
[18:24:38] INFO: Completed step 54000 of initial simulation
Writing checkpoint at step 54060.
Writing checkpoint at step 54710.
Writing checkpoint at step 55360.
[18:40:02] INFO: Completed step 56000 of initial simulation
Writing checkpoint at step 56010.
Writing checkpoint at step 56660.
Writing checkpoint at step 57310.
Writing checkpoint at step 57950.
[18:55:28] INFO: Completed step 58000 of initial simulation
Writing checkpoint at step 58600.
Writing checkpoint at step 59240.
Writing checkpoint at step 59890.
[19:10:58] INFO: Completed step 60000 of initial simulation
Writing checkpoint at step 60530.
Writing checkpoint at step 61180.
Writing checkpoint at step 61830.
[19:26:25] INFO: Completed step 62000 of initial simulation
Writing checkpoint at step 62470.
Writing checkpoint at step 63110.
Writing checkpoint at step 63760.
[19:41:58] INFO: Completed step 64000 of initial simulation
Writing checkpoint at step 64410.
Writing checkpoint at step 65050.
Writing checkpoint at step 65690.
[19:57:31] INFO: Completed step 66000 of initial simulation
Writing checkpoint at step 66330.
Writing checkpoint at step 66970.
Writing checkpoint at step 67610.
[20:13:09] INFO: Completed step 68000 of initial simulation
Writing checkpoint at step 68250.
Writing checkpoint at step 68890.
Writing checkpoint at step 69530.
[20:28:45] INFO: Completed step 70000 of initial simulation
Writing checkpoint at step 70170.
Writing checkpoint at step 70820.
Writing checkpoint at step 71460.
[20:44:20] INFO: Completed step 72000 of initial simulation
Writing checkpoint at step 72100.
Writing checkpoint at step 72750.
Writing checkpoint at step 73420.
[20:59:28] INFO: Completed step 74000 of initial simulation
Writing checkpoint at step 74080.
Writing checkpoint at step 74740.
Writing checkpoint at step 75410.
[21:14:38] INFO: Completed step 76000 of initial simulation
Writing checkpoint at step 76060.
Writing checkpoint at step 76720.
Writing checkpoint at step 77370.
[21:29:51] INFO: Completed step 78000 of initial simulation
Writing checkpoint at step 78030.
Writing checkpoint at step 78690.
Writing checkpoint at step 79350.
[21:45:05] INFO: Completed step 80000 of initial simulation
Writing checkpoint at step 80000.
Writing checkpoint at step 80660.
Writing checkpoint at step 81160.
Writing checkpoint at step 81810.
[22:01:34] INFO: Completed step 82000 of initial simulation
Writing checkpoint at step 82460.
Writing checkpoint at step 83120.
Writing checkpoint at step 83780.
[22:16:46] INFO: Completed step 84000 of initial simulation
Writing checkpoint at step 84440.
Writing checkpoint at step 85100.
Writing checkpoint at step 85760.
[22:31:57] INFO: Completed step 86000 of initial simulation
Writing checkpoint at step 86420.
Writing checkpoint at step 87080.
Writing checkpoint at step 87740.
[22:47:04] INFO: Completed step 88000 of initial simulation
Writing checkpoint at step 88400.
Writing checkpoint at step 89060.
Writing checkpoint at step 89730.
[23:02:09] INFO: Completed step 90000 of initial simulation
Writing checkpoint at step 90390.
Writing checkpoint at step 91060.
Writing checkpoint at step 91720.
[23:17:13] INFO: Completed step 92000 of initial simulation
Writing checkpoint at step 92380.
Writing checkpoint at step 93050.
Writing checkpoint at step 93710.
[23:32:17] INFO: Completed step 94000 of initial simulation
Writing checkpoint at step 94370.
Writing checkpoint at step 95020.
Writing checkpoint at step 95680.
[23:47:31] INFO: Completed step 96000 of initial simulation
Writing checkpoint at step 96340.
Writing checkpoint at step 97000.
Writing checkpoint at step 97660.
[00:02:43] INFO: Completed step 98000 of initial simulation
Writing checkpoint at step 98310.
Writing checkpoint at step 98960.
Writing checkpoint at step 99620.
[00:17:57] INFO: Completed step 100000 of initial simulation
Writing checkpoint at step 100000.
[00:17:59] INFO: Finished initial simulation.
[00:17:59] INFO: Running secondary simulation
[00:25:33] INFO: Run complete, CPU time: 46610.505303
00:25:33 (410): called boinc_finish(0)
</stderr_txt>
]]>
----------------------------------------

[Jun 25, 2016 1:02:11 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

@yoro42, you had 2 restarts from a checkpoint file in that example. This beta does seem to be showing that there is a problem when a job is restarted like that.
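
For anyone wanting to check their own result logs for the same pattern, here is a minimal sketch that counts restarts. It assumes, as in the log above, that every resume prints a line starting with "Reading checkpoint file"; the function name `count_restarts` is just made up for the example.

```python
def count_restarts(stderr_text):
    """Count how many times the task resumed from a checkpoint file."""
    return sum(
        1
        for line in stderr_text.splitlines()
        if line.strip().startswith("Reading checkpoint file")
    )

# A shortened excerpt of the stderr output quoted above.
sample = """\
INFO: No state to restore. Start from the beginning.
[09:57:44] INFO: Running initial simulation
Writing checkpoint at step 2880.
INFO: result number = 1
[11:02:01] INFO: Running initial simulation
Reading checkpoint file state.cpt generated: Sat Jun 18 10:58:01 2016
Writing checkpoint at step 5830.
INFO: result number = 1
[12:14:59] INFO: Running initial simulation
Reading checkpoint file state.cpt generated: Sat Jun 18 12:03:15 2016
"""

print(count_restarts(sample))  # 2
```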
[Jun 25, 2016 7:25:38 AM]
Gurra
Cruncher
Joined: Sep 11, 2006
Post Count: 33
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

This Beta appears to have had some kind of issue with the code checking for WUs being returned on time. I don't have any extended logging enabled on the client side but the server side reported WU return within 3-5 days in both cases.

I had 2 WUs complete correctly with "Too Late" where everyone else had either "Error" or also "Too Late".
Errors were a mixture of code -120 RSA key check and "finish file present too long".

BETA_ HST1_ 004073_ 000072_ AC0017_ T325_ F00096_ S00005_ 0--: 2x "Too Late", 2x "Error: -120", 1x "Error: finish file present too long"
BETA_ HST1_ 004069_ 000094_ AT0018_ T000_ F00062_ S00006_ 2-- : 1x "Too Late", 2x "Error: -120", 2x "Error: finish file present too long"

Perhaps this happens because the error WUs return disproportionately fast?

Also had an "Error: -120" on a 64-bit Windows 2003 server. The machine has never had one of those before or since, its local time and date are set correctly and the boinc folders are not being AV scanned:

BETA_ HST1_ 004072_ 000035_ MT0007_ T350_ F00035_ S00005_ 2--
----------------------------------------

[Jun 25, 2016 10:20:07 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

"Too Late" is a catch all [mis]direction. Too late - "to get enough valid results for this task" is the meaning in 99.999% of the cases i.e. if after e.g. 5 copies there's still no quorum, it's considered 'too late' to continue, credit is still granted AT A LATER TIME [The exception script only runs a few times a few].

'Too late' in the literal sense hardly ever happens, since WCG maintains a grace period over and above the deadline. As long as the canonical result is on the Result Status pages and the result was valid, you'll get credited.
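
As a rough illustration of the "no quorum after N copies" reading of "Too Late", here is a toy model. It is not WCG's actual validator or exception script: the quorum size of 2, the cap of 5 copies, the status strings and the function name `classify` are all assumptions made up for the example.

```python
from collections import Counter

def classify(results, quorum=2, max_copies=5):
    """Classify a workunit from its returned copies.

    results: one entry per copy sent out; a hashable fingerprint of the
    output for a finished copy, or None for a copy that errored out.
    quorum and max_copies are hypothetical values, not WCG settings.
    """
    counts = Counter(r for r in results if r is not None)
    if counts:
        _, agreeing = counts.most_common(1)[0]
        if agreeing >= quorum:
            return "quorum reached"
    if len(results) >= max_copies:
        # No agreement and no more copies will be sent.
        return "too late"
    return "pending, send another copy"

# Four errors and one finished copy out of five: no quorum is possible.
print(classify([None, "a", None, None, None]))
# Two matching results out of three: the odd one out would be the Invalid.
print(classify(["a", "b", "a"]))
```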
[Jun 25, 2016 10:30:11 AM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

This version has been promoted for the 64-bit Windows and 32/64-bit Linux platforms. Further updates and testing are required for 32-bit Windows and OS X.

Announcement thread: https://secure.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=525938

Thanks,
armstrdj
[Jun 27, 2016 9:09:58 PM]