Total posts in this thread: 18
Posts: 18   Pages: 2
This topic has been viewed 8045 times and has 17 replies
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

Please post issues for this beta here.

http://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=525207

Thanks,
armstrdj
[Jun 17, 2016 8:22:32 PM]
dango
Senior Cruncher
Joined: Jul 27, 2009
Post Count: 307
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

got 2, started well....
[Jun 17, 2016 8:39:39 PM]
JimWork
Cruncher
Canada
Joined: Oct 11, 2005
Post Count: 35
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

got 12 ! woohoo --- if they work I get my little ruby badge and selfie pat on the back
[Jun 17, 2016 8:44:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

Guess what, same machine as before, the first download session (3 beta units) failed with error -120 (RSA key check failed for file), all subsequent download sessions of beta units succeeded. It does now have a scan exclusion on the BOINC ProgramData folder, so bye-bye to that hypothesis.

Failed download (all these 3 errored with -120):
17/06/2016 21:18:13 | World Community Grid | Scheduler request completed: got 3 new tasks
17/06/2016 21:18:15 | World Community Grid | Started download of wcgrid_beta22_gromacs_7.21_windows_x86_64
17/06/2016 21:18:15 | World Community Grid | Started download of wcgrid_HST1_graphics_prod_64.exe.7.21
17/06/2016 21:18:15 | World Community Grid | Started download of beta22_image01_7.21.tga
17/06/2016 21:18:15 | World Community Grid | Started download of beta22_image02_7.21.tga
17/06/2016 21:18:17 | World Community Grid | Finished download of beta22_image01_7.21.tga
17/06/2016 21:18:17 | World Community Grid | Finished download of beta22_image02_7.21.tga
17/06/2016 21:18:17 | World Community Grid | Started download of beta22_image03_7.21.tga
17/06/2016 21:18:17 | World Community Grid | Started download of beta22_image04_7.21.tga
17/06/2016 21:18:18 | World Community Grid | Finished download of beta22_image03_7.21.tga
17/06/2016 21:18:18 | World Community Grid | Finished download of beta22_image04_7.21.tga
17/06/2016 21:18:18 | World Community Grid | Started download of 51c7514d41a369294878918786403cd6.tpr
17/06/2016 21:18:18 | World Community Grid | Started download of 921d09821a96f169f77376f133f4a067.tpr
17/06/2016 21:18:19 | World Community Grid | Finished download of wcgrid_HST1_graphics_prod_64.exe.7.21
17/06/2016 21:18:19 | World Community Grid | Started download of e3c05a80b997297c5d05e51266bf4e08.tpr
17/06/2016 21:18:41 | World Community Grid | Finished download of 921d09821a96f169f77376f133f4a067.tpr
17/06/2016 21:18:49 | | Project communication failed: attempting access to reference site
17/06/2016 21:18:49 | World Community Grid | Temporarily failed download of wcgrid_beta22_gromacs_7.21_windows_x86_64: transient HTTP error
17/06/2016 21:18:49 | World Community Grid | Finished download of 51c7514d41a369294878918786403cd6.tpr
17/06/2016 21:18:50 | | Internet access OK - project servers may be temporarily down.
17/06/2016 21:18:50 | World Community Grid | Started download of wcgrid_beta22_gromacs_7.21_windows_x86_64
17/06/2016 21:18:50 | World Community Grid | Finished download of e3c05a80b997297c5d05e51266bf4e08.tpr
17/06/2016 21:18:51 | World Community Grid | Finished download of wcgrid_beta22_gromacs_7.21_windows_x86_64

Subsequent successful download (these 3 downloaded ok):
17/06/2016 21:23:19 | World Community Grid | Scheduler request completed: got 3 new tasks
17/06/2016 21:23:21 | World Community Grid | Started download of wcgrid_beta22_gromacs_7.21_windows_x86_64
17/06/2016 21:23:21 | World Community Grid | Started download of 8c805b3f4f07dd1c0f322724e785cf44.tpr
17/06/2016 21:23:21 | World Community Grid | Started download of 9a3569ba9e3ef061792b6cda5e3beeae.tpr
17/06/2016 21:23:21 | World Community Grid | Started download of 419e0dc8d42fc9adeb1dfddc4d42071f.tpr
17/06/2016 21:23:36 | World Community Grid | Finished download of 8c805b3f4f07dd1c0f322724e785cf44.tpr
17/06/2016 21:23:37 | World Community Grid | Finished download of 9a3569ba9e3ef061792b6cda5e3beeae.tpr
17/06/2016 21:23:37 | World Community Grid | Finished download of 419e0dc8d42fc9adeb1dfddc4d42071f.tpr
17/06/2016 21:23:43 | World Community Grid | Finished download of wcgrid_beta22_gromacs_7.21_windows_x86_64

Any ideas?
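
One way to narrow this down is to group the download events by file and see which files hit a transient failure before finishing. The sketch below is only an illustration: it assumes the event log lines have the `date time | project | message` layout shown above, the function names are hypothetical, and `SAMPLE` is an excerpt of the failed session. It does not explain the -120 error; it only shows which file was retried.

```python
import re
from collections import defaultdict

# Assumed line layout, as in the log above:
#   17/06/2016 21:18:49 | World Community Grid | <message>
LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| (?P<project>[^|]*)\| (?P<msg>.*)$"
)
EVENT = re.compile(
    r"^(?P<kind>Started|Finished|Temporarily failed) download of (?P<file>[^:\s]+)"
)

def download_events(log_text):
    """Group download events by file name, preserving log order."""
    events = defaultdict(list)
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        e = EVENT.match(m.group("msg").strip())
        if e:
            events[e.group("file")].append((m.group("ts"), e.group("kind")))
    return dict(events)

def retried_files(log_text):
    """Return the files that logged at least one 'Temporarily failed' event."""
    return [
        name
        for name, evs in download_events(log_text).items()
        if any(kind == "Temporarily failed" for _, kind in evs)
    ]

# Excerpt of the failed session quoted above.
SAMPLE = """\
17/06/2016 21:18:15 | World Community Grid | Started download of wcgrid_beta22_gromacs_7.21_windows_x86_64
17/06/2016 21:18:18 | World Community Grid | Started download of 51c7514d41a369294878918786403cd6.tpr
17/06/2016 21:18:49 | | Project communication failed: attempting access to reference site
17/06/2016 21:18:49 | World Community Grid | Temporarily failed download of wcgrid_beta22_gromacs_7.21_windows_x86_64: transient HTTP error
17/06/2016 21:18:49 | World Community Grid | Finished download of 51c7514d41a369294878918786403cd6.tpr
17/06/2016 21:18:50 | World Community Grid | Started download of wcgrid_beta22_gromacs_7.21_windows_x86_64
17/06/2016 21:18:51 | World Community Grid | Finished download of wcgrid_beta22_gromacs_7.21_windows_x86_64
"""

if __name__ == "__main__":
    for name in retried_files(SAMPLE):
        print(name, download_events(SAMPLE)[name])
```

In the failed session the only retried file is the application binary, which restarted and finished about a second later; in the successful session it downloaded in one go. Whether that retry is related to the signature check failing is not established by the log.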
[Jun 17, 2016 8:45:50 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

What's the 32 bit version number? Got an x86_64 on the test W10 with v7.21.

HST issues I know of:

Validation
32 bit
AMD CPUs

And, is all this effort working up towards getting feeder levels above 11K a day? Outstanding issues such as FAH2 going invalid when crunching offline through the 10th trickle would be higher on my priority list to resolve. Simply not crunching them as internet stability is iffy here.
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Jun 17, 2016 9:15:40 PM]
[Jun 17, 2016 9:14:21 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 786
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

On a 2-core Windows XP 32-bit machine I have 2 T000 betas running. They are using CPU and % done is increasing, but CPU% shows zero in BoincTasks. Windows Task Manager shows 20 MB Mem Usage, 1 GB VM Size, and 6,000 page faults.
1st: no checkpoint after 45 minutes, 5% done.
2nd: I tried to suspend it but did not see CPU drop; reset to start, now 1.2% after 10 minutes.

AuthenticAMD, AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ [Family 15 Model 107 Stepping 2]
Memory 1.87 Gb, Virtual: 4.65 Gb
Disk Used: 15.62 Gb, Free: 0.44 Gb

Paul.
[Jun 17, 2016 10:09:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

Had three so far on WinXP32 with BOINC 7.6.22 and all went like this:

Result Name: BETA_ HST1_ 004073_ 000084_ AC0018_ T325_ F00008_ S00005_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
The access code is invalid.
(0xc) - exit code 12 (0xc)
</message>
<stderr_txt>
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[21:33:54] INFO: Running initial simulation

-------------------------------------------------------
Program projects/www.worldcommunitygrid.org/wcgrid_beta22_gromacs_7, VERSION 4.6.1
Source code file: .\src\gmxlib\smalloc.c, line: 247

Fatal error:
Not enough memory. Failed to realloc 527008 bytes for nl->gid, nl->gid=0x0
(called from file .\src\mdlib\ns.c, line 122)
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day
: Not enough space

</stderr_txt>
]]>


I can't say I understand the memory error as Windows Task Manager says the Commit Charge is 1658M / 3168M and, as we had a couple of power failures less than 6 hours ago, the machine is pretty "fresh".
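
To put the numbers side by side, here is a small sketch of the arithmetic. It assumes Task Manager's "M" means mebibytes, and the variable names are just for illustration. It only compares the failed request against the system-wide commit headroom; it says nothing about the address space available inside the GROMACS process itself, which is a separate limit for a 32-bit process.

```python
MB = 1024 * 1024  # assumption: Task Manager's "M" is 2**20 bytes

commit_used_mb = 1658    # Commit Charge in use, as reported above
commit_limit_mb = 3168   # Commit Charge limit, as reported above
request_bytes = 527008   # size of the failed realloc in the GROMACS message

headroom_mb = commit_limit_mb - commit_used_mb
request_mb = request_bytes / MB

print(f"commit headroom: {headroom_mb} MB")
print(f"failed request:  {request_mb:.2f} MB")
```

The request is roughly half a megabyte against about 1.5 GB of commit headroom, so on these figures alone the system as a whole was not short of commit.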

One of the betas at 7.20 is still running alongside a production 7.16 -- could there be some interaction?

I've also noticed a bunch of soft_link files getting updated rather often to the current time. Where are those coming from?
[Jun 17, 2016 10:35:19 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 786
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

An Intel XP 32 completed OK with CPU logged.
On the dual core XP 32 one is valid and other PV.
CPU is zero on both.
Extract from log of the unit I restarted:

Result Name: BETA_ HST1_ 004078_ 000001_ AT0012_ T000_ F00056_ S00006_ 0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
22:37:23 (28764): start_timer_thread(): CreateThread() failed, errno 0
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[22:37:23] INFO: Running initial simulation
[22:43:35] INFO: Completed step 100000 of initial simulation
[22:49:28] INFO: Completed step 200000 of initial simulation
[22:55:06] INFO: Completed step 300000 of initial simulation
22:58:59 (26716): start_timer_thread(): CreateThread() failed, errno 0
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[22:58:59] INFO: Running initial simulation

Back Off! I just backed up md.log to ./#md.log.1#

Back Off! I just backed up traj.xtc to ./#traj.xtc.1#

Back Off! I just backed up ener.edr to ./#ener.edr.1#
[23:05:24] INFO: Completed step 100000 of initial simulation
...
[03:41:32] INFO: Completed step 5000000 of initial simulation
[03:41:32] INFO: Finished initial simulation.
[03:41:32] INFO: Running secondary simulation
[03:41:35] INFO: Run complete, CPU time: 16736.515625
03:41:35 (26716): called boinc_finish(0)

</stderr_txt>
]]>

BETA_ HST1_ 004078_ 000001_ AT0012_ T000_ F00056_ S00006_ 0-- unknown2 Pending Validation 17/06/16 20:20:35 18/06/16 02:40:07 0.00 / 4.71 98.6 / 0.0
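
The mismatch can be checked with a little arithmetic. This is a sketch under two assumptions: that the "0.00 / 4.71" column is CPU hours / elapsed hours, and that the second run (started 22:58:59) is the one that was reported. The helper name is hypothetical.

```python
from datetime import datetime, timedelta

def elapsed_hours(start_hms, end_hms):
    """Wall-clock hours between two HH:MM:SS stamps, allowing one midnight rollover."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(start_hms, fmt)
    end = datetime.strptime(end_hms, fmt)
    if end < start:
        end += timedelta(days=1)
    return (end - start).total_seconds() / 3600

# Values taken from the stderr excerpt above.
cpu_seconds = 16736.515625                            # "Run complete, CPU time: ..."
cpu_hours = cpu_seconds / 3600
wall_hours = elapsed_hours("22:58:59", "03:41:35")    # restart to boinc_finish

print(f"CPU time per stderr: {cpu_hours:.2f} h")
print(f"wall time of run:    {wall_hours:.2f} h")
```

The wall time matches the 4.71 hours in the result line, and the application itself logged about 4.65 hours of CPU time, so the 0.00 appears to be a reporting problem rather than the task doing no work.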

Paul.

Edit: Add result status.
----------------------------------------
[Edit 1 times, last edit by PMH_UK at Jun 18, 2016 8:54:52 AM]
[Jun 18, 2016 8:39:18 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

Today I completed a repair job for the 17 June beta, with one wingman turning Invalid. The only difference in the Invalid Result Log was a couple of restarts from a checkpoint file.

BETA_ HST1_ 004074_ 000024_ AC0031_ T400_ F00045_ S00005_ 2-- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) 721 Valid 20/06/16 14:00:15 20/06/16 18:43:07 3.62 143.2 / 146.2
BETA_ HST1_ 004074_ 000024_ AC0031_ T400_ F00045_ S00005_ 1-- Microsoft Windows 8.1 x64 Edition, (06.03.9600.00) 721 Invalid 17/06/16 20:39:32 20/06/16 14:00:06 7.43 214.7 / 146.2
BETA_ HST1_ 004074_ 000024_ AC0031_ T400_ F00045_ S00005_ 0-- Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) 721 Valid 17/06/16 20:39:25 18/06/16 06:45:19 5.99 149.2 / 146.2

Result Name: BETA_ HST1_ 004074_ 000024_ AC0031_ T400_ F00045_ S00005_ 1--
<core_client_version>7.6.9</core_client_version>
<![CDATA[
<stderr_txt>
INFO: result number = 1
INFO: No state to restore. Start from the beginning.
[16:27:19] INFO: Running initial simulation
Writing checkpoint at step 380.
Writing checkpoint at step 780.
Writing checkpoint at step 1230.
Writing checkpoint at step 1750.
[16:49:35] INFO: Completed step 2000 of initial simulation
... (snipped)
[19:10:34] INFO: Completed step 38000 of initial simulation
Writing checkpoint at step 38410.
Writing checkpoint at step 39510.
[19:19:29] INFO: Completed step 40000 of initial simulation
INFO: result number = 1
[11:13:36] INFO: Running initial simulation

Reading checkpoint file state.cpt generated: Fri Jun 17 19:17:21 2016

[11:15:08] INFO: Completed step 40000 of initial simulation
... (snipped)
[15:05:20] INFO: Completed step 86000 of initial simulation
Writing checkpoint at step 86710.
INFO: result number = 1
[07:51:55] INFO: Running initial simulation

Reading checkpoint file state.cpt generated: Sat Jun 18 15:08:44 2016

[07:56:17] INFO: Completed step 88000 of initial simulation
Writing checkpoint at step 88170.
... (snipped)
Writing checkpoint at step 98980.
[08:46:14] INFO: Completed step 100000 of initial simulation
Writing checkpoint at step 100000.
[08:46:17] INFO: Finished initial simulation.
[08:46:17] INFO: Running secondary simulation
[08:58:53] INFO: Run complete, CPU time: 26744.542500
08:58:53 (4288): called boinc_finish(0)
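
For comparing result logs across wingmen, the restarts can be pulled out mechanically. A minimal sketch, assuming the stderr wording shown above ("Writing checkpoint at step N." and "Reading checkpoint file ... generated: ..."); the function name and the `SAMPLE` excerpt are for illustration only.

```python
import re

RESTART = re.compile(r"^Reading checkpoint file (\S+) generated: (.+)$")
CHECKPOINT = re.compile(r"^Writing checkpoint at step (\d+)\.$")

def summarize_restarts(stderr_text):
    """List each restart with the last checkpoint step logged before it."""
    restarts = []
    last_checkpoint = None
    for raw in stderr_text.splitlines():
        line = raw.strip()
        m = CHECKPOINT.match(line)
        if m:
            last_checkpoint = int(m.group(1))
            continue
        m = RESTART.match(line)
        if m:
            restarts.append({
                "file": m.group(1),
                "generated": m.group(2),
                "last_checkpoint_step": last_checkpoint,
            })
    return restarts

# Excerpt of the Invalid result's log quoted above.
SAMPLE = """\
Writing checkpoint at step 38410.
Writing checkpoint at step 39510.
[19:19:29] INFO: Completed step 40000 of initial simulation
INFO: result number = 1
[11:13:36] INFO: Running initial simulation

Reading checkpoint file state.cpt generated: Fri Jun 17 19:17:21 2016

[11:15:08] INFO: Completed step 40000 of initial simulation
... (snipped)
[15:05:20] INFO: Completed step 86000 of initial simulation
Writing checkpoint at step 86710.
INFO: result number = 1
[07:51:55] INFO: Running initial simulation

Reading checkpoint file state.cpt generated: Sat Jun 18 15:08:44 2016
"""

if __name__ == "__main__":
    for r in summarize_restarts(SAMPLE):
        print(r)
```

On this excerpt it reports two restarts, resuming after the checkpoints at steps 39510 and 86710. That only confirms the restarts happened; it does not show that they caused the Invalid result.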
[Jun 20, 2016 10:41:09 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: Beta Test for Help Stop TB - June 17, 2016 [Issues Thread]

All WUs of that batch returned, including one that was received on a 32-bit Windows Server 2003 box, with only one other resulting in an error (BETA_HST1_004074_000080_AC0032_T400_F00001_S00005, on Windows 7/64-bit), but so did 3 wingmen (1 Windows XP 32-bit, 1 Windows 7/64-bit, 1 Windows 8.1/64)...
[Jun 21, 2016 9:45:56 PM]