World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: New Beta Test - March 17, 2016 [ Issues Thread ] |
Thread Status: Active Total posts in this thread: 21
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1671 Status: Offline Project Badges: |
Wrong beta thread! Yes, I did notice it this morning (working at night is not always a good idea). Yves |
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1317 Status: Offline Project Badges: |
I tested the suspend (LAIM off)/resume problem from the previous beta batch a few times with task BETA_HST1_000002_000137_AC0013_T300_F00037_S00001_0 on Win7. The problem is solved: checkpointing and progress are now normal after each resume. Endless running should therefore also be fixed, but I can't report on that yet. Progress is 20% after 4 hours of runtime, with several CEP BETAs running at High Priority. I got more of those BETA_E's than available threads. |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
"I got more of those BETA_E's than available threads."
[ot]There's a dirty trick for the connoisseurs, but there's also some oddball behaviour, coupled to how the CEP2 settings are configured in the device profile, that causes more CEP2 betas to arrive than the standard rule of max = 1 beta task * threads allows. I set my profile to 16 CEP2 allowed, but still only 8 came to the octo, followed by 'no work available for', so I have not figured out what the exact oddball override settings are. As yours seem to work, don't change what's broken, if you have no problem with it ;O)[/ot] |
||
|
Eric_Kaiser
Veteran Cruncher Germany (Hessen) Joined: May 7, 2013 Post Count: 1047 Status: Offline Project Badges: |
Caught a bunch of the new beta.
The short-running tasks start with BETA_AC* and the long-running ones with BETA_HST1*. Checkpointing works for both types of beta WU. Stopping and (re)starting is OK. The short-running tasks finished in ~2.3 hrs. The long-running ones are still running. My personal estimate of the runtime is around 15 hrs per WU: on my Windows/Linux hosts they show 20% progress after 3 hrs. |
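The 15-hour figure is a straight-line extrapolation from 3 hours at 20% progress. A minimal sketch of that arithmetic, assuming progress grows linearly with runtime (which a real work unit may not do); the function name is purely illustrative and not part of any BOINC or WCG tool:

```python
def estimate_total_runtime(elapsed_hours: float, progress_fraction: float) -> float:
    """Linearly extrapolate total runtime from elapsed time and fractional progress.

    Assumption: progress is proportional to runtime.
    """
    if not 0 < progress_fraction <= 1:
        raise ValueError("progress_fraction must be in (0, 1]")
    return elapsed_hours / progress_fraction


elapsed = 3.0   # hours run so far, as reported in the post
progress = 0.20  # 20% progress shown by the client
total = estimate_total_runtime(elapsed, progress)
print(f"Estimated total: {total:.1f} h, remaining: {total - elapsed:.1f} h")
# prints: Estimated total: 15.0 h, remaining: 12.0 h
```

The same calculation applied to the earlier report of 20% after 4 hours gives roughly 20 hours, so the estimate clearly depends on the host and on what else is running.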
||
|
Jason1478963
Senior Cruncher United States Joined: Sep 18, 2005 Post Count: 295 Status: Offline Project Badges: |
Not sure why but 11 (24 NOW) of the AC0002 wu's from 5 different machines came up invalid after what seemed like a smooth run.
Result Name: BETA_ AC0002_ T000_ F00098_ S00001am_ 1--
<core_client_version>7.6.9</core_client_version>
<![CDATA[
<stderr_txt>
simulation
[04:26:33] INFO: Completed step 3982000 of initial simulation
[04:26:35] INFO: Completed step 3983000 of initial simulation
[04:26:37] INFO: Completed step 3984000 of initial simulation
[04:26:39] INFO: Completed step 3985000 of initial simulation
[04:26:41] INFO: Completed step 3986000 of initial simulation
~~~~~~~~~~~~~~~~~~~~~ Shortened up ~~~~~~~~~~~~~~~~~~~~~
[05:02:06] INFO: Completed step 4994000 of initial simulation
[05:02:08] INFO: Completed step 4995000 of initial simulation
[05:02:10] INFO: Completed step 4996000 of initial simulation
[05:02:12] INFO: Completed step 4997000 of initial simulation
[05:02:14] INFO: Completed step 4998000 of initial simulation
[05:02:16] INFO: Completed step 4999000 of initial simulation
[05:02:18] INFO: Completed step 5000000 of initial simulation
Writing checkpoint at step 5000000.
[05:02:19] INFO: Finished initial simulation.
[05:02:19] INFO: Running secondary simulation
[05:02:21] INFO: Run complete, CPU time: 10232.593750
05:02:21 (6256): called boinc_finish(0)
</stderr_txt>
]]>
[Edit 1 times, last edit by Jason1478963 at Mar 21, 2016 2:02:37 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The BETA_ AC0002 units still have Result Log info written every second and truncated at the start:
Result Name: BETA_ AC0002_ T000_ F00029_ S00001ab_ 1--
<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>
ted step 3982000 of initial simulation
[03:16:49] INFO: Completed step 3983000 of initial simulation
[03:16:50] INFO: Completed step 3984000 of initial simulation
[03:16:51] INFO: Completed step 3985000 of initial simulation
[03:16:52] INFO: Completed step 3986000 of initial simulation
etc. |
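For anyone who wants to check the logging interval on their own results, here is a rough sketch that averages the gap between consecutive "Completed step" entries in a pasted Result Log. The regular expression is an assumption based only on the lines quoted in this thread, the helper name is made up for illustration, and the sketch ignores logs that cross midnight. It measures how often entries appear in the log, not whether each one is flushed to disk.

```python
import re
from datetime import datetime

# Matches entries such as "[03:16:49] INFO: Completed step 3983000 of initial simulation".
# The pattern is inferred from the log excerpts in this thread, not from any specification.
LINE_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] INFO: Completed step (\d+)")


def mean_log_interval(stderr_text: str) -> float:
    """Return the mean number of seconds between consecutive 'Completed step' entries."""
    times = [
        datetime.strptime(match.group(1), "%H:%M:%S")
        for match in LINE_RE.finditer(stderr_text)
    ]
    if len(times) < 2:
        raise ValueError("need at least two timestamped entries")
    # Differences between neighbouring timestamps; midnight rollover is not handled.
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


sample = """\
[03:16:49] INFO: Completed step 3983000 of initial simulation
[03:16:50] INFO: Completed step 3984000 of initial simulation
[03:16:51] INFO: Completed step 3985000 of initial simulation
[03:16:52] INFO: Completed step 3986000 of initial simulation
"""
print(mean_log_interval(sample))  # 1.0
```

Run against the excerpt from the 7.6.9 host earlier in the thread, the same helper gives two seconds per entry, so the interval tracks how fast the host completes 1000 steps rather than a fixed timer.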
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Could this be intentional, to get extra log data during beta, and is it writing to storage every second? Anything hitting storage with output every second in production is bad, bad. Picture what happens when you have 8, 16, 24 or more cores running these concurrently.
[Edit 1 times, last edit by SekeRob* at Mar 19, 2016 9:14:23 AM] |
||
|
Jason1478963
Senior Cruncher United States Joined: Sep 18, 2005 Post Count: 295 Status: Offline Project Badges: |
"Could this be intentional, to get extra log data during beta, and is it writing to storage every second? Anything hitting storage with output every second in production is bad, bad. Picture what happens when you have 8, 16, 24 or more cores running these concurrently."
It certainly isn't good. I lost an HDD to the CEP2 beta run, which of course caused the creation of another device ID, so I am starting from scratch again. :( |
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline Project Badges: |
"It certainly isn't good. I lost an HDD to the CEP2 beta run, which of course caused the creation of another device ID, so I am starting from scratch again. :("
I feel your pain. The same thing happened to me during the CEP1 BETA. I recommend doing backups at least every 2 weeks, and more often when BETAs are or have been running. It's a must to avoid your issue.
"Could this be intentional, to get extra log data during beta, and is it writing to storage every second? Anything hitting storage with output every second in production is bad, bad. Picture what happens when you have 8, 16, 24 or more cores running these concurrently."
Makes for a flashing green and red light show.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
[Edit 1 times, last edit by nanoprobe at Mar 19, 2016 3:56:07 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
One Invalid and 2 Valid from this unit. I can't see any significant difference between the Result Logs.
BETA_ HST1_ 000002_ 000022_ AC0012_ T300_ F00022_ S00001_ 2-- Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) 713 Valid 19/03/16 18:31:11 20/03/16 04:16:10 9.37 320.9 / 327.8
BETA_ HST1_ 000002_ 000022_ AC0012_ T300_ F00022_ S00001_ 1-- Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) 713 Valid 17/03/16 21:55:50 19/03/16 06:15:07 11.02 334.7 / 327.8
BETA_ HST1_ 000002_ 000022_ AC0012_ T300_ F00022_ S00001_ 0-- Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) 713 Invalid 17/03/16 21:55:39 19/03/16 18:31:03 17.39 348.4 / 327.8 |
||
|
|