World Community Grid Forums
Thread Status: Active Total posts in this thread: 120
Thyme Lawn
Cruncher Joined: Dec 9, 2008 Post Count: 46 Status: Offline |
I've a 218 task on my Q6600 2.40GHz XP, BOINC 6.12.43 system. It's 50% done after 8 hours and has had no problems recovering to the previous checkpoint (elapsed time, CPU time and progress are all restored to the expected values on restart). Completed and validated, with 15.95 hours CPU time and 16.65 hours elapsed time (WU 920672411). Now running its first 400 task.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
|
||
|
branjo
Master Cruncher Slovakia Joined: Jun 29, 2012 Post Count: 1892 Status: Offline |
Did everything in my power to break them. Guess we'll see if everything validates. This gave me an opportunity to take two rack servers down repeatedly to try out different cooling fans.
What about hammer and nails? NI!
Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006 |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Returned a long one, nearly 11 hrs; I made it restart 5 times, including a reboot.
It went valid on return. |
||
|
ccandido
Senior Cruncher Joined: Jun 22, 2011 Post Count: 182 Status: Offline |
Regarding checkpointing, I have hundreds of lines like these in my log?
10/12/2013 21:37:29 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_7038_1 checkpointed
10/12/2013 21:38:02 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_6014_0 checkpointed
10/12/2013 21:38:42 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_2023_0 checkpointed
10/12/2013 21:42:56 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_2890_0 checkpointed
10/12/2013 21:44:09 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_6721_0 checkpointed
10/12/2013 21:46:53 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_0137_0 checkpointed
10/12/2013 21:46:54 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_1374_0 checkpointed
10/12/2013 21:47:19 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_0145_0 checkpointed
10/12/2013 21:47:32 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_7038_1 checkpointed
10/12/2013 21:48:02 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_6014_0 checkpointed
10/12/2013 21:48:43 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_2023_0 checkpointed
10/12/2013 21:52:58 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_2890_0 checkpointed
10/12/2013 21:54:08 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_6721_0 checkpointed
10/12/2013 21:56:55 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_0137_0 checkpointed
10/12/2013 21:56:56 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_1374_0 checkpointed
10/12/2013 21:57:23 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_0145_0 checkpointed
10/12/2013 21:57:36 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_7038_1 checkpointed
10/12/2013 21:58:04 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_6014_0 checkpointed
10/12/2013 21:58:48 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_2023_0 checkpointed
10/12/2013 22:03:07 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_2890_0 checkpointed
10/12/2013 22:04:15 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_6721_0 checkpointed
10/12/2013 22:06:56 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_0137_0 checkpointed
10/12/2013 22:06:57 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_1374_0 checkpointed
10/12/2013 22:07:25 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_0145_0 checkpointed
10/12/2013 22:07:41 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_7038_1 checkpointed
10/12/2013 22:08:07 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_6014_0 checkpointed
10/12/2013 22:08:49 | World Community Grid | [checkpoint] result BETA_MCM1_0000218_2023_0 checkpointed
10/12/2013 22:13:10 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_2890_0 checkpointed
10/12/2013 22:14:19 | World Community Grid | [checkpoint] result BETA_MCM1_0000400_6721_0 checkpointed
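For anyone wanting to check how often each task is actually checkpointing, here is a rough Python sketch that groups the `[checkpoint]` lines above by result name and measures the gap between consecutive checkpoints. The function name `checkpoint_intervals` is made up for illustration, and the day/month order of the timestamp is an assumption (it does not affect the intervals in this log).

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches one "[checkpoint]" event-log entry as pasted in the post above.
LINE_RE = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| [^|]+ \| "
    r"\[checkpoint\] result (\S+) checkpointed"
)

def checkpoint_intervals(log_text):
    """Return {result_name: [seconds between consecutive checkpoints]}."""
    times = defaultdict(list)
    for stamp, name in LINE_RE.findall(log_text):
        # Assumption: timestamps are day/month/year.
        times[name].append(datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"))
    return {
        name: [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
        for name, ts in times.items()
    }

if __name__ == "__main__":
    sample = (
        "10/12/2013 21:37:29 | World Community Grid | [checkpoint] result "
        "BETA_MCM1_0000400_7038_1 checkpointed\n"
        "10/12/2013 21:47:32 | World Community Grid | [checkpoint] result "
        "BETA_MCM1_0000400_7038_1 checkpointed\n"
    )
    print(checkpoint_intervals(sample))
```

On the two sample lines the gap is 603 seconds, so each task in this log appears to checkpoint roughly every ten minutes rather than continuously.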
||
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
And everything that was restarted is either PVer or invalid.
Guess we'll be seeing another beta, and I hope the techs can get some good data from this one.
Distributed computing volunteer since September 27, 2000 |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
So far, I have received 16 of the 7.27 Betas on three different Linux machines (32 and 64 bit), and all have recovered their last checkpoints properly after rebooting, or after suspending once the agent was updated with "no" to leaving the project in memory on the website profile (suspending, verifying via ps that the application is truly out of active memory, and then resuming). That has typically resulted in a setback of several minutes for the in-progress WUs that restarted, but it has appeared consistent so far. No short ones of less than a minute or long ones that go beyond 100% encountered so far.
Note: I also notice that these WUs do not use 100% of all the CPU cores available. Is that by design? [Edit 1 times, last edit by Former Member at Dec 11, 2013 3:08:39 AM] |
||
|
ccandido
Senior Cruncher Joined: Jun 22, 2011 Post Count: 182 Status: Offline |
Half of the WUs that I had running when I rebooted were marked invalid.
Did a new reboot today; no results completed yet. |
||
|
yoro42
Ace Cruncher United States Joined: Feb 19, 2011 Post Count: 8979 Status: Offline |
6 Invalid
BETA_MCM1_0000144_4245_0-- Bird Invalid 12/10/13 01:04:18 12/10/13 06:03:06 3.68 / 4.61 93.5 / 75.2
BETA_MCM1_0000144_3983_1-- Coltrane Invalid 12/10/13 01:02:49 12/10/13 05:29:24 3.56 / 3.99 104.9 / 86.4
BETA_MCM1_0000144_3479_0-- Coltrane Invalid 12/10/13 00:59:10 12/10/13 04:55:44 3.29 / 3.43 90.2 / 60.2
BETA_MCM1_0000144_3034_1-- Coltrane Invalid 12/10/13 00:56:00 12/10/13 05:02:22 3.29 / 3.53 92.7 / 71.5
BETA_MCM1_0000144_9245_0-- Coltrane Invalid 12/10/13 00:36:20 12/10/13 04:48:41 3.24 / 3.34 87.8 / 73.5
BETA_MCM1_0000144_8661_1-- Coltrane Invalid 12/10/13 00:33:11 12/10/13 05:02:22 3.31 / 3.55 93.2 / 48.6

Sample Result Log:
Result Name: BETA_MCM1_0000144_8661_1--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.27_windows_x86_64 -SettingsFile MCM1_0000144_8661.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Settings File
DateOfDesign = 11/08/2013
Designer = PMCC_OCI
WorkOrderID = 0000144_8661
DatasetID = 17_72_SDG_v1
NumberOfGenesInStartingSignature = 16
NumberOfGenesInSignatureMin = 10
NumberOfGenesInSignatureMax = 20
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 7738
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 7738
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = NFCV
ModelType = SVM
FitnessFn = 0
MinFitness = -1
SvmArgs = "-v 0 -c 0.01 -t 1 -d 3 -r 0"
SvmLearnLimit = 500000
RSeed = 348661
[18:07:54] Initializing
wcg_learn_limit = 500000
[18:08:01] Running
[18:08:01] EvaluateFitnessOfStartingGeneSignatures 7738
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.27_windows_x86_64 -SettingsFile MCM1_0000144_8661.txt -DatabaseFile dataset-17_72_SDG_v1.txt
[18:33:17] Initializing
wcg_learn_limit = 500000
[18:33:24] Running
[18:33:24] EvaluateFitnessOfStartingGeneSignatures 7738
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.27_windows_x86_64 -SettingsFile MCM1_0000144_8661.txt -DatabaseFile dataset-17_72_SDG_v1.txt
[18:37:53] Initializing
wcg_learn_limit = 500000
[18:38:07] Running
[18:38:07] EvaluateFitnessOfStartingGeneSignatures 7738
[22:00:42] Writing final output
[22:00:42] Closing Output Stream
[22:00:42] Cleaning up
Result.out = 1626955.000000
Run complete, CPU time: 11933.561664
22:00:42 (7456): called boinc_finish
</stderr_txt>
]]>
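Since the question in this beta is whether restarted tasks validate, it can help to read the restart count straight out of the result log. The sketch below assumes each `Commandline =` line marks one launch of the application (which is how the log above reads) and that the `Run complete, CPU time:` figure is in seconds. The helper name `summarize_stderr` is hypothetical.

```python
import re

def summarize_stderr(stderr_text):
    """Count application launches and convert the reported CPU time to hours."""
    # Assumption: the application prints "Commandline = ..." once per launch.
    starts = len(re.findall(r"Commandline\s*=", stderr_text))
    match = re.search(r"Run complete, CPU time:\s*([\d.]+)", stderr_text)
    # Assumption: the reported CPU time is in seconds.
    cpu_hours = float(match.group(1)) / 3600 if match else None
    return {
        "starts": starts,
        "restarts": max(starts - 1, 0),
        "cpu_hours": cpu_hours,
    }

if __name__ == "__main__":
    sample = (
        "Commandline = app -SettingsFile a.txt\n"
        "[18:08:01] Running\n"
        "Commandline = app -SettingsFile a.txt\n"
        "[18:33:24] Running\n"
        "Commandline = app -SettingsFile a.txt\n"
        "[18:38:07] Running\n"
        "Run complete, CPU time: 11933.561664\n"
    )
    print(summarize_stderr(sample))
```

Applied to the log above, this reports three launches (two restarts) and about 3.31 CPU hours, which agrees with the CPU time shown for that result in the list.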
||
|
cowtipperbs
Advanced Cruncher Joined: Aug 24, 2009 Post Count: 78 Status: Offline |
I have a task that has run for 24-plus hours and still has 2-plus hours left, and it is past the 1-day deadline. Should I abort the task?
Name: BETA_MCM1_0000218_7522 [Edit 1 times, last edit by cowtipperbs at Dec 11, 2013 4:52:56 AM] |
||
|
BladeD
Ace Cruncher USA Joined: Nov 17, 2004 Post Count: 28976 Status: Offline |
Why haven't my beta WUs started on 2 PCs? MCM WUs with a deadline of 12/15-16 are running, but the beta WUs with a deadline of 12/13 are not.