Total posts in this thread: 70
This topic has been viewed 14248 times and has 69 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy Project - Phase 2 Beta May 23, 2016 [ Issues Thread ]

Why are some of these WUs classified as "error" when no apparent error has occurred and they simply did not finish job 0 within the 18-hour time limit? Wouldn't it be more prudent to classify them as "time exceeded" or some such thing, and save the WCG servers from sending these WUs to more clients?
[May 24, 2016 5:22:21 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline

The reason is that there are surely other, faster devices that will get to checkpoint 1. Each task gets 5 chances to do that.
[May 24, 2016 5:29:56 PM]
pvh513
Senior Cruncher
Joined: Feb 26, 2011
Post Count: 260
Status: Offline

I got 7 of these WUs, of which 1 has finished so far. It exited with RC = 0x100 in job #0 and then skipped jobs #1 through #4. For my wingman it exited with RC = 0x100 in job #3 and then skipped job #4. This is now pending verification and a third unit has been sent out. The name is BETA_E236437_323_S.372.C52H28S2.WEULLIFNMVXNAH-UHFFFAOYSA-N.4_s1_14a.
[May 24, 2016 5:43:17 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline

" These work units are smaller in size than the previous test. Should allow for the first job to run faster. "

Not quite, as others have already observed. I had 3 of 5 on the laptop not checkpointing in the first 17.5 hours, and then fate struck... system momentarily busy, heartbeat troubles...

5519 World Community Grid 5/24/2016 7:08:02 PM Task BETA_E236438_79_S.384.C34F2H10N8O4S4.ATKBMOAMWBBVNU-UHFFFAOYSA-N.4_s1_14a_1 exited with zero status but no 'finished' file
5520 World Community Grid 5/24/2016 7:08:02 PM If this happens repeatedly you may need to reset the project.
5521 World Community Grid 5/24/2016 7:08:02 PM [checkpoint] result ZIKA_000001131_x1nb7_HCVJ4_RNAPol_wRNAand2Mn_chnA_0023_0 checkpointed
5522 World Community Grid 5/24/2016 7:08:03 PM [cpu_sched] Restarting task BETA_E236438_79_S.384.C34F2H10N8O4S4.ATKBMOAMWBBVNU-UHFFFAOYSA-N.4_s1_14a_1 using beta11 version 700 in slot 8
5523 5/24/2016 7:08:13 PM Suspending computation - CPU is busy
5524 World Community Grid 5/24/2016 7:08:13 PM [cpu_sched] Preempting BETA_E236438_79_S.384.C34F2H10N8O4S4.ATKBMOAMWBBVNU-UHFFFAOYSA-N.4_s1_14a_1 (left in memory)
5525 World Community Grid 5/24/2016 7:08:13 PM [cpu_sched] Preempting BETA_E236438_990_S.388.C40F5H13N6S3.FYRXAQSGOQBMGM-UHFFFAOYSA-N.9_s1_14a_0 (left in memory)
5526 World Community Grid 5/24/2016 7:08:13 PM [cpu_sched] Preempting BETA_E236438_293_S.392.C44F2H18N4S4.CRKKDMWLGUUGIU-UHFFFAOYSA-N.13_s1_14a_1 (left in memory)
5527 World Community Grid 5/24/2016 7:08:13 PM [cpu_sched] Preempting BETA_E236438_576_S.394.C44H20N2O4S4.JBZGKUDBQBIDKE-UHFFFAOYSA-N.15_s1_14a_1 (left in memory)
5528 World Community Grid 5/24/2016 7:08:13 PM [cpu_sched] Preempting ZIKA_000001118_x1nb7_HCVJ4_RNAPol_wRNAand2Mn_chnA_0058_2 (left in memory)
5529 World Community Grid 5/24/2016 7:08:13 PM [cpu_sched] Preempting ZIKA_000001123_x1nb7_HCVJ4_RNAPol_wRNAand2Mn_chnA_0006_0 (left in memory)
5530 World Community Grid 5/24/2016 7:08:13 PM [cpu_sched] Preempting ZIKA_000001123_x1nb7_HCVJ4_RNAPol_wRNAand2Mn_chnA_0322_0 (left in memory)
5531 World Community Grid 5/24/2016 7:08:13 PM [cpu_sched] Preempting ZIKA_000001131_x1nb7_HCVJ4_RNAPol_wRNAand2Mn_chnA_0023_0 (left in memory)
5532 World Community Grid 5/24/2016 7:08:13 PM Task BETA_E236438_990_S.388.C40F5H13N6S3.FYRXAQSGOQBMGM-UHFFFAOYSA-N.9_s1_14a_0 exited with zero status but no 'finished' file
5533 World Community Grid 5/24/2016 7:08:13 PM If this happens repeatedly you may need to reset the project.
5534 World Community Grid 5/24/2016 7:08:13 PM Task BETA_E236438_293_S.392.C44F2H18N4S4.CRKKDMWLGUUGIU-UHFFFAOYSA-N.13_s1_14a_1 exited with zero status but no 'finished' file
5535 World Community Grid 5/24/2016 7:08:13 PM If this happens repeatedly you may need to reset the project.
5536 World Community Grid 5/24/2016 7:08:13 PM Task BETA_E236438_576_S.394.C44H20N2O4S4.JBZGKUDBQBIDKE-UHFFFAOYSA-N.15_s1_14a_1 exited with zero status but no 'finished' file
5537 World Community Grid 5/24/2016 7:08:13 PM If this happens repeatedly you may need to reset the project.
5538 5/24/2016 7:08:23 PM Resuming computation

Of course, the reset advice is a no-go... this is CEP2 after all. Anyway, I caught them 20 minutes into the retry from the start and returned them to sender... it is doubtful these 3 would have made it to the first checkpoint on this device. The 4th, strangely, did not budge and has 3 checkpoints, so this one is good to finish in time. The 5th finished in 14:12 with 4 checkpoints.
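
For anyone wanting to pull the affected task names out of an event log like the one above, here is a minimal Python sketch. It assumes the log has been saved as plain text and that the message wording matches the excerpt exactly (other client versions may phrase it differently). The function name `tasks_without_finished_file` is made up for illustration and is not part of BOINC.

```python
import re

# Pattern for the client message seen in the log excerpt above.
# Assumption: the wording is exactly as in that excerpt.
NO_FINISHED = re.compile(
    r"Task (\S+) exited with zero status but no 'finished' file"
)

def tasks_without_finished_file(log_text):
    """Return affected task names in order of appearance, without duplicates."""
    seen = []
    for line in log_text.splitlines():
        match = NO_FINISHED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

# Three lines copied from the log excerpt above.
sample = """\
5519 World Community Grid 5/24/2016 7:08:02 PM Task BETA_E236438_79_S.384.C34F2H10N8O4S4.ATKBMOAMWBBVNU-UHFFFAOYSA-N.4_s1_14a_1 exited with zero status but no 'finished' file
5520 World Community Grid 5/24/2016 7:08:02 PM If this happens repeatedly you may need to reset the project.
5532 World Community Grid 5/24/2016 7:08:13 PM Task BETA_E236438_990_S.388.C40F5H13N6S3.FYRXAQSGOQBMGM-UHFFFAOYSA-N.9_s1_14a_0 exited with zero status but no 'finished' file
"""

for name in tasks_without_finished_file(sample):
    print(name)
```

Run against the full excerpt, this would list the four BETA tasks that restarted from zero, which makes it easier to decide which ones to abort.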
----------------------------------------
[Edit 1 times, last edit by SekeRob* at May 24, 2016 5:47:28 PM]
[May 24, 2016 5:45:35 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

Yikes, gotta say this forum doesn't tolerate diversity of opinion very well

But to be on topic, I've received a few betas, all valid, taking between 6 and 10 hours each.
[May 24, 2016 6:52:33 PM]
RTS48
Veteran Cruncher
Bolivia
Joined: Aug 2, 2009
Post Count: 1353
Status: Offline

Betas crunching fine and due to finish in about 18 hours EXCEPT I have just had a power cut (only a few minutes, but enough to shut down my UPS). When back up I find that all of my Betas have reset to zero, losing me 64 hours (8 hours by 8 cores) of crunch time. Why oh why does this Beta not do a CPU checkpoint? Please, please ensure that future Betas include a checkpoint so that folks like me (subject to random power cuts) can preserve most of the work already completed.
----------------------------------------
Rod Peel
Santa Cruz
Bolivia
South America

[May 24, 2016 11:24:24 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline

" Betas crunching fine and due to finish in about 18 hours EXCEPT I have just had a power cut (only a few minutes, but enough to shut down my UPS). When back up I find that all of my Betas have reset to zero, losing me 64 hours (8 hours by 8 cores) of crunch time. Why oh why does this Beta not do a CPU checkpoint? Please, please ensure that future Betas include a checkpoint so that folks like me (subject to random power cuts) can preserve most of the work already completed. "

Checkpointing still seems to be an issue. I thought one of the tests for this new beta was to remedy that problem but I'm still seeing tasks run past 11 hours before the first checkpoint.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[May 25, 2016 12:15:55 AM]
Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 823
Status: Offline

+1

I am over 14 hours in on a 3.4 GHz machine and the first checkpoint is yet to be reached.
[May 25, 2016 4:25:48 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

We still have an RC = 0x1 exit in Job #0 not validating with an RC = 0x1 exit in Job #3; instead both go to PVer and a repair unit gets issued. The question is whether one of the original pair still ends up Invalid ...

BETA_E236439_314_S.422.C44H18N4O2S6.PLTGJJHXMUKIKO-UHFFFAOYSA-N.12_s1_14a_2-- Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) - In Progress 25/05/16 03:55:21 29/05/16 03:55:21 0.00 0.0 / 0.0
BETA_E236439_314_S.422.C44H18N4O2S6.PLTGJJHXMUKIKO-UHFFFAOYSA-N.12_s1_14a_1-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) 700 Pending Verification 24/05/16 10:40:21 25/05/16 03:55:13 8.06 293.7 / 0.0
BETA_E236439_314_S.422.C44H18N4O2S6.PLTGJJHXMUKIKO-UHFFFAOYSA-N.12_s1_14a_0-- Microsoft x64 Edition, (10.00.10586.00) 700 Pending Verification 24/05/16 10:40:01 24/05/16 12:44:09 2.02 63.3 / 0.0
[May 25, 2016 9:15:14 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline

Maybe the validator needs an additional rule to always set the wingman canonical candidate to the one with the highest number of jobs completed. Suppose the 3rd copy only gets to job #2: which one is then of binding interest? In the example, for validation purposes, only look at the first 2 jobs that can be matched and assume the 3rd is fine. [I think this is how HCMD2 worked when 2 results had a different number of jobs completed in the allowed time.]
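
The rule proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual WCG validator: each result is modelled as a list of per-job outputs in job order, outputs are compared with plain equality (a real validator would likely use numeric tolerances), and the names `results_agree` and `pick_canonical` are invented here.

```python
def results_agree(a, b):
    """True if two results match on every job that both of them completed.

    Assumption: at least one common job is required, so a result that
    exited in job #0 (no completed jobs) cannot validate anything.
    """
    n = min(len(a), len(b))  # number of leading jobs present in both
    return n > 0 and a[:n] == b[:n]

def pick_canonical(results):
    """Pick the canonical candidate from {result_name: [job outputs]}.

    Candidates are tried from most to fewest jobs completed. The first
    one that agrees with some shorter-or-equal result wins. Returns
    None when no pair agrees, i.e. another copy would be needed.
    """
    names = sorted(results, key=lambda k: len(results[k]), reverse=True)
    for i, candidate in enumerate(names):
        for other in names[i + 1:]:
            if results_agree(results[candidate], results[other]):
                return candidate
    return None

# Hypothetical outputs: copy _0 exited in job #0, copy _1 completed
# three jobs, and the third copy _2 only completed two.
example = {
    "_0": [],
    "_1": ["out0", "out1", "out2"],
    "_2": ["out0", "out1"],
}
print(pick_canonical(example))
```

In this example only the first 2 jobs of `_1` and `_2` are compared, and `_1` is chosen because it completed the most jobs; its third job is taken on trust, as suggested above.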
[May 25, 2016 9:22:05 AM]