World Community Grid Forums
Category: Beta Testing | Forum: Beta Test Support Forum | Thread: New Set of Beta WU's - FAAH
Thread Status: Active | Total posts in this thread: 484
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1313
Short deadlines for resends are not only a problem for the HPF2 betas.
I got a FAAH-beta resend on a 1500 MHz machine (running 24/7 at 100%), which of course went straight into High Priority with an originally estimated runtime of 59 hours:
BETA_faah23192_ZINC00038022_x1HHPxtl_01_2-- 668066 In Progress 03/07/11 15:39:00 04/07/11 10:51:00 0.00 0.0 / 0.0
The estimated runtime has since dropped (now at 76%) to ~20 hours, but the result will still return Too Late, triggering a resend to a new wingman. Hopefully that new resend will be server-aborted when my result comes back.
Edit: (had a closer look, quorum 1) I'm the worthless wingman. The first came in too late, but errored. The second came in too late, but was valid. And I will return too late (not my fault), and because a valid result has already been returned, I will get no granted credit or associated runtime for this task:
BETA_faah23192_ZINC00038022_x1HHPxtl_01_2-- - In Progress 03/07/11 15:39:00 04/07/11 10:51:00 0.00 0.0 / 0.0
BETA_faah23192_ZINC00038022_x1HHPxtl_01_1-- 640 Valid 02/07/11 02:07:42 03/07/11 16:09:12 8.29 178.7 / 165.8
BETA_faah23192_ZINC00038022_x1HHPxtl_01_0-- 640 Error 30/06/11 02:07:23 03/07/11 21:34:46 1.76 36.9 / 0.0
[Edit 2 times, last edit by Crystal Pellet at Jul 4, 2011 7:04:54 AM]
Mathilde2006
Senior Cruncher | Germany | Joined: Sep 30, 2006 | Post Count: 269
Quote (Crystal Pellet): "Short deadlines for resends are not only a problem for the HPF2 betas. I got a FAAH-beta resend on a 1500 MHz machine (running 24/7 at 100%), which of course went straight into High Priority with an originally estimated runtime of 59 hours [...] The estimated runtime has since dropped (now at 76%) to ~20 hours, but the result will still return Too Late, triggering a resend to a new wingman. Hopefully that new resend will be server-aborted when my result comes back."
IMHO the WU will get "No Reply", and once it's returned you'll get credit. AFAIK there is a grace period between 'No Reply' (returned after the deadline, still credited) and 'Too Late' (no credit).
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
By now, all seasoned WCG crunchers know that the techs are extremely lenient in granting credit, working on the assumption that a fault is always on WCG's side. So, e.g., "Too Late" results and results stuck at "No Reply" get pushed through when a tech is on deck and takes time out to force matters in your favor. If the occasional credit does not come through, then we bite our tongues and crunch on... "This is what we do" (any association with a movie line is unintended :)
--//--
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1313
I don't bite my tongue, and the loss of credit is no pain.
The loss of 20 hours of Beta runtime is a bit of a pain, though. But talking here about lost credit and runtime deflects attention from the real problem: a resend deadline of only 40% of the original 2-day deadline is much too short for these long-running tasks, even for machines that are always on rather than switched off at night.
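The 40% figure can be checked against the resend in Crystal Pellet's first post, which was sent 03/07/11 15:39:00 and due 04/07/11 10:51:00. The minimal sketch below assumes the listing uses DD/MM/YY dates (consistent with the other rows in this thread) and a 2-day original deadline, as stated above; it just measures the window actually granted.

```python
from datetime import datetime, timedelta

# Sent and due times of the FAAH beta resend quoted in this thread.
# Assumption: the result listing uses DD/MM/YY dates.
FMT = "%d/%m/%y %H:%M:%S"
sent = datetime.strptime("03/07/11 15:39:00", FMT)
due = datetime.strptime("04/07/11 10:51:00", FMT)

window = due - sent                    # reporting window actually granted
original_deadline = timedelta(days=2)  # the original 2-day beta deadline

# Dividing one timedelta by another yields a plain float ratio.
fraction = window / original_deadline

print(window)                                      # 19:12:00
print(window.total_seconds() / 3600)               # 19.2
print(f"{fraction:.0%} of the original deadline")  # 40% of the original deadline
```

The 19 h 12 min window is the same 0.4 × 48 = 19.2 hours that kateiacy quotes later in the thread.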
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
I think the Dutch have an expression along the lines of "Shared pain is half the pain"... It got me too: it wreaked havoc on my quad, which eventually had 15 tasks in memory due to the continuous priority switching. That state lasted from Friday through early Monday morning. 2 of the 3 errors did not get credit (worth 13 hours); the other 39 good results did, except for the 8 CEP2 tasks stuck in PV jail.
While the short deadlines were a disturbance in the posterior, they did lead to a full simulation, including the handling of panic states. The tech will have learned what to optimize. Cheers, and thanks for helping to test.
--//--
edit: Those errors could have been credited in passing. BTW, when I speak/spoke of "credit", it's in the widest sense... points, time, result count.
edit2: "pain" would probably have been better translated as "sorrow", virtual pain :D
[Edit 2 times, last edit by Former Member at Jul 4, 2011 9:15:10 AM]
sk..
Master Cruncher | Joined: Mar 22, 2007 | Post Count: 2324
The good news for me was 75 days of Betas from this run.
The good news for the project is that it was a good paradigm for the real thing. The bad news is that during the weekends some systems are off:
BETA_op963_00015_20-- c No Reply 02/07/11 05:49:11 03/07/11 04:41:02 0.00 0.0 / 0.0
BETA_op963_00023_18-- t1 No Reply 02/07/11 04:21:28 04/07/11 04:21:28 0.00 0.0 / 0.0
BETA_X0000116520982200912241453_1-- c No Reply 01/07/11 20:08:20 03/07/11 20:08:20 0.00 0.0 / 0.0
BETA_X0000116520784200912171557_1-- c No Reply 01/07/11 19:21:22 03/07/11 19:21:22 0.00 0.0 / 0.0
BETA_faah23195_ZINC01378136_x1HHPxtl_02_0-- c No Reply 01/07/11 18:08:56 03/07/11 18:08:56 0.00 0.0 / 0.0
BETA_op991_00015_6-- o No Reply 01/07/11 10:07:44 03/07/11 10:07:44 0.00 0.0 / 0.0
BETA_op982_00016_6-- o No Reply 01/07/11 08:47:54 03/07/11 08:47:54 0.00 0.0 / 0.0
BETA_op984_00002_1-- L No Reply 01/07/11 08:23:33 03/07/11 08:23:33 0.00 0.0 / 0.0
BETA_op984_00056_16-- M No Reply 01/07/11 08:05:33 03/07/11 08:05:33 0.00 0.0 / 0.0
BETA_op988_00012_12-- M No Reply 01/07/11 07:30:31 03/07/11 07:30:31 0.00 0.0 / 0.0
BETA_op983_00036_6-- M No Reply 01/07/11 07:14:33 03/07/11 07:14:33 0.00 0.0 / 0.0
BETA_op982_00010_18-- a No Reply 01/07/11 06:46:32 03/07/11 06:46:32 0.00 0.0 / 0.0
BETA_op985_00002_18-- M No Reply 01/07/11 04:03:16 03/07/11 04:03:16 0.00 0.0 / 0.0
BETA_op988_00064_14-- a No Reply 01/07/11 03:37:30 03/07/11 03:37:30 0.00 0.0 / 0.0
BETA_faah23194_ZINC00819142_x1HHPxtl_01_0-- M Too Late 30/06/11 22:14:26 04/07/11 08:29:25 8.18 93.2 / 0.0
BETA_faah23194_ZINC06501506_x1HHPxtl_02_0-- a Too Late 30/06/11 22:08:08 04/07/11 08:35:30 9.84 128.0 / 0.0
I think I was also caught out here and there by a slightly too large cache, and with such short deadlines, systems that do not upload and report tasks immediately are probably being caught out too. I also have a few systems with network restrictions that only use the Internet at night.
Oh well, such problems make it more realistic, and perhaps someone will do manual credit awards...
kateiacy
Veteran Cruncher | USA | Joined: Jan 23, 2010 | Post Count: 1027
I agree with Crystal Pellet that the 0.4 × 48 = 19.2-hour deadline for these beta repair jobs is a problem for the sciences that don't cap their run times. I kept my cache under 1 day even during the beta downpour, but my slow machines were hard-pressed to get the repair jobs in on time even so. (I had to turn off HT for a while on 2 machines to speed up processing of individual WUs, so I sacrificed total crunching output to finish betas.) But if short deadlines are what is required to test the servers, so be it.
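The arithmetic above can be turned into a rough feasibility check. This is a sketch under stated assumptions, not how the BOINC client or the WCG servers actually decide anything: it assumes a resend gets 0.4 of the original 48-hour deadline, and that a task makes it in time only if the hours it sits in the local cache plus its wall-clock runtime fit inside that window. The names `resend_window_hours` and `meets_deadline` are hypothetical.

```python
# Hypothetical helpers illustrating the resend-deadline arithmetic from this thread.
# Assumption: a resend's deadline is RESEND_FACTOR * the original deadline
# (0.4 * 48 h = 19.2 h for these betas). Real BOINC scheduling is more complex.

ORIGINAL_DEADLINE_H = 48.0  # original 2-day beta deadline, in hours
RESEND_FACTOR = 0.4         # fraction of the original deadline given to resends


def resend_window_hours(original_h: float = ORIGINAL_DEADLINE_H,
                        factor: float = RESEND_FACTOR) -> float:
    """Length of the reporting window a resend gets, in hours."""
    return original_h * factor


def meets_deadline(runtime_h: float, queue_wait_h: float = 0.0,
                   original_h: float = ORIGINAL_DEADLINE_H,
                   factor: float = RESEND_FACTOR) -> bool:
    """True if waiting queue_wait_h hours in the local cache and then running
    for runtime_h wall-clock hours still finishes inside the resend window.
    Upload and reporting delays are ignored."""
    return queue_wait_h + runtime_h <= resend_window_hours(original_h, factor)


if __name__ == "__main__":
    print(f"resend window: {resend_window_hours():.1f} h")  # resend window: 19.2 h
    # An initial 59-hour estimate cannot fit, even with no queue wait.
    print(meets_deadline(59.0))                    # False
    # A 10-hour task fits if it starts within a few hours of download...
    print(meets_deadline(10.0, queue_wait_h=6.0))  # True
    # ...but not if it sits behind half a day of cached work.
    print(meets_deadline(10.0, queue_wait_h=12.0)) # False
```

In this simplified model, a smaller cache shrinks `queue_wait_h`, while turning HT off, as described above, shortens `runtime_h` for each task at the cost of total throughput.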
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Hello,
I'm reposting my question, since there was too much going on for it to get full attention: http://www.worldcommunitygrid.org/forums/wcg/...d,31437_offset,120#330579
Thanks
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Quote (Mathilde2006): "IMHO the WU will get 'No Reply', and once it's returned you'll get credit. AFAIK there is a grace period between 'No Reply' (returned after the deadline, still credited) and 'Too Late' (no credit)."
Does anyone know how long the grace period is? I just had an HPF2 beta finish past the due time and go into PV. After some time (over an hour, I believe), it changed to "Too Late". The job already had quorum, so why does it bother putting it into PV first? Just to get my hopes up? I know the Error ones have been changing to full credit. It will be interesting to see whether the Too Late ones do too.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
"Grace" lasts until the results are assimilated and removed from the live DB. Since the main slowdown is driven by the scheduler, the drive to complete that is substantial. Clean Water results don't last 24 hours after validation; Clean Energy maybe 48 hours. For HCMD2 it's probably longer because of the batch structure. In a recent drive, I haven't seen HCC results stay on for longer than 24-36 hours after validation [IIRC].
"Too Late" is a pathway used to line a result up for credit when, e.g., wingmen are missing [redistribution held for selected tasks]. To answer latakia: no idea, other than knreed writing on Friday that he'd continue to look into the BETA CEP2 download issue. Some saw extremely slow download speeds [but one had up to 1 Mbps for them]. We'll find out when the time comes.
--//--
edit: insert "seen"
[Edit 1 times, last edit by Former Member at Jul 5, 2011 7:50:11 AM]