World Community Grid Forums
Thread Status: Active Total posts in this thread: 7
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082
OK, I've been around long enough to know how WUs usually end up in "Pending Verification". I don't mind the occasional PendVer that pops up, or when one of my machines goes rogue and decides to poop out an error WU. I have checked, and I don't have any WUs in Error or Invalid status. Yet every FAHV WU from this particular machine (Win 7, Intel) has been in Pending Verification status since 12/15 1:15am (UTC?). The only thing that is different is a WU in "Server Aborted" status (which, coincidentally, came just before the PendVer status started). Is this what started the PendVer epidemic (at least for me)? Is there some other situation that causes the Pending Verification status? I know the WUs will eventually sort themselves out... I'm just curious as to why it happened.
Thanks,
CJSL
----------------------------------------
Gotta keep crunching, there's a world to save!!!
[Edit 1 times, last edit by cjslman at Dec 16, 2013 1:17:24 PM]
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7844
Yes, a server abort will trigger a bunch of pending verifications. The Pending Verification status is also used to periodically check a machine's integrity.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
I'm pretty sure I had an exchange on this with knreed, and he fixed the server-abort switch flipping, i.e. a server abort is not supposed to flip it. Who knows, maybe when the Result Status was fixed during the web update different check codes were assigned, but I strongly doubt that. Codes 202 / 203 (server / user aborted) are not supposed to impact the PVer series, i.e. the run of consecutive Valid results the device is expected to build up over the next 20+ results [it always has to have its last 20 results valid, or it starts all over again].
And confirmed: the change was applied on Oct. 12, 2012: "... user aborts and server aborts should no longer reset the 'consecutive valid' counter on the host_app_version table. That will allow them to retain their 'reliable' status." If a server abort does reset it now, there's a regression. At any rate, apart from the random result passing through PVer, my FAHV-only device (rarely a FAAH) is validated alone in 90%+ of the cases, and I see the occasional Server Abort for the odd repair that resulted from a No Reply. Do -not- abort tasks that have started: those aborts do trigger PVer for FA@H [I've never felt the need to abort a running CEP2 to find out what happens with that science]. Under the old server (v601) the trigger was supposed to be Invalid results only, but v700 is quite a bit more sensitive, mainly (my cooked-up theory) because it is quicker to grant "zero redundant" operating rights: 20+ consecutive Valid results under the new server, versus about 80 under the old one. [At the start of v700 it was said to be as few as 5, but that was a little too generous... my tracking consistently showed that more than 20 consecutive Valid results were required.] [8:25 AM, no caffè yet and already 3 paragraphs, but now I can forget this until it flares up again :]
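The counter rule described above can be sketched in Python as a minimal model under stated assumptions, not WCG's or BOINC's actual server code. The class, enum, and `aborts_reset_counter` flag are hypothetical names for illustration; the threshold of 20 comes from the poster's own tracking rather than a documented constant; and the rule that Invalid and Error results reset the run is an assumption. The flag lets you compare the post-Oct. 12, 2012 behaviour with the pre-fix behaviour (or a regression).

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    """Hypothetical result outcomes; the abort codes follow the 202/203 numbers quoted above."""
    VALID = auto()
    INVALID = auto()
    ERROR = auto()
    SERVER_ABORTED = auto()  # code 202 in the post
    USER_ABORTED = auto()    # code 203 in the post


# Assumption: ~20 consecutive valid results, as tracked in the post; not a documented constant.
RELIABLE_THRESHOLD = 20


@dataclass
class HostAppVersion:
    """Hypothetical stand-in for one host_app_version row (names are illustrative)."""
    consecutive_valid: int = 0
    # True models the pre-Oct-12-2012 behaviour (or a regression): aborts reset the counter.
    aborts_reset_counter: bool = False

    def record(self, outcome: Outcome) -> None:
        """Update the consecutive-valid run for one reported result."""
        if outcome is Outcome.VALID:
            self.consecutive_valid += 1
        elif outcome in (Outcome.SERVER_ABORTED, Outcome.USER_ABORTED):
            if self.aborts_reset_counter:
                self.consecutive_valid = 0
            # Otherwise the abort leaves the run untouched.
        else:
            # Assumption: invalid and error results restart the run from zero.
            self.consecutive_valid = 0

    @property
    def reliable(self) -> bool:
        """Per the thread, reliable hosts mostly get results validated alone (no PVer copy)."""
        return self.consecutive_valid >= RELIABLE_THRESHOLD


if __name__ == "__main__":
    fixed = HostAppVersion()
    broken = HostAppVersion(aborts_reset_counter=True)
    for host in (fixed, broken):
        for _ in range(25):
            host.record(Outcome.VALID)
        host.record(Outcome.SERVER_ABORTED)
    print(fixed.consecutive_valid, fixed.reliable)    # 25 True
    print(broken.consecutive_valid, broken.reliable)  # 0 False
```

In this model, a host with the fix keeps its "reliable" status through a server abort, while a host with the old behaviour drops back to zero and would need another 20+ Valid results, which matches the PVer avalanche described in the thread.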
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7844
I hardly ever see a server abort, but when I have, it has triggered a cascade of pending verifications. If this was supposed to be fixed, it may have reverted to its old form.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
I had several server aborts on FAHV in November and no PVer avalanche, so I have no idea what ticked the server off this time [after the MCM launch].
Name | DeviceName | Status | LastUpdate | Returned | CPUTime | Claimed | Granted | Elapsed | PPH | ProjectName | OSName | CPUModel | Speed | Sent | ElapsedTime
FAHV_x3AVLbINy3A_0313923_0130_0 | 2524499 | Valid | 14-11-2013 8:17:00 | 13-11-2013 23:48:23 | 0,96 | 33,7 | 33,7 | 13,50 | 35,10 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:18:38 | 0,98
FAHV_x3AVLbINy3A_0313923_0149_0 | 2524499 | Valid | 14-11-2013 8:17:00 | 13-11-2013 23:46:13 | 1,25 | 44,1 | 44,1 | 13,46 | 35,28 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:18:38 | 1,28
FAHV_x3AVLbINy3A_0313923_0307_0 | 2524499 | Valid | 14-11-2013 8:17:00 | 13-11-2013 23:41:27 | 1,37 | 48,8 | 48,8 | 13,38 | 35,62 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:18:38 | 1,40
FAHV_x3AVLbINleA_0313556_0180_1 | 2524499 | Valid | 14-11-2013 8:17:00 | 13-11-2013 19:07:02 | 1,10 | 36,6 | 35,9 | 15,58 | 32,64 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 3:32:28 | 1,10
FAHV_x3AVLbINy3A_0313923_0166_0 | 2524499 | Valid | 14-11-2013 0:13:00 | 13-11-2013 23:00:27 | 1,20 | 42,8 | 42,8 | 12,70 | 35,67 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:18:38 | 1,22
FAHV_x3AVLbINy3A_0313923_0310_0 | 2524499 | Valid | 14-11-2013 0:13:00 | 13-11-2013 22:50:44 | 1,19 | 42,3 | 42,3 | 12,53 | 35,55 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:18:38 | 1,21
FAHV_x3AVNbINy3A_0347384_0498_1 | 2524499 | Valid | 14-11-2013 0:13:00 | 13-11-2013 22:29:44 | 0,97 | 34,3 | 34,3 | 12,20 | 35,36 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:17:28 | 0,99
FAHV_x3AVLbINy3A_0313922_0308_1 | 2524499 | Valid | 14-11-2013 0:13:00 | 13-11-2013 22:17:40 | 1,30 | 45,8 | 40,9 | 12,02 | 31,46 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:16:20 | 1,32
FAHV_x3AVNbINy3A_0347381_0431_2 | 2524499 | Valid | 13-11-2013 22:50:00 | 13-11-2013 21:46:59 | 1,28 | 44,3 | 42,7 | 11,53 | 33,36 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:15:11 | 1,30
FAHV_x3AVNaINy3A_0347066_0247_1 | 2524499 | Valid | 13-11-2013 22:50:00 | 13-11-2013 21:37:10 | 1,21 | 41,6 | 42,0 | 15,60 | 34,71 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 6:01:06 | 1,24
FAHV_x3AVLbINleA_0313680_0304_0 | 2524499 | Valid | 13-11-2013 22:50:00 | 13-11-2013 21:30:09 | 1,19 | 41,1 | 41,1 | 16,52 | 34,54 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 4:58:45 | 1,21
FAHV_x3AVLbINleA_0313641_0350_0 | 2524499 | Valid | 13-11-2013 22:18:00 | 13-11-2013 20:17:39 | 1,17 | 38,3 | 38,3 | 15,99 | 32,74 | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 4:18:03 | 1,18
FAHV_x3AVNbINy3A_0347381_0419_2 | 2524499 | Server Aborted | 13-11-2013 20:01:00 | 13-11-2013 15:53:52 | 0,00 | | | 5,64 | | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:15:11 | 0,00
FAHV_x3AVNbINy3A_0347375_0443_2 | 2524499 | Server Aborted | 13-11-2013 20:01:00 | 13-11-2013 15:53:52 | 0,00 | | | 5,64 | | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:15:11 | 0,00
FAHV_x3AVNbINy3A_0347381_0441_2 | 2524499 | Server Aborted | 13-11-2013 20:01:00 | 13-11-2013 15:53:52 | 0,00 | | | 5,64 | | FAAH | Ubuntu-64 13.10 | Q6600 | 2400 | 13-11-2013 10:15:11 | 0,00
As the timestamps in this WCGDAWS extract show, the immediate validations continued for the _0 copies... no waiting on copies reported later (and HTD, there's no selective mining here). The 15.99-hour turnaround shows that task was sent to the host before the SAs and returned after them, and it was still validated. So either something broke again, or it just happened to be time for a periodic serial check, but only knreed et al. can verify that [it would be nice, if a tech is sent code hunting, to have a little more meat to work with]. An easy way to check: knreed could send a server abort for one task to device 2524499. This one does only FAHV, without a buffer... each task is returned within 2 hours of receipt... hmmm, well, then the aborted task would always be a started one, so I'll set the buffer to 0.1 days to make sure the host has some tasks In Progress. Verifying this ensures we do not waste time on unnecessary redundant copies [but it does work to verify both devices, the wingman too, i.e. it is not as redundant as we think]. (Most irritating program on the Sci-Fi channel... Ghost Hunters ;O)
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082
Thanks for the comments. After reading them, I was going to post the evidence, but the principal actor, the server-aborted WU, is nowhere to be found. But thanks anyway, it answered my question as to what caused the deluge of PVers. And that really sucks, because I had nothing to do with the server abort (that's kicked off by the server, not by me or my machine). Oh well, another day in paradise.
CJSL
----------------------------------------
Crunching for a better world...
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7844
I just had two server aborts on a machine yesterday, and this time they did not trigger any pending verifications. Whatever it was has been fixed, or fixed itself.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*