| World Community Grid Forums
Thread Status: Active Total posts in this thread: 7
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello,
I have been meaning to ask this for a while. When a WU with a validation copy does not receive the additional result in time, that result becomes "No Reply" and another copy is sent to reliable machines, with a turnaround of <2 days. Sometimes, for various reasons, the fast machine is not fast anymore, and a third copy is sent. Now, I notice quite often (and not only for resent WUs) something like this:

0000108470112200904091510_ 2-- - In Progress 5/31/11 14:00:36 6/3/11 09:12:36 0.00 0.0 / 0.0
X0000108470112200904091510_ 0-- 642 Error 5/24/11 13:59:42 5/31/11 13:59:35 0.00 0.0 / 0.0
X0000108470112200904091510_ 1-- 642 Pending Validation 5/24/11 13:59:25 5/25/11 16:17:07 1.96 34.5 / 0.0

As you can see, the WU errored out only 7 seconds before the deadline! That cannot be a coincidence! My question is: does this happen because BOINC is freaking out over all these WUs about to expire, and then does something crazy that ends up reporting errors? I think it has something to do with WUs that are about to expire and never started crunching, and the way BOINC manages them. Any thoughts? Thanks! |
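The "7 seconds before the deadline" observation can be checked from the timestamps of the Error row. This is a minimal sketch, assuming the deadline is exactly seven days after the send time (an assumption made for this example); the function name `seconds_before_deadline` is made up for illustration.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the result rows quoted above (month/day/year).
FMT = "%m/%d/%y %H:%M:%S"

# Assumption for this sketch: the deadline is exactly 7 days after sending.
DEADLINE = timedelta(days=7)


def seconds_before_deadline(sent: str, returned: str) -> float:
    """Return how many seconds before the deadline the result came back."""
    sent_at = datetime.strptime(sent, FMT)
    returned_at = datetime.strptime(returned, FMT)
    return ((sent_at + DEADLINE) - returned_at).total_seconds()


# Sent and returned times of the row with status "Error".
margin = seconds_before_deadline("5/24/11 13:59:42", "5/31/11 13:59:35")
print(margin)  # 7.0
```

A positive value means the result was reported before the assumed deadline; a negative one would mean it came back late.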
||
|
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3010 Status: Offline
|
My thoughts would be that BOINC is "freaking out" at having all these WUs to crunch and, instead of attempting to run them (when it's obvious to BOINC that they won't complete in time), it causes them to error out.
What we've got to remember is that HCC is the ONLY project (other than DDDT2) with a 7 day deadline, as opposed to the 10 day one all the other WCG projects get. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
latakia,
What client version is listed in the Result log of this task?

X0000108470112200904091510_ 0-- 642 Error 5/24/11 13:59:42 5/31/11 13:59:35 0.00 0.0 / 0.0

Maybe if you click the error link and post a copy of the Result log we can second guess. Some client versions are designed to abort a task on reaching its deadline, as in ''why waste time on this'', except WCG maybe does not know this state yet, so it is marked ''error''. We have:
- User Aborted
- Server Aborted
- Aborted (which filters out both on the Result Status page)
but not yet Client/Agent Aborted. But, as said, if you could first copy / paste the result log of the errored task into a reply, we might be able to learn more.
--//--
PS, I've noticed similar btw... an error but the log only listing the task name and client version... empty... a sign of an aborted task. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sekerob, it is exactly like you described in your last line.
So these are all signs of workunits aborted because there was no time to finish them. Thanks for the explanation. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Oh, I just noticed something!
One of my machines was in the reliable ---> not reliable anymore situation, and 4 results with a short deadline didn't make it in time... So in the status you see "error", BUT if you filter for "aborted" they come up! So basically they look like errors at first sight but in reality they are not: BOINC aborts them and, as Sekerob was saying, there is no such status as "BOINC aborted"... Good to know! edit: a g was missing... [Edit 1 times, last edit by Former Member at Jun 1, 2011 3:52:32 AM] |
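The pattern discussed in this thread (status Error, zero CPU time, reported within seconds of the deadline) can be turned into a rough check. This is only a heuristic sketch and not how WCG or BOINC classifies results: the `Result` type, the function name `looks_client_aborted`, the 7-day default and the 60-second window are all assumptions chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Timestamp layout used in the result rows quoted in this thread.
FMT = "%m/%d/%y %H:%M:%S"


@dataclass
class Result:
    """One row of a result list (hypothetical structure for this sketch)."""
    status: str
    sent: str
    returned: str
    cpu_hours: float


def looks_client_aborted(r: Result, deadline_days: int = 7,
                         window_s: int = 60) -> bool:
    """Guess whether an 'Error' row is really a task the client dropped.

    Heuristic only: the row must show status Error, no CPU time, and a
    return time within window_s seconds of the assumed deadline.
    """
    if r.status != "Error" or r.cpu_hours > 0:
        return False
    deadline = datetime.strptime(r.sent, FMT) + timedelta(days=deadline_days)
    gap = abs((deadline - datetime.strptime(r.returned, FMT)).total_seconds())
    return gap <= window_s


# The Error row from the first post of this thread.
row = Result("Error", "5/24/11 13:59:42", "5/31/11 13:59:35", 0.0)
print(looks_client_aborted(row))  # True
```

The empty result log mentioned earlier in the thread remains the better indicator; this check only flags rows worth a closer look.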
||
|
|
deltavee
Ace Cruncher Texas Hill Country Joined: Nov 17, 2004 Post Count: 4894 Status: Offline
|
WU completing after almost 15 days! Barely under the wire of wingman no. 3.
E202394_ 100_ C.25.C20H13N3SSi.00508265.2.set1d06_ 3-- 640 Valid 6/26/11 12:16:05 6/27/11 17:02:36 6.17 131.2 / 189.4
E202394_ 100_ C.25.C20H13N3SSi.00508265.2.set1d06_ 2-- - No Reply 6/22/11 12:06:07 6/26/11 12:06:07 0.00 0.0 / 0.0
E202394_ 100_ C.25.C20H13N3SSi.00508265.2.set1d06_ 0-- 640 Valid 6/12/11 12:16:16 6/12/11 22:57:33 10.28 224.7 / 189.4 <--Me
E202394_ 100_ C.25.C20H13N3SSi.00508265.2.set1d06_ 1-- 640 Valid 6/12/11 12:02:07 6/27/11 03:38:47 7.64 154.0 / 189.4 <--15 days! |
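The "almost 15 days" and "barely under the wire" figures can be worked out from the timestamps in the rows above. A small sketch; the helper name `elapsed` is made up for this example.

```python
from datetime import datetime

# Timestamp layout used in the result rows quoted above (month/day/year).
FMT = "%m/%d/%y %H:%M:%S"


def elapsed(start: str, end: str):
    """Return the time between two timestamps as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Copy _1: time from being sent to being returned.
turnaround = elapsed("6/12/11 12:02:07", "6/27/11 03:38:47")

# How long before copy _3 reported did copy _1 come in?
margin = elapsed("6/27/11 03:38:47", "6/27/11 17:02:36")

print(turnaround)  # 14 days, 15:36:40
print(margin)      # 13:23:49
```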
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Without knowing the client version it's a guess, but the 15 day "original" client probably did not talk to the servers for longer and, when it did, it had already started the task [user could have suspended it 2 weeks ago after running for a little]. Then they're not aborted. As long as the task is on the RS pages the ''grace'' period continues... technically ''too late'' though. Regrettably the client with _3 also did not talk to the servers prior to starting, else it would likely have been ''server aborted''.
Clients are designed to report late tasks immediately. A change coming is that the project can set a flag so that clients will report a task immediately upon completion. Could be one to employ for CEP2 only and/or for ''No Reply'' repair tasks. Something knreed might want to ponder on (probably has already :O). --//-- |
||
|
|
|