| World Community Grid Forums
Thread Status: Active Total posts in this thread: 3596
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
Sunday Report
10,768 units have been validated this week. Assuming that a full generation 182 will be the last, there are 1,626,322 units still outstanding. I will re-start my forecasting once output stabilises.

The definitions of Accelerated and Extreme are unchanged but have moved up a generation. There are 34 Extreme and 58 Accelerated units listed, as one has crossed over. The numbers in their generations are 2,357 & 3,824.

The end date for the project is too unpredictable to pin down, but seems to be hovering around 2027 at the moment.

Mike |
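As a rough illustration of the arithmetic behind a forecast like this, here is a minimal sketch that divides the outstanding units by this week's validated count. It is not Mike's actual method (the post does not describe it), and it assumes a constant weekly rate, which the report itself says has not yet stabilised. The function name `weeks_remaining` is made up for this example.

```python
def weeks_remaining(outstanding_units: int, validated_per_week: int) -> float:
    """Naive constant-rate estimate: outstanding work divided by weekly output."""
    if validated_per_week <= 0:
        raise ValueError("validated_per_week must be positive")
    return outstanding_units / validated_per_week


# Figures taken from the Sunday Report above.
weeks = weeks_remaining(1_626_322, 10_768)
years = weeks / 52.18  # average number of weeks in a year
print(f"{weeks:.1f} weeks, about {years:.1f} years at this week's rate")
```

Because the weekly count swings with the validation backlog, a single week's figure like this is only a snapshot and should not be read as the forecast.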
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1331 Status: Offline Project Badges:
|
"adri Your explanation is as good as any, but the error on your 44 seems to be an oddity - it waited 5 days before even being created! Or maybe it was but was delayed until its deadline. Mike"

Mike,

I suspect that the Error reported for the _0 task was probably "Validation Error" (as its stderr.txt seems to be reasonable!), and the validator wouldn't have even looked at the _0 return until the _1 task checked in, at which point the error(s) in the returned data from _0 would've become apparent and the _3 retry would have gone out (_1 was already late, so _2 had already gone out courtesy of the transitioner!)

There are several things that WCG just flags as Error when a more detailed error/status code is actually available (Not Started by Deadline is another example!) -- I see quite a few of these "stderr.txt looks o.k. but Error is reported" items amongst my wingmen, and I seem to recall that Adri had already seen evidence that that could indicate the validator flagged up an error (ill-formed or impossible data?) rather than a failure to match...

By the way, this doesn't just happen to ARP1 tasks...

Cheers - AL |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2355 Status: Offline Project Badges:
|
"adri Your explanation is as good as any, but the error on your 44 seems to be an oddity - it waited 5 days before even being created! Or maybe it was but was delayed until its deadline. Mike"

As it happens, luckily, after scrolling back quite some lines in my terminal's history (consisting of 5,000 lines), I found a moment on that screen where task _0 was still Pending Validation (!) and task _1 had turned into 'No Reply':

<1> ARP1_0010993_138_0 Linux openSU Pending Validation 2023-05-06T09:00:27 2023-05-07T15:09:59 5.54/5.62

(Of course, when I made my selection at the time, there was no number <44>, it was a different selection and ARP1_0010993_138 came out on top as number <1>; different selection, different rankings.)

So, of course at first there were only tasks _0 and _1. Then _0 was turned in, waiting for task _1 to complete. Task _1 got a No Reply after 6 days, so task _2 was issued to me. That's the situation that you see above (copied from my screen).

Now let's have a look at number <44> again from a selection that I made later on:

<44> ARP1_0010993_138_0 Linux openSU Error 2023-05-06T09:00:27 2023-05-07T15:09:59

(It's an exact copy of the situation that you saw earlier in this thread.)

You can see here that the OpenSUSE task _0 has changed into Error. What happened there? Well, one of the plausible reasons that task _3 was created is that once my task (_2) was returned at 2023-05-12T19:59:14, tasks _0 and _2 were compared and - we don't know what happened exactly, but another task was needed for verification of the result - so task _3 was created and sent to another device at 2023-05-12T21:41:47. Quite a few minutes later, task _1 was returned at 2023-05-12T22:28:59. With task _3 still In Progress, my task _2 was declared Valid at Fri 12 May 22:55:23 UTC 2023 and so was task _1. We can't tell when the OpenSUSE task _0 was declared as Error, maybe at the same time as when tasks _1 and _2 were declared Valid.

Anyway, when task _3 ended, it got the predicate Valid, too.

Adri |
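To make the sequence of events in the post above easier to follow, here is a small sketch that computes the gaps between the quoted timestamps. The timestamps are copied from the post; the labels, the `EVENTS` list and the `gaps` helper are made up for illustration, and all times are assumed to be in the same timezone (UTC).

```python
from datetime import datetime

# Timestamps copied from the post; assumed to share one timezone (UTC).
EVENTS = [
    ("_2 returned", "2023-05-12T19:59:14"),
    ("_3 sent", "2023-05-12T21:41:47"),
    ("_1 returned", "2023-05-12T22:28:59"),
    ("_2 declared Valid", "2023-05-12T22:55:23"),
]


def gaps(events):
    """Return (from_label, to_label, timedelta) for each consecutive pair of events."""
    parsed = [(label, datetime.fromisoformat(ts)) for label, ts in events]
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(parsed, parsed[1:])]


for start, end, delta in gaps(EVENTS):
    print(f"{start} -> {end}: {delta}")
```

On these figures, _3 went out roughly an hour and three quarters after _2 came back, and _1 arrived about 47 minutes after that.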
||
|
|
Ian C
Cruncher Joined: Jul 28, 2022 Post Count: 28 Status: Offline |
So I have this:
https://www.worldcommunitygrid.org/contribution/workunit/304031341
But now also this:
https://www.worldcommunitygrid.org/contribution/workunit/301783165
Something looks well hinky to me. |
||
|
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1310 Status: Offline Project Badges:
|
Sorry. I don't understand what is hinky. I'm missing something.

We have a validation issue at the moment. I think we all have a bunch of older WUs with 2 results that just sit in pending validation. Also newer resends may have validated just fine even though older WUs are still pending. Does the hinky fall in this category, or is there something else to add to the list of problems?

- I'm so looking forward to the boost that will come when all these WUs are validated.
[Edit 1 times, last edit by Unixchick at May 15, 2023 6:00:35 PM] |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1331 Status: Offline Project Badges:
|
"So I have this: https://www.worldcommunitygrid.org/contribution/workunit/304031341 But now also this: https://www.worldcommunitygrid.org/contribution/workunit/301783165 Something looks well hinky to me."

One of those work-units was created within the last 4 or 5 days so (as Unixchick says) it will definitely be caught up in the validator backlog. (Validators would usually look for the oldest work-units that are validation candidates...)

The other one is a classic example of the mess that happens when tasks don't return a success state, and the difficulties that might occur when trying to interpret the result status on the web site! -- It has an extra task in progress because the _0 task must have gone No Reply (the return date is after the deadline!) so the last task got sent out and may or may not get cancelled the next time the receiving client contacts WCG... (The other retries all stemmed from the first task flagged as Error.) As mentioned previously, the retries are issued without reference to the validator :-)

And, as regards the validation backlog, whilst I appreciate that validating ARP1 tasks takes longer as it has to do a bitwise comparison of the result files, I have to wonder if the validator(s) only run part-time as part of an attempt to cram all the needed BOINC processes onto insufficient processor threads...

Cheers - Al |
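For readers wondering what a "bitwise comparison of the result files" amounts to, here is a minimal sketch of comparing two files byte for byte. This is not WCG's or BOINC's validator code; the function name `files_identical` and the chunk size are assumptions made for this example.

```python
def files_identical(path_a, path_b, chunk_size=1 << 20):
    """Return True only if both files contain exactly the same bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                # Covers both differing content and differing lengths.
                return False
            if not chunk_a:
                # Both files reached end-of-file at the same point.
                return True
```

Every byte of both files has to be read when the results agree, which fits the point above that this kind of check takes longer for large result files.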
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
Al
Another reason for the last copy going out might have been that the 2 PVs didn't match, so an extra was sent to check. Mike |
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
Even with extreme status, 3 units have been validated with an average time of 12.5 days, including an ultra from generation 19.
Mike |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1331 Status: Offline Project Badges:
|
"Al Another reason for the last copy going out might have been that the 2 PVs didn't match, so an extra was sent to check. Mike"

Mike,

Thanks for that -- _0 was definitely late (and the system should've queued a retry at 04:35 or so on the 13th!), but I misread the outgoing time on _5 so that association was incorrect. However, if the validator found a mismatch it should've set them to Pending Verification, so I'm unsure what actually happened here as a standard BOINC server platform should not behave like this if all validators, transitioners and feeders are running constantly![1]

The sooner they can get enough hardware to run on, the better :-)

Cheers - Al

[1] "Managing" the ARP1 feeder would explain why retries sometimes seem to take quite a while to actually go out [Waiting to send?]... There also has to be something strange regarding the validator(s) (or they're turning it/them off for prolonged periods...)
[Edit 1 times, last edit by alanb1951 at May 16, 2023 1:33:51 AM] |
||
|
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1326 Status: Offline Project Badges:
|
https://www.worldcommunitygrid.org/contribution/workunit/301873584 ARP1_0022784_140

The above task has 2 results pending validation and another 2 results have been sent out; mine is the bottom one. Results 3 and 4 were sent out within about 17 minutes of the 2nd result being returned. Seems overkill to me, unless there is a major discrepancy between Windows 10 and Windows 11. My task has completed 10% in 37 minutes and 30 seconds.
[Edit 2 times, last edit by Speedy51 at May 16, 2023 4:17:50 AM] |
||
|
|
|