Thread Status: Active
Total posts in this thread: 3596
Posts: 3596   Pages: 360   [ Previous Page | 265 266 267 268 269 270 271 272 273 274 | Next Page ]
This topic has been viewed 5927114 times and has 3595 replies
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

Sunday Report

10,768 units have been validated this week

Assuming that a full generation 182 will be the last, there are 1,626,322 units still outstanding. I will re-start my forecasting once output stabilises.

The definitions of accelerated and extreme are unchanged, but they have moved up a generation.

There are 34 Extremes and 58 Accelerated units listed, as one has crossed over. The numbers in their generations are 2,357 and 3,824.

The end date for the project is too unpredictable to pin down, but it seems to be hovering around 2027 at the moment.
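As a rough cross-check, the simplest possible forecast divides the outstanding units by the weekly validation count and adds that many weeks to the report date. This is a naive sketch that assumes the rate stays constant (the report notes that output has not stabilised yet); `project_end_date` is a hypothetical helper. With this week's figures alone it lands before 2027, which mostly shows how sensitive such a projection is to a single week's rate.

```python
import math
from datetime import date, timedelta


def project_end_date(outstanding_units: int, units_per_week: int, report_date: date) -> date:
    """Naive linear projection: assumes the weekly validation rate stays constant."""
    if units_per_week <= 0:
        raise ValueError("units_per_week must be positive")
    weeks_left = math.ceil(outstanding_units / units_per_week)  # round up to whole weeks
    return report_date + timedelta(weeks=weeks_left)


# Figures from the Sunday report above (week ending May 14, 2023).
end = project_end_date(1_626_322, 10_768, date(2023, 5, 14))
print(end)
```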

Mike
[May 14, 2023 6:39:40 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1331
Status: Offline
Re: Work Available

adri

Your explanation is as good as any, but the error on your 44 seems to be an oddity - it waited 5 days before even being created! Or maybe it was but was delayed until its deadline.

Mike

Mike,

I suspect the Error reported for the _0 task was actually a "Validation Error" (its stderr.txt looks reasonable!). The validator wouldn't even have looked at the _0 return until the _1 task checked in; at that point the error(s) in the data returned by _0 would have become apparent, and the _3 retry would have gone out. (_1 was already late, so _2 had already gone out courtesy of the transitioner!)

There are several situations that WCG just flags as Error when a more detailed error/status code is actually available (Not Started by Deadline is another example!). I see quite a few of these "stderr.txt looks o.k. but Error is reported" items amongst my wingmen, and I seem to recall that Adri had already seen evidence that this could indicate the validator flagged up an error (ill-formed or impossible data?) rather than a failure to match...
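The sequence described here can be illustrated with a toy model: a returned result is not examined until enough results are in, and a result whose data then fails a plausibility check is flagged as a plain "Error" and triggers a retry. This is a simplified sketch assuming a quorum of 2; it is not BOINC's or WCG's actual validator, and `Result`, `well_formed` and `validator_pass` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

QUORUM = 2  # assumption: two returned results before the validator looks at anything


@dataclass
class Result:
    name: str
    data: Optional[bytes] = None  # None means the task has not been returned yet
    status: str = "In Progress"


def well_formed(data: bytes) -> bool:
    """Hypothetical stand-in for a project's "is this output even plausible?" check."""
    return len(data) > 0 and b"nan" not in data.lower()


def validator_pass(results: List[Result]) -> int:
    """Toy validator pass; returns how many retries it asks for."""
    returned = [r for r in results if r.data is not None]
    if len(returned) < QUORUM:
        # Too few returns: nothing is examined, so even bad data sits as Pending Validation.
        for r in returned:
            r.status = "Pending Validation"
        return 0
    retries = 0
    for r in returned:
        if well_formed(r.data):
            r.status = "Pending Validation"  # comparing good results is not modelled here
        else:
            r.status = "Error"  # shown on the web site as plain "Error"
            retries += 1
    return retries


# Usage: _0 comes back with bad data while _1 is still out.
r0 = Result("ARP1_0010993_138_0", data=b"NaN NaN")
r1 = Result("ARP1_0010993_138_1")
print(validator_pass([r0, r1]), r0.status)  # nothing happens yet
r1.data = b"1.0 2.0"  # _1 finally checks in
print(validator_pass([r0, r1]), r0.status)  # now _0 is flagged and a retry is requested
```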

By the way, this doesn't just happen to ARP1 tasks...

Cheers - AL
[May 14, 2023 9:59:38 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2355
Status: Offline
Re: Work Available

adri

Your explanation is as good as any, but the error on your 44 seems to be an oddity - it waited 5 days before even being created! Or maybe it was but was delayed until its deadline.

Mike

As it happens, after scrolling back quite a few lines in my terminal's history (which holds 5,000 lines), I luckily found a snapshot from when task _0 was still Pending Validation (!) and task _1 had turned into 'No Reply':
<1>    ARP1_0010993_138_0  Linux openSU  Pending Validation  2023-05-06T09:00:27  2023-05-07T15:09:59  5.54/5.62
<1>    ARP1_0010993_138_1  Fedora Linux  No Reply            2023-05-06T09:03:59  2023-05-12T09:03:59  0.00/0.00
<1> *  ARP1_0010993_138_2  Fedora Linux  In Progress         2023-05-12T09:04:12  2023-05-15T09:04:12  0.00/0.00

(Of course, when I made that selection at the time there was no number <44>: it was a different selection, and ARP1_0010993_138 came out on top as number <1>. Different selection, different rankings.)

So at first there were only tasks _0 and _1. Then _0 was returned and had to wait for task _1 to complete. Task _1 got a No Reply after 6 days, so task _2 was issued to me. That's the situation you see above (copied from my screen).
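The timestamps in that snapshot can be checked with a few lines of Python. Assuming the last timestamp on the No Reply and In Progress rows is the task's deadline (an interpretation of the listing, not something documented here), _1 had a 6-day window, the resend _2 only 3 days, and _2 went out 13 seconds after _1's deadline passed. `parse` and `snapshot` are illustrative names.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, FMT)


# (sent, second timestamp) pairs copied from the snapshot above. For the No Reply and
# In Progress rows the second timestamp is read as the deadline (an interpretation).
snapshot = {
    "ARP1_0010993_138_1": ("2023-05-06T09:03:59", "2023-05-12T09:03:59"),
    "ARP1_0010993_138_2": ("2023-05-12T09:04:12", "2023-05-15T09:04:12"),
}

windows = {task: parse(due) - parse(sent) for task, (sent, due) in snapshot.items()}
for task, window in windows.items():
    print(f"{task}: deadline window {window}")

# How long after _1's deadline passed did the resend _2 go out?
resend_gap = parse(snapshot["ARP1_0010993_138_2"][0]) - parse(snapshot["ARP1_0010993_138_1"][1])
print(f"resend issued {resend_gap} after the missed deadline")
```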

Now let's have a look at number <44> again from a selection that I made later on:
<44>    ARP1_0010993_138_0  Linux openSU  Error  2023-05-06T09:00:27  2023-05-07T15:09:59
<44>    ARP1_0010993_138_1  Fedora Linux  Valid  2023-05-06T09:03:59  2023-05-12T22:28:59
<44> *  ARP1_0010993_138_2  Fedora Linux  Valid  2023-05-12T09:04:12  2023-05-12T19:59:14
<44>    ARP1_0010993_138_3  Linux Ubuntu  Valid  2023-05-12T21:41:47  2023-05-13T06:31:57

(It's an exact copy of the situation that you saw earlier in this thread.)
You can see here that the OpenSUSE task _0 has changed into Error. What happened there?
Well, one plausible reason that task _3 was created is that once my task (_2) was returned at 2023-05-12T19:59:14, tasks _0 and _2 were compared and (we don't know exactly what happened) another task was needed to verify the result, so task _3 was created and sent to another device at 2023-05-12T21:41:47. About 47 minutes later, task _1 was returned at 2023-05-12T22:28:59. With task _3 still In Progress, my task _2 was declared Valid at Fri 12 May 22:55:23 UTC 2023, and so was task _1. We can't tell when the OpenSUSE task _0 was declared Error; maybe at the same time as tasks _1 and _2 were declared Valid. Anyway, when task _3 ended, it was declared Valid, too.
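Putting those timestamps in order makes the gaps explicit: _3 went out about 1 h 43 min after _2 came back, _1 returned 47 minutes after that, and the Valid verdict followed 26 minutes later. A small sketch, assuming all the times (table and post) are UTC; the event labels are just descriptive strings.

```python
from datetime import datetime

# Events for ARP1_0010993_138 taken from the table and text above; all assumed to be UTC.
events = [
    ("_2 returned", "2023-05-12T19:59:14"),
    ("_3 sent out", "2023-05-12T21:41:47"),
    ("_1 returned (late)", "2023-05-12T22:28:59"),
    ("_1 and _2 declared Valid", "2023-05-12T22:55:23"),
    ("_3 returned", "2023-05-13T06:31:57"),
]

timeline = sorted((datetime.fromisoformat(ts), label) for label, ts in events)
start, first_label = timeline[0]
print(f"{first_label:<26} {start.isoformat()}")
for (prev, _), (when, label) in zip(timeline, timeline[1:]):
    print(f"{label:<26} +{when - prev}")  # gap since the previous event
```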

Adri
[May 14, 2023 11:01:32 PM]
Ian C
Cruncher
Joined: Jul 28, 2022
Post Count: 28
Status: Offline
Re: Work Available

[May 15, 2023 5:31:01 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1310
Status: Offline
Re: Work Available

Sorry. I don't understand what is hinky. I'm missing something.

We have a validation issue at the moment. I think we all have a bunch of older WUs with 2 results that just sit in pending validation. Also newer resends may have validated just fine even though older WUs are still pending.

Does the hinky fall in this category, or is there something else to add to the list of problems?

- I'm so looking forward to the boost that will come when all these WUs are validated.
----------------------------------------
[Edit 1 times, last edit by Unixchick at May 15, 2023 6:00:35 PM]
[May 15, 2023 5:59:12 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1331
Status: Offline
Re: Work Available


One of those work-units was created within the last 4 or 5 days so (as Unixchick says) it will definitely be caught up in the validator backlog. (Validators would usually look for the oldest work-units that are validation candidates...)
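The effect of an oldest-first validator on a backlog can be sketched with a heap: each pass takes the oldest candidates, so a recently created work-unit waits behind everything older. This assumes creation time is the only priority and a fixed batch size per pass, which real validators need not follow; `next_batch` and the work-unit names are hypothetical.

```python
import heapq
from datetime import datetime
from typing import Dict, List


def next_batch(candidates: Dict[str, datetime], batch_size: int) -> List[str]:
    """Pick the oldest work-units first (toy model: creation time is the only priority)."""
    oldest = heapq.nsmallest(batch_size, candidates.items(), key=lambda item: item[1])
    return [name for name, _ in oldest]


# Hypothetical backlog: work-unit name -> creation time.
backlog = {
    "WU_recent": datetime(2023, 5, 11),
    "WU_old": datetime(2023, 4, 20),
    "WU_middle": datetime(2023, 5, 1),
}
print(next_batch(backlog, 2))  # the most recent work-unit has to wait for a later pass
```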

The other one is a classic example of the mess that happens when tasks don't return a success state, and of the difficulties that can arise when trying to interpret the result status on the web site! It has an extra task in progress because the _0 task must have gone No Reply (the return date is after the deadline!), so the last task got sent out and may or may not get cancelled the next time the receiving client contacts WCG... (The other retries all stemmed from the first task flagged as Error.) As mentioned previously, the retries are issued without reference to the validator :-)

And, as regards the validation backlog, whilst I appreciate that validating ARP1 tasks takes longer as it has to do a bitwise comparison of the result files, I have to wonder if the validator(s) only run part-time as part of an attempt to cram all the needed BOINC processes onto insufficient processor threads...
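For a feel of what a bitwise check involves: every byte of both uploaded files has to be read and compared, so the cost grows with the size of the result files. A minimal sketch in Python (the standard library's `filecmp.cmp(a, b, shallow=False)` does the same job); it is not WCG's validator, and the file names and contents in the usage are hypothetical.

```python
import tempfile
from pathlib import Path

CHUNK = 1 << 20  # read 1 MiB at a time so large result files need not fit in memory


def files_identical(a: Path, b: Path) -> bool:
    """Byte-for-byte comparison of two result files."""
    if a.stat().st_size != b.stat().st_size:
        return False  # cheap early exit: different sizes can never match
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            chunk_a, chunk_b = fa.read(CHUNK), fb.read(CHUNK)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files exhausted without a difference
                return True


# Usage with two throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    first, second = Path(tmp, "result_0.out"), Path(tmp, "result_1.out")
    first.write_bytes(b"\x00\x01\x02")
    second.write_bytes(b"\x00\x01\x02")
    same = files_identical(first, second)
    second.write_bytes(b"\x00\x01\x03")  # a single flipped bit is enough to fail
    differ = not files_identical(first, second)
print(same, differ)
```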

Cheers - Al
[May 15, 2023 8:33:09 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

Al

Another reason for the last copy going out might have been that the 2 PVs didn't match, so an extra was sent to check.

Mike
[May 15, 2023 10:58:03 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

Even with extreme status, 3 units have been validated with an average time of 12.5 days, including an ultra from generation 19.

Mike
[May 15, 2023 11:25:43 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1331
Status: Offline
Re: Work Available

Al

Another reason for the last copy going out might have been that the 2 PVs didn't match, so an extra was sent to check.

Mike

Mike,

Thanks for that -- _0 was definitely late (and the system should've queued a retry at 04:35 or so on the 13th!), but I misread the outgoing time on _5 so that association was incorrect.

However, if the validator found a mismatch it should've set them to Pending Verification, so I'm unsure what actually happened here as a standard BOINC server platform should not behave like this if all validators, transitioners and feeders are running constantly![1]
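The behaviour expected from a standard server can be sketched as a toy comparison step: if the returned results don't reach a quorum of identical outputs, they are all held as Pending Verification and another replica is requested, rather than one of them being flagged Error. The quorum of 2 and exact-match comparison are assumptions, `check_quorum` is a hypothetical name, and this is not BOINC's code.

```python
from collections import Counter
from typing import Dict, Tuple

QUORUM = 2  # assumption: two identical results are needed for agreement


def check_quorum(outputs: Dict[str, bytes]) -> Tuple[Dict[str, str], int]:
    """Toy comparison step: returns new statuses and how many extra replicas to request."""
    counts = Counter(outputs.values())
    best, votes = counts.most_common(1)[0] if counts else (None, 0)
    if votes >= QUORUM:
        # Agreement reached: matching results are Valid, the odd one out is Invalid.
        return {name: ("Valid" if out == best else "Invalid") for name, out in outputs.items()}, 0
    # No agreement yet: hold everything and ask for enough replicas to reach the quorum.
    return {name: "Pending Verification" for name in outputs}, QUORUM - votes


print(check_quorum({"_0": b"A", "_2": b"B"}))  # mismatch: wait for another copy
print(check_quorum({"_0": b"A", "_2": b"B", "_3": b"B"}))  # the extra copy breaks the tie
```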

The sooner they can get enough hardware to run on, the better :-)

Cheers - Al

[1] "Managing" the ARP1 feeder would explain why retries sometimes seem to take quite a while to actually go out [Waiting to send?]... There also has to be something strange regarding the validator(s) (or they're turning it/them off for prolonged periods...)
----------------------------------------
[Edit 1 times, last edit by alanb1951 at May 16, 2023 1:33:51 AM]
[May 16, 2023 1:27:40 AM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1326
Status: Offline
Re: Work Available

https://www.worldcommunitygrid.org/contribution/workunit/301873584 ARP1_0022784_140
The above work-unit has 2 results pending validation and another 2 results have been sent out; mine is the bottom one. Results 3 and 4 were sent out around 17 minutes after the 2nd result was returned.

Seems overkill to me, unless there is a major discrepancy between Windows 10 and Windows 11. My task has completed 10% in 37 minutes and 30 seconds.
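A straight-line extrapolation of that progress figure gives a rough total run time: 10% in 37.5 minutes points to about 6 h 15 min overall. This is only a sketch that assumes progress advances at a constant rate, which is not guaranteed for ARP1 tasks; `estimate_total_minutes` is a hypothetical helper.

```python
def estimate_total_minutes(percent_done: float, elapsed_minutes: float) -> float:
    """Linear extrapolation: assumes progress advances at a constant rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_minutes * 100 / percent_done


total = estimate_total_minutes(10, 37.5)  # 10% after 37 minutes 30 seconds
hours, minutes = divmod(round(total), 60)
print(f"roughly {hours} h {minutes} min in total if progress stays linear")
```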
----------------------------------------
[Edit 2 times, last edit by Speedy51 at May 16, 2023 4:17:50 AM]
[May 16, 2023 4:08:49 AM]