Thread Status: Active
Total posts in this thread: 567
This topic has been viewed 42067 times and has 566 replies
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Re: Project Status (First Post Updated)

Unixchick said:
My queue is short so I can spot issues... My queue isn't refilling at a one-in, one-out rate. This could be a short-term issue that my short queue makes more noticeable, or I'm an early warning canary.
I think the system has [mostly] been in "trying to clear out pending retries" mode since around 14:50 UTC today. Since then I've managed to get a whole 3 (yes, three!) retries and nothing else :-(

Given that every scheduler request gets the "other platforms" message I have to wonder what the bottleneck is, and whether it is specific to Linux users or not.
At this point, I think it is not inconceivable that this is an intermittent, lingering effect of the general Internet outage caused by Cloudflare, which started shortly after 12:00 UTC...

A definitive answer could only be given by Dylan (or Igor), and they have been silent again for a week...

Ralf
[Nov 18, 2025 6:44:03 PM]
Boca Raton Community HS
Senior Cruncher
Joined: Aug 27, 2021
Post Count: 209
Re: Project Status (First Post Updated)

Yes, I spoke too soon, sorry about that.

Our system with the smallest cache just ran dry. I don't think we have any systems that store more than one day of work, just for clarity.

Reply to the above post from TPCBF: that is a really good point about Cloudflare. Didn't think of that.

Additional edit: if anyone has a specific way that we (Boca Raton HS) can help sift through our results to look for something/anything, just let us know. I have a group of students that is ready to help at any point.
----------------------------------------
[Edit 2 times, last edit by Boca Raton Community HS at Nov 18, 2025 7:09:46 PM]
[Nov 18, 2025 6:48:33 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1293
Re: Project Status (First Post Updated)

I'm getting fresh MCM WUs now.
[Nov 19, 2025 2:20:35 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2493
Re: Project Status (First Post Updated)

Unixchick said:
I'm getting fresh MCM WUs now.
Same here Unixchick. A whole bunch of them filled my cache to my max setting.
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Nov 19, 2025 2:23:54 AM]
[Nov 19, 2025 2:23:25 AM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 300
Re: Project Status (First Post Updated)

This is something I haven’t seen very often … I’m currently processing MCM1_9999999_0017_9.
See workunit 764547366.
Cheers,
Mark
[Nov 19, 2025 3:56:18 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Re: Project Status (First Post Updated)

MJH333 said:
This is something I haven’t seen very often … I’m currently processing MCM1_9999999_0017_9.
See workunit 764547366.
Cheers,
Mark
Well, those _9999999_ WUs were likely test units only to begin with, so anything can happen with those IMHO....

Ralf
[Nov 19, 2025 4:23:10 PM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 300
Re: Project Status (First Post Updated)

TPCBF said:
MJH333 said:
This is something I haven’t seen very often … I’m currently processing MCM1_9999999_0017_9.
See workunit 764547366.
Cheers,
Mark
Well, those _9999999_ WUs were likely test units only to begin with, so anything can happen with those IMHO....

Ralf
Thanks Ralf.
My event log contains the following message for that task:
"7941 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing file
7942 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing input file
7943 World Community Grid 19/11/2025 16:54:45 [error] Can't handle task MCM1_9999999_0017 in scheduler reply
7944 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing task MCM1_9999999_0017
7945 World Community Grid 19/11/2025 16:54:45 [error] Can't handle task MCM1_9999999_0017_9 in scheduler reply"
That presumably explains why it has gone through so many iterations without success.
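For anyone wanting to sift their own event log for the same pattern, here is a minimal Python sketch that groups `[error]` lines by the task or workunit name they mention. The line layout in the regular expression is only inferred from the five lines quoted above, and `errors_by_task`, `LINE_RE` and `TASK_RE` are hypothetical names for illustration, not part of BOINC or any World Community Grid tool.

```python
import re
from collections import defaultdict

# Event log lines copied from the post above (surrounding quote marks dropped).
LOG = """\
7941 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing file
7942 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing input file
7943 World Community Grid 19/11/2025 16:54:45 [error] Can't handle task MCM1_9999999_0017 in scheduler reply
7944 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing task MCM1_9999999_0017
7945 World Community Grid 19/11/2025 16:54:45 [error] Can't handle task MCM1_9999999_0017_9 in scheduler reply
"""

# Assumed layout: <seq> <project> <dd/mm/yyyy> <hh:mm:ss> [error] <message>
LINE_RE = re.compile(
    r"^(?P<seq>\d+)\s+(?P<project>.+?)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\[error\]\s+(?P<msg>.*)$"
)

# Names shaped like MCM1_9999999_0017 or MCM1_9999999_0017_9 (assumed pattern).
TASK_RE = re.compile(r"\b[A-Z0-9]+_\d+_\d+(?:_\d+)?\b")


def errors_by_task(log_text):
    """Group [error] messages by the task/workunit name they mention."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an [error] line in the assumed layout
        task = TASK_RE.search(m["msg"])
        key = task.group(0) if task else "(no task named)"
        grouped[key].append(m["msg"])
    return dict(grouped)


if __name__ == "__main__":
    for name, messages in errors_by_task(LOG).items():
        print(name)
        for msg in messages:
            print("   ", msg)
```

Grouping this way keeps the messages that name the workunit (MCM1_9999999_0017) apart from those that name the specific task (MCM1_9999999_0017_9), which makes it easier to see that the failure happens before the task is ever started.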
Cheers,
Mark
[Nov 19, 2025 6:27:54 PM]
Boca Raton Community HS
Senior Cruncher
Joined: Aug 27, 2021
Post Count: 209
Re: Project Status (First Post Updated)

That work unit is destined for failure! Is there a limit before the server would terminate the work unit?
----------------------------------------
[Edit 1 times, last edit by Boca Raton Community HS at Nov 19, 2025 7:07:06 PM]
[Nov 19, 2025 7:06:28 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1293
Re: Project Status (First Post Updated)

Just got some weird errors
https://www.worldcommunitygrid.org/contribution/workunit/776764589

I'm going to assume a bad batch. I got 3 in a row. Now I've got working WUs.
----------------------------------------
[Edit 1 times, last edit by Unixchick at Nov 19, 2025 9:14:24 PM]
[Nov 19, 2025 9:08:46 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Re: Project Status (First Post Updated)

Unixchick,

Those appear to be download errors - permanent HTTP error.

Some days that seems to happen quite a few times mid-afternoon to early evening UTC time. If the ones I'm getting are typical, it appears that one of the nodes has some sort of issue for a few minutes (as my failures at a given time all seem to be from fanouts in the same group).

I can only comment on tasks I've seen fail, as the fanout data doesn't appear anywhere in the stuff the APIs let us look at (so I wrote a script to monitor scheduler replies, where the information is available ;-) ). I only have reliable data from 2025-11-14 onwards, though there were a lot of such errors before that day, some of them discussed in this thread during the first week of November, including Dylan mentioning a possible reason and tentative fix.

On 2025-11-14 I saw errors around 14:30 and 15:40 (UTC) that seemed to be from the third group and errors around 16:40 that seemed to be from the second group. I then had four error-free days, but today (2025-11-19) errors at around 14:55 seemed to be from the second group and errors at around 16:10 seemed to be from the fourth group.

Given that these errors seem to show up during their working day, I wonder if some development or diagnostic work is causing this, or whether it's a periodic issue we're going to have to get used to... Note that in all cases, other users have managed to successfully download tasks for the WUs in question, so it's presumably not something corrupt on the download server(s).

ADDENDUM: I didn't check again until early on 2025-11-20, and it turned out there had been more largish batches of errors on 2025-11-19 at around 18:00 (fourth group), 18:15 (sixth group) and 18:45 (fourth group again). Not good...
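For anyone who wants to keep a similar tally, the sightings listed above can be counted per fanout group and per UTC hour with a few lines of Python. This is only a sketch: the times are the approximate ones given in this post, and `OBSERVED` and `summarize` are hypothetical names, not taken from the monitoring script mentioned earlier.

```python
from collections import Counter
from datetime import datetime

# (approximate UTC time, fanout group) pairs as reported in this post.
OBSERVED = [
    ("2025-11-14 14:30", "third"),
    ("2025-11-14 15:40", "third"),
    ("2025-11-14 16:40", "second"),
    ("2025-11-19 14:55", "second"),
    ("2025-11-19 16:10", "fourth"),
    ("2025-11-19 18:00", "fourth"),
    ("2025-11-19 18:15", "sixth"),
    ("2025-11-19 18:45", "fourth"),
]


def summarize(observations):
    """Count error batches per fanout group and per UTC hour of day."""
    by_group = Counter(group for _, group in observations)
    by_hour = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts, _ in observations
    )
    return by_group, by_hour


if __name__ == "__main__":
    by_group, by_hour = summarize(OBSERVED)
    for group, count in by_group.most_common():
        print(f"{group} group: {count} error batch(es)")
    print("UTC hours with errors:", sorted(by_hour))
```

With only eight sightings this proves nothing by itself, but keeping the counts in one place makes it easier to see whether a single group or a particular time of day keeps coming up as more data arrives.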

Cheers - Al.

P.S. I think the master file is served from the second group...

[Edited for the addendum and the note about successful downloads]
----------------------------------------
[Edit 3 times, last edit by alanb1951 at Nov 20, 2025 3:51:01 AM]
[Nov 19, 2025 11:06:14 PM]