World Community Grid Forums
Thread Status: Active. Total posts in this thread: 567
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 2173 | Status: Offline
Unixchick said: At this point, I think it is not inconceivable that this is an intermittent, lingering effect of the general Internet outage caused by Cloudflare, which started shortly after 12:00 UTC... My queue is short so I can spot issues... My queue isn't refilling at a one in one out rate. This could be a short term issue that my short queue makes more noticeable, or I'm an early warning canary.

I think the system has [mostly] been in "trying to clear out pending retries" mode since around 14:50 UTC today. Since then I've managed to get a whole 3 (yes, three!) retries and nothing else :-(

Given that every scheduler request gets the "other platforms" message, I have to wonder what the bottleneck is, and whether it is specific to Linux users or not. A definitive answer could only be given by Dylan (or Igor), and they have been silent again for a week...

Ralf
Boca Raton Community HS
Senior Cruncher | Joined: Aug 27, 2021 | Post Count: 209 | Status: Offline
Yes, I spoke too soon, sorry about that.

Our system with the smallest cache just ran dry. Just for clarity, I don't think we have any systems that store more than one day of work.

Reply to the above post from TPCBF: that is a really good point about Cloudflare. Didn't think of that.

Additional edit: if anyone has a specific way that we (Boca Raton HS) can help sift through our results to look for something/anything, just let us know. I have a group of students that is ready to help at any point.

[Edit 2 times, last edit by Boca Raton Community HS at Nov 18, 2025 7:09:46 PM]
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 1293 | Status: Offline
I'm getting fresh MCM WUs now.
Grumpy Swede
Master Cruncher | Svíþjóð | Joined: Apr 10, 2020 | Post Count: 2493 | Status: Offline
Unixchick said: I'm getting fresh MCM WUs now.

Same here, Unixchick. A whole bunch of them filled my cache to my max setting.

[Edit 1 times, last edit by Grumpy Swede at Nov 19, 2025 2:23:54 AM]
MJH333
Senior Cruncher | England | Joined: Apr 3, 2021 | Post Count: 300 | Status: Offline
This is something I haven't seen very often... I'm currently processing MCM1_9999999_0017_9. See workunit 764547366.

Cheers, Mark
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 2173 | Status: Offline
MJH333 said: This is something I haven't seen very often... I'm currently processing MCM1_9999999_0017_9. See workunit 764547366. Cheers, Mark

Well, those _9999999_ WUs were likely test units only to begin with, so anything can happen with those, IMHO...

Ralf
MJH333
Senior Cruncher | England | Joined: Apr 3, 2021 | Post Count: 300 | Status: Offline
TPCBF said: Well, those _9999999_ WUs were likely test units only to begin with, so anything can happen with those, IMHO...

My event log contains the following messages for that task:

7941 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing file
7942 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing input file
7943 World Community Grid 19/11/2025 16:54:45 [error] Can't handle task MCM1_9999999_0017 in scheduler reply
7944 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing task MCM1_9999999_0017
7945 World Community Grid 19/11/2025 16:54:45 [error] Can't handle task MCM1_9999999_0017_9 in scheduler reply

That presumably explains why it has gone through so many iterations without success.

Cheers, Mark
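(For anyone who wants to watch for the same symptom on their own machine: the messages above come from the BOINC client's event log, which on most installations is also written to stdoutdae.txt in the BOINC data directory. Below is a minimal sketch, not an official tool, that scans that log for these particular errors and groups them by task name. The log path and the exact message wording are assumptions based only on the lines quoted above.)

```python
#!/usr/bin/env python3
# Hedged sketch: scan a BOINC client event log (stdoutdae.txt) for the
# "State file error" / "Can't handle task ... in scheduler reply" messages
# quoted above and group them by task name. The default log path and the
# message wording are assumptions; adjust them for your own setup.
import re
import sys
from collections import defaultdict
from pathlib import Path

LOG = Path(sys.argv[1] if len(sys.argv) > 1 else
           "/var/lib/boinc-client/stdoutdae.txt")  # assumed default location

ERROR_RE = re.compile(
    r"\[error\]\s+(State file error.*|Can't handle task\s+(\S+)\s+in scheduler reply)")

errors_by_task = defaultdict(list)
generic_errors = []

for line in LOG.read_text(errors="replace").splitlines():
    m = ERROR_RE.search(line)
    if not m:
        continue
    task = m.group(2)  # only set for the "Can't handle task ..." form
    if task:
        errors_by_task[task].append(line.strip())
    else:
        generic_errors.append(line.strip())

for task, lines in errors_by_task.items():
    print(f"{task}: {len(lines)} scheduler-reply error(s)")
print(f"other state-file errors: {len(generic_errors)}")
```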
Boca Raton Community HS
Senior Cruncher | Joined: Aug 27, 2021 | Post Count: 209 | Status: Offline
That work unit is destined for failure! Is there a limit before the server would terminate the work unit?

[Edit 1 times, last edit by Boca Raton Community HS at Nov 19, 2025 7:07:06 PM]
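(For context on the question above: a stock BOINC server does enforce such a limit. Each workunit carries max_error_results and max_total_results fields, and the transitioner stops issuing retries and marks the workunit as errored once either threshold is reached. WCG's actual values for MCM1 aren't given in this thread, so the numbers in the sketch below are placeholders, not WCG's settings.)

```python
# Hedged sketch of how a stock BOINC server decides a workunit has failed
# too many times. The limit values below are placeholders, not WCG's values.
from dataclasses import dataclass

@dataclass
class Workunit:
    name: str
    error_results: int          # results that ended in error
    total_results: int          # every result ever created for this WU
    max_error_results: int = 8  # placeholder; set per project/workunit
    max_total_results: int = 20 # placeholder; set per project/workunit

def too_many_failures(wu: Workunit) -> bool:
    """True if a stock BOINC transitioner would stop retrying this workunit."""
    return (wu.error_results >= wu.max_error_results
            or wu.total_results >= wu.max_total_results)

# The _9 suffix on MCM1_9999999_0017_9 means it is already the 10th result
# (numbering starts at _0), so whether it keeps being retried depends on the
# limits configured for that batch.
if __name__ == "__main__":
    wu = Workunit("MCM1_9999999_0017", error_results=9, total_results=10)
    print(too_many_failures(wu))
```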
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 1293 | Status: Offline
Just got some weird errors: https://www.worldcommunitygrid.org/contribution/workunit/776764589

I'm going to assume a bad batch. I got 3 in a row. Now I've got working WUs.

[Edit 1 times, last edit by Unixchick at Nov 19, 2025 9:14:24 PM]
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1316 | Status: Recently Active
Unixchick,

Those appear to be download errors - permanent HTTP error. Some days that seems to happen quite a few times mid-afternoon to early evening UTC. If the ones I'm getting are typical, it appears that one of the nodes has some sort of issue for a few minutes (as my failures at a given time all seem to be from fanouts in the same group).

I can only comment on tasks I've seen fail, as the fanout data doesn't appear anywhere in the stuff the APIs let us look at (so I wrote a script to monitor scheduler replies, where the information is available). I only have reliable data from 2025-11-14 onwards, though there were a lot of such errors before that day, some of them discussed in this thread during the first week of November, including Dylan mentioning a possible reason and a tentative fix.

On 2025-11-14 I saw errors around 14:30 and 15:40 (UTC) that seemed to be from the third group, and errors around 16:40 that seemed to be from the second group. I then had four error-free days, but today (2025-11-19) errors at around 14:55 seemed to be from the second group and errors at around 16:10 seemed to be from the fourth group.

Given that these errors seem to show up during their working day, I wonder if some development or diagnostic work is causing this, or whether it's a periodic issue we're going to have to get used to...

Note that in all cases, other users have managed to successfully download tasks for the WUs in question, so it's presumably not something corrupt on the download server(s).

ADDENDUM: I didn't check again until early on 2025-11-20, and it turned out there had been more largish batches of errors on 2025-11-19 at around 18:00 (fourth group), 18:15 (sixth group) and 18:45 (fourth group again). Not good...

Cheers - Al.

P.S. I think the master file is served from the second group...

[Edited for the addendum and the note about successful downloads]
[Edit 3 times, last edit by alanb1951 at Nov 20, 2025 3:51:01 AM]
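(Al's script isn't posted in the thread, but the general approach is reproducible: a stock BOINC client keeps the most recent scheduler reply for each project in its data directory, for WCG typically sched_reply_www.worldcommunitygrid.org.xml, and that XML lists each input file with its download URL, from which the fanout directory can be read. The sketch below illustrates the idea under those assumptions; the path, polling interval, and URL layout are guesses about a typical Linux install, not details from Al's actual script.)

```python
#!/usr/bin/env python3
# Hedged sketch of the idea described above: watch the last scheduler reply
# saved by the BOINC client and log which fanout directory each input file
# is served from. The reply filename and the .../download/<fanout>/<file>
# URL layout are assumptions based on stock BOINC; adjust for your setup.
import time
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse

REPLY = Path("/var/lib/boinc-client/sched_reply_www.worldcommunitygrid.org.xml")  # assumed path

def fanout_of(url: str) -> str:
    """Return the path component just before the filename (the fanout dir)."""
    parts = urlparse(url).path.rstrip("/").split("/")
    return parts[-2] if len(parts) >= 2 else "?"

def scan(reply_path: Path) -> None:
    # Scheduler replies are XML; each downloadable file appears as a
    # <file_info> element with a <name> and one or more <url> children.
    root = ET.parse(reply_path).getroot()
    for fi in root.iter("file_info"):
        name = fi.findtext("name", default="?")
        for url in fi.iter("url"):
            print(f"{time.strftime('%H:%M:%S')} {name} <- fanout {fanout_of(url.text or '')}")

if __name__ == "__main__":
    last_mtime = 0.0
    while True:                      # poll; the client rewrites the file after each RPC
        try:
            mtime = REPLY.stat().st_mtime
            if mtime != last_mtime:
                last_mtime = mtime
                scan(REPLY)
        except (OSError, ET.ParseError):
            pass                     # file missing or mid-rewrite; try again later
        time.sleep(30)
```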