World Community Grid Forums
Thread Status: Active | Total posts in this thread: 352
GB033533
Senior Cruncher, UK | Joined: Dec 8, 2004 | Post Count: 206 | Status: Offline
Is anyone else getting this error msg again:

19/06/2025 10:30:43 | World Community Grid | Another scheduler instance is running for this host

or is it just me? I don't think I've really got another instance running.
Warped@RSA
Senior Cruncher, South Africa | Joined: Jan 15, 2006 | Post Count: 440 | Status: Offline
> Is anyone else getting this error msg again:
> 19/06/2025 10:30:43 | World Community Grid | Another scheduler instance is running for this host
> or is it just me? I don't think I've really got another instance running.

Yes, I am getting the same. I tried exiting BOINC, which did not help. I then rebooted the machine, still without success. The tasks have been uploaded but remain "Ready to Report".
Dave
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1316 | Status: Recently Active
Hey, ho; here we go again... Haven't seen this one for a while. It seems to have started at about 08:45 UTC today.

That's a server-side message, so there's nothing users can do to resolve it :-( -- in brief, it means that the server is having user-specific lock-file problems, probably because of filestore connectivity issues.

[Edit] Search, search, found it! I dug the following out of a post I made in late 2023...

> ...scheduler requests use a per-host lock file to ensure that there aren't two concurrent requests from one host. The file is created at the start of the request, holds the PID of the scheduler instance, and is deleted at the end of the request.
>
> There are two possible error conditions: one is that the lock file can't be acquired in the first place; the other is that there is an existing lock. Unfortunately, although the message written to the server log distinguishes the two cases, the message sent to the client does not.

In this case, I suspect the issue is an inability to create the lock file in the first place :-(

Cheers - Al.

[Final edits to add the time the problem started, then to correct the time I'd entered to UTC!]

[Edit 3 times, last edit by alanb1951 at Jun 19, 2025 11:04:45 AM]
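To make the mechanism concrete, here is a minimal sketch of the kind of PID-holding lock file Al describes -- my own illustration of a create-with-O_EXCL scheme, not BOINC's actual scheduler code; the function names and error text are assumptions:

```cpp
// Minimal sketch of a per-host lock file (illustrative, not BOINC's code).
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Try to take the per-host lock; returns the fd on success, -1 on failure.
// Note the two distinct failure modes Al mentions:
//   1) open() fails for some other reason -> the lock file can't be
//      created at all (e.g. filestore connectivity trouble);
//   2) errno == EEXIST -> a lock file already exists, i.e. another
//      scheduler instance appears to be handling this host.
int acquire_host_lock(const char* path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        if (errno == EEXIST)
            fprintf(stderr, "another scheduler instance is running for this host\n");
        else
            perror("can't create host lock file");
        return -1;
    }
    dprintf(fd, "%d\n", (int)getpid());  // record this instance's PID
    return fd;
}

// Release the lock at the end of the request.
void release_host_lock(const char* path, int fd) {
    close(fd);
    unlink(path);
}
```

Either failure path ends in the same message on the client side, which matches Al's point that users can't tell the two cases apart from their logs.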
ATHANASIOS PAN. GKOLIARAS
Cruncher | Joined: Dec 10, 2006 | Post Count: 10 | Status: Offline
The beloved WCG brings new problems every time :D
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1316 | Status: Recently Active
One of the annoying issues when we have a scheduler outage is that tasks which don't finish until near the deadline[*1] send in their results but are unable to report them -- a fair few tasks will end up as late returners (with redundant retries that may or may not get flushed before they get sent out -- see the sketch below...)

The build-up of No Reply tasks seems to have started, and is currently accompanied by retries being marked as "Waiting to be sent" (again, because of the scheduler outage)... Of course, this will sort itself out once the scheduler issue is resolved, but it might be a bit messy at first because of the number of out-of-work users who are going to need to download the (non-sticky) MCM1 master data file again :-(

At least events of this severity have been far less frequent recently. Roll on July, when the new data centre stuff should go live and the infrastructure used by WCG may improve...

Cheers - Al.

*1 Why some systems seem to need nearly 6 days to return tasks that should only take a few hours to run is a complex (and, I suspect, emotive) issue with no simple answer...
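For anyone picturing the mechanics, here is a rough sketch of how a deadline miss turns into a retry -- purely illustrative types, a made-up 6-day deadline, and my own reading of the thread, not WCG's actual transitioner code:

```cpp
// Illustrative sketch of server-side retry creation after a deadline
// miss; all types and names here are hypothetical, not BOINC's code.
#include <ctime>
#include <vector>

enum class ResultState { InProgress, NoReply, WaitingToSend };

struct Result {
    time_t deadline;
    ResultState state;
};

struct Workunit {
    std::vector<Result> results;
};

// Periodic pass over a workunit: any in-progress result past its
// deadline is marked No Reply and a retry is queued as "Waiting to
// be sent" -- during a scheduler outage those retries simply pile
// up in that state.
void transition(Workunit& wu, time_t now) {
    const std::size_t n = wu.results.size();  // don't walk newly added retries
    for (std::size_t i = 0; i < n; ++i) {
        Result& r = wu.results[i];
        if (r.state == ResultState::InProgress && now > r.deadline) {
            r.state = ResultState::NoReply;
            // New copy of the task with a fresh ~6-day deadline.
            wu.results.push_back({now + 6 * 24 * 3600, ResultState::WaitingToSend});
        }
    }
}
```

If the original host reports before its retry gets sent, the late result can still be credited and the unsent retry dropped -- the silver lining Mike points out a couple of posts down.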
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 1293 | Status: Recently Active
Wow. We haven't seen that error in a while. Hopefully the tech team will soon be in to fix this issue.

Thanks to everyone who reported on the problem and added extra details.
catchercradle
Senior Cruncher, England | Joined: Jan 16, 2009 | Post Count: 167 | Status: Offline
Thanks. Completely new one to me. Shame it had to happen at the same time as CPDN's servers went down in Oxford due to power going out in a server room. (That was about 24 hours ago, but there has been no update since Andy's email.)
Mike.Gibson
Ace Cruncher, England | Joined: Aug 23, 2007 | Post Count: 12594 | Status: Offline
Al,

With a bit of luck, the No Replies that finish before the restart will get credited, as the re-sends are also held up.

Mike
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 1293 | Status: Recently Active
> Message from server: We are currently experiencing high load and are temporarily deferring your scheduler request. Your client will automatically try again later.

I'm now getting this message in the logs, so they are working on the issue.
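For context on the "try again later" part: the client doesn't retry immediately but backs off between deferred scheduler requests. A minimal sketch of randomized exponential backoff -- the constants and function name are my own illustration, not the BOINC client's actual values:

```cpp
// Illustrative randomized exponential backoff for deferred scheduler
// requests; constants and names are made up, not BOINC's actual code.
#include <algorithm>
#include <cstdlib>

double next_defer_seconds(int consecutive_deferrals) {
    const double base = 60.0;          // first retry after about a minute
    const double cap  = 4.0 * 3600.0;  // never wait more than ~4 hours
    // Double the wait per consecutive deferral, up to the cap.
    double backoff = std::min(cap, base * double(1 << std::min(consecutive_deferrals, 12)));
    // Jitter (0.5x .. 1.5x) so a fleet of clients doesn't retry in lockstep.
    double jitter = 0.5 + std::rand() / (double)RAND_MAX;
    return backoff * jitter;
}
```

The jitter is the important bit here: once the scheduler comes back, thousands of clients all holding "Ready to Report" tasks would otherwise hit it at the same instant.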
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1316 | Status: Recently Active
[Edited to reflect that, by the time I'd wrangled this into shape, Unixchick had made a much shorter post on the subject...]

I notice that this has happened in the past, and in the days of IBM at that... Here's a thread from March 2014 -- I've linked to uplinger's initial reply.

It is quite interesting reading the rest of that thread from that point -- I wonder if Sgt. Joe remembers it (and I saw some other familiar names too...) Whilst it isn't about lock file problems, it does give an insight into "what happened next"...

Unfortunately, results can't be reported (even if one sets No New Tasks the request gets deferred...) so we're still stuck...

Cheers - Al.

P.S. Not necessarily connected to the lock file issue, but of note given some of the speculation in that thread... Over the last three days I've seen over 300 MCM1 No Reply tasks (about 20% of the tasks I processed) that looked likely to be from cloud instances that had been turned off without tidying up first, and that was before the scheduler became unable to send out retries... I wonder how many more of those might still be lurking? And is that only going to be a Linux issue??

Ah, well...

[Edit 1 times, last edit by alanb1951 at Jun 19, 2025 3:46:35 PM]