World Community Grid - View Thread - 2024-03-11 Update (Website errors and MCM1 work unit availability)

World Community Grid Forums

Category: Official Messages

Forum: News

Thread: 2024-03-11 Update (Website errors and MCM1 work unit availability)

Thread Status: Active
Total posts in this thread: 10

Author

This topic has been viewed 109010 times and has 9 replies

TigerLily
Senior Cruncher
Joined: May 26, 2023
Post Count: 280
Status: Offline
Project Badges:


2024-03-11 Update (Website errors and MCM1 work unit availability)

We are working to resolve the frequent 503 Service Unavailable errors that users encounter when the load balancer is unable to connect to the production webserver. To do so, we are adding more instances of the webserver to share the load, as well as an additional load balancer. We expect this work to be completed this week.

Regarding the lack of available MCM1 work units, we noticed last week that the createWork process was not loading new MCM1 work units. We were able to begin building batches of work units again on Friday evening (March 8). However, throughput is still poor, so we will be pushing additional changes to prod to reduce the batch size (in terms of enumerated work units) of SQL queries hitting the BOINC database from various BOINC components, reduce the LIMIT value in some queries, and adjust the parameters of the database and database clients on the relevant servers to allow for larger MySQL packets containing these remote queries. We will communicate whether these changes resolve the issue as we expect; the outcome should be clearly reflected in the logs on the BOINC database server. As soon as we have made these changes and confirmed the result, we will update everyone here.
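As a rough illustration of the batching idea described above (not the actual WCG or BOINC code), the following Python sketch reads rows in small LIMIT-bounded batches, resuming each query after the last id seen. The `workunit` table, its columns, the sample names and the `fetch_in_batches` helper are all hypothetical, and an in-memory SQLite database stands in for the BOINC MySQL database.

```python
import sqlite3


def fetch_in_batches(conn, batch_size):
    """Yield lists of (id, name) rows, at most batch_size rows per query.

    Each query resumes after the last id seen (keyset pagination), so no
    single query has to return or transmit a large result set.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM workunit WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


def demo():
    # Hypothetical schema and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE workunit (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO workunit (id, name) VALUES (?, ?)",
        [(i, f"MCM1_{i:07d}") for i in range(1, 11)],
    )
    # Ten rows read with LIMIT 4 arrive as batches of 4, 4 and 2 rows.
    return [len(batch) for batch in fetch_in_batches(conn, 4)]


if __name__ == "__main__":
    print(demo())
```

A smaller `batch_size` means more round trips but smaller individual result sets. The packet-size adjustment mentioned above presumably concerns MySQL's `max_allowed_packet` setting, although the post does not name it.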

Thank you for your support, and for your patience as we work to resolve these issues.

WCG Team

[Mar 11, 2024 5:33:56 PM]

Barnsley_Tatts
Senior Cruncher
Joined: Nov 3, 2005
Post Count: 291
Status: Offline
Project Badges:

90 day badge for Human Proteome Folding - Phase 2

14 day badge for Help Cure Muscular Dystrophy

45 day badge for Nutritious Rice for the World

90 day badge for Help Fight Childhood Cancer

90 day badge for The Clean Energy Project - Phase 2

90 day badge for Drug Search for Leishmaniasis

90 day badge for GO Fight Against Malaria

20 year badge for Mapping Cancer Markers

1 year badge for Uncovering Genome Mysteries

5 year badge for Outsmart Ebola Together

2 year badge for FightAIDS@Home - Phase 2

10 year badge for Smash Childhood Cancer

5 year badge for Microbiome Immunity Project

90 day badge for Africa Rainfall Project

20 year badge for OpenPandemics - COVID-19


Re: 2024-03-11 Update (Website errors and MCM1 work unit availability)

MCM WUs seem to be flowing quite well at the moment thankfully.

Long may it continue! :)

----------------------------------------

[Mar 11, 2024 5:46:06 PM]

as1981
Advanced Cruncher
Joined: Dec 3, 2006
Post Count: 51
Status: Offline
Project Badges:

14 day badge for OpenPandemics - COVID-19


Re: 2024-03-11 Update (Website errors and MCM1 work unit availability)

@TigerLily Thank you for the update
I'm also able to download work units at the moment.

[Mar 11, 2024 7:39:43 PM]

Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1307
Status: Offline
Project Badges:

180 day badge for Smash Childhood Cancer

45 day badge for Microbiome Immunity Project

1 year badge for Africa Rainfall Project

1 year badge for OpenPandemics - COVID-19


Re: 2024-03-11 Update (Website errors and MCM1 work unit availability)

Thank you for the detailed update TigerLily. Thanks to the tech team for the fixes.

[Mar 11, 2024 8:10:45 PM]

Blount
Veteran Cruncher
Joined: Aug 19, 2005
Post Count: 600
Status: Offline
Project Badges:

180 day badge for Human Proteome Folding

2 year badge for Human Proteome Folding - Phase 2

180 day badge for Help Cure Muscular Dystrophy

45 day badge for Discovering Dengue Drugs - Together

180 day badge for Nutritious Rice for the World

1 year badge for Help Fight Childhood Cancer

45 day badge for Influenza Antiviral Drug Search

2 year badge for Help Cure Muscular Dystrophy - Phase 2

180 day badge for The Clean Energy Project - Phase 2

1 year badge for Computing for Clean Water

1 year badge for Drug Search for Leishmaniasis

180 day badge for GO Fight Against Malaria

14 day badge for Computing for Sustainable Water

200 year badge for Mapping Cancer Markers

45 day badge for Uncovering Genome Mysteries

1 year badge for Outsmart Ebola Together

1 year badge for FightAIDS@Home - Phase 2

10 year badge for Microbiome Immunity Project

10 year badge for Africa Rainfall Project

50 year badge for OpenPandemics - COVID-19


Re: 2024-03-11 Update (Website errors and MCM1 work unit availability)

TigerLily, I am seeing a small trickle of MCM WUs. Any idea when we might see a ramp-up of WUs?

[Mar 13, 2024 7:49:44 PM]

mike0164
Cruncher
Joined: Apr 28, 2007
Post Count: 9
Status: Offline
Project Badges:

5 year badge for Human Proteome Folding - Phase 2

180 day badge for Discovering Dengue Drugs - Together

2 year badge for Help Fight Childhood Cancer

2 year badge for The Clean Energy Project - Phase 2

2 year badge for Computing for Clean Water

5 year badge for Drug Search for Leishmaniasis

2 year badge for GO Fight Against Malaria

90 day badge for Computing for Sustainable Water

100 year badge for Mapping Cancer Markers

2 year badge for Uncovering Genome Mysteries

10 year badge for Outsmart Ebola Together

5 year badge for FightAIDS@Home - Phase 2

2 year badge for Africa Rainfall Project

10 year badge for OpenPandemics - COVID-19


Re: 2024-03-11 Update (Website errors and MCM1 work unit availability)

TigerLily, I am seeing a small trickle of MCM WUs. Any idea when we might see a ramp-up of WUs?

TigerLily,

Any more news on the MCM WU front? It has been over a week since your last message here. As others have seen, I'm only getting a trickle of WUs per day at this point. Per the other thread, I do know you said the tech team is working the issue. Would/could you give us more details?

Thank you.

Mike

----------------------------------------

----------------------------------------
[Edit 1 times, last edit by mike0164 at Mar 21, 2024 2:44:36 AM]

[Mar 21, 2024 2:11:55 AM]

TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Project Badges:

10 year badge for Help Fight Childhood Cancer

5 year badge for The Clean Energy Project - Phase 2

2 year badge for Drug Search for Leishmaniasis

2 year badge for Computing for Sustainable Water

5 year badge for Uncovering Genome Mysteries

50 year badge for Outsmart Ebola Together

20 year badge for FightAIDS@Home - Phase 2

50 year badge for Smash Childhood Cancer

50 year badge for Microbiome Immunity Project

5 year badge for Africa Rainfall Project

100 year badge for OpenPandemics - COVID-19


Re: 2024-03-11 Update (Website errors and MCM1 work unit availability)

Well,...

OPNG WUs reduced to a trickle again, though MCM1 seems to be sending out adequate amounts of WUs, even when it sometimes looks a bit delayed, but no host is really running empty...

However, looking at my stats, I noticed that the number of valid MCM1 WUs on the Results page seems to be increasing again. Instead of the usual (well, in good times ;-) ) 3 days retention before they get purged, they now go back at least a week, with the oldest valid WU returned 3/23.

I think it would be prudent to check whether the problem that caused those WUs to accumulate for 4 months is rearing its ugly head again, which would foreshadow more problems to come.

thanks,

Ralf

[Apr 3, 2024 4:50:01 PM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1327
Status: Offline
Project Badges:

1 year badge for Human Proteome Folding - Phase 2

14 day badge for Discovering Dengue Drugs - Together

14 day badge for Nutritious Rice for the World

180 day badge for Help Fight Childhood Cancer

90 day badge for Help Cure Muscular Dystrophy - Phase 2

1 year badge for The Clean Energy Project - Phase 2

180 day badge for Computing for Clean Water

50 year badge for Mapping Cancer Markers

10 year badge for FightAIDS@Home - Phase 2


Re: 2024-03-11 Update (Website errors and MCM1 work unit availability)

As far as I can tell from my returned results analysis, the assimilator still seems to be performing as it should but it looks as if d/b purge hasn't been doing anything since about 12:20 UTC on 2024-04-03.

Ralf - regarding long waits for purging: there seem to be two 24 hour lags between validation and purging[*1] -- my apologies if you already knew that :-) If something I return has to wait for a retry because of a wingman missing the deadline, it might well take 8 or 9 days (or more under certain uncommon circumstances) for a task to disappear! I only start worrying if an MCM1 task sits at Valid state for more than two days after the last returned result time :-)

Cheers - Al

[*1] One of those 24 hour intervals is between flagging for purge and the actual purge; the other is between assimilation and file deletion.
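To make the arithmetic concrete, here is a small Python sketch of the two lags described above. It assumes the two 24 hour intervals run back to back, starting from assimilation, which is an assumption rather than something the WCG team has confirmed. The function name, the constants and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

FILE_DELETE_LAG = timedelta(hours=24)  # assimilation -> file deletion
PURGE_LAG = timedelta(hours=24)        # flagged for purge -> actual purge


def earliest_purge_time(assimilated_at):
    """Return the earliest time a result could vanish from the Results page.

    Assumes the purge flag is set as soon as file deletion happens, so the
    two lags simply add up.
    """
    return assimilated_at + FILE_DELETE_LAG + PURGE_LAG


if __name__ == "__main__":
    # Hypothetical assimilation time, not taken from any real task.
    assimilated = datetime(2024, 3, 1, 0, 0, tzinfo=timezone.utc)
    print(earliest_purge_time(assimilated).isoformat())
```

Under these assumptions a task stays visible for about two days after assimilation, which fits the "more than two days" rule of thumb above.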

[Apr 3, 2024 6:35:22 PM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1327
Status: Offline
Project Badges:


Re: 2024-03-11 Update (Website errors and MCM1 work unit availability)

Further to my post immediately above: database purge appears to have restarted 24 hours after it stopped, and it took about 2 hours to get back to the normal "gone 24 hours after flagging for purge" routine...

Was it a scheduled [but unannounced] outage, or is there usually a restart each day at that time and yesterday it just didn't [re]start???

Cheers - Al.

[Apr 4, 2024 5:41:55 PM]

TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Project Badges:


Re: 2024-03-11 Update (Website errors and MCM1 work unit availability)

The issue is not only that valid WUs are sitting for a week or longer, but that I noticed a sharp increase (about 4x by now) of those WUs since last week. And with all those folks complaining that they don't get any WUs, there shouldn't be much of a problem with WUs sitting for a week or more, unless we would really be dealing with a large number of people hogging WUs in multi-day caches, for no logical, only selfish reasons. And I tried to explain in the past why this habit is overall just counter-productive.

Besides that, the dreaded 503 errors are back quite frequently, which, as more than a year of experience shows, are the harbinger of bad news if not dealt with in a timely fashion... :-(

Ralf

[Apr 4, 2024 10:17:36 PM]
