World Community Grid Forums

Thread Status: Active | Total posts in this thread: 10

TigerLily
Senior Cruncher | Joined: May 26, 2023 | Post Count: 280

We are working to resolve the frequent 503 Service Unavailable errors that users encounter when the load balancer is unable to connect to the production web server. To spread the load, we are adding additional web server instances, plus an additional load balancer. We expect this work to be completed this week.

Regarding the lack of available MCM1 work units: we noticed that the createWork process was not loading new MCM1 work units last week. We were able to begin building batches of work units again on Friday evening (March 8). However, throughput is still poor, so we will be pushing additional changes to production that will:

- reduce the batch size (in terms of enumerated work units) of the SQL queries that the various BOINC components issue against the BOINC database,
- reduce the LIMIT value in some of those queries, and
- adjust the parameters of the database and the database clients on the relevant servers to allow for larger MySQL packets carrying these remote queries.

If the issue is resolved as we expect, that will be clearly reflected in the logs on the BOINC database server, and we will update everyone here as soon as we have made the changes and confirmed the result. Thank you for your support, and for your patience as we work to resolve these issues.

WCG Team
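For anyone curious what "smaller batches with a reduced LIMIT" looks like in practice, here is a minimal sketch of the pattern, assuming mysql-connector-python and BOINC's stock result table (where server_state = 2 means unsent). The batch size, connection details, and loop body are illustrative assumptions, not WCG's actual code.

```python
# Sketch: enumerate result rows in small, keyset-paginated batches so that
# individual queries and their replies stay comfortably under MySQL's
# max_allowed_packet limit. Connection details are placeholders.
import mysql.connector

BATCH_SIZE = 500  # the reduced LIMIT: smaller batches, smaller packets

conn = mysql.connector.connect(
    host="boinc-db.example.org",  # hypothetical host
    user="boinc", password="secret", database="boinc",
)
cur = conn.cursor()

last_id = 0
while True:
    # Keyset pagination (WHERE id > last seen id) bounds every round trip
    # at BATCH_SIZE rows and avoids expensive OFFSET scans.
    cur.execute(
        "SELECT id, name FROM result"
        " WHERE id > %s AND server_state = 2"  # 2 = unsent in BOINC's schema
        " ORDER BY id LIMIT %s",
        (last_id, BATCH_SIZE),
    )
    rows = cur.fetchall()
    if not rows:
        break
    for result_id, name in rows:
        pass  # hand each enumerated work unit to the component that needs it
    last_id = rows[-1][0]

cur.close()
conn.close()
```

The server-side counterpart is MySQL's max_allowed_packet setting, which caps the size of any single protocol packet, including the query text itself, and must be raised on both the server and its clients before longer enumerated queries will fit.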

Barnsley_Tatts
Senior Cruncher | Joined: Nov 3, 2005 | Post Count: 291

MCM WUs seem to be flowing quite well at the moment, thankfully.
----------------------------------------
Long may it continue! :)

as1981
Advanced Cruncher | Joined: Dec 3, 2006 | Post Count: 51

@TigerLily Thank you for the update.
I'm also able to download work units at the moment.

Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 1296

Thank you for the detailed update, TigerLily. Thanks to the tech team for the fixes.

Blount
Veteran Cruncher | Joined: Aug 19, 2005 | Post Count: 590

TigerLily, I am seeing a small trickle of MCM WUs. Any idea when we might see a ramp-up of WUs?

mike0164
Cruncher | Joined: Apr 28, 2007 | Post Count: 9

Blount wrote:
"TigerLily, I am seeing a small trickle of MCM WUs. Any idea when we might see a ramp-up of WUs?"

TigerLily, any more news on the MCM WU front? It has been over a week since your last message here. As others have seen, I'm only getting a trickle of WUs per day at this point. Per the other thread, I do know you said the tech team is working the issue. Would/could you give us more details?

Thank you.
Mike

[Edited 1 time, last edit by mike0164 at Mar 21, 2024 2:44:36 AM]

TPCBF
Master Cruncher (USA) | Joined: Jan 2, 2011 | Post Count: 2173

Well,...

OPNG WUs are down to a trickle again, though MCM1 seems to be sending out adequate amounts of WUs; even when it sometimes looks a bit delayed, no host is really running empty... However, looking at my stats, I have started to notice that the number of valid MCM1 WUs on the Results page seems to be increasing again. Instead of the usual (well, in good times) 3-day retention before they get purged, they are going back at least a week, with the oldest valid WU returned 3/23. I think it would be prudent to check whether the problem that caused those WUs to accumulate for 4 months isn't rearing its ugly head again, foreshadowing more problems to come.

Thanks,
Ralf

alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1317

As far as I can tell from my returned-results analysis, the assimilator still seems to be performing as it should, but it looks as if d/b purge hasn't been doing anything since about 12:20 UTC on 2024-04-03.

Ralf - regarding long waits for purging: there seem to be two 24-hour lags between validation and purging[*1] -- my apologies if you already knew that :-) If something I return has to wait for a retry because a wingman missed the deadline, it might well take 8 or 9 days (or more under certain uncommon circumstances) for the task to disappear! I only start worrying if an MCM1 task sits at Valid state for more than two days after the last returned result time :-)

Cheers - Al

[*1] One of those 24-hour intervals is between flagging for purge and the actual purge; the other is between assimilation and file deletion.
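To make Al's arithmetic concrete, here is a minimal sketch of that timeline, assuming exactly two 24-hour lags counted from the last returned result; the dates and the 7-day wingman delay are illustrative assumptions, not actual WCG scheduler settings.

```python
# A back-of-the-envelope model of the purge timeline described above.
# The two 24-hour lags come from Al's footnote; everything else is an
# illustrative assumption.
from datetime import datetime, timedelta

FLAG_TO_PURGE = timedelta(hours=24)    # flagged for purge -> actual purge
ASSIM_TO_DELETE = timedelta(hours=24)  # assimilation -> file deletion

def expected_disappearance(last_result_returned: datetime,
                           wingman_delay: timedelta = timedelta(0)) -> datetime:
    """Rough estimate of when a validated task leaves the Results page."""
    # If a wingman misses the deadline, validation waits for the retry first.
    validated = last_result_returned + wingman_delay
    return validated + FLAG_TO_PURGE + ASSIM_TO_DELETE

returned = datetime(2024, 4, 1, 12, 0)

# Normal case: gone roughly two days after the last result comes back.
print(expected_disappearance(returned))  # 2024-04-03 12:00:00

# A wingman misses a (hypothetical) 7-day deadline and a retry is needed:
# ~7 days + ~2 days of purge lag, i.e. the 8-9 days Al mentions.
print(expected_disappearance(returned, wingman_delay=timedelta(days=7)))
```

Anything sitting well past that window, as in Ralf's observation, points at the purge daemon itself rather than slow wingmen.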

alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1317

Further to my post immediately above: database purge appears to have restarted 24 hours after it stopped, and it took about 2 hours to get back to the normal "gone 24 hours after flagging for purge" routine...

Was it a scheduled [but unannounced] outage, or is there usually a restart each day at that time and yesterday it just didn't [re]start???

Cheers - Al.

TPCBF
Master Cruncher (USA) | Joined: Jan 2, 2011 | Post Count: 2173

alanb1951 wrote:
"As far as I can tell from my returned results analysis, the assimilator still seems to be performing as it should but it looks as if d/b purge hasn't been doing anything since about 12:20 UTC on 2024-04-03. [...]"

The issue is not only that valid WUs are sitting for a week or longer, but that I have noticed a sharp increase (about 4x by now) in those WUs since last week. And with all those folks complaining that they don't get any WUs, there shouldn't be much of a problem with WUs sitting for a week or more, unless we are really dealing with a large number of people hoarding WUs in multi-day caches for no logical, only selfish, reasons; I have tried to explain in the past why that habit is just counter-productive overall.

Besides that, the dreaded 503 errors are back quite frequently, which, as more than a year of experience now shows, are the harbinger of bad news if not dealt with in a timely fashion...

Ralf