World Community Grid Forums
Thread Status: Active. Total posts in this thread: 352
Grumpy Swede
Master Cruncher Sweden Joined: Apr 10, 2020 Post Count: 2496 Status: Offline
Still no purging of old validated tasks. So, I'm not surprised. The DB just became unavailable. "Server can't open database." And the results page is just spinning....
----------------------------------------
5155 World Community Grid 2025-08-17 22:06:32 Sending scheduler request: Requested by user.
5156 World Community Grid 2025-08-17 22:06:32 Reporting 1 completed tasks
5157 World Community Grid 2025-08-17 22:06:32 Requesting new tasks for CPU
5158 World Community Grid 2025-08-17 22:06:37 Scheduler request completed: got 0 new tasks
5159 World Community Grid 2025-08-17 22:06:37 Server can't open database
[Edit 3 times, last edit by Grumpy Swede at Aug 17, 2025 8:17:42 PM]
pramo
Veteran Cruncher USA Joined: Dec 14, 2005 Post Count: 715 Status: Offline
beat me to it....
----------------------------------------
8/17/2025 4:41:50 PM | World Community Grid | Server can't open database
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline
I find Grumpy Swede has beaten me to the report :-)
----------------------------------------
I can get a fairly accurate fix on when things went away: one of my systems successfully got 1 task at 19:57:13, but a request at 19:59:13 got "Scheduler request to https://scheduler.worldcommunitygrid.org/boinc/wcg_cgi/fcgi failed: HTTP service unavailable", and a request at 20:00:31 got "Server can't open database" and "Project requested delay of 3600 seconds"...
This might be part of them moving to nodes in the new data centre, or then again it might not :-( -- time will tell...
Cheers - Al.
[Edit 1 times, last edit by alanb1951 at Aug 17, 2025 8:58:55 PM]
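The bracketing described in the post above (last successful request versus first failed one) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the log entries are simplified ("HH:MM:SS", message) pairs rather than the real BOINC event-log format, the failure markers are only the messages quoted in this thread, and `outage_window` and `is_failure` are hypothetical helpers.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Messages treated as "server unavailable". These are only the ones quoted in
# this thread (assumption); a real event log may need more patterns.
FAILURE_MARKERS = (
    "HTTP service unavailable",
    "Server can't open database",
    "feeder not running",
)


def is_failure(message: str) -> bool:
    """True if the message contains any of the known failure markers."""
    return any(marker in message for marker in FAILURE_MARKERS)


def outage_window(
    entries: Iterable[Tuple[str, str]],
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (last success, first failure) from time-ordered entries.

    Each entry is a hypothetical ("HH:MM:SS", message) pair; dates are
    ignored, so the entries must not span midnight.
    """
    last_ok: Optional[datetime] = None
    for stamp, message in entries:
        when = datetime.strptime(stamp, "%H:%M:%S")
        if is_failure(message):
            return last_ok, when
        last_ok = when
    return last_ok, None


if __name__ == "__main__":
    # Simplified entries paraphrasing the three requests reported above.
    log = [
        ("19:57:13", "Scheduler request completed: got 1 new tasks"),
        ("19:59:13", "Scheduler request failed: HTTP service unavailable"),
        ("20:00:31", "Server can't open database"),
    ]
    last_ok, first_fail = outage_window(log)
    if last_ok and first_fail:
        # The outage began somewhere inside this interval.
        # prints: down between 19:57:13 and 19:59:13 (0:02:00 window)
        print(f"down between {last_ok:%H:%M:%S} and {first_fail:%H:%M:%S}"
              f" ({first_fail - last_ok} window)")
```

The resolution of the fix is limited by how often the client contacts the scheduler: with requests two minutes apart, the start of the outage can only be pinned down to a two-minute window.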
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1294 Status: Recently Active
We are seeing a variety of error messages...
----------------------------------------
I'm getting "Server error: feeder not running" right now. Since the error message is changing, I'd guess they are working on it. I hope everyone has a good cache to tide them over.
Edit to add: Thank you to all who rush to post the latest information here.
[Edit 1 times, last edit by Unixchick at Aug 17, 2025 10:21:21 PM]
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Offline
Quote: Still no purging of old validated tasks. So, I'm not surprised. The DB just became unavailable. "Server can't open database." And the results page is just spinning....
----------------------------------------
[I posted about the WUs piling up again at least twice late last week, just before the announcement of the new forum came out.] Hence my comment: "don't fix what isn't broken, fix what is"...
Ralf
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline
Unixchick,
----------------------------------------
I hope you are right about them working on it, but when there were "can't open database" errors in the past, the typical pattern was just over 2 hours of that message followed by the "feeder not running" message for some arbitrary length of time (presumably dependent on when they started on the fix!). Unfortunately, that tends to suggest it's just the system eventually giving up on the database :-(
I wonder if this is in any way related to the Graham->Nibi switch? (Given the lack of communication, I suspect not, but...) If this isn't because of work on the data centre switch, I hope it isn't a repeat of the 2025-06-28 outage (where the database went away at about 14:30 UTC and the system didn't come back properly until about 30 hours later...)
And unfortunately, if the database isn't accessible, there's no benefit in the delays to reporting results, so it won't help clear out the assimilation backlog.
Ralf,
Several of us have posted about the assimilation backlog, hoping to get some information on what is going on. And I have no idea whether this database issue is a result of that! I agree with the "fix what is" comment, by the way.
Al.
[Edited to include response to Ralf...] [Edit 1 times, last edit by alanb1951 at Aug 17, 2025 11:57:04 PM]
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 442 Status: Offline
I also noticed that my OVERVIEW page shows "Last updated: Aug 17, 2025 - 12:06 UTC", and it is now Aug 18, 2025 - 01:43 UTC.
dylanht
World Community Grid Tech Joined: Jul 1, 2021 Post Count: 35 Status: Offline
Database node went down. Unfortunately, I'm not able to bring the VM back up myself this time. I have reached out to hosting about it; they have not yet replied, so we will likely be back up tomorrow morning after crash recovery.
With the migration to Nibi:
a) optimistically, no more expiring DHCP lease issue, which was the reason we were given for this behaviour where nodes occasionally just lose connectivity on all interfaces until reboot;
b) sharding and replication, so that the database node is no longer a single point of failure (SPOF).
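The sharding-and-replication idea in (b) can be illustrated with a toy model: each key is hashed to one of several shards, each shard holds an ordered list of replica nodes, and reads fall back to the next healthy replica when the primary is down, so losing one VM no longer takes the whole database away. This is a minimal Python sketch under stated assumptions; the shard count, node names, the `Shard` and `shard_for` helpers and the hash-modulo placement are all hypothetical and do not describe WCG's actual Nibi setup.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Shard:
    """A hypothetical shard: an ordered list of replica nodes, primary first."""
    name: str
    replicas: List[str]
    down: Set[str] = field(default_factory=set)

    def pick_node(self) -> str:
        """Return the first healthy replica, failing only if all are down."""
        for node in self.replicas:
            if node not in self.down:
                return node
        raise RuntimeError(f"all replicas of {self.name} are down")


def shard_for(key: str, shards: List[Shard]) -> Shard:
    """Map a key to a shard with a stable hash (hash-modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


if __name__ == "__main__":
    # Three hypothetical shards, each with a primary and one replica.
    shards = [Shard(f"shard{i}", [f"db{i}-a", f"db{i}-b"]) for i in range(3)]
    shard = shard_for("workunit-12345", shards)  # hypothetical key
    print(shard.name, "served by", shard.pick_node())  # the primary
    shard.down.add(shard.replicas[0])  # simulate the primary VM going down
    print(shard.name, "served by", shard.pick_node())  # the replica takes over
```

SHA-256 is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would send the same key to different shards on different runs. Hash-modulo placement is the simplest choice but remaps most keys when the shard count changes; real deployments usually use consistent hashing or a lookup directory, and must also deal with replica lag and write failover, which this sketch ignores.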
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 865 Status: Offline
It's pretty crazy that anything mission-critical relies on DHCP configurations to begin with instead of static network configs.
----------------------------------------
Thanks for the update on a Sunday, Dylan. All good. I know the feeling. Things outside your control, relying on other organizations/teams/etc. Have a good day.
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1294 Status: Recently Active
Thank you for the update, Dylan.