Thread Status: Active
Total posts in this thread: 352
This topic has been viewed 30280 times and has 351 replies
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2496
Status: Offline
Re: Project Status (First Post Updated)

Still no purging of old validated tasks. So, I'm not surprised. The DB just became unavailable. "Server can't open database." And the results page is just spinning....

5155 World Community Grid 2025-08-17 22:06:32 Sending scheduler request: Requested by user.
5156 World Community Grid 2025-08-17 22:06:32 Reporting 1 completed tasks
5157 World Community Grid 2025-08-17 22:06:32 Requesting new tasks for CPU
5158 World Community Grid 2025-08-17 22:06:37 Scheduler request completed: got 0 new tasks
5159 World Community Grid 2025-08-17 22:06:37 Server can't open database
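
A minimal sketch, for anyone who wants to pin down when the server went away from their own event log. It assumes log lines laid out like the ones quoted above (sequence number, project name, timestamp, message); the names `parse_line`, `first_failure` and `FAILURE_MARKERS` are made up for illustration and are not part of BOINC.

```python
import re
from datetime import datetime

# Assumed layout, taken from the lines quoted above:
#   "<seq> <project name> <YYYY-MM-DD HH:MM:SS> <message>"
LINE_RE = re.compile(
    r"^(?P<seq>\d+)\s+(?P<project>.+?)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<msg>.*)$"
)

# Messages reported in this thread; illustrative, not an exhaustive list.
FAILURE_MARKERS = (
    "Server can't open database",
    "feeder not running",
    "HTTP service unavailable",
)


def parse_line(line):
    """Split one log line into (seq, project, timestamp, message), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
    return int(match.group("seq")), match.group("project"), timestamp, match.group("msg")


def first_failure(lines):
    """Return the first parsed entry whose message contains a failure marker."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not fit the assumed layout
        if any(marker in parsed[3] for marker in FAILURE_MARKERS):
            return parsed
    return None


# The five lines quoted in this post, used as sample input.
SAMPLE = [
    "5155 World Community Grid 2025-08-17 22:06:32 Sending scheduler request: Requested by user.",
    "5156 World Community Grid 2025-08-17 22:06:32 Reporting 1 completed tasks",
    "5157 World Community Grid 2025-08-17 22:06:32 Requesting new tasks for CPU",
    "5158 World Community Grid 2025-08-17 22:06:37 Scheduler request completed: got 0 new tasks",
    "5159 World Community Grid 2025-08-17 22:06:37 Server can't open database",
]
```

Calling `first_failure(SAMPLE)` picks out entry 5159, the "Server can't open database" line. The timestamps are whatever the client wrote to its log, so comparing them across machines would need the time zones lined up first.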
----------------------------------------
[Edit 3 times, last edit by Grumpy Swede at Aug 17, 2025 8:17:42 PM]
[Aug 17, 2025 8:12:09 PM]
pramo
Veteran Cruncher
USA
Joined: Dec 14, 2005
Post Count: 715
Status: Offline
Re: Project Status (First Post Updated)

beat me to it....
8/17/2025 4:41:50 PM | World Community Grid | Server can't open database
----------------------------------------

[Aug 17, 2025 8:52:12 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Project Status (First Post Updated)

I find Grumpy Swede has beaten me to the report :-)

I can get a fairly accurate fix on when things went away: one of my systems successfully got 1 task at 19:57:13, but a request at 19:59:13 got "Scheduler request to https://scheduler.worldcommunitygrid.org/boinc/wcg_cgi/fcgi failed: HTTP service unavailable" and a request at 20:00:31 got "Server can't open database" and "Project requested delay of 3600 seconds"...

This might be part of them moving to nodes in the new data centre, or then again it might not :-( -- time will tell...

Cheers - Al.
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Aug 17, 2025 8:58:55 PM]
[Aug 17, 2025 8:55:23 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Recently Active
Re: Project Status (First Post Updated)

We are seeing a variety of error messages...
I'm getting "Server error: feeder not running" right now.

Since the error message is changing, I'll guess they are working on it.

I hope everyone has a good cache to tide them over.

edit to add: Thank you to all who rush to post the latest information here.
----------------------------------------
[Edit 1 times, last edit by Unixchick at Aug 17, 2025 10:21:21 PM]
[Aug 17, 2025 10:19:50 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: Project Status (First Post Updated)

Quote (Grumpy Swede):
Still no purging of old validated tasks. So, I'm not surprised. The DB just became unavailable. "Server can't open database." And the results page is just spinning....

I posted about the WUs piling up again at least twice late last week, just before the announcement of the new forum came out.
Hence my comment of "don't fix what isn't broken, fix what is"...


Ralf
[Aug 17, 2025 11:29:50 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Project Status (First Post Updated)

Unixchick,

I hope you are right about them working on it, but when there were "can't open database" errors in the past the typical pattern was just over 2 hours of that message followed by the "feeder not running" message for some arbitrary length of time (presumably dependent on when they started on the fix!). Unfortunately, that tends to suggest it's just the system eventually giving up on the database :-(

I wonder if this is in any way related to the Graham->Nibi switch? (Given the lack of communication, I suspect not, but...)

If this isn't because of work on the data centre switch, I hope it isn't a repeat of the 2025-06-28 outage (where the database went away at about 14:30 UTC and the system didn't come back properly until about 30 hours later...) And unfortunately, if the database isn't accessible, there's no benefit in the delays to reporting results, so it won't help clear out the assimilation backlog.

Ralf,

Several of us have posted about the assimilation backlog, hoping to get some information on what is going on. And I have no idea whether this database issue is a result of that!

I agree with the "fix what is" comment, by the way.

Al.

[Edited to include response to Ralf...]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Aug 17, 2025 11:57:04 PM]
[Aug 17, 2025 11:49:13 PM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Re: Project Status (First Post Updated)

I also noticed on my OVERVIEW page that it was:
"Last updated: Aug 17, 2025 - 12:06 UTC"

and it is now: Aug 18, 2025 - 01:43 UTC
[Aug 18, 2025 1:43:16 AM]
dylanht
World Community Grid Tech
Joined: Jul 1, 2021
Post Count: 35
Status: Offline
Re: Project Status (First Post Updated)

Database node went down. Unfortunately, I'm not able to bring the VM back up myself this time. I have reached out to hosting about it; they have not yet replied, so we will likely be back up tomorrow morning, after crash recovery.

With the migration to Nibi:
a) optimistically, no more expiring DHCP lease issue, which was the reason we were given for this behaviour, where nodes occasionally lose connectivity on all interfaces until rebooted
b) sharding and replication, so that we no longer have a SPOF (single point of failure) database node
[Aug 18, 2025 2:15:51 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: Project Status (First Post Updated)

It's pretty crazy that anything mission critical relies on DHCP configurations to begin with instead of static network configs.

Thanks for the update on a Sunday, Dylan. All good. I know the feel. Things outside your control, relying on other organizations/teams/etc.

Have a good day.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Aug 18, 2025 3:47:00 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Recently Active
Re: Project Status (First Post Updated)

Thank you for the update Dylan.
[Aug 18, 2025 4:57:36 AM]