World Community Grid Forums
Category: Official Messages | Forum: News | Thread: Planned Maintenance on Tuesday, November 2 (Completed)
Thread Status: Active Total posts in this thread: 9
WCGAdmin
World Community Grid Admin Joined: Jun 9, 2020 Post Count: 168 Status: Offline |
We are replacing two failed disk drives and performing some database maintenance activities.
https://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=744
----------------------------------------
[Edit 1 times, last edit by caitilarkin at Nov 2, 2021 5:33:20 PM]
||
|
F4UCorsair
Cruncher Joined: Feb 3, 2009 Post Count: 7 Status: Offline Project Badges: |
You're waiting 5 days to replace failed drives? So you're telling us that your servers and/or storage systems don't support hot swapping?
|
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges: |
The drives are part of a shared-nothing filesystem (IBM GPFS FPO). We have 55 drives in all and only 2 of them have failed. Many more would need to fail before I became concerned about a loss of data.
As for why we cannot perform a hot swap, I have been inquiring about that. The servers support it, but when I opened the ticket to get the failed disks replaced, I was told that it wasn't possible. I am trying to get an answer as to what is going on, but in the meantime we are getting the disks replaced while we do the database change, since we would be down during that time anyway.
----------------------------------------
[Edit 1 times, last edit by knreed at Oct 28, 2021 7:59:15 PM]
||
|
BobbyB
Veteran Cruncher Canada Joined: Apr 25, 2020 Post Count: 602 Status: Offline Project Badges: |
After being down for 6 hours, I imagine that when the system comes up again it will get hammered by all those BOINC clients as they start uploading and downloading. Will it handle all that?
Guess we should make sure our machines "load up" before the outage.
||
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 858 Status: Offline Project Badges: |
Will the forum stay up during the maintenance? I'm assuming not, since it says the website will be down.
|
||
|
caitilarkin
Former World Community Grid Admin USA Joined: Nov 4, 2015 Post Count: 331 Status: Offline Project Badges: |
Quote: "Will the forum stay up during the maintenance? I'm assuming not, since it says the website will be down."
The forum is generally down for all or part of the maintenance window.
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges: |
This work has been completed and we are catching up on the backlog.
As for BobbyB's question: the peak load once we started up hit our configured limits for about 7-8 minutes (i.e. we hit the limit of 300 concurrent uploads), and during that time we were receiving about 1.1 Gbps of data. The load is now rapidly dropping back to normal levels and users should see no issues uploading or downloading work.*
* The one exception is that we are currently processing all of the results returned, which means that we are going to generate 6 hours' worth of resends in 20-40 minutes. As a result, over the next 30-40 minutes there will be times when users have trouble getting assigned work during a scheduler request, because we might be clogged with work that needs a reliable host on a given platform. This will clear quickly and automatically, and we should be fully back to normal soon.
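For a rough sense of scale, the figures quoted above (about 1.1 Gbps, 300 concurrent uploads, 7-8 minutes at the limit) can be turned into back-of-the-envelope estimates. This is only a sketch: it assumes the rate held steady for the whole peak and that all 300 upload slots shared the bandwidth evenly, neither of which is stated in the post. The names used here are illustrative.

```python
# Figures quoted in the post; everything derived from them is an estimate.
PEAK_RATE_BPS = 1.1e9        # "about 1.1 Gbps", in bits per second
CONCURRENT_UPLOADS = 300     # configured limit on concurrent uploads


def data_received_gb(minutes: float) -> float:
    """Estimated gigabytes received if the peak rate held for `minutes`."""
    bits = PEAK_RATE_BPS * minutes * 60
    return bits / 8 / 1e9    # bits -> bytes -> decimal gigabytes


# Assumption: bandwidth split evenly across all 300 upload slots.
per_upload_mbps = PEAK_RATE_BPS / CONCURRENT_UPLOADS / 1e6

low_gb = data_received_gb(7)   # lower bound of the 7-8 minute window
high_gb = data_received_gb(8)  # upper bound of the 7-8 minute window

print(f"Per-upload share: {per_upload_mbps:.2f} Mbps")
print(f"Data received during peak: {low_gb:.1f} to {high_gb:.1f} GB")
```

Under those assumptions, each upload would have averaged a few megabits per second, and the burst would have amounted to tens of gigabytes.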
||
|
BobbyB
Veteran Cruncher Canada Joined: Apr 25, 2020 Post Count: 602 Status: Offline Project Badges: |
Out of curiosity, at what time did the system come back up?
My first upload was at:
2021-11-02 12:49:40 | World Community Grid | Started upload of MCM1_0183919_4068_0_r1718798455_0
I am at UTC-4.
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges: |
It came back up at about 16:45 UTC, so you uploaded about 4 minutes after it became available again.
----------------------------------------[Edit 1 times, last edit by knreed at Nov 2, 2021 10:19:36 PM] |
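The time comparison above can be checked with a short calculation. This sketch converts the upload timestamp from the log line (recorded at UTC-4) to UTC and compares it with the approximate 16:45 UTC restart time; the restart time is only given as "about", so the resulting gap is approximate too.

```python
from datetime import datetime, timedelta, timezone

# Timestamp from the BOINC log line, recorded by a client at UTC-4.
local_tz = timezone(timedelta(hours=-4))
first_upload = datetime(2021, 11, 2, 12, 49, 40, tzinfo=local_tz)

# Approximate restart time given in the reply ("about 16:45 UTC").
back_up = datetime(2021, 11, 2, 16, 45, 0, tzinfo=timezone.utc)

# Convert the local timestamp to UTC, then take the difference.
first_upload_utc = first_upload.astimezone(timezone.utc)
delay = first_upload_utc - back_up

print(first_upload_utc.isoformat())  # 2021-11-02T16:49:40+00:00
print(delay)                         # 0:04:40
```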