Thread Status: Active
Total posts in this thread: 9
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Short server outages today April 14, 2015

We are updating the firmware on our database servers. This requires us to manually migrate the servers. During each migration the website will be unavailable for about 5 minutes. I meant to post this earlier but did not. We still have 2 more 5-minute outages coming.

Thanks,
-Uplinger
[Apr 14, 2015 5:38:06 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 3010
Re: Short server outages today April 14, 2015

Keith, is this the reason why I'm seeing a lot of jobs in PV status, despite their being ZR copies?
[Apr 14, 2015 6:38:24 PM]
uplinger
Re: Short server outages today April 14, 2015

We have turned off the validators on the backend so we can quickly stop and move the database. The results will still be validated when we turn the backend processes back on. To make the outages as quick as possible, only the feeder is running at the moment to continually supply results to the members.

Thanks,
-Uplinger
[Apr 14, 2015 7:03:18 PM]
gb009761
Re: Short server outages today April 14, 2015

Okay, thanks for confirming that.

I have no reason at all to doubt that they'll all get validated at some point in the not too distant future - although I'm sure that there'll be some out there who may have raised this as an issue.
[Apr 14, 2015 7:07:00 PM]
uplinger
Re: Short server outages today April 14, 2015

Greetings everyone,

It appears we have encountered some major issues with migrating the databases after the firmware updates. We have reverted everything for the time being while we diagnose why we encountered the issues.

Thanks,
-Uplinger
[Apr 15, 2015 12:08:21 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Re: Short server outages today April 14, 2015

Would this also have been the reason why I have been seeing a bunch of SSL errors, both in the BOINC clients and when trying to access the website/forum?

Ralf
[Apr 15, 2015 12:15:11 AM]
uplinger
Re: Short server outages today April 14, 2015

When these updates were performed and the database was migrated, the site did become inaccessible for some time. As for the SSL errors, I do not believe you should have seen them unless they were associated with the server being down. I did not force any of my agents to connect during this time, so I did not see any logs.

Also, before anyone else posts: yes, the stats did run, and yes, they are proper :). The validators were disabled during the day today to help speed up the outages. They are catching up, and we should see a good boost in the midday stats about 11 hours from now.

Thanks,
-Uplinger
----------------------------------------
[Edit 1 times, last edit by uplinger at Apr 15, 2015 1:28:34 AM]
[Apr 15, 2015 1:26:40 AM]
TPCBF
Re: Short server outages today April 14, 2015

uplinger wrote: "When these updates were performed and the database was migrated, the site did become inaccessible for some time. As for the SSL errors, I do not believe you should have seen them unless they were associated with the server being down. I did not force any of my agents to connect during this time, so I did not see any logs."

At first I thought it was a local problem, when I manually updated the project on a local client. Then, when I tried to connect to worldcommunitygrid.org to check whether anything was going on, I only got "connection reset".
I connected remotely to two other clients' sites and got the same errors. Then, after an hour or so, I tried to get back on the forum and got through, posting my previous reply. Since then I have seen a lot of my hosts uploading and receiving WUs, but all uploaded WUs seem to have been stuck in PV jail since...

Ralf
[Apr 15, 2015 1:52:26 AM]
uplinger
Re: Short server outages today April 14, 2015

Currently we are about 30 minutes behind on validation; I would guess the remaining results will be caught up in about that much time. The new database servers installed a few months ago are quite a bit faster: previously, a 12-hour validation delay would have taken more than 3 hours to catch up. This is a good example of the upgrade, but not a true stress test of the system :). The Connection Reset is what I saw in Firefox as well, and that was normal. We were hopeful that the website outage would be limited to three 5-minute outages, but during a migration we encountered an issue and had to take more time to collect data to help determine the root cause.

Thanks for the feedback and for your patience,
-Uplinger
[Apr 15, 2015 3:33:44 AM]