knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Long Outage

We just experienced a 6 hour outage on the website and BOINC grid. It appears that we are up again but we are still investigating what happened.
[May 9, 2008 2:19:18 PM]
knreed
Re: Long Outage

We are having issues with file uploads. We will provide more info when we have it.
[May 9, 2008 5:51:29 PM]
knreed
Re: Long Outage

A quick update:

We have been using GPFS, a clustered file system, to mount SAN storage across several servers. The file systems that store workunit data for downloads and the results that members upload are both on GPFS. For reasons that we are still trying to diagnose, the GPFS software caused a kernel panic on the web servers. The panic locked them up (they had to be physically rebooted) and caused the outage.

We are still using GPFS on some of the backend servers, but the web servers are now accessing the data via NFS. We are continuing to investigate what happened with GPFS but full connectivity to members has been restored.

The BOINC clients should be able to upload and download work easily at this time. They will automatically attempt to reconnect once the back-off timers expire so no action is required by the members.
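For readers curious about what a back-off timer does, here is a minimal sketch of capped exponential back-off with random jitter. It illustrates the general technique only and is not the BOINC client's actual algorithm; the function names, the 60-second base delay, and the 4-hour cap are all assumptions chosen for the example.

```python
import random


def backoff_delays(base=60.0, cap=4 * 3600.0, attempts=8, rng=None):
    """Yield retry delays in seconds: exponential growth, capped, with jitter.

    base, cap and attempts are illustrative values, not BOINC settings.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        # The upper bound doubles on each failed attempt until it hits the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Pick a random delay in [0, ceiling] so that many clients do not
        # all retry at the same moment once the server comes back.
        yield rng.uniform(0, ceiling)


def upload_with_retry(try_upload, delays, sleep):
    """Call try_upload() until it returns True or the delays run out."""
    for delay in delays:
        if try_upload():
            return True
        sleep(delay)
    # One last attempt after the final wait.
    return try_upload()
```

In real use, `sleep` would be `time.sleep` and `try_upload` would wrap the actual transfer (both hypothetical here). The random jitter matters after a long outage, because it spreads the returning clients out instead of letting them all reconnect at once.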

We appreciate your patience.
[May 9, 2008 10:03:05 PM]
knreed
Re: Long Outage

We continue to suffer problems. We are working with various support personnel to identify and resolve the ongoing outage. Access for the website and the BOINC grid agents will be intermittent while we continue to work through the issues.
[May 10, 2008 6:19:11 PM]
knreed
Re: Long Outage

We were able to locate a spot on the GPFS filesystem that had been corrupted, and we have corrected the problem. A few files were lost in the process, but they were workunit input files, so some folks will get errors downloading workunits; no results were lost. The BOINC grid and website have been up and stable for the past hour. We continue to monitor the behavior of the system.
[May 11, 2008 12:51:51 AM]
knreed
Re: Long Outage

One final update to this thread.

The system has been stable now for over 36 hours. A small number of results (less than 1% of the results returned on a typical day) were lost due to the outage. We are disappointed that this happened, but we will be able to send that work out again and get the needed data back to the researchers.

We appreciate the members' patience with us while this issue was resolved.

And in better news - please enjoy the new project!
[May 12, 2008 6:28:04 PM]