| World Community Grid Forums
Thread Status: Active Total posts in this thread: 6
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
We just experienced a 6-hour outage on the website and the BOINC grid. It appears that we are up again, but we are still investigating what happened.
|
||
|
|
knreed
|
We are having issues with file uploads. We will provide more info when we have it.
|
||
|
|
knreed
|
A quick update:
We have been using GPFS, a clustered file system, to mount SAN storage across several servers. The file systems that store workunit data for downloads, and that store the results when they are uploaded, are on GPFS. For reasons that we are still trying to diagnose, the GPFS software caused a kernel panic on the web servers, which locked them up (they had to be physically rebooted) and caused the outage. We are still using GPFS on some of the backend servers, but the web servers are now accessing the data via NFS. We are continuing to investigate what happened with GPFS, but full connectivity to members has been restored. The BOINC clients should be able to upload and download work without trouble at this time. They will automatically attempt to reconnect once their back-off timers expire, so no action is required by members. We appreciate your patience.
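For anyone curious about the back-off timers mentioned above, here is a minimal Python sketch of capped exponential back-off. It is an illustration only: the function names, the 60-second base, the 4-hour cap and the jitter fraction are all assumptions made for this example, not the actual BOINC client code or its settings.

```python
import random


def backoff_delays(base=60.0, cap=4 * 3600.0, attempts=8, jitter=0.0, rng=None):
    """Return the wait (in seconds) before each retry.

    Hypothetical helper: the delay doubles after every failed attempt and
    is clamped to `cap`. `jitter` is the fraction (0.0 to 1.0) of each
    delay that is randomised, so many clients do not retry simultaneously.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # Keep (1 - jitter) of the delay fixed and randomise the rest.
            delay = delay * (1.0 - jitter) + rng.uniform(0.0, delay * jitter)
        delays.append(delay)
    return delays


def retry_upload(upload, delays, sleep):
    """Call `upload()` until it succeeds or the delays are exhausted.

    `upload` is any callable returning True on success; `sleep` is passed
    in (for example time.sleep) so the waiting can be replaced in tests.
    """
    for delay in delays:
        if upload():
            return True
        sleep(delay)
    # One last attempt after the final wait.
    return upload()
```

With something like this, a client that failed during the outage simply waits out its current delay and tries again, which is why no manual action should be needed once the servers are reachable.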
||
|
|
knreed
|
We continue to suffer problems. We are working with various support personnel to identify and resolve the ongoing outage. Access to the website and for the BOINC grid agents will be intermittent while we continue to work through the issues.
|
||
|
|
knreed
|
We were able to locate a spot on the GPFS file system that had been corrupted, and we have corrected the problem. A few files were lost in the process, but they were workunit input files, so some folks will get errors downloading workunits; no results were lost. The BOINC grid and website have been up and stable for the past hour. We continue to monitor the behavior of the system.
|
||
|
|
knreed
|
One final update to this thread.
The system has now been stable for over 36 hours. A small number of results (less than 1% of the results returned on a typical day) were lost due to the outage. We are disappointed that this happened, but we will be able to send that work out again and get the needed data back to the researchers. We appreciate the members' patience with us while this issue was resolved. And in better news: please enjoy the new project!