| World Community Grid Forums
Thread Status: Active
Total posts in this thread: 26
|
|
|
|
neebong
Cruncher Joined: Mar 1, 2006 Post Count: 1 Status: Offline
|
Does anyone have any idea when the server is coming back up again?
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Does anyone have any idea when the server is coming back up again?
Your answer is at: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=10703
One of the latest posts in that thread says: "The worst case scenario is that the BOINC servers will be down until Tuesday, December 26, 2006."
Have a nice holiday |
||
|
|
keithhenry
Ace Cruncher Senile old farts of the world ....uh.....uh..... nevermind Joined: Nov 18, 2004 Post Count: 18667 Status: Offline
|
Well, as with anything serious that occurs with this project, we always seem to have multiple threads going on the matter. Such is life. I joined this project two days after it started. In all that time, the WCG team has set it up, gotten it going, kept it going and improved it all while boarding new projects and dealing with the gremlins, grimjits and snipes that have found their way into our realm. Finally, after two long years, they seem to finally have a pretty serious outage, if only in duration and smack dab in the middle of the holiday season no less. Figures right? I mean, who hasn't tried to take a nice vacation from work only to have your longingly awaited plans scuttled by a nice good old snafu back at the office that you just can't avoid getting pulled into? From what I've been able to pick up on over the past two years, I'd guess that we're looking at probably 8-10, 12 at the outside, folks that we collectively label the "WCG admins". Just a handful of folks run this whole shebang. Makes sense if you think about it. Non-profits are notorious for operating on a shoestring. I expect that WCG is no different. IBM may provide the facilities, equipment and such but the last thing it is for them is a revenue source.
So I for one would like to take a moment to say thanks to our brave little band of warriors! Thanks for two plus great years of a job well done. No, it's not been perfect but I think they've done quite a job with what they have to work with.
As for the current situation, I'd say consider modifying your BOINC profile, when that's possible, to set "Connect to server every X days" to a value of 2.0. Once things get fixed, you will then get enough work to last through two days if this current bugger proves to be a bit stubborn and comes back for a third visit. Once things look like they're back to normal, you can reset this to what you normally use. If you don't run BOINC 24x7, you may want to adjust the 2.0 value accordingly.
Let BOINC continue to crunch on your machines until it's out of work instead of aborting WUs. Once this is behind us, I would not be surprised to see the admins make some temporary tweak/adjustment so that our work doesn't show as being returned too late and being a waste. If you run out of work for BOINC, then you can reinstall/reactivate UD and run it in the meantime. When BOINC does get back on its legs, let UD finish its current WU before you shut it down and switch back to BOINC.
Please try to check here in the forums when you can, because when BOINC does get running again, you'll want to make sure all of your completed WUs that are Ready To Report get returned. Let's try to keep any lost crunching to a minimum (not to mention delaying quorums any more than is necessary).
Most importantly, hang tough. I've chosen to crunch 100% for WCG for my own reasons. It's a good crew all in all. I've certainly read stories about other projects that seem to have ongoing problems, so things could certainly be a lot worse for us here than they may be at the moment. What isn't and hasn't changed are the reasons each of us came here and started crunching in the first place. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Finally, after two long years, they seem to finally have a pretty serious outage, if only in duration and smack dab in the middle of the holiday season no less. Figures right? I mean, who hasn't tried to take a nice vacation from work only to have your longingly awaited plans scuttled by a nice good old snafu back at the office that you just can't avoid getting pulled into? From what I've been able to pick up on over the past two years, I'd guess that we're looking at probably 8-10, 12 at the outside, folks that we collectively label the "WCG admins". Just a handful of folks run this whole shebang. Makes sense if you think about it. Non-profits are notorious for operating on a shoestring. I expect that WCG is no different. IBM may provide the facilities, equipment and such but the last thing it is for them is a revenue source. So I for one would like to take a moment to say thanks to our brave little band of warriors! Thanks for two plus great years of a job well done. No, it's not been perfect but I think they've done quite a job with what they have to work with.
I second that. We have all on occasion tweaked nelsonc and other admins, but when push comes to shove we have a d*** (self edited to save nelsonc the effort) fine bunch of Admins and Techs.
Of course Murphy's Law says that an outage always occurs on a Friday at 5 pm or at the start of a long weekend. Hardware failure is one of those things that happens when you least expect it and is beyond the control of the Admins and Techs. And judging from the response time, the response was D*** fast.
(Tongue in cheek mode on) When this is all over... tell us it wasn't an IBM-Hitachi DeathStar err Speedstor that died. (Tongue in cheek mode off)
A WELL DONE to all the Staff at WCG and, at the risk of offending someone... A Merry Christmas to You All, and I hope this doesn't ruin any Christmas plans. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I fully agree with Harvey and Keith. The folks at WCG have been first rate at keeping the work going and responding quickly and positively to any problems that have arisen.
We are fortunate that WCG has, in effect, two separate projects in the BOINC and UD Agents so, if one goes down as has happened, the other serves as a fallback and work can continue. BOINC fans may not like the UD agent because the points payoff isn't as big, but the outage is only temporary and running something is better than nothing. Both agents achieve the same goal, producing results that matter to the research projects we volunteered to work on. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I agree with the above posts.
In fact I am surprised that anything at all might be done this snowy holiday week! Even if all the data is lost, I think we all know that 'stuff' happens! Just let us know, that's all we ask. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi halfcard,
According to http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=10703 no data has been lost. But it has to be loaded onto the new hard drive from the tape backup system, which may take time. I suspect that the hosting facility is running on a skeleton crew this snowy Christmas holiday season.
"In the Colorado Rockies
Where the snow is deep and cold
And a man afoot can starve to death
Unless he's brave and bold. . ."
Lawrence
[Edit 1 times, last edit by Former Member at Dec 23, 2006 7:36:31 AM] |
||
|
|
Dotsch
Advanced Cruncher Joined: Feb 12, 2006 Post Count: 100 Status: Offline
|
We are fortunate that WCG has, in effect, two separate projects in the BOINC and UD Agents so, if one goes down as has happened, the other serves as a fallback and work can continue. BOINC fans may not like the UD agent because the points payoff isn't as big, but the outage is only temporary and running something is better than nothing. Both agents achieve the same goal, producing results that matter to the research projects we volunteered to work on.
It is also quite easy to set up another BOINC project as a backup project: http://boinc-wiki.ath.cx/index.php?title=BOINC_Powered_Backup_Project
In a backup project setup, you set a low resource share for the backup project, so that the BOINC client downloads work from it only if the primary project is down.
A list of the different BOINC projects: http://boinc-wiki.ath.cx/index.php?title=Choosing_a_BOINC_Powered_Project
I also recommend that everybody set up a backup project, or attach to several BOINC projects, if they want to keep their systems crunching when a project is down.
A higher WU cache will also help to keep the systems crunching while the project is down. The default is 0.1 days, but setting the WU cache to 2 or 3 days is a good value. |
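To put a number on the "low resource share" idea, here is a minimal Python sketch. It assumes the client divides processing time roughly in proportion to each project's resource share, which simplifies what the real scheduler does. The share values (100 and 1) and the name "backup project" are hypothetical, chosen only for illustration.

```python
def share_fractions(resource_shares):
    """Return each project's fraction of the total resource share."""
    total = sum(resource_shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {name: share / total for name, share in resource_shares.items()}


# Hypothetical values: 100 for the primary project, 1 for the backup.
shares = {"World Community Grid": 100, "backup project": 1}

# Normal operation: both projects can supply work.
normal = share_fractions(shares)

# Outage: the primary cannot supply work, so only the backup remains.
outage = share_fractions(
    {name: share for name, share in shares.items() if name != "World Community Grid"}
)

for name, fraction in normal.items():
    print(f"normal: {name} gets about {fraction:.1%} of processing time")
for name, fraction in outage.items():
    print(f"outage: {name} gets about {fraction:.1%} of processing time")
```

With these assumed numbers the backup gets about 1% of the time while the primary is healthy and all of it while the primary is down, which is the behaviour the post describes.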
||
|
|
merko
Cruncher Joined: Jan 29, 2006 Post Count: 3 Status: Offline |
Hi all: I have been trying to upload several WUs over the past few days, with no luck. A paste from BOINC Manager follows:
------------------------------------------
12/23/2006 4:30:56 AM||Resuming network activity
12/23/2006 4:30:56 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d098n817_x1MEU_00_2_0
12/23/2006 4:30:56 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d098n817_x1MEU_00_2_1
12/23/2006 4:30:56 AM|Project Neuron|Fetching scheduler list
12/23/2006 4:31:04 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d098n817_x1MEU_00_2_0: file not found
12/23/2006 4:31:04 AM|World Community Grid|Backing off 51 minutes and 28 seconds on upload of file faah1097_d098n817_x1MEU_00_2_0
12/23/2006 4:31:04 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d099n184_x1MEU_00_1_0
12/23/2006 4:31:05 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d098n817_x1MEU_00_2_1: file not found
12/23/2006 4:31:05 AM|World Community Grid|Backing off 3 hours, 29 minutes and 47 seconds on upload of file faah1097_d098n817_x1MEU_00_2_1
12/23/2006 4:31:05 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d099n184_x1MEU_00_1_1
12/23/2006 4:31:08 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d099n184_x1MEU_00_1_0: file not found
12/23/2006 4:31:08 AM|World Community Grid|Backing off 52 minutes and 54 seconds on upload of file faah1097_d099n184_x1MEU_00_1_0
12/23/2006 4:31:08 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d099n184_x1MEU_00_1_1: file not found
12/23/2006 4:31:08 AM|World Community Grid|Backing off 19 minutes and 58 seconds on upload of file faah1097_d099n184_x1MEU_00_1_1
12/23/2006 4:31:18 AM||Project communication failed: attempting access to reference site
12/23/2006 4:31:21 AM||Access to reference site succeeded - project servers may be temporarily down.
12/23/2006 4:31:23 AM|Project Neuron|Scheduler list fetch failed: system connect
12/23/2006 4:31:23 AM|Project Neuron|6 consecutive failures fetching scheduler list - deferring 86400 seconds
12/23/2006 4:31:23 AM|Project Neuron|Deferring scheduler requests for 1 days, 0 hours, 0 minutes and 0 seconds
12/23/2006 4:31:24 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d098n817_x1MEU_00_2_0
12/23/2006 4:31:25 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d098n817_x1MEU_00_2_1
12/23/2006 4:31:27 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d098n817_x1MEU_00_2_0: file not found
12/23/2006 4:31:27 AM|World Community Grid|Backing off 1 hours, 10 minutes and 12 seconds on upload of file faah1097_d098n817_x1MEU_00_2_0
12/23/2006 4:31:28 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d098n817_x1MEU_00_2_1: file not found
12/23/2006 4:31:28 AM|World Community Grid|Backing off 2 hours, 0 minutes and 26 seconds on upload of file faah1097_d098n817_x1MEU_00_2_1
12/23/2006 4:31:28 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d099n184_x1MEU_00_1_0
12/23/2006 4:31:30 AM|NanoHive@Home|Sending scheduler request: Requested by user
12/23/2006 4:31:30 AM|NanoHive@Home|(not requesting new work or reporting completed tasks)
12/23/2006 4:31:32 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d099n184_x1MEU_00_1_0: file not found
12/23/2006 4:31:32 AM|World Community Grid|Backing off 2 hours, 39 minutes and 8 seconds on upload of file faah1097_d099n184_x1MEU_00_1_0
12/23/2006 4:31:32 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d099n184_x1MEU_00_1_1
12/23/2006 4:31:34 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d098n817_x1MEU_00_2_0
12/23/2006 4:31:34 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d099n184_x1MEU_00_1_1: file not found
12/23/2006 4:31:34 AM|World Community Grid|Backing off 8 minutes and 44 seconds on upload of file faah1097_d099n184_x1MEU_00_1_1
12/23/2006 4:31:34 AM|NanoHive@Home|Scheduler RPC succeeded
12/23/2006 4:31:34 AM|NanoHive@Home|Message from server: Project is temporarily shut down for maintenance
12/23/2006 4:31:34 AM|NanoHive@Home|Deferring scheduler requests for 1 hours, 0 minutes and 0 seconds
12/23/2006 4:31:34 AM|NanoHive@Home|Project is down
12/23/2006 4:31:37 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d098n817_x1MEU_00_2_0: file not found
12/23/2006 4:31:37 AM|World Community Grid|Backing off 53 minutes and 13 seconds on upload of file faah1097_d098n817_x1MEU_00_2_0
12/23/2006 4:31:38 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d098n817_x1MEU_00_2_1
12/23/2006 4:31:39 AM|World Community Grid|[file_xfer] Started upload of file faah1097_d099n184_x1MEU_00_1_0
12/23/2006 4:31:41 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d098n817_x1MEU_00_2_1: file not found
12/23/2006 4:31:41 AM|World Community Grid|Backing off 3 hours, 45 minutes and 8 seconds on upload of file faah1097_d098n817_x1MEU_00_2_1
12/23/2006 4:31:41 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d099n184_x1MEU_00_1_0: file not found
12/23/2006 4:31:41 AM|World Community Grid|Backing off 3 hours, 45 minutes and 48 seconds on upload of file faah1097_d099n184_x1MEU_00_1_0
12/23/2006 4:31:55 AM|World Community Grid|Sending scheduler request: Requested by user
12/23/2006 4:31:55 AM|World Community Grid|Reporting 2 tasks
12/23/2006 4:32:00 AM|World Community Grid|Scheduler request failed: HTTP file not found
12/23/2006 4:32:00 AM|World Community Grid|Deferring scheduler requests for 16 minutes and 27 seconds
12/23/2006 4:32:06 AM|Spinhenge@home|Sending scheduler request: Requested by user
12/23/2006 4:32:06 AM|Spinhenge@home|(not requesting new work or reporting completed tasks)
12/23/2006 4:32:11 AM|Spinhenge@home|Scheduler RPC succeeded [server version 507]
12/23/2006 4:32:11 AM|Spinhenge@home|Deferring scheduler requests for 5 minutes and 3 seconds
--
Mark Reiss |
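A paste like the one above is easier to read once it is summarised per project. The following is a rough Python sketch, not an official BOINC tool. It assumes each message sits on its own line in the `timestamp|project|message` layout seen in the paste; the helper names `parse_line` and `count_failed_uploads` are made up for this example, and the three sample lines are copied from the post.

```python
import re
from collections import Counter

# Matches lines shaped like the pasted log: "date time AM/PM|project|message".
LINE_RE = re.compile(
    r"^(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
    r"\|(?P<project>[^|]*)\|(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into (timestamp, project, message), or return None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("stamp"), match.group("project"), match.group("message")


def count_failed_uploads(lines):
    """Count 'Temporarily failed upload' messages per project."""
    failures = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a log line
        _, project, message = parsed
        if "Temporarily failed upload" in message:
            failures[project] += 1
    return failures


# Three lines copied from the paste above.
sample = [
    "12/23/2006 4:30:56 AM||Resuming network activity",
    "12/23/2006 4:31:04 AM|World Community Grid|[file_xfer] Temporarily failed upload of faah1097_d098n817_x1MEU_00_2_0: file not found",
    "12/23/2006 4:31:34 AM|NanoHive@Home|Project is down",
]

for project, count in count_failed_uploads(sample).items():
    print(f"{project}: {count} failed upload(s)")
```

Run over the full paste, a tally like this would show that every failed upload belongs to World Community Grid, while the other projects report separate scheduler deferrals.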
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello merko,
The problem with the World Community Grid BOINC server is explained in 'Known Issues': http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=10703 Lawrence |
||
|
|
|