World Community Grid Forums
Thread Status: Active Total posts in this thread: 14
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline
We are currently experiencing an intermittent issue that causes a significant slowdown on the filesystems that support file downloads and file uploads. As a result, file uploads and downloads will be intermittently unavailable.
[Edit 3 times, last edit by knreed at Sep 5, 2012 5:33:52 AM]
knreed
We are open again, but it is not clear why it slowed down for a period of time. We will continue to investigate.
knreed
And the slowdown is back again.
knreed
Currently, when this issue occurs, it impacts scheduler requests, website/forum access and file uploads/downloads. However, only file uploads/downloads are involved in the actual root cause of the issue.
As a result, we are going to change things so that website/forum traffic and scheduler requests go through one set of servers while file upload/download requests go through a separate set of servers. This change will be largely transparent. However, we will be changing the scheduler URL from https://grid.worldcommunitygrid.org/boinc/wcg_cgi/fcgi to https://scheduler.worldcommunitygrid.org/boinc/wcg_cgi/fcgi. To force clients to pick up the new setting, we will disable the scheduler at https://grid.worldcommunitygrid.org/boinc/wcg_cgi/fcgi. Your client will then try 10 times to connect before querying the website again for the current location of the scheduler. It will then get the new location and connect properly. No action is needed on your part, but you will see messages in the software client.
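For readers curious about the retry behaviour described above, here is a rough Python sketch of the idea. It is not the actual client code: the function name `contact_scheduler` and the two callables passed into it are hypothetical stand-ins, and the only details taken from the post are the two URLs and the count of 10 attempts.

```python
from typing import Callable, List, Tuple

OLD_URL = "https://grid.worldcommunitygrid.org/boinc/wcg_cgi/fcgi"
NEW_URL = "https://scheduler.worldcommunitygrid.org/boinc/wcg_cgi/fcgi"
MAX_FAILURES = 10  # the post says the client tries 10 times


def contact_scheduler(
    cached_url: str,
    send_request: Callable[[str], bool],       # hypothetical: True if the scheduler answered
    lookup_scheduler_url: Callable[[], str],   # hypothetical: asks the website for the current URL
    max_failures: int = MAX_FAILURES,
) -> Tuple[str, List[str]]:
    """Return the URL that finally worked, plus the messages a user might see."""
    log: List[str] = []
    for attempt in range(1, max_failures + 1):
        if send_request(cached_url):
            return cached_url, log
        log.append(f"attempt {attempt} to {cached_url} failed")

    # Every attempt against the cached URL failed, so ask the website again.
    fresh_url = lookup_scheduler_url()
    log.append(f"scheduler location refreshed: {fresh_url}")
    if send_request(fresh_url):
        return fresh_url, log
    raise ConnectionError(f"scheduler unreachable at {fresh_url}")
```

With the old scheduler disabled, a call such as `contact_scheduler(OLD_URL, ...)` would log ten failures, refresh the location, and then succeed against the new URL, which matches the messages volunteers were told to expect.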
knreed
The issue with uploading/downloading is not yet resolved. The change that we made on July 20th was to ensure that our volunteers could reliably access the website and forums. We have also moved scheduler requests, since they are independent of the filesystem issue.
We are working with the support groups for Red Hat Linux and IBM GPFS (the filesystem) to resolve this issue. Users will unfortunately continue to see messages such as this until the issue is resolved:
24/07/2012 12:43:58 | World Community Grid | Started upload of c4cw_target06_088990280_0_0
24/07/2012 12:43:58 | World Community Grid | Started upload of c4cw_target06_088989962_0_0
24/07/2012 12:44:20 | World Community Grid | Temporarily failed upload of c4cw_target06_088990280_0_0: connect() failed
24/07/2012 12:44:20 | World Community Grid | Backing off 1 hr 29 min 3 sec on upload of c4cw_target06_088990280_0_0
24/07/2012 12:44:20 | World Community Grid | Temporarily failed upload of c4cw_target06_088989962_0_0: connect() failed
24/07/2012 12:44:20 | World Community Grid | Backing off 11 min 18 sec on upload of c4cw_target06_088989962_0_0
24/07/2012 12:44:21 | | Project communication failed: attempting access to reference site
24/07/2012 12:44:24 | | Internet access OK - project servers may be temporarily down.
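If you want to see how long your client is waiting before it retries, the back-off messages can be pulled out of a saved log with a few lines of Python. This is only a sketch: the pattern is inferred from the sample messages above and may not match every client version, and `parse_backoff` is a name made up for this example.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the sample log lines; hours, minutes and seconds are each optional.
_BACKOFF = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(?:(\d+) sec )?on upload of (\S+)"
)


def parse_backoff(line: str) -> Optional[Tuple[str, int]]:
    """Return (file name, back-off in seconds), or None if the line is not a back-off message."""
    match = _BACKOFF.search(line)
    if match is None:
        return None
    hours, minutes, seconds, filename = match.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds or 0)
    return filename, total


sample = (
    "24/07/2012 12:44:20 | World Community Grid | "
    "Backing off 1 hr 29 min 3 sec on upload of c4cw_target06_088990280_0_0"
)
print(parse_backoff(sample))  # ('c4cw_target06_088990280_0_0', 5343)
```

The two back-offs in the sample (about 89 minutes and about 11 minutes) differ even though both uploads failed at the same moment, so the wait is evidently chosen per file rather than per project.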
knreed
We continue to experience this issue. We have investigated a few items suggested by the support groups, but none has resolved the issue or led us to the root cause.
In the meantime, we are implementing some workarounds that will allow us to limit the duration of this issue when it occurs. Although this won't eliminate the issue, the goal is to minimize the impact on the end users. The workarounds include disabling scheduler requests and file uploads when the issue occurs. As a result, there will be times when both file uploads/downloads and scheduler requests are disabled.
knreed
We have worked on a number of items this weekend to both mitigate and further investigate this ongoing issue. At the moment, we have started running some new code for some of our backend processes that allows them to pause quickly when this issue appears. This should significantly reduce the time that our volunteers are not able to upload/download work.
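As an illustration of the "pause when the issue appears" idea, here is a minimal Python sketch. It is not the code running on the servers, whose details are not described in this thread. The probe (timing a small synced write), the 2-second threshold, the 30-second pause, and both function names are assumptions made for this example.

```python
import os
import tempfile
import time
from typing import Callable, Iterable, List


def probe_write_latency(directory: str) -> float:
    """Time a small synced write in `directory` as a crude filesystem health check."""
    start = time.perf_counter()
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        handle.write(b"probe")
        handle.flush()
        os.fsync(handle.fileno())
    return time.perf_counter() - start


def run_with_pause(
    tasks: Iterable[Callable[[], object]],
    probe: Callable[[], float],
    threshold_s: float = 2.0,   # assumed cut-off for "the filesystem is slow"
    pause_s: float = 30.0,      # assumed wait between probes while paused
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Run tasks one by one, holding each back while the probe reports a slowdown."""
    results: List[object] = []
    for task in tasks:
        while probe() > threshold_s:
            sleep(pause_s)  # back off instead of piling more load on a slow filesystem
        results.append(task())
    return results
```

A backend job could call `run_with_pause(jobs, lambda: probe_write_latency("/some/shared/dir"))`, where the directory is a placeholder. The point of the design is that work stops adding load as soon as the slowdown is detected and resumes on its own once the probe is fast again.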
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
We are going to disable the servers for a few moments shortly while we change some settings. The outage should last only minutes.
Thanks, -Uplinger
uplinger
The server maintenance was completed successfully, and things should be running again.
Thanks, -Uplinger
uplinger
Over the past 12 hours we have been running some tests to help determine the cause of our issues. These have included some diagnostic tools on the filesystem as well as network tracing between the storage system and the servers. We have disabled those now and things should be returning to normal.
Thank you for your patience, -Uplinger