| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 352
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Seems to be working. Again, well done. Great job.
|
||
|
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges:
|
knreed: "We have worked on a number of items this weekend to both mitigate and further investigate this ongoing issue." in the Known Issues [read only] forum at http://www.worldcommunitygrid.org/forums/wcg/...33481_lastpage,yes#386302.
Applause for your dedication, techs. Things seem to be running much more smoothly from this end. The irregular server outages are hardly noticeable now, if at all.
Just a thought - ignore if not relevant. Some time ago I had an issue with a machine that kept freezing up for periods of 1-2 min and getting WU timeouts and restarts with "exiting with zero status". It turned out to be a dodgy, not-very-old system/BOINC data hard drive (WD 500GB Green) that was going into "500GB floppy disc mode", i.e. slowing right down several times per day but never returning even a single hard error. The drive passes the WD DOS diagnostics perfectly every time.
Maybe you have a server hardware/firmware problem? |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
This issue is not solved - but we are making progress.
We have currently disabled the application changes we made this weekend. We did this so that we can monitor the impact of disabling Ethernet flow control (specifically, disabling the sending of 'Pause Frames') on our servers and LAN switches. This seems to have improved the situation significantly - but it has definitely not eliminated it (we have had about 10 minutes of outage over the past 24 hours).
If you are interested, here is a description of Ethernet flow control and how it can interfere with TCP flow control: http://virtualthreads.blogspot.com.br/2006/02/beware-ethernet-flow-control.html
We are also working with the technical leads for the entire data center where we are hosted to help us identify the root cause.
----------------------------------------
[Edit 1 times, last edit by knreed at Aug 1, 2012 3:56:36 PM] |
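For readers who want to see what "disabling the sending of Pause Frames" can look like on a single host, here is a minimal sketch. It assumes a Linux server managed with the `ethtool` utility, which the post above does not state; the interface name `eth0`, the helper names `pause_off_command` and `parse_pause_status`, and the `SAMPLE` text are all hypothetical illustrations, not anything the WCG techs described. The code only builds the command and parses status text; it does not change any settings.

```python
import shlex


def pause_off_command(iface, also_rx=False):
    """Build (but do not run) an ethtool command that stops a NIC
    from sending Ethernet pause frames. Hypothetical helper."""
    cmd = ["ethtool", "-A", iface, "tx", "off"]
    if also_rx:
        # Optionally also stop honouring pause frames received from the peer.
        cmd += ["rx", "off"]
    return cmd


def parse_pause_status(text):
    """Parse text in the style printed by `ethtool -a <iface>` into a dict
    mapping lowercase setting names to True (on) or False (off)."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip().lower()
        # The heading line has nothing after its colon, so it is skipped here.
        if sep and value in ("on", "off"):
            status[key.strip().lower()] = (value == "on")
    return status


# Made-up sample of pause-parameter output, used for illustration only.
SAMPLE = """Pause parameters for eth0:
Autonegotiate:\toff
RX:\t\ton
TX:\t\toff
"""

if __name__ == "__main__":
    print(shlex.join(pause_off_command("eth0")))
    print(parse_pause_status(SAMPLE))
```

Switches are configured through their own vendor-specific interfaces, so this sketch covers only the server side of the change described above.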
||
|
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3010 Status: Offline Project Badges:
|
Thanks Kevin for keeping us all in the picture - by the sounds of it, you're making progress, so hopefully, you'll soon have this issue resolved once and for all.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
No wonder WCG-servers appeared to be down, or should I say, 'paused'?
But why bother with low-level Ethernet flow control? Isn't TCP flow control good enough? What's cooking, WCG techs? |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
Former Member: "No wonder WCG-servers appeared to be down, or should I say, 'paused'? But why bother with a low-level Ethernet flow-control? Isn't TCP flow-control good enough?"
The default on both the switches and the servers was to have both levels of flow control on. For three years it worked fine. However, the system is behaving much better with TCP flow control only. That said, this is not the root cause - we still see symptoms of the problem appearing, but TCP flow control is doing a much better job of managing the situation and is dramatically reducing how often it affects our volunteers. We are continuing to look for the root cause. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
|
||
|
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline Project Badges:
|
Starting to see C4SW uploads stalled again. Anyone else?
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
|
||
|
|
BSD
Senior Cruncher Joined: Apr 27, 2011 Post Count: 224 Status: Offline |
My stalled result files uploaded about two hours ago, but the corresponding reported completed tasks are now stalled.
P.S. Oh, they just went. A little Drano helps. |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
nanoprobe: "Starting to see C4SW uploads stalled again. Anyone else?"
Same here on HCC and HFCC.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
|