Thread Status: Active
Total posts in this thread: 352
This topic has been viewed 2178924 times and has 351 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Server Errors.

Seems to be working. Again, well done. Great job.
[Jul 30, 2012 12:52:46 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Server Errors.

knreed: "We have worked on a number of items this weekend to both mitigate and further investigate this ongoing issue." in the Known Issues [read only] forum at http://www.worldcommunitygrid.org/forums/wcg/...33481_lastpage,yes#386302.
Applause for your dedication, techs.

Things seem to be running much more smoothly from this end. The irregular server outages are hardly noticeable now, if at all.

Just a thought - ignore if not relevant.
Some time ago I had an issue with a machine that kept freezing up for periods of 1-2 minutes and getting WU timeouts and restarts with "exiting with zero status". It turned out to be a dodgy, not-very-old system/BOINC data hard drive (WD 500GB Green) that was going into "500GB floppy disc mode" several times per day, i.e. slowing right down but never returning even a single hard error. The drive passes the WD DOS diagnostics perfectly every time.
Maybe you have a server hardware/firmware problem?
[Jul 31, 2012 6:13:05 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Server Errors.

This issue is not solved - but we are making progress.

For now, we have disabled the application changes we made this weekend. We did this so that we can monitor the impact of disabling Ethernet flow control (specifically, disabling the sending of 'Pause Frames') on our servers and LAN switches. This seems to have improved the situation significantly, but it has definitely not eliminated it (we have had about 10 minutes of outage over the past 24 hours). If you are interested, here is a description of Ethernet flow control and how it can interfere with TCP flow control: http://virtualthreads.blogspot.com.br/2006/02/beware-ethernet-flow-control.html

We are also working with the technical leads for the entire data center where we are hosted to help us identify the root cause.
----------------------------------------
[Edit 1 times, last edit by knreed at Aug 1, 2012 3:56:36 PM]
[Aug 1, 2012 3:54:02 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 3010
Status: Offline
Re: Server Errors.

Thanks Kevin for keeping us all in the picture - by the sounds of it, you're making progress, so hopefully, you'll soon have this issue resolved once and for all.
[Aug 1, 2012 4:43:36 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Server Errors.

No wonder WCG-servers appeared to be down, or should I say, 'paused'?

But why bother with a low-level Ethernet flow-control? Isn't TCP flow-control good enough?

What's cooking, WCG-Techs?
[Aug 1, 2012 7:39:35 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Server Errors.

Former Member: "No wonder WCG-servers appeared to be down, or should I say, 'paused'? But why bother with a low-level Ethernet flow-control? Isn't TCP flow-control good enough?"

The default on both the switches and the servers was to have both levels of flow-control on. For three years it worked fine. However, the system is behaving much better with TCP flow control only.

However, this is not the root cause - we still see symptoms of the problem appearing, but TCP flow control is doing a much better job of managing the situation and dramatically reducing the number of times it affects our volunteers. We are continuing to look for the root cause.
[Aug 2, 2012 1:17:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Server Errors.

Hello knreed
Reference: knreed [Aug 2, 2012 1:17:44 PM] post

Thanks for the response.
[Aug 2, 2012 6:12:09 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: Server Errors.

Starting to see C4SW uploads stalled again. Anyone else?
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Aug 10, 2012 4:37:45 PM]
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Status: Offline
Re: Server Errors.

My stalled result files uploaded about two hours ago, but the corresponding reported completed tasks are now stalled.

P.S. Oh, they just went. A little Drano helps.
[Aug 10, 2012 4:46:57 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: Server Errors.

nanoprobe: "Starting to see C4SW uploads stalled again. Anyone else?"

Same here on HCC and HFCC.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Aug 10, 2012 4:47:08 PM]