Total posts in this thread: 203
This topic has been viewed 153585 times and has 202 replies
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1931
Re: 2022-08-19 (Networking Issue Update)

To automate retries I use Adri's wcgresults from
https://sourceforge.net/projects/wcgtools/files/
On Linux, use crontab -e to schedule it to run every 15 minutes with the -x option.
Besides the fact that not everyone is running Linux (Linux hosts are probably a minority), this is just a crutch to get by for the time being, rather than having Krembil fix this problem at its source...

Ralf
[Sep 21, 2022 6:55:27 PM]
Robokapp
Senior Cruncher
Joined: Feb 6, 2012
Post Count: 248
Re: 2022-08-19 (Networking Issue Update)

small indie company and team of volunteers, remember?
[Sep 21, 2022 11:34:31 PM]
Phill23
Advanced Cruncher
Joined: Jan 3, 2006
Post Count: 59
Re: 2022-08-19 (Networking Issue Update)

I've been experiencing nothing but issues when downloading work units; uploads, however, have been perfectly fine. I think the retry button will soon be worn out from the number of times I've pressed it recently :(

Some friends in the US haven't been having the issue, though, but here in the UK, and for a friend in Germany, the same issues keep occurring. We're getting the same HTTP error that a few other people seem to be seeing :( Such a shame.
[Sep 22, 2022 10:34:33 AM]
JEvenden
Cruncher
Joined: Aug 18, 2005
Post Count: 2
Re: 2022-08-19 (Networking Issue Update)

Sad. I have several machines which had been running fine, getting work and processing it, until yesterday. Most are now not getting work, and the one I am on has 6 hung in transfer and only 4 of the usual 8 running. I will be at a full stop in 4 hours.
[Sep 22, 2022 11:32:48 AM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 764
Re: 2022-08-19 (Networking Issue Update)

To automate retries I use Adri's wcgresults from
https://sourceforge.net/projects/wcgtools/files/
On Linux, use crontab -e to schedule it to run every 15 minutes with the -x option.
Besides the fact that not everyone is running Linux (Linux hosts are probably a minority), this is just a crutch to get by for the time being, rather than having Krembil fix this problem at its source...

Ralf


True, I'm in a minority running only Linux now.
wcgresults could probably be run on Windows under Cygwin or WSL.
Others have posted ways to automate retry in this and/or other threads.
Also someone just posted a script to retry on multiple systems from one.

Paul.
----------------------------------------
Paul.
[Sep 22, 2022 2:34:18 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7578
Re: 2022-08-19 (Networking Issue Update)

Some friends in the US haven't been having the issue, though, but here in the UK, and for a friend in Germany, the same issues keep occurring. We're getting the same HTTP error that a few other people seem to be seeing :( Such a shame.

Nope. Still having the same issues here. It is still a Krembil problem, probably everywhere.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 22, 2022 4:20:43 PM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 294
Re: 2022-08-19 (Networking Issue Update)

I’m in the US and still having issues!
[Sep 22, 2022 4:41:49 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1931
Re: 2022-08-19 (Networking Issue Update)

True, I'm in a minority running only Linux now.
wcgresults could probably be run on Windows under Cygwin or WSL.
Others have posted ways to automate retry in this and/or other threads.
Also someone just posted a script to retry on multiple systems from one.

Paul.
Linux or not, the bottom line is that workarounds like this are not going to fix the situation. In fact, they are likely to make things worse if they are used in larger numbers, because of the ever-increasing barrage of connection attempts on the server side.
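
For anyone scripting retries anyway, here is a minimal sketch of one way to soften that barrage: capped exponential backoff with random jitter instead of a fixed timer. This is not how wcgresults or the BOINC client actually behaves; the function name `backoff_delays` and the `base` and `cap` values are assumptions chosen purely for illustration.

```python
import random


def backoff_delays(attempts, base=60.0, cap=3600.0, rng=None):
    """Return one wait time (in seconds) per retry attempt.

    Each attempt doubles the upper bound of the wait, up to `cap`, and the
    actual wait is drawn uniformly between 0 and that bound ("full jitter"),
    so that many hosts do not all retry at the same moment.
    `base` and `cap` are illustrative values, not project recommendations.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    for number, delay in enumerate(backoff_delays(6), start=1):
        print(f"retry {number}: wait {delay:.0f} s")
```

The idea is that a host which keeps failing backs off further each time rather than retrying at a constant rate, and the randomness spreads retries from different hosts over time.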

And as I mentioned before, I doubt that this is a "bandwidth" issue. The longer it keeps going, the more I am convinced that this is a limit on the number of concurrent file handles on the (cluster) file system of the server(s), the number of concurrent connections to the database(s) being used, or the number of concurrent connections to the web server(s) being used. Most likely it is even a combination of those things.
I don't think that it is a direct problem with the processing power of either the database or web server(s), as the latter at least is able to send proper 503 error messages back. If the web server were so overburdened that it didn't answer at all when a connection was attempted, a 408 (timeout) client-side HTTP error would rather be expected...
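
To make the distinction concrete, here is a rough sketch of how a client-side script might sort failed transfers into these cases. The function `classify_failure` and its wording are hypothetical, not taken from BOINC or wcgresults. Strictly speaking, a 408 is itself a response sent by a server that is still answering, while a server that never answers leaves the client with a connection timeout and no status code at all; either way, a 503 means something on the server side is still alive enough to reply.

```python
from http import HTTPStatus
from typing import Optional


def classify_failure(status: Optional[int]) -> str:
    """Map a failed transfer to a rough diagnosis.

    `status` is the HTTP status code received, or None when the connection
    attempt timed out or was refused before any HTTP response arrived.
    """
    if status is None:
        return "no response: server or network unreachable"
    if status == HTTPStatus.SERVICE_UNAVAILABLE:  # 503
        return "server answered but is overloaded or capacity-limited"
    if status == HTTPStatus.REQUEST_TIMEOUT:  # 408
        return "server answered but gave up waiting for the request"
    if 500 <= status < 600:
        return "other server-side error"
    return "not a server-side failure"


if __name__ == "__main__":
    for code in (503, 408, None):
        print(code, "->", classify_failure(code))
```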

Ralf
[Sep 22, 2022 4:52:19 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 869
Re: 2022-08-19 (Networking Issue Update)

Ralf - good analysis...
And as I mentioned before, I doubt that this is a "bandwidth" issue. The longer it keeps going, the more I am convinced that this is a limit on the number of concurrent file handles on the (cluster) file system of the server(s), the number of concurrent connections to the database(s) being used, or the number of concurrent connections to the web server(s) being used. Most likely it is even a combination of those things.
I don't think that it is a direct problem with the processing power of either the database or web server(s), as the latter at least is able to send proper 503 error messages back. If the web server were so overburdened that it didn't answer at all when a connection was attempted, a 408 (timeout) client-side HTTP error would rather be expected...
There's definitely a shortage of "infrastructure" - whether it can be [partially] solved by adjusting system configuration parameters (e.g. available file handles) is unclear, so that brings us back to why they still don't seem to have all the "servers" they apparently planned for.

Of course, when there's not much work available the servers don't get hammered so hard and it might look as if the problems have been resolved - but no, they've just been deferred! And now we have more OPNG and non-retry ARP1 work so it's no surprise it has kicked off again...

Blount had a point when singling out the network people -- I'd love to be a fly on the wall when Igor Jurisica or one of his [small] team contacts them (yet again?) to ask when their extra servers will be available, as I suspect the frustration levels must be quite high...

Keep the critique going - eventually we might get some much more detailed responses!

Cheers - Al.

P.S. there may end up being a total bandwidth issue as when there's lots of work available my download rates seem to plummet by 80% or more (which suggests natural throttling...) However, perhaps when the network/infrastructure issues are resolved there'll also be more total bandwidth? I wonder how much total external capacity Sharcnet has :-)
[Sep 22, 2022 9:04:42 PM]