World Community Grid Forums
Category: Official Messages | Forum: News | Thread: 2022-08-19 (Networking Issue Update)
Thread Status: Active | Total posts in this thread: 203
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 1931 | Status: Offline
> To automate retries I use Adri's wcgresults from https://sourceforge.net/projects/wcgtools/files/
> On Linux use command crontab -e to create a timer to run every 15 minutes with option -x.

Beside that, not everyone is running Linux (probably a minority among the hosts); this is just a crutch to get by for the time being, rather than have Krembil fix this problem at its source...

Ralf
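For anyone going that route, here is a minimal crontab sketch; the install path, the log location and the exact behaviour of -x are assumptions on my part, so check the wcgtools documentation first:

# Hypothetical crontab entry (edit with: crontab -e)
# Runs wcgresults with -x every 15 minutes; binary path and log file are assumed.
*/15 * * * * /usr/local/bin/wcgresults -x >> "$HOME/wcgresults.log" 2>&1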
Robokapp
Senior Cruncher | Joined: Feb 6, 2012 | Post Count: 248 | Status: Offline
small indie company and team of volunteers, remember?
Phill23
Advanced Cruncher | Joined: Jan 3, 2006 | Post Count: 59 | Status: Offline
Experiencing nothing but issues when it comes to downloading the work units; the uploads, however, have been perfectly fine. The retry button, I think, will soon be worn out with the number of times I've been pressing it recently :(
Some friends in the US haven't been having the issue, though, but here (UK) and a friend in Germany we have been having the same problems, with the same HTTP error that a few other people seem to be getting :( Such a shame.
JEvenden
Cruncher | Joined: Aug 18, 2005 | Post Count: 2 | Status: Offline
I have several machines which have been running fine, getting work and processing, until yesterday. Most are now not getting work, and the one I am on has 6 files hung in transfer and only 4 of the usual 8 tasks running. I will be at a full stop in 4 hours.
PMH_UK
Veteran Cruncher | UK | Joined: Apr 26, 2007 | Post Count: 764 | Status: Offline
> Beside that not everyone is running Linux (probably a minority among the hosts), this is just a crutch to get by for the time being, rather than have Krembil fix this problem at its source... Ralf

True, I'm in a minority running only Linux now. wcgresults could probably be run on Windows under Cygwin or WSL. Others have posted ways to automate retry in this and/or other threads. Also, someone just posted a script to retry on multiple systems from one.

Paul.
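On the multi-system retry idea: not the script referred to above, but one illustrative way to do it from a single machine, assuming every client allows remote RPC and shares a GUI RPC password (the host names, the password file and the reliance on boinccmd --network_available are all assumptions):

#!/bin/sh
# Illustrative sketch only -- not the script mentioned in the thread.
# Assumes each host has remote RPC enabled (remote_hosts.cfg) and that the
# shared GUI RPC password is stored in $HOME/.boinc_rpc_pw.
PW=$(cat "$HOME/.boinc_rpc_pw")
for h in host1.local host2.local host3.local; do   # hypothetical host names
    # --network_available tells the client the network is up, prompting it
    # to retry deferred file transfers and scheduler requests.
    boinccmd --host "$h" --passwd "$PW" --network_available \
        || echo "retry request to $h failed" >&2
done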
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7578 | Status: Offline
> Some friends in the US haven't been having the issue, though, but here (UK) and a friend in Germany have been having the same issues, with the same HTTP error that a few other people seem to be getting :( Such a shame.

Nope. Still having the same issues here. It is still a Krembil problem, probably everywhere.

Cheers
Sgt. Joe
*Minnesota Crunchers*
bfmorse
Senior Cruncher | US | Joined: Jul 26, 2009 | Post Count: 294 | Status: Offline
I’m in the US and still having issues!
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 1931 | Status: Offline
> True, I'm in a minority running only Linux now.

Linux or not, the bottom line is that workarounds like this are not going to fix the situation. In fact, they are likely to make things worse if workarounds/crutches like this are used in larger numbers, due to an ever increasing barrage of connection attempts on the server side.

> wcgresults could probably be run on Windows under Cygwin or WSL. Others have posted ways to automate retry in this and/or other threads. Also someone just posted a script to retry on multiple systems from one. Paul.

And as I mentioned before, I doubt that this is a "bandwidth" issue. The longer it keeps going, the more I am convinced that this is a limitation on the number of concurrent file handles on the (cluster) file system of the server(s), the number of concurrent connections on the database(s) being used, or the number of concurrent connections on the web server(s) being used. Most likely it is even a combination of those things. I don't think it is a direct problem of processing power of either the database or web server(s), as the latter at least is able to send proper 503 error messages back. If the web server were so overburdened that it didn't answer at all when a connection is attempted, a 408 (timeout) client-side HTTP error would be the more likely outcome...

Ralf
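As a side note, a quick way to see which case a client is actually hitting is to probe the server and log the status code. This is only a sketch; the WCG home page URL is a stand-in, and you would substitute the scheduler or upload URL shown in the BOINC event log:

# Prints the HTTP status code (e.g. 503), or 000 if no response arrives
# before the timeout (which would point at a connection/timeout problem instead).
curl -sS -o /dev/null -w '%{http_code}\n' --max-time 30 'https://www.worldcommunitygrid.org/'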
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 869 | Status: Offline
Ralf - good analysis...
> And as I mentioned before, I doubt that this is a "bandwidth" issue. The longer it keeps going, the more I am convinced that this is a limitation on the number of concurrent file handles on the (cluster) file system of the server(s), the number of concurrent connections on the database(s) being used, or the number of concurrent connections on the web server(s) being used. Most likely it is even a combination of those things.

There's definitely a shortage of "infrastructure" - whether it can be [partially] solved by adjusting system configuration parameters (e.g. available file handles) is unclear, so that brings us back to why they still don't seem to have all the "servers" they apparently planned for.

> I don't think it is a direct problem of processing power of either the database or web server(s), as the latter at least is able to send proper 503 error messages back. If the web server were so overburdened that it didn't answer at all when a connection is attempted, a 408 (timeout) client-side HTTP error would be the more likely outcome...

Of course, when there's not much work available the servers don't get hammered so hard and it might look as if the problems have been resolved - but no, they've just been deferred! And now we have more OPNG and non-retry ARP1 work, so it's no surprise it has kicked off again...

Blount had a point when singling out the network people -- I'd love to be a fly on the wall when Igor Jurisica or one of his [small] team contacts them (yet again?) to ask when their extra servers will be available, as I suspect the frustration levels must be quite high...

Keep the critique going - eventually we might get some much more detailed responses!

Cheers - Al.

P.S. There may end up being a total bandwidth issue, as when there's lots of work available my download rates seem to plummet by 80% or more (which suggests natural throttling...) However, perhaps when the network/infrastructure issues are resolved there'll also be more total bandwidth? I wonder how much total external capacity Sharcnet has :-)