World Community Grid Forums
Category: Official Messages | Forum: News | Thread: 2022-09-15 Update (Networking & Workunits)
Thread Status: Active Total posts in this thread: 214
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2068 Status: Recently Active Project Badges: |
@TPCBF
Yup, full bore again on OPNG, which is the only thing I run for now. I do not ask for any type of CPU tasks. |
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1928 Status: Offline Project Badges: |
Grumpy Swede wrote: "@TPCBF Yup, full bore again on OPNG, which is the only thing I run for now. I do not ask for any type of CPU tasks."
Ok, so can we now blame this whole mess officially on you?
Ralf
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2068 Status: Recently Active Project Badges: |
@TPCBF Ok, so can we now blame this whole mess officially on you? Yup, full bore again on OPNG, which is the only thing I run for now. I do not ask for any type of CPU tasks. Ralf |
Blount
Senior Cruncher Joined: Aug 19, 2005 Post Count: 364 Status: Offline Project Badges: |
Cyclops, please tell us your network group understands the severity of the download HTTP issue. This is not a transient issue that just needs a retry. It is a high-volume failure that prevents massive numbers of downloads, keeps the CPUs from being loaded with tasks, etc. I have to believe this is causing network issues on the server side.
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1928 Status: Offline Project Badges: |
Blount wrote: "Cyclops, please tell us your network group understands the severity of the download HTTP issue. This is not a transient issue that just needs a retry. It is a high-volume failure that prevents massive numbers of downloads, keeps the CPUs from being loaded with tasks, etc. I have to believe this is causing network issues on the server side."
I don't think that they understand the severity of this issue, all the more so as it seemed to have been fixed for a few days last week, when there were pretty much no download issues at all. But since the beginning of the week, after working for those few days, this has been nothing but a hot mess.
And no, while the basic networking error message calls it likely a "transient" error, it isn't transient (temporary) at all. It is a permanent issue that needs to be actively fixed, not something to sit and wait on until it magically fixes itself. It won't.
In fact, the last couple of days have been the worst I have ever seen. This morning, while checking on some OS updates on two servers, I noticed that they were running only 6 and 8 WUs respectively, even though both are 6C/12T "Silver" Xeons. It turned out that overnight they had worked off all the previously downloaded WUs, but each was stuck with a single (MCM1) file to download, which they had been retrying for 8+ hours at least. So this is seriously impacting the processing of WUs in general.
And I think that the longer the issue persists, and the more people come up with automated ways to retry those downloads, the worse the whole situation gets. But instead of (however slowly) fixing things and getting back to normal (as it was before the move, not the "new normal"), it seems that they just keep digging the hole we're in even deeper...
Ralf
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2070 Status: Offline Project Badges: |
TPCBF wrote: "So this is seriously impacting the processing of WUs in general."
I agree with you here.
TPCBF wrote: "And I think that the longer the issue persists, and the more people come up with automated ways to retry those downloads, the worse the whole situation gets."
I don't quite agree with that. If there were a smart way to minimize retries and file transfers (for the time being, who knows how long this will last, but maybe - just maybe - also in the long run), files would be transferred and done sooner, leaving more room for other people to attempt a file transfer. Since many people don't read the forum, only a select group of people will be using an automated way (instead of wildly clicking the Retry button). People just 'hammering' away on the Retry button in despair is even worse for the server, I think. For a start, I haven't touched the Retry button since the introduction of - here it comes again -
Adri
Sphynxx
Cruncher Joined: Nov 24, 2010 Post Count: 47 Status: Offline Project Badges: |
Since WCG is still in the test phase, the servers need to be tested, and that includes by those who use scripts. If they can't handle it now, I hate to imagine what's going to happen when they start posting their own stats and everyone returns, including the people who run it for pay via crypto. It could be a very short restart if they don't figure it out under only muted pressure.
[Edited 1 time, last edit by Sphynxx at Sep 22, 2022 8:08:41 PM]
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1928 Status: Offline Project Badges: |
Sphynxx wrote: "Since WCG is still in the test phase, the servers need to be tested, and that includes by those who use scripts. If they can't handle it now, I hate to imagine what's going to happen when they start posting their own stats and everyone returns, including the people who run it for pay via crypto. It could be a very short restart if they don't figure it out under only muted pressure."
I think you're missing the point here. The question was how much the constant retries using automated scripts are worsening the current download situation. The only way to fix this is for Krembil to finally get their shit together... There was no mention at this point of any stats...
Ralf
Sphynxx
Cruncher Joined: Nov 24, 2010 Post Count: 47 Status: Offline Project Badges: |
I got your point, I just don't agree with it.
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1928 Status: Offline Project Badges: |
Sphynxx wrote: "I got your point, I just don't agree with it."
Did you read the reply from Krembil (well, from Cyclops, their mouthpiece) saying these are just "transient HTTP" errors, so just wait and they will go away? Sorry, that doesn't sound like actively testing anything... :(
Ralf