World Community Grid Forums
Category: Official Messages Forum: News Thread: Comprehensive Issue List & Report Thread (Feb. 24, 2023) |
Thread Status: Active Thread Type: Sticky Thread Total posts in this thread: 423
cuphi
Cruncher Joined: Aug 8, 2021 Post Count: 7 Status: Offline Project Badges: |
After I saw the announcement about the increase in WU's this week I decided to add WCG back on my Ryzen 9. It started to download two tasks but before it completed the downloads I got a "Project Backoff" and now it's a five hour wait until it tries again.
|
||
|
Jean-David Beyer
Senior Cruncher USA Joined: Oct 2, 2007 Post Count: 335 Status: Offline Project Badges: |
Quoting cuphi: After I saw the announcement about the increase in WU's this week I decided to add WCG back on my Ryzen 9. It started to download two tasks but before it completed the downloads I got a "Project Backoff" and now it's a five hour wait until it tries again.
My experience the last few days is that I am being offered more WCG work than in the previous six months or so. These work units seem to have 10 or more files each, and not all of them download correctly on the first attempt. The BOINC client has an algorithm that decides when it tries again, and that algorithm does not work well under the present conditions. But if you look at the Transfers tab, it lists the files that still need to be downloaded (and predicts when it will try again). You can select a file and press Retry, and it will try again right away. It may fail, but it often works. In my experience, I can get all the files in 10 minutes or less. For some work units, all the files come down without any assistance from me. |
||
|
cuphi
Cruncher Joined: Aug 8, 2021 Post Count: 7 Status: Offline Project Badges: |
Retrying the transfers just extends the Back Off timer because it is coming from the WCG servers, not my PC.
|
||
|
Traveller42
Cruncher Joined: May 7, 2017 Post Count: 21 Status: Offline Project Badges: |
Actually, that Project Backoff is locally generated. After a number of consecutive failures, it will assume the project has an issue and delay all transfers for that project. That Project Backoff will disappear as soon as any of the Retries succeed.
It appears that the rate of failure is higher than 50%, but these failures are transient. The success rate might be as low as 20%, but I haven't done the analysis to determine that value with any confidence. The nature of the failures suggests the issue is in the instances that are spun up to handle each transfer, or in the load balancer handing off the traffic. |
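The behaviour described above (a locally generated backoff after several consecutive failures, cleared by the first success) can be sketched roughly as follows. This is only an illustrative model, not the BOINC client's actual algorithm: the class `ProjectBackoff`, the threshold of three failures, the 60-second base delay, the doubling rule, and the five-hour cap are all assumptions chosen for the example. The helper `expected_attempts` simply applies the mean of a geometric distribution to the (unconfirmed) 20% success-rate guess, assuming retries are independent.

```python
from dataclasses import dataclass


@dataclass
class ProjectBackoff:
    """Hypothetical model of a client-side project backoff (NOT BOINC's real code)."""

    failure_threshold: int = 3         # assumed: failures in a row before backing off
    base_delay_s: float = 60.0         # assumed: first backoff delay, in seconds
    max_delay_s: float = 5 * 3600.0    # assumed cap, echoing the five-hour wait reported
    consecutive_failures: int = 0

    def current_delay(self) -> float:
        """Return the project-wide delay in seconds (0.0 while under the threshold)."""
        over = self.consecutive_failures - self.failure_threshold
        if over < 0:
            return 0.0
        # Assumed rule: double the delay for every failure past the threshold.
        return min(self.base_delay_s * 2 ** over, self.max_delay_s)

    def record_failure(self) -> float:
        """Count one failed transfer and return the resulting delay."""
        self.consecutive_failures += 1
        return self.current_delay()

    def record_success(self) -> None:
        """Any successful transfer (including a manual Retry) clears the backoff."""
        self.consecutive_failures = 0


def expected_attempts(success_rate: float) -> float:
    """Mean number of independent tries needed for one success (geometric distribution)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


if __name__ == "__main__":
    backoff = ProjectBackoff()
    for _ in range(4):
        print("delay after failure:", backoff.record_failure())
    backoff.record_success()
    print("delay after a success:", backoff.current_delay())
    print("expected tries per file at 20% success:", expected_attempts(0.2))
```

Under these assumptions, a file that succeeds one time in five needs about five tries on average, which is consistent with manual retries eventually clearing the queue.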
||
|
cuphi
Cruncher Joined: Aug 8, 2021 Post Count: 7 Status: Offline Project Badges: |
When a file fails to download the BOINC client is reporting one of two scenarios:
1.) If the file is over 1MB in size there is no progress listed.
2.) If the file is less than 1MB in size the download will fail after 107 bytes have been downloaded.
This could be a simple rounding error where the BOINC client doesn't show progress for receiving 107 bytes on files larger than 1MB in size. I just wanted to mention it in case it helps someone figure out a fix sooner rather than later. |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1931 Status: Offline Project Badges: |
Quoting cuphi: When a file fails to download the BOINC client is reporting one of two scenarios: 1.) If the file is over 1MB in size there is no progress listed. 2.) If the file is less than 1MB in size the download will fail after 107 bytes have been downloaded. This could be a simple rounding error where the BOINC client doesn't show progress for receiving 107 bytes on files larger than 1MB in size. I just wanted to mention it in case it helps someone figure out a fix sooner rather than later.
When a download fails, there is only one scenario: it always fails at 107 bytes. While this only shows with a file size of <K (it will show as 0.1KB if the file is small enough to be displayed with two decimal digits), it just won't show up on a MB-size file...
And yes, this has been known for weeks; apparently nobody has had time yet to really look into the issue. Let's see what happens next week...
Ralf |
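The rounding effect discussed above is easy to reproduce. The snippet below is a small sketch that assumes progress is displayed with two decimal digits and binary units (1 KB = 1024 bytes); the function `format_progress` is made up for this illustration and is not taken from the BOINC Manager source.

```python
def format_progress(bytes_done: int, unit: str) -> str:
    """Format a byte count in the given unit with two decimal digits.

    Assumption: binary units and two decimals, as described in the thread.
    """
    divisors = {"KB": 1024, "MB": 1024 * 1024}
    return f"{bytes_done / divisors[unit]:.2f} {unit}"


if __name__ == "__main__":
    # The same 107 bytes, shown in the unit a small or a large file would use.
    print(format_progress(107, "KB"))  # visible on a KB-scale file
    print(format_progress(107, "MB"))  # rounds away on a MB-scale file
```

With these assumptions, 107 bytes is about 0.104 KB but only about 0.0001 MB, so the same failed transfer looks like a little progress on a small file and like none at all on a large one.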
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 869 Status: Offline Project Badges: |
Quoting cuphi (via TPCBF): When a file fails to download the BOINC client is reporting one of two scenarios: 1.) If the file is over 1MB in size there is no progress listed. 2.) If the file is less than 1MB in size the download will fail after 107 bytes have been downloaded. This could be a simple rounding error where the BOINC client doesn't show progress for receiving 107 bytes on files larger than 1MB in size. I just wanted to mention it in case it helps someone figure out a fix sooner rather than later.
Quoting TPCBF: When a download fails, there is only one scenario, it always fails at 107 bytes, and while this only with a file size of <K, it will show as 0.1KB if small enough to show up with two decimal digits, but it just won't show up on a MB size file... And yes, this is known for weeks, just apparently nobody had time yet to really look into the issue. Let's see what happens next week... Ralf
My understanding is that the issue has been looked into and is being considered part of the networking issues: they can't get all their servers active at the moment, which results in various problems during periods of heavy load. (The servers that are accessible can only offer a finite number of simultaneous connections, after all!) This seems to be [indirectly] confirmed in the "2022-08-26 Update" News post by WCG.
Cheers - Al. |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1931 Status: Offline Project Badges: |
For information, 107 bytes is the size of the HTTP page sent out for a 503 error ("Service Unavailable"). :-)
This is not an issue that has existed only since yesterday. The "testing" has been running for two months now, with not the slightest change in sight, so I am not sure what has indeed been "looked into". (It's a bit like people doing a ping to diagnose network connectivity.)
Quoting alanb1951: My understanding is that the issue has been looked into and is being considered a part of the networking issues because they can't get all their servers active at the moment, resulting in various issues under periods of heavy load. (The servers that are accessible can only offer a finite number of simultaneous connections, after all!) This seems to be [indirectly] confirmed in the "2022-08-26 Update" News post by WCG,
And an underlying 503 error is not really a "networking" issue per se; rather, it indicates an issue with the database and/or file system. After all, it is the IP stack that would be generating that response, so it has been able to establish a TCP connection, or it would not have been able to send that error back. And it is a bit odd that for those two months we have had to wait for "that data center guy coming back from summer break". If it were indeed a true "networking" issue, it would rather have been a client-side 408 error (though, as things are going so far, I would not be surprised to see a 418 error code).
Ralf |
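To make the status codes mentioned above concrete, here is a minimal sketch using Python's standard `http.HTTPStatus` table. The helper names `describe_status` and `looks_like_error_page` are hypothetical, and the check is only an assumption about how a tiny error body (such as the 107 bytes reported in this thread) could be told apart from a real file; it is not how the BOINC client does it.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    phrase = HTTPStatus(code).phrase
    if 500 <= code < 600:
        side = "server error"
    elif 400 <= code < 500:
        side = "client error"
    else:
        side = "other"
    return f"{code} {phrase} ({side})"


def looks_like_error_page(status_code: int, body_len: int, expected_len: int) -> bool:
    """Hypothetical check: a non-200 reply whose body is shorter than the file."""
    return status_code != 200 and body_len < expected_len


if __name__ == "__main__":
    for code in (408, 503):
        print(describe_status(code))
    # A 2 MB file that "fails at 107 bytes" matches the error-page pattern.
    print(looks_like_error_page(503, 107, 2_000_000))
```

A 503 arriving at all means an HTTP response was delivered over an established connection, which is the distinction the post draws between a server-side failure and a pure connectivity problem.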
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7576 Status: Offline Project Badges: |
Quoting TPCBF: I would not be surprised to see a 418 error code
I like the humor.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1931 Status: Offline Project Badges: |
Quoting TPCBF (via Sgt.Joe): I would not be surprised to see a 418 error code
Quoting Sgt.Joe: I like the humor. Cheers
Ralf |
||
|
|