| World Community Grid Forums
Thread Status: Active Total posts in this thread: 214
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 442 Status: Offline Project Badges:
|
This morning I checked several of my logs; the problem first showed up here at about

Some retries have worked through on their own, but others not so much. If the download was progressing reasonably, I did not intervene, until a short while ago when the BACKING OFF caused a shortfall in the RUNNING tasks. At that point I forced retries until there was at least one READY TO START, then allowed the system to proceed at its discretion.

It APPEARS that something at Krembil's data site changes about that time:
Configuration changes, reverts or expires,
Full or partial system backup(s),
Some other automated process.

[Edit 1 times, last edit by bfmorse at Oct 1, 2022 2:31:21 PM]
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
I'm experiencing massive download failures for ARP1-tasks only (WU download error: couldn't get input files): "-200 (wrong size)" since 02:00 UTC, 1 September. (Apart from transient HTTP errors for all other tasks that in the end do succeed in downloading.)

In case anyone is wondering: the ARP1-tasks that fail to download are ALL, without exception, failing with "-200 (wrong size)", although some ARP1-tasks have succeeded in downloading (I've counted about 10 that succeeded while >150 failed to download).

[Edit 2 times, last edit by adriverhoef at Oct 1, 2022 3:45:02 PM]
||
|
|
sam6861
Advanced Cruncher Joined: Mar 31, 2020 Post Count: 107 Status: Offline Project Badges:
|
Wrong size errors appears to be limited to old version of BOINC or library or something. I may soon update the rest of my Linux Debian computers.
Debian bullseye with BOINC 7.16.16: Random wrong size errors.
-- My debian985 with AMD FX-4100.
-- My debian643 with Intel Core2duo 4300
-- My debian538 with Intel Atom N270, no ARP1, tried once with a slow 7.1

Debian bookworm with BOINC 7.20.2: No wrong size errors.
-- My debian264 with Intel i7-2600.
-- My debian485 with Ryzen 2600g, back on Debian bullseye it had download errors. After update to bookworm, no download errors, just slow download and retries.

Windows 10 with 7.20.2, no wrong size errors for 2 of my Windows 10 PC. Just a slow download and retries.

[Edit 1 times, last edit by sam6861 at Oct 1, 2022 8:22:47 PM]
||
|
|
astromatto
Cruncher Joined: May 26, 2007 Post Count: 4 Status: Offline Project Badges:
|
Not entirely. I am running 7.20.2 and I have seen a dozen wrong size errors (and still see them); ARP seems to be the only project affected by this error.
I run ArchLinux, so all my libraries are pretty much new as well.
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline Project Badges:
|
I am getting a lot of retries for MCM1 because the initial tasks have been getting wrong size errors; I also get occasional retries for OPN1/OPNG for the same reason. So far I've not had a "wrong size" error on any of those myself.

The predominance here seems to be ARP1, though, probably because the files are big enough that whatever is happening to truncate files in transit is more likely to bite ARP1 (and when it hits MCM1 it usually seems to be the 100MB+ master data file that fails to download).

I've only ever had one "wrong size" error on any of my systems [that I noticed, anyway] and it was immediately preceded in the BOINC log by a "Project communication failed" message for an attempted download of a different file, which may or may not have been relevant. How well does the BOINC client cope if a transfer gets cut off at the server end?

Here's hoping that if they can add some more resources to handle downloads and uploads better, the "wrong size" (and checksum error) faults will more or less die off too.

Cheers - Al.

P.S. If I recall correctly, it was an occasional problem with MCM1 work at IBM-WCG too...
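As a generic illustration of the two failure types named above ("wrong size" and checksum error), here is a minimal sketch of how a receiver can tell a truncated transfer from a corrupted one. It assumes the expected byte count and an MD5 digest are known before the download starts. The function name `check_download` and the returned strings are hypothetical; this is not the BOINC client's own code.

```python
import hashlib


def check_download(data: bytes, expected_size: int, expected_md5: str) -> str:
    """Classify a finished transfer as 'ok', 'wrong size' or 'checksum error'.

    Hypothetical helper for illustration only.
    """
    # A connection closed early leaves fewer bytes than were announced,
    # so the cheap length comparison catches truncation first.
    if len(data) != expected_size:
        return "wrong size"
    # Same length but different content: only a digest can detect this.
    if hashlib.md5(data).hexdigest() != expected_md5.lower():
        return "checksum error"
    return "ok"


if __name__ == "__main__":
    payload = b"x" * 1000
    digest = hashlib.md5(payload).hexdigest()
    # Simulate a transfer cut off after 600 of 1000 bytes.
    print(check_download(payload[:600], 1000, digest))
```

Under these assumptions, a transfer that is cut off part way always fails the length comparison before the digest is ever computed, which would fit the observation that large files are hit by "wrong size" far more often than by checksum errors.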
||
|
|
Richard Haselgrove
Senior Cruncher United Kingdom Joined: Feb 19, 2021 Post Count: 360 Status: Offline Project Badges:
|
New error mode seen, shortly after 19:00 UTC today, affecting downloads of OPNG data files. Found by using http_debug in the BOINC client error log.

[http] HTTP error: Error in the HTTP2 framing layer

Edit: full error log
04/10/2022 20:11:33 | World Community Grid | [http] HTTP_OP::init_get(): https://download.worldcommunitygrid.org/boinc...024de94267ebdacd42f.pdbqt

Host: Linux Mint 20.3
BOINC: 7.20.2

Noticed because it caused an immediate retry, not after a slight delay like the 503s.

[Edit 1 times, last edit by Richard Haselgrove at Oct 4, 2022 7:49:08 PM]
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Wrong size errors appears to be limited to old version of BOINC or library

Nope. I'm running an up-to-date system.

I think it's important to report that I may have found (at least a part of) the cause. In the old IBM-days I found it was very convenient to download many files at once. In cc_config.xml, I had this setting:
<max_file_xfers_per_project>16</max_file_xfers_per_project>

This had my attention today. Since I was seeing "Wrong size" errors each hour, sometimes up to 27 ARP1-tasks failing within an hour, I thought it was a good idea to tweak that number in cc_config.xml. So I decided to lower this setting considerably and change it to 4:
<max_file_xfers_per_project>4</max_file_xfers_per_project>

Since then I am seeing no "Wrong size" errors anymore. Although the download retries are still there (of course), the "Wrong size" errors - for me - are completely gone (after downloading 10 ARP1-tasks). You may say, "that's nothing!", but it really tells a lot more than nothing, because hardly any ARP1-task managed to cross the bridge, so to speak, in the past few days.

On a friend's machine, I saw that its setting in cc_config.xml was:
<max_file_xfers_per_project>8</max_file_xfers_per_project>
Since it was also displaying the "Wrong size" error, I've changed the 8 into a 2 there:
<max_file_xfers_per_project>2</max_file_xfers_per_project>

On another machine, where the setting has been '2' all the time, since 12 August:
<max_file_xfers_per_project>2</max_file_xfers_per_project>
not one single "Wrong size" error happened in the past few months.
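For anyone who wants to try the same change on several machines, here is a minimal sketch that sets the value in the text of a cc_config.xml. It assumes the usual layout, with the setting inside an <options> section under a <cc_config> root. The function name `set_max_xfers_per_project` is hypothetical, and the sketch works on a string rather than on the real file in the BOINC data directory, so nothing is touched until you write the result back yourself.

```python
import xml.etree.ElementTree as ET


def set_max_xfers_per_project(config_xml: str, limit: int) -> str:
    """Return config_xml with <max_file_xfers_per_project> set to limit.

    Hypothetical helper; creates <options> and the setting if missing.
    """
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    setting = options.find("max_file_xfers_per_project")
    if setting is None:
        setting = ET.SubElement(options, "max_file_xfers_per_project")
    setting.text = str(limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    before = (
        "<cc_config><options>"
        "<max_file_xfers_per_project>16</max_file_xfers_per_project>"
        "</options></cc_config>"
    )
    print(set_max_xfers_per_project(before, 4))
```

The client only picks up an edited cc_config.xml after it re-reads its config files or is restarted.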
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline Project Badges:
|
That's an interesting experimental observation about max_file_xfers_per_project, which tends to suggest connections might be being cut off for one reason or another and that it is likely to be made worse by having more connections on the go at once...

My value for that has always been 2, and I had never noticed a "wrong size" or "checksum" download error until one was reported at about 05:00 UTC on 2022-10-04. I have a script that digs data specific to download issues on a specific day out of boinc.log, converts the dates to UTC and writes them to a file -- here is what happened immediately around the "wrong size" error (with the date [04-Oct-2022] removed and file names shortened to fit screen width)...

04:50:47 [World Community Grid] Backing off 00:03:27 on download of 20[...].7z

I checked the rest of the log, and the download for the file that failed had not been backed off previously, so I presume it got lucky(?) when its entry reached the top of the pending queue -- there were a lot of other items waiting at the time! So this file was downloading when that "Project communication failed" incident happened, and three seconds later the size error was reported; not a coincidence, I suspect!

I don't know how easy it would be for truncated downloads caused by premature closing of connections to be detected as such, perhaps allowing a retry instead of wasting any other completed downloads associated with that task.

By the way, I have seen a few other "Project communication failed" messages over recent days, but they seemed to coincide with periods where no download was actually in progress...

Cheers - Al.

P.S. I am running client 7.20.2 on the systems that have download issues with WCG, and my systems are kept reasonably up-to-date - as Adri says, this is not an issue unique to old versions of things...

[Edit: put the date in the right month!..]

[Edit 1 times, last edit by alanb1951 at Oct 6, 2022 3:51:51 AM]
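Al's script itself is not shown, so the following is only a rough sketch of the kind of filter described. It assumes each log line starts with a local timestamp in the form "04-Oct-2022 04:50:47" followed by the message, that month abbreviations are in English, and that the three phrases mentioned in this thread are the ones worth keeping. The names `download_issues` and `KEYWORDS`, and the file name in the usage example, are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Phrases taken from messages quoted in this thread; extend as needed.
KEYWORDS = ("backing off", "wrong size", "project communication failed")


def download_issues(lines, day, local_tz):
    """Yield 'HH:MM:SS message' in UTC for download-related lines on `day`.

    `day` is a UTC date; `local_tz` is the timezone the log was written in.
    """
    for line in lines:
        stamp, rest = line[:20], line[20:].strip()
        try:
            local = datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S")
        except ValueError:
            continue  # not a timestamped log line
        utc = local.replace(tzinfo=local_tz).astimezone(timezone.utc)
        if utc.date() != day:
            continue
        if any(keyword in rest.lower() for keyword in KEYWORDS):
            yield f"{utc:%H:%M:%S} {rest}"


if __name__ == "__main__":
    sample = [
        "04-Oct-2022 06:50:47 [World Community Grid] "
        "Backing off 00:03:27 on download of example.7z",
    ]
    central_european_summer = timezone(timedelta(hours=2))
    for entry in download_issues(sample, date(2022, 10, 4), central_european_summer):
        print(entry)
```

Passing the timezone explicitly keeps the conversion independent of the machine the script runs on, at the cost of not handling a daylight-saving change in the middle of a log.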
||
|
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Offline Project Badges:
|
On another machine, where the setting has been '2' all the time, since 12 August:
<max_file_xfers_per_project>2</max_file_xfers_per_project>
not one single "Wrong size" error happened in the past few months.

So long story short, don't f@&# around with the default settings of the BOINC client...

Ralf
||
|
|
poppinfresh99
Cruncher Joined: Feb 29, 2020 Post Count: 49 Status: Offline Project Badges:
|
Slow downloads may be caused by the following.

I only run Mapping Cancer Markers, but this project alone might be slowing downloads. My computer repeatedly downloads the following huge (107 MB) file...
mcm1.dataset-sarc1.txt
which could be slowing everything down.

My computer tries to download more tasks but gets "Tasks are committed to other platforms" or some other error, so it runs out of tasks. When it runs out of tasks, BOINC deletes mcm1.dataset-sarc1.txt

With IBM, perhaps certain files wouldn't be deleted when all tasks that needed it were finished?

The solution could be for WCG to either...
(1) not have mcm1.dataset-sarc1.txt be deleted until "sarc2" or whatever comes out
(2) fix the "Tasks are committed to other platforms" issue

Regardless, perhaps the simple solution is for everyone to store at least a couple days of work and to only run a few projects (like only Mapping Cancer Markers)?

[Edit 1 times, last edit by poppinfresh99 at Oct 8, 2022 10:20:56 PM]
||
|
|
|