World Community Grid - View Thread - Completed, Valid Workunits in category Extreme

Adri,

I hadn't come back to this thread until now, so I missed your earlier reply - sorry about that :-)

Regarding the comment about Extreme tasks and download errors in my P.S., which you (rightly) picked up on as not being precise enough: my concern is how long it takes to move Extreme tasks along...

Firstly, I misquoted the [current] deadline for Extreme tasks earlier (it's 1.5 days, not 3!) -- I'm surprised that neither Mike nor you picked that up :-)

However, the P.S. was based on the premise that if downloads are taking a while and are still producing errors, the host is wasting bandwidth (which is not good news for the user!) and delaying the workunit(s) in question. I should've pointed out that I was thinking of "repeat offenders" (hosts whose errors may rack up faster than they can return successful results!) and added that download issues seem far more common for MCM1 and OPN1/G than for ARP1 (based on my observations of Linux wingmen, anyway!)... It was intended as observation rather than criticism...

For what it's worth, I have had a couple of "wrong size" errors myself, albeit weeks ago. One coincided with a total inability to connect to the site in any form (so I guess the connection got cut from the far end...). The other happened when I accidentally retried quite a lot of pending/deferred downloads whilst a big download was already in progress; I suspect the BOINC client cut that one off, rather than it being an issue at the far end. The rapid attempts to connect, even with the default maximum of 2, might be causing the same issues you commented on elsewhere regarding having more concurrent connections[1]...

Also regarding download errors, one of the Error returns in Mike's mega-unit (10 tries - wow!) was a "wrong size" error, but it was on one of the later retries! (The other Errors look like Not Started by Deadline, the curse [shared with No Reply] of all BOINC projects...)

You actually noted a previous generation of that cell earlier in this thread; this is two generations later, so it looks as if the intervening generation's results came in quite quickly too! That's promising for what can happen when there aren't multiple retries :-) -- however, when my small daily sample of about 10 tasks is seeing 20% or more retries for deadline failures...

Cheers - Al.

[1] During one of the earlier long forum/website outages, one or two WCG folks were on one of the BOINC forums (specific to the IBM->Krembil move), and Richard Haselgrove posted some interesting stuff about connection buffers and how the oldest buffer apparently just gets killed if there's a "crisis". I wonder if that's what causes some of the download errors (especially for those who have high connection counts or who automate retries with the boinccmd brute-force command at extremely short intervals...). I now take care NOT to do bulk retries when there's already a long transfer running, especially if there's a large backlog...
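
For anyone who does script their retries, here is a rough Python sketch of the "don't hammer it" idea in the footnote: space the attempts out with a doubling delay, and hold off on a bulk retry while a transfer is already running. Everything in it is an assumption for illustration only. The function names are hypothetical, the 60-second base and one-hour cap are arbitrary, and it deliberately does not call boinccmd or the BOINC client at all; it only works out the timing and the go/no-go decision.

```python
def retry_delays(attempts, base=60.0, cap=3600.0):
    """Return wait times in seconds for successive retry attempts.

    The delay doubles on each attempt, starting at `base`, and never
    exceeds `cap`. Both defaults are illustrative, not BOINC settings.
    """
    delays = []
    for n in range(attempts):
        delays.append(min(cap, base * (2 ** n)))
    return delays


def should_bulk_retry(active_transfers, max_active=0):
    """Hypothetical guard: allow a bulk retry only when the number of
    transfers already running is at or below `max_active`."""
    return active_transfers <= max_active


if __name__ == "__main__":
    # Show the schedule for five attempts and the guard's decision
    # when one long transfer is already in progress.
    print(retry_delays(5))
    print(should_bulk_retry(active_transfers=1))
```

How the count of active transfers is obtained is left out on purpose; whatever feeds `active_transfers` would need checking against your own client setup.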