World Community Grid Forums
Thread Status: Active | Total posts in this thread: 352
Link64
Senior Cruncher | Joined: Feb 19, 2021 | Post Count: 206 | Status: Offline
> Oh, I also review the RESENDS "waiting to start" on my systems 3 or 4 times a day and set them to running to TRY and avoid or at least minimize SERVER ABORTED status.

Why would you do that? To force your computer to waste cycles on something that has already been crunched? Server aborts are there to minimize the waste of computing resources and thereby speed up the project. Sorry, but minimizing them is simply stupid and the opposite of helping the project in the best way you can.

[Edit 5 times, last edit by Link64 at Jul 10, 2025 9:06:31 AM]
Mike.Gibson
Ace Cruncher, England | Joined: Aug 23, 2007 | Post Count: 12594 | Status: Offline
bfmorse
I looked at the first one, and it seems that you received your copy as soon as the other copy had passed its due date, and that copy was then returned soon after. Any crunching by you would have been wasted, as they already had 2 valid results, so yours was aborted to spare your machine the effort.

Mike
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1317 | Status: Offline
As far as I am aware (if my memories of a past code dive are correct), the various "couldn't use" statuses such as "Server Aborted" and "Too Late" don't count as host errors so should not affect "reliable host" status!
As has already been said, there's no point worrying about Server Aborted tasks, especially if the above is true; some unneeded retries will run anyway (which is why the lack of a grace period for ARP1 is annoying, given the potentially unnecessary download traffic), but unless a client system fills up on retries (unlikely, I'd have thought) it shouldn't unduly hinder the [eventual] collection of new work and those retries that aren't for "No Reply" tasks.

Cheers - Al.
bfmorse
Senior Cruncher, US | Joined: Jul 26, 2009 | Post Count: 442 | Status: Recently Active
Well, I suppose I should put this in perspective...
I had 18 server-aborted work units (WUs) on July 9th, which, to me, is frustrating. When I started processing them, I had no idea that the original volunteers run an apparent queue of at least six days. (A reasonable queue with UNLIMITED as the cap on WUs could do that too.) That six-day delay in returning their WUs causes an automatic WU release to, in these cases, me.

But I should put my aborted work units in perspective: looking at "My Activity over time" on my "OVERVIEW" page, the Mapping Cancer Markers graph shows that I typically return successfully, and get credit for, at least 1,500 results each day. Other projects' WUs I successfully return are not included in those numbers.

All in all, my wish is that queues be monitored and not run so long as to cause a new WU to be released for someone else to process, thereby creating unnecessary work whether it is aborted or not.
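The arithmetic behind that complaint can be sketched in a few lines. This is an illustration only, not WCG server code; the 6-day deadline is the one discussed in this thread, and the crunch time is an assumed figure:

```python
# Illustrative sketch: a task sits behind everything already in the
# client's queue before it runs, so a queue as long as the deadline
# guarantees a late return, and the server releases a resend to
# another volunteer.

DEADLINE_DAYS = 6.0  # deadline discussed in this thread

def returns_late(queue_days: float, crunch_days: float = 0.1) -> bool:
    """True if queue wait plus crunch time exceeds the deadline."""
    return queue_days + crunch_days > DEADLINE_DAYS

print(returns_late(6.0))  # a six-day queue: late, a resend goes out
print(returns_late(1.2))  # a small cache: comfortably inside the deadline
```

Under these assumptions, any steady-state queue at or beyond the deadline length makes every task a late return, regardless of how fast the host actually crunches.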
Link64
Senior Cruncher | Joined: Feb 19, 2021 | Post Count: 206 | Status: Offline
> All in all, my wish is that queues be monitored and not run so long as to cause a new WU to be released for someone else to process, thereby creating unnecessary work whether it is aborted or not.

Well, we all wish that WCG had a steady WU supply so we could run with nearly no cache at all, but the reality is as it is: some people get frustrated and run caches that are too large, either because of that or because that's what worked OK for them on other projects in the past. Or they simply had their computer off for a couple of days; that can be another cause of late tasks. I've seen it a couple of times, when suddenly a whole batch of resends was aborted by the server.

Anyway, regardless of what we wish, we simply need to adapt to reality. The solution that works best for me is a cache size of 1.2 days. That's the largest cache with which resends do not go instantly into EDF mode (earliest deadline first), so I give the computers that had the original tasks the longest possible time to complete them, and the server the longest possible time to abort the resends before my computer starts processing them.

BTW, if you process 1,500 tasks/day, 18 aborted by the server isn't that much IMHO.

[Edit 2 times, last edit by Link64 at Jul 11, 2025 8:52:10 AM]
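For readers unfamiliar with EDF mode, here is a hedged sketch of the idea. It is a simplification of the BOINC client's actual scheduler (which simulates the whole workload, not just one task), and the resend deadline used below is a hypothetical number, not a measured WCG value:

```python
# Simplified model of the BOINC client's "earliest deadline first"
# switch: if the work already cached cannot drain before a newly
# assigned task's deadline, that task jumps the FIFO queue and runs
# immediately instead of waiting its turn.

def goes_edf_immediately(cache_days: float, deadline_days: float) -> bool:
    """True if a fresh task would preempt the FIFO queue on arrival."""
    # Assume the host crunches its cache front-to-back at a steady rate,
    # so the drain time equals the cache size in days.
    return cache_days > deadline_days

# With a hypothetical ~1.3-day resend deadline:
print(goes_edf_immediately(2.0, 1.3))  # larger cache: resend starts at once
print(goes_edf_immediately(1.2, 1.3))  # 1.2-day cache: resend waits its turn
```

This is why a cache just under the shortest resend deadline is the sweet spot Link64 describes: the resend sits in the queue long enough for the server to abort it if it turns out to be unneeded.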
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1317 | Status: Offline
> All in all, my wish is that queues be monitored and not run so long as to cause a new WU to be released for someone else to process, thereby creating unnecessary work whether it is aborted or not.

I agree with Link64 as to the reasons late returns happen, though I can think of other "innocent" reasons as well...

As for cutting down the number of retries that end up being unnecessary: in olden days (WCG/IBM), projects had a 24-hour grace period associated with them. The client didn't know about this (seeing only the deadline time), but the server would allow that grace period for No Reply cases before sending a retry. This made it far more likely that overdue results would end up aborted by the client (known as "Not started by deadline"[*1]), so the proportion of Server Aborted retries for tasks from applications with short run times (most WCG projects!) was a lot lower then...

However, the grace period was removed as part of clearing up the outstanding work before the IBM->Krembil migration, and it doesn't seem to have been restored. The ratio of delayed tasks that end up No Reply is a lot higher now. Some convert to Error status fairly quickly (as the client aborts them!) and some don't return before the WU is assimilated, but some of the rest will return late and validate; some of those will cause Server Aborts, but others will result in a WU having three (or more!) validated results!

Personally, I'd like to see them bring back the grace period and reduce the deadlines by a day at the same time. Nowadays I can't see any reason for a 6-day deadline on something like ARP1[*2], let alone MCM1 or [presumably] MAM1 when it finally launches, and shortening the deadline might help with the occasional problems with either backed-up validations or the issuing of [needed?] retries.

Cheers - Al.
*1 -- WCG don't give that as a status, but it is fairly easy to recognise which Error returns are actually "Not started by deadline".
*2 -- I would suggest that anyone who needs more than 72 to 96 wall-clock hours (from collection to completion), let alone 144, to run an ARP1 task probably shouldn't be running ARP1... :-)

[Edit 2 times, last edit by alanb1951 at Jul 11, 2025 1:57:32 PM]
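The grace-period behaviour Al describes can be sketched as follows. This is a hedged illustration of the described server-side rule, not actual BOINC or WCG server code; the dates are made up for the example:

```python
# Sketch of a server-side grace period: a "No Reply" retry is only
# generated once the deadline PLUS the grace period has passed, giving
# the original host time to return a late result or to abort the task
# client-side ("Not started by deadline") before a copy is resent.

from datetime import datetime, timedelta

GRACE = timedelta(hours=24)  # the old WCG/IBM grace period

def should_send_retry(deadline: datetime, now: datetime,
                      grace: timedelta = GRACE) -> bool:
    """True once an overdue task has also exhausted the grace period."""
    return now > deadline + grace

deadline = datetime(2025, 7, 10, 12, 0)
print(should_send_retry(deadline, datetime(2025, 7, 10, 18, 0)))  # overdue, still in grace
print(should_send_retry(deadline, datetime(2025, 7, 11, 13, 0)))  # grace expired: retry goes out
```

Without the grace period the condition collapses to `now > deadline`, which is why retries (and the Server Aborts they later attract) are issued so much sooner today.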
Link64
Senior Cruncher | Joined: Feb 19, 2021 | Post Count: 206 | Status: Offline
> Personally, I'd like to see them bring back the grace period and reduce the deadlines by a day at the same time -- nowadays I can't see any reason for a 6-day deadline on something like ARP1, let alone MCM1 or [presumably] MAM1 when it finally launches, and shortening the deadline might help with the occasional problems with either backed-up validations or the issuing of [needed?] retries.

Short deadlines are a major PITA for people running more than one BOINC project, in particular if they don't crunch 24/7, or maybe not even every day, for example not on weekends. When thinking about what can be improved, we must not forget that BOINC isn't just for those who build crunching farms, which can sometimes cost more than a car. But even for those power-crunchers, if they are running just WCG, short deadlines make it harder to keep their systems busy, since they can't build up a cache that lasts long enough to ride out at least most of the server hiccups.

The 6 days here are actually quite short already: Einstein has 14 days for their CPU WUs, Milkyway has 12 days, and both projects have nearly perfectly stable servers. As long as the results are not needed, for example, to create new WUs, there's no need for very short deadlines. AFAIK, only ARP WUs are created based on previous results, so there's not even a need for the shorter deadlines on MCM resends.
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1317 | Status: Offline
Again, I mostly agree with that, but as this is supposed to be a WCG Project Status thread I will back out now, rather than discussing issues for multi-project users and the lack of generally available, easy-to-use flow management for many BOINC projects.
However, I must note that I see a far higher percentage of WUs with missed-deadline wingmen at Einstein (which also has long deadlines for GPU work, by the way!) than at WCG. And when I had Milkyway as a main project rather than a reserve, I used to see higher percentages of WUs with missed deadlines there as well...

While this has been an interesting discussion, I don't actually see WCG changing anything about task deadlines (or re-introducing grace periods). So ARP1 will continue to clear work more slowly than it could, and some users will continue to be irritated by Server Aborted tasks...

Cheers - Al.

P.S. I run five "standard" systems, all of them deliberately power-limited and configured for custom workloads with fast turnaround; I turn in between 650 and 730 MCM1 tasks a day depending on how many ARP1 tasks my systems actually get! I do realize that not everyone can/will put in the effort to set systems up like that... :-)
Sgt.Joe
Ace Cruncher, USA | Joined: Jul 4, 2006 | Post Count: 7846 | Status: Offline
I run 3 systems at the moment with 52 threads available. If I have a steady supply, I return between 350 and 400 work units a day. Of that total, I return about 4 to 7 ARP units a day, once again based on the steadiness of supply. I have set the systems with fairly short caches, but big enough so I rarely run out unless there is an extended outage. I rarely see any server aborted units from either project, but they do crop up from time to time.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 1294 | Status: Offline
I haven't seen any ARP WUs in a while. MCM has been flowing well.
We are getting close to the end of generation 148. I hope they will start 149 over the weekend.