World Community Grid Forums
Thread Status: Active Total posts in this thread: 3152
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 937 Status: Recently Active |
"Any that have not reached a checkpoint are Aborted By Server."

"How does the server know whether a task/work unit has reached a checkpoint? To my knowledge, when a checkpoint is made no information is sent to the server."

Every time the server receives a scheduler request it sifts through the list of tasks reported as on the host (if there are any) to deal with things like resending lost tasks and aborting unnecessary tasks. As you say, it doesn't know about checkpoints, just whether a task has started or not!

If a WU has been cancelled (bad batch?) an abort will be sent whether the task is running or not. (Not a common occurrence in general!)

For a viable WU, the task will only be aborted if it is not started. (That applies whatever the reason for potential abort might be.)

[Reference: Source file boinc/sched/handle_request.cpp at GitHub as current on 2025-03-14]

Cheers - Al.

P.S. The "if not checkpointed" idea has been around a long time, and I'll admit that I subscribed to it until a few years ago when someone pointed out that it [probably] wasn't the case; I then explored the source code to see for myself :-)

[Edited to add date of source check, rewrite the first sentence and fix a typo.]
[Edit 3 times, last edit by alanb1951 at Mar 14, 2025 1:28:26 AM] |
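As an illustration only, the decision described above can be sketched in a few lines of Python. This is not the code in boinc/sched/handle_request.cpp; the names `HostTask`, `wu_cancelled`, `still_needed` and `should_abort` are hypothetical and exist just to make the rule concrete: a cancelled WU is aborted regardless, an unneeded task is aborted only if it has not started, and checkpoints play no part.

```python
from dataclasses import dataclass


@dataclass
class HostTask:
    """One task the client reports as being on the host (hypothetical model)."""
    name: str
    started: bool        # has the client begun running this task?
    wu_cancelled: bool   # the work unit was cancelled outright (e.g. a bad batch)
    still_needed: bool   # False once the WU no longer needs this result


def should_abort(task: HostTask) -> bool:
    """Return True if the server would tell the client to abort this task.

    Assumption: this mirrors only the rule as described in the post,
    not the actual BOINC scheduler source.
    """
    if task.wu_cancelled:
        # Cancelled WU: abort whether or not the task is running.
        return True
    if not task.still_needed:
        # Viable WU, result no longer needed: abort only if not started.
        return not task.started
    return False


# A running task on a viable WU is left alone even if it is no longer needed.
running_unneeded = HostTask("task_a", started=True, wu_cancelled=False, still_needed=False)
print(should_abort(running_unneeded))
```

Note that the sketch has no checkpoint field at all, which is the point of the post: the server only sees whether a task has started.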
||
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1277 Status: Offline |
Thanks, Al. I agree; Mike did say "as I understand it". |
||
|
MJH333
Senior Cruncher England Joined: Apr 3, 2021 Post Count: 265 Status: Offline |
"Every time the server receives a scheduler request it sifts through the list of tasks reported as on the host (if there are any) to deal with things like resending lost tasks and aborting unnecessary tasks. As you say, it doesn't know about checkpoints, just whether a task has started or not! If a WU has been cancelled (bad batch?) an abort will be sent whether the task is running or not. (Not a common occurrence in general!) For a viable WU, the task will only be aborted if it is not started. (That applies whatever the reason for potential abort might be.)"

Hi Al,

Many thanks for this explanation. On a few occasions, I have noticed that an ARP1 task running on my machine has already validated. I contemplated aborting it, but was worried about that affecting the machine's "reliable" status. In the current circumstances, do you think there is any real downside in aborting, were this situation to occur again?

Cheers, Mark |
||
|
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 792 Status: Offline |
My memory is fuzzy, but if I remember correctly, "reliable" is a server setting, and I think WCG defined that as the last 10 tasks returned on time and validated successfully. Don't quote me on that though. I don't think a user abort is a permanent stain on the device, but that's just my opinion based off a very distant memory.
|
||
|
gj82854
Advanced Cruncher Joined: Sep 26, 2022 Post Count: 98 Status: Recently Active |
If I find a WU running on my hosts and there have already been 2 valid WUs returned, I abort the running work. I would rather spend the cycles on another, more useful target. I haven't noticed any degradation in reliable status as a result. There are 2 things that make that observation fuzzy, though. One is the lack of accelerated and extreme work being distributed; the second is the general lack of work due to the various hosting site issues. Plus, I don't abort that many WUs by doing that, so the server may not mark me unreliable. If I were to abort 7 or 8 at a time, that might be a different story.
|
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12324 Status: Offline |
Again, no movement of extremes. There are 318, all of which appear to be stuck.
3 accelerated moved. Those seem to be the only accelerated moving out of 451. 2 are now in generation 134 and 1 in 135.
602 normals moved out of 26,329 in the generations being released. There are now 6,963 held up in generation 143.
Mike |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12324 Status: Offline |
Al
My recollection predates your search of the Source Code so I will update my memory. Mike |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2148 Status: Offline |
On reliability …, from knreed's post 672219:
"The stats that determine if a host is reliable is number of consecutive valid and average turnaround time. Both of these are only updated when a device returns a result."

Furthermore, from post 671952:
"the reliable mechanism in the BOINC code applies to everything on World Community Grid"

And, when asked about the three ultra extremes: "could you not have a short list of the fastest machines to receive them? That way they would close up faster.", Kevin responded:
"I agree that would be nice, but no such mechanism exists in BOINC."

Adri |
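To make the two statistics concrete, here is a minimal Python sketch of a host record that is updated only when a result is returned. It is an assumption-laden illustration, not BOINC's implementation: the class `HostStats`, the function `is_reliable`, the plain running average and both threshold parameters are hypothetical, and the actual thresholds used by World Community Grid are not stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Per-host reliability statistics (hypothetical model)."""
    consecutive_valid: int = 0
    avg_turnaround_hours: float = 0.0
    results_returned: int = 0

    def record_returned_result(self, valid: bool, turnaround_hours: float) -> None:
        """Update both statistics; called only when the host returns a result."""
        self.results_returned += 1
        # Assumption: a simple running mean stands in for whatever
        # averaging the real scheduler uses.
        self.avg_turnaround_hours += (
            turnaround_hours - self.avg_turnaround_hours
        ) / self.results_returned
        # "Consecutive" implies the count restarts after a non-valid result.
        self.consecutive_valid = self.consecutive_valid + 1 if valid else 0


def is_reliable(stats: HostStats,
                min_consecutive_valid: int,
                max_avg_turnaround_hours: float) -> bool:
    """Both thresholds are caller-supplied; their real values are server settings."""
    return (stats.consecutive_valid >= min_consecutive_valid
            and stats.avg_turnaround_hours <= max_avg_turnaround_hours)


stats = HostStats()
for _ in range(3):
    stats.record_returned_result(valid=True, turnaround_hours=10.0)
print(is_reliable(stats, min_consecutive_valid=3, max_avg_turnaround_hours=24.0))
```

The sketch deliberately says nothing about how a user abort is counted, since the quoted posts do not settle that.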
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12324 Status: Offline |
So far as the 3 Ultras in generations 21 & 22 are concerned, the important thing is to get them moving again. The Extreme protocol should help them to catch up.
After them, there are 8 extremes in generations 104 - 110, followed by 12 in generations 115 - 124. None of those generations contain more than 2 units. We then have 295 extremes in generations 125 - 131. All of them appear to be stuck.
There are 451 accelerated in generations 132 - 136, of which only 3 seem to be moving. Also, some of the normals appear to be stuck.
Please bear in mind that the final generation would be 182.
Mike |
||
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1277 Status: Offline |
"I have noticed that an ARP1 task running on my machine has already validated. I contemplated aborting it."

I have contemplated the same. The reason I have never aborted a task like this is that I am not sure whether it would be sent out to another host after I aborted it. From my experience, I believe you get the runtime and points when the result is returned. |
||
|