Thread Status: Active
Total posts in this thread: 3152
This topic has been viewed 2502832 times and has 3151 replies
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 937
Status: Recently Active
Re: Work Available

Any that have not reached a checkpoint are Aborted By Server.

How does the server know whether a task/work unit has reached a checkpoint? To my knowledge, when a checkpoint is made, no information is sent to the server.
Mike's comment reflects a common belief, but it isn't actually right. Since he did say "as I understand it" in the part you didn't quote, I'll try to clarify what does happen...

Every time the server receives a scheduler request it sifts through the list of tasks reported as on the host (if there are any) to deal with things like resending lost tasks and aborting unnecessary tasks. As you say, it doesn't know about checkpoints, just whether a task has started or not!

If a WU has been cancelled (bad batch?) an abort will be sent whether the task is running or not. (Not a common occurrence in general!)

For a viable WU, the task will only be aborted if it is not started. (That applies whatever the reason for potential abort might be.)

[Reference: Source file boinc/sched/handle_request.cpp at GitHub as current on 2025-03-14]
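
The rule described above can be sketched in a few lines of Python. This is only an illustration of the decision as stated in this post, not the code in handle_request.cpp; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostTask:
    """One task the host reports in a scheduler request (hypothetical fields)."""
    name: str
    started: bool        # has the client reported that the task has started?
    wu_cancelled: bool   # was the whole work unit cancelled (e.g. a bad batch)?
    still_needed: bool   # does the server still need a result from this task?


def should_abort(task: HostTask) -> bool:
    """Return True if the server would tell the host to abort this task."""
    if task.wu_cancelled:
        # Cancelled work unit: abort whether the task is running or not.
        return True
    if not task.still_needed and not task.started:
        # Viable work unit, result no longer needed: abort only if not started.
        return True
    # Checkpoints play no part; a started task on a viable WU keeps running.
    return False
```

For example, a started task whose result is no longer needed is left alone, while the same task not yet started would be aborted.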

Cheers - Al.

P.S. The "if not checkpointed" idea has been around a long time, and I'll admit that I subscribed to it until a few years ago when someone pointed out that it [probably] wasn't the case; I then explored the source code to see for myself :-)

[Edited to add date of source check, rewrite the first sentence and fix a typo.]
----------------------------------------
[Edit 3 times, last edit by alanb1951 at Mar 14, 2025 1:28:26 AM]
[Mar 14, 2025 1:05:33 AM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1277
Status: Offline
Re: Work Available

Thanks, Al. I agree, Mike did say "as I understand it".
[Mar 14, 2025 7:01:49 AM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 265
Status: Offline
Re: Work Available

Every time the server receives a scheduler request it sifts through the list of tasks reported as on the host (if there are any) to deal with things like resending lost tasks and aborting unnecessary tasks. As you say, it doesn't know about checkpoints, just whether a task has started or not!

If a WU has been cancelled (bad batch?) an abort will be sent whether the task is running or not. (Not a common occurrence in general!)

For a viable WU, the task will only be aborted if it is not started. (That applies whatever the reason for potential abort might be.)
Hi Al,
Many thanks for this explanation.
On a few occasions, I have noticed that an ARP1 task running on my machine has already validated. I contemplated aborting it, but was worried about that affecting the machine's "reliable" status.
In the current circumstances, do you think there is any real downside in aborting, were this situation to occur again?
Cheers,
Mark
[Mar 14, 2025 9:37:14 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 792
Status: Offline
Re: Work Available

My memory is fuzzy, but if I remember correctly, "reliable" is a server setting, and I think WCG defined that as the last 10 tasks returned on time and validated successfully. Don't quote me on that though. I don't think a user abort is a permanent stain on the device, but that's just my opinion based off a very distant memory.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Mar 14, 2025 11:01:27 AM]
gj82854
Advanced Cruncher
Joined: Sep 26, 2022
Post Count: 98
Status: Recently Active
Re: Work Available

If I find a WU running on my hosts for which 2 valid results have already been returned, I abort the running work. I would rather spend the cycles on another, more useful target. I haven't noticed any degradation in reliable status as a result. Two things make that observation fuzzy, though. One is the lack of accelerated and extreme work being distributed; the other is the general lack of work due to the various hosting site issues. Also, I don't abort many WUs this way, so the server may simply not have had reason to mark me unreliable. If I were to abort 7 or 8 at a time, that might be a different story.
[Mar 14, 2025 1:09:35 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12324
Status: Offline
Re: Work Available

Again, no movement of extremes. There are 318, all of which appear to be stuck.

3 accelerated moved. Those seem to be the only accelerated moving out of 451. 2 are now in 134 and 1 in 135.

602 normals moved out of 26,329 in the generations being released.

There are now 6,963 held up in generation 143.

Mike
[Mar 14, 2025 2:15:20 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12324
Status: Offline
Re: Work Available

Al

My recollection predates your search of the source code, so I will update my memory.

Mike
[Mar 14, 2025 2:19:34 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2148
Status: Offline
Re: Work Available

On reliability …, from knreed's post 672219:
The stats that determine if a host is reliable is number of consecutive valid and average turnaround time. Both of these are only updated when a device returns a result.
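
As a rough illustration of the two statistics in that quote, here is a minimal Python sketch. It is not the BOINC or WCG implementation: the names, the thresholds, the reset-to-zero on a non-valid result and the moving-average formula are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostStats:
    """Per-host statistics (hypothetical names)."""
    consecutive_valid: int = 0
    avg_turnaround_hours: Optional[float] = None


def record_result(stats: HostStats, valid: bool, turnaround_hours: float,
                  weight: float = 0.3) -> None:
    """Update the stats; called only when the host returns a result."""
    # Assumption: any non-valid result resets the consecutive-valid count.
    stats.consecutive_valid = stats.consecutive_valid + 1 if valid else 0
    # Assumption: turnaround is tracked as an exponential moving average.
    if stats.avg_turnaround_hours is None:
        stats.avg_turnaround_hours = turnaround_hours
    else:
        stats.avg_turnaround_hours = (
            (1 - weight) * stats.avg_turnaround_hours + weight * turnaround_hours
        )


def is_reliable(stats: HostStats, min_consecutive_valid: int = 10,
                max_avg_turnaround_hours: float = 48.0) -> bool:
    """Both thresholds are placeholders, not WCG's actual settings."""
    return (stats.avg_turnaround_hours is not None
            and stats.consecutive_valid >= min_consecutive_valid
            and stats.avg_turnaround_hours <= max_avg_turnaround_hours)
```

Because nothing changes between calls to record_result, a task that is still sitting on the host has no effect on either number in this sketch.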

Furthermore, from post 671952:
the reliable mechanism in the BOINC code applies to everything on World Community Grid

And, when asked about the three ultra extremes: "could you not have a short list of the fastest machines to receive them? That way they would close up faster.", Kevin responded:
I agree that would be nice, but no such mechanism exists in BOINC.

Adri
[Mar 14, 2025 2:58:24 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12324
Status: Offline
Re: Work Available

So far as the 3 Ultras in generations 21 & 22 are concerned, the important thing is to get them moving again. The Extreme protocol should help them to catch up.

After them, there are 8 extremes in generations 104 - 110, followed by 12 in generations 115 - 124.

None of those generations contain more than 2 units.

We then have 295 extremes in generations 125 - 131.

All of them appear to be stuck.

There are 451 accelerated in generations 132 - 136, of which only 3 seem to be moving.

Also, some of the normals appear to be stuck.

Please bear in mind that the final generation would be 182.

Mike
[Mar 14, 2025 6:22:39 PM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1277
Status: Offline
Re: Work Available

I have noticed that an ARP1 task running on my machine has already validated. I contemplated aborting it.

I have contemplated the same. The reason I have never aborted a task like this is that I am not sure whether it would be sent out to another host after I aborted it.

From my experience, I believe you get the runtime and points when the result is returned.
[Mar 15, 2025 12:02:34 AM]