World Community Grid - View Thread

World Community Grid Forums

Category: Active Research

Forum: Africa Rainfall Project

Thread: Work Available


Thread Status: Active
Total posts in this thread: 3152


This topic has been viewed 2502817 times and has 3151 replies

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 937
Status: Recently Active

Re: Work Available

Any that have not reached a checkpoint are Aborted By Server.

How does the server know whether a task/work unit has reached a checkpoint? To my knowledge, when a checkpoint is made, no information is sent to the server.

Mike's comment reflects a common belief, but it isn't actually right. Since he did say "as I understand it" in the part you didn't quote, I'll try to clarify what does happen...

Every time the server receives a scheduler request it sifts through the list of tasks reported as on the host (if there are any) to deal with things like resending lost tasks and aborting unnecessary tasks. As you say, it doesn't know about checkpoints, just whether a task has started or not!

If a WU has been cancelled (bad batch?) an abort will be sent whether the task is running or not. (Not a common occurrence in general!)

For a viable WU, the task will only be aborted if it is not started. (That applies whatever the reason for the potential abort might be.)

[Reference: Source file boinc/sched/handle_request.cpp at GitHub as current on 2025-03-14]
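The two rules above can be put into a minimal sketch. This is not a transcription of the BOINC source; the class, field, and function names are hypothetical and only illustrate the decision as described in this post.

```python
from dataclasses import dataclass


@dataclass
class ReportedTask:
    """One task the host reports as on hand (all fields are hypothetical)."""
    name: str
    wu_cancelled: bool  # the work unit was cancelled, e.g. a bad batch
    still_needed: bool  # the project still wants a result for this task
    started: bool       # the task has begun running on the host


def should_abort(task: ReportedTask) -> bool:
    """Apply the two rules from the post; checkpoints play no part."""
    if task.wu_cancelled:
        # Cancelled WU: abort whether the task is running or not.
        return True
    if not task.still_needed:
        # Viable WU, unnecessary task: abort only if it has not started.
        return not task.started
    return False


def tasks_to_abort(reported: list[ReportedTask]) -> list[str]:
    """Sift through the tasks reported in one scheduler request."""
    return [task.name for task in reported if should_abort(task)]
```

In this sketch a running task on a viable WU is always left alone, however far it has or hasn't got; only the started flag matters.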

Cheers - Al.

P.S. The "if not checkpointed" idea has been around a long time, and I'll admit that I subscribed to it until a few years ago when someone pointed out that it [probably] wasn't the case; I then explored the source code to see for myself :-)

[Edited to add date of source check, rewrite the first sentence and fix a typo.]

----------------------------------------
[Edit 3 times, last edit by alanb1951 at Mar 14, 2025 1:28:26 AM]

[Mar 14, 2025 1:05:33 AM]

Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1277
Status: Offline

Re: Work Available

Thanks, Al. I agree Mike did say "as I understand it".

----------------------------------------

[Mar 14, 2025 7:01:49 AM]

MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 265
Status: Offline

Re: Work Available

Every time the server receives a scheduler request it sifts through the list of tasks reported as on the host (if there are any) to deal with things like resending lost tasks and aborting unnecessary tasks. As you say, it doesn't know about checkpoints, just whether a task has started or not!

If a WU has been cancelled (bad batch?) an abort will be sent whether the task is running or not. (Not a common occurrence in general!)

For a viable WU, the task will only be aborted if it is not started. (That applies whatever the reason for the potential abort might be.)

Hi Al,
Many thanks for this explanation.
On a few occasions, I have noticed that an ARP1 task running on my machine has already validated. I contemplated aborting it, but was worried about that affecting the machine's "reliable" status.
In the current circumstances, do you think there is any real downside in aborting, were this situation to occur again?
Cheers,
Mark

[Mar 14, 2025 9:37:14 AM]

hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 792
Status: Offline

Re: Work Available

My memory is fuzzy, but if I remember correctly, "reliable" is a server setting, and I think WCG defined that as the last 10 tasks returned on time and validated successfully. Don't quote me on that though. I don't think a user abort is a permanent stain on the device, but that's just my opinion based off a very distant memory.

----------------------------------------

i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Mar 14, 2025 11:01:27 AM]

gj82854
Advanced Cruncher
Joined: Sep 26, 2022
Post Count: 98
Status: Recently Active

Re: Work Available

If I find a WU running on my hosts and there have already been 2 valid WUs returned, I abort the running work. I would rather spend the cycles on another, more useful target. I haven't noticed any degradation in reliable status as a result. Two things make that observation fuzzy, though. One is the lack of accelerated and extreme work being distributed; the other is the general lack of work due to the various hosting site issues. Plus, I don't abort that many WUs this way, so the server may not mark me unreliable. If I were to abort 7 or 8 at a time, that might be a different story.

[Mar 14, 2025 1:09:35 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12324
Status: Offline

Re: Work Available

Again, no movement of extremes. There are 318, all of which appear to be stuck.

3 accelerated moved. Those seem to be the only accelerated moving out of 451. 2 are now in 134 and 1 in 135.

602 normals moved out of 26,329 in the generations being released.

There are now 6,963 held up in generation 143.

Mike

[Mar 14, 2025 2:15:20 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12324
Status: Offline

Re: Work Available

Al

My recollection predates your search of the Source Code so I will update my memory.

Mike

[Mar 14, 2025 2:19:34 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2148
Status: Offline

Re: Work Available

On reliability …, from knreed's post 672219:

The stats that determine if a host is reliable is number of consecutive valid and average turnaround time. Both of these are only updated when a device returns a result.
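As a rough illustration of the two stats knreed describes, here is a minimal sketch. The thresholds, the smoothing factor, and every name in it are assumptions made for illustration, not WCG's or BOINC's actual values or code.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only to make the example concrete.
MIN_CONSECUTIVE_VALID = 10
MAX_AVG_TURNAROUND_DAYS = 2.0
SMOOTHING = 0.3  # weight given to the newest turnaround (assumed)


@dataclass
class HostStats:
    consecutive_valid: int = 0
    avg_turnaround_days: float = 0.0

    def record_result(self, valid: bool, turnaround_days: float) -> None:
        """Update both stats; called only when the host returns a result."""
        # A non-valid result breaks the run of consecutive valid results.
        self.consecutive_valid = self.consecutive_valid + 1 if valid else 0
        if self.avg_turnaround_days == 0.0:
            # First result seen: start the average from it.
            self.avg_turnaround_days = turnaround_days
        else:
            # Assumed exponential moving average of turnaround time.
            self.avg_turnaround_days = (
                SMOOTHING * turnaround_days
                + (1 - SMOOTHING) * self.avg_turnaround_days
            )

    def is_reliable(self) -> bool:
        """Reliable only if both stats are within the assumed limits."""
        return (
            self.consecutive_valid >= MIN_CONSECUTIVE_VALID
            and self.avg_turnaround_days <= MAX_AVG_TURNAROUND_DAYS
        )
```

How a user-aborted task feeds into these stats is not settled in this thread, so the sketch does not try to model it.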

Furthermore, from post 671952:

the reliable mechanism in the BOINC code applies to everything on World Community Grid

And, when asked about the three ultra extremes: "could you not have a short list of the fastest machines to receive them? That way they would close up faster.", Kevin responded:

I agree that would be nice, but no such mechanism exists in BOINC.

Adri

[Mar 14, 2025 2:58:24 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12324
Status: Offline

Re: Work Available

So far as the 3 Ultras in generations 21 & 22 are concerned, the important thing is to get them moving again. The Extreme protocol should help them to catch up.

After them, there are 8 extremes in generations 104 - 110, followed by 12 in generations 115 - 124.

None of those generations contain more than 2 units.

We then have 295 extremes in generations 125 - 131.

All of them appear to be stuck.

There are 451 accelerated in generations 132 - 136, of which only 3 seem to be moving.

Also, some of the normals appear to be stuck.

Please bear in mind that the final generation would be 182.

Mike

[Mar 14, 2025 6:22:39 PM]

Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1277
Status: Offline

Re: Work Available

I have noticed that an ARP1 task running on my machine has already validated. I contemplated aborting it.

I have contemplated the same. The reason I have never aborted a task like this is that I am not sure whether it would be sent out to another host after I aborted it.

From my experience I believe you get the runtime and points when the result is returned.

----------------------------------------

[Mar 15, 2025 12:02:34 AM]
