World Community Grid Forums
Thread Status: Active | Total posts in this thread: 3595
Mike.Gibson
Ace Cruncher, England | Joined: Aug 23, 2007 | Post Count: 12594 | Status: Offline
I thought that might provoke something.
Generation 147 has now started.

Mike
TonyEllis
Senior Cruncher, Australia | Joined: Jul 9, 2008 | Post Count: 286 | Status: Offline
Once "stuck" workunits are released, is it feasible to issue workunits in the Extreme category only to machines considered as reliable. This should also help to minimise delays caused by overdue workunits.
----------------------------------------
Run Time Stats https://grassmere-productions.no-ip.biz/
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1327 | Status: Offline
Regarding "stuck" workunits...
It is possible that some stuck units don't need a change of time step to get them moving. An example would be some Darwin tasks which went "Too Late" when 5 or 6 returned results failed to validate for some reason (Unixchick saw a few of these during April/May, along with some where it took 6 or 7 tasks to get a validation!). I don't think we have ever been told what tools IBM set up to look at why ARP1 WUs stall, so it is unclear how easy it is for them to determine an appropriate restart. (There was some [brief] discussion on this in the Extremes thread at one point.)

As for "reliable" hosts, I take that to mean systems that have returned a sequence of validated results. Unfortunately, that takes no account of how long it took to do so! However, as the client should give precedence to tasks with shorter deadlines at some point (the infamous "panic mode"...), the combination of that with sending out three tasks instead of two should be sufficient.

As an aside, according to the server documentation there is also the capability to analyse host performance to see things like average return times, which might be helpful here. However, I believe it entails running a periodic task to perform that analysis, and I don't know whether WCG runs it or not (if the facility actually exists!). I suspect it hammers the database, so it would probably only be run once or twice a day.

Cheers - Al.
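As a rough illustration of the two criteria discussed here -- a run of validated results plus an acceptable average turnaround -- the following is a minimal Python sketch. It is not WCG's or BOINC's actual scheduler code; the class, field names and threshold values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-host statistics; the field names are illustrative and
# do not correspond to the real BOINC database schema.
@dataclass
class HostStats:
    consecutive_valid: int        # validated results returned in a row
    avg_turnaround_hours: float   # average time from send to report

def is_reliable(host: HostStats,
                min_consecutive_valid: int = 10,
                max_avg_turnaround_hours: float = 48.0) -> bool:
    """Treat a host as 'reliable' only if it has a run of validated results
    AND its average turnaround is short enough -- combining the two criteria
    mentioned above (the thresholds are made-up examples)."""
    return (host.consecutive_valid >= min_consecutive_valid
            and host.avg_turnaround_hours <= max_avg_turnaround_hours)

if __name__ == "__main__":
    fast_host = HostStats(consecutive_valid=25, avg_turnaround_hours=18.0)
    slow_host = HostStats(consecutive_valid=25, avg_turnaround_hours=90.0)
    print(is_reliable(fast_host))  # True
    print(is_reliable(slow_host))  # False: plenty of valid results, but slow
```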
geophi
Advanced Cruncher, U.S. | Joined: Sep 3, 2007 | Post Count: 113 | Status: Offline
On the definition of reliable, this is what knreed said in an Aug 2, 2021 post in this thread:
https://www.worldcommunitygrid.org/forums/wcg...,41910_offset,1220#662797
"... 'reliable' hosts (which are hosts with a history of returning results quickly and that returned a number of consecutive jobs without errors)."

In an earlier post (July 26, 2021) he said the definition of "reliable" included hosts returning results in 2 days or less; previously it was 2.5 days.
https://www.worldcommunitygrid.org/forums/wcg...,41910_offset,1200#662425

Whether those same criteria and configuration are set up and working in the Jurisica Lab implementation of WCG, I don't know.
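For what it's worth, the 2-day figure quoted above is just an average-turnaround cutoff, so it can be checked with very little arithmetic. Here is a small Python sketch using made-up (sent, reported) timestamps for a single host; only the 2-day cutoff comes from the quoted post.

```python
from datetime import datetime, timedelta

# Made-up (sent, reported) timestamps for one host's recent results.
results = [
    (datetime(2025, 6, 10, 8, 0), datetime(2025, 6, 11, 6, 30)),
    (datetime(2025, 6, 11, 9, 0), datetime(2025, 6, 12, 20, 0)),
    (datetime(2025, 6, 12, 7, 0), datetime(2025, 6, 13, 11, 45)),
]

turnarounds = [reported - sent for sent, reported in results]
avg_turnaround = sum(turnarounds, timedelta()) / len(turnarounds)

# The quoted posts give a 2-day (previously 2.5-day) cutoff for "reliable".
print(f"average turnaround: {avg_turnaround}")
print("within the 2-day cutoff:", avg_turnaround <= timedelta(days=2))
```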
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1327 | Status: Offline
geophi,
Thanks for that... It sent me back to the documentation to see what I'd missed! The "reliable" hosts thing is, indeed, as Kevin said - I'd forgotten that there was a configurable option for the response time expected of "reliable" hosts...

So the answer to TonyEllis's question is yes -- all they have to do is tag the initial WUs with a priority at least as high as the need-reliable priority configured for the particular application (and that should already be set in order to deal with retries...). I'm guessing it's already set up like that now -- this year's turnaround on Extremes seemed to be quite good :-)

As for the periodic task I remembered... it is only relevant if the "Multi-size apps" option is applied to a project. That is supposed to help in sending large jobs to fast systems and smaller jobs to slower ones (e.g. Android devices that are likely not to be running 24/7). That may be useful if MAM1 production runs show as much variability as some of the beta runs!

Cheers - Al
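The dispatch rule being described -- send a work unit only to reliable hosts once its priority reaches the configured need-reliable threshold -- can be sketched in a few lines. This is loosely modelled on the behaviour discussed here, not taken from WCG's actual configuration; the threshold value and names are illustrative.

```python
# Hypothetical project setting: work units at or above this priority are
# only handed to hosts currently classified as reliable.
NEED_RELIABLE_PRIORITY = 10

def may_send(wu_priority: int, host_is_reliable: bool) -> bool:
    if wu_priority >= NEED_RELIABLE_PRIORITY:
        return host_is_reliable   # high-priority work: reliable hosts only
    return True                   # ordinary work: any host will do

# Tagging the *initial* tasks of an Extreme work unit with the threshold
# priority would therefore restrict them to reliable hosts from the start.
print(may_send(wu_priority=10, host_is_reliable=False))  # False
print(may_send(wu_priority=10, host_is_reliable=True))   # True
print(may_send(wu_priority=0,  host_is_reliable=False))  # True
```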
Mike.Gibson
Ace Cruncher, England | Joined: Aug 23, 2007 | Post Count: 12594 | Status: Offline
As I understood it, reliable meant returning within the deadline and validating. If either criterion failed, another 10 results had to conform to regain reliable status.

The problem is in setting as low a deadline as possible, to speed up the process, without so many results missing the deadline that the number of eligible machines is reduced too much. We are currently at about 1.5% in the current Extreme setting, so a 36-hour deadline should get enough done to catch up.

But please bear in mind that reducing the time step means increasing crunching time, so fewer machines would qualify as reliable.

Mike

[Edit 1 times, last edit by Mike.Gibson at Jun 14, 2025 1:47:21 PM]
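To put a rough number on that trade-off: if runtime scales inversely with the time step (a simplification, assumed here purely for illustration), halving the step roughly doubles the crunching time, which pushes some hosts past a fixed deadline.

```python
# Back-of-the-envelope sketch: halving the model time step roughly doubles
# the number of steps, so -- assuming runtime scales inversely with the
# step, which is a simplification -- task runtime roughly doubles.

def scaled_runtime(current_hours: float, old_step: float, new_step: float) -> float:
    return current_hours * (old_step / new_step)

deadline_hours = 36.0
for current in (10.0, 15.0, 20.0):
    new = scaled_runtime(current, old_step=1.0, new_step=0.5)  # step halved
    verdict = "still inside" if new <= deadline_hours else "now outside"
    print(f"{current:4.1f} h task -> {new:4.1f} h, {verdict} a {deadline_hours:.0f} h deadline")
```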
gj82854
Advanced Cruncher | Joined: Sep 26, 2022 | Post Count: 122 | Status: Offline
Send them all to me. I'll crunch them. I can run 150 concurrently and meet the 36-hour window.
I'm running 30 concurrently now...
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1327 | Status: Offline
According to the [old] BOINC wiki, the "deadline" for reliable is a project parameter, not the work-unit deadline (but that's me being pedantic!)...
If that's still correct, setting that parameter to (say) 36 hours would work fine at present, but if it turns out that they need to halve the time step it might need to be pushed out to 48 hours...

On the other hand, there are quite a lot of systems out there nowadays that can run an ARP1 task in well under 10 hours, and [at least, on Linux] I'm seeing about 50% of each day's wingmen's work being returned within 24 hours. So perhaps leaving it at the [hypothesised] 36-hour setting would be fine anyway :-) -- there ought to be lots of capacity to handle such tasks in under 24 hours (especially if not all stalled tasks need a time step change anyway...).

[Edited to note that gj82854's post that landed whilst I was compiling this serves to confirm my point!]

Perhaps someone from WCG might chip in to tell us the current setting of the reliable_max_avg_turnaround parameter for ARP1 if they see these posts :-)

Cheers - Al.

[Edit 2 times, last edit by alanb1951 at Jun 14, 2025 4:11:12 PM]
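One way to sanity-check a candidate value for that parameter is simply to count how many recent wingman returns would fall inside it. A small sketch with made-up return times (in hours); the real distribution would have to come from WCG's own data.

```python
# Made-up wingman return times in hours, purely for illustration.
returns_hours = [8, 14, 19, 22, 26, 31, 35, 40, 47, 55, 70, 90]

for candidate in (24, 36, 48):
    within = sum(1 for h in returns_hours if h <= candidate)
    pct = 100.0 * within / len(returns_hours)
    print(f"<= {candidate:2d} h: {within}/{len(returns_hours)} returns ({pct:.0f}%)")
```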
Mike.Gibson
Ace Cruncher, England | Joined: Aug 23, 2007 | Post Count: 12594 | Status: Offline
Please also bear in mind that the latest average completion time is almost 85 hours.
Mike
gj82854
Advanced Cruncher | Joined: Sep 26, 2022 | Post Count: 122 | Status: Offline
Is that wall-clock time or CPU time? A lot can affect the wall-clock time, such as suspensions due to higher-priority tasks, etc.
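The distinction matters because wall-clock time keeps running while a task is suspended or waiting, whereas CPU time only advances while the task is actually computing. A tiny Python illustration (the sleep stands in for a suspension by a higher-priority task):

```python
import time

wall_start = time.perf_counter()   # wall-clock timer
cpu_start = time.process_time()    # CPU-time timer for this process

total = sum(i * i for i in range(2_000_000))  # some real computation
time.sleep(2)                                  # "suspended": no CPU used

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall-clock: {wall:.2f} s, CPU: {cpu:.2f} s")  # wall >> CPU here
```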