World Community Grid - View Thread

World Community Grid Forums

Category: Active Research

Forum: Africa Rainfall Project

Thread: Work Available


Thread Status: Active
Total posts in this thread: 3593


This topic has been viewed 5821699 times and has 3592 replies

Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline

Re: Work Available

However, one of the problems relates to the definition of 'reliable'. Because of the length of time needed to complete each unit and the need to hold a cache, very few machines can classify as 'reliable' even if they never have an error.

I would say they need to tighten the criteria. If you keep the default buffer of 0.1 + 0.5 days, you would probably qualify with no problem. "Reliable" should be special, not ordinary.

The intent should be to get good results back early, of course.

----------------------------------------
[Edit 1 times, last edit by Jim1348 at Nov 26, 2019 4:03:49 PM]

[Nov 26, 2019 4:02:52 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline

Re: Work Available

Jim

If I were to implement your settings, which would mean a minimum cache of 2.4 hours and a maximum cache of 14.4 hours, I would never get any WUs, as they are taking 27 hours without counting any queuing time.

Owing to the paucity of availability, the settings need to be at least 1.5 days + 1.5 days in order to get one WU and have another waiting. That would mean a turnaround of 3 days, which is less than half the allowed time. I think a better definition of 'reliable' would be half the allowed time, which could be implemented as an across-the-board definition.

Mike

----------------------------------------
[Edit 1 times, last edit by Mike.Gibson at Nov 26, 2019 6:26:51 PM]

[Nov 26, 2019 5:48:35 PM]
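
The cache arithmetic in the two posts above can be checked with a short sketch. It only converts the "minimum + extra" buffer settings from days to hours and compares them with the 27-hour run time quoted by Mike; the 7-day deadline is taken from later in the thread. This is not a model of how the BOINC client or the WCG server actually hands out work, and the function names are made up for illustration.

```python
HOURS_PER_DAY = 24


def buffer_hours(min_days, extra_days):
    """Return (minimum, maximum) buffered work in hours for a 'min + extra' setting."""
    minimum = min_days * HOURS_PER_DAY
    maximum = (min_days + extra_days) * HOURS_PER_DAY
    return minimum, maximum


def fits_in_buffer(task_hours, min_days, extra_days):
    """True if one task is no longer than the maximum buffer (illustrative check only)."""
    _, maximum = buffer_hours(min_days, extra_days)
    return task_hours <= maximum


def turnaround_vs_half_deadline(min_days, extra_days, deadline_days):
    """Compare a 'min + extra' day turnaround with half of the deadline."""
    turnaround = min_days + extra_days
    half = deadline_days / 2
    return turnaround, half, turnaround <= half


if __name__ == "__main__":
    low, high = buffer_hours(0.1, 0.5)
    print(f"0.1 + 0.5 days: {low:.1f} h minimum, {high:.1f} h maximum")
    print("27 h task within 0.1 + 0.5 buffer:", fits_in_buffer(27, 0.1, 0.5))
    print("27 h task within 1.5 + 1.5 buffer:", fits_in_buffer(27, 1.5, 1.5))
    print("1.5 + 1.5 days vs half of 7 days:", turnaround_vs_half_deadline(1.5, 1.5, 7))
```

Whether a task longer than the buffer is really never sent depends on the scheduler, which this sketch does not attempt to reproduce.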

Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline


Re: Work Available

I have no problem with half the allowed time, if that works for you. But I have not had problems with the 0.1 + 0.5 day cache settings on any of my machines, except for work unit availability. I run Ryzens (1700, 2600, 2700) mainly, but also Coffee Lakes, all under Ubuntu 18.04.3. I think they should tailor "reliable" for the better machines, which are becoming more common anyway.

However, whatever works for them is OK with me.

EDIT: The bottleneck seems to be identifying "reliable" machines. There may be no solution except sending out enough to find them. The more the "unreliable" machines suck them up, the longer that will take.

----------------------------------------
[Edit 2 times, last edit by Jim1348 at Nov 26, 2019 7:06:33 PM]

[Nov 26, 2019 6:38:53 PM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: Work Available

Greetings,

I am making a few configuration changes after discussing them with the team. The rules for reliable hosts (per app version) had a clause that set the average turnaround at 1.5 days. This was set for all projects. We are relaxing that to 2.5 days average turnaround to see if it gets us more reliable hosts. This change is project-wide. Also, we are changing the reliable host time to complete from 40% of the original deadline to 50% of the original deadline. This means a 7-day workunit will have 3.5 days instead of the 2.8 we allocated before. We will be trying these settings for the next few weeks, as this problem didn't show up right away for this project. It will give us a chance to evaluate whether this is the right solution project-wide or whether changes in the code are needed to do it application by application.

I'm working on the deployment now, so it should be in place in about 30-45 minutes.

Thanks,
-Uplinger

[Nov 26, 2019 7:42:03 PM]
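
The deadline figures in the post above follow from multiplying the original deadline by a fraction: 3.5 days is 50% of a 7-day deadline, and 2.8 days corresponds to 40% of it. The sketch below reproduces that arithmetic and adds a simple check against the average-turnaround limit. The function names are hypothetical, and the real server rules may include criteria not mentioned in the thread.

```python
import math


def reduced_deadline_days(deadline_days, fraction):
    """Deadline given to a reliable host, as a fraction of the original deadline."""
    return deadline_days * fraction


def implied_fraction(reduced_days, deadline_days):
    """Fraction of the original deadline that a given reduced deadline represents."""
    return reduced_days / deadline_days


def meets_turnaround_rule(avg_turnaround_days, limit_days):
    """True if a host's average turnaround is within the limit (assumed rule shape)."""
    return avg_turnaround_days <= limit_days


if __name__ == "__main__":
    print("50% of 7 days:", reduced_deadline_days(7, 0.5), "days")
    print("2.8 days out of 7 is a fraction of", round(implied_fraction(2.8, 7), 3))
    # A host averaging 2 days fails the old 1.5-day limit but passes the new 2.5-day one.
    print("old limit:", meets_turnaround_rule(2.0, 1.5))
    print("new limit:", meets_turnaround_rule(2.0, 2.5))
    assert math.isclose(implied_fraction(2.8, 7), 0.4)
```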

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Work Available

Thanks for reading the comments 🙌

PS, could not fathom 21 per 30 minutes (1008 a day), until seeing the noon stats

Statistics Date   Total Run Time (y:d:h:m:s)   Points Generated   Results Returned
11/26/2019        1:171:12:41:04               2,325,425          439
11/25/2019        3:308:18:05:40               6,817,641          1,309
11/24/2019        6:183:00:58:27               11,161,094         2,087
11/23/2019        8:002:16:09:25               13,827,241         2,450
11/22/2019        8:273:23:31:07               15,276,844         2,603
11/21/2019        8:281:07:34:31               15,097,973         2,536

Quite a bit less than the daily validation suggested... 1700-2000 before the randomization, then catching up to 2500.

----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 26, 2019 7:52:22 PM]

[Nov 26, 2019 7:44:25 PM]
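
The figures in the post above can be turned into rough rates with a small sketch. It scales "21 per 30 minutes" up to a full day and converts a y:d:h:m:s run-time stamp into hours so that the run time credited per returned result can be estimated. The 365-day year and the helper names are assumptions, and run time per result is only an approximation of how long a single work unit takes.

```python
def per_day(count, interval_minutes):
    """Scale a count observed over an interval to a full 24-hour day."""
    return count * (24 * 60) / interval_minutes


def run_time_to_hours(stamp, days_per_year=365):
    """Convert a 'y:d:h:m:s' run-time stamp to hours (365-day year assumed)."""
    years, days, hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    total_days = years * days_per_year + days
    return total_days * 24 + hours + minutes / 60 + seconds / 3600


def hours_per_result(stamp, results):
    """Average credited run time in hours for each returned result."""
    return run_time_to_hours(stamp) / results


if __name__ == "__main__":
    print("21 per 30 minutes is", per_day(21, 30), "per day")
    rows = [
        ("11/22/2019", "8:273:23:31:07", 2603),
        ("11/21/2019", "8:281:07:34:31", 2536),
    ]
    for date, stamp, results in rows:
        print(date, f"{hours_per_result(stamp, results):.1f} h per result")
```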

Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline

Re: Work Available

Since the average time to process one of these units is a little over a day, now that Uplinger has loosed the log jam, I expect the daily totals to rebound in a day or two. Not only is the availability back to a steady trickle, but since he has tweaked the reliable host issue, that should also increase the throughput. Hopefully we will see the effect in completed units by tomorrow.
Cheers

----------------------------------------

Sgt. Joe
*Minnesota Crunchers*

[Nov 26, 2019 8:26:02 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline


Re: Work Available

Keith

Thank you for listening to our raving. If you consider an analogy with marketing, Delft would be the manufacturer, WCG would be the intermediary (or shopkeeper) and we would be the customer. Whilst the customer is not always right, woe betide a shopkeeper who doesn't listen to his customers.

I have now changed the cache settings in my device profile to connect every 1.5 days with 1.5 days of extra cache. This allows for my 27 hours of crunching time, so that a WU is waiting when the current one finishes.

Mike

----------------------------------------
[Edit 1 times, last edit by Mike.Gibson at Nov 26, 2019 9:02:23 PM]

[Nov 26, 2019 9:01:36 PM]

littlepeaks
Veteran Cruncher
USA
Joined: Apr 28, 2007
Post Count: 748
Status: Offline

Re: Work Available

Keith --

Thanks for making work units available again for this project.
I wondered why I hadn't received any in a long time.
And I had rigged a solenoid to press my left mouse button every 15 seconds to hit the update button. (Just kidding, of course.)
But anyway, I did receive 2 WUs this afternoon, which should give me my bronze badge.

[Nov 26, 2019 11:07:19 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Work Available

I'd like to talk a little more about what "reliable" means, if I may.

I've come to the conclusion that I have a problem with any system which relies upon saying that a reliable host is one that returns a result within x days. That immediately excludes any machine with a cache larger than x. Why exclude them?

BOINC Manager is designed to try to ensure that all tasks are returned by their deadline. Surely the test should be more along the lines of 'How many WUs has this machine returned which were over deadline?', or perhaps 'How long is it since this machine last returned a WU which was over deadline?'.

Example: A (sub-)project has a deadline of 7 days. WUs average 1 day to process. I have a machine which is always on and has a cache of three days to guard against outages over a weekend. Most WUs will be returned after around 3 days -- too late to record the machine as reliable if the definition is 'within 2 days'. BUT, if you send my machine a WU with a deadline of 2 days, BOINC Manager will panic and start that WU straight away, and you'll get it back in 1 day - well within the two day deadline.

Shouldn't you be measuring how well machines do what they're told to do, and not just performance against some unknowable deadline? If BOINC Manager doesn't know about it, and so cannot react to it, it is simply arbitrary and not a useful measure.

[Nov 27, 2019 12:15:51 AM]
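
The example in the post above can be sketched as a comparison of two ways of judging a host. The turnaround model here is deliberately crude and is an assumption: a task normally comes back after roughly the cache length, but if that would miss the deadline, the client is taken to start it at once, as the post describes. The function names are hypothetical and none of this is BOINC's actual scheduling code.

```python
def expected_turnaround_days(cache_days, runtime_days, deadline_days):
    """Crude estimate of when a task is returned (assumed model, not BOINC's)."""
    queued = max(cache_days, runtime_days)  # task waits behind the cached work
    if queued > deadline_days:
        # Deadline at risk: assume the client runs the task straight away.
        return runtime_days
    return queued


def reliable_by_turnaround(turnaround_days, limit_days):
    """Rule criticised in the post: returned within a fixed number of days."""
    return turnaround_days <= limit_days


def reliable_by_deadline(turnaround_days, deadline_days):
    """Rule proposed in the post: returned before the task's own deadline."""
    return turnaround_days <= deadline_days


if __name__ == "__main__":
    cache, runtime = 3, 1
    for deadline in (7, 2):
        t = expected_turnaround_days(cache, runtime, deadline)
        print(
            f"deadline {deadline} d: returned after {t} d,",
            "within 2 days:", reliable_by_turnaround(t, 2),
            "| before deadline:", reliable_by_deadline(t, deadline),
        )
```

With a 7-day deadline the host fails a "within 2 days" rule despite never missing a deadline, while with a 2-day deadline the same host returns the task in time under both rules.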

RTorpey
Advanced Cruncher
Joined: Aug 24, 2005
Post Count: 67
Status: Offline

Re: Work Available

Agreed. I have a mix of older and newer machines, so the return time varies from 10 to 60 hrs. But they are solid machines and I very rarely get an error (so far, none on ARP)!

[Nov 27, 2019 1:27:37 AM]
