World Community Grid Forums
Thread Status: Active | Total posts in this thread: 3593
cjslman
Master Cruncher | Mexico | Joined: Nov 23, 2004 | Post Count: 2082 | Status: Offline
If you consider an analogy with marketing, Delft would be the manufacturer, WCG would be the intermediary (or shopkeeper) and we would be the customer. Who's on first?
CJSL
Gotta keep crunching...
----------------------------------------
[Edit 1 times, last edit by cjslman at Nov 27, 2019 1:47:54 AM]
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 865 | Status: Offline
I can see how fast turnaround matters for repair work units (_2, _3, etc.) with much shorter deadlines, as well as for Beta work units and FAHB work units that are AsyncRE. But -- not to be pedantic over definitions -- "reliable" to me simply means 1) consistently turned in within the deadline, and 2) no errors (or no errors over many weeks/months, or an error ratio that is vanishingly small).

I generally keep a 1 day cache to account for my ISP outages and thieves stealing Comcast wiring, as well as WCG maintenance and unplanned outages, but for ARP1 I'm keeping a cache of 3-4 days of MIP1 work units so that I don't have to babysit my devices that often and so they have a chance to pick up a few ARP1 work units. Otherwise they'd quickly run dry and idle, or require more frequent restocking with fresh MIP1.

Thank you Uplinger for the detailed explanations! And for the config changes. I'd love to see the createWork cache get up to 30K or more work units (60K or more results) in the future.
----------------------------------------
[Edit 3 times, last edit by hchc at Nov 27, 2019 2:01:04 AM]
DrMason
Senior Cruncher | Joined: Mar 16, 2007 | Post Count: 153 | Status: Offline
First, to dispel a misconception: having an average runtime of "x" and a cache smaller than "x" DOES NOT mean that you get no units. I have a machine with cache settings of "store at least 0.01 days of work" and "store up to an additional 0.1 days of work" that is crunching an ARP unit right now. That machine alone has crunched at least 4 units in the past several days, taking between 15.89 and 18.3 hours to complete and return each one (which is obviously longer than 0.01 or 0.1 days). So, the idea that low cache numbers result in no units simply isn't correct.
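For anyone who wants to try the same settings on their own client, here is a minimal sketch of what they look like in BOINC's global_prefs_override.xml (kept in the BOINC data directory and picked up once the client re-reads its preferences or restarts). The two values simply mirror the numbers above; this is an illustration, not a recommendation:

```xml
<!-- global_prefs_override.xml (minimal sketch, not a complete preference set).
     "Store at least X days of work" maps to work_buf_min_days;
     "Store up to an additional X days of work" maps to work_buf_additional_days. -->
<global_preferences>
    <work_buf_min_days>0.01</work_buf_min_days>
    <work_buf_additional_days>0.1</work_buf_additional_days>
</global_preferences>
```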
Second, I feel like there's a difference in the value of "reliability" from the viewpoint of the project versus the viewpoint of some users, and I don't think there's anything wrong with that. It seems like a lot of people aren't looking at it from the researchers' perspective. I understand the need (in some limited cases) or desire to cache units to crunch later. If you have intermittent internet, or a bandwidth cap, caching lets you still contribute to WCG when it's convenient. But there's always a tension between convenience ("we don't need the result right away, so we can wait a couple extra days") and what will make the project unworkable ("we've been waiting on this unit for x days now and it's holding us back from creating the next batch of units").

It sounds like they use "reliable" computers primarily for the second kind of case: re-sends of units that errored out, got no response, were aborted, or came back too late. There, time would be of the essence. So what's wrong with WCG defining reliability as "this computer makes the project run smoothest"? Those re-sent units are already "late" and at risk of slowing down the project, so it benefits the project to send them to hosts that have proven a consistently fast (or, in another word, "reliable") turnaround time. If "reliable" is redefined to include hosts that do not crunch them immediately and quickly, that could slow down the project, and those slowdowns compound quickly over time.

We're not even a month into ARP, so there's no issue with testing things out to see what runs most smoothly, especially since they are not fully ramped up. But we should keep in mind that WCG is looking at the projects on a different level than the crunchers, and they may have different priorities and viewpoints. There needs to be some deference to those priorities.

And to address an analogy by Mike.Gibson above, we are not the customer. The researchers are the customer. We, the crunchers, are suppliers of a good (computer power) that we donate to WCG, which passes it on to the customer. It's always good to know where you are in the supply chain.
----------------------------------------
[Edit 1 times, last edit by DrMason at Nov 27, 2019 2:18:16 AM]
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7846 | Status: Offline
Well said DrMason.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
DCS1955
Veteran Cruncher | USA | Joined: May 24, 2016 | Post Count: 668 | Status: Offline
Are we seeing our first drop-off in WUs? HSTB Redux... down to 828 WUs yesterday. I am running on fumes to get to gold.
DrMason
Senior Cruncher | Joined: Mar 16, 2007 | Post Count: 153 | Status: Offline
Hey dcs1955
It seems there was a slight error in the number of work units being sent out for a couple of days. I think this has been fixed now, and units are being pushed back out. Because of the length of the units (and, I suppose, some people's caching routines), the effects of the error will lag. So, in the coming days, we should see those numbers rebound.
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 865 | Status: Offline
Well said DrMason. I don't understand the first paragraph though:
First, to dispel a misconception: having an average runtime of "x" and a cache smaller than "x" DOES NOT mean that you get no units. I have a machine right now that has cache settings of store at least 0.01 days of work, and store an additional 0.1 days of work, that is crunching an ARP unit right now. That machine alone has crunched at least 4 units in the past several days. That machine has taken between 15.89 and 18.3 hours to complete and return units (which is obviously longer than 0.01 and .1 days). So, the idea that low cache numbers results in no units simply isn't correct.

I re-read the last page or two of posts to make sure, but I haven't seen anyone assert that low buffers mean no ARP1 tasks, so it looks to me like you created and responded to a strawman. I know you didn't quote anybody, but can you please point me to somebody who made the argument that "low cache numbers results in no units" in case I missed someone taking that position?

My position, at least, is that small buffers require more frequent babysitting: if I fill a 1 day buffer with MIP1 work and then switch to ARP1, I would have to check every day to make sure the device stays 100% full of work instead of sitting idle. I prefer not to micromanage all this, which is why I set the period to 3 days: I fill up 3 days' worth of MIP1 work, and whatever ARP1 work the device receives is simply icing on the cake. At least that way I only have to babysit my devices every 3 days.
----------------------------------------
[Edit 7 times, last edit by hchc at Nov 27, 2019 4:41:53 AM]
DrMason
Senior Cruncher | Joined: Mar 16, 2007 | Post Count: 153 | Status: Offline
@hchc Eh, I'm newish to the forums so am still figuring out etiquette, but it was a page or two ago. Is that the standard practice - if it's a page or two ago, quote it? I'll try to remember in the future. Posted the quote below for reference.
Jim
If I was to implement your settings which would mean a minimum cache of 2.4 hours and a maximum cache of 14.4 hours, I would never get any WUs as they are taking 27 hours without counting any queuing time. Owing to the paucity of availability, the settings need to be at least 1.5 days + 1.5 days in order to get 1 and have another waiting. That would mean a turnaround of 3 days which is less than half the allowed time. I think a better definition of 'reliable' would be half the allowed time, which could be implemented as an across the board definition.
Mike

Not trying to construct a strawman, but I suppose it's possible that I misunderstood what Mike.Gibson was referring to in this post. If so, feel free to correct me; I never discount the possibility that I'm wrong haha.

I kinda like what you've done with your caches; if my system stops working or my internet craps out, I know what system to try out. It's a cool approach! I saw you said that thieves are stealing ISP wiring? Dang dude, that's next level...

The approach I use kind of assumes that the internet is not interrupted, and that work is constantly being done and reported, so it may not work for everyone. I mainly use the device manager. If the cache is set very low, the client will fetch new work whenever a unit finishes crunching. I then adjust the projects to limit some and encourage the others: I check the boxes of just the one or two projects I want to encourage, and since those units are scarce, I also check the box to have WCG fill the rest of the threads with whatever other projects if none of the units I want are available. To maximize efficiency, I limit the number of MIP units (since their level 3 cache requirements have a knock-on effect on other work units if too many MIP units are crunching at the same time), set the project I want to encourage to "unlimited" just in case, and then set the cancer units to a high enough number to fill the rest of the threads just in case.

If you have few enough machines (or groups of similar machines), you can create a profile for each, tailored to how much MIP to limit and how many threads to fill with cancer work units (if HSTB or ARP aren't available), and then they never need babysitting again. But it takes a fair amount of effort at the start haha.
----------------------------------------
[Edit 1 times, last edit by DrMason at Nov 27, 2019 5:15:37 AM]
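DrMason's limiting happens in WCG's web-based device manager, but a rough client-side analogue of the "limit MIP" step is BOINC's per-project app_config.xml with max_concurrent. This is only a sketch under assumptions: the app name below is a guess, so check the <app> entries in your client_state.xml for the short name WCG actually uses for MIP1, and pick a cap that fits your CPU's L3 cache.

```xml
<!-- app_config.xml, placed in the World Community Grid project folder under the
     BOINC data directory (e.g. projects/www.worldcommunitygrid.org/).
     Caps how many MIP tasks run at once so their L3 cache demands don't crowd
     out other work units. "mip1" is an assumed app name; verify it against
     client_state.xml before relying on this. -->
<app_config>
    <app>
        <name>mip1</name>
        <max_concurrent>4</max_concurrent>
    </app>
</app_config>
```

After saving the file, re-reading config files from BOINC Manager (or restarting the client) applies it.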
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 865 | Status: Offline
DrMason said:
@hchc Eh, I'm newish to the forums so am still figuring out etiquette, but it was a page or two ago. Is that the standard practice - if it's a page or two ago, quote it? I'll try to remember in the future. Posted the quote below for reference.

Jim
If I was to implement your settings which would mean a minimum cache of 2.4 hours and a maximum cache of 14.4 hours, I would never get any WUs as they are taking 27 hours without counting any queuing time. Owing to the paucity of availability, the settings need to be at least 1.5 days + 1.5 days in order to get 1 and have another waiting. That would mean a turnaround of 3 days which is less than half the allowed time. I think a better definition of 'reliable' would be half the allowed time, which could be implemented as an across the board definition.
Mike

Not trying to construct a strawman, but I suppose it's possible that I misunderstood what Mike.Gibson was referring to in this post. If so, feel free to correct me; I never discount the possibility that I'm wrong haha.

Gotcha dude! I didn't read back far enough. I guess that post confused me too, so sorry for dropping the strawman thing on you and getting all debatey. Yeah, I believe Mike's post focused more on meeting the requirement for "reliable" fast turnaround time, which is hard to achieve on a really old system. My oldest system takes 36 hours to do an ARP1 work unit. My fastest knocks them out in about 12 hours.

DrMason said:
I kinda like what you've done with your caches; if my system stops working or my internet craps out, I know what system to try out. It's a cool approach! I saw you said that thieves are stealing ISP wiring? Dang dude, that's next level...

Yeah, I started out preferring a 0.1 day cache -- that way I get fresh work and turn it around immediately. My computers were super reliable and got a ton of repair work. But with such a tiny cache, sometimes the WCG maintenance window is 4 hours, which a 2.4 hour cache wouldn't cover. And I've lost Internet for 1-2 days here, so I settled on 0.5 days and now a 1 day cache. 1 day is kinda my sweet spot. And yeah, even in a nice neighborhood here, every year we'll wake up and the neighborhood cable Internet node boxes are busted open and wiring or equipment is stolen. I mean, I hate Comcast with the fire of a thousand suns, but stealing Comcast gear knocks out Internet for a whole neighborhood.

DrMason said:
The approach I use kind of assumes that the internet is not interrupted, and that work is constantly being done and reported, so it may not work for everyone. But my approach is that I use mainly the device manager. If the cache is set very low, it will fetch new work whenever a project finishes crunching. I then adjust the projects to limit some and encourage the others. I check the boxes of just the one or two programs I want to encourage. Since those units are scarce, then I check the box to have WCG fill the rest of the threads with whatever other projects if none of the units I want are available. To maximize efficiency, I limit the number of MIP (since the level 3 cache requirements has a knock-on effect on other work units if too many MIP units are crunching at the same time), set the project I want to encourage to "unlimited" just in case, and then set the cancer units to a high enough number to fill the rest of the threads just in case.
If you have few enough machines (or groups of similar machines), you can create a profile for each tailored to how much MIP to limit and how many threads to fill with cancer workunits (if HSTB or ARP aren't available), and then they never need babysitting again. But, it takes a fair amount of effort at the start haha.

Oh wow, that's interesting. I'm aware of the L3 cache issues with Rosetta/MIP1 and maybe other projects, but I'm not optimizing MIP1 at this point. My current setup is mostly "ARP1 100% if possible": since supply is scarce as the project ramps up, I just fill up with other stuff, then untick the boxes so that only ARP1 is selected, which gives me a 100% chance that the clients ask for ARP1 work instead of getting spread thin. Sorry for misreading you man, and thanks for sharing your system for balancing your workload.
----------------------------------------
[Edit 2 times, last edit by hchc at Nov 27, 2019 5:31:59 AM]
floyd
Cruncher | Joined: May 28, 2016 | Post Count: 47 | Status: Offline
When I read through this thread I get the impression that the dilemma is (A) we need to return ARP results as fast as possible and (B) many of us, including me, don't want to run without a work cache. My idea is to make the deadline for ARP tasks shorter than for other tasks, say five days. That way I could still have a cache of one or two days for other work but selectively trigger panic mode for ARP tasks, making them bypass the queue.