World Community Grid - View Thread

World Community Grid Forums

Category: Active Research

Forum: Africa Rainfall Project

Thread: Work Available

Thread Status: Active
Total posts in this thread: 3596

This topic has been viewed 5889975 times and has 3595 replies

DCS1955
Veteran Cruncher
USA
Joined: May 24, 2016
Post Count: 668
Status: Offline
Project Badges:

20 year badge for Mapping Cancer Markers

14 day badge for Uncovering Genome Mysteries

5 year badge for Outsmart Ebola Together

2 year badge for FightAIDS@Home - Phase 2

10 year badge for Smash Childhood Cancer

20 year badge for Microbiome Immunity Project

5 year badge for Africa Rainfall Project

20 year badge for OpenPandemics - COVID-19


Re: Work Available

hchcs said:

* Easy way: Fill up a buffer (say...1-3 days, personal preference) with other work such as MIP1 or MCM1, then untick those projects and only tick ARP1, then cross your fingers for the next x days.

* Slightly harder: Automate the process with Windows Task Scheduler or Linux crontab and [politely] ask for work once or twice per hour.

I follow step one. On the machine that has been getting the most ARP units since randomization, I build up a good backlog of MIP, then select only HSTB and ARP. Since it executes a MIP in 1.5 hrs, I send out a request for new work. I have been getting 2-3 WU of ARP/HSTB per day. If they include -2 resends, I suspend some MIPs to prioritize crunching. (It is a mystery why they do not automatically go to the top of the queue, since their due date is sooner than the MIPs'.)
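For anyone who prefers the "slightly harder" route in the quote, here is a minimal sketch of a script that cron or Task Scheduler could call. It relies on the real `boinccmd --project URL update` operation, which makes the client contact the project; whether work is actually requested is still the client's decision. The project URL below is an assumption (check `boinccmd --get_project_status` for the exact value on your machine), and the function names are made up for illustration. `boinccmd` may also need to be run from the BOINC data directory or given `--passwd`, depending on your setup.

```python
import subprocess

# Assumption: the project URL as your BOINC client knows it.
# Confirm it with `boinccmd --get_project_status`.
WCG_URL = "http://www.worldcommunitygrid.org/"


def build_update_command(project_url=WCG_URL, boinccmd="boinccmd"):
    """Return the argv for a project update (a scheduler contact)."""
    return [boinccmd, "--project", project_url, "update"]


def request_work(project_url=WCG_URL, boinccmd="boinccmd"):
    """Run the update once; return True if boinccmd exited cleanly."""
    try:
        result = subprocess.run(
            build_update_command(project_url, boinccmd),
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # boinccmd missing, not executable, or hung
        print(f"could not run boinccmd: {exc}")
        return False
    if result.returncode != 0:
        print(result.stderr.strip())
    return result.returncode == 0


if __name__ == "__main__":
    request_work()
```

A crontab entry such as `*/30 * * * * python3 /path/to/request_work.py` (path hypothetical) would run it twice per hour, in line with the "politely" advice.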

----------------------------------------
[Edit 1 times, last edit by dcs1955 at Dec 25, 2019 5:48:57 PM]

[Dec 25, 2019 5:47:13 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Project Badges:

1 year badge for Human Proteome Folding - Phase 2

45 day badge for Discovering Dengue Drugs - Together

14 day badge for Nutritious Rice for the World

180 day badge for Help Fight Childhood Cancer

90 day badge for Help Cure Muscular Dystrophy - Phase 2

14 day badge for Discovering Dengue Drugs - Together - Phase 2

5 year badge for The Clean Energy Project - Phase 2

90 day badge for Computing for Clean Water

1 year badge for Drug Search for Leishmaniasis

180 day badge for GO Fight Against Malaria

45 day badge for Computing for Sustainable Water

50 year badge for Mapping Cancer Markers

5 year badge for Uncovering Genome Mysteries

5 year badge for FightAIDS@Home - Phase 2

2 year badge for Microbiome Immunity Project

10 year badge for Africa Rainfall Project

10 year badge for OpenPandemics - COVID-19


Re: Work Available

The problem with suspending any work is that no work will be requested whilst any units are suspended.

I find that if I suspend newer units to get arp1 to start, I can usually resume them straight away and arp1 will continue and new work will not be blocked.

That assumes that the arp1 deadlines are earlier than that of the relevant other unit. By "relevant" I mean the unit whose position in the queue is the number of threads on your machine, minus the number of arp1 units, plus one.

Failing that, I control the work units using app_config.

Mike

[Dec 25, 2019 7:16:40 PM]

DCS1955
Veteran Cruncher
USA
Joined: May 24, 2016
Post Count: 668
Status: Offline
Project Badges:


Re: Work Available

Mike Gibson said

The problem with suspending any work is that no work will be requested whilst any units are suspended.

Yeah, I just suspend them just before one of the MIPs is completed, so only one hour-and-a-half request is lost.

Mike, do you know why BOINC does not start up the WU with the soonest due date?

[Dec 25, 2019 7:59:02 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Project Badges:


Re: Work Available

No idea why not. Perhaps Uplinger could answer that.

However, arp1 and mcm1 both have 7 day deadlines. I received 2 standard arp1 units at 20:49 GMT (UTC), which queued behind my existing mcm1 units. As 2 mcm1 units were due to finish within 1.5 hours, I did not intervene, and the arp1 units started as soon as those mcm1 units finished.

mip1 units have a 10 day deadline so arp1 and mcm1 units run ahead of them.

The only time that units do not run in deadline order is if they are re-sends. They do not seem to get priority for about a day after receipt. Weird? Then you do need to use app_config if you want them to run immediately.

It possibly depends on how long your machine takes to run arp1 units. My machine takes about 24 hours, so losing a day should not affect my machine's reliability status. Having said that, I haven't seen a re-send for some time.

Mike

[Dec 25, 2019 10:45:17 PM]

DCS1955
Veteran Cruncher
USA
Joined: May 24, 2016
Post Count: 668
Status: Offline
Project Badges:


Re: Work Available

M. Gibson said

It possibly depends on how long your machine takes to run arp1 units. My machine takes about 24 hours, so losing a day should not affect my machines reliability status. Having said that, I haven't seen a re-send for some time.

Mike

You may be onto something. My "Favored" machine does them in 14hr which probably explains why they are not prioritized. It is strange that BOINC does not follow the KISS principle. On the WU release side I get resends about 1/3 of the time. My machine may get resends because it crunches them quickly. That makes sense.

[Dec 26, 2019 12:56:08 AM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1327
Status: Offline
Project Badges:

14 day badge for Discovering Dengue Drugs - Together

1 year badge for The Clean Energy Project - Phase 2

180 day badge for Computing for Clean Water

14 day badge for Computing for Sustainable Water

2 year badge for Uncovering Genome Mysteries

10 year badge for FightAIDS@Home - Phase 2

10 year badge for Microbiome Immunity Project


Re: Work Available

M. Gibson said

Actually, BOINC is applying KISS... By processing stuff in [more or less] the order in which it arrives, it ensures that the work with the longest deadlines doesn't keep getting pushed to the back of the queue and ending up being run in panic mode!

At the moment (as Mike indicated) most tasks have a 7-day deadline if they aren't retries (but see below...), and one project is 10-day. Unfortunately, there's a spoiler project in the mix, as FAH2 has a 1-day deadline and if you allow it to download a lot of those they will soon need to run in panic mode! That's easily resolved by telling WCG how many you want at a time and using app_config.xml to constrain the number that will run at the same time.

Now, retries are a whole different matter simply because they may have shortened deadlines and that will eventually force them to the front of the queue. I'm unsure of the retry deadline algorithm but it does seem to take into account how much time has elapsed since the original task was issued (so the ARP1 retry I got today because of a "can't find files" error on someone else's Linux box still has the full 7 days as that was an instant fail(!) whereas I've had others with shorter deadlines - indeed, I've just noticed I have an MCM1 job with a 4-day deadline...)

I have a laptop that tends to start FAH2 tasks almost as soon as it receives them, even though I only get one at a time! I suspect that means that it takes into account more than just the estimated run time (about 4.5 hours); perhaps buffer sizes are also taken into account? (I have buffers of a day or less)

And however it decides when tasks need to queue-jump, bear in mind that the more candidates for queue-jumping there are, the sooner it's likely to start to happen...
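To make the idea concrete, here is a toy model of "arrival order unless a deadline is at risk". It is only a sketch of the behaviour described in this post, not BOINC's actual scheduler, which takes more into account. The `Task` type, the `run_order` function and the runtimes are invented for illustration; only the 1-day, 7-day and 10-day deadlines and the 4.5 hour FAH2 estimate come from this thread.

```python
import heapq
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    runtime_h: float   # estimated hours of crunching still needed
    deadline_h: float  # hours from now until the deadline


def run_order(tasks, threads=1):
    """Order tasks: deadline-at-risk ones first (earliest deadline
    first), then everything else in arrival order.

    `tasks` must already be in arrival order.
    """
    free_at = [0.0] * threads  # time at which each thread becomes free
    heapq.heapify(free_at)
    at_risk, on_time = [], []
    for task in tasks:
        # Simulate plain first-come-first-served on the available threads.
        start = heapq.heappop(free_at)
        finish = start + task.runtime_h
        heapq.heappush(free_at, finish)
        if finish > task.deadline_h:
            at_risk.append(task)   # would miss its deadline: queue-jump
        else:
            on_time.append(task)
    at_risk.sort(key=lambda t: t.deadline_h)
    return [t.name for t in at_risk + on_time]


# Illustrative queue: runtimes are made up except FAH2's 4.5 hours.
queue = [
    Task("mip1", 10.0, 240.0),  # 10-day deadline
    Task("mcm1", 10.0, 168.0),  # 7-day deadline
    Task("fah2", 4.5, 24.0),    # 1-day deadline
]
print(run_order(queue, threads=1))
print(run_order(queue, threads=2))
```

In this model the FAH2 task jumps the queue on one thread, because waiting its turn would finish it half an hour late, while on two threads nothing is at risk and arrival order is kept.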

Cheers - Al.

P.S. Yes - we get the resends because we have fast-turnaround "reliable" machines; there don't seem to be that many retries for ARP1 despite the deadlines and task sizes, but over 10% of the FAH2 work I get is retries for all sorts of different failures... 33% seems a bit high, though - you are privileged! :)

[Dec 26, 2019 2:52:33 AM]

DCS1955
Veteran Cruncher
USA
Joined: May 24, 2016
Post Count: 668
Status: Offline
Project Badges:


Re: Work Available

Al, thanks for 'splaining this. I am not going to play games in the future by suspending WUs to force ARP to crunch. I was into serious FAH2 crunching until I switched to a tethered connection through my cell phone, which makes the trickle handshake impractical. I hope FAH1 restarts soon, since it was not completed as of Klick's last report. Whatever happened to his/her reports? PS: maybe 33% is an overestimate, but I see three or four per week.

[Dec 26, 2019 5:24:41 AM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Project Badges:


Re: Work Available

Al

For your information, re-sends are normally given 50% of the normal time, so 3.5 days (rather than 4 days), unless they get bounced within about a day, in which case they get the normal time of 7 days. I suspect that is so that the deadlines for both versions are fairly close.
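The rule of thumb above can be written as a small sketch. This encodes only the observation in this post, not the server's actual algorithm, so it will not reproduce every deadline people report (such as the 4-day MCM1 job mentioned earlier). The function name and the exact one-day cut-off are assumptions.

```python
from datetime import timedelta

NORMAL_DEADLINE = timedelta(days=7)  # arp1 / mcm1, per this thread


def resend_deadline(elapsed_since_original,
                    normal=NORMAL_DEADLINE,
                    quick_bounce=timedelta(days=1)):
    """Estimate the time allowed for a re-send.

    elapsed_since_original: how long after the original task was issued
    the re-send was created. The one-day threshold is an assumption
    based on "within about a day".
    """
    if elapsed_since_original <= quick_bounce:
        return normal      # bounced quickly: full normal time
    return normal / 2      # otherwise 50% of normal time


print(resend_deadline(timedelta(hours=1)))  # instant failure
print(resend_deadline(timedelta(days=3)))   # failed or timed out later
```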

I would agree that restricting the cache in device profiles, together with limiting the number running at any one time using app_config, is the best combination. However, the total number of jobs allowed in app_config should be slightly higher than the number of threads to allow for shortages in supply.

I have 8 threads, so I have set app_config to 3 arp1, 7 mcm1 and 1 mip1 to allow for the intermittent supply of arp1. My machine crunches arp1 in about 1 day, the maximum arp1 I have ever had at one time is 5, and I can return 2 consecutively to remain a 'reliable' machine.
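As an illustration of those limits, here is a short sketch that writes them out in BOINC's `app_config.xml` layout (`<app>` entries with `<name>` and `<max_concurrent>`). The helper name `build_app_config` is made up, and the app short names are taken from this thread; the resulting file belongs in the project's folder inside the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits):
    """Return app_config.xml text with one <app> entry per application.

    limits: mapping of app short name -> max tasks to run concurrently.
    """
    root = ET.Element("app_config")
    for name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if hasattr(ET, "indent"):  # pretty-print on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# The limits described above for an 8-thread machine.
limits = {"arp1": 3, "mcm1": 7, "mip1": 1}
print(build_app_config(limits))
```

The limits add up to 11, more than the 8 threads, which is the headroom for supply shortages mentioned above.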

Device profiles are set for unlimited (arp1), 8 (mcm1) and 2 (mip1). The extra one for each of mcm1 and mip1 allows for the time between uploading one unit and downloading the next unit. Those are in permanent supply it seems.

Mike

[Dec 26, 2019 10:17:54 AM]