World Community Grid - View Thread - High Priority Lockout

World Community Grid Forums

Category: Support

Forum: Suggestions / Feedback

Thread: High Priority Lockout

Thread Status: Active
Total posts in this thread: 22

This topic has been viewed 3740 times and has 21 replies

Dena
Cruncher
USA
Joined: Sep 9, 2006
Post Count: 13
Status: Offline

High Priority Lockout

I run both SETI and WCG on an Apple G5 with BOINC 6.10.58 and am having a problem with SETI caused by WCG. Until this summer I ran with one day of queued work and had no problems. Starting this summer, SETI has had a three-day outage every week, so I bumped my queue to four days to ensure I didn't run out of work during the outage. First I discovered that the queue size affects all projects and I am unable to change it for just one. That was not a big issue, but then I discovered that for WCG work units that are sent out for the third time, the time limit for a big work unit may be as short as four days. This causes BOINC to enter high priority mode and lock out normal operation for SETI. As a result, it's possible to enter the outage short on SETI work.
The solution is simple: if WCG work units were issued unconditionally with a time limit of 10 or more days, it would be possible to process them before they caused BOINC to enter high priority mode.
I want to continue crunching for both projects, but if I keep getting high priority work units I may need to reduce the impact WCG is having on BOINC.

[Sep 8, 2010 12:02:21 AM]

RaymondFO
Veteran Cruncher
USA
Joined: Nov 30, 2004
Post Count: 561
Status: Offline

Re: High Priority Lockout

The four (4) day windows are for "repair" work units, whereas all other work units get ten (10) days, which is more than enough time for a computer to complete a work unit. BOINC will normally enter high priority mode if it believes a work unit will not be returned by the deadline, and this cannot be avoided. I honestly do not know how to prevent your reliable computer from receiving these repair work units, other than consistently returning work units much later than you do now. I believe (though I am not sure about this part) that WCG distributes repair work units to computers it internally rates as reliable, i.e. those with turnaround times of less than two or four days.
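To make the mechanism concrete, here is a minimal Python sketch of why a short-deadline repair unit at the back of a full queue can trip high priority mode. It assumes a plain first-in-first-out queue per core; BOINC's real scheduler simulates round-robin execution weighted by resource share, so this is only an illustration. `Task` and `deadline_misses` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_hours: float  # estimated CPU time still needed
    deadline_hours: float   # hours from now until the report deadline


def deadline_misses(tasks, ncpus):
    """Simplified FIFO model: each task goes to whichever core frees up
    first. Returns the names of tasks predicted to finish after their
    deadline. NOT BOINC's actual round-robin simulation."""
    free_at = [0.0] * ncpus  # hour at which each core becomes idle
    misses = []
    for t in tasks:
        core = min(range(ncpus), key=lambda i: free_at[i])
        finish = free_at[core] + t.remaining_hours
        free_at[core] = finish
        if finish > t.deadline_hours:
            misses.append(t.name)
    return misses


# Two cores, a 4-day (96 h) queue of 12 h tasks with 10-day deadlines,
# then one 12 h repair task with a 4-day deadline at the back.
queue = [Task(f"wu{i}", 12.0, 240.0) for i in range(16)]
queue.append(Task("repair", 12.0, 96.0))
print(deadline_misses(queue, ncpus=2))  # ['repair']
```

The repair task is predicted to finish at 108 h against a 96 h deadline; in this model that is the moment the client would have to run it ahead of everything else.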

[Sep 8, 2010 12:24:14 AM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: High Priority Lockout

Dena,

If your cache is set permanently to more than 4 days, the client should eventually stop receiving these 'repair' jobs. The rule is that if a device frequently reports work more than 2 days after receiving it, it is taken out of the highly reliable group... but it takes some days of WCG work reporting; it's not as if the first few late results will do it.

Someone wrote elsewhere that on Sunday he stopped work fetch for all projects, increased the cache size from normal to a value large enough to bridge the outage, and then allowed only Seti to backfill. Afterwards he set the cache back to low and allowed all projects to fetch work again. The scheduler then takes care of completing the short-deadline jobs in time... the cache is effectively emptied first before work fetch resumes.

The good part is that the debt system will balance things out, so after the rushing and cache emptying has ended, your other projects will get more time to compensate. But that works over multiple days, sometimes a week, and with this Seti scheme there will be a permanent disarray of days on which no work for them is done.
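The debt idea can be sketched as follows. This is a deliberately simplified model, not BOINC's actual long-term-debt code: each project is owed its share of the CPU time in a period minus what it actually got, and the debts are re-centred on zero. Restricting the bookkeeping to projects that currently have work is an assumption made for this sketch. `update_debts` is a hypothetical name.

```python
def update_debts(debts, shares, used, period, has_work):
    """Simplified long-term-debt update (illustrative only).

    debts, shares, used, has_work: dicts keyed by project name.
    used[p]: CPU seconds project p actually received during the period.
    period: total CPU seconds available during the period.
    """
    # Assumption: only projects that currently have work take part.
    eligible = [p for p in debts if has_work[p]]
    if not eligible:
        return debts
    total_share = sum(shares[p] for p in eligible)
    for p in eligible:
        entitled = period * shares[p] / total_share
        debts[p] += entitled - used[p]
    # Re-centre so the eligible projects' debts average zero.
    mean = sum(debts[p] for p in eligible) / len(eligible)
    for p in eligible:
        debts[p] -= mean
    return debts


# One core-day spent entirely on WCG in high priority, shares 100/100:
debts = update_debts(
    {"SETI": 0.0, "WCG": 0.0},
    {"SETI": 100, "WCG": 100},
    {"SETI": 0.0, "WCG": 86400.0},
    86400.0,
    {"SETI": True, "WCG": True},
)
print(debts)  # {'SETI': 43200.0, 'WCG': -43200.0}
```

A positive debt means SETI is owed time and would be favoured once the high priority work is cleared; how quickly the real client repays it depends on its own rules.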

Thanks for continuing to contribute to WCG.

edit: I think Seti goes completely offline, so my advice is to turn on the scheduled connection function and set it to, for instance, 1 hour a day, from 23:00 to 24:00 UTC, so other projects can report what is done. In the client you can set the network schedule for each day, then activate Network based on preferences. I'm sure someone over at the alien seekers' forum will have written up a workaround for their problem, which is affecting the whole grid community.
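The daily connection window suggested above amounts to a simple time-of-day check. A minimal sketch, assuming the window is given in whole UTC hours as in the post (the client's own network preferences may be interpreted in local time, so check that before copying the numbers); `network_allowed` is a hypothetical helper, not a BOINC function.

```python
from datetime import datetime, timezone


def network_allowed(now, start_hour, end_hour):
    """Return True if `now` (an aware datetime) falls inside the daily
    network window.

    Hours are UTC, 0-24; end_hour=24 means midnight. A window with
    start_hour > end_hour wraps past midnight (e.g. 23 to 1).
    """
    utc = now.astimezone(timezone.utc)
    hour = utc.hour + utc.minute / 60
    if start_hour <= end_hour:
        return start_hour <= hour < end_hour
    return hour >= start_hour or hour < end_hour


print(network_allowed(datetime(2010, 9, 8, 23, 30, tzinfo=timezone.utc), 23, 24))  # True
print(network_allowed(datetime(2010, 9, 8, 12, 0, tzinfo=timezone.utc), 23, 24))   # False
```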

----------------------------------------

WCG

Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!

----------------------------------------
[Edit 1 times, last edit by Sekerob at Sep 8, 2010 6:26:40 AM]

[Sep 8, 2010 6:20:50 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: High Priority Lockout

Hello Dena,
My point of view about running BOINC and SETI is the following:

I would run SETI only if it had absolutely no influence on BOINC.

Personally, I do not like SETI, because it does not solve any problem on earth, and we have enough problems on earth; in my opinion, we should solve those problems first.

But as far as I see, if I run BOINC at 100% there are no resources left for SETI.

So, in other words, SETI simply must have a negative influence on BOINC, simple mathematical logic...

I would like to ask for further explanation about the influence of SETI on BOINC.

I would like to know as much as possible about running both programs.
Could you first make it clear, please: do you really mean running them simultaneously, at the same time?

Thank you.
Martin Schnellinger

[Sep 8, 2010 4:38:09 PM]

Dena
Cruncher
USA
Joined: Sep 9, 2006
Post Count: 13
Status: Offline

Re: High Priority Lockout

Well, all hell broke loose last night. As I returned work units, I was issued mostly repair units to refill my queue. This locked out normal processing for WCG, and the long-term work units have aged to the point that they are now classed as high priority work units. I now have both cores processing WCG in high priority and none working on SETI work units.
I could abort some work units to clean this mess up now, but I am not going to do that to my wingman. What I am doing instead is setting No New Tasks on WCG so I don't get any more repair units, and seeing if I can flush the WCG work units through normal processing. I like to run set-and-forget because I have other things to do with my time, but until this problem is corrected I will need to spend more time than I want hand-feeding WCG.
It is a big mistake to issue repair units with short deadlines, because they will often force high priority mode. High priority was intended to allow the client to clean up problems on its end. What the short deadline is doing is forcing high priority mode to clean up records you don't want hanging around on the server. This causes problems on the client because it takes away some of its ability to manage work, leading to the problem I now face.
I question how well the debt balancing works under some conditions. I know of three ways it might not work as one might expect, and I am exposed to all of them.
1. I run a resource share of 100%/100% for SETI and WCG. The idea is that if a project lacks work, I am not too worried about payback. This has worked out well for WCG, as SETI has a good deal of downtime. One outage lasted around three weeks, when they had a server failure without a fallback system or the money on hand to replace it. They run on a shoestring budget, and outages due to hardware failure are not uncommon.
2. If not running 100%/100%, I suspect share is not applied when a project runs out of work. I consider this a fair decision, because some projects may go months without work, so share should only be applied when work is available for the project.
3. I turn my system off about 8 hours a day and I am unsure if the share information is retained through a power off cycle.
As for playing around with when you get work for SETI, you don't do that. With SETI you get work when you can, because the pipe is so congested that you take a download window whenever you can find one. Here is a link to show you why you don't control SETI downloads. Link to Cricket
From what I gather, some SETI crunchers do crunch WCG work units but often they only crunch SETI and run a 10 day queue to avoid work shortages. I try to spread the word that WCG always has work units so they will never run dry but many would rather complain about the lack of work units instead of joining a useful second project. Go figure.

P.S. I have been running a 4-day queue for several weeks now and SETI has a 4.4-day turnaround time. I would assume that WCG has about the same turnaround time by now, and I am still getting repair units. I don't mind the idea of repair units as long as I can process them like normal units.

----------------------------------------
[Edit 1 times, last edit by Dena at Sep 8, 2010 4:58:43 PM]

[Sep 8, 2010 4:53:54 PM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline

Re: High Priority Lockout

Sekerob wrote:
"If your cache is set permanently to more than 4 days, the client should eventually stop receiving these 'repair' jobs. The rule is that if a device frequently reports work more than 2 days after receiving it, it is taken out of the highly reliable group... but it takes some days of WCG work reporting; it's not as if the first few late results will do it."

The quickest way to make sure you won't get any short-deadline WCG work is to set "Connected..." to 4.01 days (or higher), since the scheduling server won't give out work if "Connected..." > "deadline". The exception is when you're completely out of work for a project: then you'll actually be allowed a single wu, even if you've got no hope of returning it by the deadline...
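The scheduler behaviour described here can be written as a small predicate. This is a sketch of the rule as stated in the post, not the actual WCG/BOINC server code; `would_send` and its parameters are hypothetical. It shows why 4.01 days filters out 4-day repair work while 10-day normal work still flows.

```python
def would_send(connected_days, deadline_days, tasks_on_hand):
    """Rule as described in the post (illustrative only).

    connected_days: the client's "Connected about every..." setting.
    deadline_days: the deadline the candidate work unit would carry.
    tasks_on_hand: tasks the host already has for this project.
    """
    if tasks_on_hand == 0:
        # Completely out of work for the project: a wu is allowed anyway.
        return True
    return connected_days <= deadline_days


for deadline in (4.0, 10.0):
    print(deadline, would_send(4.01, deadline, tasks_on_hand=5))
# 4.0 False
# 10.0 True
```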

With a 4-day "Connected..." you'll also make sure (if at all possible) that all work is finished 4 days before the deadline. This can be important for the SETI work, since it makes sure no work has a deadline during the time SETI is shut down.

Note, on the flip side, a 4-day "Connected..." will also make sure the cache is always at least 4 days. If you've got one "main" project and one or more low-resource-share "backup" projects, you're likely to fill up with up to 4 days from the backup projects (since it's v6.10.xx). If "Connected..." is zero and you use 4 days of "Additional..." instead, you won't fill up with 4 days from the backup projects. (A zero-resource-share project will only be asked for work if a CPU is idle, so for such projects you won't normally get a 4-day cache either way...)

As for falling out of the "reliable" range, avg_turnaround is weighted 70% old and 30% new, meaning that if you're at 0.25 days (6 hours) and return 3 results after 3 days, you'll be at 2.06 days and will therefore be out of the "reliable" range. Similarly, a single result returned after 7 days means 2.275 days, so a single result is enough to bring you out of the "reliable" rating. Oh, and in case you miss the deadline on a result, you'll immediately jump to whatever the deadline was.
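The arithmetic above is an exponentially weighted average with a 0.7/0.3 split. A short sketch that reproduces both numbers; the 2-day reliability limit comes from this thread rather than from any published WCG setting, and `update_turnaround` is a hypothetical name.

```python
RELIABLE_LIMIT_DAYS = 2.0  # limit quoted in this thread; an assumption


def update_turnaround(avg_days, turnaround_days, missed_deadline_days=None):
    """Return the new average turnaround in days.

    Normally 70% old average + 30% new result. If a result missed its
    deadline, the average jumps straight to that deadline, as described
    in the post.
    """
    if missed_deadline_days is not None:
        return missed_deadline_days
    return 0.7 * avg_days + 0.3 * turnaround_days


avg = 0.25  # 6 hours
for _ in range(3):
    avg = update_turnaround(avg, 3.0)
print(round(avg, 2), avg < RELIABLE_LIMIT_DAYS)  # 2.06 False

single = update_turnaround(0.25, 7.0)
print(round(single, 3), single < RELIABLE_LIMIT_DAYS)  # 2.275 False
```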

Except when you miss the deadline, avg_turnaround is updated by the validator, so if you're not running single-redundancy work, there can be some delay in the increase/decrease of the average turnaround time.

BTW, on the flip side, even if you're at a 10-day average, 7 results returned after 1 day and you're within the "reliable" range again. 7 is also the number you'll need to get back a full quota, in case you've hit a bad batch of Proteome Folding or something and have fallen down to 1 in quota.
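Using the same 70/30 update, a few lines are enough to count how many fast results it takes to climb back under the limit. Again a sketch under the thread's assumptions (2-day limit, 0.7/0.3 weighting); `results_needed` is a hypothetical helper.

```python
def results_needed(avg_days, turnaround_days, limit_days=2.0):
    """Count results returned after `turnaround_days` needed to bring the
    70/30 weighted average below `limit_days`."""
    if turnaround_days >= limit_days:
        # The average converges to turnaround_days, so it never gets under.
        raise ValueError("turnaround must be below the limit")
    count = 0
    while avg_days >= limit_days:
        avg_days = 0.7 * avg_days + 0.3 * turnaround_days
        count += 1
    return count


print(results_needed(10.0, 1.0))  # 7
```

After six 1-day results the average is still about 2.06 days; the seventh brings it to about 1.74.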

----------------------------------------

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

[Sep 8, 2010 5:06:03 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: High Priority Lockout

Dena,

As you found, receiving repair jobs perpetuates WCG's opinion that your client is still reliable. There are dirty ways to get knocked out, such as grabbing a bunch of work, suspending it for over 4 days and then completing it, but maybe a mail to the support team will get them to flip a switch.

I think Seti is down for 3 days and then there are 4 days of competition to upload. To me there's more design and intent in what they're doing there than meets the eye. They know people will be hoarding work, so the caching will only cause more of the panic-state conditions, with Seti getting the prime attention. Not true of course, but if it were, real bad BOINC citizenship ;>)

And yes, share information is based on actual computed time, be it an hour a day or all day. Part-time crunching will make the balancing take longer to equilibrate, though.

A 4.4-day deadline is very hmmm given their pipe problems, i.e. they try squeezing 7 days of production into 4 days of network availability, but I guess they're tremendously short on storage resources too, so they also have to off-load at a much higher pace.

Frankly, I think you're better off scheduling 15 days of WCG and then 15 days of Seti. Much less of a race. At the end of the month each has its equal share without constant competition.

Would that work for you?

[Edit 1 times, last edit by Sekerob at Sep 8, 2010 5:19:06 PM]

[Sep 8, 2010 5:18:28 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: High Priority Lockout

Ah, yes the old 4.01 connect trick of Ingleside... it's such a fun easy solution. :D

[Sep 8, 2010 5:20:53 PM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline

Re: High Priority Lockout

Dena wrote:
"3. I turn my system off about 8 hours a day and I am unsure if the share information is retained through a power off cycle."

There's no problem retaining the info even if you shut down for 8 hours a day. The BOINC client also knows that you shut down for 8 hours a day and adjusts how much work it gets to keep your cache full.
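The adjustment for part-time machines can be pictured with a small calculation. This is a simplified sketch: the client is understood to track an "on fraction" in its time statistics, but the exact way it folds that into work fetch is more involved. `cpu_hours_to_fill` and `wall_hours_for` are hypothetical helpers. Being off 8 hours a day leaves 16 usable hours, so a 4-day cache needs less CPU time than 4 full days would, while each task takes longer in calendar time.

```python
def cpu_hours_to_fill(cache_days, hours_on_per_day, ncpus):
    """CPU hours of work needed to cover `cache_days` of calendar time
    on a machine that is switched on only `hours_on_per_day` hours a day."""
    return cache_days * hours_on_per_day * ncpus


def wall_hours_for(cpu_hours, hours_on_per_day):
    """Calendar hours a task of `cpu_hours` takes on one core when the
    machine is on only part of each day."""
    return cpu_hours * 24 / hours_on_per_day


# Two cores, off about 8 hours a day, 4-day cache.
print(cpu_hours_to_fill(4, 16, 2))  # 128
print(wall_hours_for(12, 16))       # 18.0
```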

As for how resource share is balanced: short-term it's not, but long-term it should more or less follow your settings. But of course, a 0.1% share or something will generally always get too much work, while a project like LHC@home that can go many months without work will never manage to reach even a low share like 25% or something...

Dena wrote:
"From what I gather, some SETI crunchers do crunch WCG work units but often they only crunch SETI and run a 10 day queue to avoid work shortages. I try to spread the word that WCG always has work units so they will never run dry but many would rather complain about the lack of work units instead of joining a useful second project. Go figure."

It's always been like this: even though SETI@home expressly advises users to join other projects, 80-90% of its users are still attached only to SETI@home. Going by http://boinc.netsoft-online.com/e107_plugins/boinc/get_cpcs.php roughly 5.5% of SETI computers also run WCG, with the most popular project alongside SETI being Einstein@home at 11.3%.

On the flip side, 7.3% of WCG computers also run SETI@home. ;)

Dena wrote:
"P.S. I have been running a 4-day queue for several weeks now and SETI has a 4.4-day turnaround time. I would assume that WCG has about the same turnaround time by now, and I am still getting repair units. I don't mind the idea of repair units as long as I can process them like normal units."

WCG doesn't display this kind of info, but lots of WCG work in high priority doesn't necessarily mean you have a high turnaround...

In any case, as mentioned in my other post, setting "Connected..." to 4.01 days or more should stop you getting the repair WCG work.

[Sep 8, 2010 5:27:15 PM]

Dena
Cruncher
USA
Joined: Sep 9, 2006
Post Count: 13
Status: Offline

Re: High Priority Lockout

The SETI outage is due to a lack of resources. Over the years they have processed a large amount of data, but it has not been examined for results. During the 3-day outage, they take over the database so they can look at the results. They don't have the resources to look at the data and distribute work at the same time.
Also in this window, they are doing the development work on the next release of BOINC.
People are not happy about this, but few are donating the money to resolve the problem. Unlike WCG, much of SETI's funding comes from donations.

[Sep 8, 2010 5:44:27 PM]
