World Community Grid - View Thread - Reserve WUs (workunits) stopped filling up the stock.

World Community Grid Forums

Category: Support

Forum: Website Support

Thread: Reserve WUs (workunits) stopped filling up the stock.


Thread Status: Active
Total posts in this thread: 40


This topic has been viewed 4526 times and has 39 replies

JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline

Re: Reserve WUs (workunits) stopped filling up the stock.

From then on, keep your client connected as often and as long as you can.

andzgrid, I mean "as often and as long as it is OK for you", of course. I am not trying to push you to additional constraints or to extra connection charges.

Cheers. Jean.

----------------------------------------

Team--> Decrypthon -->Statistics/Join -->Thread

[Jul 23, 2009 1:15:20 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Reserve WUs (workunits) stopped filling up the stock.

Hello Ingleside.

Reference is made to your (Jul 22, 2009 10:37:52 PM) post.

Regarding the matter of the "pre-Run toCompletion-runTimeDuration-estimate_vs_post-Run statedRunTimeDuration" (let's call this item, say, "PRUDS"), I have not given this item close scrutiny, but from what I can recall now, I get many ±1hrs or ±2hrs, and rarely ±3hrs PRUDS deviation.

JmBoullier's (Jul 22, 2009 9:57:58 PM) post suggested quite the reverse of your recommendation for the connectTimes and buffer. He recommended 0.0 connectTimes and a 10-day buffer. I think that these two sets of settings will work out to the same effect: the deadline would fall in a similar range of values, 8 days for your recommendation and 10 days for his suggestion. The two sets of parameters are as similar as the statements "half-full" and "half-empty".

[Jul 24, 2009 9:47:55 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Reserve WUs (workunits) stopped filling up the stock.

Hello JmBoullier

Thanks for your (Jul 22, 2009 9:57:58 PM) post, which I found informative and which helped me better understand the things surrounding my issue. I have been picking up bits and pieces of information here and there, and together with your informative post, as well as those from others, I think I can now tie it all together.

Also, I followed your suggestion in my sync-up today and I got a "heavier" workload now. I got 68 WUs, each with a 7-, 8-, or 9-hr "to-completion" run time. If those run times hold true, I now definitely have close to what I have in mind for a WU workload. Except for 7 WUs with a 4-day deadline (I wonder what happened here..), the rest each carried a 10-day deadline. This is definitely an improvement.

Thanks, and I'll keep WCG posted.

[Jul 24, 2009 10:41:15 PM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline

Re: Reserve WUs (workunits) stopped filling up the stock.

Except for 7 WUs with a 4-day deadline (I wonder what happened here..), the rest each carried a 10-day deadline. This is definitely an improvement.

Short answer: you got 7 wu's with a 4-day deadline because your "Connect..." is too low; "Connect..." should only be zero if you're connected 24/7.

Long answer:
For someone connected 24/7, there's little practical difference between:
A: "Connect..." 3 days + "Additional..." 4 days
B: "Connect..." zero days + "Additional..." 7 days.

For someone connected 24/7, both cache-settings A and cache-settings B will give a total cache size of 7 days. The only real difference between A and B for someone connected 24/7 is that A won't give any work with less than a 3-day deadline, except if you've got an idle cpu.
Method B, on the other hand, will happily give you many wu's with a 2-day deadline; you can actually get up to 2 days' worth of wu's with a 2-day deadline...
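
As a rough illustration of the comparison above, the two settings can be put side by side. This is only a simplified model of what the post describes, not the BOINC client's actual logic, and the function names are made up for this sketch.

```python
def total_cache_days(connect_days: float, additional_days: float) -> float:
    """Total work (in days) the client tries to keep cached: Connect + Additional."""
    return connect_days + additional_days


def accepts_deadline(deadline_days: float, connect_days: float,
                     idle_cpu: bool = False) -> bool:
    """Simplified rule from the post (an assumption, not real client code):
    work whose deadline is shorter than "Connect..." is not fetched,
    unless a cpu is idle."""
    return idle_cpu or deadline_days >= connect_days


# (Connect, Additional) in days for the two cache-settings discussed above.
settings = {"A": (3.0, 4.0), "B": (0.0, 7.0)}
for name, (connect, additional) in settings.items():
    print(name,
          "total cache:", total_cache_days(connect, additional),
          "takes 2-day work:", accepts_deadline(2.0, connect))
```

Both settings end up with the same total cache; only the shortest deadline that gets accepted differs.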

For someone that is not connected 24/7, on the other hand... there are 2 reasons for choosing method A instead of method B.

Reason #1: If you, for example, only connect every 3 days, any work with < 3 days deadline can't be returned by the deadline, since you won't re-connect early enough. As long as "Connect..." > 3 days, you won't download work with too short a deadline.

Reason #2: This is a little harder to explain, so let's make a little example:
Computer, connects every 3 days, total cache 7 days.
Day-0: Downloads 3 days of work with 10-days deadline, and afterwards 2 days of work with a 5-days deadline, and at last 2 days of work with 10-days-deadline. Total cache, 7 days.

If cache-settings B:
Day-0 to Day-3: Since no deadline-problems, starts crunching wu's in download-order, meaning starts crunching the work with 10-days deadline.

Day-3: Return the finished work, download more work, let's say 3 days of work with 10-day deadline.
Cached work, in cache-order:
2 days of work with 2-day-deadline.
2 days of work with 7-day-deadline
3 days of work with 10-day-deadline.

Day-3 to Day-5: the 2-day-deadline-work runs "High priority".

Day-5: still one day left till the next connection; even if all work is finished before the deadline, it can't be returned by the deadline.

Day-6: 2 days' worth of results is returned 1 day after the deadline. For all BOINC-projects except CPDN and WCG, there's a good chance users won't get any "Points" for this work, and the results won't be scientifically usable.

Compare this with cache-setting A:
Day-0 to Day-3: Client detects work with 5-day-deadline is in deadline-trouble, and starts running this work "High Priority".
Afterwards, crunches 1 day of work with 10-day-deadline.

Day-3: Return the finished work, download more work, let's say 3 days of work with 10-day deadline.
Cached work, in cache-order:
4 days of work with 7-day-deadline
3 days of work with 10-day-deadline.

This will continue, and normally no work is returned after the deadline.
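
The Day-3 caches from the example above can be checked with a small sketch. It assumes a single cpu that crunches the batches in cache-order, treats each batch as one unit, and measures everything in days from Day-3; the function name and this simplification are mine, not anything taken from the BOINC client.

```python
from typing import List, Tuple


def late_work_days(cache: List[Tuple[float, float]],
                   connections: List[float]) -> float:
    """Days of work that get reported after their deadline.

    cache: (work_days, deadline_days) batches in crunch order, relative to now.
    connections: future connection times in days from now, ascending.
    """
    late = 0.0
    finished_at = 0.0
    for work_days, deadline in cache:
        finished_at += work_days
        # The batch can only be reported at the first connection after it is done.
        report_at = next((t for t in connections if t >= finished_at), None)
        if report_at is None or report_at > deadline:
            late += work_days
    return late


connections = [3, 6, 9]                   # connecting every 3 days
cache_b = [(2, 2), (2, 7), (3, 10)]       # Day-3 cache with cache-settings B
cache_a = [(4, 7), (3, 10)]               # Day-3 cache with cache-settings A

print(late_work_days(cache_b, connections))  # 2.0
print(late_work_days(cache_a, connections))  # 0.0
```

With B, the 2 days of work with a 2-day deadline are finished in time but cannot be reported until the next connection, which is the point of Reason #2.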

So, to sum-up, use method A because:
Reason #1: To not get work with shorter deadline than how often you connect.
Reason #2: To make sure all cached work is finished early enough to be reported by their deadline.

So, if you connect every 2 days, "Connect..." should be 2.1 days or something. If you connect every 5 days, "Connect..." should be 5.1 days or something, and so on. Still, due to the 10-day normal deadline on WCG-work, "Connect..." should be at most 9 days, even if there are 9 days between connections.

Also, if 10 days or more between connections, WCG is unfortunately not usable, and you should choose another BOINC-project instead.
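
The rule of thumb above could be written down as follows. This is just a sketch of the advice in this post; the function name, the 0.1-day margin taken from the "2.1 days or something" wording, and the default 10-day deadline are illustrative assumptions.

```python
from typing import Optional


def suggested_connect_days(days_between_connections: float,
                           normal_deadline_days: float = 10.0) -> Optional[float]:
    """Suggest a "Connect..." value, or None if the project is not usable.

    Follows the post: a little more than the real connection interval,
    capped at one day below the normal deadline (9 days for WCG).
    """
    if days_between_connections >= normal_deadline_days:
        return None  # 10 days or more between connections: pick another project
    suggestion = round(days_between_connections + 0.1, 1)
    return min(suggestion, normal_deadline_days - 1.0)


for interval in (2, 5, 9, 10):
    print(interval, "->", suggested_connect_days(interval))
```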

----------------------------------------

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

[Jul 25, 2009 12:30:52 AM]

JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline


Re: Reserve WUs (workunits) stopped filling up the stock.

andzgrid,
First I am glad that things are improving considerably.

About short deadline WUs (often called rush jobs or emergency WUs or replacement WUs in these fora)

These WUs are sent by the servers to replace a WU that had been sent to somebody else and has been
- not returned in time
- or returned in error
- or found "inconclusive" at validation time.

These WUs are currently receiving a deadline equal to 40 % of the original deadline, i.e. 4 days nowadays. In principle if a replacement WU falls again in one of the 3 situations above it could receive a deadline of 0.4 x 4 = 1.6 day, but this is extremely rare for the following reason.
Servers send these rush jobs only to what we call "reliable" devices, which means returning error free results with a fast turnaround time (less than two days currently).
So the fact that you have received a few of these rush WUs confirms that the servers are happy with your machine.
Most often the Boinc client will process such rush WUs in high priority mode, i.e. it will suspend a task with a normal deadline and start the rush WU immediately to make sure that it will be ready to return as fast as possible. However, if you may need to be offline for 4 days or more, these rush WUs might be a problem for you.
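
To make the figures above concrete, here is a small worked sketch. The 40 % factor and the two-day turnaround limit are the values quoted in this post; the function and constant names are hypothetical and do not come from the WCG or BOINC server code.

```python
RUSH_DEADLINE_FACTOR = 0.4          # 40 % of the original deadline (from the post)
RELIABLE_MAX_TURNAROUND_DAYS = 2.0  # "less than two days currently" (from the post)


def rush_deadline_days(original_deadline_days: float) -> float:
    """Deadline given to a replacement ("rush") WU."""
    return RUSH_DEADLINE_FACTOR * original_deadline_days


def is_reliable(turnaround_days: float, error_free: bool) -> bool:
    """Simplified "reliable" test: error-free results and fast turnaround."""
    return error_free and turnaround_days < RELIABLE_MAX_TURNAROUND_DAYS


print(rush_deadline_days(10.0))                       # deadline for a normal 10-day WU
print(is_reliable(turnaround_days=1.5, error_free=True))
print(is_reliable(turnaround_days=4.0, error_free=True))
```

A machine with a large cache returns normal tasks more slowly, so in this model it stops counting as "reliable" and no longer receives rush WUs.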

Setting the "Connect" and "Additional work" parameters

I must admit that I had not envisaged that you could receive such short deadline tasks, simply because once we have succeeded in ensuring you a large cache of work you will not return normal tasks in less than two days and the servers will no longer send you rush WUs.
If it is difficult for you to ensure that possible rush WUs that you might receive at the beginning can be returned in time, i.e. that you can connect earlier than planned if necessary, then you should choose the solution advised by Ingleside and set "Connect" to 4 days and "Extra work" to 5 days.
However I suspect that if the client knows that you may be offline 4 days at once it will send you fewer tasks than the 9 days you are aiming at, because tasks that you would receive on day 0 in excess of 6 days would possibly not be returned in time if you have a connection on day 6 for example. That is why my advice was for a "Connect" parameter set to 0.
But if you connect more often than 4 days in reality this possible problem is less likely to happen because tasks at the end of the queue will have been received later and will have a due date later as well.

Now you know the implications of each solution and you can decide which one you prefer. A reasonable compromise could be to choose Ingleside's solution at the beginning until normal WUs get returned with a turnaround time greater than two days, then switch to my solution to help your cache of work to grow up to its possible maximum.

One last comment about your "pre-Run toCompletion-runTimeDuration-estimate_vs_post-Run statedRunTimeDuration" (let's call this item, say, "PRUDS"): it would make communication simpler if we called it by its real name, i.e. Duration_Correction_Factor (DCF) (see Ingleside's post dated Jul 22, 2009 10:37:52 PM). :-)

Read you later. Jean.


[Jul 25, 2009 3:36:22 AM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline


Re: Reserve WUs (workunits) stopped filling up the stock.

These WUs are currently receiving a deadline equal to 40 % of the original deadline, i.e. 4 days nowadays. In principle if a replacement WU falls again in one of the 3 situations above it could receive a deadline of 0.4 x 4 = 1.6 day, but this is extremely rare for the following reason.

wu.delay_bound isn't changed in the database, instead the scheduling-server sets delay_bound = wu.delay_bound, and afterwards adjusts delay_bound in case of "reliable".

Meaning, the "reliable"-deadline should be the same, even if subsequent re-issues are necessary.
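
A tiny sketch of why the deadline does not shrink on each re-issue. It only mirrors the description in this post; the function below is hypothetical and is not the scheduler's real code, and the 0.4 factor is the one quoted earlier in the thread.

```python
def issue_deadline_days(wu_delay_bound_days: float, send_to_reliable: bool,
                        reliable_factor: float = 0.4) -> float:
    """Deadline for one copy of a WU.

    Every issue starts again from the value stored with the WU, so the
    "reliable" reduction is applied once per issue and never compounds.
    """
    delay_bound = wu_delay_bound_days      # stored value is never modified
    if send_to_reliable:
        delay_bound *= reliable_factor     # adjusted only for this copy
    return delay_bound


# Three successive re-issues of the same 10-day WU to "reliable" hosts.
deadlines = [issue_deadline_days(10.0, send_to_reliable=True) for _ in range(3)]
print(deadlines)
```

Each re-issue gets the same shortened deadline instead of 4 days, then 1.6 days, and so on.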

I must admit that I had not envisaged that you could receive such short deadline tasks, simply because once we have succeeded in ensuring you a large cache of work you will not return normal tasks in less than two days and the servers will no longer send you rush WUs.

This holds true.... until WCG starts a new sub-project with shorter "normal" deadlines, and suddenly users start getting a ton of work they've no hope of returning before the deadline, because their "Connect..." is too low...

Method B works with WCG "after some time", while method A works immediately with WCG and all other BOINC-projects...

However I suspect that if the client knows that you may be offline 4 days at once it will send you fewer tasks than the 9 days you are aiming at, because tasks that you would receive on day 0 in excess of 6 days would possibly not be returned in time if you have a connection on day 6 for example. That is why my advice was for a "Connect" parameter set to 0.

Hmm, not sure with v6.2.xx, but you're likely right for v5.10.45 at least...

Hmm, since you're connecting at day-6, let's for example say you're re-connecting at day 2.3, 6.2 and 10.1. Let's also say each wu takes exactly 6 hours to crunch. If so, the scenarios for the two cache-methods are probably something like:

A: day-0, download 6 days of work with 10-day deadline.

day 2.3: Reports 2 days of finished work, and downloads 2 days more work. This gives cache, in cache-order:
4 days of work with 8-day deadline.
2 days of work with 10-day deadline.

day 6.2: Reports 4 days of finished work, and downloads 4 days more work. This gives cache, in cache-order:
2 days of work with 6-day deadline.
4 days of work with 10-day deadline.

day 10.1: Reports 4 days of finished work, and downloads 4 days more work. This gives cache, in cache-order:
2 days of work with 6-day deadline.
4 days of work with 10-day deadline.

day 14.0: Reports 4 days of finished work, and downloads 4 days more work. This gives cache, in cache-order:
2 days of work with 6-day deadline.
4 days of work with 10-day deadline.

Day 17.9: Reports 3.75 days of finished work, and downloads 3.75 days more work. This gives cache, in cache-order:
2.25 days of work with 6 day deadline.
3.75 days of work with 10 day deadline.

Well, I can continue, but regardless, with cache-method A, all work is returned by their deadline.

Let's compare with method B:
day-0: Downloads 9 days of work with 10-day deadline.

day 2.3: Reports 2 days of finished work, and downloads 2 days more work. This gives cache, in cache-order:
7 days of work with 8-day deadline.
2 days of work with 10-day deadline.

day 6.2: Reports 4 days of finished work, and downloads 4 days more work. This gives cache, in cache-order:
3 days of work with 4-day deadline.
2 days of work with 6-day deadline.
4 days of work with 10-day deadline.

Day 10.0: 3 days worth of work is not reported by their deadline.

day 10.1: Reports 4 days of finished work, thereof 3 days of finished work reported after their deadline, and downloads 4 days more work. This gives cache, in cache-order:
1 day of work with 2-day deadline.
4 days of work with 6-day deadline.
4 days of work with 10-day deadline.

Day 12.0: 1 day worth of work is not reported by their deadline.

day 14.0: Reports 4 days of finished work, thereof 1 day after its deadline, and downloads 4 days more work. This gives cache, in cache-order:
1 day of work with 3-day deadline.
4 days of work with 6-day deadline.
4 days of work with 10-day deadline.

Day 17.0: 1 day worth of work is not reported by their deadline.

Day 17.9: Reports 3.75 days of finished work, and downloads 3.75 days more work. This gives cache, in cache-order:
1.25 days of work with 2-day deadline.
4 days of work with 6-day deadline.
3.75 days of work with 10-day deadline.

Well, I can continue, but it seems that for each 4 new days, 1 day's worth of work is reported after its deadline with method B...
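
The first deadline miss in the comparison above (the work downloaded at day-0) can be reproduced with a short sketch. It assumes a single cpu crunching non-stop, 6-hour wu's, and looks only at the day-0 download; the function name is made up and this is not how the BOINC client itself schedules work.

```python
from typing import List


def late_days_from_first_download(cache_days: float, deadline_day: float,
                                  connections: List[float],
                                  wu_hours: float = 6.0) -> float:
    """Days of day-0 work that cannot be reported by the deadline."""
    wu_days = wu_hours / 24.0
    n_wus = int(round(cache_days / wu_days))
    late_wus = 0
    for i in range(1, n_wus + 1):
        finished_at = i * wu_days
        # On time only if some connection falls between finish and deadline.
        on_time = any(finished_at <= t <= deadline_day for t in connections)
        if not on_time:
            late_wus += 1
    return late_wus * wu_days


connections = [2.3, 6.2, 10.1]
print(late_days_from_first_download(6, 10, connections))  # method A: 0.0
print(late_days_from_first_download(9, 10, connections))  # method B: 3.0
```

With method B, everything finished after the day 6.2 connection has to wait for day 10.1, just past the day 10.0 deadline.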

Hmm, between method A: "Always returns all work by the deadline", and method B: "25% of crunched work is worthless, just discarded by the project and not used, and the only reason you get any 'points' for this is that WCG is AFAIK the only BOINC-project that gives credit for useless work returned after the deadline", what should a user choose...

Well, to me the answer is obviously method A...

----------------------------------------
[Edit 1 times, last edit by Ingleside at Jul 25, 2009 1:00:35 PM]

[Jul 25, 2009 12:56:21 PM]

JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline


Re: Reserve WUs (workunits) stopped filling up the stock.

Ingleside,
The fact that filling up the cache up to or very close to the normal deadline value will cause a real number of "too late" returns is inherent to this practice. This is why we (CAs) usually recommend not setting the cache so high.

Your solution should normally decrease this risk (as long as andzgrid can connect at least every 5 days) but it does not answer his/her initial question, which was to be able to use the full 10-day capacity of the cache. At least your detailed example should clearly show that it might not be a desirable objective as long as the normal deadline of WCG WUs is not much higher.

Cheers. Jean.


[Jul 26, 2009 9:46:41 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Reserve WUs (workunits) stopped filling up the stock.

Hello WCG.

Attention:
•Ingleside
posts:
- Jul 25, 2009 12:30:52 AM
- Jul 25, 2009 12:56:21 PM
•JmBoullier
posts:
- Jul 25, 2009 3:36:22 AM
- Jul 26, 2009 9:46:41 AM

Gentlemen:
The treatment that each of you gave the subject of WU-cache, WU-deadlines, connectTimes and their inter-relations, I must say, I find splendid. Let the theory now meet the real world. Below are the results of what happened in my case using the setting (connect:0.0 days; cache:10-days) for the latest two (2) sync-up instances I made.

---------------------------------------------------------------------------------------
Sync-up date/time (GMT) | WUs uploaded / downloaded | Deadlines
---------------------------------------------------------------------------------------
2009.07.24 Fr 19:05     | 47+2 / 66+2               | 7 WUs (4-day) + 61 WUs (10-day)
2009.07.28 Tu 11:55     | 64+4 / 92+4               | 96 WUs (10-day)
---------------------------------------------------------------------------------------

My estimated ideal workload for my machine is about 100 WUs, each with a 10-day deadline; this estimate is based on the assumption of an average 8-hr runtime per WU. The WU runtimes that I have been getting average about 4 hrs, and therefore my machine has been able to finish a batch of WUs in about 50% (4 hrs actual / 8 hrs budget) of the days allotted before the deadline. In any case, things seem to be going where I want them to be and I am quite happy with the latest sync-up results.
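
The 50% figure above is simple arithmetic and can be sketched like this. The function name is only illustrative, and the calculation assumes the batch was sized to fill the allotted days at the budgeted runtime.

```python
def days_to_finish_batch(allotted_days: float, actual_hours: float,
                         budget_hours: float) -> float:
    """Days needed for a batch sized for budget_hours per WU when each WU
    actually takes actual_hours."""
    return allotted_days * (actual_hours / budget_hours)


# 10 days allotted, 8 hrs budgeted per WU, about 4 hrs actually needed.
print(days_to_finish_batch(10.0, 4.0, 8.0))  # 5.0
```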

If the WCG server takes into account the WU runtimes in its metering of WUs, then a given workload for my machine could be further optimized so that the machine is ready for a sync-up after about 7 days. What would be the best setting to come closest to achieving this target result? The connect:0-days_cache:10-days setting seems to be working well for my case. I do find, however, that the other setting, connect:10-days_cache:0-days, is the setting that should (in my mind) produce the results that I am actually getting now (but produced from the connect:0-days_cache:10-days setting).

There indeed appears to be no difference between the two sets of settings as far as what the deadline for the WUs should be set to. Ultimately, it is the deadline that a cruncher needs to meet; it just happens that the cruncher needs to connect before the deadline, the earlier the better. To do away with the possible confusion, I suggest that the cache parameter and the connectTimes parameter be removed from user choice. That is, all a cruncher needs to set is the targetDeadline parameter. The server then calculates what the settings for WU count, WU cache, and WU runTimes need to be to support the user setting. In my case, I set targetDeadline:10-days and make sure I connect on the 9th day at the latest. Any counterpoints, gentlemen?

Thanks and bye for now.

[Jul 28, 2009 4:55:12 PM]

JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline


Re: Reserve WUs (workunits) stopped filling up the stock.

As has been said and explained in several posts above in this same thread, setting your Connect parameter to 10 days will most probably stop your machine from receiving new WUs.
If you really want to experiment with what I consider extreme settings, be careful not to exceed 9 days for the Connect parameter.

Regarding the replacement of the Connect and Extra Work parameters with something I am not sure I understand exactly, this should be discussed in the forum of the BOINC developers at Berkeley, but I am afraid you will not get much support. The objectives of Boinc and the needs of the majority of its users are completely different from what you are trying to achieve.

Cheers. Jean.


[Jul 28, 2009 11:21:47 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Reserve WUs (workunits) stopped filling up the stock.

Any counterpoints, gentlemen?

Counterpoints???? You may have misunderstood, but this is not a debating society.

My proposed "high-tech" solution

This is a SUPPORT forum not a development process.

You are encouraged (and directed) to forward your suggestions on BOINC development directly to BOINC since WCG does not change the fundamental BOINC design in the ways that you may have suggested.

[Jul 29, 2009 5:55:21 PM]
