Total posts in this thread: 76
This topic has been viewed 532348 times and has 75 replies
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Re: Parents, children, grandchildren WUs - how does it work?

Hi Mysteron347
Thanks for adding your thoughts on this topic, and for keeping the discussion alive. I was starting to think that I was the only person who cared.

Your proposed scheme would be ideal, but I think knreed has said that it would be too difficult.

With my proposed modification #1, instead of sending out parent WUs with your 20000 structures, they would send out only 4000, or whatever number the vast majority of the crunchers selected to do parent WUs could run to completion. In your example, they would both complete the 4000 and not hit the 60%/6h barrier at 5500 and 5000, and there would be no wastage. The crunching times and credit claims (and BOINC benchmark scores, if available) from these parent WUs would enable the amount of work per structure for the job to be calculated, and the remaining 16000 structures would be divided up accordingly.
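The sizing arithmetic behind this suggestion can be sketched in Python. This is only an illustration under stated assumptions: the function names, the pilot run times (4.0 and 5.0 hours) and the 6-hour target are hypothetical, and the cost per structure is assumed to be roughly uniform across the job. Only the 20000/4000/16000 structure counts come from the example above.

```python
import math

def hours_per_structure(pilot_structures, pilot_hours):
    """Average cost of one structure, measured from completed pilot (parent) WUs."""
    return sum(pilot_hours) / (len(pilot_hours) * pilot_structures)

def split_remaining(remaining, per_structure_hours, target_hours):
    """Split the remaining structures into WUs sized to fit within target_hours."""
    per_wu = max(1, int(target_hours / per_structure_hours))
    n_wus = math.ceil(remaining / per_wu)
    # Spread the structures evenly so the last WU is not a tiny leftover.
    base, extra = divmod(remaining, n_wus)
    return [base + 1 if i < extra else base for i in range(n_wus)]

# Hypothetical pilot: two crunchers each completed a 4000-structure parent WU.
cost = hours_per_structure(4000, [4.0, 5.0])   # assumed run times, in hours
sizes = split_remaining(16000, cost, target_hours=6.0)
print(len(sizes), "descendant WUs covering", sum(sizes), "structures")
```

If the real cost per structure varies a lot within a job, the measured average would be a poor predictor and the split would need a safety margin.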
[Nov 23, 2009 8:13:37 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Re: Parents, children, grandchildren WUs - how does it work?

Suggestion 1:
Shorten the 1st-generation WUs that are sent, to increase the % that run to completion. Do not send the full number of structures, just enough to get a good estimate of the amount of work per structure. I am thinking of about 25% of the full job.

Rick, I may be misreading you, but it seems that you assume that each parent WU is a complete job from the scientist's viewpoint. In fact a job has already been sliced into many parent WUs**, and if these slices are sometimes too thick it is because it is not possible to get a good estimate of the difficulty of a job, or of how many tough positions it contains and when they occur. These tough positions are unpredictable when the parent WUs are prepared, and even at the child or grandchild level, the tough positions already met at the upper levels are no guarantee of whether and when other tough positions will appear in what remains to be processed.

This is the main reason why it is so difficult to build WUs that you would love. And actually it is the reason why this particular hierarchy of WUs was devised.

** Scientists' jobs in other projects are "sliced" the same way, but for those projects it is easier to normalize the thickness of the slices.
For example in your "CMD2_0148-TPM1A.clustersOccur-1Z2C_D.clustersOccur_260_2" WU, "260" is the number of this slice.

Knowing Kevin a little, I can assure you that if he finds better (and still practical) ways to refine this process he will at least try them. Keep thinking and proposing possible alternatives (that may help him), but please be patient: there is still a long way to go before the end of this project.

Cheers. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Nov 23, 2009 11:17:28 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Parents, children, grandchildren WUs - how does it work?

I am reading along, just not injecting assumptions, presumptions or misunderstandings. My observation at the moment is that the device matching is not as fine-grained as I had expected it to be. Take a result done a few days ago: my wingman ran 6 hours without reaching 60%, while mine ran 9.74 hours and did the full WU. Assuming linear progress, 60% was reached at 5.84 hours, so I know that at least 9.74 - 5.84 = 3.9 hours were binned. For the slower machine that equates to at minimum an equivalent of 6.33 hours. That does not convey a happy feeling.
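The arithmetic in this paragraph can be checked with a few lines of Python. This is a minimal sketch that assumes, as the post does, that progress is linear in CPU time; the function name is made up for illustration. The 6.33-hour equivalent for the slower machine is not reproduced, since the post does not say how it was derived.

```python
def hours_binned(full_run_hours, cutoff_fraction=0.60):
    """Return (hours needed to reach the cutoff, hours spent beyond it),
    assuming progress is linear in CPU time."""
    hours_at_cutoff = full_run_hours * cutoff_fraction
    return hours_at_cutoff, full_run_hours - hours_at_cutoff

at_cutoff, binned = hours_binned(9.74)
print(f"{at_cutoff:.2f} h to reach 60%, {binned:.2f} h beyond it")
# prints "5.84 h to reach 60%, 3.90 h beyond it"
```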

The 75% having no credit differential is open to interpretation. It could be the high claim granted to both, the low claim granted to both, or a pure quorum-2 claim mean, and still have positions passed into a child.

Anyway, I'm fine with whatever other solutions such as:

A. Have volunteers opt in to get WUs compiled to run the full length no matter what.
B. Assume that the positions that had no match (the overage of one result in the quorum) are 100% okay when that device has had a perfect score on its last 30 results (or any other number that gives the scientists a high enough level of confidence).

At the present pace and project growth we'll be going on until the middle of next decade, unless the positions truly become much lighter than they presently are, or the scientists start culling the targets of lesser interest or improve efficiency somehow. Over that duration, any 1% reduction in redundancy has a tremendous effect on reaching the project's end.

PS: if the logs actually printed the positions completed we'd have real, visible numbers, for I don't consider credit in any form or fashion a useful part of computing waste. Credit claiming is just plain too iffy: one result in the quorum having run on a real Q6600 core and the other on an i7 hyperthreaded virtual core comes to mind as a possible quorum combo.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Nov 23, 2009 11:49:45 AM]
[Nov 23, 2009 11:48:23 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Re: Parents, children, grandchildren WUs - how does it work?

JmBoullier: " ... it seems that you assume that each parent WU is a complete job from the scientist's viewpoint."
Yes, I was, as I had no info to the contrary.
" ... the tough positions of what has already been processed in the upper levels is no guarantee of if and when other tough positions will appear in what remains to be processed."
But after the early part of each WU, the progress % shown in BOINC manager is almost always quite accurate. Can't that accuracy be used to generate descendant WUs with the appropriate size?
And should the parent WUs not have fewer structures anyway, assuming the current 10h target length?

There has to be a better way though. Somehow, somewhere.
Carry On, Kevin.
----------------------------------------
[Edit 1 times, last edit by Rickjb at Nov 23, 2009 3:33:18 PM]
[Nov 23, 2009 3:17:31 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Parents, children, grandchildren WUs - how does it work?

There has to be a better way though. Somehow, somewhere.

Make *short* CMD2 and *long* CMD2 subprojects.
Target 6 hours for the long ones but let them run till completion.
Target 2 hours for the short ones and let them run to completion.
No fancy calculations, no parents, children, odd formulas for multiple cutoff levels ... run them like every other project (except RICE where I hope there is a better formula).
[Nov 23, 2009 6:37:13 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Re: Parents, children, grandchildren WUs - how does it work?

But after the early part of each WU, the progress % shown in BOINC manager is almost always quite accurate. Can't that accuracy be used to generate descendant WUs with the appropriate size?

It's always accurate, but it does not always mean the same thing or increase at the same rate.

If the WU is on course to complete all its positions, the percentage accurately represents the number of positions which have been completed, but it will increase more or less quickly depending on whether simple positions or tough ones are being crunched. That difference can show right at the beginning of the WU if the parent one was cut off while in a "tough zone". The WU can also start very quickly and reach a tough zone much further into the process. And there may be several tough zones or none...

If the WU is meant to be cut off at the 60 % limit the percentage will accurately show the percentage of CPU time done versus 6 hours, except that it is still updated only when a position is done.
I don't remember having had a WU reaching the 12 hour limit, therefore I don't know if another percentage (vs 12 hours) is displayed then.

Note that if such a WU suddenly speeds up and can now pass the 60 % limit, the percentage will revert to the first mode, and vice versa. I have noticed such changes only in the first minutes of a job, but I think it can happen at any time.

I have watched many HCMD2 WUs and I have even tried to "micromanage" them sometimes by letting them run 2.5 minutes to see if they would be fast or slow. Believe me, you cannot reliably predict what their final times will be. It will work for a majority of them, but not for enough of them to base a procedure on which would avoid wastage completely. So, considering the time it would take, I don't think this is realistic.
[Nov 23, 2009 6:42:32 PM]
Mysteron347
Senior Cruncher
Australia
Joined: Apr 28, 2007
Post Count: 179
Re: Parents, children, grandchildren WUs - how does it work?

I believe that the difficulties which concern knreed are a matter of precisely how the 'fill-in' unit returns would be used, and could be a simple matter of interpretation.

If, given the 5,000/5,500 example, a fill-in of (5,001-5,500) were created, then to verify 1-5,500 a minimum of THREE result units would have to be interpreted: the original two results for the parent, and the fill-in. This complication would be compounded by the possibility that the fill-in is sent to an AWOL processor, or is itself split into child workunits, or that verification fails and a _2 unit has to be generated, waited for and then added to the co-ordination scheme. A nightmare scenario.

The key to my proposal is that the returned unit is partitioned, and the effective units generated are
parent : 1-5,000
Workunit A: 5,001-5,500
Workunit B: 5,501-9,125
Workunit C: 9,126-12,750
Workunit D: 12,751-16,375
Workunit E: 16,376-20,000

There are EXACTLY two result returns for each of these units, and the existing mechanism (modified as I propose) may be used to deal with AWOL processors or fail-to-verify instances. The parent may be done and dusted for weeks before Workunit A, for instance - in the same way as the current mechanism works and has been proved.
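The partitioning listed above can be reproduced with a short Python sketch. It is an illustration only, not the project's actual work-generation code: the function name is hypothetical, and the choice of four equal-sized units after Workunit A (`n_rest=4`) is inferred from the list rather than stated in the proposal.

```python
def partition(total, slower_done, faster_done, n_rest):
    """Return (name, first, last) ranges, 1-based and inclusive.

    slower_done / faster_done are the last structures reached by the two
    hosts that ran the parent; everything up to slower_done is verified.
    """
    units = [("parent", 1, slower_done)]
    if faster_done > slower_done:
        # The stretch only the faster host completed becomes its own unit.
        units.append(("Workunit A", slower_done + 1, faster_done))
    remaining = total - faster_done
    base, extra = divmod(remaining, n_rest)
    start = faster_done + 1
    for i in range(n_rest):
        size = base + (1 if i < extra else 0)
        units.append((f"Workunit {chr(ord('B') + i)}", start, start + size - 1))
        start += size
    return units

for name, first, last in partition(20000, 5000, 5500, n_rest=4):
    print(f"{name}: {first:,}-{last:,}")
```

With the 5,000/5,500 example this prints the same six ranges as the list, and every range maps to one unit with its own pair of results.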

=========

Moving the wall out is not desired as it will increase average run-time, contrary to lawrencehardin's user-comfort note.

Making the wall lower will similarly increase average run-time, as more processors will cross the Rubicon.

Making the wall higher will cause more processors to hit it. Only the very fastest will go over, and the faster (but not fastEST) processors simply hit higher than their companions. As noted, this leads to the waste we're discussing.

((Quite what it is that's hitting/clearing the wall is a mental image I'm having difficulty expunging..))

The issue once again comes down to the SLOPE of the line - back to pure processing speed. The closer the match in the slope, the less the splash from the wastage.
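To put a number on the slope argument, here is a rough Python sketch. It assumes both hosts crunch at a constant rate and both are stopped at the same wall; the function name is hypothetical, and the rates are simply the structures reached in the 5,000/5,500 example.

```python
def wasted_fraction(slow_rate, fast_rate):
    """Share of the faster host's work that has no matching result when
    both hosts are stopped at the same wall (constant rates assumed)."""
    if slow_rate <= 0 or fast_rate < slow_rate:
        raise ValueError("expected 0 < slow_rate <= fast_rate")
    return 1.0 - slow_rate / fast_rate

# The 5,000 / 5,500 example: about 9% of the faster host's work is unmatched.
print(f"{wasted_fraction(5000, 5500):.1%}")
# Identical slopes leave nothing unmatched.
print(f"{wasted_fraction(5500, 5500):.1%}")
```

The unmatched share depends only on the ratio of the two slopes, which is why tighter speed matching reduces it regardless of where the wall sits.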

I believe that considering turnaround time in the cruncher-matching mechanism is not productive. Turnaround time would seem to affect Pending-Validation but not credit-granted (which in turn measures work done). I'm sure that crunchers would prefer to see improved efficiency at the cost of PV time - and improved efficiency should also act to decrease PV time...

Tightening the speed differential allowed would seem to be the way to go - perhaps even insisting that the processor given the second replica be of EXACTLY the same speed-class (relax after 24 or 48 hours to prevent "stuck" units) might give a worthwhile improvement.


I'm confident that if my return-result partitioning proposal were implemented, we'd see an improvement - and if that could be proved, the next improvement proposal just might be a little surprising...
[Nov 23, 2009 6:57:47 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Parents, children, grandchildren WUs - how does it work?

There has to be a better way though. Somehow, somewhere.

Make *short* CMD2 and *long* CMD2 subprojects.
Target 6 hours for the long ones but let them run till completion.
Target 2 hours for the short ones and let them run to completion.
No fancy calculations, no parents, children, odd formulas for multiple cutoff levels ... run them like every other project (except RICE where I hope there is a better formula).

All results within a RICE quorum, and across the total pool, compute unique seeds. Similarly, WUs are sometimes held back to wait for part of the quorum, to check which seeds were already done before more are sent out for that quorum. So I'd be interested to know how you could improve on that algorithm. Nothing wrong with pushing the efficiency envelope, but if it's 100%, it's 100%.
[Edit 1 times, last edit by Sekerob at Nov 24, 2009 6:14:29 PM]
[Nov 24, 2009 6:06:23 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Parents, children, grandchildren WUs - how does it work?

I think we have a translation issue regarding my comments about RICE. I was not saying the algorithm was bad ... I was saying that I hoped it was better than CMD2 ... and you have now confirmed for me that it is.
[Nov 24, 2009 10:01:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Parents, children, grandchildren WUs - how does it work?

About the discussion on efficiency, if I could pull back to that...
Correct me if I missed anything here...
This could all really end up being a totally hypothetical discussion. I seem to recall seeing something on the boards about how this project will 'speed up', as it were, as we approach the end. I've been unclear whether this has to do with batch progress (later batches get smaller) or absolute progress... I'm basing the rest of this on it being absolute progress... Assuming computing power remains constant, the only way that is possible would be if the compounds we are testing get simpler, allowing faster computing.
If you look at our current compounds (the mid 160s), one of them has a tail of sorts. My brief time with QMC at home showed me that more complicated compounds take more time to crunch. If, as I am led to believe, the compounds will be simpler as we whip through the batches, then more computers should be able to pull off work units without hitting the walls.
Looks like a good chunk of this (as far as I can tell, non-)problem will solve itself...

Edit: By walls, I'm referring to the graph drawn above, where the 6 hour and 12 hour marks are shown as 'walls' that halt computing on the spot.
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 25, 2009 5:53:39 AM]
[Nov 25, 2009 5:51:15 AM]