Total posts in this thread: 12
This topic has been viewed 3652 times and has 11 replies
martin64
Senior Cruncher
Germany
Joined: May 11, 2009
Post Count: 445
Status: Offline
If pairing doesn't really work...

...like in this example:

CMD2_ 0497-2VRW_ B.clustersOccur-2Z7Q_ A.clustersOccur_ 11_ 110061_ 113315_ 1-- 614 Valid 03.06.10 15:26:45 04.06.10 10:09:21 8.80 105.6 / 118.6 <--- mine
CMD2_ 0497-2VRW_ B.clustersOccur-2Z7Q_ A.clustersOccur_ 11_ 110061_ 113315_ 0-- 614 Valid 03.06.10 15:26:38 04.06.10 07:05:30 6.00 54.3 / 48.9


If we take points as a progress indicator, the wingman did about 41% of the work that my E6300 @ 1.86 GHz did. So my question is: what is going to happen with that WU?

The normal procedure is, afaik, that the 41% is considered done and the remaining 59% is sent out again to two machines. In terms of points, this means 69.7 x 2 = 139.4 points that still need to be crunched.

A better alternative would be to discard the wingman's 41% and just send out the complete WU again to a known high performer, which means crunching only 118.6 points rather than 139.4 :)
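For anyone who wants to check the numbers, here is a minimal sketch of the comparison. It assumes the second figure of each points pair in the result list (118.6 and 48.9) is the one to use as the progress indicator, and the variable names are just for illustration.

```python
# Points taken from the two results quoted above (assumed progress indicator).
full_points = 118.6      # the result that ran the complete WU
wingman_points = 48.9    # the wingman's partial result

fraction_done = wingman_points / full_points   # share covered by the wingman
remaining = full_points - wingman_points       # part nobody has verified yet

# Option 1: keep the wingman's part, send the remainder to two machines.
resend_remainder = 2 * remaining
# Option 2: discard the wingman's part, send the whole WU to one machine.
resend_whole = full_points

print(f"wingman share:         {fraction_done:.0%}")
print(f"remainder sent twice:  {resend_remainder:.1f} points")
print(f"whole WU sent once:    {resend_whole:.1f} points")
```

The comparison only covers the crunching still to be done; it says nothing about whether the single resend would actually finish.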

Regards,
Martin
----------------------------------------

[Jun 4, 2010 8:00:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: If pairing doesn't really work...

Sad thought!!

why not just tell the wingman we don't want their help
[Jun 4, 2010 9:58:17 PM]
martin64
Senior Cruncher
Germany
Joined: May 11, 2009
Post Count: 445
Status: Offline
Re: If pairing doesn't really work...

> Sad thought!!
>
> why not just tell the wingman we don't want their help

No offense is meant; it is just a matter of optimisation. I don't really understand what the benefit is when we crunch more than we have to.

The sad thing about the project is the fact that sometimes work is just lost, which seems to be kind of unavoidable. So the loss should be as small as possible. Maybe you can explain why it is less sad to throw away the 60% rather than the 40%?

Regards,
Martin
----------------------------------------

[Jun 4, 2010 10:19:21 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: If pairing doesn't really work...

By my calcs he missed the 6 hr/60% cutoff by about 30 min, and you made it by about 30 min: pretty good matching. Because of this you got almost 50% more time to finish.

Resends/repair jobs are not sent to "known high performers"; they are sent to reliable performers. My 600 MHz box does get repair jobs. If the resend had landed there, you would have got approx. 10% complete after 6 hrs. By your system we would throw that out as well and resend again until the results match your desired standard, which appears to me to be possibly much more waste.

As for telling crunchers, "your results are not as fast as your wingman's, so we are throwing them out and redoing them": how many do you want to get rid of? I for one would just quit "wasting" my CPU cycles and find a better place for them.
[Jun 4, 2010 11:48:10 PM]
martin64
Senior Cruncher
Germany
Joined: May 11, 2009
Post Count: 445
Status: Offline
Re: If pairing doesn't really work...

Hi fredski,

There must be a means at WCG to keep track of computer performance, otherwise they could not do the pairing. So unless my machine is one of the fastest, it should be possible to send out a WU to a faster or equally fast machine.

The sad thing to me is that we throw away computing time in this project at all. This should be minimised. When I asked last time, it ended with an improvement: uplinger introduced the pairing. But I still think that the wasted effort could and should be further minimised. Some more suggestions, probably harder to implement:

Of course it would be better to continue the interrupted WU so that both get finished. I understood that it is not possible to send out the interrupted result to some other computer. But would it be possible to send it out to the very same computer once again as a new task, starting where it was interrupted?

The discussed "let-it-run" option would be a reasonable idea.

Next idea would be to send out just one result of a WU and let it come back. Then let the wingman do exactly the same amount of work, no wasted computer time at all. Of course this would mean more server load and longer time to validate, but I would very much appreciate the fact that nothing gets lost.

With up to 60% of my computer time wasted, I will step out of CMD2 after reaching Silver.

Regards,
Martin
----------------------------------------

[Jun 5, 2010 7:51:01 AM]
Mysteron347
Senior Cruncher
Australia
Joined: Apr 28, 2007
Post Count: 179
Status: Offline
Re: If pairing doesn't really work...

In my nearly forty years of IT - from programming, analysis and system design - I have noticed a consistency.

I'm brought into a failing project and the managers all run away, pointing at me. I get no co-operation and devise new techniques to solve the problems.

When I've got the problem licked, I can't fight the managers off with a stick. They'll criticise my techniques as being "non-standard" or "not approved."

At the very best outcome, the system is forced to run at only a small portion of its potential. Sometimes, it fails completely because the managers want to 'rationalise' what I did - without understanding the fundamentals.

The more the managers ignore me or pooh-pooh my decisions and techniques; the more the responses are political rather than technical, the greater the probability that the latter outcome will be the case.

The commonality is this : the project is inevitably managed using "politicians' delight" arguments - those that are simple enough for a politician to follow, theoretically solve the problem and will survive a cursory examination but which will fail when implemented, needing more "big-noting" executive decisions from the managers.

As I have said before, the solution lies in splitting the longer returned unit into separate smaller returns and transmitting ONE "make-up" unit rather than two - and we're aware from the later-generations numbering on this project that units can be created of any arbitrary size. Sadly, I had no technical response to that proposal.

I am sure that if what I've suggested was implemented, wastage would be eliminated. It would obviate the need for matching and could be applied to ALL processing.

The Crunch-till-you-drop scenario would apply only to a minority of crunchers who CHOOSE to adopt that option. It would not address the wastage as far as the other crunchers are concerned. It would mean that the report deadline would have to be extended for CTYD adopters and does nothing for the other time-waster - "too late" units where new tasks are despatched while crunching is still in-progress(*)

CTYD would take as much programming effort as would implementing what I have proposed in my estimation (and after 40 years, I believe I have some vague inkling of what I'm claiming.) It would also only suit a minority, albeit a vocal minority, rather than being a solution for all.

But after forty years, I'm also quite used to being shouted down, criticised, pooh-poohed and talking to brick walls. It no longer worries me - I can only do what I am allowed to do and I don't want to get involved in the politics. Not my area. I'm a technician and only understand technical arguments.

Sad to see that you propose to leave HCMD2, martin64. Personally, I'm on a fortnight's excursion to another project; back soon.

(*) yes, I know that this would require BOINC changes to report the current progress of a unit and is thus outside of current project-control.
[Jun 6, 2010 4:26:39 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: If pairing doesn't really work...

> As I have said before, the solution lies in splitting the longer returned unit into separate smaller returns and transmitting ONE "make-up" unit rather than two - and we're aware from the later-generations numbering on this project that units can be created of any arbitrary size. Sadly, I had no technical response to that proposal.

BOINC is not currently designed to handle what you have proposed. It has no capability to match or examine data between two different workunits. It would be a very large effort to change this behavior in order to do this.

Also - as you consider your solutions, the biggest challenge in this project is not that any given workunit requires more or less computation than another workunit. It is the fact that within some workunits there are specific iterations that are significantly more difficult than others (by 10-100 times more difficult). These are unpredictably encountered.
[Jun 7, 2010 1:50:31 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: If pairing doesn't really work...

I started tracking the valid results for this project when this topic got revived. Of the first 21 (W7 and Linux), there was 1 that combined a 6-hour result with a wingman who ran longer. After 47 valid results I now have 2 with this condition and 1 where 6 hours met 6 hours, with relatively close credit awards. The sample size is relatively small, but I would not guesstimate the project-wide figure to be far off, and thus the waste as I perceive it is relatively small.

I'll keep tracking on a daily basis so as to maintain some feel for what my devices produce.

Now, within the skill that BOINC actually has, chopping HCMD2 work units up to the point where every unit can be completed no matter which clients meet can only mean it has to be done per the lowest common denominator (so I understand). That could mean an exponential growth of work units to distribute, which in turn might impact the overall grid performance. As it stands we're on around 700,000 daily average, and I have no idea how confident WCG is about pumping this up to a million and more with several projects coming into production and HFCC returning in a little while. The schedulers/feeders/assimilators will be very busy, me thinketh.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Jun 7, 2010 3:51:12 PM]
Mysteron347
Senior Cruncher
Australia
Joined: Apr 28, 2007
Post Count: 179
Status: Offline
Re: If pairing doesn't really work...

Sorry, Sek. Not the case.

Currently, most of the time, both crunchers complete the tasks thanks to speed-matching. I believe this was quoted as 85% or roughly five-in-six.

In the case that one cruncher hits a wall (no matter which wall) and the other completes (whether or not it hit the 6hr. barrier) then with the current system, two (or 2*n) child tasks are generated. With what I have proposed, only one (or 2*n -1) task would be generated - the 'twin' would be derived from the data already returned by the faster cruncher.

In the case that BOTH crunchers hit walls, then whereas the current system generates two (or possibly 2*n) child tasks, my proposal would generate three (or 1+2*n)

However, given the aforementioned 85%, the number of children generated for an incomplete WU is currently 2 (in the simple case); with my proposition it is 0.85*1 + 0.15*3 = 1.30.
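The weighting in that last sum can be sketched as follows. This assumes, as the sum does, that 85% of incomplete WUs fall into the one-wall case and 15% into the both-walls case, and it covers only the simple case (n = 1). The function name is hypothetical.

```python
def expected_children(p_one_wall: float, one_wall: int, both_walls: int) -> float:
    """Expected number of child tasks per incomplete WU.

    p_one_wall : assumed share of cases where only one cruncher hits a wall
    one_wall   : children generated in that case
    both_walls : children generated when both crunchers hit walls
    """
    return p_one_wall * one_wall + (1.0 - p_one_wall) * both_walls


P_ONE_WALL = 0.85  # the quoted 85%, applied the way the post applies it

current = expected_children(P_ONE_WALL, one_wall=2, both_walls=2)
proposed = expected_children(P_ONE_WALL, one_wall=1, both_walls=3)

print(f"current system: {current:.2f} children per incomplete WU")
print(f"proposal:       {proposed:.2f} children per incomplete WU")
```

Changing P_ONE_WALL shows how sensitive the comparison is: the proposal produces fewer children than the current system only while the one-wall case makes up more than half of the incomplete WUs.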

Hence there would in fact be FEWER tasks to co-ordinate, not more - AND the child tasks would be shorter on average. There would be a plentiful supply of tiny tasks to keep the pensioners gainfully occupied - not being almost counterproductive as they currently are when mismatched.

And the speed of the machine, used to control the matching (and hence minimise child-generation (that's BOINCtraceptives...)), could simply be (number of positions processed) div (number of minutes taken to process) rather than the matrix currently employed. Would seem to be much simpler.
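As a rough illustration of that speed figure, here is a small sketch. It reads "div" as integer division and pairs hosts by sorting on speed and taking neighbours. The host names, the numbers and the pairing rule are made up for the example and are not how WCG actually does its matching.

```python
def speed(positions_processed: int, minutes_taken: int) -> int:
    """Positions per minute, using integer division as suggested above."""
    if minutes_taken <= 0:
        raise ValueError("minutes_taken must be positive")
    return positions_processed // minutes_taken


def pair_by_speed(hosts: dict) -> list:
    """Pair hosts with neighbouring speeds (hypothetical matching rule).

    hosts maps a host name to (positions_processed, minutes_taken).
    With an odd number of hosts, the fastest one is left unpaired.
    """
    ranked = sorted(hosts, key=lambda name: speed(*hosts[name]))
    return [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked) - 1, 2)]


# Made-up history for four hosts: (positions processed, minutes taken).
hosts = {
    "host_a": (3600, 360),
    "host_b": (900, 360),
    "host_c": (3400, 350),
    "host_d": (1000, 360),
}

pairs = pair_by_speed(hosts)
for slow, fast in pairs:
    print(slow, "<->", fast)
```

A single number like this ignores the point raised earlier in the thread that some iterations are 10-100 times harder than others, so it could only ever be an average.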
[Oct 16, 2010 12:24:05 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: If pairing doesn't really work...

> Sorry, Sek. Not the case.
>
> Currently, most of the time, both crunchers complete the tasks thanks to speed-matching. I believe this was quoted as 85% or roughly five-in-six.

The "85%" I only found in relation to [from the outside showing] credit disparity. See this http://www.worldcommunitygrid.org/forums/wcg/...ead,27770_offset,0#256089 and the follow up post.
[Oct 16, 2010 8:44:08 AM]