Total posts in this thread: 164
Posts: 164   Pages: 17   [ Previous Page | 7 8 9 10 11 12 13 14 15 16 | Next Page ]
This topic has been viewed 44014 times and has 163 replies
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2384
Re: For Current Volunteers: Advance Information on Our Newest Project

You are running too many work units at once with HT on.
That should not be possible unless it's improperly coded for multithreading and use of the cache. The only two BOINC projects I know of with that problem are MIP & ARP.

CPDN has the problem with some of their work units. Anything that uses a lot of cache can be a problem. You can say that they shouldn't have the problem if you want to. But that is irrelevant to the point of understanding how HT works.

PS - As far as I know, the OS decides what goes in what cache. All the programmer can do is limit the number of writes to memory. Some programs are better at that than others. Maybe there are other tricks. But in the end, all the crunchers can do is check whether we are getting a slowdown with multiple work units. The cure is to limit the number (as with an app_config.xml), not to turn off HT.
Strange that so many books & journal articles have been written on the subject:
https://www.google.com/search?q=programming+c...&biw=1920&bih=969

https://www.google.com/search?q=programming+c...rceid=chrome&ie=UTF-8
----------------------------------------

...KRI please cancel all shadow-banning
[Aug 30, 2020 2:12:08 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12146
Re: For Current Volunteers: Advance Information on Our Newest Project

Aurum420

Are these machines of yours running just arp on every core or every thread?

If so, it is possible that with HT you are getting more conflicts at checkpoints, which slows things down.

I find that running arp on more than half my threads results in a significant slowdown and each extra thread makes it slower still. But I don't have any figures to back that up.

As all the projects are worthwhile, I run a mix of 3 different projects.

Mike
[Aug 30, 2020 2:38:51 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2384
Re: For Current Volunteers: Advance Information on Our Newest Project

This is my approach to coping with ARP. N is the number of physical CPU cores and T = 2N is the number of CPU threads.
1. My fastest CPUs (FP MIPS Whetstone > 4000) have hyperthreading disabled and run N-1 ARP WUs exclusively. This alleviates the biggest problem with ARP by shortening the time between checkpoints to about 1.5 hours.
2. My midrange CPUs (3400 < FP MIPS Whetstone < 4000) have HT enabled, but their app_config limits ARP WUs to N-1 = (T-2)/2 and they also run OPN & HST. This still leaves the extremely annoying checkpoint intervals of over 3 hours, so I'll probably stop running ARP on these altogether.
3. My low-end CPUs (FP MIPS Whetstone < 3400) never run ARP.
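
A minimal sketch of the tiering rule above, in Python. The thresholds and the N-1 rule come from the post; the app short name `arp1`, the treatment of hosts sitting exactly on a threshold, and both function names are assumptions made for illustration. The `<app_config>` / `<app>` / `<name>` / `<max_concurrent>` layout follows the standard BOINC app_config.xml format.

```python
FAST_MIPS = 4000  # "fastest" threshold quoted in the post
SLOW_MIPS = 3400  # below this, the post never runs ARP


def arp_limit(physical_cores: int, whetstone_mips: float) -> int:
    """Concurrent ARP work units for one host under the post's rule.

    Assumption: a host exactly at 3400 counts as midrange; the post
    does not say what happens on the boundaries.
    """
    if whetstone_mips < SLOW_MIPS:
        return 0  # low-end CPUs never run ARP
    # Fast hosts (HT off) and midrange hosts (HT on) both land on N-1;
    # on an HT host with T = 2N threads this equals (T-2)/2.
    return max(physical_cores - 1, 0)


def app_config_xml(app_name: str, max_concurrent: int) -> str:
    """Render an app_config.xml body that caps one app's concurrency."""
    if max_concurrent < 1:
        # Hosts that should run no ARP at all are assumed to deselect
        # the project elsewhere instead of using a zero limit.
        raise ValueError("max_concurrent must be at least 1")
    return (
        "<app_config>\n"
        "  <app>\n"
        f"    <name>{app_name}</name>\n"
        f"    <max_concurrent>{max_concurrent}</max_concurrent>\n"
        "  </app>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # Hypothetical midrange host: 8 physical cores, 16 threads.
    print(app_config_xml("arp1", arp_limit(8, 3600)))
```

On the HT-enabled midrange hosts the remaining threads stay free for OPN and HST, which is the mix described in item 2.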
----------------------------------------

...KRI please cancel all shadow-banning
----------------------------------------
[Edit 1 times, last edit by Aurum420 at Aug 30, 2020 5:10:34 PM]
[Aug 30, 2020 5:08:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: For Current Volunteers: Advance Information on Our Newest Project

I run 128 ARP WUs simultaneously on the Dell 7425. They average 31 hours each. Seems reasonable to me. If I'm a coder, I have to ask myself: is the extra coding to handle a variety of cache sizes worth the little extra benefit that might be realized? The WCG work units have always been built to the lowest common denominator to allow for the widest range of participation. If one has one of the latest and greatest processors, the WUs will probably not take full advantage of the processor features. It is what it is....
[Aug 30, 2020 5:45:50 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12146
Re: For Current Volunteers: Advance Information on Our Newest Project

Rather than stopping arp completely on your mid-range machines, first try restricting arp to half the threads and increasing the other projects. See if that boosts the performance. Wean them off gradually to see when you hit the optimum combination. I find half is about right.

Mike
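
One way to judge the "optimum combination" is to compare ARP throughput at each concurrency limit that has been tried. The Python sketch below assumes you have recorded an average runtime per work unit for each limit; the function names and the sample numbers are made up for illustration, and the sketch ignores the work done by the other projects on the freed threads.

```python
from typing import Mapping


def wus_per_day(concurrent: int, avg_hours_per_wu: float) -> float:
    """ARP work units completed per day at a given concurrency."""
    return concurrent * 24.0 / avg_hours_per_wu


def best_setting(measurements: Mapping[int, float]) -> int:
    """Pick the concurrency limit with the highest ARP throughput.

    `measurements` maps a concurrent-ARP limit to the average hours
    per work unit observed while running with that limit.
    """
    return max(measurements, key=lambda n: wus_per_day(n, measurements[n]))


if __name__ == "__main__":
    # Hypothetical measurements for a 16-thread host, not real data.
    observed = {4: 12.0, 6: 20.0, 8: 30.0}
    for limit, hours in observed.items():
        print(limit, round(wus_per_day(limit, hours), 2))
    print("best limit:", best_setting(observed))
```

With these invented figures, running fewer ARP tasks at once wins because each one finishes so much faster; real hosts need their own measurements.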
[Aug 30, 2020 6:17:47 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2384
Re: For Current Volunteers: Advance Information on Our Newest Project

Rather than stopping arp completely on your mid-range machines, first try restricting arp to half the threads and increasing the other projects. See if that boosts the performance.
That's what I said I do. I don't see much improvement.
Wean them off gradually to see when you hit the optimum combination. I find half is about right.
Mike
At a 7-day validation rate that could take half a year. I really dislike the 3-hour checkpoints, so that's the main reason to cut back. My fastest CPUs sans HT run ARP WUs in about 9 to 12 hours.
----------------------------------------

...KRI please cancel all shadow-banning
[Aug 31, 2020 1:06:42 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 874
Re: For Current Volunteers: Advance Information on Our Newest Project

@Aurum420

Is it possible with arp1 to become a Lone Ranger??? I've yet to make the cut:
Minimum Quorum: 2 & Replication: 2
I wonder if IBM staff will ever answer my question. I've yet to see a Quorum One, and my Results Status has 180 ARP Pending Validation, 195 Valid & 375 In Progress. I surmise the answer is no, there will never be Quorum One status for ARP, but it would be nice to understand why.

This was explained in the notes for the original Beta test (and has been mentioned elsewhere since, if I recall correctly) - to quote from the Beta announcement of May 29th 2019
For this project, the only method available to validate results is to run redundant copies and check for binary equivalence.

Given that each work unit is for a specific time and place (unlike CPDN, where they do multiple runs with the same start-point but subtly different control parameters), one can't afford to have an unverified result, so no quorum one.
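
To illustrate what "check for binary equivalence" means in practice, here is a small Python sketch that treats two result files as agreeing only if their bytes match exactly. This is not the WCG validator, whose implementation is not described in this thread; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large results fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def results_agree(path_a: str, path_b: str) -> bool:
    """True only when both replicas produced byte-identical output."""
    return file_digest(path_a) == file_digest(path_b)
```

A check like this has no tolerance at all: a single differing byte means the two copies disagree, which is why a second, independent run is needed before a result can be trusted.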

Cheers - Al.

P.S. I am not from IBM :-) ...
[Aug 31, 2020 5:09:18 AM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2384
Re: For Current Volunteers: Advance Information on Our Newest Project

For this project, the only method available to validate results is to run redundant copies and check for binary equivalence.
Thanks for finding that line. I had thought that was part of starting off slow. Allowing Quorum One would nearly double ARP throughput. I thought it would be worth doing a statistical analysis to find out if it's still needed.
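
The "nearly double" estimate can be put as simple arithmetic. In the Python sketch below, `spot_check_fraction` is a hypothetical parameter (the share of work units that would still be sent out twice under a quorum-one scheme); nothing in this thread says WCG would use such a scheme for ARP.

```python
def replicas_per_wu_quorum_one(spot_check_fraction: float) -> float:
    """Average copies computed per work unit if only some are re-run."""
    return 1.0 + spot_check_fraction


def throughput_gain(current_replicas: float, spot_check_fraction: float) -> float:
    """Factor by which distinct work units per day would grow."""
    return current_replicas / replicas_per_wu_quorum_one(spot_check_fraction)


if __name__ == "__main__":
    # Replication 2 today, with a hypothetical 10% of work units re-run.
    print(round(throughput_gain(2.0, 0.10), 2))
```

With no re-runs at all the gain is exactly 2; any remaining verification pulls it below that, which is where "nearly" comes from.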
----------------------------------------

...KRI please cancel all shadow-banning
[Aug 31, 2020 1:49:09 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2384
Re: For Current Volunteers: Advance Information on Our Newest Project

One would think it wouldn't take too many of these to rethink their Quorum 2 policy:
----------------------------------------

...KRI please cancel all shadow-banning
[Sep 7, 2020 2:03:23 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: For Current Volunteers: Advance Information on Our Newest Project

OK, this resulted in a redundant effort, but what should have happened instead?

By the looks of it, _2 could not be cancelled, as that client either was out of touch or had already started the job. A runtime of 160 hours suggests it started long ago, so the server won't cancel it, but at the same time it does not know if that copy will ever report.

_3 was issued in response to _2 not answering, but it did not answer either while _2 was still chugging away, which triggered _4, finally a quick one from a presumably reliable host.

The system worked: the next iteration could be generated and was not held up any longer.
[Sep 7, 2020 8:23:27 AM]