Thread Status: Active
Total posts in this thread: 117
This topic has been viewed 497746 times and has 116 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: significant credit drop - only for me or did someone else see this?



Most of my points come from two dual-Xeon 40c/80t machines, which each have 16 GB of RAM and work super well.
Switching to 100% MIP resulted in massive drops in points and runtime too. I can see the points drop that everyone is talking about, but I didn't see where the drop in runtime came from.
Maybe there was a fight for resources? I don't really know.

Anyway, if I adjust the config files to run a limited number of WUs, how many should I run max without affecting performance? I've seen a lot of talk about 2, but with my rigs, would 4 be ok?

Running 1/4 of the threads on MIP should be OK, so 20 units on your 80t machines. What I do not know is how a dual-processor system will affect this. It will only work if half of the MIP units run on one processor and the rest on the other. I'm not sure whether this happens automatically.
What I would try is to start with zero MIP units, monitor the CPU temps, and slowly increase the number of MIPs running. Once the CPU temps decrease noticeably, that is a sign the CPU is underutilized because the L3 cache is not sufficient for the cache-hungry MIP units.
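As a rough sketch of the "run only a limited number of MIP units" approach: the BOINC client reads an app_config.xml file from the project's data directory, and its max_concurrent element caps how many tasks of one application run at once. The Python below only generates the text of such a file. The application name mip1 is an assumption (check the application names your own client reports), the helper names are made up for illustration, and the quarter-of-threads figure is just the starting point suggested above, not a measured optimum.

```python
# Sketch: build an app_config.xml that caps concurrent MIP tasks.
# "mip1" is an ASSUMED application name; verify it in your client.

APP_CONFIG_TEMPLATE = """<app_config>
  <app>
    <name>{name}</name>
    <max_concurrent>{limit}</max_concurrent>
  </app>
</app_config>
"""


def quarter_of_threads(threads: int) -> int:
    """Starting cap suggested in the thread: a quarter of the hardware threads."""
    return max(1, threads // 4)


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text limiting one application to max_concurrent tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return APP_CONFIG_TEMPLATE.format(name=app_name, limit=max_concurrent)


if __name__ == "__main__":
    # 80 hardware threads, as on the dual-Xeon machines discussed above.
    print(build_app_config("mip1", quarter_of_threads(80)))
```

The output would be saved as app_config.xml in the World Community Grid project folder, after which the client needs to re-read its config files. From there the limit can be raised or lowered one step at a time while watching runtimes and temperatures.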
[Jun 22, 2018 8:23:48 AM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Status: Offline
Project Badges:
Re: significant credit drop - only for me or did someone else see this?

Sorry it's taken us a while to respond. Please don't leave, we appreciate the time ALL of you have invested in the project. It has taken us a while to reproduce and then really pin down what causes the problem.

The short version is that Rosetta, the program being used by the MIP to fold the proteins on all of your computers*, is pretty hungry when it comes to cache. A single instance of the program fits well into a small cache. However, when you begin to run multiple instances, there is more contention for that cache. This results in L3 cache misses, and the CPU sits idle while we make a long trip to main memory to get the data we need. This behavior is common for programs that have larger memory requirements. It's also not something that we as developers often notice; we typically run on large clusters and use hundreds to thousands of cores in parallel. Nothing seemed slower for us because we are always running in that regime.

I don't know all of the details about how the points are assigned, and I don't know if/how the credit assignment will be modified. But I believe the issue stems from the fact that a single instance of Rosetta is well behaved (very few cache misses) on most consumer chips, but on machines with smaller caches and few memory channels a second (or third or fourth) instance cannot fit into the caches, and you see the run-time scaling issues that result in fewer points/hour (i.e. if a single instance of Rosetta had these cache issues, the scaling from one to multiple instances would not be as dramatic, nor would the change in points/hour).**

We are looking to see if we can improve the cache behavior. Rosetta is ~2 million lines of C++, and improving the cache performance might involve changing some pretty fundamental parts. We have some ideas of where to start digging, but I can't make any promises.

Long term, identifying these issues may end up improving Rosetta for everyone that uses it so pat yourselves on the back for that!

Doug

* a newer version of the program used in HPF1 & 2
** this might be the case for machines with very small (less than 4 MB) caches, where it's just always slow

This post should be pinned as the first thing in this forum until the code is fixed. Lots of electricity is being squandered.
----------------------------------------

...KRI please cancel all shadow-banning
[Mar 19, 2019 9:39:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: significant credit drop - only for me or did someone else see this?

It will be interesting to see if the Ryzen 9 3900X, with its large L3 cache, will help with this. I'm considering purchasing one, depending on reviews when it comes out. My i5-4590 is currently running 4 tasks at a time with an average completion time of 1 hour 31 minutes.
[Jun 28, 2019 9:00:08 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Status: Offline
Project Badges:
Re: significant credit drop - only for me or did someone else see this?

FX-8370 L3=8 MB so only 1 of 8 CPUs.
Xeon E5-2673v3 L3=30 MB so only 6 of 24 CPUs.
Xeon E5-2699v4 L3=55 MB so only 11 of 44 CPUs.
Ryzen 9 3950X L3=72 MB so only 14 of 32 CPUs.

It's socially irresponsible for them to not fix this problem.
----------------------------------------

...KRI please cancel all shadow-banning
[Jun 28, 2019 11:30:57 PM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1326
Status: Offline
Project Badges:
Re: significant credit drop - only for me or did someone else see this?


Ryzen 9 3950X L3=72 MB so only 14 of 32 CPUs.


OT: Lucky! As far as I am aware, these have not been released to the public yet. If you are happy/allowed to provide some details about the CPU, I would appreciate them in the "Choosing a high end CPU" thread. /OT
----------------------------------------

[Jun 29, 2019 6:42:11 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: significant credit drop - only for me or did someone else see this?

Here you go:
https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x

The recommended retail price will be $749 and it will be available in September (not hard to find out with a quick Google search).
[Jun 29, 2019 9:02:47 AM]
PowerFactor
Ace Cruncher
Joined: Dec 9, 2016
Post Count: 4033
Status: Offline
Project Badges:
Re: significant credit drop - only for me or did someone else see this?

FX-8370 L3=8 MB so only 1 of 8 CPUs.
Xeon E5-2673v3 L3=30 MB so only 6 of 24 CPUs.
Xeon E5-2699v4 L3=55 MB so only 11 of 44 CPUs.
Ryzen 9 3950X L3=72 MB so only 14 of 32 CPUs.

It's socially irresponsible for them to not fix this problem.


It looks like the following equation applies:

Max_core_count_MIP = floor( cpu_L3_cache_size_MB / 5 );
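A minimal Python sketch of that rule of thumb, checked against the four CPUs listed in this thread. The L3 sizes are the figures as quoted by the poster, the 5 MB per task is only the divisor from the equation above (not a measured working-set size), and the function name is made up for illustration.

```python
import math

# Divisor taken from the equation in the post; a rule of thumb, not a measurement.
MB_PER_MIP_TASK = 5


def max_mip_tasks(l3_cache_mb: float, mb_per_task: float = MB_PER_MIP_TASK) -> int:
    """Largest number of concurrent MIP tasks that fit the L3 cache under the rule."""
    return math.floor(l3_cache_mb / mb_per_task)


# L3 sizes (MB) and thread counts exactly as quoted in the thread.
QUOTED_CPUS = {
    "FX-8370": (8, 8),
    "Xeon E5-2673v3": (30, 24),
    "Xeon E5-2699v4": (55, 44),
    "Ryzen 9 3950X": (72, 32),
}

if __name__ == "__main__":
    for name, (l3_mb, threads) in QUOTED_CPUS.items():
        print(f"{name}: {max_mip_tasks(l3_mb)} of {threads} CPUs")
```

Each result matches the "N of M CPUs" figures in the quoted list, so the equation is consistent with those numbers; whether 5 MB per task holds on other hardware is untested here.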
[Jun 29, 2019 2:18:16 PM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1326
Status: Offline
Project Badges:
Re: significant credit drop - only for me or did someone else see this?

Here you go:
https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x

The recommended retail price will be $749 and it will be available in September (not hard to find out with a quick Google search).

Yes, I am aware of that information, thank you. I am assuming it is US pricing, so for me in New Zealand it would currently cost $1137.95 (NZ$1 will buy 66 US cents). It wasn't quite what I was getting at either. Thanks for the information.
----------------------------------------

[Jun 30, 2019 12:40:17 AM]
Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3315
Status: Offline
Project Badges:
Re: significant credit drop - only for me or did someone else see this?

Sorry it's taken us a while to respond. Please don't leave, we appreciate the time ALL of you have invested in the project. It has taken us a while to reproduce and then really pin down what causes the problem.

The short version is that Rosetta, the program being used by the MIP to fold the proteins on all of your computers*, is pretty hungry when it comes to cache. A single instance of the program fits well into a small cache. However, when you begin to run multiple instances, there is more contention for that cache. This results in L3 cache misses, and the CPU sits idle while we make a long trip to main memory to get the data we need. This behavior is common for programs that have larger memory requirements. It's also not something that we as developers often notice; we typically run on large clusters and use hundreds to thousands of cores in parallel. Nothing seemed slower for us because we are always running in that regime.

I don't know all of the details about how the points are assigned, and I don't know if/how the credit assignment will be modified. But I believe the issue stems from the fact that a single instance of Rosetta is well behaved (very few cache misses) on most consumer chips, but on machines with smaller caches and few memory channels a second (or third or fourth) instance cannot fit into the caches, and you see the run-time scaling issues that result in fewer points/hour (i.e. if a single instance of Rosetta had these cache issues, the scaling from one to multiple instances would not be as dramatic, nor would the change in points/hour).**

We are looking to see if we can improve the cache behavior. Rosetta is ~2 million lines of C++, and improving the cache performance might involve changing some pretty fundamental parts. We have some ideas of where to start digging, but I can't make any promises.

Long term, identifying these issues may end up improving Rosetta for everyone that uses it so pat yourselves on the back for that!

Doug

* a newer version of the program used in HPF1 & 2
** this might be the case for machines with very small (less than 4 MB) caches, where it's just always slow



Would be nice to know if there is an update on this issue.
----------------------------------------


- AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
- AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
- AMD Ryzen 7 7730U 8C/16T 3.0 GHz
[Nov 4, 2019 11:38:22 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Project Badges:
Re: significant credit drop - only for me or did someone else see this?

Falconet,

Unfortunately there is not an update to this yet.

Thanks,
-Uplinger
[Nov 8, 2019 7:45:43 PM]