| World Community Grid Forums
Thread Status: Active Total posts in this thread: 117
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Most of my points come from 2 dual Xeon 40c/80t machines which each have 16 GB of RAM and work super well. Switching to 100% MIP resulted in massive drops in points and runtime too. I can see the points drop that everyone is talking about, but I didn't see where the drop in runtime came from. Maybe there was a fight for resources? I don't really know. Anyway, if I adjust the config files to run a limited number of WUs, how many should I run max without affecting performance? I've seen a lot of talk about 2, but with my rigs, would 4 be OK?

Running MIP on 1/4 of the threads should be OK, so 20 units on your 80t machines. What I do not know is how a dual-processor system will affect this. It will only work if half of the MIP units run on one processor and the rest on the other, and I am not sure this will happen automatically. What I would try is to start with zero MIP units, monitor the CPU temps, and slowly increase the number of MIPs running. Once the CPU temps decrease noticeably, that is a sign the CPU is underutilized because the L3 cache is not sufficient for the cache-hungry MIP units. |
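For anyone who wants to try the 1/4-of-threads rule of thumb above, here is a minimal Python sketch that computes the task limit and prints a BOINC `app_config.xml` body using the `max_concurrent` setting. The app name `mip1` is an assumption (check the app name your own client reports), the helper names are made up for illustration, and `max_concurrent` only caps the number of running tasks: it does not decide which socket a task lands on, so the even per-socket split shown is just the target, not something this file enforces.

```python
import xml.etree.ElementTree as ET


def mip_task_limit(hardware_threads: int, fraction: float = 0.25) -> int:
    """Number of concurrent MIP tasks for a given thread count (at least 1)."""
    return max(1, int(hardware_threads * fraction))


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Build an app_config.xml body that caps concurrent tasks for one app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Dual-socket 40c/80t machine from the post above.
limit = mip_task_limit(80)
per_socket = limit // 2  # desired split across the two CPUs, not enforced here

# "mip1" is an assumed app name; replace it with the one your client shows.
config_xml = build_app_config("mip1", limit)

print(f"MIP tasks: {limit} total, {per_socket} per socket")
print(config_xml)
```

The printed XML would go into the project's folder in the BOINC data directory as `app_config.xml`. Starting lower than the computed limit and raising it while watching temperatures, as suggested above, is probably the safer way to use it.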
||
|
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2391 Status: Offline Project Badges:
|
Sorry it's taken us a while to respond. Please don't leave, we appreciate the time ALL of you have invested in the project. It has taken us a while to reproduce and then really pin down what causes the problem.

The short version is that Rosetta, the program being used by the MIP to fold the proteins on all of your computers*, is pretty hungry when it comes to cache. A single instance of the program fits well into a small cache. However, when you begin to run multiple instances there is more contention for that cache. This results in L3 cache misses, and the CPU sits idle while we have to make a long trip to main memory to get the data we need. This behavior is common for programs that have larger memory requirements. It's also not something that we as developers often notice; we typically run on large clusters and use hundreds to thousands of cores in parallel on machines. Nothing seemed slower for us because we are always running in that regime.

I don't know all of the details about how the points are assigned, and I don't know if/how the credit assignment will be modified. But I believe that issue stems from the fact that a single instance of Rosetta is well behaved (very few cache misses) on most consumer chips, but on machines with smaller caches and few memory channels a second (or third or fourth) instance cannot fit into the caches, and you see the run time scaling issues which result in fewer points/hour (i.e. if a single instance of Rosetta had these cache issues, the scaling from one to multiple instances would not be as dramatic, nor would the change in points/hour).**

We are looking to see if we can improve the cache behavior. Rosetta is ~2 million lines of C++, and improving the cache performance might involve changing some pretty fundamental parts. We have some ideas of where to start digging, but I can't make any promises. Long term, identifying these issues may end up improving Rosetta for everyone that uses it, so pat yourselves on the back for that!

Doug

* a newer version of the program used in HPF1 & 2
** this might be the case for machines with very small (less than 4 MB) caches, it's just always slow

This post should be pinned as the first thing in this forum until the code is fixed. Lots of electricity is being squandered.

...KRI please cancel all shadow-banning |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It will be interesting to see if the Ryzen 9 3900X with its large L3 cache will help with this. I'm considering purchasing one, depending on reviews when it comes out. My i5-4590 is currently running 4 tasks at a time with an average completion time of 1 hour 31 minutes.
|
||
|
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2391 Status: Offline Project Badges:
|
FX-8370 L3=8 MB so only 1 of 8 CPUs.
Xeon E5-2673v3 L3=30 MB so only 6 of 24 CPUs.
Xeon E5-2699v4 L3=55 MB so only 11 of 44 CPUs.
Ryzen 9 3950X L3=72MB so only 14 of 32 CPUs.
It's socially irresponsible for them to not fix this problem. |
||
|
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1326 Status: Offline Project Badges:
|
Ryzen 9 3950X L3=72MB so only 14 of 32 CPUs.

OT: Lucky! As far as I am aware, these have not been released to the public yet. If you are happy/allowed to provide some details about the CPU, I would appreciate them in "Choosing a high end CPU". /OT |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Here you go:
https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x
The recommended retail price will be $749 and it will be available in September (not hard to find out with a quick Google search). |
||
|
|
PowerFactor
Ace Cruncher Joined: Dec 9, 2016 Post Count: 4033 Status: Offline Project Badges:
|
FX-8370 L3=8 MB so only 1 in 8 CPUs.
Xeon E5-2673v3 L3=30 MB so only 6 of 24 CPUs.
Xeon E5-2699v4 L3=55 MB so only 11 of 44 CPUs.
Ryzen 9 3950X L3=72MB so only 14 of 32 CPUs.
It's socially irresponsible for them to not fix this problem.

It looks like the following equation applies:
Max_core_count_MIP = floor( cpu_L3_cache_size_MB / 5 ); |
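As a quick check, the equation above can be written as a small Python sketch and run against the four CPUs listed in the quote. The function and parameter names are made up for illustration, and the 5 MB-per-task figure is simply the divisor from the equation, not a measured value.

```python
import math


def max_mip_tasks(l3_cache_mb: float, mb_per_task: float = 5.0) -> int:
    """Max_core_count_MIP = floor(cpu_L3_cache_size_MB / 5)."""
    return math.floor(l3_cache_mb / mb_per_task)


# L3 sizes (MB) exactly as quoted in the post above.
quoted_cpus = {
    "FX-8370": 8,
    "Xeon E5-2673v3": 30,
    "Xeon E5-2699v4": 55,
    "Ryzen 9 3950X": 72,
}

for name, l3_mb in quoted_cpus.items():
    print(f"{name}: L3={l3_mb} MB -> {max_mip_tasks(l3_mb)} MIP tasks")
```

With the quoted cache sizes this gives 1, 6, 11 and 14 tasks, which matches the counts in the list.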
||
|
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1326 Status: Offline Project Badges:
|
Here you go: https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x The recommended retail price will be 749 $ and it be available in september (not hard to find out with a quick google search).

Yes, I am aware of that information, thank you. I am assuming it is US pricing, so for me in New Zealand it would currently cost NZ$1137.95 (NZ$1 will buy 66 US cents). It wasn't quite what I was getting at either. Thanks for the information. |
||
|
|
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3315 Status: Offline Project Badges:
|
Sorry it's taken us a while to respond. Please don't leave, we appreciate the time ALL of you have invested in the project. It has taken us a while to reproduce and then really pin down what causes the problem.

The short version is that Rosetta, the program being used by the MIP to fold the proteins on all of your computers*, is pretty hungry when it comes to cache. A single instance of the program fits well into a small cache. However, when you begin to run multiple instances there is more contention for that cache. This results in L3 cache misses, and the CPU sits idle while we have to make a long trip to main memory to get the data we need. This behavior is common for programs that have larger memory requirements. It's also not something that we as developers often notice; we typically run on large clusters and use hundreds to thousands of cores in parallel on machines. Nothing seemed slower for us because we are always running in that regime.

I don't know all of the details about how the points are assigned, and I don't know if/how the credit assignment will be modified. But I believe that issue stems from the fact that a single instance of Rosetta is well behaved (very few cache misses) on most consumer chips, but on machines with smaller caches and few memory channels a second (or third or fourth) instance cannot fit into the caches, and you see the run time scaling issues which result in fewer points/hour (i.e. if a single instance of Rosetta had these cache issues, the scaling from one to multiple instances would not be as dramatic, nor would the change in points/hour).**

We are looking to see if we can improve the cache behavior. Rosetta is ~2 million lines of C++, and improving the cache performance might involve changing some pretty fundamental parts. We have some ideas of where to start digging, but I can't make any promises. Long term, identifying these issues may end up improving Rosetta for everyone that uses it, so pat yourselves on the back for that!

Doug

* a newer version of the program used in HPF1 & 2
** this might be the case for machines with very small (less than 4 MB) caches, it's just always slow

Would be nice to know if there is an update on this issue.

- AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
- AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
- AMD Ryzen 7 7730U 8C/16T 3.0 GHz |
||
|
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges:
|
Falconet,
Unfortunately there is not an update to this yet. Thanks, -Uplinger |
||
|
|
|