World Community Grid Forums
Thread Status: Active · Total posts in this thread: 9
WCGAdmin
World Community Grid Admin · Joined: Jun 9, 2020 · Post Count: 171 · Status: Offline
The research team and the World Community Grid tech team continue to collaborate on a new type of work unit for the project.
https://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=663
William Albert
Cruncher · Joined: Apr 5, 2020 · Post Count: 39 · Status: Offline
> Still have not heard a word about whether they're going to fix the L3 Cache congestion problem.

L3 cache is managed by the processor itself (with the exception of some recent post-Spectre instructions that programs can call to flush the cache). There's nothing that they can really "fix," because the program has no influence over how the processor manages its L3 cache (or if L3 cache is even present).
William Albert
Cruncher · Joined: Apr 5, 2020 · Post Count: 39 · Status: Offline
> Then how did the Baker Lab fix it in their current version of Rosetta???

Fix what? Again, L3 cache is managed by the processor itself. Rosetta (or any other application) has no control over how L3 cache is managed. Also, since Rosetta has to run on many different microarchitectures, it can't make any assumptions about how much (if any) L3 cache is present.

The best that the Rosetta developers can realistically do is to design the program's in-memory data structures to be small and relatively static, so that they have a higher chance of staying cached. But designing high-performance data structures isn't trivial, and making them smaller isn't necessarily going to improve performance (the whole space-time tradeoff thing).

So I don't really see anything that's "broken" that needs to be "fixed." MIP's workload might benefit from larger amounts of L3 cache, but it's not exactly a surprise that programs run faster on processors with more resources.
katoda
Senior Cruncher · Poland · Joined: Apr 28, 2007 · Post Count: 170 · Status: Offline
Dear William Albert, have you read https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,40374_offset,0 ?

I guess that Aurum420 is referring to the issue reported and analysed there: clogging the CPU by running too many MIP1 units simultaneously, resulting in long computing times for MIP1 WUs — a behaviour observed only for this project. The fact that this Rosetta version is especially "hungry" with regard to the processor's cache was later confirmed by the project scientist: https://www.worldcommunitygrid.org/forums/wcg...ad,40374_offset,60#569786 For the moment the only solution is to limit the number of running MIP1 WUs to 1 unit per 4 MB of L3 cache.

@Aurum420, could you please elaborate (or provide a link) on the Baker Lab fix?
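For anyone wanting to apply that workaround locally, BOINC's standard `app_config.xml` mechanism supports a per-application `max_concurrent` cap. A sketch, assuming the WCG application name is `mip1` (verify the actual name in your client's event log or `client_state.xml` before relying on it), placed in the World Community Grid project directory:

```xml
<!-- app_config.xml: cap concurrently running MIP1 work units.
     Example: a CPU with 16 MB of L3 -> 16 / 4 = 4 units. -->
<app_config>
    <app>
        <name>mip1</name>  <!-- assumed application name; check it locally -->
        <max_concurrent>4</max_concurrent>
    </app>
</app_config>
```

After saving the file, the cap can be applied without restarting via the BOINC Manager's Options → "Read config files" (Advanced view).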
William Albert
Cruncher · Joined: Apr 5, 2020 · Post Count: 39 · Status: Offline
Here's the comment in that thread from the MIP scientist:
https://www.worldcommunitygrid.org/forums/wcg...ad,40374_offset,70#569786

> The short version is that Rosetta, the program being used by the MIP to fold the proteins on all of your computers*, is pretty hungry when it comes to cache. A single instance of the program fits well into a small cache. However, when you begin to run multiple instances there is more contention for that cache. This results in L3 cache misses, and the CPU sits idle while we have to make a long trip to main memory to get the data we need. This behavior is common for programs that have larger memory requirements. It's also not something that we as developers often notice; we typically run on large clusters and use hundreds to thousands of cores in parallel. Nothing seemed slower for us because we are always running in that regime.
>
> I don't know all of the details about how the points are assigned, and I don't know if/how the credit assignment will be modified. But I believe that issue stems from the fact that a single instance of Rosetta is well behaved (very few cache misses) on most consumer chips, but on machines with smaller caches and few memory channels a second (or third or fourth) instance cannot fit into the caches, and you see the run-time scaling issues which result in fewer points/hour (i.e. if a single instance of Rosetta had these cache issues, the scaling from one to multiple instances would not be as dramatic, nor would the change in points/hour).**
>
> We are looking to see if we can improve the cache behavior. Rosetta is ~2 million lines of C++, and improving the cache performance might involve changing some pretty fundamental parts. We have some ideas of where to start digging, but I can't make any promises.

That explanation is pretty much in line with what I said above. Given that it's been several years since that comment, and this issue still exists, it's likely that optimizing it was infeasible.

Also, keep in mind the project's needs from a science standpoint. Results need to be reproducible, and papers analyzing those results will cite the simulation tool and version used. Even if a newer build of Rosetta with optimized caching behavior exists, it's very possible that the MIP team isn't in a position to change versions at this point (or at any point) without spoiling their existing results.

If MIP's behavior is causing problems with common consumer hardware, it may be prudent for the WCG admins to add a notice about it and set a default limit on the number of running MIP WUs per device in the project selection menu, similar to how they handle Africa Rainfall Project's space requirements. However, as long as MIP WUs are running to completion and producing useful output, they aren't "broken." Implying that cache usage is a problem to be fixed (rather than simply a performance characteristic of the project's WUs), and that the MIP team is being negligent for not having "fixed it," is inaccurate and unfair.
Jim1348
Veteran Cruncher · USA · Joined: Jul 13, 2009 · Post Count: 1066 · Status: Offline
That is a good summary.
To make a long story short, after considerable time running MIP on Linux (with less experience on Windows), I have found that I can run two MIP WUs at a time on Intel (Haswell, Coffee Lake) machines, two at a time on Ryzen 2000-series, and four at a time on Ryzen 3000-series. But that still depends on what other projects you are running: Rosetta itself will reduce those numbers, as will ARP.