| World Community Grid Forums
Thread Status: Active Total posts in this thread: 117
andgra
Senior Cruncher Sweden Joined: Mar 15, 2014 Post Count: 195 Status: Offline Project Badges:
|
Must say I'm disappointed at the Linux implementation of MIP.
Pulling my Linux cores out of MIP to work on other projects. Windows cores are OK here and could stay. Hopefully we will get some response from the techs or scientists on the progress of this issue.
----------------------------------------
/andgra |
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Even on Windows, MIP1 does not run optimally.
I've noticed strange behaviour on a Win7 Pro SP1 x64 host that looks very much like a memory leak in the MIP1 implementation. In other words, even after the host stops computing MIP1 WUs, a large amount of RAM is not properly released, even several days after the last MIP1 WU was computed. After a host reboot, everything runs well again. I managed to reproduce this observation twice last November. Since nobody on the tech or scientist side seems to pay attention to members' remarks and observations about MIP1, I did not feel the need to report this issue until now.
Cheers, Yves |
||
|
|
RTS48
Veteran Cruncher Bolivia Joined: Aug 2, 2009 Post Count: 1353 Status: Offline Project Badges:
|
Well - the saga continues. I am having to micromanage my WUs because 100% MIP is a disaster (50% is pretty bad too). I have 20 threads in 3 Macs. My oldest Mac, which has only 4 threads, is running SCC exclusively as it seizes up with MIP. The other two quad-core (8-thread) Macs are set to run a maximum of 4 threads each of MIP when I make WUs available through device manager. I am still getting less than 20 points per hour of CPU time for MIP compared with nearer 25 points/h for SCC. I think that when I get my MIP Sapphire I will abandon the project entirely. Big shame!
----------------------------------------
Rod Peel
Santa Cruz, Bolivia, South America |
||
|
|
JimWork
Cruncher Canada Joined: Oct 11, 2005 Post Count: 35 Status: Offline Project Badges:
|
I agree - this is one stingy stinker of a project. I'll stick with it until I get to 5yrs. I'm at 4:266:00 now and hope to cross the finish line in about ten days.
|
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
It is comforting to notice that I am not alone in my observations.
However, why has nobody on the scientist side taken these issues into account ... and provided some feedback?
Cheers, Yves |
||
|
|
andgra
Senior Cruncher Sweden Joined: Mar 15, 2014 Post Count: 195 Status: Offline Project Badges:
|
I agree with you, KerSamson!
----------------------------------------
/andgra
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sorry it's taken us a while to respond. Please don't leave, we appreciate the time ALL of you have invested in the project. It has taken us a while to reproduce and then really pin down what causes the problem.
The short version is that Rosetta, the program being used by the MIP to fold the proteins on all of your computers*, is pretty hungry when it comes to cache. A single instance of the program fits well into a small cache. However, when you begin to run multiple instances there is more contention for that cache. This results in L3 cache misses, and the CPU sits idle while we make a long trip to main memory to get the data we need. This behavior is common for programs that have larger memory requirements. It's also not something that we as developers often notice; we typically run on large clusters and use hundreds to thousands of cores in parallel. Nothing seemed slower for us because we are always running in that regime.
I don't know all of the details about how the points are assigned, and I don't know if/how the credit assignment will be modified. But I believe that issue stems from the fact that a single instance of Rosetta is well behaved (very few cache misses) on most consumer chips, but on machines with smaller caches and few memory channels a second (or third or fourth) instance cannot fit into the caches, and you see the run time scaling issues which result in fewer points/hour (i.e. if a single instance of Rosetta had these cache issues, the scaling from one to multiple instances would not be as dramatic, nor would the change in points/hour).**
We are looking to see if we can improve the cache behavior. Rosetta is ~2 million lines of C++, and improving the cache performance might involve changing some pretty fundamental parts. We have some ideas of where to start digging, but I can't make any promises. Long term, identifying these issues may end up improving Rosetta for everyone that uses it, so pat yourselves on the back for that!
Doug
* a newer version of the program used in HPF1 & 2
** this might be the case for machines with very small (less than 4MB) caches, where it's just always slow |
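To make the cache argument above a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It only illustrates the reasoning and is not how Rosetta or BOINC actually schedule work. The function name `tasks_fitting_in_cache` and the 4 MB per-task working set are assumptions made up for this example; the real working set of a MIP task has not been stated in this thread and would need to be measured.

```python
def tasks_fitting_in_cache(l3_cache_mb: float, working_set_mb: float, threads: int) -> int:
    """Rough upper bound on concurrent tasks whose working sets fit in a shared L3.

    Illustrative only: real caches are shared with the OS and other programs,
    so the practical number is likely lower.
    """
    if working_set_mb <= 0:
        raise ValueError("working_set_mb must be positive")
    # How many whole working sets fit side by side in the L3 cache.
    fits = int(l3_cache_mb // working_set_mb)
    # Never suggest more tasks than there are hardware threads.
    return max(0, min(threads, fits))


if __name__ == "__main__":
    # Hypothetical figures, NOT measurements of Rosetta or of any real host.
    assumed_working_set_mb = 4
    for l3_mb in (3, 8, 16):
        n = tasks_fitting_in_cache(l3_mb, assumed_working_set_mb, threads=8)
        print(f"L3 = {l3_mb} MB -> at most {n} task(s) without spilling to main memory")
```

Under these made-up numbers, an 8-thread host with 8 MB of L3 would keep only about two tasks in cache, which would be in line with the reports above that limiting MIP to a few threads per machine helps.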
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Doug,
First off, I just see you flagged as "Cruncher", whereas I suspect you're actually a "Project Scientist" or similar, no? Maybe one of the WCG techs can get you properly identified within the forum system?
Second, a big pat on the back from me for (a) looking into the problem and (b) taking the trouble to post with your findings. The fact that you're even considering changing the code is wonderful news!
Also, knowing more accurately what the problem is may help people to ascertain what a reasonable number of parallel tasks is for their kit in the meantime.
Thank you! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks for the update. Over the next 3 weeks I will be finishing up some current work and then will be moving about 280 threads to MIP. The plan is to stay there for at least 90 days. I'm not contributing for the points, so the credit drop doesn't bother me in the least. Good luck with the optimization activities.
|
||
|
|
Dayle Diamond
Senior Cruncher Joined: Jan 31, 2013 Post Count: 452 Status: Offline Project Badges:
|
It's not really about a credit drop, Doneske, it's an efficiency drop.
The default mix of projects works out fine for MIP, but when you specialize all cores, your total throughput drops. Every four hours your core spends doing MIP might be two hours for another cruncher. |
||
|
|
|