World Community Grid - View Thread - sigificant credit drop - only for me or did someone else see this?

World Community Grid Forums

Category: Completed Research

Forum: Microbiome Immunity Project

Thread: sigificant credit drop - only for me or did someone else see this?

Thread Status: Active
Total posts in this thread: 117

This topic has been viewed 502281 times and has 116 replies

andgra
Senior Cruncher
Sweden
Joined: Mar 15, 2014
Post Count: 195
Status: Offline
Project Badges:

2 year badge for The Clean Energy Project - Phase 2

200 year badge for Mapping Cancer Markers

10 year badge for Uncovering Genome Mysteries

20 year badge for Outsmart Ebola Together

20 year badge for FightAIDS@Home - Phase 2

100 year badge for Smash Childhood Cancer

20 year badge for Microbiome Immunity Project

20 year badge for Africa Rainfall Project

200 year badge for OpenPandemics - COVID-19


Re: sigificant credit drop - only for me or did someone else see this?

I must say I'm disappointed with the Linux implementation of MIP.
I'm pulling my Linux cores out of MIP to work on other projects. The Windows cores are OK here and can stay.
Hopefully we will get some response from the techs or scientists on the progress of this issue.

----------------------------------------

/andgra

[Jan 8, 2018 6:51:30 AM]

KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Project Badges:

5 year badge for Human Proteome Folding - Phase 2

180 day badge for Help Cure Muscular Dystrophy

2 year badge for Discovering Dengue Drugs - Together

5 year badge for Nutritious Rice for the World

90 day badge for The Clean Energy Project

10 year badge for Help Fight Childhood Cancer

2 year badge for Influenza Antiviral Drug Search

20 year badge for Help Cure Muscular Dystrophy - Phase 2

2 year badge for Discovering Dengue Drugs - Together - Phase 2

5 year badge for The Clean Energy Project - Phase 2

5 year badge for Computing for Clean Water

5 year badge for Drug Search for Leishmaniasis

2 year badge for GO Fight Against Malaria

2 year badge for Computing for Sustainable Water

100 year badge for Mapping Cancer Markers

50 year badge for Outsmart Ebola Together

5 year badge for FightAIDS@Home - Phase 2

20 year badge for Smash Childhood Cancer

10 year badge for Microbiome Immunity Project

50 year badge for OpenPandemics - COVID-19


Re: sigificant credit drop - only for me or did someone else see this?

Even on Windows, MIP1 does not run optimally.
I've noticed strange behaviour on a Win7 Pro SP1 x64 host that looks very much like a memory leak in the MIP1 implementation. In other words, even after the host stops computing MIP1 WUs, a large amount of RAM is not properly released, even several days after the last MIP1 WU has been computed. After a host reboot, everything works well again. I managed to reproduce this observation twice last November.
Since nobody on the tech or scientist side seems to take notice of members' remarks and observations about MIP1, I did not feel the need to report this issue until now.
Cheers,
Yves

----------------------------------------

Décrypthon team progress - KerSamson's contribution

[Jan 8, 2018 2:57:51 PM]

RTS48
Veteran Cruncher
Bolivia
Joined: Aug 2, 2009
Post Count: 1353
Status: Offline
Project Badges:

2 year badge for Human Proteome Folding - Phase 2

90 day badge for Nutritious Rice for the World

2 year badge for Help Fight Childhood Cancer

2 year badge for Help Cure Muscular Dystrophy - Phase 2

14 day badge for Discovering Dengue Drugs - Together - Phase 2

2 year badge for Computing for Clean Water

5 year badge for GO Fight Against Malaria

1 year badge for Computing for Sustainable Water

20 year badge for Mapping Cancer Markers

5 year badge for Uncovering Genome Mysteries

10 year badge for Africa Rainfall Project

10 year badge for OpenPandemics - COVID-19


Re: sigificant credit drop - only for me or did someone else see this?

Well - the saga continues. I am having to micromanage my WUs because 100% MIP is a disaster (50% is pretty bad too). I have 20 threads in 3 Macs. My oldest Mac, which has only 4 threads, is running SCC exclusively as it seizes up with MIP. The other two quad-core (8-thread) Macs are set to run a maximum of 4 threads each of MIP when I make WUs available through device manager. I am still getting less than 20 points per hour of CPU time for MIP, compared with nearer 25 points/h for SCC. I think that when I get my MIP Sapphire I will abandon the project entirely. Big shame!

----------------------------------------

Rod Peel
Santa Cruz
Bolivia
South America

[Feb 28, 2018 1:24:50 PM]

JimWork
Cruncher
Canada
Joined: Oct 11, 2005
Post Count: 35
Status: Offline
Project Badges:

1 year badge for Help Fight Childhood Cancer

1 year badge for Help Cure Muscular Dystrophy - Phase 2

1 year badge for Computing for Clean Water

2 year badge for Drug Search for Leishmaniasis

1 year badge for GO Fight Against Malaria

90 day badge for Computing for Sustainable Water

10 year badge for Mapping Cancer Markers

5 year badge for Outsmart Ebola Together

10 year badge for FightAIDS@Home - Phase 2

5 year badge for Africa Rainfall Project


Re: sigificant credit drop - only for me or did someone else see this?

I agree - this is one stingy stinker of a project. I'll stick with it until I get to 5yrs. I'm at 4:266:00 now and hope to cross the finish line in about ten days.

[Feb 28, 2018 11:38:35 PM]

KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Project Badges:


Re: sigificant credit drop - only for me or did someone else see this?

It is comforting to notice that I am not alone with my observations.
However, why has nobody on the scientist side taken these issues into account ... and provided some feedback?
Cheers,
Yves

----------------------------------------

Décrypthon team progress - KerSamson's contribution

[Mar 1, 2018 7:56:32 AM]

andgra
Senior Cruncher
Sweden
Joined: Mar 15, 2014
Post Count: 195
Status: Offline
Project Badges:


Re: sigificant credit drop - only for me or did someone else see this?

I agree with you, KerSamson!

----------------------------------------

/andgra

[Mar 2, 2018 10:54:12 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

Sorry it's taken us a while to respond. Please don't leave, we appreciate the time ALL of you have invested in the project. It has taken us a while to reproduce and then really pin down what causes the problem.

The short version is that Rosetta, the program being used by the MIP to fold the proteins on all of your computers*, is pretty hungry when it comes to cache. A single instance of the program fits well into a small cache. However, when you begin to run multiple instances, there is more contention for that cache. This results in L3 cache misses, and the CPU sits idle while we make a long trip to main memory to get the data we need. This behavior is common for programs that have larger memory requirements. It's also not something that we as developers often notice; we typically run on large clusters and use hundreds to thousands of cores in parallel. Nothing seemed slower for us because we are always running in that regime.

I don't know all of the details about how the points are assigned, and I don't know if/how the credit assignment will be modified. But I believe the issue stems from the fact that a single instance of Rosetta is well behaved (very few cache misses) on most consumer chips, but on machines with smaller caches and few memory channels a second (or third or fourth) instance cannot fit into the caches, and you see the run-time scaling issues that result in fewer points/hour (i.e. if a single instance of Rosetta had these cache issues, the scaling from one to multiple instances would not be as dramatic, nor would the change in points/hour).**
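
The scaling argument above can be made concrete with a toy model. This is only a sketch under loud assumptions: the function name, the 4 MB working set, the 8 MB and 2 MB L3 sizes, and the miss penalty are all hypothetical illustration values, not measurements of Rosetta or of any real CPU. It simply treats the share of the combined working set that overflows the shared L3 as being served from main memory.

```python
def per_instance_slowdown(instances, working_set_mb, l3_mb, miss_penalty=3.0):
    """Run time of one instance relative to a run that fits entirely in cache.

    Toy model (hypothetical, not Rosetta's real behaviour): the fraction of
    the combined working set that does not fit in the shared L3 cache is
    fetched from main memory, and that fraction adds `miss_penalty` units
    of extra time on top of the baseline of 1.0.
    """
    if instances < 1:
        raise ValueError("instances must be >= 1")
    combined_mb = instances * working_set_mb
    overflow_fraction = max(0.0, 1.0 - l3_mb / combined_mb)
    return 1.0 + overflow_fraction * miss_penalty


if __name__ == "__main__":
    # Assumed values: 4 MB working set per instance, two L3 sizes to compare.
    for l3_mb in (8, 2):
        for n in (1, 2, 4, 8):
            slowdown = per_instance_slowdown(n, working_set_mb=4, l3_mb=l3_mb)
            print(f"L3={l3_mb} MB, {n} instance(s): "
                  f"{slowdown:.2f}x run time, {n / slowdown:.2f}x total throughput")
```

With these assumed numbers, the 8 MB cache shows no slowdown for one or two instances and a sharp jump at four, while the 2 MB cache is already slow with a single instance and degrades less dramatically from there, which matches the shape of the argument in the paragraph and its second footnote.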

We are looking to see if we can improve the cache behavior. Rosetta is ~2 million lines of C++, and improving the cache performance might involve changing some pretty fundamental parts. We have some ideas of where to start digging, but I can't make any promises.

Long term, identifying these issues may end up improving Rosetta for everyone that uses it so pat yourselves on the back for that!

Doug

* a newer version of the program used in HPF1 & 2
** this might be the case for machines with very small (less than 4MB) caches; it's just always slow

[Mar 7, 2018 10:08:00 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

Doug,

First off, I just see you flagged as "Cruncher", whereas I suspect you're actually a "Project Scientist" or similar, no? Maybe one of the WCG techs can get you properly identified within the forum system?

Second, a big pat on the back from me for (a) looking into the problem and (b) taking the trouble to post your findings. The fact that you're even considering changing the code is wonderful news! Also, knowing more accurately what the problem is may help people to work out a reasonable number of parallel tasks for their kit in the meantime.

Thank you!

[Mar 7, 2018 10:37:03 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

Thanks for the update. Over the next 3 weeks I will be finishing up some current work and then will be moving about 280 threads to MIP. The plan is to stay there for at least 90 days. I'm not contributing for the points, so the credit drop doesn't bother me in the least. Good luck with the optimization activities.

[Mar 7, 2018 11:01:36 PM]

Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 452
Status: Offline
Project Badges:

1 year badge for The Clean Energy Project - Phase 2

14 day badge for Drug Search for Leishmaniasis

10 year badge for Smash Childhood Cancer

2 year badge for Africa Rainfall Project

20 year badge for OpenPandemics - COVID-19


Re: sigificant credit drop - only for me or did someone else see this?

It's not really about a credit drop, Doneske, it's an efficiency drop.
The default mix of projects works out fine for MIP, but when you dedicate all cores to it, your total throughput drops.

Every four hours your core spends doing MIP might be two hours for another cruncher.

[Mar 8, 2018 2:23:44 AM]
