World Community Grid - View Thread - sigificant credit drop - only for me or did someone else see this?

World Community Grid Forums

Category: Completed Research

Forum: Microbiome Immunity Project

Thread: sigificant credit drop - only for me or did someone else see this?

Thread Status: Active
Total posts in this thread: 117

This topic has been viewed 502263 times and has 116 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

Most of my points come from two dual-Xeon 40c/80t machines, each with 16 GB of RAM, which work very well.
Switching to 100% MIP resulted in massive drops in both points and runtime. I can see the points drop that everyone is talking about, but I couldn't see where the drop in runtime came from.
Maybe there was a fight for resources? I don't really know.

Anyway, if I adjust the config files to run a limited number of WUs, how many should I run max without affecting performance? I've seen a lot of talk about 2, but with my rigs, would 4 be ok?

Running 1/4 of the threads on MIP should be OK, so 20 units on your 80t machines. What I do not know is how a dual-processor system will affect this. It will only work if half of the MIP units run on one processor and the rest on the other. I am not sure whether that happens automatically.
What I would try is to start with zero MIP units, monitor the CPU temps, and slowly increase the number of MIP units running. Once the CPU temps decrease noticeably, that is a sign the CPU is underutilized because the L3 cache is not sufficient for the cache-hungry MIP units.
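As a rough illustration of the "1/4 of the threads" suggestion above, here is a minimal Python sketch that computes the limit and writes out a BOINC `app_config.xml` using the `max_concurrent` setting. The application short name `mip1` is an assumption (check the name your client reports, for example in `client_state.xml`), and the helper names `mip_task_limit` and `build_app_config` are made up for this example. The file would normally go in the World Community Grid project folder inside the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def mip_task_limit(hardware_threads: int, fraction: float = 0.25) -> int:
    """Rule of thumb from the post above: run MIP on about 1/4 of the threads."""
    return max(1, int(hardware_threads * fraction))


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Build the text of an app_config.xml that caps concurrent tasks for one app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # ET.indent only exists on Python 3.9+; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed values: an 80-thread machine and the app short name "mip1".
limit = mip_task_limit(80)
config_xml = build_app_config("mip1", limit)
print(config_xml)
```

After saving the file, the client has to re-read its config files (or be restarted) before the limit takes effect. The temperature-based tuning described above would then amount to raising or lowering the `max_concurrent` value step by step.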

[Jun 22, 2018 8:23:48 AM]

Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

Sorry it's taken us a while to respond. Please don't leave, we appreciate the time ALL of you have invested in the project. It has taken us a while to reproduce and then really pin down what causes the problem.

The short version is that Rosetta, the program being used by the MIP to fold the proteins on all of your computers*, is pretty hungry when it comes to cache. A single instance of the program fits well into a small cache. However, when you begin to run multiple instances there is more contention for that cache. This results in L3 cache misses, and the CPU sits idle while we make a long trip to main memory to get the data we need. This behavior is common for programs that have larger memory requirements. It's also not something that we as developers often notice; we typically run on large clusters and use hundreds to thousands of cores in parallel. Nothing seemed slower for us because we are always running in that regime.

I don't know all of the details about how the points are assigned, and I don't know if/how the credit assignment will be modified. But I believe the issue stems from the fact that a single instance of Rosetta is well behaved (very few cache misses) on most consumer chips, but on machines with smaller caches and few memory channels a second (or third or fourth) instance cannot fit into the caches, and you see the run time scaling issues, which result in fewer points/hour (i.e. if a single instance of Rosetta had these cache issues, the scaling from one to multiple instances would not be as dramatic, nor would the change in points/hour).**

We are looking to see if we can improve the cache behavior. Rosetta is ~2 million lines of C++ and improving the cache performance might involve changing some pretty fundamental parts. We have some ideas of where to start digging, but I can't make any promises.

Long term, identifying these issues may end up improving Rosetta for everyone that uses it so pat yourselves on the back for that!

Doug

* a newer version of the program used in HPF1 & 2
** this might be the case for machines with very small (less than 4 MB) caches, where it's just always slow

This post should be pinned as the first thing in this forum until the code is fixed. Lots of electricity is being squandered.

----------------------------------------

...KRI please cancel all shadow-banning

[Mar 19, 2019 9:39:41 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

It will be interesting to see if the Ryzen 9 3900X, with its large L3 cache, will help with this. I'm considering purchasing one, depending on reviews when it comes out. My i5-4590 is currently running 4 tasks at a time with an average completion time of 1 hour 31 minutes.

[Jun 28, 2019 9:00:08 PM]

Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

FX-8370 L3=8 MB so only 1 of 8 CPUs.
Xeon E5-2673v3 L3=30 MB so only 6 of 24 CPUs.
Xeon E5-2699v4 L3=55 MB so only 11 of 44 CPUs.
Ryzen 9 3950X L3=72 MB so only 14 of 32 CPUs.

It's socially irresponsible for them to not fix this problem.

----------------------------------------

...KRI please cancel all shadow-banning

[Jun 28, 2019 11:30:57 PM]

Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1326
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

Ryzen 9 3950X L3=72 MB so only 14 of 32 CPUs.

OT: Lucky! As far as I am aware, these have not been released to the public yet. If you are happy/allowed to provide some details, I would appreciate some details about the CPU in "Choosing a high end CPU". /OT

[Jun 29, 2019 6:42:11 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

Here you go:
https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x

The recommended retail price will be $749, and it will be available in September (not hard to find out with a quick Google search).

[Jun 29, 2019 9:02:47 AM]

PowerFactor
Ace Cruncher
Joined: Dec 9, 2016
Post Count: 4033
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

It looks like the following equation applies:

Max_core_count_MIP = floor( cpu_L3_cache_size_MB / 5 );
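For anyone who wants to check the equation against the L3 figures posted earlier in the thread, here is a small Python sketch. The function name `max_mip_tasks` and the `mb_per_task` parameter are just illustrative names; the 5 MB per task divisor and the cache sizes are taken from the posts above as stated, not independently verified.

```python
import math


def max_mip_tasks(l3_cache_mb: float, mb_per_task: float = 5.0) -> int:
    """Rule of thumb from the post above: floor(L3 cache size in MB / 5)."""
    return math.floor(l3_cache_mb / mb_per_task)


# L3 sizes in MB exactly as quoted earlier in this thread.
cpus = {
    "FX-8370": 8,
    "Xeon E5-2673v3": 30,
    "Xeon E5-2699v4": 55,
    "Ryzen 9 3950X": 72,
}

for name, l3_mb in cpus.items():
    print(f"{name}: L3={l3_mb} MB -> at most {max_mip_tasks(l3_mb)} MIP tasks")
```

With those inputs the rule gives 1, 6, 11 and 14 tasks, which matches the numbers listed earlier.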

[Jun 29, 2019 2:18:16 PM]

Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1326
Status: Offline


Re: sigificant credit drop - only for me or did someone else see this?

Here you go:
https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x

The recommended retail price will be $749, and it will be available in September (not hard to find out with a quick Google search).

Yes, I am aware of that information, thank you. I am assuming it is US pricing, so for me in New Zealand it would currently cost $1137.95 (NZ$1 will buy 66 US cents). It wasn't quite what I was getting at either. Thanks for the information.

[Jun 30, 2019 12:40:17 AM]