World Community Grid Forums
Thread Status: Active. Total posts in this thread: 22
Bearcat
Master Cruncher, USA. Joined: Jan 6, 2007. Post Count: 2803. Status: Offline.
Just want to know: if I run 10 out of 12 threads, can the computer handle it or will it choke?
----------------------------------------
Crunching for humanity since 2007!
Dataman
Ace Cruncher. Joined: Nov 16, 2004. Post Count: 4865. Status: Offline.
I doubt it would "choke" your computer, as the memory requirement is ~1 GB/WU, but it may choke your network with the very large upload files, especially when several ARP WUs try to upload concurrently. This is not a problem for those running only a few machines, but as I recall you have a large farm.
In my case I am running 29 machines, so my network is in a constant state of download/upload. When a long-running upload occurs, other projects' uploads go into a pending state and may also go into a backoff state for minutes (or hours).

It is a bit of a moot point, as you probably will not get 10 anyway with the current distribution.

Cheers
alanb1951
Veteran Cruncher. Joined: Jan 20, 2006. Post Count: 1317. Status: Offline.
TL;DR -- stick to a mixed WCG work-load; 10 of these at once might be very inefficient...
Dataman covers a lot of interesting stuff there, and things like network bandwidth and trying to avoid using swap space are indeed significant anti-choking factors. So if you choose a mixed WCG work-load you'll be fine.

However, there are other ways to choke a machine, including too much disk I/O and intensive memory access thrashing L3 cache. The effect of lots of L3 cache misses is especially obvious if one runs multiple MIP1 tasks at once. Not only do they wander around memory a lot (hence the misses) but they also have a higher proportion of instructions accessing main memory for data than [most of] the other current projects, so the misses relate to a larger number of overall instructions! There are posts in the Microbiome Immunity Project forum on that very topic...

I suspect that if you try to run 10 of these at once (or a mix of these and MIP1 only - see below) you'll find they run a lot slower than if you only run a couple! This is because the number of L3 cache misses gets very high and the CPUs spend more time in wait states, so you get through fewer instructions per second. ARP1 doesn't seem to have such a high proportion of data memory-accessing instructions, so the effect of cache misses isn't as immediately noticeable - however, I suspect that there will be a number of simultaneous ARP1 tasks beyond which the run-time increases would become unacceptable. (And if/when I can accumulate enough of them at once to test things, I'll try to find out, if no-one else does it beforehand!) As Dataman points out, you're unlikely to be able to collect 10 of these at a time at the moment unless you go out of your way to do so, so it probably won't be a problem!

Most other WCG projects don't tend to thrash L3 cache as much, even FAH2 and HST1. I have used Linux performance monitoring tools on an Intel i7-7700K and an AMD Ryzen 3700X to dig into CPU utilization stats, so this is based on more than just reported run times... The Intel box (4-core, 8-thread with 8MB L3 cache) is allowed at most 1 CPDN task, and at most 6 WCG tasks with a limit of 1 MIP1 and 1 ARP1 imposed via an app_config.xml file. (There's also GPU work going on, hence the "at most 6".) That mix seems to run without MIP1 or ARP1 tasks suffering serious performance hits. My Ryzen 3700X (8 cores, 16 threads, 32MB L3 cache divvied up as 2x16MB) is allowed double the above, and my observations on throughput are similar.

By the way, you may see a noticeable performance degradation on any applications that do large amounts of floating-point instructions if you enable hyperthreading - however, unless you also have a lot of cache-thrashing you'll probably manage to run more work with hyperthreading on, as performance is unlikely to drop by 50%!

Happy crunching - Al.

[Edited to re-order and rephrase some content.]
[Edited again in response to post from mdxi - "you'll also see" changed to "you may see" in acknowledgment of the fact my experiences were based on older hardware...]
[Edit 2 times, last edit by alanb1951 at Nov 4, 2019 3:22:08 AM]
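For anyone who hasn't set per-application limits before, a minimal app_config.xml along the lines alanb1951 describes might look like the sketch below. The short app names "mip1" and "arp1", the project directory path, and the overall cap of 6 are assumptions for illustration, not values confirmed in this thread; the exact short names your client uses can be read from client_state.xml or the BOINC event log, and the project-wide cap needs a reasonably recent BOINC client.

```xml
<!-- Sketch only: the <name> values and the overall cap are assumptions, not
     WCG-published values. Save as app_config.xml in the World Community Grid
     project directory (e.g. projects/www.worldcommunitygrid.org/), then use
     Options -> Read config files in the BOINC Manager to apply it. -->
<app_config>
    <!-- cap the whole project at 6 concurrent tasks, like the "at most 6 WCG tasks" above -->
    <project_max_concurrent>6</project_max_concurrent>
    <!-- at most one Microbiome Immunity Project task at a time -->
    <app>
        <name>mip1</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <!-- at most one Africa Rainfall Project task at a time -->
    <app>
        <name>arp1</name>
        <max_concurrent>1</max_concurrent>
    </app>
</app_config>
```

Changes take effect as soon as the client re-reads its config files; no restart is needed.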
mdxi
Advanced Cruncher. Joined: Dec 6, 2017. Post Count: 109. Status: Offline.
> By the way, you'll also see a noticeable performance degradation on any applications that do large amounts of floating-point instructions if you enable hyperthreading - however, unless you also have a lot of cache-thrashing you'll probably manage to run more work with hyperthreading on, as performance is unlikely to drop by 50%!

I'd like to provide some actual data around this statement. It's something that gets repeated a good deal, but usually as an anecdote.

Earlier this year I benchmarked ZIKA, FAH2, and MCM1. I did 24 hour runs with SMT/HT off, and another 24 hours with it on. The shortest possible takeaway is that there was never any degradation of performance on WCG tasks from SMT being enabled.

I tested on the Ryzen 1600 and two configurations of Ryzen 2700 (stock and underclocked/undervolted). The smallest SMT performance uplift was 1.16X (2700, low-power, ZIKA). The largest uplift was 1.43X (2700, low-power, FAH2). The average uplift across all 9 runs was 1.28X.

I did not benchmark MIP1 this way, because it is well understood that the Rosetta suite used by MIP will cause cache thrashing once you exceed approximately (MB_OF_L3CACHE / 4) concurrent WUs.

I did test some non-WCG software. I tested the Stockfish chess engine, which was incredibly parallelizable. Its lowest SMT uplift was 1.4X, and the highest was 1.46X.

Finally, I tested the OpenFOAM computational fluid dynamics package. CFD is about as non-linear and FP-heavy as it gets, so if SMT was going to have a negative effect anywhere, you would expect to see it here. And I actually did -- sometimes. Going from 8 threads to 16 threads on the 2700 resulted in a 4.2% performance degradation. Going from 12 to 24 threads on the 3900X slowed things down by 1% (and I'm not rounding down there; the actual timings on the benchmark were 23.86s for 12 threads vs 24.00s for 24 threads). However, bizarrely, the 1600 bucked the trend with a 3.5% speedup when going from 6 to 12 threads. I don't know how, but I ran it twice and got the same numbers both times.

So yes. If you are doing physics-based simulations with a complexity on the order of describing the flow of compressible fluids around non-compressible objects, then you MAY see some VERY small slowdowns on modern hardware. For almost everything else, expect to simply get free performance by using all your threads.

[Edit 1 times, last edit by mdxi at Nov 3, 2019 6:35:14 AM]
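As a rough illustration (a sketch, not part of mdxi's benchmarks): the two rules of thumb used in this post - the SMT uplift ratio and the "L3 megabytes divided by 4" MIP1 ceiling - reduce to a couple of lines of arithmetic. The 32 MB cache size in the example is an assumed figure for illustration only.

```python
# Sketch of the two rules of thumb from the post above, not an official WCG tool.

def smt_uplift(wus_with_smt: int, wus_without_smt: int) -> float:
    """Throughput ratio: WUs finished in 24 hours with SMT vs. the same period without."""
    return wus_with_smt / wus_without_smt

def mip1_soft_limit(l3_cache_mb: int) -> int:
    """Rough number of concurrent MIP1 tasks before L3 cache thrashing sets in."""
    return l3_cache_mb // 4

print(smt_uplift(15, 10))    # 1.5 -> a "1.5X uplift" in the terminology used later in the thread
print(mip1_soft_limit(32))   # 8   -> roughly 8 MIP1 tasks on an assumed 32 MB L3 part
```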
hchc
Veteran Cruncher, USA. Joined: Aug 15, 2006. Post Count: 865. Status: Offline.
@mdxi, fascinating results, and thanks for doing all those tests. I'd be interested in also measuring power consumption and CPU temperature with HT/SMT off and then on.
fuzzydice555
Advanced Cruncher. Joined: Mar 25, 2015. Post Count: 89. Status: Offline.
I tested HT on/off power consumption on an older machine (Xeon X5650).
The result was that the power consumption increase was exactly the same as the points increase: something like +40% points = +40% power. Hopefully hyperthreading has gotten better over time; I haven't tested any newer chips.
alanb1951
Veteran Cruncher. Joined: Jan 20, 2006. Post Count: 1317. Status: Offline.
@mdxi
Thanks for lots of interesting data. You've been testing on different (and newer) machines than the ones on which I based my hyperthread observations, so I'm quite prepared to believe things are better now!

> I'd like to provide some actual data around this statement. It's something that gets repeated a good deal, but usually as an anecdote.

My comment was based on experimentation, by the way, but several years ago (pre-MIP1, and before I was familiar with any ways to do per-process performance monitoring!); the comment itself could've been made less "doom and gloom", and I've taken the liberty of changing it (and acknowledging that I've changed it...)

One question on your information - you refer to uplift, and I'd like to be sure I understand your numbers: are those numbers the amount of extra work you got by running twice as many threads, or a measure of how much faster individual jobs ran when not hyperthreading, or am I completely misunderstanding? (That wouldn't be a surprise...)

Once again, thank you!

Cheers - Al.
mdxi
Advanced Cruncher. Joined: Dec 6, 2017. Post Count: 109. Status: Offline.
> One question on your information - you refer to uplift, and I'd like to be sure I understand your numbers; are those numbers the amount of extra work you got by running twice as many threads or a measure of how much faster individual jobs ran when not hyperthreading

It's the ratio of WUs completed in 24 hours with SMT vs the WUs completed in 24 hours without SMT. So 15 WUs with SMT versus 10 WUs without would be a 1.5X uplift.

I have all this in a document, but I need to re-measure the 3900X power consumption numbers in it. The power numbers are too low because they predate people figuring out the 3900's voltage droop/CPU frequency micro-stutter behavior. All the benchmarking and performance data should be correct though. If you're interested, the doc is here. Just skip the undervolting/underclocking sections. As I said before, they are overly optimistic and not in line with real-world usage.

[Edit 1 times, last edit by mdxi at Nov 4, 2019 6:19:45 AM]
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0. Status: Offline.
Thanks for all the data, mdxi!

I would be really interested in the remeasured power consumption data of the 3900X. How about a comments section on your website, to be able to discuss things there?
mdxi
Advanced Cruncher. Joined: Dec 6, 2017. Post Count: 109. Status: Offline.
> Thanks for all the data, mdxi! I would be really interested in the remeasured power consumption data of the 3900X. How about a comments section on your website, to be able to discuss things there?

I haven't plugged one back up to the killawatt, but I can tell you that where I finally got everything stable was 3.4GHz with a Vcore of 1.01875. And what I mean by "stable" here is "the clocks hold at 3.39GHz under full load". At that clock and voltage, with the stock cooler, temperatures are between 58C and 62C. Next time I clean dust out of the HSF, I'll plug in the killawatt and get a power usage number.

To your other point: my website is built with a static generator, so comments aren't a thing. Sorry!