This topic has been viewed 2915 times and has 8 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Which synthetic CPU test best represents WCG tasks?

Using the tool stress-ng, which CPU test method(s) would best reflect the type of workload that WCG tasks place on the CPU? Not any one specific subproject but generically, since I run all subprojects shared across the same CPUs.



A handful of Google hits suggest the "matrix" test is a good general one; its description:

matrixprod: matrix product of two 128 × 128 matrices of double floats. Testing on 64-bit x86 hardware shows that this provides a good mix of memory, cache and floating point operations and is probably the best CPU method to use to make a CPU run hot.


If you look at the above link, there are a million and one test types that could be relevant; I'd appreciate recommendations on choosing the most appropriate type of operations to get the best value from the results.
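In case a concrete starting point helps, here is a minimal sketch of running the matrixprod method with stress-ng (flags per the stress-ng man page; the 60-second timeout and the nproc fallback are just placeholders, adjust to taste):

```shell
#!/bin/sh
# Run the matrixprod CPU method on every logical CPU for 60 seconds and
# print bogo-ops/s, stress-ng's rough throughput figure (--metrics-brief).
NCPUS=$(nproc 2>/dev/null || echo 4)   # fall back to 4 if nproc is absent
if command -v stress-ng >/dev/null 2>&1; then
    stress-ng --cpu "$NCPUS" --cpu-method matrixprod \
              --metrics-brief --timeout 60s
else
    echo "stress-ng not installed (e.g. 'apt install stress-ng')" >&2
fi
```

`--cpu 0` would also work: stress-ng treats zero as "one stressor per configured CPU".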
[May 21, 2019 4:17:58 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Which synthetic CPU test best represents WCG tasks?

The best 'synthetic' test is built right into BOINC and lasts 30 seconds. It measures Whetstone and Dhrystone (float/integer) performance, and these form the basis upon which points are awarded per unit of contributed computing time.

Up front: integer-type calculations perform drastically better on Linux/Mac (Unix-based) OSes, with up to 60% greater throughput.
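For what it's worth, that built-in benchmark can be triggered on demand from the command line; boinccmd ships with the client (the flag is from the boinccmd documentation, and the client_state.xml field names are my reading of the client's state file):

```shell
#!/bin/sh
# Ask a running BOINC client to re-run its Whetstone/Dhrystone benchmarks.
# Results appear in the event log and as <p_fpops>/<p_iops> in client_state.xml.
BENCH_CMD="boinccmd --run_benchmarks"
if command -v boinccmd >/dev/null 2>&1; then
    $BENCH_CMD || echo "BOINC client not reachable" >&2
else
    echo "boinccmd not found; is the BOINC client installed?" >&2
fi
```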
[May 21, 2019 5:55:24 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Which synthetic CPU test best represents WCG tasks?

The best 'synthetic' test is built right into BOINC and lasts 30 seconds. It measures Whetstone and Dhrystone (float/integer) performance, and these form the basis upon which points are awarded per unit of contributed computing time.


I respectfully disagree that they are the "best" - the results are wildly off target compared with actual real-life results when sampled across CPUs of varying ages and types. I've pretty much learned through trial and error ("why are these E5-2660v2 numbers so much lower than the E5645's?!") that they are rather useless for judging a CPU's actual performance and the returned results of the actual work units themselves.

Not to mention they measure a single thread on a single core, which is wildly different from firing up all 20 threads at once on a 10-core Xeon - thermals and cooling come into play, along with a ton of other factors. Apps like stress-ng fire up one process per thread and perform a much larger, more in-depth and broader set of tests that show what real life is like once you start cooking the sockets, providing realistic simulations. It's the same reason you use fio instead of dd for disk I/O tests: real life is random, messy and noisy.
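On the fio point, a random-read job along these lines (parameters are illustrative placeholders taken from the fio documentation) exercises a disk far more realistically than a single sequential dd stream:

```shell
#!/bin/sh
# Random 4 KiB reads against a scratch file - closer to messy real-world I/O
# than dd's sequential stream. Size and runtime here are small placeholders.
TESTFILE=/tmp/fio-scratch
if command -v fio >/dev/null 2>&1; then
    fio --name=randread --filename="$TESTFILE" --rw=randread \
        --bs=4k --size=256M --runtime=30 --time_based \
        --group_reporting || echo "fio run failed" >&2
    rm -f "$TESTFILE"
else
    echo "fio not installed" >&2
fi
```

Adding `--direct=1` bypasses the page cache for an even more honest number, where the filesystem supports O_DIRECT.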
[May 21, 2019 6:52:55 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: Which synthetic CPU test best represents WCG tasks?

@xithryx, you would almost need to look at the individual science applications to see what kinds of CPU functions are typically used, and then make a best-guess estimate as to which synthetic tests in stress-ng come closest. HSTB uses GROMACS. OpenZika uses AutoDock Vina. MIP uses Rosetta. Apologies, but I lack the level of detailed exposure to help any further.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[May 23, 2019 1:51:43 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Which synthetic CPU test best represents WCG tasks?

There's a French website, WUProp, that does detailed analysis of job/CPU data across all projects, provided by those who've attached their client to the WUProp BOINC project. It's a low/no-load project (NIC) that just captures job stats, CPU, OS, etc. It tells you which CPU gives the best points per job type.

Edit: Here http://wuprop.boinc-af.org/results/delai.py
----------------------------------------
[Edit 1 times, last edit by Former Member at May 23, 2019 1:45:01 PM]
[May 23, 2019 1:43:00 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Which synthetic CPU test best represents WCG tasks?

Apologies for the delay - I don't seem to get email notifications of thread replies. @hchc, that's a fantastic lead. For GROMACS I found their source download and their benchmark suite (I'm unfamiliar with the details right now; the benchmark tarball has files ending in .gro/.top/.mdp, which feel like payloads for the main app). They're downloading to a test laptop now @ 32.7 KB/s :)

Logistically this helps solve where I'm going, if it works: if I can use this benchmark suite (get the app to compile, run, etc.) and take a baseline with the latest upstream-default kernel, I can then easily test with and without the kernel Spectre/Meltdown/MDS mitigations and have a good like-for-like set of comparison data.

The spirit is to figure out whether the KPTI-style performance hits actually affect these specific WCG workloads, and whether there's any measurable value in enabling or disabling a specific mitigation to increase WU output. For purpose-built machines with no external access (other than SSH), having the mitigations on or off is not important for real-life needs.
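For reference, a sketch of how the mitigation state can be inspected on a running kernel (the sysfs files exist on Linux 4.15+; `mitigations=off` is a boot-time kernel parameter, available from roughly kernel 5.2, not a runtime switch):

```shell
#!/bin/sh
# Show which Spectre/Meltdown/MDS-class mitigations the running kernel applies.
VULN_DIR=/sys/devices/system/cpu/vulnerabilities
if [ -d "$VULN_DIR" ]; then
    grep -r . "$VULN_DIR"    # prints one "file:status" line per vulnerability
else
    echo "no $VULN_DIR on this kernel" >&2
fi
# To benchmark with mitigations disabled, add "mitigations=off" to the kernel
# command line (e.g. GRUB_CMDLINE_LINUX) and reboot - isolated machines only.
```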

Edit: I found the tiny "Watch" icon at the top of the thread; I should get replies by email now. Woulda thought that was the default for self-created posts...
----------------------------------------
[Edit 1 times, last edit by xithryx at May 24, 2019 2:08:03 PM]
[May 24, 2019 2:05:24 PM]
Jean-David Beyer
Senior Cruncher
USA
Joined: Oct 2, 2007
Post Count: 339
Status: Offline
Re: Which synthetic CPU test best represents WCG tasks?

The best 'synthetic' test is built right into BOINC and lasts 30 seconds. It measures Whetstone and Dhrystone (float/integer) performance, and these form the basis upon which points are awarded per unit of contributed computing time.


I do not know anything about the Dhrystone benchmark, but I do know a lot about the Whetstone one. This one is widely believed to be a good test of floating point calculation, but actually it is just awful.

Whetstone comprises a bunch of loops, each designed to test some aspect of the machine and its compilation system. These were selected based on frequency of use, as instrumented by the Whetstone Algol 60 compiler system. Typically each item under test was put into a loop executed (IIRC) 10,000 times. One of these loops does a bunch of floating point operations, hence people thought it measured floating point performance. Actually, that loop was meant to measure call-and-return overhead; the floating point operations themselves were not of interest.

BUT this is just an illusion. Modern compilers do extensive optimization. I was involved in the design and implementation of the C compiler optimizer when I was working at Bell Labs, and we absolutely clobbered that loop. The significant optimizations were these:

First, I moved all the floating point operations outside the loop, since the calculations were loop-invariant and did not need to be performed more than once. Then a live-dead analysis removed the loop code and the single remaining set of floating point instructions, since the values of the calculations were never used. Any compiler written after about 1980 would do the same. So the benchmark basically does not test floating point operations at all.
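That effect is easy to reproduce today. The sketch below (the file name and loop body are my own invention, not Whetstone source) compiles a Whetstone-style dead loop at -O0 and -O2; at -O2 a modern GCC or Clang typically deletes the loop entirely, which shows up as far fewer assembly lines:

```shell
#!/bin/sh
# A loop of floating-point work whose result is never used, mimicking the
# Whetstone call-overhead loop. Compare assembly size with and without -O2.
cat > /tmp/whet_demo.c <<'EOF'
double spin(void) {
    double x = 1.0, y = 2.0, z = 0.0;
    int i;
    for (i = 0; i < 10000; i++)
        z = (x + y) * (x - y) / (x * y);  /* loop-invariant work...        */
    (void)z;                              /* ...and z is dead: never used  */
    return 0.0;
}
EOF
if command -v cc >/dev/null 2>&1; then
    cc -O0 -S -o /tmp/whet_O0.s /tmp/whet_demo.c
    cc -O2 -S -o /tmp/whet_O2.s /tmp/whet_demo.c
    echo "-O0: $(wc -l < /tmp/whet_O0.s) asm lines, -O2: $(wc -l < /tmp/whet_O2.s) asm lines"
else
    echo "no C compiler (cc) found" >&2
fi
```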
----------------------------------------

[Jun 13, 2019 1:18:04 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Which synthetic CPU test best represents WCG tasks?

The test is supposed to approach a real-life use environment, not some artificial clean one. You'd see quite a bit of difference between a system that is truly idle and one actually used beyond web browsing. Integer libraries on Linux are multiple times more optimized than in the Windows environment, hence why running BOINC on *nix shows much greater throughput - for some sciences, double.
[Jun 13, 2019 1:35:05 PM]
Jean-David Beyer
Senior Cruncher
USA
Joined: Oct 2, 2007
Post Count: 339
Status: Offline
Project Badges:
Re: Which synthetic CPU test best represents WCG tasks?

Integer libraries on Linux are multiple times more optimized than in the Windows environment,


For sure, and the floating point libraries as well. The man who wrote the most recent versions of these libraries (by recent, I mean around 1985) had the office next to mine. He hand-coded them in assembler for the machines we were using (mainly PDP-11/45s and 3B20 machines). I think we did them for the SPARC as well, but I am not sure about that.
----------------------------------------

[Jun 13, 2019 2:09:35 PM]