| World Community Grid Forums
Thread Status: Active Total posts in this thread: 8
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2391 Status: Offline Project Badges:
|
Some CPUs are faster at Integer operations and others at Floating Point operations.
Which BOINC projects belong in which category, and how do you know?
----------------------------------------
...KRI please cancel all shadow-banning
||
|
|
Martin Schnellinger
Advanced Cruncher Joined: Apr 29, 2007 Post Count: 128 Status: Offline Project Badges:
|
An intelligent and interesting question indeed.
I thought about a way to find out by testing, but it seems quite difficult. Two crunchers with identical machines (same CPU, same operating system) would have to run an identical work unit... no, this test would not help.

I checked the work unit properties shown in the BOINC Manager. Only the estimated number of GFLOPs is given, with no word about integer operations. At least that is the case for the OpenPandemics project. I assume it is the same with other projects, but I cannot check right now because I am running only OpenPandemics at the moment.

Wikipedia has information about the power of different CPUs, but as far as I can see it is given only in GFLOPs. The table here https://en.wikipedia.org/wiki/FLOPS#FLOPS_per_cycle_for_various_processors says the following:

FLOPS per cycle for various processors

| Microarchitecture | ISA | FP64 | FP32 | FP16 |
|---|---|---|---|---|
| Intel Atom (Bonnell, Saltwell, Silvermont and Goldmont) | SSE3 (64-bit) | 2 | 4 | 0 |
| Intel Core (Merom, Penryn); Intel Nehalem (Nehalem, Westmere) | SSE4 (128-bit) | 4 | 8 | 0 |
| Intel Sandy Bridge (Sandy Bridge, Ivy Bridge) | AVX (256-bit) | 8 | 16 | 0 |
| Intel Haswell (Haswell, Devil's Canyon, Broadwell); Intel Skylake (Skylake, Kaby Lake, Coffee Lake, Whiskey Lake, Amber Lake) | AVX2 & FMA (256-bit) | 16 | 32 | 0 |
| Intel Xeon Phi (Knights Corner) | SSE & FMA (256-bit) | 16 | 32 | 0 |
| Intel Skylake-X; Intel Xeon Phi (Knights Landing, Knights Mill) | AVX-512 & FMA (512-bit) | 32 | 64 | 0 |
| AMD Bobcat | AMD64 (64-bit) | 2 | 4 | 0 |
| AMD Jaguar; AMD Puma | AVX (128-bit) | 4 | 8 | 0 |
| AMD K10 | SSE4/4a (128-bit) | 4 | 8 | 0 |
| AMD Bulldozer (Piledriver, Steamroller, Excavator) | AVX (128-bit) Bulldozer-Steamroller; AVX2 (128-bit) Excavator; FMA3 (Bulldozer); FMA3/4 (Piledriver-Excavator) | 4 | 8 | 0 |
| AMD Zen (Ryzen 1000 series, Threadripper 1000 series, Epyc Naples); AMD Zen+ (Ryzen 2000 series, Threadripper 2000 series) | AVX2 & FMA (128-bit, 256-bit decoding) | 8 | 16 | 0 |
| AMD Zen 2 (Ryzen 3000 series, Threadripper 3000 series, Epyc Rome); AMD Zen 3 (Ryzen 5000 series) | AVX2 & FMA (256-bit) | 16 | 32 | 0 |
| ARM Cortex-A7, A9, A15 | ARMv7 | 1 | 8 | 0 |
| ARM Cortex-A32, A35, A53, A55, A72, A73, A75 | ARMv8 | 2 | 8 | 0 |
| ARM Cortex-A57 | ARMv8 | 4 | 8 | 0 |
| ARM Cortex-A76, A77 | ARMv8 | 8 | 16 | 0 |
| Qualcomm Krait | ARMv8 | 1 | 8 | 0 |
| Qualcomm Kryo (1xx - 3xx) | ARMv8 | 2 | 8 | 0 |
| Qualcomm Kryo (4xx - 5xx) | ARMv8 | 8 | 16 | 0 |
| Samsung Exynos M1 and M2 | ARMv8 | 2 | 8 | 0 |
| Samsung Exynos M3 and M4 | ARMv8 | 3 | 12 | 0 |
| IBM PowerPC A2 (Blue Gene/Q) | ? | 8 | 8 (as FP64) | 0 |
| Hitachi SH-4 | SH-4 | 1 | 7 | 0 |
| Nvidia Fermi (only GeForce GTX 465–480, 560 Ti, 570-590) | PTX | 1/4 (locked by driver, 1 in hardware) | 2 | 0 |
| Nvidia Fermi (only Quadro 600-2000) | PTX | 1/8 | 2 | 0 |
| Nvidia Fermi (only Quadro 4000–7000, Tesla) | PTX | 1 | 2 | 0 |
| Nvidia Kepler (GeForce (except Titan and Titan Black), Quadro (except K6000), Tesla K10) | PTX | 1/12 (for GK110: locked by driver, 2/3 in hardware) | 2 | 0 |
| Nvidia Kepler (GeForce GTX Titan and Titan Black, Quadro K6000, Tesla (except K10)) | PTX | 2/3 | 2 | 0 |
| Nvidia Maxwell; Nvidia Pascal (all except Quadro GP100 and Tesla P100) | PTX | 1/16 | 2 | 1/32 |
| Nvidia Pascal (only Quadro GP100 and Tesla P100) | PTX | 1 | 2 | 4 |
| Nvidia Volta | PTX | 1 | 2 (FP32) + 2 (INT32) | 16 |
| Nvidia Turing (only GeForce 16XX) | PTX | 1/16 | 2 (FP32) + 2 (INT32) | 4 |
| Nvidia Turing (all except GeForce 16XX) | PTX | 1/16 | 2 (FP32) + 2 (INT32) | 16 |
| Nvidia Ampere (only A100) | PTX | 2 | 2 (FP32) + 2 (INT32) | 32 |
| Nvidia Ampere (only GeForce) | PTX | 1/32 | 2 (FP32) + 0 (INT32) or 1 (FP32) + 1 (INT32) | 16 |
| AMD GCN (only Radeon Pro WX 2100-7100) | GCN | 1/8 | 2 | 2 |
| AMD GCN (all except Radeon VII, Instinct MI50 and MI60, Radeon Pro WX 2100-7100) | GCN | 1/8 | 2 | 4 |
| AMD GCN Vega 20 (only Radeon VII) | GCN | 1/2 (locked by driver, 1 in hardware) | 2 | 4 |
| AMD GCN Vega 20 (only Radeon Instinct MI50 / MI60 and Radeon Pro VII) | GCN | 1 | 2 | 4 |
| AMD RDNA | RDNA | 1/8 | 2 | 4 |
| Graphcore Colossus GC2 (values estimated) | ? | 0 | 18 | 72 |
| Graphcore Colossus GC200 Mk2 (values estimated) | ? | 0 | 18 | 144 |

No word about the power of CPUs to do integer operations... strange.
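
One way to put the per-cycle figures quoted above to use is to estimate a theoretical FP64 peak as cores × clock × FLOPs per cycle. The sketch below is only an illustration: the 16 FP64 FLOPs per cycle is the Zen 2 value from the quoted table, while the core count and clock speed are assumed example values, not numbers from this thread.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


# Assumed example machine: 16 cores at 3.5 GHz (illustrative values only).
# 16 FP64 FLOPs per cycle is the Zen 2 figure from the quoted table.
zen2_fp64 = peak_gflops(cores=16, clock_ghz=3.5, flops_per_cycle=16)
print(f"Theoretical FP64 peak: {zen2_fp64:.0f} GFLOPS")
```

This is an upper bound that assumes every cycle issues full-width FMA instructions; real work units will land well below it, and it says nothing about integer throughput.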
I found out that CPUs have a so-called "integer range". Wikipedia says:

"Integer range. Every CPU represents numerical values in a specific way. For example, some early digital computers represented numbers as familiar decimal (base 10) numeral system values, and others have employed more unusual representations such as ternary (base three). Nearly all modern CPUs represent numbers in binary form, with each digit being represented by some two-valued physical quantity such as a "high" or "low" voltage. Most modern CPUs employ word sizes that are a power of two, for example 8, 16, 32 or 64 bits. Related to numeric representation is the size and precision of integer numbers that a CPU can represent. In the case of a binary CPU, this is measured by the number of bits (significant digits of a binary encoded integer) that the CPU can process in one operation, which is commonly called word size, bit width, data path width, integer precision, or integer size. A CPU's integer size determines the range of integer values it can directly operate on. For example, an 8-bit CPU can directly manipulate integers represented by eight bits, which have a range of 256 (2^8) discrete integer values."

But I could not find a source telling which CPUs on the market have a big or a small "integer range". Probably a CPU with a 64-bit architecture has a better integer range than a CPU with an 8-bit architecture and should be better at integer operations. But I am not sure; I am not a computer specialist. I just wanted to contribute the best I could.

All the best to everyone, stay healthy.
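
To make the quoted definition concrete: an n-bit word can hold 2^n distinct values. The following is a minimal sketch (the function name is made up for illustration, and the signed range assumes two's complement representation).

```python
def integer_range(bits: int) -> dict:
    """Number of distinct values and min/max for an n-bit integer."""
    count = 2 ** bits
    return {
        "values": count,
        "unsigned": (0, count - 1),
        # Assumes two's complement, which is what mainstream CPUs use.
        "signed": (-(count // 2), count // 2 - 1),
    }


for bits in (8, 16, 32, 64):
    r = integer_range(bits)
    print(f"{bits:>2}-bit: {r['values']} values, unsigned {r['unsigned']}, signed {r['signed']}")
```

Note that this describes how large a number fits into one operation, not how many integer operations a CPU completes per second, which is what an integer benchmark such as Dhrystone tries to measure.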
||
|
|
flynryan
Senior Cruncher United States Joined: Aug 15, 2006 Post Count: 235 Status: Offline Project Badges:
|
You can run a CPU benchmark within BOINC to find out your processor's performance.
Here is mine on a Ryzen 3950X system:

10/14/2020 10:03:15 AM | | Suspending computation - CPU benchmarks in progress
10/14/2020 10:03:46 AM | | Benchmark results:
10/14/2020 10:03:46 AM | | Number of CPUs: 32
10/14/2020 10:03:46 AM | | 4752 floating point MIPS (Whetstone) per CPU
10/14/2020 10:03:46 AM | | 17368 integer MIPS (Dhrystone) per CPU

As for which projects use which type, my guess is that each project uses some of each; it's probably not entirely one or the other. Which ones lean more on which, I couldn't say for sure.
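
For anyone collecting these numbers from several machines, here is a rough sketch that pulls the two benchmark values out of event-log text and computes the integer-to-FP ratio. The regular expression is an assumption based only on the log lines shown in this post; other BOINC versions may word the lines differently, so treat it as a starting point.

```python
import re

# Sample lines copied from the log above.
LOG = """\
10/14/2020 10:03:46 AM | | Number of CPUs: 32
10/14/2020 10:03:46 AM | | 4752 floating point MIPS (Whetstone) per CPU
10/14/2020 10:03:46 AM | | 17368 integer MIPS (Dhrystone) per CPU
"""

# Assumed line format: "<number> floating point|integer MIPS (<name>) per CPU"
PATTERN = re.compile(
    r"(\d+) (floating point|integer) MIPS \((Whetstone|Dhrystone)\) per CPU"
)


def parse_benchmarks(log_text: str) -> dict:
    """Return {'floating point': MIPS, 'integer': MIPS} found in the log text."""
    return {kind: int(value) for value, kind, _name in PATTERN.findall(log_text)}


results = parse_benchmarks(LOG)
ratio = results["integer"] / results["floating point"]
print(results, f"integer/FP ratio: {ratio:.2f}")
```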
||
|
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2391 Status: Offline Project Badges:
|
"17368 integer MIPS (Dhrystone) per CPU"

I'd try this one again. I don't have any Ryzens, but for that FP of 4752 I'd guess that integer should be around 100,000.

I've been manually suspending work and running CPU benchmarks twice in a row. I have a couple of cases where I have several of the same CPU and one of them will have FP that is 5 to 10x too low. Might be the motherboard? Maybe I need a BIOS update? I haven't figured it out yet.

I'm making a table of my benchmarks. I wish I knew how to make those nice tables like adri, but I'll probably post screenshots later.
||
|
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2391 Status: Offline Project Badges:
|
"An intelligent and interesting question indeed."

That's the nicest thing anyone's said to me in ages :-)

I just added some X299 CPUs and I thought for sure I knew which one would run ARP the fastest, but I was exactly upside down according to my eyeball estimates. ARP runs faster for higher integer benchmarks. Then I saw something that made me think maybe they don't even compile for high-end CPUs: https://www.nas.nasa.gov/hecc/support/kb/casc...0operations%20per%20cycle.

"In addition to the instruction sets SSE, SSE2, SSE3, Supplemental SSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, and AVX512[F,CD,BW,DQ,VL], which are available in its Skylake predecessor, Cascade Lake also includes the new AVX-512 Vector Neural Network Instructions (VNNI), which provide significant, more efficient deep-learning inference acceleration. With 512-bit floating-point vector registers and two floating-point functional units, each capable of Fused Multiply-Add (FMA), a Cascade Lake core can deliver 32 double-precision floating-point operations per cycle."

32 DP FLOPs per cycle is double what we're used to. Does anyone know if ARP or any other project is compiled for VNNI?

[Edit 3 times, last edit by Aurum420 at Oct 14, 2020 7:10:04 PM]
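
The figure of 32 in the NASA quote follows from simple arithmetic: a 512-bit register holds eight 64-bit doubles, a fused multiply-add counts as two operations, and there are two FMA units per core. A small sketch of that calculation (function and parameter names are illustrative):

```python
def flops_per_cycle(vector_bits: int, element_bits: int, fma_units: int) -> int:
    """Peak FLOPs per cycle per core for a CPU with FMA-capable vector units."""
    lanes = vector_bits // element_bits  # elements processed per instruction
    return lanes * fma_units * 2         # each FMA = one multiply + one add


print(flops_per_cycle(512, 64, 2))  # AVX-512, two FMA units, double precision
print(flops_per_cycle(256, 64, 2))  # AVX2, two FMA units, double precision
```

The 256-bit case gives 16, which matches the FP64 value listed for Haswell and Skylake in the Wikipedia table quoted earlier in the thread. Note that the 32 FLOPs come from the AVX-512 FMA units; VNNI itself targets low-precision integer dot products, so it would not raise double-precision throughput.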
||
|
|
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3315 Status: Offline Project Badges:
|
On Linux, the integer result from the BOINC benchmark is a lot higher than on Windows, while FP seems somewhat higher on Linux as well.
For example, my Ryzen 1400:

Windows 10: 3986.69 million ops/sec (FP), 14783.9 million ops/sec (integer)
Linux Mint 20 (20.04 based): 5458.53 million ops/sec (FP), 60194.52 million ops/sec (integer)

Of note, on Ubuntu 18.04 or distros based on it, the integer number is a lot higher than it is on Ubuntu 20.04. My Ryzen would probably get 90,000-ish on 18.04.
I did not notice any negative performance difference going from 18.04 to 20.04 despite the lower integer number on 20.04. I did think 20.04 was doing SCC tasks a bit faster than 18.04 back in April, but I didn't really make a consistent comparison.
----------------------------------------
- AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
- AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
- AMD Ryzen 7 7730U 8C/16T 3.0 GHz
||
|
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2391 Status: Offline Project Badges:
|
Falconet, Thanks, that explains something I saw.
i5-4690k, Win7, 4448, 15694
i5-4690k, LM 19, 5379, 140545

I'm slowly upgrading from Linux Mint 19.3 to 20.
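
The Windows and Linux figures reported in the last two posts can be compared directly. The sketch below computes the Linux-to-Windows ratio for each CPU. It assumes the first number in each pair is the floating point result and the second is the integer result, which matches the order in the BOINC log posted earlier but is not stated explicitly by either poster.

```python
# (FP, integer) benchmark pairs as reported in the thread.
# Assumed order: floating point first, integer second.
REPORTS = {
    "Ryzen 1400": {"windows": (3986.69, 14783.9), "linux": (5458.53, 60194.52)},
    "i5-4690K": {"windows": (4448, 15694), "linux": (5379, 140545)},
}


def linux_speedup(report: dict) -> tuple:
    """Return (FP ratio, integer ratio) of Linux over Windows results."""
    (win_fp, win_int), (lin_fp, lin_int) = report["windows"], report["linux"]
    return lin_fp / win_fp, lin_int / win_int


for cpu, report in REPORTS.items():
    fp_ratio, int_ratio = linux_speedup(report)
    print(f"{cpu}: FP x{fp_ratio:.2f}, integer x{int_ratio:.2f}")
```

The two Linux results come from different Mint releases (20 and 19), so the integer ratios are not like for like. Given Falconet's observation that real task speed did not change along with the integer number, these ratios probably say more about the benchmark itself than about crunching throughput.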
||
|
|
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 396 Status: Offline Project Badges:
|
"In addition to the instruction sets SSE, SSE2, SSE3, Supplemental SSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, and AVX512[F,CD,BW,DQ,VL], which are available in its Skylake predecessor, Cascade Lake also includes the new AVX-512 Vector Neural Network Instructions (VNNI), which provide significant, more efficient deep-learning inference acceleration. With 512-bit floating-point vector registers and two floating-point functional units, each capable of Fused Multiply-Add (FMA), a Cascade Lake core can deliver 32 double-precision floating-point operations per cycle." There has never been any information provided by the WCG team to indicate which instructions sets are being utilized by each project. This would be helpful information for those of us replacing old hardware. The last information I saw on these forums was the projects are compiled/optimized for the most common processors. I believe that was SSEx at one time. That may have changed over the past few years with the newer projects.
|
||
|
|
|