Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Integer vs Floating Point Projects

Some CPUs are faster at Integer operations and others at Floating Point operations.
Which BOINC projects belong in which category and how do you know???
----------------------------------------

...KRI please cancel all shadow-banning
[Oct 14, 2020 9:35:06 AM]
Martin Schnellinger
Advanced Cruncher
Joined: Apr 29, 2007
Post Count: 128
Re: Integer vs Floating Point Projects

An intelligent and interesting question indeed.
I thought about a way to find out by testing, but it seems quite difficult.
Two crunchers with identical machines (same CPU, same operating system) would have to run an identical work unit... no, that test would not help.
I checked the work unit properties shown in the BOINC Manager.
Only the estimated number of GFLOPs is given; there is no word about integer operations.
At least that is the case for the OpenPandemics project. I assume it is the same with other projects, but I cannot check right now, since I am only running OpenPandemics at the moment.
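
For anyone who wants to check this on their own machine: the estimate the BOINC Manager shows comes from a per-work-unit field called rsc_fpops_est in the client's client_state.xml. A minimal sketch for reading it, assuming the usual Linux data directory (the path, and the read permissions needed, will vary by install):

import xml.etree.ElementTree as ET

# List the estimated GFLOPs (total floating point operations) for each
# queued work unit by reading the BOINC client state file.
STATE_FILE = "/var/lib/boinc-client/client_state.xml"  # assumed default location

root = ET.parse(STATE_FILE).getroot()
for wu in root.iter("workunit"):
    name = wu.findtext("name", default="?")
    fpops_est = float(wu.findtext("rsc_fpops_est", default="0"))
    print(f"{name}: ~{fpops_est / 1e9:,.0f} GFLOPs estimated")

As far as I know there is no corresponding per-work-unit estimate of integer operations, which matches what the Manager shows.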

Wikipedia has information about the computing power of different CPUs, but as far as I can see it is only given in FLOPS.

The table at https://en.wikipedia.org/wiki/FLOPS#FLOPS_per_cycle_for_various_processors says the following:
FLOPS per cycle for various processors

Microarchitecture | ISA | FP64 | FP32 | FP16
Intel Atom (Bonnell, Saltwell, Silvermont and Goldmont) | SSE3 (64-bit) | 2 | 4 | 0
Intel Core (Merom, Penryn); Intel Nehalem[7] (Nehalem, Westmere) | SSE4 (128-bit) | 4 | 8 | 0
Intel Sandy Bridge (Sandy Bridge, Ivy Bridge) | AVX (256-bit) | 8 | 16 | 0
Intel Haswell[7] (Haswell, Devil's Canyon, Broadwell); Intel Skylake (Skylake, Kaby Lake, Coffee Lake, Whiskey Lake, Amber Lake) | AVX2 & FMA (256-bit) | 16 | 32 | 0
Intel Xeon Phi (Knights Corner) | SSE & FMA (256-bit) | 16 | 32 | 0
Intel Skylake-X; Intel Xeon Phi (Knights Landing, Knights Mill) | AVX-512 & FMA (512-bit) | 32 | 64 | 0
AMD Bobcat | AMD64 (64-bit) | 2 | 4 | 0
AMD Jaguar; AMD Puma | AVX (128-bit) | 4 | 8 | 0
AMD K10 | SSE4/4a (128-bit) | 4 | 8 | 0
AMD Bulldozer[7] (Piledriver, Steamroller, Excavator) | AVX (128-bit) Bulldozer-Steamroller; AVX2 (128-bit) Excavator; FMA3 (Bulldozer)[8]; FMA3/4 (Piledriver-Excavator) | 4 | 8 | 0
AMD Zen (Ryzen 1000 series, Threadripper 1000 series, Epyc Naples); AMD Zen+[7][9][10][11] (Ryzen 2000 series, Threadripper 2000 series) | AVX2 & FMA (128-bit, 256-bit decoding)[12] | 8 | 16 | 0
AMD Zen 2[13] (Ryzen 3000 series, Threadripper 3000 series, Epyc Rome); AMD Zen 3 (Ryzen 5000 series) | AVX2 & FMA (256-bit) | 16 | 32 | 0
ARM Cortex-A7, A9, A15 | ARMv7 | 1 | 8 | 0
ARM Cortex-A32, A35, A53, A55, A72, A73, A75 | ARMv8 | 2 | 8 | 0
ARM Cortex-A57[7] | ARMv8 | 4 | 8 | 0
ARM Cortex-A76, A77 | ARMv8 | 8 | 16 | 0
Qualcomm Krait | ARMv8 | 1 | 8 | 0
Qualcomm Kryo (1xx - 3xx) | ARMv8 | 2 | 8 | 0
Qualcomm Kryo (4xx - 5xx) | ARMv8 | 8 | 16 | 0
Samsung Exynos M1 and M2 | ARMv8 | 2 | 8 | 0
Samsung Exynos M3 and M4 | ARMv8 | 3 | 12 | 0
IBM PowerPC A2 (Blue Gene/Q) | ? | 8 | 8 (as FP64) | 0
Hitachi SH-4[14][15] | SH-4 | 1 | 7 | 0
Nvidia Fermi (only GeForce GTX 465–480, 560 Ti, 570-590) | PTX | 1/4 (locked by driver, 1 in hardware) | 2 | 0
Nvidia Fermi (only Quadro 600-2000) | PTX | 1/8 | 2 | 0
Nvidia Fermi (only Quadro 4000–7000, Tesla) | PTX | 1 | 2 | 0
Nvidia Kepler (GeForce (except Titan and Titan Black), Quadro (except K6000), Tesla K10) | PTX | 1/12 (for GK110: locked by driver, 2/3 in hardware) | 2 | 0
Nvidia Kepler (GeForce GTX Titan and Titan Black, Quadro K6000, Tesla (except K10)) | PTX | 2/3 | 2 | 0
Nvidia Maxwell; Nvidia Pascal (all except Quadro GP100 and Tesla P100) | PTX | 1/16 | 2 | 1/32
Nvidia Pascal (only Quadro GP100 and Tesla P100) | PTX | 1 | 2 | 4
Nvidia Volta[16] | PTX | 1 | 2 (FP32) + 2 (INT32) | 16
Nvidia Turing (only GeForce 16XX) | PTX | 1/16 | 2 (FP32) + 2 (INT32) | 4
Nvidia Turing (all except GeForce 16XX) | PTX | 1/16 | 2 (FP32) + 2 (INT32) | 16
Nvidia Ampere[17][18] (only A100) | PTX | 2 | 2 (FP32) + 2 (INT32) | 32
Nvidia Ampere (only GeForce) | PTX | 1/32 | 2 (FP32) + 0 (INT32) or 1 (FP32) + 1 (INT32) | 16
AMD GCN (only Radeon Pro WX 2100-7100) | GCN | 1/8 | 2 | 2
AMD GCN (all except Radeon VII, Instinct MI50 and MI60, Radeon Pro WX 2100-7100) | GCN | 1/8 | 2 | 4
AMD GCN Vega 20 (only Radeon VII) | GCN | 1/2 (locked by driver, 1 in hardware) | 2 | 4
AMD GCN Vega 20 (only Radeon Instinct MI50 / MI60 and Radeon Pro VII) | GCN | 1 | 2 | 4
AMD RDNA[19][20] | RDNA | 1/8 | 2 | 4
Graphcore Colossus GC2[21][22][23] (values estimated) | ? | 0 | 18 | 72
Graphcore Colossus GC200 Mk2[24] (values estimated) | ? | 0 | 18 | 144


There is no word about the ability of CPUs to do integer operations... strange.

I found out that CPUs do have a so-called "integer range".

Wikipedia says:

"Integer range

Every CPU represents numerical values in a specific way. For example, some early digital computers represented numbers as familiar decimal (base 10) numeral system values, and others have employed more unusual representations such as ternary (base three). Nearly all modern CPUs represent numbers in binary form, with each digit being represented by some two-valued physical quantity such as a "high" or "low" voltage.[f]
[Figure caption: A six-bit word containing the binary-encoded representation of decimal value 40. Most modern CPUs employ word sizes that are a power of two, for example 8, 16, 32 or 64 bits.]

Related to numeric representation is the size and precision of integer numbers that a CPU can represent. In the case of a binary CPU, this is measured by the number of bits (significant digits of a binary encoded integer) that the CPU can process in one operation, which is commonly called word size, bit width, data path width, integer precision, or integer size. A CPU's integer size determines the range of integer values it can directly operate on.[g] For example, an 8-bit CPU can directly manipulate integers represented by eight bits, which have a range of 256 (2^8) discrete integer values."

But I could not find a source saying which CPUs on the market have a large or a small "integer range".

A CPU with a 64-bit architecture probably has a larger integer range than a CPU with an 8-bit architecture and should be better at integer operations. But I am not sure; I am not a computer specialist.
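
To put rough numbers on that: the word size says how large an integer the CPU can handle in one operation, not directly how fast it is at integer math. A quick illustration of the ranges for common word sizes:

# Number of distinct values and the signed/unsigned ranges for common
# integer word sizes (8-bit gives the 256 values mentioned above).
for bits in (8, 16, 32, 64):
    values = 2 ** bits
    unsigned_max = values - 1
    signed_min, signed_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    print(f"{bits:2d}-bit: {values} values, "
          f"unsigned 0..{unsigned_max}, signed {signed_min}..{signed_max}")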

Just wanted to contribute the best I could.
All the best to everyone, stay healthy.
[Oct 14, 2020 4:04:22 PM]
flynryan
Senior Cruncher
United States
Joined: Aug 15, 2006
Post Count: 235
Re: Integer vs Floating Point Projects

You can run a CPU benchmark within BOINC to find out your processor's performance.

Here is mine on a Ryzen 3950X system:

10/14/2020 10:03:15 AM | | Suspending computation - CPU benchmarks in progress
10/14/2020 10:03:46 AM | | Benchmark results:
10/14/2020 10:03:46 AM | | Number of CPUs: 32
10/14/2020 10:03:46 AM | | 4752 floating point MIPS (Whetstone) per CPU
10/14/2020 10:03:46 AM | | 17368 integer MIPS (Dhrystone) per CPU

As far as which projects use which type, my guess is that each project uses some of each; it's probably not entirely one or the other. Which ones use more of each, though, I couldn't say for sure.
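
Side note: those two benchmark numbers are also saved by the client itself, as p_fpops and p_iops inside the <host_info> block of client_state.xml, so they can be read back without digging through the event log. A rough sketch, assuming the usual Linux data directory and that these are the same per-CPU values the event log prints:

import xml.etree.ElementTree as ET

# Read the stored BOINC benchmark results (Whetstone floating point and
# Dhrystone integer, both in ops/sec per CPU) from the client state file.
STATE_FILE = "/var/lib/boinc-client/client_state.xml"  # assumed default location

host = ET.parse(STATE_FILE).getroot().find("host_info")
print(f"{float(host.findtext('p_fpops')) / 1e6:,.0f} floating point MIPS (Whetstone) per CPU")
print(f"{float(host.findtext('p_iops')) / 1e6:,.0f} integer MIPS (Dhrystone) per CPU")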
[Oct 14, 2020 5:11:18 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Re: Integer vs Floating Point Projects

"17368 integer MIPS (Dhrystone) per CPU"
I'd try that one again. I don't have any Ryzens, but for an FP score of 4752 I'd guess the integer score should be around 100,000. I've been manually suspending work and running the CPU benchmarks twice in a row.
I have a couple of cases where I have several of the same CPU and one of them has an FP score that is 5 to 10x too low. Might it be the motherboard? Maybe I need a BIOS update? I haven't figured it out yet.
I'm making a table of my benchmarks. I wish I knew how to make nice tables like adri does, but I'll probably just post screenshots later.
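
For building that table, the runs can also be scripted instead of suspending work by hand. A rough sketch, assuming boinccmd is installed, the client accepts local RPC, and the same client_state.xml fields mentioned above (the sleep time is a guess):

import subprocess
import time
import xml.etree.ElementTree as ET

# Run the BOINC CPU benchmarks twice in a row via boinccmd and record the
# per-CPU Whetstone/Dhrystone results after each run.
STATE_FILE = "/var/lib/boinc-client/client_state.xml"  # assumed default location

for run in (1, 2):
    subprocess.run(["boinccmd", "--run_benchmarks"], check=True)
    time.sleep(120)  # crude wait for the benchmarks to finish; adjust as needed
    host = ET.parse(STATE_FILE).getroot().find("host_info")
    fp = float(host.findtext("p_fpops")) / 1e6
    integer = float(host.findtext("p_iops")) / 1e6
    print(f"run {run}: FP MIPS {fp:,.0f}, integer MIPS {integer:,.0f}")
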
[Oct 14, 2020 6:59:32 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Re: Integer vs Floating Point Projects

"An intelligent and interesting question indeed."
That's the nicest thing anyone's said to me in ages :-)
I just added some X299 CPUs, and I thought for sure I knew which one would run ARP the fastest, but according to my eyeball estimates I had it exactly upside down. ARP runs faster on CPUs with higher integer benchmarks.
Then I saw something that made me think maybe they don't even compile for high-end CPUs:
https://www.nas.nasa.gov/hecc/support/kb/casc...0operations%20per%20cycle.
"In addition to the instruction sets SSE, SSE2, SSE3, Supplemental SSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, and AVX512[F,CD,BW,DQ,VL], which are available in its Skylake predecessor, Cascade Lake also includes the new AVX-512 Vector Neural Network Instructions (VNNI), which provide significant, more efficient deep-learning inference acceleration.
With 512-bit floating-point vector registers and two floating-point functional units, each capable of Fused Multiply-Add (FMA), a Cascade Lake core can deliver 32 double-precision floating-point operations per cycle."

32 DP FLOPs per cycle is double what we're used to.
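
As a back-of-the-envelope check on what that means for throughput: theoretical peak is just FLOPs per cycle x clock x cores. A quick sketch with made-up example clocks and core counts (illustration only, not measurements):

# Theoretical peak double-precision GFLOPS = cores * clock (GHz) * FP64 ops/cycle.
def peak_dp_gflops(cores: int, ghz: float, fp64_per_cycle: int) -> float:
    return cores * ghz * fp64_per_cycle

# Example figures only (assumed for illustration):
print(peak_dp_gflops(cores=8, ghz=3.5, fp64_per_cycle=16))  # AVX2 + FMA core -> 448.0
print(peak_dp_gflops(cores=8, ghz=3.5, fp64_per_cycle=32))  # AVX-512, two FMA units -> 896.0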

Anyone know if ARP or any other project is compiled for VNNI???
[Oct 14, 2020 7:06:49 PM]
Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3315
Re: Integer vs Floating Point Projects

On Linux, the integer score from the BOINC benchmark is a lot higher than on Windows, while the FP score also seems higher on Linux than on Windows.

For example, my Ryzen 1400:

Windows 10:
3986.69 million floating point ops/sec (Whetstone)
14783.9 million integer ops/sec (Dhrystone)

Linux Mint 20 (Ubuntu 20.04 based):
5458.53 million floating point ops/sec (Whetstone)
60194.52 million integer ops/sec (Dhrystone)

Of note, on Ubuntu 18.04 or distros based on that, the integer number is a lot higher than it is on Ubuntu 20.04. My Ryzen would probably get 90,000-ish on 18.04. I did not notice any negative performance differences from 18.04 to 20.04 despite the lower integer number on 20.04.
I did think 20.04 was doing SCC tasks a bit faster than 18.04 back in April, but I didn't really make a consistent comparison.
----------------------------------------


- AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
- AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
- AMD Ryzen 7 7730U 8C/16T 3.0 GHz
[Oct 14, 2020 7:16:00 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Re: Integer vs Floating Point Projects

Falconet, thanks, that explains something I saw:
i5-4690k, Windows 7: FP 4448, integer 15694
i5-4690k, Linux Mint 19: FP 5379, integer 140545

I'm slowly upgrading from Linux Mint 19.3 to 20.
[Oct 14, 2020 7:32:19 PM]
AgrFan
Senior Cruncher
USA
Joined: Apr 17, 2008
Post Count: 396
Re: Integer vs Floating Point Projects

"In addition to the instruction sets SSE, SSE2, SSE3, Supplemental SSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, and AVX512[F,CD,BW,DQ,VL], which are available in its Skylake predecessor, Cascade Lake also includes the new AVX-512 Vector Neural Network Instructions (VNNI), which provide significant, more efficient deep-learning inference acceleration.
With 512-bit floating-point vector registers and two floating-point functional units, each capable of Fused Multiply-Add (FMA), a Cascade Lake core can deliver 32 double-precision floating-point operations per cycle."

The WCG team has never provided information indicating which instruction sets are utilized by each project. This would be helpful information for those of us replacing old hardware. The last information I saw on these forums was that the projects are compiled/optimized for the most common processors. I believe that was SSEx at one time; that may have changed over the past few years with the newer projects.
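
Even without knowing what WCG compiles for, it is at least easy to see which of these instruction sets a given box supports. A minimal Linux-only sketch that reads the CPU flags from /proc/cpuinfo (on Windows a tool like CPU-Z shows the same information):

# Report which vector instruction sets the local CPU advertises.
# Linux only: parses the "flags" line of /proc/cpuinfo.
INTERESTING = ["sse2", "ssse3", "sse4_1", "sse4_2", "avx", "avx2", "avx512f", "avx512_vnni"]

flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break

for isa in INTERESTING:
    print(f"{isa:12} {'yes' if isa in flags else 'no'}")
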
----------------------------------------

  • i5-10400 (Comet Lake, 6C/12T) @ 2.9 GHz
  • i5-7400 (Kaby Lake, 4C/4T) @ 3.0 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3330 (Ivy Bridge, 4C/4T) @ 3.0 GHz

[Oct 14, 2020 11:47:05 PM]