Summary of some performance tests with floating-point numbers on my
standard desktop computer

Thu Feb 11 07:59:16 CET 2016

Executive summary: doubles are faster than floats, except when floats
are faster.

Hardware:

Intel Core i7-4790K ("Haswell Refresh" from the summer of 2014)
ASUS Z97-DELUXE ATX
32 GB of memory (4 * 8 GB Corsair Vengeance 1600 MHz)
No overclocking

I ran one billion operations, alternating multiplication and division,
in varying numbers of threads. On my processor, with four cores and
hyperthreading, the best performance was usually found with eight
parallel threads. This is what one would expect, since four cores with
hyperthreading give eight hardware threads.

On this computer a float is 32 bits and a double is 64 bits.
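
The sizes above can be checked directly in C. The following is a
minimal sketch, not part of the original test program; the helper name
width_in_bits is made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not from the original benchmark): the width in
   bits of an object that occupies size_in_bytes bytes. CHAR_BIT is the
   number of bits per byte on the current platform. */
int width_in_bits(size_t size_in_bytes)
{
    assert(size_in_bytes > 0);
    return (int)(size_in_bytes * CHAR_BIT);
}
```

On a machine like the one described here, width_in_bits(sizeof(float))
should give 32 and width_in_bits(sizeof(double)) should give 64. Other
platforms may differ.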

With operations on a single variable (at the C source code level),
where one might expect everything to be done in processor registers,
the best measured performance was 1533.03 Mflops with float and
6205.27 Mflops with double. Using doubles was about four times as
fast as using floats, in spite of floats being 32 bits and doubles
64. Here I would guess that main memory is not used at all, and that
performance is limited by the floating-point calculations in the
processor. My guess at the reason, based on what I think I remember
reading, is that floating-point calculations are done in fast
hardware with double precision or more, and that single-precision
calculations are done the same way, but with the values converted
from, and back to, float.
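
To make the single-variable case concrete, here is a minimal sketch of
what such a loop could look like. It is an assumption about the shape
of the test, not the original benchmark code: the function names, the
constant 3 and the use of an even operation count are all made up for
illustration.

```c
#include <assert.h>

/* Hypothetical helper: convert an operation count and an elapsed time
   in seconds to Mflops (millions of floating-point operations per
   second). */
double mflops(double operations, double seconds)
{
    assert(seconds > 0.0);
    return operations / seconds / 1e6;
}

/* Alternate multiplication and division on a single float variable.
   Each loop iteration performs two operations, so the operation count
   is assumed to be even. Returning the value keeps the compiler from
   discarding the loop as dead code. */
float churn_float(long operations)
{
    float x = 1.0f;
    for (long i = 0; i < operations; i += 2) {
        x *= 3.0f;
        x /= 3.0f;
    }
    return x;
}

/* The same loop with a double variable. */
double churn_double(long operations)
{
    double x = 1.0;
    for (long i = 0; i < operations; i += 2) {
        x *= 3.0;
        x /= 3.0;
    }
    return x;
}
```

One way to use this would be to read clock() from <time.h> before and
after a call such as churn_double(1000000000L), divide the difference
by CLOCKS_PER_SEC, and pass the result to mflops. Multiplying and
dividing by the same constant keeps the value near 1, so the loop does
not drift into overflow or very small numbers, which could distort the
timing.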

If, instead, we have a billion different floating-point numbers stored
in memory, and each number must be loaded from, and stored back in,
main memory, it is the other way around, with floats being (slightly)
faster than doubles. In this case, the best measured float performance
was 2315.10 Mflops, and double 1592.77 Mflops. Here I think
performance is limited by memory bandwidth, and it is not unreasonable
that floats (with in this case 4 gigabytes of data) are faster than
doubles (with in this case 8 gigabytes of data). But note that float
was not twice as fast: the ratio is only about 1.45.
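
The memory-bound case can be sketched in the same way. Again, this is
only an illustration under assumptions, not the original benchmark
code: the function names are hypothetical, and the choice to multiply
elements at even indexes and divide those at odd indexes is just one
possible way to alternate the two operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size of an array in gigabytes (10^9 bytes).
   Computed in double so that large counts do not overflow size_t on
   32-bit platforms. */
double data_gigabytes(size_t count, size_t element_size)
{
    return (double)count * (double)element_size / 1e9;
}

/* One pass over an array of floats: every element is loaded, changed
   by one operation, and stored back. Even indexes are multiplied and
   odd indexes divided, so the two operations alternate. */
void sweep_float(float *values, size_t count)
{
    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (i % 2 == 0)
            values[i] *= 3.0f;
        else
            values[i] /= 3.0f;
    }
}

/* The same pass over an array of doubles. */
void sweep_double(double *values, size_t count)
{
    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (i % 2 == 0)
            values[i] *= 3.0;
        else
            values[i] /= 3.0;
    }
}
```

With a billion elements, data_gigabytes gives 4 for 4-byte floats and
8 for 8-byte doubles, which matches the data sizes mentioned above.
The double pass has to move twice as many bytes for the same number of
operations.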

But, as always, it must be remembered that benchmarks depend on many
things: hardware, benchmark software, compiler, compiler settings,
the CPU cooler, et cetera. On other hardware, for example with
different implementations of floating-point calculations, we might
get completely different results.

-- Thomas Padron-McCarthy, tel +46(0)707347013, http://www.aass.oru.se/~tpy/