Output from my i7 at home. The first tests are the same ones I ran in 2014, when the machine was new, and they give about the same results.

Thu Feb 11 07:59:16 CET 2016

Intel Core i7-4790K ("Haswell Refresh" from the summer of 2014)
ASUS Z97-DELUXE ATX
32 GB memory (4*8 GB Corsair Vengeance 1600 MHz)
No overclocking

First the variant with one billion floating-point additions per thread and up to 20 threads. Compilation:

gcc -Wall -std=c99 -O3 threadcount.c -lpthread -o threadcount

Starting 1 thread(s)...
The 1 thread(s) are running.
Waiting for the 1 thread(s) to finish...
Elapsed: 0.69 s
1 thread(s): 1438.88 Mflops (1438.88 per thread)

Starting 2 thread(s)...
The 2 thread(s) are running.
Waiting for the 2 thread(s) to finish...
Elapsed: 0.69 s
2 thread(s): 2907.21 Mflops (1453.61 per thread)

Starting 3 thread(s)...
The 3 thread(s) are running.
Waiting for the 3 thread(s) to finish...
Elapsed: 0.70 s
3 thread(s): 4257.63 Mflops (1419.21 per thread)

Starting 4 thread(s)...
The 4 thread(s) are running.
Waiting for the 4 thread(s) to finish...
Elapsed: 0.71 s
4 thread(s): 5669.07 Mflops (1417.27 per thread)

Starting 5 thread(s)...
The 5 thread(s) are running.
Waiting for the 5 thread(s) to finish...
Elapsed: 0.72 s
5 thread(s): 6915.23 Mflops (1383.05 per thread)

Starting 6 thread(s)...
The 6 thread(s) are running.
Waiting for the 6 thread(s) to finish...
Elapsed: 0.71 s
6 thread(s): 8462.25 Mflops (1410.38 per thread)

Starting 7 thread(s)...
The 7 thread(s) are running.
Waiting for the 7 thread(s) to finish...
Elapsed: 0.72 s
7 thread(s): 9666.21 Mflops (1380.89 per thread)

Starting 8 thread(s)...
The 8 thread(s) are running.
Waiting for the 8 thread(s) to finish...
Elapsed: 0.78 s
8 thread(s): 10218.69 Mflops (1277.34 per thread)

Starting 9 thread(s)...
The 9 thread(s) are running.
Waiting for the 9 thread(s) to finish...
Elapsed: 0.91 s
9 thread(s): 9852.70 Mflops (1094.74 per thread)

Starting 10 thread(s)...
The 10 thread(s) are running.
Waiting for the 10 thread(s) to finish...
Elapsed: 1.02 s
10 thread(s): 9764.09 Mflops (976.41 per thread)

Starting 11 thread(s)...
The 11 thread(s) are running.
Waiting for the 11 thread(s) to finish...
Elapsed: 1.15 s
11 thread(s): 9581.00 Mflops (871.00 per thread)

Starting 12 thread(s)...
The 12 thread(s) are running.
Waiting for the 12 thread(s) to finish...
Elapsed: 1.18 s
12 thread(s): 10156.09 Mflops (846.34 per thread)

Starting 13 thread(s)...
The 13 thread(s) are running.
Waiting for the 13 thread(s) to finish...
Elapsed: 1.25 s
13 thread(s): 10413.09 Mflops (801.01 per thread)

Starting 14 thread(s)...
The 14 thread(s) are running.
Waiting for the 14 thread(s) to finish...
Elapsed: 1.34 s
14 thread(s): 10411.14 Mflops (743.65 per thread)

Starting 15 thread(s)...
The 15 thread(s) are running.
Waiting for the 15 thread(s) to finish...
Elapsed: 1.44 s
15 thread(s): 10445.97 Mflops (696.40 per thread)

Starting 16 thread(s)...
The 16 thread(s) are running.
Waiting for the 16 thread(s) to finish...
Elapsed: 1.49 s
16 thread(s): 10761.37 Mflops (672.59 per thread)

Starting 17 thread(s)...
The 17 thread(s) are running.
Waiting for the 17 thread(s) to finish...
Elapsed: 1.59 s
17 thread(s): 10700.20 Mflops (629.42 per thread)

Starting 18 thread(s)...
The 18 thread(s) are running.
Waiting for the 18 thread(s) to finish...
Elapsed: 1.70 s
18 thread(s): 10563.93 Mflops (586.88 per thread)

Starting 19 thread(s)...
The 19 thread(s) are running.
Waiting for the 19 thread(s) to finish...
Elapsed: 1.77 s
19 thread(s): 10742.37 Mflops (565.39 per thread)

Starting 20 thread(s)...
The 20 thread(s) are running.
Waiting for the 20 thread(s) to finish...
Elapsed: 1.93 s
20 thread(s): 10369.28 Mflops (518.46 per thread)

That is, with eight parallel threads you can reach a little over 10 gigaflops (with additions, not multiplications). More than eight threads give little or no additional performance, which is what you would expect from four processor cores with hyper-threading.
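The source of threadcount.c is not included in these notes, so the following is only a minimal sketch of how such a test could be built, not the actual program. The names work, add_worker, mflops and run_additions are hypothetical, and so is the choice of reading the addend at run time. The Mflops figure is simply the total number of operations divided by the elapsed time and by one million.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under -std=c99 */
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Per-thread work description (hypothetical; not taken from threadcount.c). */
struct work {
    long iterations;  /* number of additions this thread performs */
    double step;      /* addend, read at run time */
    double result;    /* final sum, stored so the loop has a visible effect */
};

/* Thread body: a chain of dependent floating-point additions. */
static void *add_worker(void *arg)
{
    struct work *w = arg;
    double sum = 0.0;
    for (long i = 0; i < w->iterations; i++)
        sum += w->step;
    w->result = sum;
    return NULL;
}

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Mflops = total operations / elapsed seconds / one million. */
double mflops(int nthreads, long ops_per_thread, double seconds)
{
    return (double)nthreads * (double)ops_per_thread / seconds / 1e6;
}

enum { MAX_THREADS = 64 };  /* arbitrary upper bound for this sketch */

/* Starts nthreads workers, waits for all of them, returns total Mflops. */
double run_additions(int nthreads, long iterations)
{
    pthread_t tid[MAX_THREADS];
    struct work w[MAX_THREADS];
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);

    double start = now_seconds();
    for (int i = 0; i < nthreads; i++) {
        w[i].iterations = iterations;
        w[i].step = 1.0;
        w[i].result = 0.0;
        int rc = pthread_create(&tid[i], NULL, add_worker, &w[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    double elapsed = now_seconds() - start;

    return mflops(nthreads, iterations, elapsed);
}
```

Calling run_additions(n, 1000000000L) for n from 1 to 20 and printing the result would give a table of the same shape as the one above. As a check on the arithmetic, eight threads with one billion additions each in 0.78 s comes to a little over 10000 Mflops, in line with the measured figure (the printed elapsed time is rounded). Like the original, this needs -lpthread when linking.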
Then the variant with one billion floating-point multiplications per thread and up to 10 threads. Compilation:

gcc -Wall -std=c99 -O3 threadcount-multiplication.c -lpthread -o threadcount-multiplication

Starting 1 thread(s)...
The 1 thread(s) are running.
Waiting for the 1 thread(s) to finish...
Elapsed: 1.16 s
1 thread(s): 865.30 Mflops (865.30 per thread)

Starting 2 thread(s)...
The 2 thread(s) are running.
Waiting for the 2 thread(s) to finish...
Elapsed: 1.15 s
2 thread(s): 1741.33 Mflops (870.67 per thread)

Starting 3 thread(s)...
The 3 thread(s) are running.
Waiting for the 3 thread(s) to finish...
Elapsed: 1.17 s
3 thread(s): 2560.80 Mflops (853.60 per thread)

Starting 4 thread(s)...
The 4 thread(s) are running.
Waiting for the 4 thread(s) to finish...
Elapsed: 1.20 s
4 thread(s): 3342.43 Mflops (835.61 per thread)

Starting 5 thread(s)...
The 5 thread(s) are running.
Waiting for the 5 thread(s) to finish...
Elapsed: 1.20 s
5 thread(s): 4177.52 Mflops (835.50 per thread)

Starting 6 thread(s)...
The 6 thread(s) are running.
Waiting for the 6 thread(s) to finish...
Elapsed: 1.21 s
6 thread(s): 4976.43 Mflops (829.41 per thread)

Starting 7 thread(s)...
The 7 thread(s) are running.
Waiting for the 7 thread(s) to finish...
Elapsed: 1.21 s
7 thread(s): 5803.10 Mflops (829.01 per thread)

Starting 8 thread(s)...
The 8 thread(s) are running.
Waiting for the 8 thread(s) to finish...
Elapsed: 1.29 s
8 thread(s): 6205.27 Mflops (775.66 per thread)

Starting 9 thread(s)...
The 9 thread(s) are running.
Waiting for the 9 thread(s) to finish...
Elapsed: 1.51 s
9 thread(s): 5965.83 Mflops (662.87 per thread)

Starting 10 thread(s)...
The 10 thread(s) are running.
Waiting for the 10 thread(s) to finish...
Elapsed: 1.62 s
10 thread(s): 6179.58 Mflops (617.96 per thread)

That is, with multiplications instead of additions, eight parallel threads reach a little over 6 gigaflops.

That was with doubles. Now let's see whether floats are faster!
gcc -Wall -std=c99 -O3 threadcount-multiplication-float.c -lpthread -o threadcount-multiplication-float

Whoa! Without optimization the run times are reasonable, but with -O3 it finishes in zero time! The optimizer seems to remove the whole loop, perhaps because the value overflows.

We make a new version of the test program that alternates between division and multiplication. First with ordinary doubles.

gcc -Wall -std=c99 -O3 threadcount-multiplication-2.c -lpthread -o threadcount-multiplication-2

8 thread(s): 2318.45 Mflops (289.81 per thread)

At best we get 2.3 gigaflops (multiplications and divisions). As before, more than eight threads give little or no increase. That is considerably worse than the 6 gigaflops we got with multiplication only. Presumably this is because division is a more expensive operation, and perhaps because we are now computing with different data. Try with the same input values as before, although the values will of course still differ as the computation proceeds:

gcc -Wall -std=c99 -O3 threadcount-multiplication-3.c -lpthread -o threadcount-multiplication-3

8 thread(s): 2341.49 Mflops (292.69 per thread)

No, changing the input values gave the same result. So it does seem that division takes longer.

Now we take our "overflow-proofed" program and do the computations with floats instead of doubles.

gcc -Wall -std=c99 -O3 threadcount-multiplication-2-float.c -lpthread -o threadcount-multiplication-2-float

Starting 1 thread(s)...
The 1 thread(s) are running.
Waiting for the 1 thread(s) to finish...
Elapsed: 4.78 s
1 thread(s): 209.23 Mflops (209.23 per thread)

Starting 2 thread(s)...
The 2 thread(s) are running.
Waiting for the 2 thread(s) to finish...
Elapsed: 4.83 s
2 thread(s): 414.22 Mflops (207.11 per thread)

Starting 3 thread(s)...
The 3 thread(s) are running.
Waiting for the 3 thread(s) to finish...
Elapsed: 4.87 s
3 thread(s): 615.93 Mflops (205.31 per thread)

Starting 4 thread(s)...
The 4 thread(s) are running.
Waiting for the 4 thread(s) to finish...
Elapsed: 4.95 s
4 thread(s): 808.49 Mflops (202.12 per thread)

Starting 5 thread(s)...
The 5 thread(s) are running.
Waiting for the 5 thread(s) to finish...
Elapsed: 4.98 s
5 thread(s): 1003.08 Mflops (200.62 per thread)

Starting 6 thread(s)...
The 6 thread(s) are running.
Waiting for the 6 thread(s) to finish...
Elapsed: 5.03 s
6 thread(s): 1192.02 Mflops (198.67 per thread)

Starting 7 thread(s)...
The 7 thread(s) are running.
Waiting for the 7 thread(s) to finish...
Elapsed: 5.16 s
7 thread(s): 1356.32 Mflops (193.76 per thread)

Starting 8 thread(s)...
The 8 thread(s) are running.
Waiting for the 8 thread(s) to finish...
Elapsed: 5.68 s
8 thread(s): 1409.08 Mflops (176.14 per thread)

Starting 9 thread(s)...
The 9 thread(s) are running.
Waiting for the 9 thread(s) to finish...
Elapsed: 6.38 s
9 thread(s): 1409.77 Mflops (156.64 per thread)

Starting 10 thread(s)...
The 10 thread(s) are running.
Waiting for the 10 thread(s) to finish...
Elapsed: 6.95 s
10 thread(s): 1437.94 Mflops (143.79 per thread)

Here it is not as clear that the limit comes after 8 threads. Grepped from the output above:

1 thread(s): 209.23 Mflops (209.23 per thread)
2 thread(s): 414.22 Mflops (207.11 per thread)
3 thread(s): 615.93 Mflops (205.31 per thread)
4 thread(s): 808.49 Mflops (202.12 per thread)
5 thread(s): 1003.08 Mflops (200.62 per thread)
6 thread(s): 1192.02 Mflops (198.67 per thread)
7 thread(s): 1356.32 Mflops (193.76 per thread)
8 thread(s): 1409.08 Mflops (176.14 per thread)
9 thread(s): 1409.77 Mflops (156.64 per thread)
10 thread(s): 1437.94 Mflops (143.79 per thread)

This may be due to chance, and to the computer not having all cores and hyperthreads free. Try with up to 20 threads (although I was doing other things on the computer at the same time):

Starting 1 thread(s)...
The 1 thread(s) are running.
Waiting for the 1 thread(s) to finish...
Elapsed: 4.74 s
1 thread(s): 210.78 Mflops (210.78 per thread)

Starting 2 thread(s)...
The 2 thread(s) are running.
Waiting for the 2 thread(s) to finish...
Elapsed: 4.78 s
2 thread(s): 417.97 Mflops (208.99 per thread)

Starting 3 thread(s)...
The 3 thread(s) are running.
Waiting for the 3 thread(s) to finish...
Elapsed: 4.88 s
3 thread(s): 614.45 Mflops (204.82 per thread)

Starting 4 thread(s)...
The 4 thread(s) are running.
Waiting for the 4 thread(s) to finish...
Elapsed: 4.94 s
4 thread(s): 809.29 Mflops (202.32 per thread)

Starting 5 thread(s)...
The 5 thread(s) are running.
Waiting for the 5 thread(s) to finish...
Elapsed: 5.02 s
5 thread(s): 996.71 Mflops (199.34 per thread)

Starting 6 thread(s)...
The 6 thread(s) are running.
Waiting for the 6 thread(s) to finish...
Elapsed: 5.03 s
6 thread(s): 1193.55 Mflops (198.92 per thread)

Starting 7 thread(s)...
The 7 thread(s) are running.
Waiting for the 7 thread(s) to finish...
Elapsed: 5.10 s
7 thread(s): 1373.49 Mflops (196.21 per thread)

Starting 8 thread(s)...
The 8 thread(s) are running.
Waiting for the 8 thread(s) to finish...
Elapsed: 5.85 s
8 thread(s): 1367.42 Mflops (170.93 per thread)

Starting 9 thread(s)...
The 9 thread(s) are running.
Waiting for the 9 thread(s) to finish...
Elapsed: 6.47 s
9 thread(s): 1391.62 Mflops (154.62 per thread)

Starting 10 thread(s)...
The 10 thread(s) are running.
Waiting for the 10 thread(s) to finish...
Elapsed: 6.58 s
10 thread(s): 1520.36 Mflops (152.04 per thread)

Starting 11 thread(s)...
The 11 thread(s) are running.
Waiting for the 11 thread(s) to finish...
Elapsed: 7.74 s
11 thread(s): 1420.37 Mflops (129.12 per thread)

Starting 12 thread(s)...
The 12 thread(s) are running.
Waiting for the 12 thread(s) to finish...
Elapsed: 8.78 s
12 thread(s): 1366.76 Mflops (113.90 per thread)

Starting 13 thread(s)...
The 13 thread(s) are running.
Waiting for the 13 thread(s) to finish...
Elapsed: 9.50 s
13 thread(s): 1367.92 Mflops (105.22 per thread)

Starting 14 thread(s)...
The 14 thread(s) are running.
Waiting for the 14 thread(s) to finish...
Elapsed: 9.34 s
14 thread(s): 1499.45 Mflops (107.10 per thread)

Starting 15 thread(s)...
The 15 thread(s) are running.
Waiting for the 15 thread(s) to finish...
Elapsed: 9.94 s
15 thread(s): 1508.47 Mflops (100.56 per thread)

Starting 16 thread(s)...
The 16 thread(s) are running.
Waiting for the 16 thread(s) to finish...
Elapsed: 10.96 s
16 thread(s): 1460.05 Mflops (91.25 per thread)

Starting 17 thread(s)...
The 17 thread(s) are running.
Waiting for the 17 thread(s) to finish...
Elapsed: 11.26 s
17 thread(s): 1509.81 Mflops (88.81 per thread)

Starting 18 thread(s)...
The 18 thread(s) are running.
Waiting for the 18 thread(s) to finish...
Elapsed: 11.74 s
18 thread(s): 1533.03 Mflops (85.17 per thread)

Starting 19 thread(s)...
The 19 thread(s) are running.
Waiting for the 19 thread(s) to finish...
Elapsed: 12.60 s
19 thread(s): 1507.62 Mflops (79.35 per thread)

Starting 20 thread(s)...
The 20 thread(s) are running.
Waiting for the 20 thread(s) to finish...
Elapsed: 13.51 s
20 thread(s): 1480.77 Mflops (74.04 per thread)

The maximum was 1533.03 Mflops with 18 threads. 8 threads gave 1367.42 Mflops. (In the first test, 8 threads gave 1409.08 Mflops.)

So we measured 1.5 gigaflops with floats. The double version of the same alternating multiply/divide program gave 2.3 gigaflops, and the multiplication-only double program gave 6.2 gigaflops. Doubles were thus about one and a half times as fast as floats in the like-for-like comparison (and four times as fast if the float figure is set against the multiplication-only figure), even though (on this machine) float is 32 bits and double is 64.

If I were to guess, based on what I think I remember having read, this is because the processor's floating-point computations are done in fast hardware in double precision, or even higher, and float computations are really carried out as doubles, in which case the values also have to be converted back and forth.

This is with computations on (at the C source level) a single variable. Perhaps everything can then be done in processor registers. Does it turn out differently with larger amounts of data, when the data has to be moved between main memory and the processor?
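The source files for the single-variable tests are not shown, so the following is only a sketch of what a kernel that alternates multiplication and division might look like in both precisions. The function names kernel_double and kernel_float, the factor parameter and the loop shape are all assumptions made for illustration. Multiplying and then dividing by the same factor keeps the value close to its starting point, so it cannot overflow however many iterations are run.

```c
#include <assert.h>

/* Hypothetical double-precision kernel: 2 * pairs floating-point operations. */
double kernel_double(double start, double factor, long pairs)
{
    assert(factor != 0.0);
    double x = start;
    for (long i = 0; i < pairs; i++) {
        x *= factor;  /* moves the value away from start ... */
        x /= factor;  /* ... and brings it back again */
    }
    return x;  /* the caller should use this, or the loop may be discarded */
}

/* Hypothetical single-precision kernel with the same structure.
   factor is a float, so both operations stay in single precision. */
float kernel_float(float start, float factor, long pairs)
{
    assert(factor != 0.0f);
    float x = start;
    for (long i = 0; i < pairs; i++) {
        x *= factor;
        x /= factor;
    }
    return x;
}
```

One thing that may be worth checking in the float program: in C an unsuffixed constant such as 1.000001 has type double, so a statement like x *= 1.000001 on a float x is evaluated in double and converted back to float every time. Passing the factor as a float, as in this sketch, or writing the constant with an f suffix, avoids that. Whether the original program used such constants is not known from these notes.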
We now allocate one billion floating-point numbers and do the computations on all of them, instead of a billion times on the same variable. All the threads are let loose on the same data, so the values will differ between tests, and there may be threading oddities.

gcc -Wall -std=c99 -O3 threadcount-multiplication-many-floats.c -lpthread -o threadcount-multiplication-many-floats

Starting 1 thread(s)...
The 1 thread(s) are running.
Waiting for the 1 thread(s) to finish...
Elapsed: 1.63 s
1 thread(s): 612.37 Mflops (612.37 per thread)

Starting 2 thread(s)...
The 2 thread(s) are running.
Waiting for the 2 thread(s) to finish...
Elapsed: 1.65 s
2 thread(s): 1214.96 Mflops (607.48 per thread)

Starting 3 thread(s)...
The 3 thread(s) are running.
Waiting for the 3 thread(s) to finish...
Elapsed: 1.70 s
3 thread(s): 1768.09 Mflops (589.36 per thread)

Starting 4 thread(s)...
The 4 thread(s) are running.
Waiting for the 4 thread(s) to finish...
Elapsed: 3.29 s
4 thread(s): 1215.17 Mflops (303.79 per thread)

Starting 5 thread(s)...
The 5 thread(s) are running.
Waiting for the 5 thread(s) to finish...
Elapsed: 2.53 s
5 thread(s): 1978.91 Mflops (395.78 per thread)

Starting 6 thread(s)...
The 6 thread(s) are running.
Waiting for the 6 thread(s) to finish...
Elapsed: 3.24 s
6 thread(s): 1852.94 Mflops (308.82 per thread)

Starting 7 thread(s)...
The 7 thread(s) are running.
Waiting for the 7 thread(s) to finish...
Elapsed: 3.37 s
7 thread(s): 2078.68 Mflops (296.95 per thread)

Starting 8 thread(s)...
The 8 thread(s) are running.
Waiting for the 8 thread(s) to finish...
Elapsed: 3.46 s
8 thread(s): 2315.10 Mflops (289.39 per thread)

Starting 9 thread(s)...
The 9 thread(s) are running.
Waiting for the 9 thread(s) to finish...
Elapsed: 4.14 s
9 thread(s): 2171.60 Mflops (241.29 per thread)

Starting 10 thread(s)...
The 10 thread(s) are running.
Waiting for the 10 thread(s) to finish...
Elapsed: 4.55 s
10 thread(s): 2197.85 Mflops (219.79 per thread)

Starting 11 thread(s)...
The 11 thread(s) are running.
Waiting for the 11 thread(s) to finish...
Elapsed: 5.20 s
11 thread(s): 2116.41 Mflops (192.40 per thread)

Starting 12 thread(s)...
The 12 thread(s) are running.
Waiting for the 12 thread(s) to finish...
Elapsed: 5.43 s
12 thread(s): 2209.57 Mflops (184.13 per thread)

Starting 13 thread(s)...
The 13 thread(s) are running.
Waiting for the 13 thread(s) to finish...
Elapsed: 5.95 s
13 thread(s): 2185.89 Mflops (168.15 per thread)

Starting 14 thread(s)...
The 14 thread(s) are running.
Waiting for the 14 thread(s) to finish...
Elapsed: 6.33 s
14 thread(s): 2212.92 Mflops (158.07 per thread)

Starting 15 thread(s)...
The 15 thread(s) are running.
Waiting for the 15 thread(s) to finish...
Elapsed: 6.72 s
15 thread(s): 2232.95 Mflops (148.86 per thread)

Starting 16 thread(s)...
The 16 thread(s) are running.
Waiting for the 16 thread(s) to finish...
Elapsed: 7.33 s
16 thread(s): 2183.11 Mflops (136.44 per thread)

Starting 17 thread(s)...
The 17 thread(s) are running.
Waiting for the 17 thread(s) to finish...
Elapsed: 7.70 s
17 thread(s): 2208.84 Mflops (129.93 per thread)

Starting 18 thread(s)...
The 18 thread(s) are running.
Waiting for the 18 thread(s) to finish...
Elapsed: 8.20 s
18 thread(s): 2194.43 Mflops (121.91 per thread)

Starting 19 thread(s)...
The 19 thread(s) are running.
Waiting for the 19 thread(s) to finish...
Elapsed: 8.69 s
19 thread(s): 2186.04 Mflops (115.05 per thread)

Starting 20 thread(s)...
The 20 thread(s) are running.
Waiting for the 20 thread(s) to finish...
Elapsed: 9.12 s
20 thread(s): 2193.40 Mflops (109.67 per thread)

Best result with 8 threads: 2315.10 Mflops (289.39 per thread).

That was with float. Now let's try double!

gcc -Wall -std=c99 -O3 threadcount-multiplication-many-doubles.c -lpthread -o threadcount-multiplication-many-doubles

Starting 1 thread(s)...
The 1 thread(s) are running.
Waiting for the 1 thread(s) to finish...
Elapsed: 1.60 s
1 thread(s): 624.15 Mflops (624.15 per thread)

Starting 2 thread(s)...
The 2 thread(s) are running.
Waiting for the 2 thread(s) to finish...
Elapsed: 1.95 s
2 thread(s): 1025.11 Mflops (512.55 per thread)

Starting 3 thread(s)...
The 3 thread(s) are running.
Waiting for the 3 thread(s) to finish...
Elapsed: 3.29 s
3 thread(s): 911.52 Mflops (303.84 per thread)

Starting 4 thread(s)...
The 4 thread(s) are running.
Waiting for the 4 thread(s) to finish...
Elapsed: 3.40 s
4 thread(s): 1177.64 Mflops (294.41 per thread)

Starting 5 thread(s)...
The 5 thread(s) are running.
Waiting for the 5 thread(s) to finish...
Elapsed: 3.71 s
5 thread(s): 1348.79 Mflops (269.76 per thread)

Starting 6 thread(s)...
The 6 thread(s) are running.
Waiting for the 6 thread(s) to finish...
Elapsed: 4.08 s
6 thread(s): 1469.95 Mflops (244.99 per thread)

Starting 7 thread(s)...
The 7 thread(s) are running.
Waiting for the 7 thread(s) to finish...
Elapsed: 4.39 s
7 thread(s): 1592.77 Mflops (227.54 per thread)

Starting 8 thread(s)...
The 8 thread(s) are running.
Waiting for the 8 thread(s) to finish...
Elapsed: 6.64 s
8 thread(s): 1204.19 Mflops (150.52 per thread)

Starting 9 thread(s)...
The 9 thread(s) are running.
Waiting for the 9 thread(s) to finish...
Elapsed: 7.83 s
9 thread(s): 1148.95 Mflops (127.66 per thread)

Starting 10 thread(s)...
The 10 thread(s) are running.
Waiting for the 10 thread(s) to finish...
Elapsed: 8.80 s
10 thread(s): 1136.23 Mflops (113.62 per thread)

Starting 11 thread(s)...
The 11 thread(s) are running.
Waiting for the 11 thread(s) to finish...
Elapsed: 9.68 s
11 thread(s): 1135.94 Mflops (103.27 per thread)

Starting 12 thread(s)...
The 12 thread(s) are running.
Waiting for the 12 thread(s) to finish...
Elapsed: 10.75 s
12 thread(s): 1115.91 Mflops (92.99 per thread)

Starting 13 thread(s)...
The 13 thread(s) are running.
Waiting for the 13 thread(s) to finish...
Elapsed: 11.61 s
13 thread(s): 1120.01 Mflops (86.15 per thread)

Starting 14 thread(s)...
The 14 thread(s) are running.
Waiting for the 14 thread(s) to finish...
Elapsed: 12.43 s
14 thread(s): 1125.98 Mflops (80.43 per thread)

Starting 15 thread(s)...
The 15 thread(s) are running.
Waiting for the 15 thread(s) to finish...
Elapsed: 13.51 s
15 thread(s): 1110.41 Mflops (74.03 per thread)

Starting 16 thread(s)...
The 16 thread(s) are running.
Waiting for the 16 thread(s) to finish...
Elapsed: 14.19 s
16 thread(s): 1127.89 Mflops (70.49 per thread)

Starting 17 thread(s)...
The 17 thread(s) are running.
Waiting for the 17 thread(s) to finish...
Elapsed: 15.27 s
17 thread(s): 1113.55 Mflops (65.50 per thread)

Starting 18 thread(s)...
The 18 thread(s) are running.
Waiting for the 18 thread(s) to finish...
Elapsed: 16.09 s
18 thread(s): 1118.59 Mflops (62.14 per thread)

Starting 19 thread(s)...
The 19 thread(s) are running.
Waiting for the 19 thread(s) to finish...
Elapsed: 16.92 s
19 thread(s): 1122.71 Mflops (59.09 per thread)

Starting 20 thread(s)...
The 20 thread(s) are running.
Waiting for the 20 thread(s) to finish...
Elapsed: 17.96 s
20 thread(s): 1113.49 Mflops (55.67 per thread)

Best result with 7 threads: 1592.77 Mflops (227.54 per thread). 8 threads gave 1204.19 Mflops (150.52 per thread). But I was doing other things on the computer at the same time.

So float gave at best 2315.10 Mflops and double gave at best 1592.77 Mflops.

Conclusions:

With one billion floating-point numbers, where every number has to be fetched from, and stored back to, main memory, float is thus (somewhat) faster than double: about 1.45 times as fast at best. Here I believe that performance is limited by memory bandwidth, and then it is not unreasonable that float (in our case 4 gigabytes of data) is faster than double (in our case 8 gigabytes of data). But note that it was not twice as fast.

With a single number, where we can guess that everything is done in processor registers, we reached at best 1533.03 Mflops with float, and 6205.27 Mflops with double. Note that the double figure is from the multiplication-only program; the double version of the alternating multiply/divide program, which is what the float figure comes from, gave 2341.49 Mflops. Either way, double was clearly faster than float there.
Here I guess that main memory is not used at all, and that performance is limited by the floating-point computations in the processor.

As always, one has to remember that benchmarks depend on many things: the hardware, the benchmark program, the compiler, the compiler settings, how well the processor is cooled, and so on. On other hardware, for example with a different implementation of floating-point computation, you can get entirely different results.

-- Thomas Padron-McCarthy, tel +46(0)707347013, http://www.aass.oru.se/~tpy/