C++ Programming Glossary: mulsd

Multiply by 0 optimization

http://stackoverflow.com/questions/4956033/multiply-by-0-optimization

.. f; .type f, @function; f: .LFB0: .cfi_startproc; movsd (%rdi), %xmm0; mulsd 8(%rdi), %xmm0; mulsd .LC0(%rip), %xmm0; ret; .cfi_endproc; .LFE0: .size f, .-f; .section .rodata.cst8..
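The excerpt shows the compiler keeping a `mulsd` by a constant instead of folding the result away. A minimal sketch of one likely reason, assuming the constant is zero (as the question title suggests) and default IEEE 754 semantics without `-ffast-math`: `x * 0.0` is not always positive zero. The helper name `folds_to_positive_zero` is hypothetical and does not come from the linked question.

```cpp
#include <cmath>

// Hypothetical helper: reports whether x * 0.0 is exactly +0.0,
// i.e. whether replacing the multiply with a literal 0.0 would be safe for x.
bool folds_to_positive_zero(double x) {
    double r = x * 0.0;                  // the multiply the compiler must keep
    return r == 0.0 && !std::signbit(r); // false for NaN and for -0.0
}
```

For finite non-negative inputs the product is `+0.0`, but negative inputs give `-0.0`, and infinities or NaN give NaN, so the multiplication cannot be dropped in general.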

Is there any advantage to using pow(x,2) instead of x*x, with x double?

http://stackoverflow.com/questions/6321170/is-there-any-advantage-to-using-powx-2-instead-of-xx-with-x-double

.. pow(y, 2) Assembles to: pushq %rbp; movq %rsp, %rbp; movsd (%rdi), %xmm0; mulsd %xmm0, %xmm0; movsd %xmm0, (%rdi); movsd (%rsi), %xmm0; mulsd %xmm0, %xmm0; movsd %xmm0, (%rsi); leave; ret. So as long as you're using..
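The assembly in the excerpt loads through two pointers, multiplies each value by itself with `mulsd`, and stores the result back. A sketch of source code that could produce that shape is given here; the function names and the by-reference signature are assumptions inferred from the register usage, not quoted from the answer.

```cpp
#include <cmath>

// Assumed shape of the code behind the excerpt: square two doubles via pow.
void square_with_pow(double& x, double& y) {
    x = std::pow(x, 2);
    y = std::pow(y, 2);
}

// The hand-written equivalent using a plain multiply.
void square_with_mul(double& x, double& y) {
    x = x * x;
    y = y * y;
}
```

Whether a given compiler reduces the `pow` version to a single multiply depends on the compiler and its optimization flags, so it is worth inspecting the generated assembly rather than assuming it.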

Fast multiplication/division by 2 for floats and doubles (C/C++)

http://stackoverflow.com/questions/7720668/fast-multiplication-division-by-2-for-floats-and-doubles-c-c

.. with an inner loop of: movsd xmm1, mmword ptr [esp+eax*8+38h]; mulsd xmm1, xmm0; movsd mmword ptr [esp+eax*8+38h], xmm1; inc eax. VC10 without..
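The inner loop in the excerpt is a load, a `mulsd` by a value held in a register, and a store, repeated per element. A rough sketch of a loop with that structure follows, next to an alternative that uses the standard `std::ldexp` to adjust the exponent. Both function names are hypothetical, and no claim is made here about which variant is faster.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: doubles every element with an ordinary multiply.
void double_in_place(double* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= 2.0;
    }
}

// Hypothetical alternative: scales by 2^1 through the exponent.
void double_in_place_ldexp(double* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = std::ldexp(data[i], 1);
    }
}
```

Multiplying a finite double by 2 is exact unless it overflows, so both variants produce the same values for ordinary inputs.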

how to achieve 4 FLOPs per cycle

http://stackoverflow.com/questions/8389648/how-to-achieve-4-flops-per-cycle

.. the main loop seems kind of optimal to me: .L4: inc eax; mulsd xmm8, xmm3; mulsd xmm7, xmm3; mulsd xmm6, xmm3; mulsd xmm5, xmm3; mulsd xmm1, xmm3; addsd xmm13, xmm2; addsd..
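The loop in the excerpt updates several registers with `mulsd` and several others with `addsd`, and no instruction depends on the result of its neighbour. A minimal sketch of that pattern in C++ is shown here, with four multiply chains and four add chains. The function name, the chain count, and the starting values are all assumptions chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: runs independent multiply and add dependency chains
// so that consecutive floating-point operations do not wait on each other.
double interleaved_chains(std::size_t iterations, double mul, double add) {
    // Distinct starting values keep the chains from being trivially identical.
    double m0 = 1.0, m1 = 1.5, m2 = 2.0, m3 = 2.5;
    double a0 = 0.0, a1 = 1.0, a2 = 2.0, a3 = 3.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        m0 *= mul; m1 *= mul; m2 *= mul; m3 *= mul; // multiply chains
        a0 += add; a1 += add; a2 += add; a3 += add; // add chains
    }
    // Combine the results so the compiler cannot discard the work.
    return (m0 + m1 + m2 + m3) + (a0 + a1 + a2 + a3);
}
```

The throughput actually reached depends on the CPU, the compiler, and the flags used, so this sketch only illustrates the structure of the loop and not a guaranteed rate.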