C++ Programming Glossary: mul2
 Help me understand std::erase http://stackoverflow.com/questions/1821703/help-me-understand-stderase
 Fast bignum square computation http://stackoverflow.com/questions/18465326/fast-bignum-square-computation  fast sqr; mul1 363.472 ms ... O(N^2) classic multiplication; mul2 349.384 ms ... O(3*N^log2(3)) optimized Karatsuba multiplication.. 195 × 32 bits: sqr 883.01 ms, mul1 1427.02 ms, mul2 1089.84 ms; x = 0.98765588997654321000... 389 × 32 bits: sqr 3189.19 ms, mul1 5553.23 ms, mul2 3159.07 ms. After optimizations for Karatsuba the code is massively..
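The excerpt compares classic O(N^2) multiplication (mul1) with Karatsuba multiplication (mul2). To illustrate the difference, here is a minimal C++ sketch, not the poster's code. It assumes numbers are stored as little-endian vectors of 32-bit limbs. The names Limbs, mul_schoolbook, mul_karatsuba and the 32-limb cutoff are hypothetical choices.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: little-endian base-2^32 digits ("limbs").
using Limbs = std::vector<std::uint32_t>;

// Classic O(N^2) schoolbook product; the result has a.size() + b.size() limbs.
Limbs mul_schoolbook(const Limbs& a, const Limbs& b) {
    Limbs r(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            // (2^32-1)^2 + 2*(2^32-1) still fits in 64 bits.
            std::uint64_t t = std::uint64_t(a[i]) * b[j] + r[i + j] + carry;
            r[i + j] = static_cast<std::uint32_t>(t);
            carry = t >> 32;
        }
        r[i + b.size()] = static_cast<std::uint32_t>(carry);
    }
    return r;
}

// Drops leading (most significant) zero limbs.
void trim(Limbs& x) {
    while (!x.empty() && x.back() == 0) x.pop_back();
}

// Returns x + y, with one extra limb for the final carry.
Limbs add(const Limbs& x, const Limbs& y) {
    Limbs r(std::max(x.size(), y.size()) + 1, 0);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i + 1 < r.size(); ++i) {
        std::uint64_t t = carry;
        if (i < x.size()) t += x[i];
        if (i < y.size()) t += y[i];
        r[i] = static_cast<std::uint32_t>(t);
        carry = t >> 32;
    }
    r.back() = static_cast<std::uint32_t>(carry);
    return r;
}

// x -= y; requires value(x) >= value(y) and y already trimmed.
void sub_in_place(Limbs& x, const Limbs& y) {
    std::uint64_t borrow = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        std::uint64_t yi = i < y.size() ? y[i] : 0;
        std::uint64_t t = std::uint64_t(x[i]) - yi - borrow;  // wraps on underflow
        x[i] = static_cast<std::uint32_t>(t);
        borrow = t >> 63;  // top bit is set only if the subtraction wrapped
    }
}

// r += x * 2^(32*shift); the caller guarantees the sum fits in r.
void add_at(Limbs& r, const Limbs& x, std::size_t shift) {
    std::uint64_t carry = 0;
    std::size_t k = shift;
    for (std::size_t i = 0; i < x.size(); ++i, ++k) {
        std::uint64_t t = std::uint64_t(r[k]) + x[i] + carry;
        r[k] = static_cast<std::uint32_t>(t);
        carry = t >> 32;
    }
    for (; carry != 0 && k < r.size(); ++k) {
        std::uint64_t t = std::uint64_t(r[k]) + carry;
        r[k] = static_cast<std::uint32_t>(t);
        carry = t >> 32;
    }
}

// Karatsuba: three half-size products instead of four.
Limbs mul_karatsuba(const Limbs& a, const Limbs& b) {
    const std::size_t kThreshold = 32;  // hypothetical cutoff, tune per machine
    if (std::min(a.size(), b.size()) < kThreshold) return mul_schoolbook(a, b);
    const std::size_t m = std::max(a.size(), b.size()) / 2;
    auto lo = [m](const Limbs& x) {
        return Limbs(x.begin(), x.begin() + static_cast<std::ptrdiff_t>(std::min(m, x.size())));
    };
    auto hi = [m](const Limbs& x) {
        return x.size() > m ? Limbs(x.begin() + static_cast<std::ptrdiff_t>(m), x.end()) : Limbs{};
    };
    const Limbs a0 = lo(a), a1 = hi(a), b0 = lo(b), b1 = hi(b);
    Limbs z0 = mul_karatsuba(a0, b0);                      // low * low
    Limbs z2 = mul_karatsuba(a1, b1);                      // high * high
    Limbs z1 = mul_karatsuba(add(a0, a1), add(b0, b1));    // (a0+a1)(b0+b1)
    trim(z0);
    trim(z2);
    sub_in_place(z1, z0);  // z1 = a0*b1 + a1*b0
    sub_in_place(z1, z2);
    trim(z1);
    Limbs r(a.size() + b.size(), 0);
    add_at(r, z0, 0);
    add_at(r, z1, m);
    add_at(r, z2, 2 * m);
    return r;
}
```

The saving comes from computing the cross term from a single product (a0+a1)(b0+b1) minus z0 and z2, which gives the N^log2(3) exponent. Below the cutoff the schoolbook loop is usually faster, so the threshold would need measuring on the target machine.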
 modular arithmetics and NTT (finite field DFT) optimizations http://stackoverflow.com/questions/18577076/modular-arithmetics-and-ntt-finite-field-dft-optimizations  fast sqr; sqr2 720.419 ms NTT sqr; mul1 5.588 ms simple mul; mul2 3.172 ms Karatsuba mul; mul3 1053.382 ms NTT mul. Some measurements.. fast sqr; sqr2 208.298 ms NTT sqr; mul1 5.564 ms simple mul; mul2 3.113 ms Karatsuba mul; mul3 302.740 ms NTT mul. Check the NTT.. a = 0.98765588997654321000 (1553 × 32 bits), looped 1×: mul2 28.585 ms Karatsuba mul; mul3 26.311 ms NTT mul. New source code..
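The thread benchmarks NTT-based squaring and multiplication (sqr2, mul3) against simple and Karatsuba multiplication. The following is a minimal sketch of NTT multiplication over a prime field. It assumes the common NTT prime p = 998244353 = 119·2^23 + 1 with primitive root 3, which is not necessarily the modulus used in the thread. It also assumes numbers are stored as little-endian base-256 digits. The names ntt and ntt_multiply and the digit layout are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed prime p = 119 * 2^23 + 1 with primitive root 3.
constexpr std::uint32_t kMod = 998244353;
constexpr std::uint32_t kRoot = 3;

std::uint32_t mul_mod(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(std::uint64_t(a) * b % kMod);
}

std::uint32_t pow_mod(std::uint32_t base, std::uint64_t e) {
    std::uint32_t r = 1;
    while (e != 0) {
        if (e & 1) r = mul_mod(r, base);
        base = mul_mod(base, base);
        e >>= 1;
    }
    return r;
}

// In-place iterative radix-2 NTT; a.size() must be a power of two <= 2^23.
void ntt(std::vector<std::uint32_t>& a, bool invert) {
    const std::size_t n = a.size();
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    for (std::size_t len = 2; len <= n; len <<= 1) {
        std::uint32_t w = pow_mod(kRoot, (kMod - 1) / len);  // primitive len-th root of unity
        if (invert) w = pow_mod(w, kMod - 2);                // its inverse (Fermat)
        for (std::size_t i = 0; i < n; i += len) {
            std::uint32_t wn = 1;
            for (std::size_t k = 0; k < len / 2; ++k) {      // butterflies
                std::uint32_t u = a[i + k];
                std::uint32_t v = mul_mod(a[i + k + len / 2], wn);
                a[i + k] = u + v >= kMod ? u + v - kMod : u + v;
                a[i + k + len / 2] = u >= v ? u - v : u + kMod - v;
                wn = mul_mod(wn, w);
            }
        }
    }
    if (invert) {
        const std::uint32_t n_inv = pow_mod(static_cast<std::uint32_t>(n % kMod), kMod - 2);
        for (auto& x : a) x = mul_mod(x, n_inv);
    }
}

// Product of two little-endian base-256 numbers via NTT convolution plus carries.
std::vector<std::uint32_t> ntt_multiply(const std::vector<std::uint32_t>& x,
                                        const std::vector<std::uint32_t>& y) {
    if (x.empty() || y.empty()) return {};
    std::size_t n = 1;
    while (n < x.size() + y.size()) n <<= 1;
    std::vector<std::uint32_t> fa(x), fb(y);
    fa.resize(n);
    fb.resize(n);
    ntt(fa, false);
    ntt(fb, false);
    for (std::size_t i = 0; i < n; ++i) fa[i] = mul_mod(fa[i], fb[i]);  // pointwise product
    ntt(fa, true);
    std::vector<std::uint32_t> r(x.size() + y.size(), 0);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < r.size(); ++i) {  // propagate carries back to base 256
        std::uint64_t t = fa[i] + carry;
        r[i] = static_cast<std::uint32_t>(t & 0xFF);
        carry = t >> 8;
    }
    return r;
}
```

The result is exact only while every convolution coefficient stays below p, which requires min(len(x), len(y)) · 255^2 < p. That limits each operand to roughly 15,000 digits here. Larger inputs would need a bigger modulus or several primes combined via the Chinese remainder theorem.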
 SSE2 Compiler Error http://stackoverflow.com/questions/1874882/sse2-compiler-error  align(16) int t2[100000]; /* temporary variable */ __m128i mul1, mul2; for (int j = 0; j < 100000; j++) { t1[j] = j; t2[j] = j + 1; } /* set temporary variables */.. xmm0, 05fh / pshufd xmm1, xmm1, 05fh / muludq xmm0, xmm1 / movdqa mul2, xmm0 / add eax, 16 / cmp eax, 100000 / jnge label / return 0; And get the..
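The excerpt uses MASM-style inline assembly: pshufd with control 05fh, then muludq. The same sequence can also be written with SSE2 intrinsics from <emmintrin.h>. The following is a minimal sketch assuming an x86/x86-64 target with SSE2; mul_odd_lanes is a hypothetical name. With shuffle control 0x5F the 32-bit lanes become {x3, x3, x1, x1}. _mm_mul_epu32 (muludq) multiplies the low 32 bits of each 64-bit half, so it yields the full 64-bit products x3·y3 and x1·y1.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Mirrors: pshufd xmm0, xmm0, 05fh / pshufd xmm1, xmm1, 05fh / muludq xmm0, xmm1
// a and b must point to 16-byte-aligned arrays of four uint32_t.
void mul_odd_lanes(const std::uint32_t* a, const std::uint32_t* b, std::uint64_t out[2]) {
    __m128i va = _mm_load_si128(reinterpret_cast<const __m128i*>(a));  // aligned load
    __m128i vb = _mm_load_si128(reinterpret_cast<const __m128i*>(b));
    va = _mm_shuffle_epi32(va, 0x5F);       // lanes -> {a3, a3, a1, a1}
    vb = _mm_shuffle_epi32(vb, 0x5F);       // lanes -> {b3, b3, b1, b1}
    __m128i prod = _mm_mul_epu32(va, vb);   // 64-bit halves: a3*b3, a1*b1
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), prod);  // unaligned store
}
```

Intrinsics also avoid some inline-assembly restrictions. For example, MSVC does not accept __asm blocks when targeting x64.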
 how to achieve 4 FLOPs per cycle http://stackoverflow.com/questions/8389648/how-to-achieve-4-flops-per-cycle  0.1, sum2 = -0.1, sum3 = 0.2, sum4 = -0.2, sum5 = 0.0; double mul1 = 1.0, mul2 = 1.1, mul3 = 1.2, mul4 = 1.3, mul5 = 1.4; int loops = ops/10; // We have 10 floating.. 5.0*add*loops + (sum1+sum2+sum3+sum4+sum5) + pow(mul, loops)*(mul1+mul2+mul3+mul4+mul5); for (int i = 0; i < loops; i++) { mul1 *= mul; mul2 *= mul; mul3 *= mul; mul4 *= mul; mul5 *= mul; sum1 += add; sum2 += add; sum3 += add; sum4..
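The fragment comes from a benchmark kernel that keeps five independent add chains and five independent multiply chains in flight. This lets the add and multiply units overlap. Below is a minimal self-contained sketch of such a kernel. Instead of printing, it returns both the computed value and the closed-form expectation. The AddMulResult struct is hypothetical, and the timing harness is omitted. The starting values follow the excerpt, but the minus signs on sum2 and sum4 are an assumption, since the excerpt drops punctuation.

```cpp
#include <cmath>

// Hypothetical result type: the computed value and the closed-form expectation.
struct AddMulResult {
    double actual;
    double expected;
};

// Ten floating-point ops per iteration in ten independent dependency chains.
AddMulResult addmul(double add, double mul, int ops) {
    double sum1 = 0.1, sum2 = -0.1, sum3 = 0.2, sum4 = -0.2, sum5 = 0.0;
    double mul1 = 1.0, mul2 = 1.1, mul3 = 1.2, mul4 = 1.3, mul5 = 1.4;
    const int loops = ops / 10;  // 10 floating-point operations inside the loop
    // Each sum grows by add*loops; each product is scaled by mul^loops.
    const double expected = 5.0 * add * loops + (sum1 + sum2 + sum3 + sum4 + sum5) +
                            std::pow(mul, loops) * (mul1 + mul2 + mul3 + mul4 + mul5);
    for (int i = 0; i < loops; ++i) {
        mul1 *= mul; mul2 *= mul; mul3 *= mul; mul4 *= mul; mul5 *= mul;
        sum1 += add; sum2 += add; sum3 += add; sum4 += add; sum5 += add;
    }
    return {sum1 + sum2 + sum3 + sum4 + sum5 + mul1 + mul2 + mul3 + mul4 + mul5, expected};
}
```

Each accumulator depends only on its own previous value. A version with a single sum and a single product would therefore be limited by add and multiply latency rather than throughput. Whether a given build gets close to 4 FLOPs per cycle still depends on compiler flags and the target CPU.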