C++ Programming Glossary: mul3

Fast bignum square computation
http://stackoverflow.com/questions/18465326/fast-bignum-square-computation
... ms ... O(3·N^log2(3)) optimized karatsuba multiplication; mul3 9345.127 ms ... O(3·N^log2(3)) unoptimized karatsuba multiplication .. sqr: mul1 5.588 ms simple mul, mul2 3.172 ms karatsuba mul, mul3 1053.382 ms NTT mul .. my implementation: void arbnum::sqr_NTT(const .. 1553 32-bit .. looped 1x: mul2 28.585 ms karatsuba mul, mul3 26.311 ms NTT mul .. So now NTT multiplication is finally faster ..
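The excerpt above compares schoolbook, Karatsuba and NTT squaring of big numbers. As a rough illustration of the Karatsuba idea applied to squaring (three half-size squarings instead of four half-size products), here is a minimal sketch. It is not the arbnum code from the question: it assumes little-endian base-2^16 limbs so that unnormalized column sums fit in 64-bit words, and the names square_schoolbook, square_karatsuba, bignum_sqr and the 32-limb cutoff are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Column sums of a product, before any carry propagation (hypothetical layout).
using Poly = std::vector<std::uint64_t>;

// Schoolbook squaring of a coefficient vector: O(n^2) digit products, no carries.
static Poly square_schoolbook(const Poly& a) {
    Poly r(a.empty() ? 0 : 2 * a.size() - 1, 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a.size(); ++j)
            r[i + j] += a[i] * a[j];
    return r;
}

// Karatsuba squaring. With a = lo + hi*B^h:
//   a^2 = lo^2 + ((lo + hi)^2 - lo^2 - hi^2) * B^h + hi^2 * B^(2h)
// i.e. three half-size squarings instead of four half-size products.
// Unsigned wrap-around in the subtractions is harmless: all arithmetic is
// modulo 2^64 and the true column sums of a^2 fit in 64 bits for practical sizes.
static Poly square_karatsuba(const Poly& a) {
    const std::size_t n = a.size();
    if (n <= 32) return square_schoolbook(a);  // arbitrary cutoff
    const std::size_t h = n / 2;
    const auto split = a.begin() + static_cast<std::ptrdiff_t>(h);
    const Poly lo(a.begin(), split), hi(split, a.end());  // hi.size() >= lo.size()

    Poly sum(hi);  // lo + hi, coefficient-wise
    for (std::size_t i = 0; i < lo.size(); ++i) sum[i] += lo[i];

    const Poly z0 = square_karatsuba(lo);
    const Poly z2 = square_karatsuba(hi);
    Poly z1 = square_karatsuba(sum);
    for (std::size_t i = 0; i < z0.size(); ++i) z1[i] -= z0[i];
    for (std::size_t i = 0; i < z2.size(); ++i) z1[i] -= z2[i];

    Poly r(2 * n - 1, 0);
    for (std::size_t i = 0; i < z0.size(); ++i) r[i] += z0[i];
    for (std::size_t i = 0; i < z1.size(); ++i) r[i + h] += z1[i];
    for (std::size_t i = 0; i < z2.size(); ++i) r[i + 2 * h] += z2[i];
    return r;
}

// Square a bignum stored as little-endian base-2^16 limbs, propagating carries once.
std::vector<std::uint16_t> bignum_sqr(const std::vector<std::uint16_t>& x) {
    const Poly c = square_karatsuba(Poly(x.begin(), x.end()));
    std::vector<std::uint16_t> out;
    std::uint64_t carry = 0;
    for (std::uint64_t col : c) {
        const std::uint64_t v = col + carry;
        out.push_back(static_cast<std::uint16_t>(v & 0xFFFFu));
        carry = v >> 16;
    }
    while (carry != 0) {
        out.push_back(static_cast<std::uint16_t>(carry & 0xFFFFu));
        carry >>= 16;
    }
    while (!out.empty() && out.back() == 0) out.pop_back();  // drop leading zero limbs
    return out;
}
```

Carries are propagated only once, at the end, which keeps the recursive part free of carry handling; the price is that each column must stay within 64 bits, which is why this sketch uses 16-bit rather than 32-bit limbs.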

Modular arithmetics and NTT (finite field DFT) optimizations
http://stackoverflow.com/questions/18577076/modular-arithmetics-and-ntt-finite-field-dft-optimizations
... sqr: mul1 5.588 ms simple mul, mul2 3.172 ms karatsuba mul, mul3 1053.382 ms NTT mul .. some measurements after my optimizations .. sqr: mul1 5.564 ms simple mul, mul2 3.113 ms karatsuba mul, mul3 302.740 ms NTT mul .. check the NTT mul and NTT sqr times .. my optimizations .. 1553 32-bit .. looped 1x: mul2 28.585 ms karatsuba mul, mul3 26.311 ms NTT mul .. new source code for modular arithmetics ..
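This excerpt concerns the modular arithmetic underneath an NTT, that is, a DFT computed in the finite field Z/pZ instead of over the complex numbers. A minimal sketch of those building blocks follows; it is not the author's code. It assumes the commonly used prime p = 998244353 = 119·2^23 + 1 with primitive root 3 (the modulus used in the question is not visible in the excerpt), a plain iterative radix-2 transform, and hypothetical helper names (mod_add, mod_sub, mod_mul, mod_pow, ntt, ntt_multiply).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed NTT-friendly prime: P = 119 * 2^23 + 1, primitive root G = 3.
constexpr std::uint32_t P = 998244353u;
constexpr std::uint32_t G = 3u;

std::uint32_t mod_add(std::uint32_t a, std::uint32_t b) {
    std::uint32_t s = a + b;  // a, b < P < 2^30, so no 32-bit overflow
    return s >= P ? s - P : s;
}
std::uint32_t mod_sub(std::uint32_t a, std::uint32_t b) {
    return a >= b ? a - b : a + P - b;
}
std::uint32_t mod_mul(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(a) * b % P);
}
std::uint32_t mod_pow(std::uint32_t b, std::uint64_t e) {  // square-and-multiply
    std::uint32_t r = 1;
    while (e) { if (e & 1) r = mod_mul(r, b); b = mod_mul(b, b); e >>= 1; }
    return r;
}

// In-place iterative radix-2 NTT; a.size() must be a power of two <= 2^23.
void ntt(std::vector<std::uint32_t>& a, bool inverse) {
    const std::size_t n = a.size();
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    for (std::size_t len = 2; len <= n; len <<= 1) {
        std::uint32_t w = mod_pow(G, (P - 1) / len);  // primitive len-th root of unity
        if (inverse) w = mod_pow(w, P - 2);           // its inverse (Fermat)
        for (std::size_t i = 0; i < n; i += len) {
            std::uint32_t wk = 1;
            for (std::size_t k = 0; k < len / 2; ++k) {  // butterflies
                std::uint32_t u = a[i + k], v = mod_mul(a[i + k + len / 2], wk);
                a[i + k] = mod_add(u, v);
                a[i + k + len / 2] = mod_sub(u, v);
                wk = mod_mul(wk, w);
            }
        }
    }
    if (inverse) {  // scale by n^-1 mod P
        std::uint32_t n_inv = mod_pow(static_cast<std::uint32_t>(n % P), P - 2);
        for (auto& x : a) x = mod_mul(x, n_inv);
    }
}

// Linear convolution of two digit vectors via forward NTT, pointwise product, inverse NTT.
std::vector<std::uint32_t> ntt_multiply(std::vector<std::uint32_t> a,
                                        std::vector<std::uint32_t> b) {
    if (a.empty() || b.empty()) return {};
    const std::size_t need = a.size() + b.size() - 1;
    std::size_t n = 1;
    while (n < need) n <<= 1;  // zero-pad so the cyclic convolution does not wrap
    a.resize(n);
    b.resize(n);
    ntt(a, false);
    ntt(b, false);
    for (std::size_t i = 0; i < n; ++i) a[i] = mod_mul(a[i], b[i]);
    ntt(a, true);
    a.resize(need);
    return a;
}
```

Squaring is the same call with both arguments equal; a dedicated squaring routine can skip one of the three transforms. The results come back reduced mod p, so the digit base must be small enough that every true convolution coefficient stays below p before carries are propagated.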

How to achieve 4 FLOPs per cycle
http://stackoverflow.com/questions/8389648/how-to-achieve-4-flops-per-cycle
... -0.1, sum3 = 0.2, sum4 = -0.2, sum5 = 0.0; double mul1 = 1.0, mul2 = 1.1, mul3 = 1.2, mul4 = 1.3, mul5 = 1.4; int loops = ops / 10; // we have 10 floating point .. loops + (sum1 + sum2 + sum3 + sum4 + sum5) + pow(mul, loops) * (mul1 + mul2 + mul3 + mul4 + mul5); for (int i = 0; i < loops; i++) { mul1 *= mul; mul2 *= mul; mul3 *= mul; mul4 *= mul; mul5 *= mul; sum1 += add; sum2 += add; sum3 += add; sum4 += add; sum5 += add; ..
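This excerpt is the inner loop of the benchmark from that question: five multiply chains and five add chains per iteration, checked against a closed-form expected value. A minimal sketch modeled on that loop follows; the struct AddMulResult, the function name addmul_sketch and the std::chrono timing are assumptions added for illustration, not code from the excerpt. The point of the structure is that the ten accumulators are independent of one another, so a pipelined, out-of-order core can overlap their latencies. This scalar version only shows the loop shape and makes no claim about reaching 4 FLOPs per cycle on any particular CPU.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

struct AddMulResult {
    double residual;  // loop result minus the closed-form expected value
    double seconds;   // wall-clock time spent in the loop
};

// Ten independent dependency chains per iteration: 5 multiplies and 5 adds.
AddMulResult addmul_sketch(double add, double mul, long ops) {
    // Distinct starting values, as in the excerpt, so the chains are not identical.
    double sum1 = 0.1, sum2 = -0.1, sum3 = 0.2, sum4 = -0.2, sum5 = 0.0;
    double mul1 = 1.0, mul2 = 1.1, mul3 = 1.2, mul4 = 1.3, mul5 = 1.4;
    const long loops = ops / 10;  // 10 floating-point ops per iteration
    const double expected = 5.0 * add * loops + (sum1 + sum2 + sum3 + sum4 + sum5)
        + std::pow(mul, static_cast<double>(loops)) * (mul1 + mul2 + mul3 + mul4 + mul5);

    const auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < loops; ++i) {
        mul1 *= mul; mul2 *= mul; mul3 *= mul; mul4 *= mul; mul5 *= mul;
        sum1 += add; sum2 += add; sum3 += add; sum4 += add; sum5 += add;
    }
    const auto t1 = std::chrono::steady_clock::now();

    const double total = sum1 + sum2 + sum3 + sum4 + sum5
                       + mul1 + mul2 + mul3 + mul4 + mul5;
    return {total - expected, std::chrono::duration<double>(t1 - t0).count()};
}
```

The residual should be close to zero, and because the returned value depends on every chain, the compiler cannot simply discard the loop. For large op counts, keep mul close to 1 so the products neither overflow nor underflow; dividing 10 * loops by the measured seconds and the clock rate gives a rough FLOPs-per-cycle figure.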