C++ Programming Glossary: mul1
Fast bignum square computation
http://stackoverflow.com/questions/18465326/fast-bignum-square-computation
98 32 bits
  sqr   213.989 ms ... O(N*(N+1)/2) fast sqr
  mul1  363.472 ms ... O(N^2) classic multiplication
  mul2  349.384 ms .....
x = 0.98765588997654321000... 195 32 bits
  sqr   883.01 ms
  mul1  1427.02 ms
  mul2  1089.84 ms
x = 0.98765588997654321000... 389 32.. ms
x = 0.98765588997654321000... 389 32 bits
  sqr   3189.19 ms
  mul1  5553.23 ms
  mul2  3159.07 ms
after optimizations for karatsuba..
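The timings in this entry compare a dedicated squaring routine against a classic quadratic multiplication. As a rough illustration of why squaring can be cheaper, here is a minimal sketch, not the code from the linked answer. The names `Limbs`, `mul_schoolbook` and `sqr_fast` are hypothetical, and the representation (little-endian 32-bit limbs) is an assumption. The squaring routine computes each cross product a[i]*a[j] with i < j only once, doubles the partial result, and then adds the diagonal squares, so it needs about N(N+1)/2 limb products instead of N^2.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed representation: little-endian digits in base 2^32.
using Limbs = std::vector<std::uint32_t>;

// Classic (schoolbook) product: a.size() * b.size() limb multiplications.
Limbs mul_schoolbook(const Limbs& a, const Limbs& b) {
    Limbs r(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            // Cannot overflow 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1.
            std::uint64_t t = std::uint64_t(a[i]) * b[j] + r[i + j] + carry;
            r[i + j] = std::uint32_t(t);
            carry = t >> 32;
        }
        r[i + b.size()] = std::uint32_t(carry);
    }
    return r;
}

// Squaring that exploits symmetry: a[i]*a[j] == a[j]*a[i].
Limbs sqr_fast(const Limbs& a) {
    const std::size_t n = a.size();
    Limbs r(2 * n, 0);

    // 1) Sum the off-diagonal products a[i]*a[j] for i < j only.
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = i + 1; j < n; ++j) {
            std::uint64_t t = std::uint64_t(a[i]) * a[j] + r[i + j] + carry;
            r[i + j] = std::uint32_t(t);
            carry = t >> 32;
        }
        r[i + n] = std::uint32_t(carry);
    }

    // 2) Double that sum (shift the whole number left by one bit).
    std::uint32_t top = 0;
    for (std::size_t k = 0; k < 2 * n; ++k) {
        std::uint32_t next_top = r[k] >> 31;
        r[k] = (r[k] << 1) | top;
        top = next_top;
    }

    // 3) Add the diagonal squares a[i]^2 at limb position 2*i.
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t t = std::uint64_t(a[i]) * a[i] + r[2 * i] + carry;
        r[2 * i] = std::uint32_t(t);
        t = (t >> 32) + r[2 * i + 1];
        r[2 * i + 1] = std::uint32_t(t);
        carry = t >> 32;
    }
    return r;
}
```

Under these assumptions, `sqr_fast(a)` and `mul_schoolbook(a, a)` return the same limbs for any input. The sketch says nothing about the measured ratios quoted in the excerpt, which depend on the original implementation and hardware.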
Modular arithmetics and NTT (finite field DFT) optimizations
http://stackoverflow.com/questions/18577076/modular-arithmetics-and-ntt-finite-field-dft-optimizations
1x times
  sqr1  3.177 ms     fast sqr
  sqr2  720.419 ms   NTT sqr
  mul1  5.588 ms     simple mul
  mul2  3.172 ms     karatsuba mul
  mul3  1053.382..
1x times
  sqr1  3.214 ms     fast sqr
  sqr2  208.298 ms   NTT sqr
  mul1  5.564 ms     simple mul
  mul2  3.113 ms     karatsuba mul
  mul3  302.740..
SSE2 Compiler Error
http://stackoverflow.com/questions/1874882/sse2-compiler-error
align 16 int t2 100000 temporary variable
__m128i mul1 mul2
for int j 0 j 100000 j t1 j j t2 j j 1 set temporary variables..
movdqa xmm1 xmmword ptr t2 eax
pmuludq xmm0 xmm1
movdqa mul1 xmm0
movdqa xmm0 xmmword ptr t1 eax
pshufd xmm0 xmm0 05fh
pshufd..
How to achieve 4 FLOPs per cycle
http://stackoverflow.com/questions/8389648/how-to-achieve-4-flops-per-cycle
double sum1 0.1 sum2 0.1 sum3 0.2 sum4 0.2 sum5 0.0
double mul1 1.0 mul2 1.1 mul3 1.2 mul4 1.3 mul5 1.4
int loops ops 10 we..
5.0 add loops sum1 sum2 sum3 sum4 sum5 pow mul loops mul1 mul2 mul3 mul4 mul5
for int i 0 i loops i mul1 mul mul2 mul mul3 mul mul4 mul mul5 mul sum1 add sum2 add sum3..