c++ Programming Glossary: agner
Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513? http://stackoverflow.com/questions/11413855/why-is-transposing-a-matrix-of-512x512-much-slower-than-transposing-a-matrix-of The explanation comes from Agner Fog in Optimizing software in C++ and it reduces to how data is.. read memory. I'll try to somewhat follow the example from Agner Assume each set has 4 lines each holding 64 bytes. We first.. lines. This is the theory part. Next the explanation also Agner I'm following it closely to avoid making mistakes Assume a matrix..
cpu dispatcher for visual studio for AVX and SSE http://stackoverflow.com/questions/15406658/cpu-dispatcher-for-visual-studio-for-avx-and-sse the appropriate code path. I've followed the suggestions by Agner Fog to make a CPU dispatcher http://www.agner.org/optimize/#vectorclass .. this Edit Okay I think I isolated the problem. I'm using Agner Fog's vector class and I have defined three source files as.. as long as I don't have another source file with AVX. Agner Fog's manual says There is no advantage in using the 256 bit..
What is “cache-friendly” code? http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code about caches memory hierarchies and proper programming Agner Fog's page . In his excellent documents you can find detailed..
Why is unsigned integer overflow defined behavior but signed integer overflow isn't? http://stackoverflow.com/questions/18195715/why-is-unsigned-integer-overflow-defined-behavior-but-signed-integer-overflow-is this blog post by Ian Lance Taylor or this complaint by Agner Fog and the answers to his bug report..
Performance of built-in types : char vs short vs int vs. float vs. double http://stackoverflow.com/questions/5069489/performance-of-built-in-types-char-vs-short-vs-int-vs-float-vs-double about them and non existent otherwise. Further reading Agner Fog maintains a nice website with lots of discussion of low..
SSE SSE2 and SSE3 for GNU C++ http://stackoverflow.com/questions/661338/sse-sse2-and-sse3-for-gnu-c some very nice coverage of intrinsics and vectorization in Agner Fog's optimization PDFs thanks although it's a bit spread about..
How to write fast (low level) code? [closed] http://stackoverflow.com/questions/6852670/how-to-write-fast-low-level-code Writing High Level book Software optimization resources by Agner Fog five detailed pdf manuals I'll need a bit of skim time to..
How can adding code to a loop make it faster? http://stackoverflow.com/questions/688325/how-can-adding-code-to-a-loop-make-it-faster EDIT If you want to read on the branch prediction give Agner Fog's excellent web site a try http://www.agner.org/optimize/ This..
Using AVX CPU instructions: Poor performance without “/arch:AVX” http://stackoverflow.com/questions/7839925/using-avx-cpu-instructions-poor-performance-without-archavx the result of expensive state switching. See page 102 of Agner Fog's manual http://www.agner.org/optimize/microarchitecture.pdf ..
how to achieve 4 FLOPs per cycle http://stackoverflow.com/questions/8389648/how-to-achieve-4-flops-per-cycle mul to complete on most of the modern Intel cpu's see e.g. Agner Fog's 'Instruction Tables' . Due to pipelining one can get a..
Why is one loop so much slower than two loops? http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops going on here... Alignment could still play an effect as Agner Fog mentions cache bank conflicts . That link is about Sandy..
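
The first entry above (the 512x512 versus 513x513 transpose question) quotes the assumption that each cache set has 4 lines of 64 bytes. As a rough illustration of why that matters, the sketch below counts how many distinct cache sets a single matrix column maps to. The total cache size (8 KB), the 4-byte element size, the row-major layout, the line-aligned start address, and all names in the code are assumptions made here for illustration; they are not taken from the truncated snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical cache geometry, used only for this illustration.
constexpr std::size_t kLineBytes  = 64;        // bytes per line (quoted in the snippet)
constexpr std::size_t kWays       = 4;         // lines per set (quoted in the snippet)
constexpr std::size_t kCacheBytes = 8 * 1024;  // total cache size: an assumption
constexpr std::size_t kNumSets    = kCacheBytes / (kLineBytes * kWays);  // 32 sets

// Cache set that a byte offset maps to, assuming the matrix starts
// at a line-aligned address.
inline std::size_t cache_set(std::size_t byte_offset) {
    return (byte_offset / kLineBytes) % kNumSets;
}

// Number of distinct sets touched when walking down column 0 of a
// row-major rows x cols matrix with elements of elem_bytes each.
inline std::size_t distinct_sets_in_column(std::size_t rows,
                                           std::size_t cols,
                                           std::size_t elem_bytes = 4) {
    assert(elem_bytes > 0);
    std::set<std::size_t> sets;
    for (std::size_t r = 0; r < rows; ++r) {
        sets.insert(cache_set(r * cols * elem_bytes));
    }
    return sets.size();
}
```

Under these assumptions, `distinct_sets_in_column(512, 512)` returns 1, because the row stride of 2048 bytes equals the number of sets times the line size, so every element of the column competes for the same 4 lines. `distinct_sets_in_column(513, 513)` returns 32, so the column spreads over every set. Real caches are larger and differ in associativity, so the exact sizes at which this happens vary by CPU.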