Merge pull request #25 from wsmoses/master

Add information on compiler flags
Andrej
2023-07-23 22:12:28 -07:00
committed by GitHub
+3 -3
@@ -104,13 +104,13 @@ gcc -O3 -o run run.c -lm
-O3 includes optimizations that are expensive in terms of compile time and memory usage, including vectorization, loop unrolling, and branch prediction. Here are a few more to try.
`-Ofast` TODO
`-Ofast` Enables everything in `-O3` plus additional optimizations that may break compliance with the C/IEEE specifications. See [the GCC docs](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) for more information.
`-ffast-math` breaks IEEE compliance: it allows reordering of operations, disables a bunch of checks for e.g. NaNs (assuming they don't happen), enables reciprocal approximations, ignores signed zeros, etc.
`-funsafe-math-optimizations` TODO
`-funsafe-math-optimizations` A more limited form of `-ffast-math` that still breaks IEEE compliance but doesn't include all of the numeric/error handling changes from `-ffast-math`. See [the GCC wiki](https://gcc.gnu.org/wiki/FloatingPointMath) for more information.
`-march=native` TODO
`-march=native` Compile the program for the architecture of the machine you're compiling on rather than a more generic CPU. This may enable additional optimizations and hardware-specific tuning, such as newer or wider vector instructions.
Putting a few of these together, the fastest throughput I've seen so far on my MacBook Air (M1) is with: