Merge pull request #25 from wsmoses/master

Add information on compiler flags
2023-07-23 22:12:28 -07:00
parent f6388c99c8 65e07462e4
commit 0e4076cd52
1 changed files with 3 additions and 3 deletions
@@ -104,13 +104,13 @@ gcc -O3 -o run run.c -lm

 `-O3` includes optimizations that are expensive in terms of compile time and memory usage, such as vectorization, loop unrolling, and branch prediction. Here are a few more flags to try.

-`-Ofast` TODO
+`-Ofast` enables everything in `-O3` plus additional optimizations (including `-ffast-math`) that may break compliance with the C/IEEE specifications. See [the GCC docs](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) for more information.

 `-ffast-math` breaks IEEE compliance: it allows operations to be reordered, removes checks for NaNs and infinities (assuming they never occur), enables reciprocal approximations, ignores the sign of zero, and more.

-`-funsafe-math-optimizations` TODO
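+`-funsafe-math-optimizations` is a more limited form of `-ffast-math`: it still breaks IEEE compliance (e.g. by reassociating operations and using reciprocal approximations), but it omits some of the numeric and error-handling changes made by `-ffast-math`, such as assuming NaNs and infinities never occur and not setting `errno` from math functions. See [the GCC wiki](https://gcc.gnu.org/wiki/FloatingPointMath) for more information.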

-`-march=native` TODO
+`-march=native` Compile the program for the architecture of the machine you're compiling on rather than a generic baseline CPU. This may enable hardware-specific instructions and tuning, such as wider vector instructions. Note that the resulting binary may not run on older or different CPUs.

 Putting a few of these together, the fastest throughput I've seen so far on my MacBook Air (M1) is with: