Merge pull request #25 from wsmoses/master
Add information on compiler flags
@@ -104,13 +104,13 @@ gcc -O3 -o run run.c -lm
`-O3` includes optimizations that are expensive in terms of compile time and memory usage, including vectorization, loop unrolling, and predicting branches. Here are a few more to try.
-`-Ofast` TODO
+`-Ofast` Runs additional optimizations on top of `-O3` that may break compliance with the C/IEEE specifications. See [the GCC docs](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) for more information.
`-ffast-math` breaks IEEE compliance: it allows reordering of operations, disables checks for special values such as NaNs (assuming they don't happen), enables reciprocal approximations, ignores the sign of zero, etc.
-`-funsafe-math-optimizations` TODO
+`-funsafe-math-optimizations` A more limited form of `-ffast-math` that still breaks IEEE compliance but doesn't have all of the numeric/error handling changes from `-ffast-math`. See [the GCC wiki](https://gcc.gnu.org/wiki/FloatingPointMath) for more information.
-`-march=native` TODO
+`-march=native` Compiles the program for the architecture of the machine you're compiling on rather than a more generic CPU. This may enable additional optimizations and hardware-specific tuning, such as newer or wider vector instructions.
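One way to see what `-march=native` changes is to check which SIMD feature macros the compiler predefines for the chosen target. The sketch below assumes GCC or Clang; `simd_target` is a hypothetical helper name, and the list of macros is illustrative rather than exhaustive.

```c
/* Returns a short label for the widest SIMD extension the compiler
 * was allowed to target, judged only by predefined feature macros.
 * This reflects compile-time flags, not a runtime CPU check. */
const char *simd_target(void) {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__ARM_NEON)
    return "NEON";
#else
    return "none detected";
#endif
}
```

On x86-64, a generic build would typically report `SSE2`, while a `-march=native` build on a newer CPU may report `AVX2` or `AVX-512`. On AArch64 targets such as the M1, NEON is part of the baseline, so this particular label would likely stay the same even if other tuning changes.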
Putting a few of these together, the fastest throughput I saw so far on my MacBook Air (M1) is with: