Update README.md: add mention of -f unroll loops option for gcc
@@ -144,6 +144,8 @@ The fastest throughput I saw so far on my MacBook Air (M1) so far is with `make
You can also experiment with replacing `gcc` with `clang`.
If compiling with gcc, try experimenting with `-funroll-all-loops`; see PR [#183](https://github.com/karpathy/llama2.c/pull/183).
### OpenMP
Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention, allowing the work in the loops to be split up over multiple processors.
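As a rough illustration of what that pragma does, here is a minimal sketch of a matrix-vector multiply with the outer loop annotated. The function name `matmul_sketch` and its signature are made up for this example and are not necessarily what the repo's code looks like. Each iteration of the outer loop writes a different `xout[i]`, so the iterations are independent and OpenMP can hand them to different threads. Without `-fopenmp` the pragma is simply ignored and the loop runs serially.

```c
/* Hypothetical sketch: W is (d, n) in row-major order, x is (n,), xout is (d,). */
void matmul_sketch(float *xout, const float *x, const float *w, int n, int d) {
    /* With -fopenmp, the d rows are divided among the available threads.
       Each row only writes its own xout[i], so no synchronization is needed. */
    #pragma omp parallel for
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}
```

The inner dot product is left serial on purpose: parallelizing over rows gives each thread a large, independent chunk of work.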
I was not able to get improvements from OpenMP on my MacBook, though. To try it, first install the OpenMP library and the clang compiler (e.g. `apt install clang libomp-dev` on Ubuntu). Then you can compile with `make runomp`, which does: