diff --git a/README.md b/README.md
index d2d19d9..8c36285 100644
--- a/README.md
+++ b/README.md
@@ -205,7 +205,7 @@ If compiling with gcc, try experimenting with `-funroll-all-loops`, see PR [#183
 
 ### OpenMP
 Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention, allowing the work in the loops to be split up over multiple processors.
-You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on ubuntu). I was not able to get improvements from OpenMP on my MacBook, though. Then you can compile with `make runomp`, which does:
+You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on ubuntu). Then you can compile with `make runomp`, which does:
 
 ```bash
 clang -Ofast -fopenmp -march=native run.c -lm -o run
@@ -225,6 +225,8 @@ On **Windows**, use `build_msvc.bat` in a Visual Studio Command Prompt to build
 
 On **Centos 7**, **Amazon Linux 2018** use `rungnu` Makefile target: `make rungnu` or `make runompgnu` to use openmp.
 
+On **Mac**, use clang from brew for openmp build. Install clang as `brew install llvm` and use the installed clang binary to compile with openmp: `make runomp CC=/opt/homebrew/opt/llvm/bin/clang`
+
 ## tests
 
 You can run tests simply with pytest: