From d45a36cdd2ef86b93094cd6020dd0296e8ad5667 Mon Sep 17 00:00:00 2001
From: Krishnaraj Bhat
Date: Thu, 10 Aug 2023 10:59:39 +0530
Subject: [PATCH] Update readme for openmp on mac

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b62a8d8..c83b11d 100644
--- a/README.md
+++ b/README.md
@@ -160,7 +160,7 @@ If compiling with gcc, try experimenting with `-funroll-all-loops`, see PR [#183
 ### OpenMP
 Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention, allowing the work in the loops to be split up over multiple processors.
 
-You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on ubuntu). I was not able to get improvements from OpenMP on my MacBook, though. Then you can compile with `make runomp`, which does:
+You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on ubuntu). Then you can compile with `make runomp`, which does:
 
 ```bash
 clang -Ofast -fopenmp -march=native run.c -lm -o run
@@ -180,6 +180,8 @@ On **Windows**, use `build_msvc.bat` in a Visual Studio Command Prompt to build
 
 On **Centos 7**, **Amazon Linux 2018** use `rungnu` Makefile target: `make rungnu` or `make runompgnu` to use openmp.
 
+On **Mac**, use clang from Homebrew for the OpenMP build. Install it with `brew install llvm`, then compile with the installed clang binary: `make runomp CC=/opt/homebrew/opt/llvm/bin/clang`
+
 ## ack
 
 I trained the llama2.c storyteller models on a 4X A100 40GB box graciously provided by the excellent [Lambda labs](https://lambdalabs.com/service/gpu-cloud), thank you.