From d45a36cdd2ef86b93094cd6020dd0296e8ad5667 Mon Sep 17 00:00:00 2001
From: Krishnaraj Bhat <krrishnarraj@gmail.com>
Date: Thu, 10 Aug 2023 10:59:39 +0530
Subject: [PATCH] Update readme for openmp on mac

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b62a8d8..c83b11d 100644
--- a/README.md
+++ b/README.md
@@ -160,7 +160,7 @@ If compiling with gcc, try experimenting with `-funroll-all-loops`, see PR [#183
 
 ### OpenMP
 Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention, allowing the work in the loops to be split up over multiple processors.
-You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on ubuntu). I was not able to get improvements from OpenMP on my MacBook, though. Then you can compile with `make runomp`, which does:
+You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on ubuntu). Then you can compile with `make runomp`, which does:
 
 ```bash
 clang -Ofast -fopenmp -march=native run.c  -lm  -o run
@@ -180,6 +180,8 @@ On **Windows**, use `build_msvc.bat` in a Visual Studio Command Prompt to build
 
 On **Centos 7**, **Amazon Linux 2018** use `rungnu` Makefile target: `make rungnu` or `make runompgnu` to use openmp.
 
+On **Mac**, use clang from brew for openmp build. Install clang as `brew install llvm` and use the installed clang binary to compile with openmp: `make runomp CC=/opt/homebrew/opt/llvm/bin/clang`
+
 ## ack
 
 I trained the llama2.c storyteller models on a 4X A100 40GB box graciously provided by the excellent [Lambda labs](https://lambdalabs.com/service/gpu-cloud), thank you.