From 9055766cf6349b1c9d3914f49e8ec7f597f8f1b2 Mon Sep 17 00:00:00 2001
From: Andrej Karpathy
Date: Mon, 24 Jul 2023 14:08:06 +0000
Subject: [PATCH] docs on how to run with openmp

---
 README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/README.md b/README.md
index 48655d7..0738675 100644
--- a/README.md
+++ b/README.md
@@ -120,6 +120,20 @@ gcc -Ofast -o run run.c -lm
 
 Also, I saw someone report higher throughput replacing `gcc` with `clang`.
 
+**OpenMP** Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul. You can compile e.g. like so:
+
+```bash
+clang -Ofast -fopenmp -march=native run.c -lm -o run
+```
+
+(I believe you can swap clang/gcc, and may try to leave out -march=native). Then when you run inference, make sure to use OpenMP flags to set the number of threads, e.g.:
+
+```bash
+OMP_NUM_THREADS=4 ./run out/model.bin
+```
+
+Depending on your system resources you may want to tweak these hyperparameters. (TODO: I am not intimately familiar with OpenMP and its configuration, if someone would like to flesh out this section I would welcome a PR).
+
 ## unsorted todos
 
 - why is there a leading space in C sampling code when we `./run`?
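
Note on what `-fopenmp` "activates": the patch itself only touches README.md, so the following is a minimal sketch of a matmul carrying a `#pragma omp parallel for`, not the actual code in run.c. The signature, the row-major `(d, n)` weight layout, and the `max_threads` helper are assumptions made for illustration. Compiled without `-fopenmp`, the pragma is ignored and the loop runs on one thread; compiled with it, the outer loop is split across however many threads `OMP_NUM_THREADS` allows.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* only available/needed when compiled with -fopenmp */
#endif

/* Hypothetical matmul sketch: xout = W @ x, with W stored row-major as (d, n).
 * Each output row is independent, so the outer loop is safe to parallelize. */
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    assert(n > 0 && d > 0);
    int i;
    /* Ignored (serial loop) unless the compiler is given -fopenmp. */
    #pragma omp parallel for private(i)
    for (i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}

/* Hypothetical helper: upper bound on threads a parallel region may use.
 * With OpenMP this reflects OMP_NUM_THREADS; without it, always 1. */
int max_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Printing `max_threads()` at startup is one quick way to check that both the `-fopenmp` build and the `OMP_NUM_THREADS` setting took effect.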