From 4e23ad83995601b63a7697ef27d0ba958480b908 Mon Sep 17 00:00:00 2001
From: Andrej Karpathy
Date: Thu, 27 Jul 2023 06:17:13 +0000
Subject: [PATCH] touchups to readme: reshuffle todos, and add a windows note

---
 README.md | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index ac19680..7b6251e 100644
--- a/README.md
+++ b/README.md
@@ -155,18 +155,9 @@ OMP_NUM_THREADS=4 ./run out/model.bin
 
 Depending on your system resources you may want to tweak these hyperparameters. (TODO: I am not intimately familiar with OpenMP and its configuration, if someone would like to flesh out this section I would welcome a PR).
 
-## unsorted todos
+## platforms
 
-- why is there a leading space in C sampling code when we `./run`?
-- support Llama 2 Chat models, and tune run.c to Chat UI/UX
-- possibly include emscripten / web backend (as seen in @gg PR)
-- currently the project only runs in fp32, want to explore more reduced precision inference.
-- todo multiquery support? doesn't seem as useful for smaller models that run on CPU (?)
-- todo support inferencing beyond max_seq_len steps, have to think through the kv cache
-- why is MFU so low (~10%) on my A100 40GB for training?
-- weird errors with torch.compile and wandb when using DDP
-- (LoRA) finetuning of Llama 2 models
-- make more better tests to decrease yolo
+On **Windows**, use `build_msvc.bat` in a Visual Studio Command Prompt to build with msvc, or you can use `make win64` to use mingw compiler toolchain from linux or windows to build the windows target. MSVC build will automatically use openmp and max threads appropriate for your CPU unless you set `OMP_NUM_THREADS` env.
 
 ## ack
 
@@ -199,6 +190,19 @@ If your candidate PRs have elements of these it doesn't mean they won't get merg
 - [llama2.rs](https://github.com/gaxler/llama2.rs) by @gaxler: a Rust port of this project
 - [go-llama2](https://github.com/tmc/go-llama2) by @tmc: a Go port of this project
 
+## unsorted todos
+
+- why is there a leading space in C sampling code when we `./run`?
+- support Llama 2 Chat models, and tune run.c to Chat UI/UX
+- possibly include emscripten / web backend (as seen in @gg PR)
+- currently the project only runs in fp32, want to explore more reduced precision inference.
+- todo multiquery support? doesn't seem as useful for smaller models that run on CPU (?)
+- todo support inferencing beyond max_seq_len steps, have to think through the kv cache
+- why is MFU so low (~10%) on my A100 40GB for training?
+- weird errors with torch.compile and wandb when using DDP
+- (LoRA) finetuning of Llama 2 models
+- make more better tests to decrease yolo
+
 ## License
 
 MIT