touchups to readme: reshuffle todos, and add a windows note
@@ -155,18 +155,9 @@ OMP_NUM_THREADS=4 ./run out/model.bin
 
 Depending on your system resources you may want to tweak these hyperparameters. (TODO: I am not intimately familiar with OpenMP and its configuration, if someone would like to flesh out this section I would welcome a PR).
 
-## unsorted todos
+## platforms
 
-- why is there a leading space in C sampling code when we `./run`?
-- support Llama 2 Chat models, and tune run.c to Chat UI/UX
-- possibly include emscripten / web backend (as seen in @gg PR)
-- currently the project only runs in fp32, want to explore more reduced precision inference.
-- todo multiquery support? doesn't seem as useful for smaller models that run on CPU (?)
-- todo support inferencing beyond max_seq_len steps, have to think through the kv cache
-- why is MFU so low (~10%) on my A100 40GB for training?
-- weird errors with torch.compile and wandb when using DDP
-- (LoRA) finetuning of Llama 2 models
-- make more better tests to decrease yolo
+On **Windows**, use `build_msvc.bat` in a Visual Studio Command Prompt to build with msvc, or you can use `make win64` to use mingw compiler toolchain from linux or windows to build the windows target. MSVC build will automatically use openmp and max threads appropriate for your CPU unless you set `OMP_NUM_THREADS` env.
 
 ## ack
 
@@ -199,6 +190,19 @@ If your candidate PRs have elements of these it doesn't mean they won't get merg
 - [llama2.rs](https://github.com/gaxler/llama2.rs) by @gaxler: a Rust port of this project
 - [go-llama2](https://github.com/tmc/go-llama2) by @tmc: a Go port of this project
 
+## unsorted todos
+
+- why is there a leading space in C sampling code when we `./run`?
+- support Llama 2 Chat models, and tune run.c to Chat UI/UX
+- possibly include emscripten / web backend (as seen in @gg PR)
+- currently the project only runs in fp32, want to explore more reduced precision inference.
+- todo multiquery support? doesn't seem as useful for smaller models that run on CPU (?)
+- todo support inferencing beyond max_seq_len steps, have to think through the kv cache
+- why is MFU so low (~10%) on my A100 40GB for training?
+- weird errors with torch.compile and wandb when using DDP
+- (LoRA) finetuning of Llama 2 models
+- make more better tests to decrease yolo
+
 ## License
 
 MIT