slightly tune todos of the project

Andrej Karpathy
2023-07-27 23:03:35 +00:00
parent e5752e1fc9
commit 568a651c45
+4 -3
@@ -202,10 +202,11 @@ If your candidate PRs have elements of these it doesn't mean they won't get merg
 ## unsorted todos
-- why is there a leading space in C sampling code when we `./run`?
-- support Llama 2 Chat models, and tune run.c to Chat UI/UX
+- support Llama 2 7B Chat model and tune run.c to Chat UI/UX
+- speed up 7B Llama 2 models sufficiently to work at interactive rates on Apple Silicon MacBooks
 - possibly include emscripten / web backend (as seen in @gg PR)
-- currently the project only runs in fp32, want to explore more reduced precision inference.
+- currently the project only runs in fp32, how easy would it be to different precisions?
+- look into quantization and what would be involved
 - todo multiquery support? doesn't seem as useful for smaller models that run on CPU (?)
 - todo support inferencing beyond max_seq_len steps, have to think through the kv cache
 - why is MFU so low (~10%) on my A100 40GB for training?