From 0609eb660164f9c983a4029921e23f78705bfa2a Mon Sep 17 00:00:00 2001
From: Andrej Karpathy
Date: Sat, 5 Aug 2023 17:13:35 +0000
Subject: [PATCH] slightly tune todos

---
 README.md | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 85340b9..1b46f29 100644
--- a/README.md
+++ b/README.md
@@ -230,16 +230,13 @@ If your candidate PRs have elements of these it doesn't mean they won't get merg
 
 ## unsorted todos
 
-- support Llama 2 7B Chat model and tune run.c to Chat UI/UX
+- should calculate freq_cis online in the script run.c instead of loading them
+- support Llama 2 7B Chat models and tune run.c to Chat UI/UX
 - speed up 7B Llama 2 models sufficiently to work at interactive rates on Apple Silicon MacBooks
-- possibly include emscripten / web backend (as seen in @gg PR)
-- currently the project only runs in fp32, how easy would it be to different precisions?
-- look into quantization and what would be involved
-- todo multiquery support? doesn't seem as useful for smaller models that run on CPU (?)
-- todo support inferencing beyond max_seq_len steps, have to think through the kv cache
-- why is MFU so low (~10%) on my A100 40GB for training?
-- weird errors with torch.compile and wandb when using DDP
-- (LoRA) finetuning of Llama 2 models
+- investigate precisions other than just fp32: fp16, and quantization
+- investigate running on other backends, especially GPUs
+- add multiquery support into run.c
+- (LoRA) finetuning and export of Llama 2 models
 - make more better tests to decrease yolo
 
 ## License