From 0609eb660164f9c983a4029921e23f78705bfa2a Mon Sep 17 00:00:00 2001
From: Andrej Karpathy <andrej.karpathy@gmail.com>
Date: Sat, 5 Aug 2023 17:13:35 +0000
Subject: [PATCH] slightly tune todos

---
 README.md | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 85340b9..1b46f29 100644
--- a/README.md
+++ b/README.md
@@ -230,16 +230,13 @@ If your candidate PRs have elements of these it doesn't mean they won't get merg
 
 ## unsorted todos
 
-- support Llama 2 7B Chat model and tune run.c to Chat UI/UX
+- compute freq_cis online in run.c instead of loading them
+- support Llama 2 7B Chat models and tune run.c to Chat UI/UX
 - speed up 7B Llama 2 models sufficiently to work at interactive rates on Apple Silicon MacBooks
-- possibly include emscripten / web backend (as seen in @gg PR)
-- currently the project only runs in fp32, how easy would it be to different precisions?
-- look into quantization and what would be involved
-- todo multiquery support? doesn't seem as useful for smaller models that run on CPU (?)
-- todo support inferencing beyond max_seq_len steps, have to think through the kv cache
-- why is MFU so low (~10%) on my A100 40GB for training?
-- weird errors with torch.compile and wandb when using DDP
-- (LoRA) finetuning of Llama 2 models
+- investigate precisions other than fp32: fp16 and quantization
+- investigate running on other backends, especially GPUs
+- add multiquery support to run.c
+- (LoRA) finetuning and export of Llama 2 models
 - make more better tests to decrease yolo
 
 ## License