From 96873b02746f106eba9bb48bd91bb8ff89ef1025 Mon Sep 17 00:00:00 2001
From: Andrej Karpathy
Date: Wed, 9 Aug 2023 02:08:33 +0000
Subject: [PATCH] refine todos section make more concrete and sort

---
 README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index ccd77c5..323bbcf 100644
--- a/README.md
+++ b/README.md
@@ -241,14 +241,15 @@ If your candidate PRs have elements of these it doesn't mean they won't get merg
 
 ## unsorted todos
 
-- should calculate freq_cis online in the script run.c instead of loading them
-- support Llama 2 7B Chat models and tune run.c to Chat UI/UX
-- speed up 7B Llama 2 models sufficiently to work at interactive rates on Apple Silicon MacBooks
-- investigate precisions other than just fp32: fp16, and quantization
-- investigate running on other backends, especially GPUs
 - add multiquery support into run.c
+- add custom bpe training code and the ability to train a smaller vocabulary (32K is to much)
+- should calculate freq_cis online in the script run.c instead of loading them
+- int4/8 quantization
+- export the model in a more sensible output format with a proper header, etc.
+- train a tiny Llama test model (committed to repo) and use it as reference in unit tests
+- support Llama 2 7B Chat models and tune run.c to Chat UI/UX
+- llama2.cu investigate and merge
 - (LoRA) finetuning and export of Llama 2 models
-- make more better tests to decrease yolo
 
 ## License
 