From 60d32cf13a47c7d6f7fc82b4901341c4649ead0b Mon Sep 17 00:00:00 2001
From: Andrej Karpathy
Date: Sun, 23 Jul 2023 05:25:07 +0000
Subject: [PATCH] move lines around

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 1648bb3..b090fbb 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,10 @@
 ## llama2.c
 
-![llama2c](assets/llama_cute.jpg)
-
 Have you ever wanted to inference a baby [Llama 2](https://ai.meta.com/llama/) model in pure C? No? Well, now you can!
 
+![llama2c](assets/llama_cute.jpg)
+
 Code in this repo first lets you train the Llama 2 architecture from scratch in PyTorch, then save the weights to a raw binary file, then load that into one ~simple 500-line C file that inferences the model, simply in fp32 for now.
 
 Of course, this is not super fast, but it's not too bad either. E.g. on my cloud Linux devbox a dim 288 6-layer 6-head model (~15M params) inferences at ~18 tok/s in fp32, and about the same on my M1 MacBook Air.
 