15 Commits

Author SHA1 Message Date
Markus Zhang 6def77d4ba Correct WandB log step 2023-08-25 17:12:29 +08:00
Andrej Karpathy 7f551dbfd7 new model export: versions 0 (legacy) and 1 2023-08-19 18:25:20 +00:00
Andrej Karpathy 38bfac90a8 bigchange: add multiquery support in run.c. we can now train and inference multiquery models (where n_kv_heads < n_heads). this also means that we, in principle, support Llama 2 34B and 70B models, which are multiquery 2023-08-13 19:34:05 +00:00
Andrej Karpathy 9c3cfb46a3 make default be the llama2 tokenizer 2023-08-13 03:08:07 +00:00
Andrej Karpathy 00a61dc7f9 remove the tinyshakespeare dataset until i can bring it back later in a nicer form, otherwise right now we just have a ton of copy paste code here 2023-08-13 02:18:30 +00:00
Andrej Karpathy b0cfa2458d ok i can train and sample a model with a custom tokenizer 2023-08-11 16:47:29 +00:00
Andrej Karpathy 623894f5da fix bug, have to use raw_model not model to access the loss 2023-08-06 07:55:46 +00:00
Michael Cusack fd5e2cc7bc Updating training code for loss result 2023-08-04 17:03:11 +07:00
Will Lamond e592ed5d64 Add tinyshakespeare dataset 2023-08-01 15:26:47 -07:00
Andrej Karpathy 78952fb0b4 propagate the dropout flag 2023-07-27 22:20:31 +00:00
Andrej Karpathy 517763346d HF checkpoints i removed the optimizer to save space, init Adam without the first/second moments is ok 2023-07-27 22:20:07 +00:00
Andrew Gu 25494f9cbc Have DDP ignore freqs_cis to avoid broadcast 2023-07-24 13:58:09 +00:00
Andrej Karpathy 9414e7a45e tweaks and add a simple test 2023-07-23 14:52:08 +00:00
Andrej Karpathy f499d9d2b5 delete debug line 2023-07-23 05:37:44 +00:00
Andrej Karpathy 5b161abb9a somewhere ~20 hours later 2023-07-23 05:23:45 +00:00
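
Note on commit 38bfac90a8: the message describes multiquery models as those where n_kv_heads < n_heads, meaning several query heads share one key/value head (when n_kv_heads is greater than 1 this is often called grouped-query attention). The sketch below shows one common way to map query heads to shared key/value heads. It is an illustration only, not code from this repository: the function name kv_head_for_query_head and the example head counts are hypothetical, and it assumes n_heads is an exact multiple of n_kv_heads.

```python
def kv_head_for_query_head(h: int, n_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head that query head `h` reads from.

    Hypothetical helper for illustration; assumes n_heads is divisible
    by n_kv_heads so that every KV head serves the same number of query heads.
    """
    if n_kv_heads <= 0 or n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a positive multiple of n_kv_heads")
    if not 0 <= h < n_heads:
        raise ValueError("query head index out of range")
    # Number of consecutive query heads that share one KV head.
    kv_mul = n_heads // n_kv_heads
    return h // kv_mul


# Example with made-up sizes: 8 query heads sharing 2 key/value heads.
mapping = [kv_head_for_query_head(h, n_heads=8, n_kv_heads=2) for h in range(8)]
print(mapping)
```

With n_kv_heads equal to n_heads the mapping is the identity (ordinary multi-head attention), and with n_kv_heads equal to 1 every query head reads the same key/value head.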