llama2.c

Author	SHA1	Message	Date
Markus Zhang	6def77d4ba	Correct WandB log step	2023-08-25 17:12:29 +08:00
Andrej Karpathy	7f551dbfd7	new model export: versions 0 (legacy) and 1	2023-08-19 18:25:20 +00:00
Andrej Karpathy	38bfac90a8	bigchange: add multiquery support in run.c. we can now train and inference multiquery models (where n_kv_heads < n_heads). this also means that we, in principle, support Llama 2 34B and 70B models, which are multiquery	2023-08-13 19:34:05 +00:00
Andrej Karpathy	9c3cfb46a3	make default be the llama2 tokenizer	2023-08-13 03:08:07 +00:00
Andrej Karpathy	00a61dc7f9	remove the tinyshakespeare dataset until i can bring it back later in a nicer form, otherwise right now we just have a ton of copy paste code here	2023-08-13 02:18:30 +00:00
Andrej Karpathy	b0cfa2458d	ok i can train and sample a model with a custom tokenizer	2023-08-11 16:47:29 +00:00
Andrej Karpathy	623894f5da	fix bug, have to use raw_model not model to access the loss	2023-08-06 07:55:46 +00:00
Michael Cusack	fd5e2cc7bc	Updating training code for loss result	2023-08-04 17:03:11 +07:00
Will Lamond	e592ed5d64	Add tinyshakespeare dataset	2023-08-01 15:26:47 -07:00
Andrej Karpathy	78952fb0b4	propagate the dropout flag	2023-07-27 22:20:31 +00:00
Andrej Karpathy	517763346d	HF checkpoints i removed the optimizer to save space, init Adam without the first/second moments is ok	2023-07-27 22:20:07 +00:00
Andrew Gu	25494f9cbc	Have DDP ignore `freqs_cis` to avoid broadcast	2023-07-24 13:58:09 +00:00
Andrej Karpathy	9414e7a45e	tweaks and add a simple test	2023-07-23 14:52:08 +00:00
Andrej Karpathy	f499d9d2b5	delete debug line	2023-07-23 05:37:44 +00:00
Andrej Karpathy	5b161abb9a	somewhere ~20 hours later	2023-07-23 05:23:45 +00:00

15 Commits