Markus Zhang | 6def77d4ba | Correct WandB log step | 2023-08-25 17:12:29 +08:00
Andrej Karpathy | 7f551dbfd7 | new model export: versions 0 (legacy) and 1 | 2023-08-19 18:25:20 +00:00
Andrej Karpathy | 38bfac90a8 | bigchange: add multiquery support in run.c. we can now train and inference multiquery models (where n_kv_heads < n_heads). this also means that we, in principle, support Llama 2 34B and 70B models, which are multiquery | 2023-08-13 19:34:05 +00:00
Andrej Karpathy | 9c3cfb46a3 | make default be the llama2 tokenizer | 2023-08-13 03:08:07 +00:00
Andrej Karpathy | 00a61dc7f9 | remove the tinyshakespeare dataset until i can bring it back later in a nicer form, otherwise right now we just have a ton of copy paste code here | 2023-08-13 02:18:30 +00:00
Andrej Karpathy | b0cfa2458d | ok i can train and sample a model with a custom tokenizer | 2023-08-11 16:47:29 +00:00
Andrej Karpathy | 623894f5da | fix bug, have to use raw_model not model to access the loss | 2023-08-06 07:55:46 +00:00
Michael Cusack | fd5e2cc7bc | Updating training code for loss result | 2023-08-04 17:03:11 +07:00
Will Lamond | e592ed5d64 | Add tinyshakespeare dataset | 2023-08-01 15:26:47 -07:00
Andrej Karpathy | 78952fb0b4 | propagate the dropout flag | 2023-07-27 22:20:31 +00:00
Andrej Karpathy | 517763346d | HF checkpoints i removed the optimizer to save space, init Adam without the first/second moments is ok | 2023-07-27 22:20:07 +00:00
Andrew Gu | 25494f9cbc | Have DDP ignore freqs_cis to avoid broadcast | 2023-07-24 13:58:09 +00:00
Andrej Karpathy | 9414e7a45e | tweaks and add a simple test | 2023-07-23 14:52:08 +00:00
Andrej Karpathy | f499d9d2b5 | delete debug line | 2023-07-23 05:37:44 +00:00
Andrej Karpathy | 5b161abb9a | somewhere ~20 hours later | 2023-07-23 05:23:45 +00:00