Andrej Karpathy
|
b0cfa2458d
|
ok i can train and sample a model with a custom tokenizer
|
2023-08-11 16:47:29 +00:00 |
|
Andrej Karpathy
|
4c6f0af9ff
|
add the ability to train a custom sentencepiece tokenizer with a given vocab_size, and pretok with it. some more changes still needed to merge this branch, in train.py and ofc run.c. did this in a sadly somewhat ugly, but fully backwards compatible way. basically when we use a custom tokenizer we create a whole new directory structure for that
|
2023-08-11 03:58:22 +00:00 |
|
Milos Cubrilo
|
af3f3a7b31
|
Speed up tinystories pretokenize command
|
2023-07-29 03:08:33 +02:00 |
|
Andrej Karpathy
|
5b161abb9a
|
somewhere ~20 hours later
|
2023-07-23 05:23:45 +00:00 |
|