7b0017c6cd
Merge pull request #362 from byte-6174/upmaster
Andrej
2023-08-26 14:03:31 -07:00
50832e3dff
move script into the new docs folder
Andrej Karpathy
2023-08-26 21:02:23 +00:00
1386edfd90
add docs on stories260K
Andrej Karpathy
2023-08-26 20:52:49 +00:00
32cecbfe4a
freeing tokenizer in test.c
Aniket
2023-08-26 16:35:50 -04:00
e47bacdc62
Merge pull request #355 from janimo/export-vocab-size
Andrej
2023-08-26 13:24:55 -07:00
604d3c59c0
Add Code Llama info
Jani Monoses
2023-08-26 22:36:09 +03:00
2c2b284988
Get vocab_size from token embeddings size
Jani Monoses
2023-08-26 22:35:55 +03:00
49daf18f2f
Merge pull request #343 from karpathy/feature/chat
Andrej
2023-08-25 08:00:11 -07:00
4a7a62bd21
Merge branch 'master' into feature/chat
feature/chat
Andrej
2023-08-25 07:58:33 -07:00
5c6427e4d7
Merge pull request #352 from dmarcos/readmeTypo
Andrej
2023-08-25 07:55:54 -07:00
cbc2488b82
Merge pull request #353 from photomz/master
Andrej
2023-08-25 07:55:26 -07:00
fbe324fc5a
adjust things a bit
Andrej Karpathy
2023-08-25 14:54:05 +00:00
6def77d4ba
Correct WandB log step
Markus Zhang
2023-08-25 17:12:29 +08:00
19cfbeca71
Fix typo in README.md
Diego Marcos Segura
2023-08-24 19:45:23 -07:00
d7cd98633d
add todo item to add a PyTorch Engine
Andrej
2023-08-24 09:04:52 -07:00
3d787b2463
ok getting closer, and manually verified correctness of the schema matching python. still some weirdness in the printing to chase down, and also have to tune the buffer lengths and make them sensible and such
Andrej Karpathy
2023-08-24 04:31:06 +00:00
40fb902cf0
fix chat format bug i think
Andrej Karpathy
2023-08-24 03:33:44 +00:00
c7a26264a2
Merge branch 'master' of github.com:karpathy/llama2.c
Andrej Karpathy
2023-08-24 03:10:18 +00:00
446c1c0df3
Merge branch 'janimo-train-vocab-python'
Andrej Karpathy
2023-08-24 03:10:07 +00:00
096325b66c
bring back num_threads
Andrej Karpathy
2023-08-24 03:09:55 +00:00
90104db721
Merge pull request #348 from nehzata/clip_steps
Andrej
2023-08-23 19:57:01 -07:00
9bc72acab0
steps shouldn't exceed the model's seq_len either
Ali Nehzat
2023-08-24 09:09:16 +10:00
c5e0e7fce4
attempt at chat function, but it was 8AM and I didn't have coffee yet. Seems to work but it's probably subtly broken or too complex. version 1 only, lots of hard-coded non-sensical buffer sizes. Have to go to work now
Andrej Karpathy
2023-08-23 16:27:48 +00:00
fe9b9f2f15
Train vocab in Python
Jani Monoses
2023-08-23 17:28:14 +03:00
7ac65cb2c2
make decode safer and fix issue with skipping bad byte tokens
Andrej Karpathy
2023-08-23 01:08:31 +00:00
4b3e66021a
lol text
Andrej Karpathy
2023-08-23 00:26:47 +00:00
d1eb18b8ec
add BOS and EOS function to the Tokenizer as we start to converge closer to the Llama 2 code from Meta, and as we're about to add the Chat capability
Andrej Karpathy
2023-08-23 00:08:22 +00:00
d26a499207
absorb our rng state into the Sampler. I feel that this is correct because it makes our use of entropy very explicit and localized, and the sampler is now well-contained without any global state. Code is increasingly more beautiful.
Andrej Karpathy
2023-08-22 03:22:56 +00:00
ac6cf8d6e8
tweak todo list
Andrej Karpathy
2023-08-22 02:48:51 +00:00
ad7a1ef525
clean up swiglu a little bit
Andrej Karpathy
2023-08-22 02:32:21 +00:00
0e362f735f
and finallygit add run.c split off the generate function. alongside it will come a chat function. we are close
Andrej Karpathy
2023-08-22 02:22:36 +00:00
d73b917d3b
hide temperature and topp into the sampler, it's a little bit less flexible but a little bit more cleaner
Andrej Karpathy
2023-08-22 02:17:51 +00:00
379f083b85
make sorted vocab a buffer of Tokenizer
Andrej Karpathy
2023-08-22 01:56:51 +00:00
5eaca535cd
Merge pull request #335 from ozabluda/ozabluda-patch-5
Andrej
2023-08-21 18:16:07 -07:00
83287ff254
fix steps=0 is max context
Andrej Karpathy
2023-08-22 01:15:00 +00:00
8a3ea7b433
Merge pull request #329 from atamurad/import_meta
Andrej
2023-08-21 07:34:32 -07:00
61c26d5392
Updated README to replace export_meta_llama_bin.py script with export.py
atamyrat
2023-08-21 14:24:01 +03:00
36a78af5e1
tested load_meta_model() in export.py, deleting old export_meta_llama_bin.py file
atamyrat
2023-08-21 14:19:56 +03:00
de005474d3
Added load_meta_model() to export.py
atamyrat
2023-08-21 14:13:47 +03:00
4444575c4e
Added check of generation parameters.
rdentato
2023-08-21 06:43:39 +00:00
dd61b13e57
delete the save_torchscript export file, but copy its content to the new export.py for the future maybe
Andrej Karpathy
2023-08-21 05:09:06 +00:00
ea44f53568
now that the export.py HF functionality is in master, we can delete this file, and update the readme
Andrej Karpathy
2023-08-21 04:58:19 +00:00
801c68f5a1
Merge pull request #326 from atamurad/import_hf
Andrej
2023-08-20 21:53:17 -07:00
74a68eeb35
Merge pull request #325 from HarryGifford/users/hegi/update-readme-threading
Andrej
2023-08-20 21:50:26 -07:00
288b3cec09
remove dagger in the eyeball
Andrej Karpathy
2023-08-21 04:47:49 +00:00
14275bd623
minor clean. i think a lot of chaos has been reduced for today. we shall now rest.
Andrej Karpathy
2023-08-21 04:43:24 +00:00
3868f732a4
and finally refactor the Sampler. things are starting to look a lot cleaner I think
Andrej Karpathy
2023-08-21 04:23:02 +00:00
8a377a1d31
refactor the Transformer (Config, Weights, RunState) into a single object, with build and free too
Andrej Karpathy
2023-08-21 03:55:12 +00:00
ae2e4f8d88
name the tokenizer methods cleaner: encode and decode
Andrej Karpathy
2023-08-21 03:11:54 +00:00
155475a523
Fix WQ and WK permutation in huggingface models
atamyrat
2023-08-21 05:16:11 +03:00
d7704bdeaa
mark ModelArgs.hidden_dim as optional and calculate as previously if not provided
atamyrat
2023-08-21 03:40:34 +03:00
09db52c69e
Added huggingface model loader to export.py
atamyrat
2023-08-21 02:53:50 +03:00
a72b3b0206
Update readme with suggestion on number of threads to use
Harry Gifford
2023-08-20 15:01:33 -07:00
c74456f3f0
refactor step 1. the tokenizer, and all the other abstractions, are a total mess, refactoring things a bit
Andrej Karpathy
2023-08-20 18:18:23 +00:00
1e335a41cf
remove freq_cis fields as they are not used anymore
Andrej Karpathy
2023-08-20 17:26:43 +00:00
c0511de617
probindex should never have been part of RunState. i apologize for this failure of abstraction
Andrej Karpathy
2023-08-20 17:18:06 +00:00
8c93c7a30e
Merge pull request #322 from karpathy/feature/export
Andrej
2023-08-20 10:08:32 -07:00
13dcee493a
todos update
Andrej Karpathy
2023-08-20 17:02:22 +00:00
f3db92a2dc
use out_file.tell() instead of nbytes += arithmetic
Andrej Karpathy
2023-08-20 16:51:35 +00:00
fa8dfd854e
isolate read_checkpoint, because i'd like to now make it support both version 0 and version 1
Andrej Karpathy
2023-08-19 19:21:12 +00:00
4df5e2e939
make version 1 be the legacy export but with new header. version 2 will be Q8_0 export
Andrej Karpathy
2023-08-19 18:51:32 +00:00
4212bd6d43
oops fix double indent on quantize def
Andrej Karpathy
2023-08-19 18:34:49 +00:00
7f551dbfd7
new model export: versions 0 (legacy) and 1
Andrej Karpathy
2023-08-19 18:25:20 +00:00
6c5d78fa41
Merge pull request #317 from yiminghan/yhan/old
Andrej
2023-08-19 10:01:08 -07:00
db1a722816
Merge pull request #318 from rahoua/master
Andrej
2023-08-19 10:00:56 -07:00
d2a546c577
Merge pull request #319 from RahulSChand/warning
Andrej
2023-08-19 10:00:27 -07:00
fbefeec1b1
add assert message to give better warning
rahulschand
2023-08-19 13:05:26 +05:30
978c311b30
Add pecca-rs to README.md
rahoua
2023-08-18 14:58:21 -07:00
882e480bc0
update read me
YiMing Han
2023-08-18 15:18:29 -04:00
d09ebbb32b
Revert "working one"
YiMing Han
2023-08-18 15:14:08 -04:00
bc7cb7d0e8
Revert "only dart"
YiMing Han
2023-08-18 15:13:59 -04:00
01df3731d6
only dart
YiMing Han
2023-08-18 15:09:24 -04:00
8607b11ea1
working one
YiMing Han
2023-08-18 15:07:41 -04:00
039a9713c2
ok this first version works but i don't think is ready to merge, have to think on more
feature/int8
Andrej Karpathy
2023-08-18 15:44:02 +00:00
52fe3653e5
Fix vocab_source in sample.py
David A Roberts
2023-08-18 18:40:25 +10:00
591f1353c7
ok this works but is super slow because we are doing all the work in fp32 still
Andrej Karpathy
2023-08-18 03:40:18 +00:00
e9cbe3e84f
small improvements to comments and warnings and increase header size during model export
Andrej Karpathy
2023-08-17 14:32:22 +00:00
5e2e5b28f4
re-write the model export to do int8 quantization in groups, with group size fallback, and also change the header to be much better
Andrej Karpathy
2023-08-17 05:56:20 +00:00
bd182289c5
calculate the freq_cis online, no need to write/read them to/from checkpoints
Andrej Karpathy
2023-08-17 04:13:13 +00:00
b68a6d2ab5
Merge pull request #307 from madroidmaq/master
Andrej
2023-08-16 20:09:32 -07:00
57bf0e9ee4
Merge pull request #306 from rdentato/patch-utf8-no-validation
Andrej
2023-08-16 09:51:11 -07:00