Commit Graph

421 Commits

Author SHA1 Message Date
Diego Marcos Segura 19cfbeca71 Fix typo in README.md 2023-08-24 19:46:43 -07:00
Andrej d7cd98633d add todo item to add a PyTorch Engine 2023-08-24 09:04:52 -07:00
Andrej Karpathy c7a26264a2 Merge branch 'master' of github.com:karpathy/llama2.c 2023-08-24 03:10:18 +00:00
Andrej Karpathy 446c1c0df3 Merge branch 'janimo-train-vocab-python' 2023-08-24 03:10:07 +00:00
Andrej Karpathy 096325b66c bring back num_threads 2023-08-24 03:09:55 +00:00
Andrej 90104db721 Merge pull request #348 from nehzata/clip_steps
Clip steps maximum value
2023-08-23 19:57:01 -07:00
Ali Nehzat 9bc72acab0 steps shouldn't exceed the model's seq_len either 2023-08-24 09:09:16 +10:00
Jani Monoses fe9b9f2f15 Train vocab in Python 2023-08-23 19:10:28 +03:00
Andrej Karpathy 7ac65cb2c2 make decode safer and fix issue with skipping bad byte tokens 2023-08-23 01:08:31 +00:00
Andrej Karpathy 4b3e66021a lol text 2023-08-23 00:26:47 +00:00
Andrej Karpathy d1eb18b8ec add BOS and EOS function to the Tokenizer as we start to converge closer to the Llama 2 code from Meta, and as we're about to add the Chat capability 2023-08-23 00:08:22 +00:00
Andrej Karpathy d26a499207 absorb our rng state into the Sampler. I feel that this is correct because it makes our use of entropy very explicit and localized, and the sampler is now well-contained without any global state. Code is increasingly more beautiful. 2023-08-22 03:22:56 +00:00
Andrej Karpathy ac6cf8d6e8 tweak todo list 2023-08-22 02:48:51 +00:00
Andrej Karpathy ad7a1ef525 clean up swiglu a little bit 2023-08-22 02:32:21 +00:00
Andrej Karpathy 0e362f735f and finallygit add run.c split off the generate function. alongside it will come a chat function. we are close 2023-08-22 02:22:36 +00:00
Andrej Karpathy d73b917d3b hide temperature and topp into the sampler, it's a little bit less flexible but a little bit more cleaner 2023-08-22 02:17:51 +00:00
Andrej Karpathy 379f083b85 make sorted vocab a buffer of Tokenizer 2023-08-22 01:56:51 +00:00
Andrej 5eaca535cd Merge pull request #335 from ozabluda/ozabluda-patch-5
Remove unneeded check of free(NULL)
2023-08-21 18:16:07 -07:00
Andrej Karpathy 83287ff254 fix steps=0 is max context 2023-08-22 01:15:00 +00:00
Oleg Zabluda c2834c8a1f Remove unneeded check of free(NULL)
Passing NULL to free() is totally allowed
2023-08-21 10:54:53 -07:00
Andrej ee95b1bf29 Merge pull request #315 from davidar/vocab_source
Fix vocab_source in sample.py
2023-08-21 08:26:28 -07:00
Andrej Karpathy d02e0c90d8 Merge branch 'rdentato-patch-check-params' 2023-08-21 15:17:37 +00:00
Andrej Karpathy 33d94f60a5 parameter validation cleanup 2023-08-21 15:17:14 +00:00
Remo Dentato 2d972f1763 Merge branch 'karpathy:master' into patch-check-params 2023-08-21 17:02:42 +02:00
Andrej 8a3ea7b433 Merge pull request #329 from atamurad/import_meta
Moved export_meta_llama_bin.py to new export.py
2023-08-21 07:34:32 -07:00
atamyrat 61c26d5392 Updated README to replace export_meta_llama_bin.py script with export.py 2023-08-21 14:24:01 +03:00
atamyrat 36a78af5e1 tested load_meta_model() in export.py, deleting old export_meta_llama_bin.py file 2023-08-21 14:19:56 +03:00
atamyrat de005474d3 Added load_meta_model() to export.py 2023-08-21 14:13:47 +03:00
rdentato 4444575c4e Added check of generation parameters. 2023-08-21 06:43:39 +00:00
Andrej Karpathy dd61b13e57 delete the save_torchscript export file, but copy its content to the new export.py for the future maybe 2023-08-21 05:09:06 +00:00
Andrej Karpathy ea44f53568 now that the export.py HF functionality is in master, we can delete this file, and update the readme 2023-08-21 04:58:19 +00:00
Andrej 801c68f5a1 Merge pull request #326 from atamurad/import_hf
Added huggingface model loader/importer to export.py
2023-08-20 21:53:17 -07:00
Andrej 74a68eeb35 Merge pull request #325 from HarryGifford/users/hegi/update-readme-threading
Update readme with suggestion on number of threads to use
2023-08-20 21:50:26 -07:00
Andrej Karpathy 288b3cec09 remove dagger in the eyeball 2023-08-21 04:47:49 +00:00
Andrej Karpathy 14275bd623 minor clean. i think a lot of chaos has been reduced for today. we shall now rest. 2023-08-21 04:43:24 +00:00
Andrej Karpathy 3868f732a4 and finally refactor the Sampler. things are starting to look a lot cleaner I think 2023-08-21 04:23:02 +00:00
Andrej Karpathy 8a377a1d31 refactor the Transformer (Config, Weights, RunState) into a single object, with build and free too 2023-08-21 03:55:12 +00:00
Andrej Karpathy ae2e4f8d88 name the tokenizer methods cleaner: encode and decode 2023-08-21 03:11:54 +00:00
atamyrat 0dd82158f6 removed transformers from requirements.txt, added error message 2023-08-21 06:07:29 +03:00
atamyrat 155475a523 Fix WQ and WK permutation in huggingface models 2023-08-21 05:16:11 +03:00
atamyrat d7704bdeaa mark ModelArgs.hidden_dim as optional and calculate as previously if not provided 2023-08-21 03:40:34 +03:00
atamyrat 09db52c69e Added huggingface model loader to export.py 2023-08-21 02:59:12 +03:00
Harry Gifford a72b3b0206 Update readme with suggestion on number of threads to use
Update the documentation to make suggestions on the number of threads. The performance difference can be very large. Also linked to the PyTorch docs which are relevant here.
2023-08-20 15:01:33 -07:00
Andrej Karpathy c74456f3f0 refactor step 1. the tokenizer, and all the other abstractions, are a total mess, refactoring things a bit 2023-08-20 18:18:23 +00:00
Andrej Karpathy 1e335a41cf remove freq_cis fields as they are not used anymore 2023-08-20 17:26:43 +00:00
Andrej Karpathy c0511de617 probindex should never have been part of RunState. i apologize for this failure of abstraction 2023-08-20 17:18:06 +00:00
Andrej 8c93c7a30e Merge pull request #322 from karpathy/feature/export
New model export (the code remains "dead" and legacy version is still the default behavior, so no breaking changes are introduced). The major benefit is a new export.py file, which we can use to centralize work on formatting: both imports and exports.
2023-08-20 10:08:32 -07:00
Andrej Karpathy 13dcee493a todos update 2023-08-20 17:02:22 +00:00
Andrej Karpathy f3db92a2dc use out_file.tell() instead of nbytes += arithmetic 2023-08-20 16:51:35 +00:00
Andrej Karpathy fa8dfd854e isolate read_checkpoint, because i'd like to now make it support both version 0 and version 1 2023-08-19 19:21:12 +00:00