Diego Marcos Segura
19cfbeca71
Fix typo in README.md
2023-08-24 19:46:43 -07:00
Andrej
d7cd98633d
add todo item to add a PyTorch Engine
2023-08-24 09:04:52 -07:00
Andrej Karpathy
c7a26264a2
Merge branch 'master' of github.com:karpathy/llama2.c
2023-08-24 03:10:18 +00:00
Andrej Karpathy
446c1c0df3
Merge branch 'janimo-train-vocab-python'
2023-08-24 03:10:07 +00:00
Andrej Karpathy
096325b66c
bring back num_threads
2023-08-24 03:09:55 +00:00
Andrej
90104db721
Merge pull request #348 from nehzata/clip_steps
Clip steps maximum value
2023-08-23 19:57:01 -07:00
Ali Nehzat
9bc72acab0
steps shouldn't exceed the model's seq_len either
2023-08-24 09:09:16 +10:00
Jani Monoses
fe9b9f2f15
Train vocab in Python
2023-08-23 19:10:28 +03:00
Andrej Karpathy
7ac65cb2c2
make decode safer and fix issue with skipping bad byte tokens
2023-08-23 01:08:31 +00:00
Andrej Karpathy
4b3e66021a
lol text
2023-08-23 00:26:47 +00:00
Andrej Karpathy
d1eb18b8ec
add BOS and EOS function to the Tokenizer as we start to converge closer to the Llama 2 code from Meta, and as we're about to add the Chat capability
2023-08-23 00:08:22 +00:00
Andrej Karpathy
d26a499207
absorb our rng state into the Sampler. I feel that this is correct because it makes our use of entropy very explicit and localized, and the sampler is now well-contained without any global state. Code is increasingly more beautiful.
2023-08-22 03:22:56 +00:00
Andrej Karpathy
ac6cf8d6e8
tweak todo list
2023-08-22 02:48:51 +00:00
Andrej Karpathy
ad7a1ef525
clean up swiglu a little bit
2023-08-22 02:32:21 +00:00
Andrej Karpathy
0e362f735f
and finally split off the generate function. alongside it will come a chat function. we are close
2023-08-22 02:22:36 +00:00
Andrej Karpathy
d73b917d3b
hide temperature and topp into the sampler, it's a little bit less flexible but a little bit cleaner
2023-08-22 02:17:51 +00:00
Andrej Karpathy
379f083b85
make sorted vocab a buffer of Tokenizer
2023-08-22 01:56:51 +00:00
Andrej
5eaca535cd
Merge pull request #335 from ozabluda/ozabluda-patch-5
Remove unneeded check of free(NULL)
2023-08-21 18:16:07 -07:00
Andrej Karpathy
83287ff254
fix steps=0 is max context
2023-08-22 01:15:00 +00:00
Oleg Zabluda
c2834c8a1f
Remove unneeded check of free(NULL)
Passing NULL to free() is totally allowed
2023-08-21 10:54:53 -07:00
Andrej
ee95b1bf29
Merge pull request #315 from davidar/vocab_source
Fix vocab_source in sample.py
2023-08-21 08:26:28 -07:00
Andrej Karpathy
d02e0c90d8
Merge branch 'rdentato-patch-check-params'
2023-08-21 15:17:37 +00:00
Andrej Karpathy
33d94f60a5
parameter validation cleanup
2023-08-21 15:17:14 +00:00
Remo Dentato
2d972f1763
Merge branch 'karpathy:master' into patch-check-params
2023-08-21 17:02:42 +02:00
Andrej
8a3ea7b433
Merge pull request #329 from atamurad/import_meta
Moved export_meta_llama_bin.py to new export.py
2023-08-21 07:34:32 -07:00
atamyrat
61c26d5392
Updated README to replace export_meta_llama_bin.py script with export.py
2023-08-21 14:24:01 +03:00
atamyrat
36a78af5e1
tested load_meta_model() in export.py, deleting old export_meta_llama_bin.py file
2023-08-21 14:19:56 +03:00
atamyrat
de005474d3
Added load_meta_model() to export.py
2023-08-21 14:13:47 +03:00
rdentato
4444575c4e
Added check of generation parameters.
2023-08-21 06:43:39 +00:00
Andrej Karpathy
dd61b13e57
delete the save_torchscript export file, but copy its content to the new export.py for the future maybe
2023-08-21 05:09:06 +00:00
Andrej Karpathy
ea44f53568
now that the export.py HF functionality is in master, we can delete this file, and update the readme
2023-08-21 04:58:19 +00:00
Andrej
801c68f5a1
Merge pull request #326 from atamurad/import_hf
Added huggingface model loader/importer to export.py
2023-08-20 21:53:17 -07:00
Andrej
74a68eeb35
Merge pull request #325 from HarryGifford/users/hegi/update-readme-threading
Update readme with suggestion on number of threads to use
2023-08-20 21:50:26 -07:00
Andrej Karpathy
288b3cec09
remove dagger in the eyeball
2023-08-21 04:47:49 +00:00
Andrej Karpathy
14275bd623
minor clean. i think a lot of chaos has been reduced for today. we shall now rest.
2023-08-21 04:43:24 +00:00
Andrej Karpathy
3868f732a4
and finally refactor the Sampler. things are starting to look a lot cleaner I think
2023-08-21 04:23:02 +00:00
Andrej Karpathy
8a377a1d31
refactor the Transformer (Config, Weights, RunState) into a single object, with build and free too
2023-08-21 03:55:12 +00:00
Andrej Karpathy
ae2e4f8d88
name the tokenizer methods cleaner: encode and decode
2023-08-21 03:11:54 +00:00
atamyrat
0dd82158f6
removed transformers from requirements.txt, added error message
2023-08-21 06:07:29 +03:00
atamyrat
155475a523
Fix WQ and WK permutation in huggingface models
2023-08-21 05:16:11 +03:00
atamyrat
d7704bdeaa
mark ModelArgs.hidden_dim as optional and calculate as previously if not provided
2023-08-21 03:40:34 +03:00
atamyrat
09db52c69e
Added huggingface model loader to export.py
2023-08-21 02:59:12 +03:00
Harry Gifford
a72b3b0206
Update readme with suggestion on number of threads to use
Update the documentation to make suggestions on the number of threads. The performance difference can be very large. Also linked to the PyTorch docs which are relevant here.
2023-08-20 15:01:33 -07:00
Andrej Karpathy
c74456f3f0
refactor step 1. the tokenizer, and all the other abstractions, are a total mess, refactoring things a bit
2023-08-20 18:18:23 +00:00
Andrej Karpathy
1e335a41cf
remove freq_cis fields as they are not used anymore
2023-08-20 17:26:43 +00:00
Andrej Karpathy
c0511de617
probindex should never have been part of RunState. i apologize for this failure of abstraction
2023-08-20 17:18:06 +00:00
Andrej
8c93c7a30e
Merge pull request #322 from karpathy/feature/export
New model export (the code remains "dead" and legacy version is still the default behavior, so no breaking changes are introduced). The major benefit is a new export.py file, which we can use to centralize work on formatting: both imports and exports.
2023-08-20 10:08:32 -07:00
Andrej Karpathy
13dcee493a
todos update
2023-08-20 17:02:22 +00:00
Andrej Karpathy
f3db92a2dc
use out_file.tell() instead of nbytes += arithmetic
2023-08-20 16:51:35 +00:00
Andrej Karpathy
fa8dfd854e
isolate read_checkpoint, because i'd like to now make it support both version 0 and version 1
2023-08-19 19:21:12 +00:00