108 Commits

Author SHA1 Message Date
Andrej 4a7a62bd21 Merge branch 'master' into feature/chat 2023-08-25 07:58:33 -07:00
Andrej Karpathy fbe324fc5a adjust things a bit 2023-08-25 14:54:05 +00:00
Andrej Karpathy 3d787b2463 ok getting closer, and manually verified correctness of the schema matching python. still some weirdness in the printing to chase down, and also have to tune the buffer lengths and make them sensible and such 2023-08-24 04:31:06 +00:00
Andrej Karpathy 40fb902cf0 fix chat format bug i think 2023-08-24 03:33:44 +00:00
Ali Nehzat 9bc72acab0 steps shouldn't exceed the model's seq_len either 2023-08-24 09:09:16 +10:00
Andrej Karpathy c5e0e7fce4 attempt at chat function, but it was 8AM and I didn't have coffee yet. Seems to work but it's probably subtly broken or too complex. version 1 only, lots of hard-coded non-sensical buffer sizes. Have to go to work now 2023-08-23 16:27:48 +00:00
Andrej Karpathy 7ac65cb2c2 make decode safer and fix issue with skipping bad byte tokens 2023-08-23 01:08:31 +00:00
Andrej Karpathy d1eb18b8ec add BOS and EOS function to the Tokenizer as we start to converge closer to the Llama 2 code from Meta, and as we're about to add the Chat capability 2023-08-23 00:08:22 +00:00
Andrej Karpathy d26a499207 absorb our rng state into the Sampler. I feel that this is correct because it makes our use of entropy very explicit and localized, and the sampler is now well-contained without any global state. Code is increasingly more beautiful. 2023-08-22 03:22:56 +00:00
Andrej Karpathy ad7a1ef525 clean up swiglu a little bit 2023-08-22 02:32:21 +00:00
Andrej Karpathy 0e362f735f and finallygit add run.c split off the generate function. alongside it will come a chat function. we are close 2023-08-22 02:22:36 +00:00
Andrej Karpathy d73b917d3b hide temperature and topp into the sampler, it's a little bit less flexible but a little bit more cleaner 2023-08-22 02:17:51 +00:00
Andrej Karpathy 379f083b85 make sorted vocab a buffer of Tokenizer 2023-08-22 01:56:51 +00:00
Andrej 5eaca535cd Merge pull request #335 from ozabluda/ozabluda-patch-5: Remove unneeded check of free(NULL) 2023-08-21 18:16:07 -07:00
Andrej Karpathy 83287ff254 fix steps=0 is max context 2023-08-22 01:15:00 +00:00
Oleg Zabluda c2834c8a1f Remove unneeded check of free(NULL). Passing NULL to free() is totally allowed 2023-08-21 10:54:53 -07:00
Andrej Karpathy 33d94f60a5 parameter validation cleanup 2023-08-21 15:17:14 +00:00
rdentato 4444575c4e Added check of generation parameters. 2023-08-21 06:43:39 +00:00
Andrej Karpathy 288b3cec09 remove dagger in the eyeball 2023-08-21 04:47:49 +00:00
Andrej Karpathy 14275bd623 minor clean. i think a lot of chaos has been reduced for today. we shall now rest. 2023-08-21 04:43:24 +00:00
Andrej Karpathy 3868f732a4 and finally refactor the Sampler. things are starting to look a lot cleaner I think 2023-08-21 04:23:02 +00:00
Andrej Karpathy 8a377a1d31 refactor the Transformer (Config, Weights, RunState) into a single object, with build and free too 2023-08-21 03:55:12 +00:00
Andrej Karpathy ae2e4f8d88 name the tokenizer methods cleaner: encode and decode 2023-08-21 03:11:54 +00:00
Andrej Karpathy c74456f3f0 refactor step 1. the tokenizer, and all the other abstractions, are a total mess, refactoring things a bit 2023-08-20 18:18:23 +00:00
Andrej Karpathy 1e335a41cf remove freq_cis fields as they are not used anymore 2023-08-20 17:26:43 +00:00
Andrej Karpathy c0511de617 probindex should never have been part of RunState. i apologize for this failure of abstraction 2023-08-20 17:18:06 +00:00
Andrej Karpathy fa8dfd854e isolate read_checkpoint, because i'd like to now make it support both version 0 and version 1 2023-08-19 19:21:12 +00:00
Andrej Karpathy bd182289c5 calculate the freq_cis online, no need to write/read them to/from checkpoints 2023-08-17 04:13:13 +00:00
rdentato 55e60740f5 Added space to str_buffer in case max_token_length is 1. 2023-08-16 07:58:07 +00:00
rdentato befe4867b3 minimal protection against invalid UTF8 encoding. 2023-08-16 07:42:53 +00:00
Andrej Karpathy ca67253f28 smallfix: not sure what the point of this indirection was 2023-08-15 16:09:33 +00:00
Andrej Karpathy 4c63c5608d shorten top comment on run.c file 2023-08-15 16:07:48 +00:00
Andrej Karpathy a47f9b3969 collapsing copy paste code because it's driving my ocd crazy 2023-08-15 16:03:11 +00:00
Andrej Karpathy a9a0628c92 thoroughly commented the UTF-8 byte reading code 2023-08-15 02:18:49 +00:00
Andrej Karpathy d459fd4243 add back careful processing of the byte tokens 2023-08-15 01:42:33 +00:00
Andrej Karpathy 4bf36ecc17 get rid of the special byte decoding logic 2023-08-15 01:04:10 +00:00
Andrej Karpathy 8417cb438d Merge branch 'utf8' of https://github.com/atamurad/llama2.c into feature/utf8 2023-08-15 00:18:53 +00:00
Andrej Karpathy 32c1ff97fb missed p->dim to kv_dim for k,v vectors. we're not doing anything wrong we're just being wasteful with memory. thanks @xefoci7612 for pointing out 2023-08-14 14:52:07 +00:00
Andrej Karpathy 45afa91dca the accum function has been bothering me, there is no real need to add a function here, it does something trivial and is only used twice, scrap 2023-08-14 02:54:27 +00:00
Andrej Karpathy 854c97b660 turn topp 0.9 back on by default thanks to recent PR contributions truncating before quicksort 2023-08-14 00:12:45 +00:00
Andrej 4a2c375df9 Merge pull request #276 from jrudolph/improve-top-p: optimize sample_topp by filtering out small value elements up front 2023-08-13 17:05:38 -07:00
atamyrat 36b54321e5 bugfix: allocate +1 in tokens buffer for dummy whitespace 2023-08-13 23:23:32 +03:00
Andrej Karpathy 38bfac90a8 bigchange: add multiquery support in run.c. we can now train and inference multiquery models (where n_kv_heads < n_heads). this also means that we, in principle, support Llama 2 34B and 70B models, which are multiquery 2023-08-13 19:34:05 +00:00
atamyrat daa9fd9b8a sort vocabulary for faster lookup with bsearch() 2023-08-13 15:02:11 +03:00
Andrej Karpathy f5fc0c245f final piece: run.c support for new tokenizer, super ez 2023-08-13 02:12:13 +00:00
Johannes Rudolph d421a95b2b optimize sample_topp by filtering out small value elements up front. This works because we know that in worst case only 1 element will be selected and therefore the remaining (n-1) elements have to split the remaining (1-topp) probability. Probabilities smaller than that cannot be selected and can be filtered out up front. 2023-08-12 20:31:19 +02:00
Andrej Karpathy c42641205f turn off topp sampling by default because it is a bit too slow to be the default. it is likely that turning it on, e.g. -p 0.9 is midlly higher quality and safer samples, but this comes at a cost of too much performance in double digit percent sometimes, for it to be on by default i think... 2023-08-10 15:23:05 +00:00
atamyrat c02865df30 prompt tokenizer improvements: utf8 support, add_dummy_prefix and byte_fallback options to match sentencepiece 2023-08-07 13:12:44 +03:00
rdentato ff6a2f0a7a Reset the #include <omp.h> 2023-08-07 07:28:03 +00:00
rdentato e49c16caa5 Changed how rng_seed is handled. Now 0 is treated as time(NULL). 2023-08-07 06:51:57 +00:00
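
Note on commit d421a95b2b: its message argues that any probability below (1-topp)/(n-1) can never be part of the top-p set, so such entries can be dropped before sorting. The following is a minimal sketch of that idea, not the code from run.c. The names `ProbIndex`, `compare_desc`, and `sample_topp_sketch`, the `coin` parameter (a uniform random number in [0, 1) supplied by the caller), and the argmax fallback for very small `topp` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type for this sketch: a probability and its token index. */
typedef struct {
    float prob;
    int index;
} ProbIndex;

/* qsort comparator: sort by probability, largest first. */
static int compare_desc(const void *a, const void *b) {
    const ProbIndex *pa = (const ProbIndex *)a;
    const ProbIndex *pb = (const ProbIndex *)b;
    if (pa->prob > pb->prob) return -1;
    if (pa->prob < pb->prob) return 1;
    return 0;
}

/* Top-p (nucleus) sampling with an up-front filter.
 * probs:   n probabilities that sum to 1
 * topp:    nucleus mass, 0 < topp < 1
 * coin:    uniform random number in [0, 1), supplied by the caller
 * scratch: caller-provided buffer with room for n entries
 * Returns the sampled token index. */
int sample_topp_sketch(const float *probs, int n, float topp, float coin,
                       ProbIndex *scratch) {
    assert(n > 0 && topp > 0.0f && topp < 1.0f);
    if (n == 1) return 0;

    /* Any entry that is in the nucleus without being the largest must be
     * at least this big, so smaller entries are skipped before sorting. */
    const float cutoff = (1.0f - topp) / (float)(n - 1);

    int kept = 0;
    int best = 0; /* argmax, used as a fallback if nothing passes the cutoff */
    for (int i = 0; i < n; i++) {
        if (probs[i] > probs[best]) best = i;
        if (probs[i] >= cutoff) {
            scratch[kept].prob = probs[i];
            scratch[kept].index = i;
            kept++;
        }
    }
    /* Assumption of this sketch: for very small topp even the largest entry
     * can fall under the cutoff; the nucleus is then just the argmax. */
    if (kept == 0) return best;

    /* Only the surviving candidates are sorted. */
    qsort(scratch, (size_t)kept, sizeof(ProbIndex), compare_desc);

    /* Find the shortest prefix whose cumulative mass exceeds topp. */
    float cumulative = 0.0f;
    int last = kept - 1;
    for (int i = 0; i < kept; i++) {
        cumulative += scratch[i].prob;
        if (cumulative > topp) {
            last = i;
            break;
        }
    }

    /* Sample inside that prefix, rescaling coin to the prefix mass. */
    const float r = coin * cumulative;
    float cdf = 0.0f;
    for (int i = 0; i <= last; i++) {
        cdf += scratch[i].prob;
        if (r < cdf) return scratch[i].index;
    }
    return scratch[last].index; /* guards against rounding error */
}
```

With a typical vocabulary of tens of thousands of tokens and topp around 0.9, most entries fall under the cutoff, so the sort runs over a much smaller array. That is presumably why commit 854c97b660 could turn topp 0.9 back on by default.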