Andrej
4a7a62bd21
Merge branch 'master' into feature/chat
2023-08-25 07:58:33 -07:00
Andrej Karpathy
fbe324fc5a
adjust things a bit
2023-08-25 14:54:05 +00:00
Andrej Karpathy
3d787b2463
ok getting closer, and manually verified correctness of the schema matching python. still some weirdness in the printing to chase down, and also have to tune the buffer lengths and make them sensible and such
2023-08-24 04:31:06 +00:00
Andrej Karpathy
40fb902cf0
fix chat format bug i think
2023-08-24 03:33:44 +00:00
Ali Nehzat
9bc72acab0
steps shouldn't exceed the model's seq_len either
2023-08-24 09:09:16 +10:00
Andrej Karpathy
c5e0e7fce4
attempt at chat function, but it was 8AM and I didn't have coffee yet. Seems to work but it's probably subtly broken or too complex. version 1 only, lots of hard-coded nonsensical buffer sizes. Have to go to work now
2023-08-23 16:27:48 +00:00
Andrej Karpathy
7ac65cb2c2
make decode safer and fix issue with skipping bad byte tokens
2023-08-23 01:08:31 +00:00
Andrej Karpathy
d1eb18b8ec
add BOS and EOS function to the Tokenizer as we start to converge closer to the Llama 2 code from Meta, and as we're about to add the Chat capability
2023-08-23 00:08:22 +00:00
Andrej Karpathy
d26a499207
absorb our rng state into the Sampler. I feel that this is correct because it makes our use of entropy very explicit and localized, and the sampler is now well-contained without any global state. Code is increasingly more beautiful.
2023-08-22 03:22:56 +00:00
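
As an illustration of the idea in the commit above (random state owned by the sampler instead of a global), here is a minimal sketch. The `Sampler` struct is reduced to its RNG field, and the names `sampler_init`, `sampler_u32` and `sampler_f32` are hypothetical, not taken from run.c. The generator shown is a standard xorshift* step.

```c
#include <assert.h>

/* Hypothetical, reduced sampler: only the RNG state is shown here. */
typedef struct {
    unsigned long long rng_state; /* all entropy lives in this one field */
} Sampler;

void sampler_init(Sampler *s, unsigned long long seed) {
    assert(seed != 0); /* an all-zero xorshift state would never change */
    s->rng_state = seed;
}

/* xorshift* step: advances the sampler's state and returns 32 random bits */
unsigned int sampler_u32(Sampler *s) {
    s->rng_state ^= s->rng_state >> 12;
    s->rng_state ^= s->rng_state << 25;
    s->rng_state ^= s->rng_state >> 27;
    return (unsigned int)((s->rng_state * 0x2545F4914F6CDD1DULL) >> 32);
}

/* uniform float in [0, 1), built from the top 24 bits */
float sampler_f32(Sampler *s) {
    return (sampler_u32(s) >> 8) / 16777216.0f;
}
```

Because no state is shared, two samplers seeded identically produce identical streams, which makes runs reproducible and keeps independent samplers from interfering with each other.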
Andrej Karpathy
ad7a1ef525
clean up swiglu a little bit
2023-08-22 02:32:21 +00:00
Andrej Karpathy
0e362f735f
and finally split off the generate function. alongside it will come a chat function. we are close
2023-08-22 02:22:36 +00:00
Andrej Karpathy
d73b917d3b
hide temperature and topp in the sampler, it's a little bit less flexible but a little bit cleaner
2023-08-22 02:17:51 +00:00
Andrej Karpathy
379f083b85
make sorted vocab a buffer of Tokenizer
2023-08-22 01:56:51 +00:00
Andrej
5eaca535cd
Merge pull request #335 from ozabluda/ozabluda-patch-5
...
Remove unneeded check of free(NULL)
2023-08-21 18:16:07 -07:00
Andrej Karpathy
83287ff254
fix steps=0 is max context
2023-08-22 01:15:00 +00:00
Oleg Zabluda
c2834c8a1f
Remove unneeded check of free(NULL)
...
Passing NULL to free() is totally allowed
2023-08-21 10:54:53 -07:00
Andrej Karpathy
33d94f60a5
parameter validation cleanup
2023-08-21 15:17:14 +00:00
rdentato
4444575c4e
Added check of generation parameters.
2023-08-21 06:43:39 +00:00
Andrej Karpathy
288b3cec09
remove dagger in the eyeball
2023-08-21 04:47:49 +00:00
Andrej Karpathy
14275bd623
minor clean. i think a lot of chaos has been reduced for today. we shall now rest.
2023-08-21 04:43:24 +00:00
Andrej Karpathy
3868f732a4
and finally refactor the Sampler. things are starting to look a lot cleaner I think
2023-08-21 04:23:02 +00:00
Andrej Karpathy
8a377a1d31
refactor the Transformer (Config, Weights, RunState) into a single object, with build and free too
2023-08-21 03:55:12 +00:00
Andrej Karpathy
ae2e4f8d88
name the tokenizer methods cleaner: encode and decode
2023-08-21 03:11:54 +00:00
Andrej Karpathy
c74456f3f0
refactor step 1. the tokenizer, and all the other abstractions, are a total mess, refactoring things a bit
2023-08-20 18:18:23 +00:00
Andrej Karpathy
1e335a41cf
remove freq_cis fields as they are not used anymore
2023-08-20 17:26:43 +00:00
Andrej Karpathy
c0511de617
probindex should never have been part of RunState. i apologize for this failure of abstraction
2023-08-20 17:18:06 +00:00
Andrej Karpathy
fa8dfd854e
isolate read_checkpoint, because i'd like to now make it support both version 0 and version 1
2023-08-19 19:21:12 +00:00
Andrej Karpathy
bd182289c5
calculate the freq_cis online, no need to write/read them to/from checkpoints
2023-08-17 04:13:13 +00:00
rdentato
55e60740f5
Added space to str_buffer in case max_token_length is 1.
2023-08-16 07:58:07 +00:00
rdentato
befe4867b3
minimal protection against invalid UTF8 encoding.
2023-08-16 07:42:53 +00:00
Andrej Karpathy
ca67253f28
smallfix: not sure what the point of this indirection was
2023-08-15 16:09:33 +00:00
Andrej Karpathy
4c63c5608d
shorten top comment on run.c file
2023-08-15 16:07:48 +00:00
Andrej Karpathy
a47f9b3969
collapsing copy paste code because it's driving my ocd crazy
2023-08-15 16:03:11 +00:00
Andrej Karpathy
a9a0628c92
thoroughly commented the UTF-8 byte reading code
2023-08-15 02:18:49 +00:00
Andrej Karpathy
d459fd4243
add back careful processing of the byte tokens
2023-08-15 01:42:33 +00:00
Andrej Karpathy
4bf36ecc17
get rid of the special byte decoding logic
2023-08-15 01:04:10 +00:00
Andrej Karpathy
8417cb438d
Merge branch 'utf8' of https://github.com/atamurad/llama2.c into feature/utf8
2023-08-15 00:18:53 +00:00
Andrej Karpathy
32c1ff97fb
missed p->dim to kv_dim for k,v vectors. we're not doing anything wrong we're just being wasteful with memory. thanks @xefoci7612 for pointing out
2023-08-14 14:52:07 +00:00
Andrej Karpathy
45afa91dca
the accum function has been bothering me, there is no real need to add a function here, it does something trivial and is only used twice, scrap
2023-08-14 02:54:27 +00:00
Andrej Karpathy
854c97b660
turn topp 0.9 back on by default thanks to recent PR contributions truncating before quicksort
2023-08-14 00:12:45 +00:00
Andrej
4a2c375df9
Merge pull request #276 from jrudolph/improve-top-p
...
optimize sample_topp by filtering out small value elements up front
2023-08-13 17:05:38 -07:00
atamyrat
36b54321e5
bugfix: allocate +1 in tokens buffer for dummy whitespace
2023-08-13 23:23:32 +03:00
Andrej Karpathy
38bfac90a8
bigchange: add multiquery support in run.c. we can now train and inference multiquery models (where n_kv_heads < n_heads). this also means that we, in principle, support Llama 2 34B and 70B models, which are multiquery
2023-08-13 19:34:05 +00:00
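
To make the multiquery condition (n_kv_heads < n_heads) concrete, the following sketch shows the index arithmetic it implies: several query heads share one key/value head, so the key and value vectors are narrower than the model dimension. The `AttnShape` struct and both function names are hypothetical and chosen for this example only.

```c
#include <assert.h>

/* Hypothetical container for the three config values that matter here. */
typedef struct {
    int dim;        /* model (query) dimension */
    int n_heads;    /* number of query heads */
    int n_kv_heads; /* number of key/value heads, <= n_heads */
} AttnShape;

/* Width of a key or value vector: n_kv_heads heads of size dim / n_heads. */
int kv_dim(const AttnShape *p) {
    assert(p->n_kv_heads > 0 && p->n_heads % p->n_kv_heads == 0);
    return (p->dim * p->n_kv_heads) / p->n_heads;
}

/* Which key/value head the query head h reads from. */
int kv_head_for(const AttnShape *p, int h) {
    int kv_mul = p->n_heads / p->n_kv_heads; /* query heads per kv head */
    assert(h >= 0 && h < p->n_heads);
    return h / kv_mul;
}
```

With dim = 512, n_heads = 8 and n_kv_heads = 2, each key/value vector holds 128 floats and query heads 4 through 7 all read key/value head 1.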
atamyrat
daa9fd9b8a
sort vocabulary for faster lookup with bsearch()
2023-08-13 15:02:11 +03:00
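
The lookup scheme named in the commit above can be sketched as follows: copy the vocabulary into an array of (string, id) pairs, sort it once with `qsort()`, then resolve each string with `bsearch()` in logarithmic time instead of a linear scan. The struct layout and the names `build_sorted_vocab` and `str_lookup` are assumptions for this example and may differ from the actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed pair type: a vocab string plus its original token id. */
typedef struct {
    const char *str;
    int id;
} TokenIndex;

static int compare_tokens(const void *a, const void *b) {
    return strcmp(((const TokenIndex *)a)->str, ((const TokenIndex *)b)->str);
}

/* Fill sorted[0..n) from vocab and order it by string, done once up front. */
void build_sorted_vocab(TokenIndex *sorted, char **vocab, int n) {
    for (int i = 0; i < n; i++) {
        sorted[i].str = vocab[i];
        sorted[i].id = i; /* remember the id before sorting moves the entry */
    }
    qsort(sorted, (size_t)n, sizeof(TokenIndex), compare_tokens);
}

/* Return the token id for str, or -1 if it is not in the vocabulary. */
int str_lookup(const char *str, const TokenIndex *sorted, int n) {
    assert(str != NULL && sorted != NULL);
    TokenIndex key = { str, -1 };
    const TokenIndex *hit =
        bsearch(&key, sorted, (size_t)n, sizeof(TokenIndex), compare_tokens);
    return hit != NULL ? hit->id : -1;
}
```

The one-time sort costs O(n log n), after which every lookup during prompt encoding is O(log n).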
Andrej Karpathy
f5fc0c245f
final piece: run.c support for new tokenizer, super ez
2023-08-13 02:12:13 +00:00
Johannes Rudolph
d421a95b2b
optimize sample_topp by filtering out small value elements up front
...
This works because we know that in worst case only 1 element will be selected
and therefore the remaining (n-1) elements have to split the remaining (1-topp)
probability. Probabilities smaller than that cannot be selected and can
be filtered out up front.
2023-08-12 20:31:19 +02:00
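
The argument in the commit body above translates into a single threshold: any probability below (1 - topp) / (n - 1) can never be part of the top-p nucleus, so it can be dropped before the sort. A minimal sketch follows, with a hypothetical function name and an output format (a plain index array) chosen for illustration.

```c
#include <assert.h>

/* Writes the indices of all candidates that could still be selected under
   top-p into kept[] (capacity n) and returns how many there are.
   Only these survivors need to be sorted afterwards. */
int filter_topp_candidates(const float *probs, int n, float topp, int *kept) {
    assert(n >= 1 && topp > 0.0f && topp < 1.0f);
    if (n == 1) { /* nothing to filter, and avoids dividing by zero */
        kept[0] = 0;
        return 1;
    }
    /* worst case: one element is selected and the other n-1 elements
       share the remaining (1 - topp) probability mass */
    const float cutoff = (1.0f - topp) / (float)(n - 1);
    int n0 = 0;
    for (int i = 0; i < n; i++) {
        if (probs[i] >= cutoff) {
            kept[n0++] = i;
        }
    }
    return n0;
}
```

For probabilities {0.6, 0.3, 0.08, 0.02} and topp = 0.9, the cutoff is about 0.033, so the last element is discarded and only three candidates reach the sort. With a vocabulary of tens of thousands of tokens, most entries typically fall below the cutoff.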
Andrej Karpathy
c42641205f
turn off topp sampling by default because it is a bit too slow to be the default. it is likely that turning it on, e.g. -p 0.9, gives mildly higher quality and safer samples, but this comes at too high a performance cost, sometimes in the double digit percent, for it to be on by default i think...
2023-08-10 15:23:05 +00:00
atamyrat
c02865df30
prompt tokenizer improvements: utf8 support, add_dummy_prefix and byte_fallback options to match sentencepiece
2023-08-07 13:12:44 +03:00
rdentato
ff6a2f0a7a
Reset the #include <omp.h>
2023-08-07 07:28:03 +00:00
rdentato
e49c16caa5
Changed how rng_seed is handled. Now 0 is treated as time(NULL).
2023-08-07 06:51:57 +00:00
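
The seed convention described in the commit above can be sketched in a few lines: a seed of 0 means "pick one from the clock", and any other value is used as given so that runs stay reproducible. The function name `resolve_seed` is hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Returns the seed to use: the caller's value, or the current time if 0. */
unsigned long long resolve_seed(unsigned long long rng_seed) {
    if (rng_seed == 0) {
        rng_seed = (unsigned long long)time(NULL);
    }
    assert(rng_seed != 0); /* holds on any clock set after the 1970 epoch */
    return rng_seed;
}
```

One consequence of this convention is that 0 itself can no longer be requested as an explicit seed.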