schihei/llama2.c
443 Commits 5 Branches 0 Tags
master

8 Commits

Author SHA1 Message Date
Andrej Karpathy 8417cb438d Merge branch 'utf8' of https://github.com/atamurad/llama2.c into feature/utf8 2023-08-15 00:18:53 +00:00
Andrej Karpathy ea4cedc588 add ability to export custom tokenizer to .bin format for run.c file 2023-08-13 02:00:19 +00:00
Andrej Karpathy 4c6f0af9ff add the ability to train a custom sentencepiece tokenizer with a given vocab_size, and pretok with it. some more changes still needed to merge this branch, in train.py and ofc run.c. did this in a sadly bit ugly, but fully backwards compatible way. basically when we use custom tokenizer we create a whole new directory structure for that 2023-08-11 03:58:22 +00:00
atamyrat c02865df30 prompt tokenizer improvements: utf8 support, add_dummy_prefix and byte_fallback options to match sentencepiece 2023-08-07 13:12:44 +03:00
Andrej Karpathy b4bb47bb7b big change: adding prompting. many LOC, but critical. ty @atamurad for the first draft, i ended up tuning it quite a bit. 2023-07-28 04:12:54 +00:00
Andrej Karpathy 3bfa5665d1 delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets 2023-07-24 04:02:57 +00:00
Andrej Karpathy 5baaf9df06 small format tweaks, get rid of prints in tokenizer 2023-07-23 17:09:23 +00:00
Andrej Karpathy 5b161abb9a somewhere ~20 hours later 2023-07-23 05:23:45 +00:00