llama2.c

Author	SHA1	Message	Date
Diego Marcos Segura	19cfbeca71	Fix typo in README.md	2023-08-24 19:46:43 -07:00
Andrej	d7cd98633d	add todo item to add a PyTorch Engine	2023-08-24 09:04:52 -07:00
Andrej Karpathy	c7a26264a2	Merge branch 'master' of github.com:karpathy/llama2.c	2023-08-24 03:10:18 +00:00
Andrej Karpathy	446c1c0df3	Merge branch 'janimo-train-vocab-python'	2023-08-24 03:10:07 +00:00
Andrej Karpathy	096325b66c	bring back num_threads	2023-08-24 03:09:55 +00:00
Andrej	90104db721	Merge pull request #348 from nehzata/clip_steps Clip steps maximum value	2023-08-23 19:57:01 -07:00
Ali Nehzat	9bc72acab0	steps shouldn't exceed the model's seq_len either	2023-08-24 09:09:16 +10:00
Jani Monoses	fe9b9f2f15	Train vocab in Python	2023-08-23 19:10:28 +03:00
Andrej Karpathy	7ac65cb2c2	make decode safer and fix issue with skipping bad byte tokens	2023-08-23 01:08:31 +00:00
Andrej Karpathy	4b3e66021a	lol text	2023-08-23 00:26:47 +00:00
Andrej Karpathy	d1eb18b8ec	add BOS and EOS function to the Tokenizer as we start to converge closer to the Llama 2 code from Meta, and as we're about to add the Chat capability	2023-08-23 00:08:22 +00:00
Andrej Karpathy	d26a499207	absorb our rng state into the Sampler. I feel that this is correct because it makes our use of entropy very explicit and localized, and the sampler is now well-contained without any global state. Code is increasingly more beautiful.	2023-08-22 03:22:56 +00:00
Andrej Karpathy	ac6cf8d6e8	tweak todo list	2023-08-22 02:48:51 +00:00
Andrej Karpathy	ad7a1ef525	clean up swiglu a little bit	2023-08-22 02:32:21 +00:00
Andrej Karpathy	0e362f735f	and finallygit add run.c split off the generate function. alongside it will come a chat function. we are close	2023-08-22 02:22:36 +00:00
Andrej Karpathy	d73b917d3b	hide temperature and topp into the sampler, it's a little bit less flexible but a little bit more cleaner	2023-08-22 02:17:51 +00:00
Andrej Karpathy	379f083b85	make sorted vocab a buffer of Tokenizer	2023-08-22 01:56:51 +00:00
Andrej	5eaca535cd	Merge pull request #335 from ozabluda/ozabluda-patch-5 Remove unneeded check of free(NULL)	2023-08-21 18:16:07 -07:00
Andrej Karpathy	83287ff254	fix steps=0 is max context	2023-08-22 01:15:00 +00:00
Oleg Zabluda	c2834c8a1f	Remove unneeded check of free(NULL) Passing NULL to free() is totally allowed	2023-08-21 10:54:53 -07:00
Andrej	ee95b1bf29	Merge pull request #315 from davidar/vocab_source Fix vocab_source in sample.py	2023-08-21 08:26:28 -07:00
Andrej Karpathy	d02e0c90d8	Merge branch 'rdentato-patch-check-params'	2023-08-21 15:17:37 +00:00
Andrej Karpathy	33d94f60a5	parameter validation cleanup	2023-08-21 15:17:14 +00:00
Remo Dentato	2d972f1763	Merge branch 'karpathy:master' into patch-check-params	2023-08-21 17:02:42 +02:00
Andrej	8a3ea7b433	Merge pull request #329 from atamurad/import_meta Moved export_meta_llama_bin.py to new export.py	2023-08-21 07:34:32 -07:00
atamyrat	61c26d5392	Updated README to replace export_meta_llama_bin.py script with export.py	2023-08-21 14:24:01 +03:00
atamyrat	36a78af5e1	tested load_meta_model() in export.py, deleting old export_meta_llama_bin.py file	2023-08-21 14:19:56 +03:00
atamyrat	de005474d3	Added load_meta_model() to export.py	2023-08-21 14:13:47 +03:00
rdentato	4444575c4e	Added check of generation parameters.	2023-08-21 06:43:39 +00:00
Andrej Karpathy	dd61b13e57	delete the save_torchscript export file, but copy its content to the new export.py for the future maybe	2023-08-21 05:09:06 +00:00
Andrej Karpathy	ea44f53568	now that the export.py HF functionality is in master, we can delete this file, and update the readme	2023-08-21 04:58:19 +00:00
Andrej	801c68f5a1	Merge pull request #326 from atamurad/import_hf Added huggingface model loader/importer to export.py	2023-08-20 21:53:17 -07:00
Andrej	74a68eeb35	Merge pull request #325 from HarryGifford/users/hegi/update-readme-threading Update readme with suggestion on number of threads to use	2023-08-20 21:50:26 -07:00
Andrej Karpathy	288b3cec09	remove dagger in the eyeball	2023-08-21 04:47:49 +00:00
Andrej Karpathy	14275bd623	minor clean. i think a lot of chaos has been reduced for today. we shall now rest.	2023-08-21 04:43:24 +00:00
Andrej Karpathy	3868f732a4	and finally refactor the Sampler. things are starting to look a lot cleaner I think	2023-08-21 04:23:02 +00:00
Andrej Karpathy	8a377a1d31	refactor the Transformer (Config, Weights, RunState) into a single object, with build and free too	2023-08-21 03:55:12 +00:00
Andrej Karpathy	ae2e4f8d88	name the tokenizer methods cleaner: encode and decode	2023-08-21 03:11:54 +00:00
atamyrat	0dd82158f6	removed transformers from requirements.txt, added error message	2023-08-21 06:07:29 +03:00
atamyrat	155475a523	Fix WQ and WK permutation in huggingface models	2023-08-21 05:16:11 +03:00
atamyrat	d7704bdeaa	mark ModelArgs.hidden_dim as optional and calculate as previously if not provided	2023-08-21 03:40:34 +03:00
atamyrat	09db52c69e	Added huggingface model loader to export.py	2023-08-21 02:59:12 +03:00
Harry Gifford	a72b3b0206	Update readme with suggestion on number of threads to use Update the documentation to make suggestions on the number of threads. The performance difference can be very large. Also linked to the PyTorch docs which are relevant here.	2023-08-20 15:01:33 -07:00
Andrej Karpathy	c74456f3f0	refactor step 1. the tokenizer, and all the other abstractions, are a total mess, refactoring things a bit	2023-08-20 18:18:23 +00:00
Andrej Karpathy	1e335a41cf	remove freq_cis fields as they are not used anymore	2023-08-20 17:26:43 +00:00
Andrej Karpathy	c0511de617	probindex should never have been part of RunState. i apologize for this failure of abstraction	2023-08-20 17:18:06 +00:00
Andrej	8c93c7a30e	Merge pull request #322 from karpathy/feature/export New model export (the code remains "dead" and legacy version is still the default behavior, so no breaking changes are introduced). The major benefit is a new export.py file, which we can use to centralize work on formatting: both imports and exports.	2023-08-20 10:08:32 -07:00
Andrej Karpathy	13dcee493a	todos update	2023-08-20 17:02:22 +00:00
Andrej Karpathy	f3db92a2dc	use out_file.tell() instead of nbytes += arithmetic	2023-08-20 16:51:35 +00:00
Andrej Karpathy	fa8dfd854e	isolate read_checkpoint, because i'd like to now make it support both version 0 and version 1	2023-08-19 19:21:12 +00:00

421 Commits