llama2.c

Author	SHA1	Message	Date
Andrej	614bf91e5d	Merge pull request #60 from emma-eva/patch-1 Fixed time_in_ms() compile time error (termux and neoterm)	2023-07-25 16:06:41 -07:00
Andrej Karpathy	05ee4cbf38	fix bug in timing - use steps not max seq len doh	2023-07-25 14:21:37 +00:00
Emma Eva	6ce91b1b3b	Fixed time_in_ms() compile time error (termux and neoterm) clang version 16.0.4	2023-07-25 12:12:40 +06:00
Andrej Karpathy	c3e0d73bd2	we can inference Meta's Llama 2 7B, yay	2023-07-25 04:21:07 +00:00
Andrej Karpathy	a1f6b4653e	merge conflict resolve with imports	2023-07-25 01:58:46 +00:00
richinseattle	b2857c6af2	Switch to using timespec_get() for cross OS compatibility	2023-07-24 16:31:38 -07:00
Andrej Karpathy	e6e3f1322b	candidate memmap implementation	2023-07-24 22:54:49 +00:00
richinseattle	2be7d7887b	MSVC Compatibility fix for timer use clock() instead of gettimeofday() for cross-platform compatibility	2023-07-24 15:22:20 -07:00
Andrej	669b75ddc8	Merge pull request #43 from krzysztof-jusiak/rmsnorm Speed up rmsnorm by using sqrtf/expf	2023-07-24 14:13:49 -07:00
Andrej Karpathy	791be9d991	tweak argparse. fix steps=256, even if some models may support longer maximum seq_len. get rid of seed option for now, use temp=0.0 for deterministic behavior	2023-07-24 20:59:32 +00:00
Kris Jusiak	c9b1f10124	Speed up rmsnorm by using sqrtf/expf Problem: exp and sqrt use double precision, which is not required. Solution: use expf and sqrtf instead. Notes: although this uses single precision, it doesn't seem to affect the result. Results: ~10% improvement (before: 940 tok/s; after: 1020 tok/s).	2023-07-24 13:06:27 -05:00
Franz Louis Cesista	c9ad067c5d	parallelize multi-head attention	2023-07-25 01:10:12 +08:00
Andrej	d0ddf94cc3	Merge pull request #36 from hu-po/patch-1 typo	2023-07-24 07:27:36 -07:00
Andrej	228c4ea3ea	Merge pull request #28 from SlyEcho/master Fix tokenizer reading on Windows	2023-07-24 07:23:07 -07:00
hu-po	d95c7617c6	typo	2023-07-24 07:35:12 -05:00
Henri Vasserman	4d637983ad	Fix tokenizer reading on Windows	2023-07-24 11:08:29 +03:00
Kris Jusiak	0a0ca73c65	[openmp] 1.5x inference speedup Problem: clock() measures CPU time and doesn't work properly with parallel execution; performance is bound by the matmul with the weights. Solution: use gettimeofday() instead; use OpenMP to parallelize matmul. Note: if not compiled with -fopenmp, the #pragma is ignored and execution is single-threaded; additional environment variables can optionally be set for OpenMP to configure the number of threads, the scheduler, etc. Benchmarks: `clang -Ofast -march=native run.c -lm -o run` achieved 340.878828 tok/s; `clang -Ofast -fopenmp -march=native run.c -lm -o run` achieved 524.590164 tok/s.	2023-07-24 01:55:23 -05:00
Andrej Karpathy	f6388c99c8	delete the copy function in favor of memcpy. sadly we have to import string.h now...	2023-07-24 05:10:55 +00:00
Andrej Karpathy	99354a85ce	get rid of compiler warnings from ignoring return value of fread	2023-07-24 04:39:24 +00:00
Andrej Karpathy	6a61831e19	make init code much less sketchy	2023-07-24 04:22:32 +00:00
Andrej Karpathy	3bfa5665d1	delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets	2023-07-24 04:02:57 +00:00
Andrej Karpathy	d7e2c46915	slight tweaks to softmax	2023-07-24 02:02:12 +00:00
Andrej	7d6208870e	Merge pull request #19 from mcognetta/master Simplify Softmax	2023-07-23 18:57:23 -07:00
Marco Cognetta	80679e24be	simplify softmax	2023-07-24 10:51:46 +09:00
Junny	dc3962f356	remove unused parameter	2023-07-23 15:33:29 -07:00
Andrej Karpathy	9414e7a45e	tweaks and add a simple test	2023-07-23 14:52:08 +00:00
Andrej Karpathy	5b161abb9a	somewhere ~20 hours later	2023-07-23 05:23:45 +00:00

27 Commits