45 Commits

Author SHA1 Message Date
richinseattle cddb05d5b2 use ssize_t/int64 and 64bit version of ftell on windows 2023-07-29 22:54:27 -07:00
Andrej Karpathy fd68dd222f reshuffle blocks of code a bit 2023-07-28 05:37:43 +00:00
aegkmq 6ce28fb388 Merge branch 'master' into better-rng 2023-07-28 13:52:34 +09:00
Andrej Karpathy b4bb47bb7b big change: adding prompting. many LOC, but critical. ty @atamurad for the first draft, i ended up tuning it quite a bit. 2023-07-28 04:12:54 +00:00
Andrej Karpathy e5752e1fc9 strip leading whitespace 2023-07-27 22:59:19 +00:00
Andrej Karpathy 25b50ee0e2 small stylistic fixes and adjustments, fix bug in Makefile, and change the timing code to skip the first (slow) iteration 2023-07-27 22:42:08 +00:00
aegkmq 71200f3092 Fix random_f32 2023-07-28 00:35:59 +09:00
Aydyn Tairov acf1e18e8f remove second ifdefs for windows timing by introducing ported version of clock_gettime 2023-07-27 16:33:21 +01:00
aegkmq 1bdf5af743 Replace the rand() with a portable PRNG 2023-07-27 20:14:08 +09:00
Andrej Karpathy f19f50a744 stylistic changes for the windows support ifdefs 2023-07-27 06:08:40 +00:00
richinseattle 4a6b7a471d Include windows support header (for mmap) 2023-07-26 22:40:01 -07:00
Andrej Karpathy 0d18fa7780 Merge branch 'patch-2' of https://github.com/richinseattle/llama2.c into richinseattle-patch-2 2023-07-27 05:23:05 +00:00
richinseattle 37e8c20f4f Windows compat: Use GetTickCount for delta timer
Intentionally not including a windows header here to avoid merge conflict on include with mmap support. cl.exe doesn't complain, mingw warns.
2023-07-26 22:19:49 -07:00
richinseattle 539dc73196 fix whitespace 2023-07-26 22:12:32 -07:00
richinseattle 7f7a3b2d56 update openmp pragmas for MSVC compatibility
This has no negative impact on Linux and is in preparation for windows support. Windows compiles will not work without additional timer and mmap compatibility patches
2023-07-26 22:06:23 -07:00
Bernardo Ramos 57034480b6 add some code comments 2023-07-26 19:48:14 -03:00
aegkmq 8986005f23 Minor cleanup 2023-07-26 16:57:08 +09:00
aegkmq 36c522a0d8 Improve locality 2023-07-26 13:24:27 +09:00
Andrej 614bf91e5d Merge pull request #60 from emma-eva/patch-1
Fixed time_in_ms() compile time error (termux and neoterm)
2023-07-25 16:06:41 -07:00
Andrej Karpathy 05ee4cbf38 fix bug in timing - use steps not max seq len doh 2023-07-25 14:21:37 +00:00
Emma Eva 6ce91b1b3b Fixed time_in_ms() compile time error (termux and neoterm)
clang version 16.0.4
2023-07-25 12:12:40 +06:00
Andrej Karpathy c3e0d73bd2 we can inference Meta's Llama 2 7B, yay 2023-07-25 04:21:07 +00:00
Andrej Karpathy a1f6b4653e merge conflict resolve with imports 2023-07-25 01:58:46 +00:00
richinseattle b2857c6af2 Switch to using timespec_get() for cross OS compatibility 2023-07-24 16:31:38 -07:00
Andrej Karpathy e6e3f1322b candidate memmap implementation 2023-07-24 22:54:49 +00:00
richinseattle 2be7d7887b MSVC Compatibility fix for timer
use clock() instead of gettimeofday() for cross-platform compatibility
2023-07-24 15:22:20 -07:00
Andrej 669b75ddc8 Merge pull request #43 from krzysztof-jusiak/rmsnorm
Speed up rmsnorm by using sqrtf/expf
2023-07-24 14:13:49 -07:00
Andrej Karpathy 791be9d991 tweak argparse. fix steps=256, even if some models may support longer maximum seq_len. get rid of seed option for now, use temp=0.0 for deterministic behavior 2023-07-24 20:59:32 +00:00
Kris Jusiak c9b1f10124 Speed up rmsnorm by using sqrtf/expf
Problem:
- exp and sqrt operate in double precision, which is not
  required here.

Solution:
- Use expf and sqrtf instead.

Notes:
- Switching to single precision doesn't seem to affect the
  result.

Results: ~ 10% improvement
  - before:  940 tok/s
  - after:  1020 tok/s
2023-07-24 13:06:27 -05:00
Franz Louis Cesista c9ad067c5d parallelize multi-head attention 2023-07-25 01:10:12 +08:00
Andrej d0ddf94cc3 Merge pull request #36 from hu-po/patch-1
typo
2023-07-24 07:27:36 -07:00
Andrej 228c4ea3ea Merge pull request #28 from SlyEcho/master
Fix tokenizer reading on Windows
2023-07-24 07:23:07 -07:00
hu-po d95c7617c6 typo 2023-07-24 07:35:12 -05:00
Henri Vasserman 4d637983ad Fix tokenizer reading on Windows 2023-07-24 11:08:29 +03:00
Kris Jusiak 0a0ca73c65 [openmp] 1.5x inference speedup
Problem:
- clock() measures CPU time, so it doesn't work properly with parallel execution.
- performance is bound by the matmul over the weights.

Solution:
- use gettimeofday instead.
- utilize openmp to parallelize matmul.

Note:
- if not compiled with -fopenmp, the #pragma is ignored and execution
  is single-threaded.
- there are additional env variables that can (optionally) be set for
  openmp to configure the number of threads, scheduler, etc.

Benchmarks:
```
clang -Ofast -march=native  run.c  -lm  -o run          // achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native  run.c  -lm  -o run // achieved tok/s: 524.590164
```
2023-07-24 01:55:23 -05:00
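
The two parts of the commit above can be sketched as follows: a wall-clock timer built on gettimeofday, and a matrix-vector product whose outer loop is parallelized with an OpenMP pragma. This is a minimal sketch under assumptions: the names time_in_ms and matmul, the row-major weight layout, and the signatures are illustrative rather than copied from the commit, and gettimeofday requires a POSIX system.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in milliseconds. Unlike clock(), which reports CPU
   time summed over threads, this reflects elapsed real time. */
long long time_in_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

/* xout (d) = W (d x n, row-major) @ x (n).
   Each output row is independent, so the outer loop can run in parallel.
   Without -fopenmp the pragma is ignored and the loop runs serially. */
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    assert(n > 0 && d > 0);
    int i;
    #pragma omp parallel for private(i)
    for (i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}
```

Timing a run then amounts to taking time_in_ms() before and after the generation loop and dividing the token count by the elapsed seconds. The thread count can be set through the standard OMP_NUM_THREADS environment variable.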
Andrej Karpathy f6388c99c8 delete the copy function in favor of memcpy. sadly we have to import string.h now... 2023-07-24 05:10:55 +00:00
Andrej Karpathy 99354a85ce get rid of compiler warnings from ignoring return value of fread 2023-07-24 04:39:24 +00:00
Andrej Karpathy 6a61831e19 make init code much less sketchy 2023-07-24 04:22:32 +00:00
Andrej Karpathy 3bfa5665d1 delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets 2023-07-24 04:02:57 +00:00
Andrej Karpathy d7e2c46915 slight tweaks to softmax 2023-07-24 02:02:12 +00:00
Andrej 7d6208870e Merge pull request #19 from mcognetta/master
Simplify Softmax
2023-07-23 18:57:23 -07:00
Marco Cognetta 80679e24be simplify softmax 2023-07-24 10:51:46 +09:00
Junny dc3962f356 remove unused parameter 2023-07-23 15:33:29 -07:00
Andrej Karpathy 9414e7a45e tweaks and add a simple test 2023-07-23 14:52:08 +00:00
Andrej Karpathy 5b161abb9a somewhere ~20 hours later 2023-07-23 05:23:45 +00:00