19 Commits

Author SHA1 Message Date
Andrej 669b75ddc8 Merge pull request #43 from krzysztof-jusiak/rmsnorm
Speed up rmsnorm by using sqrtf/expf
2023-07-24 14:13:49 -07:00
Andrej Karpathy 791be9d991 tweak argparse. fix steps=256, even if some models may support longer maximum seq_len. get rid of seed option for now, use temp=0.0 for deterministic behavior 2023-07-24 20:59:32 +00:00
Kris Jusiak c9b1f10124 Speed up rmsnorm by using sqrtf/expf
Problem:
- exp and sqrt operate in double precision, which is not required
  here.

Solution:
- Use expf and sqrtf instead.

Notes:
- Although this uses single precision, it doesn't seem to affect the
  result.

Results: ~ 10% improvement
  - before:  940 tok/s
  - after:  1020 tok/s
2023-07-24 13:06:27 -05:00
Franz Louis Cesista c9ad067c5d parallelize multi-head attention 2023-07-25 01:10:12 +08:00
Andrej d0ddf94cc3 Merge pull request #36 from hu-po/patch-1
typo
2023-07-24 07:27:36 -07:00
Andrej 228c4ea3ea Merge pull request #28 from SlyEcho/master
Fix tokenizer reading on Windows
2023-07-24 07:23:07 -07:00
hu-po d95c7617c6 typo 2023-07-24 07:35:12 -05:00
Henri Vasserman 4d637983ad Fix tokenizer reading on Windows 2023-07-24 11:08:29 +03:00
Kris Jusiak 0a0ca73c65 [openmp] 1.5x inference speedup
Problem:
- clock() measures CPU time, so it does not report elapsed time
  correctly under parallel execution.
- inference performance is bound by the matmul against the weights.

Solution:
- use gettimeofday instead.
- utilize openmp to parallelize matmul.

Note:
- if not compiled with -fopenmp, the #pragma is ignored and execution
  is single-threaded.
- optionally, additional OpenMP environment variables can set the
  number of threads, the scheduler, etc.

Benchmarks:
```
clang -Ofast -march=native run.c -lm -o run           # achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native run.c -lm -o run  # achieved tok/s: 524.590164
```
2023-07-24 01:55:23 -05:00
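The two parts of the solution above can be sketched as follows. This is an illustration under assumptions, not the commit's diff: the function names, the row-major (d, n) weight layout, and the millisecond return type are hypothetical. Each output row is independent, so the outer loop can be split across threads. gettimeofday (POSIX) reads wall-clock time, whereas clock() reports CPU time, which accumulates across threads.

```c
#include <stddef.h>
#include <sys/time.h>

// Hypothetical matmul: xout (d) = w (d x n, row-major) * x (n).
// Without -fopenmp the pragma is ignored and the loop runs on one thread.
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    #pragma omp parallel for
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;                  // each thread writes distinct rows
    }
}

// Hypothetical helper: wall-clock time in milliseconds via gettimeofday.
long long time_in_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000LL + tv.tv_usec / 1000;
}
```

The number of threads can be set at run time with the standard OpenMP environment variable OMP_NUM_THREADS, for example `OMP_NUM_THREADS=4 ./run` followed by the program's usual arguments.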
Andrej Karpathy f6388c99c8 delete the copy function in favor of memcpy. sadly we have to import string.h now... 2023-07-24 05:10:55 +00:00
Andrej Karpathy 99354a85ce get rid of compiler warnings from ignoring return value of fread 2023-07-24 04:39:24 +00:00
Andrej Karpathy 6a61831e19 make init code much less sketchy 2023-07-24 04:22:32 +00:00
Andrej Karpathy 3bfa5665d1 delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets 2023-07-24 04:02:57 +00:00
Andrej Karpathy d7e2c46915 slight tweaks to softmax 2023-07-24 02:02:12 +00:00
Andrej 7d6208870e Merge pull request #19 from mcognetta/master
Simplify Softmax
2023-07-23 18:57:23 -07:00
Marco Cognetta 80679e24be simplify softmax 2023-07-24 10:51:46 +09:00
Junny dc3962f356 remove unused parameter 2023-07-23 15:33:29 -07:00
Andrej Karpathy 9414e7a45e tweaks and add a simple test 2023-07-23 14:52:08 +00:00
Andrej Karpathy 5b161abb9a somewhere ~20 hours later 2023-07-23 05:23:45 +00:00