Commit Graph

11 Commits

Author SHA1 Message Date
Kris Jusiak 0a0ca73c65 [openmp] 1.5x inference speedup
Problem:
- clock() measures CPU time, so it misreports elapsed time under parallel execution.
- inference performance is bound by the matmul against the weights.

Solution:
- use gettimeofday instead.
- utilize openmp to parallelize matmul.
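
A minimal sketch of the two changes described above, assuming a row-major weight matrix of shape (d, n). The names `time_in_ms` and `matmul` and their signatures are illustrative and not necessarily those used in run.c. On POSIX systems clock() typically reports processor time for the whole process, which grows faster than elapsed time once several threads are busy, whereas gettimeofday tracks wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in milliseconds (hypothetical helper name). */
long long time_in_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000LL + tv.tv_usec / 1000;
}

/* xout (d) = W (d x n, row-major) times x (n).
   Each output row is independent, so the outer loop can be split across
   threads. Without -fopenmp the pragma is ignored and the loop runs serially. */
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    assert(n > 0 && d > 0);
    #pragma omp parallel for
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}
```

To time a run, call `time_in_ms()` before and after the generation loop and divide the token count by the elapsed seconds.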

Note:
- if not compiled with -fopenmp, the #pragma is ignored and execution is
  single-threaded.
- there are additional env variables that can optionally be set for openmp
  to control the number of threads, the scheduler, etc.
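
The fallback behaviour in the note can be checked with a small sketch like the one below. It assumes only the standard `_OPENMP` macro and `omp_get_max_threads()` from `<omp.h>`. The helper names `max_threads` and `sum_range` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound on threads a parallel region may use.
   Returns 1 when the file was built without -fopenmp. */
int max_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Sums 0..n-1. With OpenMP each thread keeps a private partial sum that
   the reduction clause combines; without it this is a plain serial loop.
   The result is the same either way. */
long long sum_range(int n) {
    assert(n >= 0);
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += i;
    }
    return total;
}
```

With an OpenMP build, the standard `OMP_NUM_THREADS` environment variable should change what `max_threads()` reports, for example `OMP_NUM_THREADS=4 ./run`. `OMP_SCHEDULE` only affects loops declared with `schedule(runtime)`.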

Benchmarks:
```
clang -Ofast -march=native  run.c  -lm  -o run          // achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native  run.c  -lm  -o run // achieved tok/s: 524.590164
```
2023-07-24 01:55:23 -05:00
Andrej Karpathy f6388c99c8 delete the copy function in favor of memcpy. sadly we have to import string.h now... 2023-07-24 05:10:55 +00:00
Andrej Karpathy 99354a85ce get rid of compiler warnings from ignoring return value of fread 2023-07-24 04:39:24 +00:00
Andrej Karpathy 6a61831e19 make init code much less sketchy 2023-07-24 04:22:32 +00:00
Andrej Karpathy 3bfa5665d1 delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets 2023-07-24 04:02:57 +00:00
Andrej Karpathy d7e2c46915 slight tweaks to softmax 2023-07-24 02:02:12 +00:00
Andrej 7d6208870e Merge pull request #19 from mcognetta/master
Simplify Softmax
2023-07-23 18:57:23 -07:00
Marco Cognetta 80679e24be simplify softmax 2023-07-24 10:51:46 +09:00
Junny dc3962f356 remove unused parameter 2023-07-23 15:33:29 -07:00
Andrej Karpathy 9414e7a45e tweaks and add a simple test 2023-07-23 14:52:08 +00:00
Andrej Karpathy 5b161abb9a somewhere ~20 hours later 2023-07-23 05:23:45 +00:00