Andrej
614bf91e5d
Merge pull request #60 from emma-eva/patch-1
...
Fixed time_in_ms() compile time error (termux and neoterm)
2023-07-25 16:06:41 -07:00
Andrej Karpathy
05ee4cbf38
fix bug in timing - use steps not max seq len doh
2023-07-25 14:21:37 +00:00
Emma Eva
6ce91b1b3b
Fixed time_in_ms() compile time error (termux and neoterm)
...
clang version 16.0.4
2023-07-25 12:12:40 +06:00
Andrej Karpathy
c3e0d73bd2
we can inference Meta's Llama 2 7B, yay
2023-07-25 04:21:07 +00:00
Andrej Karpathy
a1f6b4653e
merge conflict resolve with imports
2023-07-25 01:58:46 +00:00
richinseattle
b2857c6af2
Switch to using timespec_get() for cross OS compatibility
2023-07-24 16:31:38 -07:00
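A minimal sketch of a wall-clock millisecond timer built on C11's timespec_get(), as the commit above describes. The function name time_in_ms appears elsewhere in this log, but the body here is an assumption for illustration, not the committed code. A later entry in this log reports a compile error for time_in_ms() on termux and neoterm, so a fallback may be needed on some platforms.

```c
#include <assert.h>
#include <time.h>

/* Return wall-clock time in milliseconds (illustrative sketch).
   timespec_get() is standard C11; TIME_UTC is its only required base. */
long long time_in_ms(void) {
    struct timespec ts;
    int rc = timespec_get(&ts, TIME_UTC);
    assert(rc == TIME_UTC);   /* timespec_get returns the base on success */
    (void)rc;                 /* silence unused warning under NDEBUG */
    /* long long avoids overflow where long is 32 bits */
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}
```

Calling time_in_ms() before and after the generation loop and dividing the number of steps by the elapsed seconds gives a tok/s figure.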
Andrej Karpathy
e6e3f1322b
candidate memmap implementation
2023-07-24 22:54:49 +00:00
richinseattle
2be7d7887b
MSVC Compatibility fix for timer
...
use clock() instead of gettimeofday() for cross-platform compatibility
2023-07-24 15:22:20 -07:00
Andrej
669b75ddc8
Merge pull request #43 from krzysztof-jusiak/rmsnorm
...
Speed up rmsnorm by using sqrtf/expf
2023-07-24 14:13:49 -07:00
Andrej Karpathy
791be9d991
tweak argparse. fix steps=256, even if some models may support longer maximum seq_len. get rid of seed option for now, use temp=0.0 for deterministic behavior
2023-07-24 20:59:32 +00:00
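A minimal sketch of why temp=0.0 gives deterministic output, assuming that a temperature of zero means "always pick the highest-scoring token" instead of sampling. The helper name argmax is hypothetical and used only for illustration.

```c
#include <assert.h>

/* Return the index of the largest value in v (first one wins on ties).
   With greedy selection like this, the same logits always yield the
   same token, so no random seed is involved. */
int argmax(const float *v, int n) {
    assert(n > 0);
    int max_i = 0;
    for (int i = 1; i < n; i++) {
        if (v[i] > v[max_i]) {
            max_i = i;
        }
    }
    return max_i;
}
```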
Kris Jusiak
c9b1f10124
Speed up rmsnorm by using sqrtf/expf
...
Problem:
- exp and sqrt use double precision, which is not
required.
Solution:
- Use expf and sqrtf instead.
Notes:
- Using single precision doesn't seem to affect the
result.
Results: ~ 10% improvement
- before: 940 tok/s
- after: 1020 tok/s
2023-07-24 13:06:27 -05:00
Franz Louis Cesista
c9ad067c5d
parallelize multi-head attention
2023-07-25 01:10:12 +08:00
Andrej
d0ddf94cc3
Merge pull request #36 from hu-po/patch-1
...
typo
2023-07-24 07:27:36 -07:00
Andrej
228c4ea3ea
Merge pull request #28 from SlyEcho/master
...
Fix tokenizer reading on Windows
2023-07-24 07:23:07 -07:00
hu-po
d95c7617c6
typo
2023-07-24 07:35:12 -05:00
Henri Vasserman
4d637983ad
Fix tokenizer reading on Windows
2023-07-24 11:08:29 +03:00
Kris Jusiak
0a0ca73c65
[openmp] 1.5x inference speedup
...
Problem:
- clock() measures CPU time, so it doesn't work properly with parallel
execution.
- performance is bound by the matmul over the weights.
Solution:
- use gettimeofday instead.
- utilize openmp to parallelize matmul.
Note:
- if not compiled with -fopenmp, the #pragma is ignored and
single-threaded execution is performed.
- additional environment variables can optionally be set for openmp
to configure the number of threads, scheduler, etc.
Benchmarks:
```
clang -Ofast -march=native run.c -lm -o run // achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native run.c -lm -o run // achieved tok/s: 524.590164
```
2023-07-24 01:55:23 -05:00
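A minimal sketch of the technique the commit above describes: a matrix-vector product whose outer loop is annotated for OpenMP. The signature and the row-major weight layout are assumptions for illustration. Each output row is independent, so the rows can be split across threads without synchronization.

```c
#include <assert.h>

/* xout = W @ x, where W is (d, n) stored row-major and x has length n.
   When built with -fopenmp the rows are divided among threads; without
   it the pragma is ignored and the loop runs on a single thread. */
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    assert(n > 0 && d > 0);
    int i;
    #pragma omp parallel for private(i)
    for (i = 0; i < d; i++) {
        float val = 0.0f;   /* declared inside the loop, so per-thread */
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}
```

Because the threads share the work of one call, CPU time summed over all threads overstates the elapsed time, which is why the timer has to measure wall-clock time.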
Andrej Karpathy
f6388c99c8
delete the copy function in favor of memcpy. sadly we have to import string.h now...
2023-07-24 05:10:55 +00:00
Andrej Karpathy
99354a85ce
get rid of compiler warnings from ignoring return value of fread
2023-07-24 04:39:24 +00:00
Andrej Karpathy
6a61831e19
make init code much less sketchy
2023-07-24 04:22:32 +00:00
Andrej Karpathy
3bfa5665d1
delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets
2023-07-24 04:02:57 +00:00
Andrej Karpathy
d7e2c46915
slight tweaks to softmax
2023-07-24 02:02:12 +00:00
Andrej
7d6208870e
Merge pull request #19 from mcognetta/master
...
Simplify Softmax
2023-07-23 18:57:23 -07:00
Marco Cognetta
80679e24be
simplify softmax
2023-07-24 10:51:46 +09:00
Junny
dc3962f356
remove unused parameter
2023-07-23 15:33:29 -07:00
Andrej Karpathy
9414e7a45e
tweaks and add a simple test
2023-07-23 14:52:08 +00:00
Andrej Karpathy
5b161abb9a
somewhere ~20 hours later
2023-07-23 05:23:45 +00:00