Kris Jusiak
0a0ca73c65
[openmp] 1.5x inference speedup
Problem:
- clock() measures CPU time, so it reports inflated timings under parallel execution.
- inference performance is bound by the matmul against the weights.
Solution:
- use gettimeofday instead.
- utilize openmp to parallelize matmul.
Note:
- if not compiled with -fopenmp, the #pragma is ignored and execution stays
single-threaded.
- there are additional environment variables (optional) for OpenMP that set
the number of threads, the scheduler, etc.
Benchmarks:
```
clang -Ofast -march=native run.c -lm -o run           # achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native run.c -lm -o run  # achieved tok/s: 524.590164
```
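Sketch:
A minimal illustration of the two changes described above, not the actual diff. The names matmul and time_in_ms, the row-major weight layout, and the argument order are assumptions made for this example; the code in run.c may differ.
```c
#include <stddef.h>
#include <sys/time.h>

/* Assumed layout: w is a row-major (d x n) matrix, x has n entries,
 * xout has d entries. Each output row is independent of the others,
 * so the outer loop can be split across threads. */
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    int i;
    /* Ignored by the compiler when -fopenmp is not passed,
     * in which case the loop runs on a single thread. */
    #pragma omp parallel for private(i)
    for (i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[(size_t)i * n + j] * x[j];
        }
        xout[i] = val;
    }
}

/* Wall-clock time in milliseconds. Unlike clock(), gettimeofday()
 * does not add up the CPU time spent by every thread. */
long time_in_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long)tv.tv_sec * 1000L + (long)tv.tv_usec / 1000L;
}
```
To time a run, call time_in_ms() before and after the generation loop and divide the token count by the elapsed seconds. The thread count can be set with the standard OMP_NUM_THREADS environment variable before launching the binary.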
2023-07-24 01:55:23 -05:00
Andrej Karpathy
d548245321
readme update; -Ofast enables the other ones so they become spurious
2023-07-24 05:20:21 +00:00
Andrej
0e4076cd52
Merge pull request #25 from wsmoses/master
Add information on compiler flags
2023-07-23 22:12:28 -07:00
Andrej Karpathy
f6388c99c8
delete the copy function in favor of memcpy. sadly we have to import string.h now...
2023-07-24 05:10:55 +00:00
William Moses
65e07462e4
Add information on compiler flags
2023-07-23 19:08:17 -10:00
Andrej Karpathy
b2204e1633
include example story from 44m model
2023-07-24 04:57:30 +00:00
Andrej Karpathy
ba6acc9378
add pointer to the new 44M param model. which is still way too fast to inference, i have to train an even bigger one.
2023-07-24 04:53:37 +00:00
Andrej Karpathy
99354a85ce
get rid of compiler warnings from ignoring return value of fread
2023-07-24 04:39:24 +00:00
Andrej Karpathy
6a61831e19
make init code much less sketchy
2023-07-24 04:22:32 +00:00
Andrej
bd9e837b14
Merge pull request #23 from awgu/pt2
Register `freqs_cis` as non-persistent buffer
2023-07-23 21:08:17 -07:00
Andrej Karpathy
3bfa5665d1
delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets
2023-07-24 04:02:57 +00:00
Andrew Gu
af3b5c0364
Register freqs_cis as non-persistent buffer
2023-07-24 03:18:20 +00:00
Andrej
44ecc784da
performance guide tweak
2023-07-23 20:09:25 -07:00
Andrej
f4e2cc7d96
Add performance optimization section
2023-07-23 20:04:39 -07:00
Andrej Karpathy
d7e2c46915
slight tweaks to softmax
2023-07-24 02:02:12 +00:00
Andrej
7d6208870e
Merge pull request #19 from mcognetta/master
Simplify Softmax
2023-07-23 18:57:23 -07:00
Andrej
15e7c92fad
Merge pull request #10 from luigifcruz/patch-1
Bigger number better with -funsafe-math-optimizations flag.
2023-07-23 18:52:36 -07:00
Marco Cognetta
80679e24be
simplify softmax
2023-07-24 10:51:46 +09:00
Andrej
1b63a9510e
Merge pull request #16 from zejunh/master
Remove unused config parameter
2023-07-23 18:38:12 -07:00
Luigi Cruz
1eb111adc7
Turn -funsafe-math-optimizations optional.
2023-07-23 20:09:45 -03:00
Junny
dc3962f356
remove unused parameter
2023-07-23 15:33:29 -07:00
Luigi Cruz
f9da392147
Add missing flag.
2023-07-23 17:14:11 -03:00
Luigi Cruz
114d8cfcb6
Add -funsafe-math-optimizations flag.
2023-07-23 17:08:27 -03:00
Andrej
7d401d530c
Merge pull request #5 from danielgross/pleasantify-dx
Make sample.py work out of the box
2023-07-23 11:58:03 -07:00
Andrej Karpathy
3b7b4878b4
compile with -O3 to increase tok/s from 18 to 98! wow, i have to train a bigger model now
2023-07-23 18:55:46 +00:00
Daniel Gross
8c383c28f9
Update README.md
2023-07-23 10:46:36 -07:00
Daniel Gross
518524f458
default to whatever system has
2023-07-23 10:41:03 -07:00
Andrej Karpathy
fa872540ba
fix comments in readme about spaces
2023-07-23 17:11:35 +00:00
Andrej Karpathy
5baaf9df06
small format tweaks, get rid of prints in tokenizer
2023-07-23 17:09:23 +00:00
Andrej
deb3818db9
Merge pull request #1 from sumo43/master
Fix streaming
2023-07-23 10:07:40 -07:00
Andrej Karpathy
ad67d5e29c
strike one tiny todo
2023-07-23 17:05:22 +00:00
Andrej
353266aaae
Merge pull request #3 from vovw/master
added requirement.txt
2023-07-23 10:04:42 -07:00
voidz7
13d7827ba4
added requirement.txt
2023-07-23 22:31:16 +05:30
Artem Yatsenko
0bddcd94c1
Update run_wrap.py
2023-07-23 09:28:49 -07:00
Andrej
00727ba1c0
Update README.md
2023-07-23 09:00:26 -07:00
Andrej Karpathy
4af4b8abd4
add sample output story
2023-07-23 15:59:19 +00:00
Andrej Karpathy
523ba69578
fix readme
2023-07-23 15:29:37 +00:00
Andrej Karpathy
24917b23de
fix run command
2023-07-23 15:28:24 +00:00
Andrej Karpathy
0c2a880063
add my pretrained model links
2023-07-23 15:24:23 +00:00
Andrej Karpathy
9414e7a45e
tweaks and add a simple test
2023-07-23 14:52:08 +00:00
Andrej Karpathy
f499d9d2b5
delete debug line
2023-07-23 05:37:44 +00:00
Andrej
405eefded1
Update README.md
2023-07-22 22:35:38 -07:00
Andrej
9148cae17d
Update README.md
2023-07-22 22:30:25 -07:00
Andrej Karpathy
60d32cf13a
move lines around
2023-07-23 05:25:07 +00:00
Andrej Karpathy
5b161abb9a
somewhere ~20 hours later
2023-07-23 05:23:45 +00:00
Andrej
731657856e
Initial commit
2023-07-22 22:15:06 -07:00