Andrej Karpathy
90ae37c3e6
git push origin masterMerge branch 'admu-progvar-master'
2023-07-24 20:39:40 +00:00
Kris Jusiak
c9b1f10124
Speed up rmsnorm by using sqrtf/expf
Problem:
- exp and sqrt operate in double precision, which is not
required here.
Solution:
- Use expf and sqrtf instead.
Notes:
- Although it uses single precision, this doesn't seem to affect the
result.
Results: ~ 10% improvement
- before: 940 tok/s
- after: 1020 tok/s
2023-07-24 13:06:27 -05:00
Franz Louis Cesista
c9ad067c5d
parallelize multi-head attention
2023-07-25 01:10:12 +08:00
Andrej Karpathy
50a086edde
add warning about fastmath
2023-07-24 15:18:04 +00:00
Andrej Karpathy
fff00ffd07
ack to lambda
2023-07-24 14:31:52 +00:00
Andrej
d0ddf94cc3
Merge pull request #36 from hu-po/patch-1
typo
2023-07-24 07:27:36 -07:00
Andrej
228c4ea3ea
Merge pull request #28 from SlyEcho/master
Fix tokenizer reading on Windows
2023-07-24 07:23:07 -07:00
Andrej Karpathy
624cdfc76a
add dropout support to model
2023-07-24 14:18:50 +00:00
Andrej
cdfb49208a
Merge pull request #37 from awgu/pt2
Have DDP ignore `freqs_cis` to avoid broadcast
2023-07-24 07:15:40 -07:00
Andrej Karpathy
9055766cf6
docs on how to run with openmp
2023-07-24 14:08:06 +00:00
Andrej Karpathy
cbbe4301b0
Merge branch 'krzysztof-jusiak-openmp'
2023-07-24 14:02:28 +00:00
Andrew Gu
25494f9cbc
Have DDP ignore freqs_cis to avoid broadcast
2023-07-24 13:58:09 +00:00
hu-po
d95c7617c6
typo
2023-07-24 07:35:12 -05:00
Henri Vasserman
4d637983ad
Fix tokenizer reading on Windows
2023-07-24 11:08:29 +03:00
Kris Jusiak
0a0ca73c65
[openmp] 1.5x inference speedup
Problem:
- clock() measures CPU time, which doesn't work properly with parallel execution.
- performance is bound by the matmul of x with the weights.
Solution:
- use gettimeofday instead.
- utilize openmp to parallelize matmul.
Note:
- if not compiled with -fopenmp, the #pragma is ignored and execution is
single-threaded.
- there are additional (optional) environment variables for configuring openmp,
e.g. the number of threads, the scheduler, etc.
Benchmarks:
```
clang -Ofast -march=native run.c -lm -o run            # achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native run.c -lm -o run   # achieved tok/s: 524.590164
```
2023-07-24 01:55:23 -05:00
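Both changes in this commit can be sketched in a few lines. In a matrix-vector product every output row is an independent dot product, so the rows can be split across threads; and because on typical POSIX systems `clock()` reports CPU time accumulated by all threads of the process, a wall-clock source such as `gettimeofday` is needed to measure tokens per second once the work runs in parallel. The names `matmul` and `time_in_ms` and the row-major `d x n` weight layout are assumptions for this sketch.

```c
#include <stddef.h>
#include <sys/time.h>

/* Sketch: xout (d) = W (d x n, row-major) @ x (n).
 * Rows are independent, so OpenMP splits them across threads.
 * Without -fopenmp the pragma is ignored and the loop runs serially. */
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    #pragma omp parallel for
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}

/* Wall-clock time in milliseconds, unaffected by how many threads are busy. */
long time_in_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long)tv.tv_sec * 1000L + (long)(tv.tv_usec / 1000);
}
```

Timing a generation loop is then `long start = time_in_ms();` before and `time_in_ms() - start` after, with tok/s computed from the elapsed milliseconds.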
Andrej Karpathy
d548245321
readme update; -Ofast enables the other ones so they become spurious
2023-07-24 05:20:21 +00:00
Andrej
0e4076cd52
Merge pull request #25 from wsmoses/master
...
Add information on compiler flags
2023-07-23 22:12:28 -07:00
Andrej Karpathy
f6388c99c8
delete the copy function in favor of memcpy. sadly we have to import string.h now...
2023-07-24 05:10:55 +00:00
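Swapping a hand-written copy loop for `memcpy` is a small change, but `memcpy` counts bytes rather than elements, which is the usual place for a bug to creep in. The sketch below contrasts the two; both function names are hypothetical and the original copy function's signature is not shown in the log.

```c
#include <string.h>

/* Hand-rolled element-wise copy (hypothetical, for comparison). */
void copy_loop(float *dst, const float *src, int size) {
    for (int i = 0; i < size; i++) {
        dst[i] = src[i];
    }
}

/* Standard-library equivalent. memcpy takes a byte count, hence
 * size * sizeof(float); the two buffers must not overlap. */
void copy_memcpy(float *dst, const float *src, int size) {
    memcpy(dst, src, (size_t)size * sizeof(*dst));
}
```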
William Moses
65e07462e4
Add information on compiler flags
2023-07-23 19:08:17 -10:00
Andrej Karpathy
b2204e1633
include example story from 44m model
2023-07-24 04:57:30 +00:00
Andrej Karpathy
ba6acc9378
add pointer to the new 44M param model. which is still way too fast to inference, i have to train an even bigger one.
2023-07-24 04:53:37 +00:00
Andrej Karpathy
99354a85ce
get rid of compiler warnings from ignoring return value of fread
2023-07-24 04:39:24 +00:00
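`fread` returns the number of items actually read, and some toolchains warn when that value is discarded (for example, glibc marks `fread` with `warn_unused_result` when `_FORTIFY_SOURCE` is enabled). Checking the return value both silences the warning and catches truncated checkpoint files. A minimal sketch, with a hypothetical helper name:

```c
#include <stdio.h>

/* Sketch: read exactly n floats from f, or report failure.
 * Returns 1 on success, 0 if fewer than n items could be read. */
int read_floats(FILE *f, float *dst, size_t n) {
    if (fread(dst, sizeof(float), n, f) != n) {
        fprintf(stderr, "failed to read %zu floats\n", n);
        return 0;
    }
    return 1;
}
```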
Andrej Karpathy
6a61831e19
make init code much less sketchy
2023-07-24 04:22:32 +00:00
Andrej
bd9e837b14
Merge pull request #23 from awgu/pt2
Register `freqs_cis` as non-persistent buffer
2023-07-23 21:08:17 -07:00
Andrej Karpathy
3bfa5665d1
delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets
2023-07-24 04:02:57 +00:00
Andrew Gu
af3b5c0364
Register freqs_cis as non-persistent buffer
2023-07-24 03:18:20 +00:00
Andrej
44ecc784da
performance guide tweak
2023-07-23 20:09:25 -07:00
Andrej
f4e2cc7d96
Add performance optimization section
2023-07-23 20:04:39 -07:00
Andrej Karpathy
d7e2c46915
slight tweaks to softmax
2023-07-24 02:02:12 +00:00
Andrej
7d6208870e
Merge pull request #19 from mcognetta/master
Simplify Softmax
2023-07-23 18:57:23 -07:00
Andrej
15e7c92fad
Merge pull request #10 from luigifcruz/patch-1
Bigger number better with -funsafe-math-optimizations flag.
2023-07-23 18:52:36 -07:00
Marco Cognetta
80679e24be
simplify softmax
2023-07-24 10:51:46 +09:00
Andrej
1b63a9510e
Merge pull request #16 from zejunh/master
Remove unused config parameter
2023-07-23 18:38:12 -07:00
Luigi Cruz
1eb111adc7
Turn -funsafe-math-optimizations optional.
2023-07-23 20:09:45 -03:00
Junny
dc3962f356
remove unused parameter
2023-07-23 15:33:29 -07:00
Luigi Cruz
f9da392147
Add missing flag.
2023-07-23 17:14:11 -03:00
Luigi Cruz
114d8cfcb6
Add -funsafe-math-optimizations flag.
2023-07-23 17:08:27 -03:00
Andrej
7d401d530c
Merge pull request #5 from danielgross/pleasantify-dx
Make sample.py work out of the box
2023-07-23 11:58:03 -07:00
Andrej Karpathy
3b7b4878b4
compile with -O3 to increase tok/s from 18 to 98! wow, i have to train a bigger model now
2023-07-23 18:55:46 +00:00
Daniel Gross
8c383c28f9
Update README.md
2023-07-23 10:46:36 -07:00
Daniel Gross
518524f458
default to whatever system has
2023-07-23 10:41:03 -07:00
Andrej Karpathy
fa872540ba
fix comments in readme about spaces
2023-07-23 17:11:35 +00:00
Andrej Karpathy
5baaf9df06
small format tweaks, get rid of prints in tokenizer
2023-07-23 17:09:23 +00:00
Andrej
deb3818db9
Merge pull request #1 from sumo43/master
Fix streaming
2023-07-23 10:07:40 -07:00
Andrej Karpathy
ad67d5e29c
strike one tiny todo
2023-07-23 17:05:22 +00:00
Andrej
353266aaae
Merge pull request #3 from vovw/master
added requirement.txt
2023-07-23 10:04:42 -07:00
voidz7
13d7827ba4
added requirement.txt
2023-07-23 22:31:16 +05:30
Artem Yatsenko
0bddcd94c1
Update run_wrap.py
2023-07-23 09:28:49 -07:00
Andrej
00727ba1c0
Update README.md
2023-07-23 09:00:26 -07:00
Andrej Karpathy
4af4b8abd4
add sample output story
2023-07-23 15:59:19 +00:00