Andrej
98ec4ba23d
Update README.md
2023-07-24 22:54:54 -07:00
Andrej
81c90bfcb7
Update README.md: small tweaks
2023-07-24 22:51:39 -07:00
Andrej
cf625ecd7e
Update README.md
2023-07-24 21:25:31 -07:00
Andrej Karpathy
c3e0d73bd2
we can inference Meta's Llama 2 7B, yay
2023-07-25 04:21:07 +00:00
Andrej
133ad3ffff
Merge pull request #50 from karpathy/memmap
...
candidate memmap implementation
2023-07-24 18:59:29 -07:00
Andrej Karpathy
a1f6b4653e
merge conflict resolve with imports
2023-07-25 01:58:46 +00:00
Andrej
d18e9efd77
Merge pull request #48 from richinseattle/richinseattle-patch-1
...
MSVC Compatibility fix for timer
2023-07-24 16:37:37 -07:00
richinseattle
b2857c6af2
Switch to using timespec_get() for cross OS compatibility
2023-07-24 16:31:38 -07:00
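For readers unfamiliar with the call named in the commit above, here is a minimal sketch of a millisecond wall-clock timer built on C11's `timespec_get()`. The helper name `time_in_ms` and its `long` return type are assumptions made for illustration; this is not necessarily the code the commit introduced.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: wall-clock time in milliseconds.
 * timespec_get() is standard C11, so it does not depend on POSIX-only
 * calls such as gettimeofday(). */
long time_in_ms(void) {
    struct timespec ts;
    /* On success timespec_get() returns the base it was given; 0 means failure. */
    int base = timespec_get(&ts, TIME_UTC);
    assert(base == TIME_UTC);
    return (long)(ts.tv_sec * 1000 + ts.tv_nsec / 1000000);
}
```

Taking the difference of two such readings around the generation loop gives elapsed wall-clock time, which is what a tok/s figure needs.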
richinseattle
f121f5f0c5
Merge branch 'karpathy:master' into richinseattle-patch-1
2023-07-24 16:30:07 -07:00
Andrej Karpathy
cae88dfbab
tune readme around timings etc
2023-07-24 23:27:48 +00:00
Andrej Karpathy
496466f78f
add rundebug to makefile, useful for spotting issues and such
2023-07-24 23:13:59 +00:00
Andrej Karpathy
e6e3f1322b
candidate memmap implementation
2023-07-24 22:54:49 +00:00
richinseattle
2be7d7887b
MSVC Compatibility fix for timer
...
use clock() instead of gettimeofday() for cross-platform compatibility
2023-07-24 15:22:20 -07:00
Andrej Karpathy
16edfe6364
add a simple makefile
2023-07-24 21:50:04 +00:00
Andrej
bf9f6f2ece
Add discord link to Readme
2023-07-24 14:22:29 -07:00
Andrej
669b75ddc8
Merge pull request #43 from krzysztof-jusiak/rmsnorm
...
Speed up rmsnorm by using sqrtf/expf
2023-07-24 14:13:49 -07:00
Andrej
687473c009
Update README.md with TinyStories model series
2023-07-24 14:11:27 -07:00
Andrej Karpathy
791be9d991
tweak argparse. fix steps=256, even if some models may support longer maximum seq_len. get rid of seed option for now, use temp=0.0 for deterministic behavior
2023-07-24 20:59:32 +00:00
Andrej Karpathy
90ae37c3e6
Merge branch 'admu-progvar-master'
2023-07-24 20:39:40 +00:00
Kris Jusiak
c9b1f10124
Speed up rmsnorm by using sqrtf/expf
...
Problem:
- exp and sqrt operate in double precision, which is not required.
Solution:
- Use expf and sqrtf instead.
Notes:
- Using single precision does not seem to affect the result.
Results: ~10% improvement
- before: 940 tok/s
- after: 1020 tok/s
2023-07-24 13:06:27 -05:00
Franz Louis Cesista
c9ad067c5d
parallelize multi-head attention
2023-07-25 01:10:12 +08:00
Andrej Karpathy
50a086edde
add warning about fastmath
2023-07-24 15:18:04 +00:00
Andrej Karpathy
fff00ffd07
ack to lambda
2023-07-24 14:31:52 +00:00
Andrej
d0ddf94cc3
Merge pull request #36 from hu-po/patch-1
...
typo
2023-07-24 07:27:36 -07:00
Andrej
228c4ea3ea
Merge pull request #28 from SlyEcho/master
...
Fix tokenizer reading on Windows
2023-07-24 07:23:07 -07:00
Andrej Karpathy
624cdfc76a
add dropout support to model
2023-07-24 14:18:50 +00:00
Andrej
cdfb49208a
Merge pull request #37 from awgu/pt2
...
Have DDP ignore `freqs_cis` to avoid broadcast
2023-07-24 07:15:40 -07:00
Andrej Karpathy
9055766cf6
docs on how to run with openmp
2023-07-24 14:08:06 +00:00
Andrej Karpathy
cbbe4301b0
Merge branch 'krzysztof-jusiak-openmp'
2023-07-24 14:02:28 +00:00
Andrew Gu
25494f9cbc
Have DDP ignore freqs_cis to avoid broadcast
2023-07-24 13:58:09 +00:00
hu-po
d95c7617c6
typo
2023-07-24 07:35:12 -05:00
Henri Vasserman
4d637983ad
Fix tokenizer reading on Windows
2023-07-24 11:08:29 +03:00
Kris Jusiak
0a0ca73c65
[openmp] 1.5x inference speedup
...
Problem:
- clock() measures CPU time, so it does not report elapsed time correctly under parallel execution.
- performance is bound by the matmul against the weights.
Solution:
- use gettimeofday instead.
- use OpenMP to parallelize matmul.
Note:
- if not compiled with -fopenmp, the #pragma is ignored and execution is single-threaded.
- additional environment variables can optionally be set for OpenMP to configure the number of threads, the scheduler, etc.
Benchmarks:
```
clang -Ofast -march=native run.c -lm -o run // achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native run.c -lm -o run // achieved tok/s: 524.590164
```
2023-07-24 01:55:23 -05:00
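The two ideas in the commit above can be sketched as follows: a matrix-vector product whose outer loop carries an OpenMP pragma, and a wall-clock timer based on `gettimeofday`. The function names, signatures, and row-major weight layout are assumptions made for illustration, not necessarily the repository's code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Sketch: xout (d) = W (d x n, row-major) times x (n).
 * Each output row is independent, so the outer loop can be split across threads.
 * Without -fopenmp the pragma is ignored and the loop runs on one thread. */
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    assert(n > 0 && d > 0);
    #pragma omp parallel for
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}

/* Wall-clock time in milliseconds. Unlike clock(), which reports CPU time
 * consumed by the whole process, this is not inflated by extra threads. */
long time_in_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long)(tv.tv_sec * 1000 + tv.tv_usec / 1000);
}
```

When built with `-fopenmp`, the thread count can be set through the standard `OMP_NUM_THREADS` environment variable.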
Andrej Karpathy
d548245321
readme update; -Ofast enables the other ones so they become spurious
2023-07-24 05:20:21 +00:00
Andrej
0e4076cd52
Merge pull request #25 from wsmoses/master
...
Add information on compiler flags
2023-07-23 22:12:28 -07:00
Andrej Karpathy
f6388c99c8
delete the copy function in favor of memcpy. sadly we have to import string.h now...
2023-07-24 05:10:55 +00:00
William Moses
65e07462e4
Add information on compiler flags
2023-07-23 19:08:17 -10:00
Andrej Karpathy
b2204e1633
include example story from 44m model
2023-07-24 04:57:30 +00:00
Andrej Karpathy
ba6acc9378
add pointer to the new 44M param model. which is still way too fast to inference, i have to train an even bigger one.
2023-07-24 04:53:37 +00:00
Andrej Karpathy
99354a85ce
get rid of compiler warnings from ignoring return value of fread
2023-07-24 04:39:24 +00:00
Andrej Karpathy
6a61831e19
make init code much less sketchy
2023-07-24 04:22:32 +00:00
Andrej
bd9e837b14
Merge pull request #23 from awgu/pt2
...
Register `freqs_cis` as non-persistent buffer
2023-07-23 21:08:17 -07:00
Andrej Karpathy
3bfa5665d1
delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets
2023-07-24 04:02:57 +00:00
Andrew Gu
af3b5c0364
Register freqs_cis as non-persistent buffer
2023-07-24 03:18:20 +00:00
Andrej
44ecc784da
performance guide tweak
2023-07-23 20:09:25 -07:00
Andrej
f4e2cc7d96
Add performance optimization section
2023-07-23 20:04:39 -07:00
Andrej Karpathy
d7e2c46915
slight tweaks to softmax
2023-07-24 02:02:12 +00:00
Andrej
7d6208870e
Merge pull request #19 from mcognetta/master
...
Simplify Softmax
2023-07-23 18:57:23 -07:00
Andrej
15e7c92fad
Merge pull request #10 from luigifcruz/patch-1
...
Bigger number better with -funsafe-math-optimizations flag.
2023-07-23 18:52:36 -07:00
Marco Cognetta
80679e24be
simplify softmax
2023-07-24 10:51:46 +09:00