
71 Commits

Author SHA1 Message Date
Andrej d18e9efd77 Merge pull request #48 from richinseattle/richinseattle-patch-1
MSVC Compatibility fix for timer
2023-07-24 16:37:37 -07:00
richinseattle b2857c6af2 Switch to using timespec_get() for cross OS compatibility 2023-07-24 16:31:38 -07:00
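The timer commits in this range move from gettimeofday(), which is POSIX-only, to clock(), and then to timespec_get(). Below is a minimal sketch of a millisecond wall-clock helper built on timespec_get(), assuming a C11 compiler and runtime. The names time_in_ms and tokens_per_second are illustrative and not necessarily what run.c uses.

```c
#include <time.h>

/* Wall-clock milliseconds via C11 timespec_get(). On POSIX systems clock()
 * reports processor time for the whole process, which overstates elapsed
 * time when OpenMP runs threads in parallel. TIME_UTC measures real time. */
long long time_in_ms(void) {
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC) {
        return -1; /* clock unavailable */
    }
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Tokens per second from a token count and two time_in_ms() readings.
 * Returns 0.0 when the interval is too short to measure. */
double tokens_per_second(int tokens, long long start_ms, long long end_ms) {
    long long elapsed = end_ms - start_ms;
    return elapsed > 0 ? tokens * 1000.0 / (double)elapsed : 0.0;
}
```

Using `long long` keeps the millisecond count from overflowing on platforms where `long` is 32 bits, such as 64-bit Windows.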
richinseattle f121f5f0c5 Merge branch 'karpathy:master' into richinseattle-patch-1 2023-07-24 16:30:07 -07:00
Andrej Karpathy cae88dfbab tune readme around timings etc 2023-07-24 23:27:48 +00:00
Andrej Karpathy 496466f78f add rundebug to makefile, useful for spotting issues and such 2023-07-24 23:13:59 +00:00
richinseattle 2be7d7887b MSVC Compatibility fix for timer
use clock() instead of gettimeofday() for cross-platform compatibility
2023-07-24 15:22:20 -07:00
Andrej Karpathy 16edfe6364 add a simple makefile 2023-07-24 21:50:04 +00:00
Andrej bf9f6f2ece Add discord link to Readme 2023-07-24 14:22:29 -07:00
Andrej 669b75ddc8 Merge pull request #43 from krzysztof-jusiak/rmsnorm
Speed up rmsnorm by using sqrtf/expf
2023-07-24 14:13:49 -07:00
Andrej 687473c009 Update README.md with TinyStories model series 2023-07-24 14:11:27 -07:00
Andrej Karpathy 791be9d991 tweak argparse. fix steps=256, even if some models may support longer maximum seq_len. get rid of seed option for now, use temp=0.0 for deterministic behavior 2023-07-24 20:59:32 +00:00
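The message says temp=0.0 gives deterministic behavior. Here is a sketch of why, assuming the usual sampling scheme. At temperature 0 the sampler takes the argmax of the logits. At any other temperature it draws an index from the softmax distribution using a uniform random number. The helpers argmax and sample_from are hypothetical names.

```c
/* Greedy choice used when temperature is 0.0: index of the largest logit.
 * Ties resolve to the lowest index, so repeated runs give the same token. */
int argmax(const float *v, int n) {
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (v[i] > v[best]) best = i;
    }
    return best;
}

/* Stochastic choice: walk the cumulative distribution of `probs` and return
 * the first index whose CDF exceeds r, where r is uniform in [0, 1). */
int sample_from(const float *probs, int n, float r) {
    float cdf = 0.0f;
    for (int i = 0; i < n; i++) {
        cdf += probs[i];
        if (r < cdf) return i;
    }
    return n - 1; /* rounding can leave cdf slightly below 1 */
}
```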
Andrej Karpathy 90ae37c3e6 git push origin masterMerge branch 'admu-progvar-master' 2023-07-24 20:39:40 +00:00
Kris Jusiak c9b1f10124 Speed up rmsnorm by using sqrtf/expf
Problem:
- exp and sqrt use double precision for operations that do not
  require it.

Solution:
- Use expf and sqrtf instead.

Notes:
- Using single precision doesn't seem to affect the
  result.

Results: ~8.5% improvement
  - before:  940 tok/s
  - after:  1020 tok/s
2023-07-24 13:06:27 -05:00
Franz Louis Cesista c9ad067c5d parallelize multi-head attention 2023-07-25 01:10:12 +08:00
Andrej Karpathy 50a086edde add warning about fastmath 2023-07-24 15:18:04 +00:00
Andrej Karpathy fff00ffd07 ack to lambda 2023-07-24 14:31:52 +00:00
Andrej d0ddf94cc3 Merge pull request #36 from hu-po/patch-1
typo
2023-07-24 07:27:36 -07:00
Andrej 228c4ea3ea Merge pull request #28 from SlyEcho/master
Fix tokenizer reading on Windows
2023-07-24 07:23:07 -07:00
Andrej Karpathy 624cdfc76a add dropout support to model 2023-07-24 14:18:50 +00:00
Andrej cdfb49208a Merge pull request #37 from awgu/pt2
Have DDP ignore `freqs_cis` to avoid broadcast
2023-07-24 07:15:40 -07:00
Andrej Karpathy 9055766cf6 docs on how to run with openmp 2023-07-24 14:08:06 +00:00
Andrej Karpathy cbbe4301b0 Merge branch 'krzysztof-jusiak-openmp' 2023-07-24 14:02:28 +00:00
Andrew Gu 25494f9cbc Have DDP ignore freqs_cis to avoid broadcast 2023-07-24 13:58:09 +00:00
hu-po d95c7617c6 typo 2023-07-24 07:35:12 -05:00
Henri Vasserman 4d637983ad Fix tokenizer reading on Windows 2023-07-24 11:08:29 +03:00
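The commit title doesn't state the cause. One common source of this kind of bug is that, on Windows, fopen() with mode "r" opens the file in text mode. In text mode the C runtime translates "\r\n" to "\n" and treats byte 0x1A as end of file, which corrupts binary data. Opening with "rb" avoids this. The sketch below reads one token from a stream opened in binary mode, assuming a hypothetical record layout of an int length followed by raw bytes. read_token is likewise a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one length-prefixed string (int length, then raw bytes) from a
 * stream opened in binary mode ("rb"). Returns a malloc'd NUL-terminated
 * copy, or NULL on a short read or allocation failure. */
char *read_token(FILE *f) {
    int len;
    if (fread(&len, sizeof(int), 1, f) != 1 || len < 0) return NULL;
    char *s = malloc((size_t)len + 1);
    if (s == NULL) return NULL;
    if (fread(s, 1, (size_t)len, f) != (size_t)len) {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}
```

A caller would open the file with something like `fopen("tokenizer.bin", "rb")`. The "b" is harmless on POSIX systems, so the same code works on every platform.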
Kris Jusiak 0a0ca73c65 [openmp] 1.5x inference speedup
Problem:
- clock() measures CPU time, so it doesn't report correct timings with parallel execution.
- performance is bound by the matmul of x with the weights.

Solution:
- use gettimeofday instead.
- utilize openmp to parallelize matmul.

Note:
- if not compiled with -fopenmp, the #pragma is ignored and execution
  is single-threaded.
- additional OpenMP environment variables can optionally be set
  to configure the number of threads, the scheduler, etc.

Benchmarks:
```
clang -Ofast -march=native  run.c  -lm  -o run          # achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native  run.c  -lm  -o run # achieved tok/s: 524.590164
```
2023-07-24 01:55:23 -05:00
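Below is a sketch of the kind of matmul this commit parallelizes. It assumes W is a row-major (d, n) matrix and x is a vector of length n. Each output row is independent, so `#pragma omp parallel for` can split the rows across threads. Compiled without -fopenmp, the pragma is ignored and the loop runs serially. The signature is an assumption.

```c
/* xout = W @ x, with W stored row-major as d rows of n floats.
 * Rows are independent, so OpenMP distributes them across threads;
 * the loop variable and val are private to each iteration. */
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    #pragma omp parallel for
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}
```

With an OpenMP build, the standard OMP_NUM_THREADS environment variable sets the thread count. For example, set OMP_NUM_THREADS=4 when launching ./run.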
Andrej Karpathy d548245321 readme update; -Ofast enables the other ones so they become spurious 2023-07-24 05:20:21 +00:00
Andrej 0e4076cd52 Merge pull request #25 from wsmoses/master
Add information on compiler flags
2023-07-23 22:12:28 -07:00
Andrej Karpathy f6388c99c8 delete the copy function in favor of memcpy. sadly we have to import string.h now... 2023-07-24 05:10:55 +00:00
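Below is a sketch of the kind of swap this commit describes: replacing a hand-written element-by-element copy with memcpy() from <string.h>. Both function names are hypothetical. Note that memcpy() takes a byte count, not an element count.

```c
#include <string.h>

/* Before (hypothetical): copy one float at a time. */
void copy_loop(float *dst, const float *src, int n) {
    for (int i = 0; i < n; i++) dst[i] = src[i];
}

/* After: one memcpy call. The size is in bytes, so scale by sizeof(float).
 * The buffers must not overlap; use memmove() if they might. */
void copy_floats(float *dst, const float *src, int n) {
    memcpy(dst, src, (size_t)n * sizeof(float));
}
```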
William Moses 65e07462e4 Add information on compiler flags 2023-07-23 19:08:17 -10:00
Andrej Karpathy b2204e1633 include example story from 44m model 2023-07-24 04:57:30 +00:00
Andrej Karpathy ba6acc9378 add pointer to the new 44M param model. which is still way too fast to inference, i have to train an even bigger one. 2023-07-24 04:53:37 +00:00
Andrej Karpathy 99354a85ce get rid of compiler warnings from ignoring return value of fread 2023-07-24 04:39:24 +00:00
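On glibc, fread() is declared with a warn-unused-result attribute when _FORTIFY_SOURCE is enabled, which some distributions turn on by default. With that attribute, discarding fread()'s return value triggers -Wunused-result. Comparing the return value with the requested item count silences the warning and also turns a truncated file into a clean error. The Config struct and read_config below are hypothetical.

```c
#include <stdio.h>

/* Hypothetical header: a few ints stored back to back in the file. */
typedef struct {
    int dim;
    int n_layers;
    int vocab_size;
} Config;

/* Returns 0 on success and -1 if the stream ends before a full header.
 * fread() returns the number of complete items read, here 0 or 1. */
int read_config(FILE *f, Config *cfg) {
    if (fread(cfg, sizeof(Config), 1, f) != 1) {
        fprintf(stderr, "failed to read config header\n");
        return -1;
    }
    return 0;
}
```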
Andrej Karpathy 6a61831e19 make init code much less sketchy 2023-07-24 04:22:32 +00:00
Andrej bd9e837b14 Merge pull request #23 from awgu/pt2
Register `freqs_cis` as non-persistent buffer
2023-07-23 21:08:17 -07:00
Andrej Karpathy 3bfa5665d1 delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets 2023-07-24 04:02:57 +00:00
Andrew Gu af3b5c0364 Register freqs_cis as non-persistent buffer 2023-07-24 03:18:20 +00:00
Andrej 44ecc784da performance guide tweak 2023-07-23 20:09:25 -07:00
Andrej f4e2cc7d96 Add performance optimization section 2023-07-23 20:04:39 -07:00
Andrej Karpathy d7e2c46915 slight tweaks to softmax 2023-07-24 02:02:12 +00:00
Andrej 7d6208870e Merge pull request #19 from mcognetta/master
Simplify Softmax
2023-07-23 18:57:23 -07:00
Andrej 15e7c92fad Merge pull request #10 from luigifcruz/patch-1
Bigger number better with -funsafe-math-optimizations flag.
2023-07-23 18:52:36 -07:00
Marco Cognetta 80679e24be simplify softmax 2023-07-24 10:51:46 +09:00
Andrej 1b63a9510e Merge pull request #16 from zejunh/master
Remove unused config parameter
2023-07-23 18:38:12 -07:00
Luigi Cruz 1eb111adc7 Turn -funsafe-math-optimizations optional. 2023-07-23 20:09:45 -03:00
Junny dc3962f356 remove unused parameter 2023-07-23 15:33:29 -07:00
Luigi Cruz f9da392147 Add missing flag. 2023-07-23 17:14:11 -03:00
Luigi Cruz 114d8cfcb6 Add -funsafe-math-optimizations flag. 2023-07-23 17:08:27 -03:00
Andrej 7d401d530c Merge pull request #5 from danielgross/pleasantify-dx
Make sample.py work out of the box
2023-07-23 11:58:03 -07:00
Andrej Karpathy 3b7b4878b4 compile with -O3 to increase tok/s from 18 to 98! wow, i have to train a bigger model now 2023-07-23 18:55:46 +00:00