Commit Graph

91 Commits

Author SHA1 Message Date
Andrej 7f9f5ca853 Update README.md: new llama model export 2023-07-25 16:30:28 -07:00
Andrej 5bcd19a204 Merge pull request #85 from python273/export-llama-without-llama
Export llama without llama
2023-07-25 16:23:56 -07:00
Andrej 614bf91e5d Merge pull request #60 from emma-eva/patch-1
Fixed time_in_ms() compile time error (termux and neoterm)
2023-07-25 16:06:41 -07:00
Andrej 366711acf8 Merge pull request #77 from madroidmaq/master
Update README.md: formate output samples
2023-07-25 16:01:55 -07:00
python273 4d1fa2f2c6 Export llama without llama 2023-07-26 01:32:00 +04:00
madroid ac22fbce7e Update README.md: formate output samples 2023-07-26 00:46:14 +08:00
Andrej 6cf34d610a Update README.md 2023-07-25 08:14:48 -07:00
Andrej Karpathy 34ccb64ed8 fix typo in readme after adding the 110m model 2023-07-25 15:02:11 +00:00
Andrej Karpathy 94730f1766 add the 110m model, as it finished training 2023-07-25 15:00:57 +00:00
Andrej Karpathy 05ee4cbf38 fix bug in timing - use steps not max seq len doh 2023-07-25 14:21:37 +00:00
Andrej d359fae505 Merge pull request #69 from RichardScottOZ/patch-1
intimately
2023-07-25 07:04:17 -07:00
RichardScottOZ f3a1e227fe intimately 2023-07-25 21:26:30 +09:30
Emma Eva 6ce91b1b3b Fixed time_in_ms() compile time error (termux and neoterm)
clang version 16.0.4
2023-07-25 12:12:40 +06:00
Andrej 98ec4ba23d Update README.md 2023-07-24 22:54:54 -07:00
Andrej 81c90bfcb7 Update README.md: small tweaks 2023-07-24 22:51:39 -07:00
Andrej cf625ecd7e Update README.md 2023-07-24 21:25:31 -07:00
Andrej Karpathy c3e0d73bd2 we can inference Meta's Llama 2 7B, yay 2023-07-25 04:21:07 +00:00
Andrej 133ad3ffff Merge pull request #50 from karpathy/memmap
candidate memmap implementation
2023-07-24 18:59:29 -07:00
Andrej Karpathy a1f6b4653e merge conflict resolve with imports 2023-07-25 01:58:46 +00:00
Andrej d18e9efd77 Merge pull request #48 from richinseattle/richinseattle-patch-1
MSVC Compatibility fix for timer
2023-07-24 16:37:37 -07:00
richinseattle b2857c6af2 Switch to using timespec_get() for cross OS compatibility 2023-07-24 16:31:38 -07:00
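Note on b2857c6af2: a minimal sketch of a wall-clock millisecond timer built on C11 `timespec_get()`, which is the approach the commit title names. The function body and the `long long` return type are assumptions for illustration, not the code from the commit.

```c
#include <time.h>

// Wall-clock time in milliseconds, using the C11 timespec_get() call.
// Hypothetical helper: only the name time_in_ms comes from the log above.
long long time_in_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    // Convert seconds and nanoseconds to a single millisecond count.
    return (long long)ts.tv_sec * 1000LL + ts.tv_nsec / 1000000L;
}
```

Unlike `clock()`, which reports CPU time summed over threads, this reads calendar time, so elapsed intervals stay meaningful when the work runs in parallel.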
richinseattle f121f5f0c5 Merge branch 'karpathy:master' into richinseattle-patch-1 2023-07-24 16:30:07 -07:00
Andrej Karpathy cae88dfbab tune readme around timings etc 2023-07-24 23:27:48 +00:00
Andrej Karpathy 496466f78f add rundebug to makefile, useful for spotting issues and such 2023-07-24 23:13:59 +00:00
Andrej Karpathy e6e3f1322b candidate memmap implementation 2023-07-24 22:54:49 +00:00
richinseattle 2be7d7887b MSVC Compatibility fix for timer
use clock() instead of gettimeofday() for cross-platform compatibility
2023-07-24 15:22:20 -07:00
Andrej Karpathy 16edfe6364 add a simple makefile 2023-07-24 21:50:04 +00:00
Andrej bf9f6f2ece Add discord link to Readme 2023-07-24 14:22:29 -07:00
Andrej 669b75ddc8 Merge pull request #43 from krzysztof-jusiak/rmsnorm
Speed up rmsnorm by using sqrtf/expf
2023-07-24 14:13:49 -07:00
Andrej 687473c009 Update README.md with TinyStories model series 2023-07-24 14:11:27 -07:00
Andrej Karpathy 791be9d991 tweak argparse. fix steps=256, even if some models may support longer maximum seq_len. get rid of seed option for now, use temp=0.0 for deterministic behavior 2023-07-24 20:59:32 +00:00
Andrej Karpathy 90ae37c3e6 git push origin masterMerge branch 'admu-progvar-master' 2023-07-24 20:39:40 +00:00
Kris Jusiak c9b1f10124 Speed up rmsnorm by using sqrtf/expf
Problem:
- exp and sqrt are using double precision for operations which is not
  required.

Solution:
- Use expf and sqrtf instead.

Notes:
- Although it uses single precision, this doesn't seem to affect the
  result.

Results: ~ 10% improvement
  - before:  940 tok/s
  - after:  1020 tok/s
2023-07-24 13:06:27 -05:00
Franz Louis Cesista c9ad067c5d parallelize multi-head attention 2023-07-25 01:10:12 +08:00
Andrej Karpathy 50a086edde add warning about fastmath 2023-07-24 15:18:04 +00:00
Andrej Karpathy fff00ffd07 ack to lambda 2023-07-24 14:31:52 +00:00
Andrej d0ddf94cc3 Merge pull request #36 from hu-po/patch-1
typo
2023-07-24 07:27:36 -07:00
Andrej 228c4ea3ea Merge pull request #28 from SlyEcho/master
Fix tokenizer reading on Windows
2023-07-24 07:23:07 -07:00
Andrej Karpathy 624cdfc76a add dropout support to model 2023-07-24 14:18:50 +00:00
Andrej cdfb49208a Merge pull request #37 from awgu/pt2
Have DDP ignore `freqs_cis` to avoid broadcast
2023-07-24 07:15:40 -07:00
Andrej Karpathy 9055766cf6 docs on how to run with openmp 2023-07-24 14:08:06 +00:00
Andrej Karpathy cbbe4301b0 Merge branch 'krzysztof-jusiak-openmp' 2023-07-24 14:02:28 +00:00
Andrew Gu 25494f9cbc Have DDP ignore freqs_cis to avoid broadcast 2023-07-24 13:58:09 +00:00
hu-po d95c7617c6 typo 2023-07-24 07:35:12 -05:00
Henri Vasserman 4d637983ad Fix tokenizer reading on Windows 2023-07-24 11:08:29 +03:00
Kris Jusiak 0a0ca73c65 [openmp] 1.5x inference speedup
Problem:
- clock() measures CPU time and doesn't work properly with parallel execution.
- performance is bound by the matmul over the weights.

Solution:
- use gettimeofday instead.
- utilize openmp to parallelize matmul.

Note:
- if not compiled with -fopenmp, the #pragma is ignored and single-threaded
  execution is performed.
- there are additional env variables for openmp (optional) to set the
  number of threads, scheduler, etc.

Benchmarks:
```
clang -Ofast -march=native  run.c  -lm  -o run          // achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native  run.c  -lm  -o run // achieved tok/s: 524.590164
```
2023-07-24 01:55:23 -05:00
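Note on 0a0ca73c65: a minimal sketch of a matrix-vector product whose output rows are split across threads with an OpenMP pragma, as the commit message describes. The function signature and the row-major layout are assumptions for illustration, not the code from the commit.

```c
#include <assert.h>

// xout = W @ x, where W is (d, n) in row-major order and x has n entries.
// Illustrative signature; layout and names are assumptions.
void matmul(float* xout, const float* x, const float* w, int n, int d) {
    assert(n > 0 && d > 0);
    // Each output row is independent, so the rows can be divided among threads.
    // Without -fopenmp the pragma is ignored and the loop runs on one thread.
    #pragma omp parallel for
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}
```

The standard `OMP_NUM_THREADS` environment variable controls how many threads the loop uses when the program is built with `-fopenmp`.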
Andrej Karpathy d548245321 readme update; -Ofast enables the other ones so they become spurious 2023-07-24 05:20:21 +00:00
Andrej 0e4076cd52 Merge pull request #25 from wsmoses/master
Add information on compiler flags
2023-07-23 22:12:28 -07:00
Andrej Karpathy f6388c99c8 delete the copy function in favor of memcpy. sadly we have to import string.h now... 2023-07-24 05:10:55 +00:00
William Moses 65e07462e4 Add information on compiler flags 2023-07-23 19:08:17 -10:00