Commit Graph

97 Commits

Author SHA1 Message Date
Andrej 7496ea8108 Update README.md 2023-07-26 08:59:42 -07:00
Andrej f5d8797af2 Update README.md 2023-07-26 08:59:12 -07:00
Andrej Karpathy 3aedfe59f1 Merge branch 'aegkmq-master' 2023-07-26 15:43:06 +00:00
aegkmq 8986005f23 Minor cleanup 2023-07-26 16:57:08 +09:00
aegkmq 36c522a0d8 Improve locality 2023-07-26 13:24:27 +09:00
Andrej Karpathy f5650891d5 honestly at this point this is a lot more my nanogpt code than llama code 2023-07-25 23:57:03 +00:00
Andrej 7f9f5ca853 Update README.md: new llama model export 2023-07-25 16:30:28 -07:00
Andrej 5bcd19a204 Merge pull request #85 from python273/export-llama-without-llama
Export llama without llama
2023-07-25 16:23:56 -07:00
Andrej 614bf91e5d Merge pull request #60 from emma-eva/patch-1
Fixed time_in_ms() compile time error (termux and neoterm)
2023-07-25 16:06:41 -07:00
Andrej 366711acf8 Merge pull request #77 from madroidmaq/master
Update README.md: formate output samples
2023-07-25 16:01:55 -07:00
python273 4d1fa2f2c6 Export llama without llama 2023-07-26 01:32:00 +04:00
madroid ac22fbce7e Update README.md: formate output samples 2023-07-26 00:46:14 +08:00
Andrej 6cf34d610a Update README.md 2023-07-25 08:14:48 -07:00
Andrej Karpathy 34ccb64ed8 fix typo in readme after adding the 110m model 2023-07-25 15:02:11 +00:00
Andrej Karpathy 94730f1766 add the 110m model, as it finished training 2023-07-25 15:00:57 +00:00
Andrej Karpathy 05ee4cbf38 fix bug in timing - use steps not max seq len doh 2023-07-25 14:21:37 +00:00
Andrej d359fae505 Merge pull request #69 from RichardScottOZ/patch-1
intimately
2023-07-25 07:04:17 -07:00
RichardScottOZ f3a1e227fe intimately 2023-07-25 21:26:30 +09:30
Emma Eva 6ce91b1b3b Fixed time_in_ms() compile time error (termux and neoterm)
clang version 16.0.4
2023-07-25 12:12:40 +06:00
Andrej 98ec4ba23d Update README.md 2023-07-24 22:54:54 -07:00
Andrej 81c90bfcb7 Update README.md: small tweaks 2023-07-24 22:51:39 -07:00
Andrej cf625ecd7e Update README.md 2023-07-24 21:25:31 -07:00
Andrej Karpathy c3e0d73bd2 we can inference Meta's Llama 2 7B, yay 2023-07-25 04:21:07 +00:00
Andrej 133ad3ffff Merge pull request #50 from karpathy/memmap
candidate memmap implementation
2023-07-24 18:59:29 -07:00
Andrej Karpathy a1f6b4653e merge conflict resolve with imports 2023-07-25 01:58:46 +00:00
Andrej d18e9efd77 Merge pull request #48 from richinseattle/richinseattle-patch-1
MSVC Compatibility fix for timer
2023-07-24 16:37:37 -07:00
richinseattle b2857c6af2 Switch to using timespec_get() for cross OS compatibility 2023-07-24 16:31:38 -07:00
richinseattle f121f5f0c5 Merge branch 'karpathy:master' into richinseattle-patch-1 2023-07-24 16:30:07 -07:00
Andrej Karpathy cae88dfbab tune readme around timings etc 2023-07-24 23:27:48 +00:00
Andrej Karpathy 496466f78f add rundebug to makefile, useful for spotting issues and such 2023-07-24 23:13:59 +00:00
Andrej Karpathy e6e3f1322b candidate memmap implementation 2023-07-24 22:54:49 +00:00
richinseattle 2be7d7887b MSVC Compatibility fix for timer
use clock() instead of gettimeofday() for cross-platform compatibility
2023-07-24 15:22:20 -07:00
Andrej Karpathy 16edfe6364 add a simple makefile 2023-07-24 21:50:04 +00:00
Andrej bf9f6f2ece Add discord link to Readme 2023-07-24 14:22:29 -07:00
Andrej 669b75ddc8 Merge pull request #43 from krzysztof-jusiak/rmsnorm
Speed up rmsnorm by using sqrtf/expf
2023-07-24 14:13:49 -07:00
Andrej 687473c009 Update README.md with TinyStories model series 2023-07-24 14:11:27 -07:00
Andrej Karpathy 791be9d991 tweak argparse. fix steps=256, even if some models may support longer maximum seq_len. get rid of seed option for now, use temp=0.0 for deterministic behavior 2023-07-24 20:59:32 +00:00
Andrej Karpathy 90ae37c3e6 git push origin masterMerge branch 'admu-progvar-master' 2023-07-24 20:39:40 +00:00
Kris Jusiak c9b1f10124 Speed up rmsnorm by using sqrtf/expf
Problem:
- exp and sqrt are using double precision for operations which is not
  required.

Solution:
- Use expf and sqrtf instead.

Notes:
- Although it's using single precision doesn't seem to affect the
  result.

Results: ~ 10% improvement
  - before:  940 tok/s
  - after:  1020 tok/s
2023-07-24 13:06:27 -05:00
Franz Louis Cesista c9ad067c5d parallelize multi-head attention 2023-07-25 01:10:12 +08:00
Andrej Karpathy 50a086edde add warning about fastmath 2023-07-24 15:18:04 +00:00
Andrej Karpathy fff00ffd07 ack to lambda 2023-07-24 14:31:52 +00:00
Andrej d0ddf94cc3 Merge pull request #36 from hu-po/patch-1
typo
2023-07-24 07:27:36 -07:00
Andrej 228c4ea3ea Merge pull request #28 from SlyEcho/master
Fix tokenizer reading on Windows
2023-07-24 07:23:07 -07:00
Andrej Karpathy 624cdfc76a add dropout support to model 2023-07-24 14:18:50 +00:00
Andrej cdfb49208a Merge pull request #37 from awgu/pt2
Have DDP ignore `freqs_cis` to avoid broadcast
2023-07-24 07:15:40 -07:00
Andrej Karpathy 9055766cf6 docs on how to run with openmp 2023-07-24 14:08:06 +00:00
Andrej Karpathy cbbe4301b0 Merge branch 'krzysztof-jusiak-openmp' 2023-07-24 14:02:28 +00:00
Andrew Gu 25494f9cbc Have DDP ignore freqs_cis to avoid broadcast 2023-07-24 13:58:09 +00:00
hu-po d95c7617c6 typo 2023-07-24 07:35:12 -05:00