richinseattle
cddb05d5b2
use ssize_t/int64 and 64bit version of ftell on windows
2023-07-29 22:54:27 -07:00
Andrej Karpathy
fd68dd222f
reshuffle blocks of code a bit
2023-07-28 05:37:43 +00:00
aegkmq
6ce28fb388
Merge branch 'master' into better-rng
2023-07-28 13:52:34 +09:00
Andrej Karpathy
b4bb47bb7b
big change: adding prompting. many LOC, but critical. ty @atamurad for the first draft, i ended up tuning it quite a bit.
2023-07-28 04:12:54 +00:00
Andrej Karpathy
e5752e1fc9
strip leading whitespace
2023-07-27 22:59:19 +00:00
Andrej Karpathy
25b50ee0e2
small stylistic fixes and adjustments, fix bug in Makefile, and change the timing code to skip the first (slow) iteration
2023-07-27 22:42:08 +00:00
aegkmq
71200f3092
Fix random_f32
2023-07-28 00:35:59 +09:00
Aydyn Tairov
acf1e18e8f
remove second ifdefs for windows timing by introducing ported version of clock_gettime
2023-07-27 16:33:21 +01:00
aegkmq
1bdf5af743
Replace the rand() with a portable PRNG
2023-07-27 20:14:08 +09:00
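The log does not say which generator replaced `rand()`. As a rough illustration only, here is a minimal sketch assuming an xorshift64*-style generator with caller-owned state. The name `random_u32` and the explicit `state` parameter are assumptions made for this example; `random_f32` is the name used in the later "Fix random_f32" commit, but its body here is a guess.

```c
#include <stdint.h>

/* Hypothetical sketch: xorshift64* step. The state must be seeded nonzero. */
static uint32_t random_u32(uint64_t *state) {
    *state ^= *state >> 12;
    *state ^= *state << 25;
    *state ^= *state >> 27;
    /* Multiply by an odd constant and keep the high 32 bits. */
    return (uint32_t)((*state * 0x2545F4914F6CDD1Dull) >> 32);
}

/* Uniform float in [0, 1): keep 24 bits so the value fits a float mantissa. */
static float random_f32(uint64_t *state) {
    return (random_u32(state) >> 8) / 16777216.0f;
}
```

Unlike `rand()`, whose sequence and `RAND_MAX` differ between C libraries, a generator written this way gives the same sequence for the same seed on every platform.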
Andrej Karpathy
f19f50a744
stylistic changes for the windows support ifdefs
2023-07-27 06:08:40 +00:00
richinseattle
4a6b7a471d
Include windows support header (for mmap)
2023-07-26 22:40:01 -07:00
Andrej Karpathy
0d18fa7780
Merge branch 'patch-2' of https://github.com/richinseattle/llama2.c into richinseattle-patch-2
2023-07-27 05:23:05 +00:00
richinseattle
37e8c20f4f
Windows compat: Use GetTickCount for delta timer
Intentionally not including a windows header here to avoid merge conflict on include with mmap support. cl.exe doesn't complain, mingw warns.
2023-07-26 22:19:49 -07:00
richinseattle
539dc73196
fix whitespace
2023-07-26 22:12:32 -07:00
richinseattle
7f7a3b2d56
update openmp pragmas for MSVC compatibility
This has no negative impact on Linux and is in preparation for Windows support. Windows builds will not work without additional timer and mmap compatibility patches.
2023-07-26 22:06:23 -07:00
Bernardo Ramos
57034480b6
add some code comments
2023-07-26 19:48:14 -03:00
aegkmq
8986005f23
Minor cleanup
2023-07-26 16:57:08 +09:00
aegkmq
36c522a0d8
Improve locality
2023-07-26 13:24:27 +09:00
Andrej
614bf91e5d
Merge pull request #60 from emma-eva/patch-1
Fixed time_in_ms() compile time error (termux and neoterm)
2023-07-25 16:06:41 -07:00
Andrej Karpathy
05ee4cbf38
fix bug in timing - use steps not max seq len doh
2023-07-25 14:21:37 +00:00
Emma Eva
6ce91b1b3b
Fixed time_in_ms() compile time error (termux and neoterm)
clang version 16.0.4
2023-07-25 12:12:40 +06:00
Andrej Karpathy
c3e0d73bd2
we can inference Meta's Llama 2 7B, yay
2023-07-25 04:21:07 +00:00
Andrej Karpathy
a1f6b4653e
merge conflict resolve with imports
2023-07-25 01:58:46 +00:00
richinseattle
b2857c6af2
Switch to using timespec_get() for cross OS compatibility
2023-07-24 16:31:38 -07:00
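For context, a wall-clock millisecond timer built on C11 `timespec_get()` could look like the sketch below. The function name `time_in_ms` appears elsewhere in this log; the body and the `long long` return type are assumptions for illustration, not the committed code.

```c
#include <time.h>

/* Sketch: wall-clock time in milliseconds via C11 timespec_get(). */
static long long time_in_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    /* Widen before multiplying so the result does not overflow a 32-bit long. */
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}
```

Because `timespec_get()` reports wall-clock time rather than CPU time, the elapsed value stays meaningful when the work is spread across threads.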
Andrej Karpathy
e6e3f1322b
candidate memmap implementation
2023-07-24 22:54:49 +00:00
richinseattle
2be7d7887b
MSVC Compatibility fix for timer
use clock() instead of gettimeofday() for cross-platform compatibility
2023-07-24 15:22:20 -07:00
Andrej
669b75ddc8
Merge pull request #43 from krzysztof-jusiak/rmsnorm
Speed up rmsnorm by using sqrtf/expf
2023-07-24 14:13:49 -07:00
Andrej Karpathy
791be9d991
tweak argparse. fix steps=256, even if some models may support longer maximum seq_len. get rid of seed option for now, use temp=0.0 for deterministic behavior
2023-07-24 20:59:32 +00:00
Kris Jusiak
c9b1f10124
Speed up rmsnorm by using sqrtf/expf
Problem:
- exp and sqrt operate in double precision, which is not
required.
Solution:
- Use expf and sqrtf instead.
Notes:
- Although this uses single precision, it doesn't seem to affect the
result.
Results: ~ 10% improvement
- before: 940 tok/s
- after: 1020 tok/s
2023-07-24 13:06:27 -05:00
Franz Louis Cesista
c9ad067c5d
parallelize multi-head attention
2023-07-25 01:10:12 +08:00
Andrej
d0ddf94cc3
Merge pull request #36 from hu-po/patch-1
typo
2023-07-24 07:27:36 -07:00
Andrej
228c4ea3ea
Merge pull request #28 from SlyEcho/master
Fix tokenizer reading on Windows
2023-07-24 07:23:07 -07:00
hu-po
d95c7617c6
typo
2023-07-24 07:35:12 -05:00
Henri Vasserman
4d637983ad
Fix tokenizer reading on Windows
2023-07-24 11:08:29 +03:00
Kris Jusiak
0a0ca73c65
[openmp] 1.5x inference speedup
Problem:
- clock() measures CPU time and doesn't work properly with parallel execution.
- inference performance is bound by the matmul against the weights.
Solution:
- use gettimeofday instead.
- utilize openmp to parallelize matmul.
Note:
- if not compiled with -fopenmp, the #pragma is ignored and
single-threaded execution is performed.
- there are additional (optional) openmp env variables
to set the number of threads, scheduler, etc.
Benchmarks:
```
clang -Ofast -march=native run.c -lm -o run // achieved tok/s: 340.878828
clang -Ofast -fopenmp -march=native run.c -lm -o run // achieved tok/s: 524.590164
```
2023-07-24 01:55:23 -05:00
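The approach described in this commit can be sketched roughly as follows: a matrix-vector product whose output rows are split across threads by an OpenMP pragma. The signature and the row-major layout are assumptions for illustration.

```c
/* Sketch: xout (d) = W (d x n, row-major) * x (n). */
static void matmul(float *xout, const float *x, const float *w, int n, int d) {
    int i;
    /* Each output row is independent, so rows can be divided among threads.
       Without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for private(i)
    for (i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}
```

When built with OpenMP, the thread count can be set through the standard `OMP_NUM_THREADS` environment variable, for example `OMP_NUM_THREADS=4 ./run`; the model-file arguments to `run` are omitted here.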
Andrej Karpathy
f6388c99c8
delete the copy function in favor of memcpy. sadly we have to import string.h now...
2023-07-24 05:10:55 +00:00
Andrej Karpathy
99354a85ce
get rid of compiler warnings from ignoring return value of fread
2023-07-24 04:39:24 +00:00
Andrej Karpathy
6a61831e19
make init code much less sketchy
2023-07-24 04:22:32 +00:00
Andrej Karpathy
3bfa5665d1
delete the run_wrap file! yay. ty @python273 and @ggerganov for code snippets
2023-07-24 04:02:57 +00:00
Andrej Karpathy
d7e2c46915
slight tweaks to softmax
2023-07-24 02:02:12 +00:00
Andrej
7d6208870e
Merge pull request #19 from mcognetta/master
Simplify Softmax
2023-07-23 18:57:23 -07:00
Marco Cognetta
80679e24be
simplify softmax
2023-07-24 10:51:46 +09:00
Junny
dc3962f356
remove unused parameter
2023-07-23 15:33:29 -07:00
Andrej Karpathy
9414e7a45e
tweaks and add a simple test
2023-07-23 14:52:08 +00:00
Andrej Karpathy
5b161abb9a
somewhere ~20 hours later
2023-07-23 05:23:45 +00:00