atamyrat
d7704bdeaa
mark ModelArgs.hidden_dim as optional and calculate as previously if not provided
2023-08-21 03:40:34 +03:00
atamyrat
09db52c69e
Added huggingface model loader to export.py
2023-08-21 02:59:12 +03:00
Andrej Karpathy
7f551dbfd7
new model export: versions 0 (legacy) and 1
2023-08-19 18:25:20 +00:00
Andrej Karpathy
38bfac90a8
bigchange: add multiquery support in run.c. we can now train and inference multiquery models (where n_kv_heads < n_heads). this also means that we, in principle, support Llama 2 34B and 70B models, which are multiquery
2023-08-13 19:34:05 +00:00
Andrej Karpathy
b0cfa2458d
ok i can train and sample a model with a custom tokenizer
2023-08-11 16:47:29 +00:00
Nicolas Pinto
98b515e44d
FIX: model.generate()
This patch fixes a simple bug in `generate()` due to model's `forward()` only returning logits and not losses since `f2e34e6b0ac55accd6ba930a04c6f683f5158b29`.
2023-08-06 14:48:47 -07:00
Andrej Karpathy
a1037d79ee
turned on trimTrailingWhitespace in my vscode sorry about that
2023-08-05 22:46:35 +00:00
Andrej Karpathy
e03d7ecf12
Merge branch 'mpcusack/jitsave' of https://github.com/mpcusack/llama2.c into mpcusack-mpcusack/jitsave
2023-08-05 18:11:21 +00:00
Andrej Karpathy
837796e0b7
get rid of unneeded comment now
2023-08-05 16:19:27 +00:00
Michael Cusack
f8d45f180d
Reinline loss function
2023-08-04 17:21:29 +07:00
Michael Cusack
11a8348dfc
extra line
2023-08-04 16:52:04 +07:00
Michael Cusack
f2e34e6b0a
Resolve jit.save errors
2023-08-04 16:49:26 +07:00
rahulschand
02cf3c7311
Small changes to ROPE & comments
2023-08-03 20:13:50 +05:30
aidoge
883cda1a2c
fix freq_cos, freq_sin serialize
2023-08-01 16:31:43 +08:00
aidoge
36bf904c18
Refactor freqs_cis into freqs_cos and freqs_sin, and remove complex64 for ONNX export compatibility
2023-07-26 14:23:25 +08:00
Andrej Karpathy
f5650891d5
honestly at this point this is a lot more my nanogpt code than llama code
2023-07-25 23:57:03 +00:00
Andrej Karpathy
624cdfc76a
add dropout support to model
2023-07-24 14:18:50 +00:00
Andrew Gu
af3b5c0364
Register freqs_cis as non-persistent buffer
2023-07-24 03:18:20 +00:00
Andrej Karpathy
9414e7a45e
tweaks and add a simple test
2023-07-23 14:52:08 +00:00
Andrej Karpathy
5b161abb9a
somewhere ~20 hours later
2023-07-23 05:23:45 +00:00