20 Commits

Author SHA1 Message Date
atamyrat d7704bdeaa mark ModelArgs.hidden_dim as optional and calculate as previously if not provided 2023-08-21 03:40:34 +03:00
atamyrat 09db52c69e Added huggingface model loader to export.py 2023-08-21 02:59:12 +03:00
Andrej Karpathy 7f551dbfd7 new model export: versions 0 (legacy) and 1 2023-08-19 18:25:20 +00:00
Andrej Karpathy 38bfac90a8 bigchange: add multiquery support in run.c. we can now train and inference multiquery models (where n_kv_heads < n_heads). this also means that we, in principle, support Llama 2 34B and 70B models, which are multiquery 2023-08-13 19:34:05 +00:00
Andrej Karpathy b0cfa2458d ok i can train and sample a model with a custom tokenizer 2023-08-11 16:47:29 +00:00
Nicolas Pinto 98b515e44d FIX: model.generate(). This patch fixes a simple bug in `generate()` due to model's `forward()` only returning logits and not losses since `f2e34e6b0ac55accd6ba930a04c6f683f5158b29`. 2023-08-06 14:48:47 -07:00
Andrej Karpathy a1037d79ee turned on trimTrailingWhitespace in my vscode sorry about that 2023-08-05 22:46:35 +00:00
Andrej Karpathy e03d7ecf12 Merge branch 'mpcusack/jitsave' of https://github.com/mpcusack/llama2.c into mpcusack-mpcusack/jitsave 2023-08-05 18:11:21 +00:00
Andrej Karpathy 837796e0b7 get rid of unneeded comment now 2023-08-05 16:19:27 +00:00
Michael Cusack f8d45f180d Reinline loss function 2023-08-04 17:21:29 +07:00
Michael Cusack 11a8348dfc extra line 2023-08-04 16:52:04 +07:00
Michael Cusack f2e34e6b0a Resolve jit.save errors 2023-08-04 16:49:26 +07:00
rahulschand 02cf3c7311 Small changes to ROPE & comments 2023-08-03 20:13:50 +05:30
aidoge 883cda1a2c fix freq_cos, freq_sin serialize 2023-08-01 16:31:43 +08:00
aidoge 36bf904c18 Refactor freqs_cis into freqs_cos and freqs_sin, and remove complex64 for ONNX export compatibility 2023-07-26 14:23:25 +08:00
Andrej Karpathy f5650891d5 honestly at this point this is a lot more my nanogpt code than llama code 2023-07-25 23:57:03 +00:00
Andrej Karpathy 624cdfc76a add dropout support to model 2023-07-24 14:18:50 +00:00
Andrew Gu af3b5c0364 Register freqs_cis as non-persistent buffer 2023-07-24 03:18:20 +00:00
Andrej Karpathy 9414e7a45e tweaks and add a simple test 2023-07-23 14:52:08 +00:00
Andrej Karpathy 5b161abb9a somewhere ~20 hours later 2023-07-23 05:23:45 +00:00