slightly tune todos
@@ -230,16 +230,13 @@ If your candidate PRs have elements of these it doesn't mean they won't get merg
 
 ## unsorted todos
 
-- support Llama 2 7B Chat model and tune run.c to Chat UI/UX
+- should calculate freq_cis online in the script run.c instead of loading them
+- support Llama 2 7B Chat models and tune run.c to Chat UI/UX
 - speed up 7B Llama 2 models sufficiently to work at interactive rates on Apple Silicon MacBooks
-- possibly include emscripten / web backend (as seen in @gg PR)
-- currently the project only runs in fp32, how easy would it be to different precisions?
-- look into quantization and what would be involved
-- todo multiquery support? doesn't seem as useful for smaller models that run on CPU (?)
-- todo support inferencing beyond max_seq_len steps, have to think through the kv cache
-- why is MFU so low (~10%) on my A100 40GB for training?
-- weird errors with torch.compile and wandb when using DDP
-- (LoRA) finetuning of Llama 2 models
+- investigate precisions other than just fp32: fp16, and quantization
+- investigate running on other backends, especially GPUs
+- add multiquery support into run.c
+- (LoRA) finetuning and export of Llama 2 models
 - make more better tests to decrease yolo
 
 ## License