small note on training times

2023-07-26 22:12:50 +00:00
parent 2711ae8c32
commit f0f43b7288
1 changed file with 1 addition and 1 deletion
@@ -63,7 +63,7 @@ base models... ¯\\_(ツ)_/¯. Since we can inference the base model, it should

 ## models

-For the sake of examples of smaller, from-scratch models, I trained multiple models on TinyStories and catalogue them here:
+For the sake of examples of smaller, from-scratch models, I trained multiple models on TinyStories and catalogue them below. Most of these trained in a few hours on my training setup (4x A100 40GB GPUs); the 110M model took around 24 hours.

 | model | dim | n_layers | n_heads | max context length | parameters | val loss | download |
 | --- | --- | --- | --- | --- | --- | --- | --- |