Merge pull request #355 from janimo/export-vocab-size

Export vocab size and Code Llama usage docs
2023-08-26 13:24:55 -07:00
parent 49daf18f2f 604d3c59c0
commit e47bacdc62
2 changed files with 18 additions and 1 deletions
@@ -95,6 +95,22 @@ Then chat with it by specifying the chat mode using the `-m` flag, e.g.:
 ./run llama2_7b_chat.bin -m chat
 ```
 You can also try Meta's Code Llama models, although support for them is incomplete.
 Build the tokenizer for the plain and instruct variants and pass it with the `-z` flag when running inference.
 ```bash
 python export.py codellama2_7b.bin --meta-llama /path/to/CodeLlama-7b
 python tokenizer.py --tokenizer-model=/path/to/CodeLlama-7b/tokenizer.model
 ./run codellama2_7b.bin -z /path/to/CodeLlama-7b/tokenizer.bin
 ```
 Chat with Code Llama Instruct:
 ```bash
 python export.py codellama2_7b_instruct.bin --meta-llama /path/to/CodeLlama-7b-Instruct
 python tokenizer.py --tokenizer-model=/path/to/CodeLlama-7b-Instruct/tokenizer.model
 ./run codellama2_7b_instruct.bin -m chat -z /path/to/CodeLlama-7b-Instruct/tokenizer.bin
 ```
 ## huggingface models
 We can load any Hugging Face model that uses the Llama 2 architecture. See the script [export.py](export.py) and its `--hf` flag for exporting the model .bin file.
@@ -323,9 +323,10 @@ def load_meta_model(model_path):
    config.multiple_of = params["multiple_of"]
    config.norm_eps = params["norm_eps"]
-    config.vocab_size = 32000
+    config.vocab_size = state_dict['tok_embeddings.weight'].shape[0]
    config.max_seq_len = 2048
    # create a new Transformer object and set weights
    model = Transformer(config)
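
The second hunk replaces the hard-coded `32000` with the row count of the token embedding table, so a checkpoint whose vocabulary is a different size (as appears to be the case for some Code Llama variants) exports with a matching `vocab_size`. A minimal sketch of the idea follows. It assumes a small stand-in class, `FakeTensor`, in place of a real PyTorch tensor, and `infer_vocab_size` is a hypothetical helper that does not exist in export.py, where the lookup is done inline. The sizes are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a tensor; only the .shape attribute is modelled (assumption)."""
    shape: tuple


def infer_vocab_size(state_dict, key="tok_embeddings.weight"):
    """Return the number of rows of the token embedding table.

    Hypothetical helper for illustration; the embedding table is assumed
    to have shape (vocab_size, dim), as in the diff above.
    """
    if key not in state_dict:
        raise KeyError(f"checkpoint has no {key!r} entry")
    shape = state_dict[key].shape
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D embedding table, got shape {shape}")
    return int(shape[0])


# Toy checkpoint: 32016 rows and an embedding dimension of 8 (illustrative values).
toy_state_dict = {"tok_embeddings.weight": FakeTensor(shape=(32016, 8))}
vocab_size = infer_vocab_size(toy_state_dict)
print(vocab_size)
```

Reading the size from the weights keeps the exported header and the embedding table in agreement, whereas a fixed constant only works for checkpoints that happen to use that exact vocabulary size.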