Added new memory efficient conversion script for hf to ggml format, tested on bloomz 176B + Added token conversion script to convert from tokenizer.json format to tokenizer.model by akumaburn · Pull Request #867 · ggml-org/llama.cpp

akumaburn · 2023-04-09T22:36:09Z

Converts tokenizer.json to tokenizer.model format. Tested with bigscience models (e.g. https://huggingface.co/bigscience/bloomz). Usage:

python3 tokenconvert.py ./ad033898-d849-41a1-9ecd-ad24e016bc4f/bloomz

akumaburn · 2023-04-10T21:24:08Z

I've added a helper script as well in 74b92ff

This script is more memory efficient than the existing convert-hf-to-ggml.py; I was able to use it to convert bloomz-176b to float16 ggml format with under 64GB of RAM.
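
The thread does not show how the script keeps memory usage low. One common approach, sketched here as an assumption and not as the PR's actual method, is to convert and write one chunk of weights at a time instead of holding the whole model in memory. The helper name `write_f16_stream` is hypothetical; the sketch uses only the standard library's `struct` module, whose `e` format code is IEEE 754 half precision.

```python
import io
import struct
from typing import BinaryIO, Iterable, Sequence


def write_f16_stream(chunks: Iterable[Sequence[float]], fout: BinaryIO) -> int:
    """Write each chunk as little-endian float16 and return the bytes written.

    Hypothetical helper: only one chunk is packed at a time, so peak memory
    tracks the largest chunk rather than the whole model.
    """
    written = 0
    for chunk in chunks:
        # "<{n}e" packs n half-precision floats in little-endian order.
        data = struct.pack(f"<{len(chunk)}e", *chunk)
        fout.write(data)
        written += len(data)
    return written


# A generator stands in for tensors that would be loaded lazily, shard by shard.
buf = io.BytesIO()
n_bytes = write_f16_stream((c for c in [[1.0, 2.0], [0.5]]), buf)
```

In a real converter the chunks would come from checkpoint shards loaded one at a time, and each tensor would also need its header (name, shape, type) written before the data.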

bil-ash · 2023-04-11T02:18:39Z

What is the runtime memory usage of the converted bloomz-176b model?

akumaburn · 2023-04-11T13:45:01Z

> What is the runtime memory usage of the converted bloomz-176b model?

I believe it's around 340GB of RAM. You'd need to run it with bloomz.cpp (this particular fork: NouamaneTazi/bloomz.cpp#21), which doesn't support mmap at the moment.

I only have about 96GB of RAM, so I haven't been able to run the model yet. I've also been working on a quantizer to quantize it to int4 / q4_0, which is going well, but I suspect even that may not be enough.
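
A rough back-of-the-envelope check of these figures, counting weights only. Two assumptions are not stated in the thread: a nominal 176 billion parameters (taken from the model name), and a q4_0 block layout of 32 four-bit weights plus one float32 scale, i.e. 20 bytes per 32 weights. Activations, the KV cache, and tensors kept at higher precision are ignored.

```python
GB = 10**9  # decimal gigabytes


def model_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate size in bytes of the weights alone."""
    return n_params * bits_per_weight / 8


# Assumption: nominal parameter count taken from the model name.
N_PARAMS = 176 * 10**9

f16_bytes = model_bytes(N_PARAMS, 16)

# Assumption: one q4_0 block = 16 bytes of packed 4-bit weights + a 4-byte scale.
q4_0_bits = (16 + 4) * 8 / 32
q4_0_bytes = model_bytes(N_PARAMS, q4_0_bits)

print(f"f16:  {f16_bytes / GB:.0f} GB")
print(f"q4_0: {q4_0_bytes / GB:.0f} GB")
```

Under these assumptions the float16 weights come to about 352 GB, in the same range as the 340GB estimate, and the q4_0 weights to about 110 GB, which is still above 96GB.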

TheBloke · 2023-04-12T08:45:47Z

> I've added a helper script as well in 74b92ff
>
> This script is more memory efficient than the existing convert-hf-to-ggml.py ; I was able to use it to convert bloomz-176b to float16 ggml format with under 64GB of ram utilization.

Are you aware that @comex has already written a new conversion script for converting HF to GGML? It has been approved for merge but hasn't been merged yet. It can be seen here: #545

So you might want to compare your new convert script to that, rather than the original script provided in llama.cpp currently.

> Converts tokenizer.json to tokenizer.model format, tested with bigscience model (eg https://huggingface.co/bigscience/bloomz), usage like:

Thanks very much for the tokenizer.json conversion script! I recently hoped to convert my GPTQ 4-bit version of GeorgiaTechResearch/Galpaca 30B (an OPT model) to GGML. My model repo is huggingface/galpaca-30B-GPTQ-4bit-128g. I couldn't use comex's convert.py because there is no tokenizer.model.

I tried your script and it seemed to work to produce a tokenizer.model:

tomj@Eddie ~/src $ python tokenconvert.py huggingface/galpaca-30B-GPTQ-4bit-128g
/Users/tomj/src/tokenconvert.py:38: DeprecationWarning: Deprecated in 0.9.0: BPE.__init__ will not create from files anymore, try `BPE.from_file` instead
  tokenizer = Tokenizer(models.BPE(vocab_file.name, merges_file.name))
Saving.. tokenizer.model to huggingface/galpaca-30B-GPTQ-4bit-128g/tokenizer.model
Saved tokenizer.model to huggingface/galpaca-30B-GPTQ-4bit-128g/tokenizer.model

I have no idea if it's even possible to try and convert an OPT model to GGML, but I thought I'd give it a try anyway!

Unfortunately I still can't convert the model. comex's convert.py fails on the new tokenizer.model file:

tomj@Eddie ~/src $ python ./convert.py huggingface/galpaca-30B-GPTQ-4bit-128g/galpaca-30B-4bit-128g.no-act-order.pt --outfile huggingface/galpaca-30B-GPTQ-4bit-128g/galpaca-30B-4bit-128g.GGML.bin
Loading model file huggingface/galpaca-30B-GPTQ-4bit-128g/galpaca-30B-4bit-128g.no-act-order.pt
Loading vocab file huggingface/galpaca-30B-GPTQ-4bit-128g/tokenizer.model
Traceback (most recent call last):
  File "/Users/tomj/src/./convert.py", line 1053, in <module>
    main()
  File "/Users/tomj/src/./convert.py", line 1042, in main
    vocab = load_vocab(vocab_dir)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tomj/src/./convert.py", line 990, in load_vocab
    return SentencePieceVocab(path, added_tokens_path if added_tokens_path.exists() else None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tomj/src/./convert.py", line 125, in __init__
    self.sentencepiece_tokenizer = SentencePieceProcessor(str(fname_tokenizer))
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 447, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)

I tried your conversion script as well, but I can't get it working on any model.

Trying it with locally downloaded HF model:

tomj@Eddie ~/src $ ll huggingface/koala-7B-HF
total 26323264
drwxr-xr-x  13 tomj  staff   416B  7 Apr 13:27 .
drwxr-xr-x  14 tomj  staff   448B 12 Apr 09:52 ..
drwxr-xr-x  13 tomj  staff   416B  7 Apr 16:07 .git
-rw-r--r--   1 tomj  staff   1.4K  7 Apr 13:20 .gitattributes
-rw-r--r--   1 tomj  staff   2.1K  7 Apr 13:20 README.md
-rw-r--r--   1 tomj  staff   507B  7 Apr 13:20 config.json
-rw-r--r--   1 tomj  staff   137B  7 Apr 13:20 generation_config.json
-rw-r--r--   1 tomj  staff   9.3G  7 Apr 13:27 pytorch_model-00001-of-00002.bin
-rw-r--r--   1 tomj  staff   3.3G  7 Apr 13:24 pytorch_model-00002-of-00002.bin
-rw-r--r--   1 tomj  staff    26K  7 Apr 13:20 pytorch_model.bin.index.json
-rw-r--r--   1 tomj  staff     2B  7 Apr 13:20 special_tokens_map.json
-rw-r--r--   1 tomj  staff   488K  7 Apr 13:21 tokenizer.model
-rw-r--r--   1 tomj  staff   141B  7 Apr 13:20 tokenizer_config.json

tomj@Eddie ~/src $ ~/anaconda3/envs/torch21/bin/python ./convert-hf-to-ggml-v2.py huggingface/koala-7B-HF ./koala-7B-test
Loading model:  huggingface/koala-7B-HF
Traceback (most recent call last):
  File "/Users/tomj/src/./convert-hf-to-ggml-v2.py", line 61, in <module>
    fout.write(struct.pack("i", hparams["n_head"]))
KeyError: 'n_head'

tomj@Eddie ~/src $ ~/anaconda3/envs/torch21/bin/python -V
Python 3.10.10

I get the same error if I try it with a remote model:

tomj@Eddie ~/src $ ~/anaconda3/envs/torch21/bin/python  convert-hf-to-ggml-v2.py --debug "sallywww/Llama-7B" ./koala-13B-test
Downloading (…)okenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 264/264 [00:00<00:00, 48.2kB/s]
Downloading tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 1.75MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 3.00/3.00 [00:00<00:00, 2.70kB/s]
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 578/578 [00:00<00:00, 478kB/s]
Loading model:  sallywww/Llama-7B
Traceback (most recent call last):
  File "/Users/tomj/src/convert-hf-to-ggml-v2.py", line 61, in <module>
    fout.write(struct.pack("i", hparams["n_head"]))
KeyError: 'n_head'
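
Both tracebacks fail on the same lookup: the script reads `hparams["n_head"]`, while LLaMA-style Hugging Face `config.json` files name that field `num_attention_heads`. A minimal sketch of a tolerant lookup follows; `get_hparam` and `HEAD_KEYS` are hypothetical names, and the list of candidate keys is an assumption. Resolving the key alone would not make a BLOOM-oriented converter handle LLaMA, since tensor names and architecture also differ.

```python
from typing import Any, Mapping, Sequence

# Assumption: candidate names for the attention-head count in config.json.
# The failing script reads "n_head"; LLaMA-style configs use "num_attention_heads".
HEAD_KEYS = ("n_head", "num_attention_heads")


def get_hparam(hparams: Mapping[str, Any], candidates: Sequence[str]) -> Any:
    """Return the value of the first candidate key present, else raise KeyError."""
    for key in candidates:
        if key in hparams:
            return hparams[key]
    raise KeyError(
        f"none of {list(candidates)} found; available keys: {sorted(hparams)}"
    )


# Illustrative LLaMA-like config fragment (values are examples only).
llama_like = {"hidden_size": 4096, "num_attention_heads": 32, "num_hidden_layers": 32}
n_head = get_hparam(llama_like, HEAD_KEYS)
```

Listing the available keys in the error message makes a mismatch like this one easier to diagnose than a bare `KeyError: 'n_head'`.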

akumaburn · 2023-04-13T10:29:31Z

@TheBloke Thanks for linking me to @comex's script. My script uses the Hugging Face tokenizers library (https://github.com/huggingface/tokenizers) to handle tokenization and assumes Byte-Pair Encoding by default; see https://huggingface.co/docs/transformers/tokenizer_summary

Comex's script seems to use SentencePiece (https://arxiv.org/pdf/1808.06226.pdf), which is a different tokenizer.
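
One quick way to see this kind of mismatch is to check whether a file named tokenizer.model is actually JSON text, which a SentencePiece loader expecting a serialized binary model cannot read. This is only a diagnostic sketch: the thread does not confirm what format tokenconvert.py writes, and `looks_like_json` is a hypothetical helper.

```python
import json
import os
import tempfile


def looks_like_json(path: str) -> bool:
    """Return True if the file decodes as UTF-8 and parses as JSON."""
    try:
        with open(path, "rb") as f:
            json.loads(f.read().decode("utf-8"))
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


# Usage with two throwaway files: one JSON document, one arbitrary binary blob.
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "a.model")
    bin_path = os.path.join(tmp, "b.model")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"model": {"type": "BPE"}}, f)
    with open(bin_path, "wb") as f:
        f.write(b"\x0a\x0e\x0a\x05<unk>\x15\x00\x00\x00\x00\x18\x02")
    json_result = looks_like_json(json_path)
    bin_result = looks_like_json(bin_path)
```

A `True` result only says the file is JSON; a `False` result does not prove the file is a valid SentencePiece model.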

I created it in the hope of using BLOOM models (which llama.cpp doesn't currently support; see #452), and it works for that purpose.

In practice, the conversion script would probably have to support all common tokenizers in order to work for every model.

akumaburn · 2023-04-13T11:11:15Z

I just updated the token conversion script to add support for "SentencePiece" and "WordPiece" tokenizers. Usage has been updated to:

python3 tokenconvert.py TokenizerType InDIR [OutDir]

where TokenizerType can be one of ["BPE","WordPiece","SentencePiece"]

e.g.:

python3 tokenconvert.py BPE ./ad033898-d849-41a1-9ecd-ad24e016bc4f/bloomz
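
The usage line above could be expressed with `argparse` roughly as follows. This is a sketch of the described interface, not the script's actual code: `build_parser` and the attribute names are hypothetical, `./bloomz` is a placeholder path, and defaulting the output directory to the input directory is an assumption based on the earlier run that saved tokenizer.model next to its input.

```python
import argparse

TOKENIZER_TYPES = ["BPE", "WordPiece", "SentencePiece"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: tokenconvert.py TokenizerType InDIR [OutDir]."""
    parser = argparse.ArgumentParser(prog="tokenconvert.py")
    parser.add_argument("tokenizer_type", choices=TOKENIZER_TYPES, metavar="TokenizerType")
    parser.add_argument("in_dir", metavar="InDIR")
    # OutDir is optional, so it takes zero or one value.
    parser.add_argument("out_dir", nargs="?", default=None, metavar="OutDir")
    return parser


args = build_parser().parse_args(["BPE", "./bloomz"])
# Assumption: when OutDir is omitted, write next to the input files.
out_dir = args.out_dir or args.in_dir
```

With `choices`, an unsupported tokenizer type is rejected at parse time with a usage message instead of failing later inside the conversion.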

@TheBloke I'm not sure if this will help with your quest of converting your OPT model into a GGML model, but I thought I'd tag you anyways.

EDIT: Actually, it looks like SentencePiece specifically isn't supported by Hugging Face's library. I'm taking a look to see...

aidaho · 2023-04-18T07:21:45Z

I've tried this on Bloomz mt0-xl:

(v:llama.cpp) aidaho@lin:~/bin/llama.cpp$ python3 tokenconvert.py BPE /media/aidaho/blue/llm-files/mt0-xl/
Traceback (most recent call last):
  File "/home/aidaho/bin/llama.cpp/tokenconvert.py", line 87, in <module>
    tokenizer = load_tokenizer_from_json(input_json_path, special_tokens_map_path, tokenizer_config_path, tokenizer_type)
  File "/home/aidaho/bin/llama.cpp/tokenconvert.py", line 34, in load_tokenizer_from_json
    merges = model_data["merges"]
KeyError: 'merges'
Code: 1

Am I doing anything wrong?

akumaburn · 2023-04-18T18:33:28Z

@aidaho It looks like that model uses T5Tokenizer (https://huggingface.co/bigscience/mt0-xl/blob/main/tokenizer_config.json), which this script does not support; it only supports BPE and WordPiece at the moment.
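
A pre-flight check could turn the `KeyError: 'merges'` above into a clearer message. The sketch below assumes tokenizer.json follows the Hugging Face tokenizers layout, where `model.type` names the algorithm and BPE models carry a `merges` list; `check_tokenizer_json` and `SUPPORTED` are hypothetical names, not part of the PR.

```python
from typing import Any, Mapping

# Tokenizer model types the comment above says the script handles.
SUPPORTED = {"BPE", "WordPiece"}


def check_tokenizer_json(data: Mapping[str, Any]) -> str:
    """Return the model type if it is supported, else raise ValueError."""
    model = data.get("model", {})
    # Assumption: a missing "type" field is reported as "unknown".
    model_type = model.get("type", "unknown")
    if model_type not in SUPPORTED:
        raise ValueError(
            f"unsupported tokenizer model type {model_type!r}; "
            f"supported: {sorted(SUPPORTED)}"
        )
    if model_type == "BPE" and "merges" not in model:
        raise ValueError("BPE model has no 'merges' list")
    return model_type


# Illustrative inputs: a tiny BPE model and a Unigram model without merges.
bpe_doc = {"model": {"type": "BPE", "vocab": {"a": 0}, "merges": []}}
unigram_doc = {"model": {"type": "Unigram", "vocab": [["a", 0.0]]}}
bpe_type = check_tokenizer_json(bpe_doc)
```

Calling `check_tokenizer_json(unigram_doc)` raises a `ValueError` that names the unsupported type, which is more informative than the bare `KeyError`.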

Added token conversion script to convert from tokenizer.json format to tokenizer.model format, tested with bigscience models

a78c42d

akumaburn mentioned this pull request Apr 9, 2023

Add support for running bloom models #452

Closed

Add helper script to convert hf (pytorch) models into ggml format

74b92ff

Updated tokenconvert.py script to add support for SentencePiece and WordPiece tokenizers, updated arguments

7c8ee5a

Add sentencepiece processor

9020757

akumaburn closed this by deleting the head repository May 3, 2023

ngxson mentioned this pull request Sep 6, 2024

ggml : fix missing cpu_set_t on emscripten #9336

Merged

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

akumaburn wants to merge 4 commits into ggml-org:master from akumaburn:feature/tokenconvert
