Added new memory efficient conversion script for hf to ggml format, tested on bloomz 176B + Added token conversion script to convert from tokenizer.json format to tokenizer.model #867
…o tokenizer.model format, tested with bigscience models
I've added a helper script as well in 74b92ff. This script is more memory efficient than the existing convert-hf-to-ggml.py; I was able to use it to convert bloomz-176b to float16 ggml format with under 64GB of RAM.
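For readers wondering how a conversion can stay far below the size of the model in RAM: one common approach is to stream tensors, converting and writing each one before loading the next, so peak memory is bounded by the largest single tensor rather than the whole checkpoint. The sketch below illustrates only that general idea. It is not taken from the script in 74b92ff; `iter_tensors`, `write_f16_stream`, and the toy record layout are hypothetical and do not match the real ggml file format.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, List, Tuple


def iter_tensors() -> Iterator[Tuple[str, List[float]]]:
    """Hypothetical stand-in for lazily loading one tensor at a time from checkpoint shards."""
    yield "layer0.weight", [0.5, -1.25, 2.0]
    yield "layer1.weight", [1.0, 0.0]


def write_f16_stream(tensors: Iterable[Tuple[str, List[float]]], out: BinaryIO) -> int:
    """Write each tensor as float16 as soon as it is produced; return the number of values written.

    Toy layout (an assumption, NOT the ggml format):
    int32 name length, int32 value count, UTF-8 name, float16 values.
    """
    written = 0
    for name, values in tensors:
        encoded = name.encode("utf-8")
        out.write(struct.pack("<ii", len(encoded), len(values)))
        out.write(encoded)
        # "e" is the struct format code for IEEE 754 half precision.
        out.write(struct.pack(f"<{len(values)}e", *values))
        written += len(values)
        # Nothing from this tensor is kept, so it can be freed before the next one loads.
    return written


buffer = io.BytesIO()
total_values = write_f16_stream(iter_tensors(), buffer)
print(total_values, "values,", len(buffer.getvalue()), "bytes")
```

Because the generator yields one tensor at a time, only that tensor and its encoded bytes are alive at any moment.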
What is the runtime memory usage of the converted bloomz-176b model?
I believe it's around 340GB of RAM. You'd need to run it with bloomz.cpp (this particular fork: NouamaneTazi/bloomz.cpp#21), which doesn't have mmap at the moment. I only have ~96GB of RAM, so I've not been able to run the model yet. I've also been working on a quantizer to quantize it to int4 / q4_0, which is going well, but I suspect that may still not be enough.
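As a rough sanity check on those figures, here is a back-of-the-envelope estimate of weight storage alone. The parameter count (176e9, taken from the model name) and the 4.5 to 5 bits per weight for a 4-bit block format with a per-block scale are assumptions, and the estimate ignores activations and any other runtime buffers.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 176e9  # nominal count implied by the "176B" name (assumption)

f16_gb = model_size_gb(N_PARAMS, 16)        # float16: 16 bits per weight
q4_low_gb = model_size_gb(N_PARAMS, 4.5)    # assumed: 4-bit values + 16-bit scale per 32 weights
q4_high_gb = model_size_gb(N_PARAMS, 5.0)   # assumed: 4-bit values + 32-bit scale per 32 weights

print(f"float16: ~{f16_gb:.0f} GB")
print(f"4-bit:   ~{q4_low_gb:.0f} to {q4_high_gb:.0f} GB")
```

Under these assumptions float16 lands in the same ballpark as the ~340GB estimate, and the 4-bit range stays above 96GB, which is consistent with the suspicion that quantizing to q4_0 may still not be enough.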
Are you aware that @comex has already written a new conversion script for converting HF to GGML? It has been approved for merge but hasn't been merged yet. It can be seen here: #545. So you might want to compare your new convert script to that, rather than to the original script currently provided in llama.cpp.
Thanks very much for the tokenizer.json conversion script! I recently hoped to convert my GPTQ 4-bit version of GeorgiaTechResearch/Galpaca 30B (an OPT model) to GGML. My model repo is: huggingface/galpaca-30B-GPTQ-4bit-128g
I couldn't use comex's convert.py due to lack of tokenizer.model. I tried your script and it seemed to work to produce a tokenizer.model:
I have no idea if it's even possible to try and convert an OPT model to GGML, but I thought I'd give it a try anyway! Unfortunately I still can't convert the model. comex's convert.py fails on the new tokenizer.model file:
I tried your conversion script as well, but I can't get it working on any model. Trying it with locally downloaded HF model:
I get the same error if I try it with a remote model:
@TheBloke Thanks for linking me to @comex's script. My script uses the HuggingFace tokenizers library (https://github.com/huggingface/tokenizers) to handle tokenization, and assumes Byte-Pair Encoding by default; see https://huggingface.co/docs/transformers/tokenizer_summary. Comex's script seems to use SentencePiece (https://arxiv.org/pdf/1808.06226.pdf), which is a different tokenizer. I created mine hoping to use bloom models (which llama.cpp currently doesn't support, see #452), and it works for that purpose. In reality the conversion script would probably have to support all common tokenizers in order to work for each
…ordPiece tokenizers, updated arguments
I just updated the token conversion script to add support for "SentencePiece" and "WordPiece" tokenizers. Usage has been updated to:
e.g.:
@TheBloke I'm not sure if this will help with your quest to convert your OPT model into a GGML model, but I thought I'd tag you anyway. EDIT: Actually, it looks like SentencePiece specifically isn't supported by HuggingFace's library; I'm taking a look to see...
I've tried this on Bloomz mt0-xl: Am I doing anything wrong?
@aidaho It looks like that model is using T5Tokenizer (https://huggingface.co/bigscience/mt0-xl/blob/main/tokenizer_config.json), which this script doesn't support; it only supports BPE and WordPiece at the moment.
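One way to check up front whether a given model is likely to work is to look at the `model.type` field that HuggingFace tokenizers writes into tokenizer.json. The sketch below is only an illustration: the helper names and the `SUPPORTED_MODEL_TYPES` set are hypothetical (the set mirrors the comment above rather than the script's source), and the sample JSON is a trimmed, made-up stand-in for a real tokenizer.json.

```python
import json

# Mirrors the comment above (BPE and WordPiece only); an assumption, not read from the script.
SUPPORTED_MODEL_TYPES = {"BPE", "WordPiece"}


def tokenizer_model_type(tokenizer_json_text: str) -> str:
    """Return the tokenizer model type stored in a tokenizer.json document, or "unknown"."""
    data = json.loads(tokenizer_json_text)
    model = data.get("model") or {}
    return model.get("type") or "unknown"


def is_supported(tokenizer_json_text: str) -> bool:
    """True if the tokenizer model type is one the conversion is said to handle."""
    return tokenizer_model_type(tokenizer_json_text) in SUPPORTED_MODEL_TYPES


# Made-up, heavily trimmed samples for illustration only.
sample_bpe = json.dumps({"model": {"type": "BPE", "vocab": {"a": 0}, "merges": []}})
sample_unigram = json.dumps({"model": {"type": "Unigram", "vocab": [["a", 0.0]]}})

for label, text in (("bpe sample", sample_bpe), ("unigram sample", sample_unigram)):
    print(label, tokenizer_model_type(text), is_supported(text))
```

A SentencePiece-based tokenizer such as a T5-style one would typically show up with a type other than BPE or WordPiece, so a check like this would flag it before conversion starts.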
Converts tokenizer.json to tokenizer.model format. Tested with bigscience models (e.g. https://huggingface.co/bigscience/bloomz). Usage:
python3 tokenconvert.py ./ad033898-d849-41a1-9ecd-ad24e016bc4f/bloomz