Hi,
Congratulations on your outstanding work, which provides a comprehensive understanding of LLM QAT.
I noticed that 3-bit quantization performs even worse than ternary quantization.
What is the potential reason behind this? Could it be due to the different architecture used for ternary quantization?
Thank you!