When calling the model through the Transformers library with max_new_tokens set to 20000, the following warning appears once the generated length reaches 4096: "This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all."
However, generation does not stop after the warning: the call keeps occupying the GPU and eventually hangs.
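A common workaround is to clamp the generation budget so that the prompt plus the new tokens never exceeds the model's context window, instead of passing a fixed max_new_tokens=20000. The helper below is a minimal sketch of that clamping logic only; the function name, the default limit of 4096, and the idea of reading the limit from the model config (e.g. max_position_embeddings) are assumptions for illustration, not part of the original report.

```python
def clamp_max_new_tokens(prompt_len: int, requested_new_tokens: int,
                         model_max_len: int = 4096) -> int:
    """Return a generation budget that keeps prompt_len + new tokens
    within the model's predefined maximum length.

    model_max_len is assumed to be 4096 here, matching the warning in the
    report; in practice it could be read from the model config.
    """
    remaining = model_max_len - prompt_len  # tokens still available in context
    return max(0, min(requested_new_tokens, remaining))


# Example: a 100-token prompt leaves at most 3996 new tokens.
print(clamp_max_new_tokens(100, 20000))   # 3996
# A prompt that already fills the window leaves no budget.
print(clamp_max_new_tokens(4096, 20000))  # 0
```

The clamped value would then be passed as max_new_tokens to generate(), so the warning is never triggered and the call cannot run past the model's context window.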