Hi,
In this block, the query, key, and value tensors are replaced by the unpadded versions returned from `_upad_input`. Then on line 658, shouldn't `qkv` be packed again from these updated tensors?
https://github.com/cuda-mode/ring-attention/blob/d7aa7799bcc0191994d369b8cdebc7aebe1566b9/ring-llama/modeling_llama.py#L647-L664
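To illustrate what I mean, here is a minimal standalone sketch of "unpad, then repack". It does not call the repo's `_upad_input`. Instead, the helper `unpad_and_repack` (a hypothetical name) mimics the unpadding step using the attention mask. It then stacks the unpadded q/k/v into the `(total_tokens, 3, nheads, headdim)` layout that FlashAttention's varlen qkv-packed functions expect, together with `cu_seqlens` and `max_seqlen`. This assumes q, k, and v share the same token layout, i.e. `query_length` equals the key length (no KV-cache decoding step). Otherwise they cannot be packed into a single tensor.

```python
import torch
import torch.nn.functional as F


def unpad_and_repack(q, k, v, attention_mask):
    """Hypothetical helper: drop padding tokens from q/k/v, then repack them.

    q, k, v: (batch, seqlen, nheads, headdim), right-padded.
    attention_mask: (batch, seqlen), 1 for real tokens and 0 for padding.
    Returns (qkv, cu_seqlens, max_seqlen) with qkv of shape
    (total_tokens, 3, nheads, headdim).
    """
    # Number of real tokens in each sequence.
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)
    # Flat positions of the real (non-padding) tokens.
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen = int(seqlens.max().item())
    # Cumulative sequence boundaries, starting with 0: [0, len0, len0+len1, ...].
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))

    b, s, h, d = q.shape
    # Flatten batch and sequence dims, then keep only the real tokens.
    q_u = q.reshape(b * s, h, d)[indices]
    k_u = k.reshape(b * s, h, d)[indices]
    v_u = v.reshape(b * s, h, d)[indices]

    # Repack the *unpadded* tensors, so qkv matches cu_seqlens.
    qkv = torch.stack([q_u, k_u, v_u], dim=1)
    return qkv, cu_seqlens, max_seqlen


# Usage: two sequences of lengths 3 and 2, padded to length 4.
torch.manual_seed(0)
b, s, h, d = 2, 4, 2, 8
q = torch.randn(b, s, h, d)
k = torch.randn(b, s, h, d)
v = torch.randn(b, s, h, d)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])

qkv, cu_seqlens, max_seqlen = unpad_and_repack(q, k, v, mask)
print(qkv.shape)           # torch.Size([5, 3, 2, 8])
print(cu_seqlens.tolist())  # [0, 3, 5]
print(max_seqlen)           # 3
```

If the packed tensor was built from the padded q/k/v before `_upad_input` ran, its first dimension would not line up with `cu_seqlens`. That is why I'd expect it to need rebuilding after unpadding.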