Add environment variable checks for BLOCK_SIZE and CBLOCK_SIZE #7

Open
relic-yuexi wants to merge 1 commit into OpenNLPLab:main from relic-yuexi:main
Conversation

@relic-yuexi

To avoid "OutOfResources" errors, this PR adds two environment variables, BLOCK_SIZE and CBLOCK_SIZE, that override the "BLOCK" and "CBLOCK" parameters. Triton does not seem to offer an existing solution for this, so trying different values may be worth considering in future development. If the error occurs, you can try reducing the values (for example, by 16). Note that I am not sure what specific impact "BLOCK" and "CBLOCK" have, so the project's author would need to decide how to adjust them appropriately.
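A minimal sketch of how such an environment variable check could look, for illustration only. The helper name `read_block_size` and the default values below are assumptions and are not taken from this PR's diff; the real defaults in lightning_attn may differ.

```python
import os

# Hypothetical defaults; the values lightning_attn actually uses are not shown here.
DEFAULT_BLOCK = 64
DEFAULT_CBLOCK = 32


def read_block_size(name: str, default: int) -> int:
    """Return a positive integer read from the environment variable `name`.

    Falls back to `default` when the variable is unset.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


# Resolved once; a kernel launcher could then use these instead of hard-coded sizes.
BLOCK = read_block_size("BLOCK_SIZE", DEFAULT_BLOCK)
CBLOCK = read_block_size("CBLOCK_SIZE", DEFAULT_CBLOCK)
```

Validating the value up front turns a typo such as `BLOCK_SIZE=-16` into a clear error message rather than a failure inside the kernel launch.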

Example:

import os
import time

import torch

# Set the overrides before importing lightning_attn.
os.environ["BLOCK_SIZE"] = "32"
os.environ["CBLOCK_SIZE"] = "16"

from lightning_attn.ops import lightning_attn_func
from lightning_attn.utils import _build_slope_tensor

dtype = torch.bfloat16
device = torch.device("cuda")
b, h, n, d, e = 2, 6, 256, 192, 192

q = torch.randn((b, h, n, d), dtype=dtype, device=device).requires_grad_()
k = torch.randn((b, h, n, d), dtype=dtype, device=device).requires_grad_()
v = torch.randn((b, h, n, e), dtype=dtype, device=device).requires_grad_()
s = _build_slope_tensor(h).to(q.device).to(torch.float32)

# Forward pass. CUDA kernels run asynchronously, so synchronize before reading the clock.
print("start1")
start = time.time()
o = lightning_attn_func(q, k, v, s)
torch.cuda.synchronize()
end = time.time()
print(end - start)
print(o.shape)

# Reduce the output to a scalar loss.
print("start2")
start = time.time()
loss = o.sum()
torch.cuda.synchronize()
end = time.time()
print(end - start)
print(o.shape)

# Backward pass. Reset the timer so that only the backward pass is measured.
print("start3")
start = time.time()
loss.backward()
torch.cuda.synchronize()
end = time.time()
print(end - start)

@Doraemonzzz
Collaborator

Thank you for your PR. BLOCK and CBLOCK are related to speed, and I am currently testing them.
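One possible way to test several settings is sketched below. It assumes the variables are read when lightning_attn is imported, so each setting is run in a fresh process. The helper names, the candidate pairs, and the `bench.py` script mentioned afterwards are hypothetical and not part of this PR.

```python
import os
import subprocess
import sys

# Hypothetical (BLOCK_SIZE, CBLOCK_SIZE) pairs to try.
CANDIDATES = [(64, 32), (32, 16), (16, 16)]


def run_with_block_sizes(script_args, block, cblock):
    """Run the Python interpreter in a fresh process with both variables set."""
    env = dict(os.environ, BLOCK_SIZE=str(block), CBLOCK_SIZE=str(cblock))
    return subprocess.run(
        [sys.executable, *script_args],
        env=env,
        capture_output=True,
        text=True,
    )


def sweep(script_args):
    """Return a dict mapping each candidate pair to the exit code of its run."""
    results = {}
    for block, cblock in CANDIDATES:
        proc = run_with_block_sizes(script_args, block, cblock)
        results[(block, cblock)] = proc.returncode
    return results
```

For instance, `sweep(["bench.py"])` would run a hypothetical timing script once per pair. A non-zero exit code marks a setting that failed, for example with "OutOfResources".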

2 participants