lucidrains & zhuzilin have been hard at work over the last few days and have each completed a ring-attention implementation.
Create a test setup that verifies the correctness of both solutions and compares their performance.
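A minimal sketch of such a correctness harness, assuming single-GPU execution with PyTorch's built-in attention as the reference; `attn_fn` here is a placeholder for either implementation's entry point, whose real signature may differ:

```python
import torch
import torch.nn.functional as F

def reference_attention(q, k, v, causal=True):
    # Plain PyTorch reference; q, k, v have shape (batch, heads, seq, dim).
    return F.scaled_dot_product_attention(q, k, v, is_causal=causal)

def check_against_reference(attn_fn, causal=True, atol=1e-2, rtol=1e-2):
    # attn_fn stands in for either ring-attention entry point (hypothetical).
    torch.manual_seed(0)
    q, k, v = (torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
               for _ in range(3))
    # Compute the reference in fp32 so the tolerances are meaningful for fp16 inputs.
    expected = reference_attention(q.float(), k.float(), v.float(), causal=causal)
    actual = attn_fn(q, k, v, causal=causal).float()
    max_err = (actual - expected).abs().max().item()
    print(f"max abs error = {max_err:.4e}")
    assert torch.allclose(actual, expected, atol=atol, rtol=rtol)
```

A complete test would additionally launch the same check under `torch.distributed` with multiple ranks, since ring attention only exercises its communication path when the sequence is actually sharded across devices.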
Phil (lucidrains) decided to use a custom Triton kernel. Find out why this kernel is used and whether it is indeed faster than the CUDA Flash Attention 2 kernel.
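For the timing side of that comparison, a sketch using `torch.utils.benchmark` could put the two kernels head-to-head. `flash_attn_func` is the actual flash-attn 2 entry point, while `triton_attn_fn` is a hypothetical stand-in for the Triton kernel's entry point:

```python
import torch
from torch.utils.benchmark import Timer
from flash_attn import flash_attn_func  # CUDA Flash Attention 2

def bench(fn, label, *args, **kwargs):
    # torch.utils.benchmark handles warmup and CUDA synchronization internally.
    timer = Timer(stmt="fn(*args, **kwargs)",
                  globals={"fn": fn, "args": args, "kwargs": kwargs})
    measurement = timer.timeit(50)
    print(f"{label}: {measurement.median * 1e3:.3f} ms")

# flash-attn expects (batch, seqlen, heads, head_dim) half-precision tensors.
q, k, v = (torch.randn(1, 4096, 8, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))
bench(flash_attn_func, "flash-attn 2 (CUDA)", q, k, v, causal=True)
# bench(triton_attn_fn, "custom Triton kernel", q, k, v, causal=True)  # hypothetical entry point
```

Sweeping the sequence length (e.g. 1k to 32k) would show whether any gap between the kernels grows or shrinks with scale.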
Please generate a short report of your findings, either as a Markdown file or a Jupyter notebook (.ipynb).