fused CUDA kernel for puzzlesim #6

Open
Arcanous98 wants to merge 2 commits into nihermann:main from Arcanous98:fused_cuda
Conversation

@Arcanous98

This PR adds new CUDA kernels to accelerate Puzzlesim:

  • A set of custom CUDA kernels that accelerate key Puzzlesim operations (cosine similarity, bilinear upsampling), with support for arbitrary model backbones and reduced-precision modes. Backward passes are supported as well (~1.6x faster than the previous PyTorch backend).
  • A fully fused CUDA kernel, currently limited to the SqueezeNet backbone. It supports forward evaluation only (~3x faster than the PyTorch backend).
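
For reviewers who want something simple to check the kernels against, here is a minimal pure-Python sketch of the two operations named above. It is not the code in this PR: the function names `cosine_similarity` and `bilinear_upsample` are hypothetical, the `eps` guard is an assumption, and the upsampling uses the `align_corners=True` convention, which may differ from what the kernels actually implement.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps (an assumed guard) avoids division by zero for all-zero vectors.
    return dot / max(norm_a * norm_b, eps)


def bilinear_upsample(grid, out_h, out_w):
    """Resize a 2D list of floats bilinearly (align_corners=True convention, assumed)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map the output row to a fractional input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            # Map the output column to a fractional input column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

A scalar reference like this is far too slow for real feature maps, but it can serve as a ground truth when comparing kernel outputs on small inputs, especially in the reduced-precision modes.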

In the future, we could extend the fused kernel to support arbitrary backbone architectures and to compute derivatives, so that Puzzlesim can be used as a loss function with this backend.
