
precompile in the current process #1730

Merged
shunting314 merged 1 commit into main from shunting314/stack/7
Mar 19, 2026


@shunting314 (Contributor) commented Mar 17, 2026

Stacked PRs:


precompile in the current process for distributed kernels.

    # We need this because of some tricky distributed kernels
    # like https://gist.github.com/shunting314/81f13ce00f835b21ab6466e21454b7c5 . We specialize the RANK argument for each GPU,
    # so some ranks may hit out-of-resource errors while others don't
    # due to the specialization.
    #
    # Without precompilation here, a rank may fail and skip running
    # the kernel while the other ranks wait for their peers. This
    # results in a stuck job.
    #
    # Precompilation in a child process is not enough because
    # CUDA is not available there, so we cannot check resource usage
    # such as shared memory, tmem, max threads, etc.
    #
    # This precompilation has overhead, so only do it if distributed is
    # initialized.
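
The gating described in the comment can be sketched in Python. This is a minimal illustration under stated assumptions, not the code from this PR: `precompile_in_process`, `OutOfResourcesError`, and `fake_compile_for_rank` are hypothetical names, `compile_fn` is assumed to compile one config in the current process and raise on a resource overflow, and the shared-memory numbers are made up.

```python
from typing import Callable, Iterable, List, TypeVar

Config = TypeVar("Config")


class OutOfResourcesError(RuntimeError):
    """Hypothetical stand-in for a backend's out-of-resource compile error."""


def precompile_in_process(
    configs: Iterable[Config],
    compile_fn: Callable[[Config], object],
    distributed_initialized: bool,
) -> List[Config]:
    """Keep only the configs that compile without exceeding device limits.

    In-process precompilation has overhead, so it is skipped (a no-op that
    returns every config) unless distributed is initialized.
    """
    candidates = list(configs)
    if not distributed_initialized:
        return candidates
    survivors: List[Config] = []
    for config in candidates:
        try:
            # compile_fn is assumed to run in this process, where device
            # limits (shared memory, max threads, ...) can be checked, and to
            # raise OutOfResourcesError when one is exceeded.
            compile_fn(config)
        except OutOfResourcesError:
            # Drop the config here instead of letting this rank fail at
            # launch time while its peers wait on it.
            continue
        survivors.append(config)
    return survivors


def fake_compile_for_rank(rank: int) -> Callable[[int], None]:
    """Hypothetical compiler whose shared-memory use depends on the specialized RANK."""
    limit = 64 * 1024  # made-up per-block shared-memory limit, in bytes

    def compile_fn(block_size: int) -> None:
        smem = block_size * 1024 + rank * 8 * 1024
        if smem > limit:
            raise OutOfResourcesError(f"needs {smem} bytes, limit is {limit}")

    return compile_fn


for rank in (0, 1):
    print(rank, precompile_in_process([16, 64, 128], fake_compile_for_rank(rank), True))
# 0 [16, 64]
# 1 [16]
```

Rank 0 keeps `[16, 64]` while rank 1 keeps only `[16]`. The same candidate list gives different per-rank results once RANK is specialized, which is the divergence the comment describes. The sketch only models the per-rank filtering and the no-op path. It does not model how the surviving configs are then benchmarked or chosen.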

Tests are added in the next PR.

@shunting314 force-pushed the shunting314/stack/7 branch from 9a2e89d to 5885b44 March 17, 2026 07:00
@meta-cla bot added the CLA Signed label Mar 17, 2026
@shunting314 requested review from jansel and yf225 March 17, 2026 07:02
stack-info: PR: #1730, branch: shunting314/stack/7
@shunting314 marked this pull request as draft March 17, 2026 07:04
@shunting314 force-pushed the shunting314/stack/7 branch from 5885b44 to d95c696 March 17, 2026 07:04
@shunting314 marked this pull request as ready for review March 17, 2026 07:04
@shunting314 (Contributor, Author) commented:

Test failures are unrelated. The timeout is also unrelated: in that test either autotuning is not enabled or distributed is not initialized, so this PR is a no-op there.

@shunting314 marked this pull request as draft March 18, 2026 18:55
@shunting314 marked this pull request as ready for review March 18, 2026 18:55
@shunting314 marked this pull request as draft March 18, 2026 20:27
@shunting314 marked this pull request as ready for review March 18, 2026 20:28
@shunting314 marked this pull request as draft March 19, 2026 03:34
@shunting314 marked this pull request as ready for review March 19, 2026 03:35
@shunting314 marked this pull request as draft March 19, 2026 03:46
@shunting314 marked this pull request as ready for review March 19, 2026 03:46
@shunting314 merged commit 667ce4a into main Mar 19, 2026
19 of 21 checks passed
hinriksnaer pushed a commit to hinriksnaer/helion that referenced this pull request Mar 20, 2026
umechand-amd pushed a commit to umechand-amd/helion that referenced this pull request Mar 23, 2026

Labels

CLA Signed


2 participants