Skip to content

consolidate accuracy check APIs#1910

Open
shunting314 wants to merge 1 commit intomainfrom
shunting314/stack/29
Open

consolidate accuracy check APIs#1910
shunting314 wants to merge 1 commit intomainfrom
shunting314/stack/29

Conversation

@shunting314
Copy link
Copy Markdown
Contributor

@shunting314 shunting314 commented Apr 1, 2026

Stacked PRs:


There are 2 APIs to check accuracy

  1. the one used in BaseSearch would handle str inputs (like PROCESS_GROUP_NAME)
  2. the one added for custom check function can allow small percent of mismatch. Will feng added it. I found it useful for fp8 kernels.

Consolidate these 2 APIs so we only need maintain one.

@shunting314 shunting314 force-pushed the shunting314/stack/29 branch from c7c3e80 to e72af50 Compare April 1, 2026 19:17
@shunting314 shunting314 force-pushed the shunting314/stack/24 branch from 9ff9e88 to 546ff56 Compare April 1, 2026 19:17
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 1, 2026
@shunting314 shunting314 marked this pull request as draft April 1, 2026 19:19
@shunting314 shunting314 changed the base branch from shunting314/stack/24 to main April 1, 2026 19:19
@shunting314 shunting314 force-pushed the shunting314/stack/29 branch from e72af50 to 535386d Compare April 1, 2026 19:19
@shunting314 shunting314 changed the base branch from main to shunting314/stack/24 April 1, 2026 19:19
@shunting314 shunting314 marked this pull request as ready for review April 1, 2026 19:19
@shunting314 shunting314 marked this pull request as draft April 1, 2026 22:37
@shunting314 shunting314 changed the base branch from shunting314/stack/24 to main April 1, 2026 22:37
@shunting314 shunting314 force-pushed the shunting314/stack/29 branch from 535386d to b8beec1 Compare April 1, 2026 22:37
@shunting314 shunting314 changed the base branch from main to shunting314/stack/24 April 1, 2026 22:37
@shunting314 shunting314 marked this pull request as ready for review April 1, 2026 22:38
@shunting314 shunting314 marked this pull request as draft April 2, 2026 18:02
@shunting314 shunting314 changed the base branch from shunting314/stack/24 to main April 2, 2026 18:02
@shunting314 shunting314 force-pushed the shunting314/stack/29 branch from b8beec1 to bd61daf Compare April 2, 2026 18:02
@shunting314 shunting314 changed the base branch from main to shunting314/stack/28 April 2, 2026 18:03
@shunting314 shunting314 marked this pull request as ready for review April 2, 2026 18:03
@shunting314 shunting314 marked this pull request as draft April 2, 2026 18:10
@shunting314 shunting314 changed the base branch from shunting314/stack/28 to main April 2, 2026 18:10
@shunting314 shunting314 force-pushed the shunting314/stack/29 branch from bd61daf to 9649742 Compare April 2, 2026 18:10
@shunting314 shunting314 changed the base branch from main to shunting314/stack/26 April 2, 2026 18:11
@shunting314 shunting314 marked this pull request as ready for review April 2, 2026 18:11
@shunting314 shunting314 marked this pull request as draft April 2, 2026 18:36
@shunting314 shunting314 changed the base branch from main to shunting314/stack/26 April 2, 2026 18:36
@shunting314 shunting314 marked this pull request as ready for review April 2, 2026 18:36
@shunting314 shunting314 marked this pull request as draft April 6, 2026 20:48
@shunting314 shunting314 changed the base branch from shunting314/stack/26 to main April 6, 2026 20:48
@shunting314 shunting314 force-pushed the shunting314/stack/29 branch from e9841d2 to a94558d Compare April 6, 2026 20:48
@shunting314 shunting314 changed the title WIP: consolidate accuracy check APIs consolidate accuracy check APIs Apr 6, 2026
@shunting314 shunting314 marked this pull request as ready for review April 6, 2026 20:48
@shunting314 shunting314 requested review from jansel and yf225 April 6, 2026 20:49
@shunting314 shunting314 marked this pull request as draft April 6, 2026 21:34
@shunting314 shunting314 marked this pull request as ready for review April 6, 2026 21:34
stack-info: PR: #1910, branch: shunting314/stack/29
@shunting314 shunting314 marked this pull request as draft April 6, 2026 21:53
@shunting314 shunting314 force-pushed the shunting314/stack/29 branch from a94558d to 868686e Compare April 6, 2026 21:53
@shunting314 shunting314 marked this pull request as ready for review April 6, 2026 21:53
@shunting314 shunting314 marked this pull request as draft April 7, 2026 18:21
@shunting314 shunting314 marked this pull request as ready for review April 7, 2026 18:22
@shunting314 shunting314 marked this pull request as draft April 7, 2026 23:12
@shunting314 shunting314 marked this pull request as ready for review April 7, 2026 23:13
Copy link
Copy Markdown
Contributor

@jansel jansel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if we even need the alternate strategy. Where is it used?

Comment on lines +223 to +224
max_abs_diff: float | None = None,
max_rel_diff: float | None = None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from atol/rtol?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use these 2 arguments and just copied over from the existing code. @yf225 is there any usage of them?

Comment on lines +260 to +265
mismatched = (abs_diff > atol + rtol * e.abs()).sum().item()
mismatch_pct = mismatched / total if total > 0 else 0.0
if mismatch_pct > max_mismatch_pct:
raise AssertionError(
f"Too many mismatches: {mismatch_pct:.4%} > {max_mismatch_pct:.4%}"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it semantically different to do this in a chunked way? You are measuring percentage different on a chunk not on the entire tensor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chunk is handled inside _chunked_assert_close. This part of code operates on the entire tensor though

@shunting314 shunting314 marked this pull request as draft April 8, 2026 18:16
@shunting314 shunting314 marked this pull request as ready for review April 8, 2026 18:16
@shunting314
Copy link
Copy Markdown
Contributor Author

I do wonder if we even need the alternate strategy. Where is it used?

The alternative strategy allowing small percent of mismatch was added here #1733 by @yf225 . I used it in a fp8 kernel. There is a small percent of mismatch. If we just raise tolerance, we need a quite large tolerance.

@jansel
Copy link
Copy Markdown
Contributor

jansel commented Apr 9, 2026

@yf225 can you comment on if we could remove this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants