Remove duplicate throughput call and change string literal check to allow negative offsets by stefandesouza · Pull Request #120 · RRZE-HPC/OSACA

stefandesouza · 2026-01-19T16:20:10Z

The optimal throughput calculation is invoked twice during the inspect() routine which seems to be unecessary.

In the x86 Intel parser code, there seems to be a string literal check that should instead be a dictionary value conditional
if displacement and "operator_disp" == "-":
This results in instructions with negative memory offsets being parsed incorrectly, for example
"\tlea\trcx, OFFSET FLAT:??_R0M@8-4"

codecov · 2026-01-19T16:34:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.48%. Comparing base (60d725b) to head (e0d6273).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
- Coverage   90.48%   90.48%   -0.01%     
==========================================
  Files          25       25              
  Lines        3668     3667       -1     
==========================================
- Hits         3319     3318       -1     
  Misses        349      349

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pleroy · 2026-01-19T18:36:34Z

Could we have a test for the parser change? Something that doesn't parse before and parses after.

…pace as well

stefandesouza · 2026-01-20T09:11:20Z

Sure, that's fair. I added two tests of type identifier-offset and identifier+ -offset that fail to parse correctly before and successfully now.

pleroy · 2026-01-20T15:21:29Z

Thanks for adding the test.

I am a tad concerned by the removal of that call to assign_optimal_throughput. I think it would need a deeper analysis: that function adds semantic attributes to the instructions, and it's entirely conceivable that it would take two passes to do the full analysis.

I am not familiar with this part of the code, but I note that the second call was added by @JanLJL in commit 7724ce2 and the same commit added something call "multiple assignments". Do multiple assignments require multiples passes? Is the removal of the second call going to break Zen3?

Of course, if the repeated call was intentional, it would have been good to have a comment, but here we are. I hope that Jan remembers what is going on here.

JanLJL · 2026-01-22T15:24:23Z

Hi guys and sorry for not replying earlier!
The second assign_optimal_throughput was in fact there for a reason.
As OSACA increases or decreases the port pressure per instruction and without looking ahead, it might not get the optimal case in one go. A second run of (re-)distributing the throughput numbers is supposed to help balancing the overall throughput better and get to the global minima.
However, I tried various kernels in the past two days and could not find a single example in my code snippet folder, where the removal of the second assign run made a difference to the bottleneck, just some kernels where the port pressure of non-limiting ports (i.e., not performance relevant ports) changed by 0.01 cycles. So even though there was some thought behind it, I do not have a proof that it is still necessary for optimal port distribution. As for now, I would like to keep it in and do some more tests before removing it for good.

What Pascal mentioned with "multiple assignments" becomes relevant when we have different options for a port assigned, like an OR in the port mapping (contrary to the ANDs we usually see). This is written as dict with different (irrelevant) keys, e.g., smlal for A64FX in line 2600 of its machine model:

- name: [smlal, smlal2]
  operands:
  ...
  throughput: 8.0
  latency: 18.0
  port_pressure: {0: [[8, '0']], 1: [[8, '2']]}

This means, OSACA will schedule either full 8 cycles on port 0 or full 8 cycles on port 2.
This could be also modeled as port_pressure: [[16, '02']], but if there is already high pressure on one of the two ports this would effectively double the actual runtime and could lead to an incorrect shift of the bottleneck. However, while investigating this together with the two-times assign_optimal_throughput() call, I realized there was a bug in the way OSACA searches in a DFS-manner for the best solution, so thank you for bringing this up! I will commit this separately to this PR.

Finally, for the parsing of memory offsets, this looks very good to me, thank you for adding the test cases!
If it's OK for you and as mentioned above, I'd like to merge the parser changes and reject the throughput assignment change (for now).

stefandesouza · 2026-01-23T08:26:49Z

Sure, please go ahead:)

While I couldn't find an example where the second call is still necessary for better port balancing, we keep it in for now to check this more closely (see #120)

souza added 2 commits January 19, 2026 16:38

Remove duplicated call

5040543

Fix the string literal check to the dict key

385194c

souza added 2 commits January 20, 2026 09:58

Update parser to allow both postive and negative displacements

e739bcb

Add tests for negative offset directly after identifier and after a s…

e0d6273

…pace as well

stefandesouza changed the title ~~Remove duplicate throughput call and change string literal check~~ Remove duplicate throughput call and change string literal check to allow negative offsets Jan 20, 2026

JanLJL self-assigned this Jan 22, 2026

JanLJL merged commit ae54403 into RRZE-HPC:master Jan 23, 2026
6 checks passed

JanLJL added a commit that referenced this pull request Jan 23, 2026

Add second semantics.assign_optimal_throughput() call

1d5c1ed

While I couldn't find an example where the second call is still necessary for better port balancing, we keep it in for now to check this more closely (see #120)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove duplicate throughput call and change string literal check to allow negative offsets#120

Remove duplicate throughput call and change string literal check to allow negative offsets#120
JanLJL merged 4 commits into
RRZE-HPC:masterfrom
stefandesouza:master

stefandesouza commented Jan 19, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jan 19, 2026 •

edited

Loading

Uh oh!

pleroy commented Jan 19, 2026

Uh oh!

stefandesouza commented Jan 20, 2026 •

edited

Loading

Uh oh!

pleroy commented Jan 20, 2026

Uh oh!

JanLJL commented Jan 22, 2026

Uh oh!

stefandesouza commented Jan 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

stefandesouza commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pleroy commented Jan 19, 2026

Uh oh!

stefandesouza commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pleroy commented Jan 20, 2026

Uh oh!

JanLJL commented Jan 22, 2026

Uh oh!

stefandesouza commented Jan 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stefandesouza commented Jan 19, 2026 •

edited

Loading

codecov Bot commented Jan 19, 2026 •

edited

Loading

stefandesouza commented Jan 20, 2026 •

edited

Loading