Update Hamiltonian simulation compilation tutorial to new template #4969
henryzou50 wants to merge 15 commits into
…utorial Replace the old notebook with a revised version that follows the tutorial template and focuses exclusively on Hamiltonian simulation circuits from the Hamlib benchmark collection (removing the EfficientSU2 section).

Key changes:
- Compare SABRE, the AI transpiler, and Rustiq on Hamlib circuits split into small-scale (<20 qubits) and large-scale (>=20 qubits) groups
- Filter out circuits that exceed the backend qubit count or 5000 decomposed gates
- Add styled summary tables with mean/stdev and % improvement over SABRE
- Add per-circuit comparison tables with best-value highlighting
- Improve plots: line charts by circuit index, % improvement over SABRE, and grouped bar charts for the best-performing method with tie tracking
- Add mirror circuit execution for noise evaluation (Aer simulator for small-scale, real hardware for large-scale) with a survival probability metric
- Revise all commentary to reflect benchmark observations
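The filtering and grouping rules in the commit message above could be sketched roughly as follows. This is an illustrative sketch, not the notebook's code: `CircuitInfo`, `split_circuits`, the circuit names, and the 133-qubit backend size are hypothetical stand-ins. In the notebook the gate count would presumably come from decomposing a Qiskit circuit (for example with `QuantumCircuit.decompose()` and `size()`), which is an assumption here.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a Hamlib circuit. In the notebook these would be
# Qiskit circuits; here we only keep the two numbers the filters need.
@dataclass
class CircuitInfo:
    name: str
    num_qubits: int
    num_decomposed_gates: int

SMALL_SCALE_MAX_QUBITS = 20   # small-scale means fewer than 20 qubits
MAX_DECOMPOSED_GATES = 5000   # circuits above this gate count are skipped

def split_circuits(circuits, backend_num_qubits):
    """Drop circuits that don't fit, then split them into small/large groups."""
    small, large = [], []
    for c in circuits:
        # Filter: the circuit must fit on the backend and stay under the gate budget.
        if c.num_qubits > backend_num_qubits:
            continue
        if c.num_decomposed_gates > MAX_DECOMPOSED_GATES:
            continue
        # Split: <20 qubits is small-scale, >=20 qubits is large-scale.
        if c.num_qubits < SMALL_SCALE_MAX_QUBITS:
            small.append(c)
        else:
            large.append(c)
    return small, large

# Hypothetical circuits, chosen to exercise each rule once.
circuits = [
    CircuitInfo("tfim_8q", 8, 400),
    CircuitInfo("tfim_26q", 26, 3100),
    CircuitInfo("heisenberg_40q", 40, 7200),       # dropped: too many gates
    CircuitInfo("fermi_hubbard_150q", 150, 4000),  # dropped: too many qubits
]
small, large = split_circuits(circuits, backend_num_qubits=133)
print([c.name for c in small])
print([c.name for c in large])
```

Both thresholds are treated as inclusive upper bounds ("exceeding" is read as strictly greater than), which is one reasonable reading of the commit message.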
kaelynj left a comment:
Left a handful of comments to address! I also think it'd be worth including hardware results here: since you're already generating the mirror circuits, it'd be fairly easy to check.
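For context on what the mirror-circuit check measures: a mirror circuit applies a circuit U followed by its inverse, so in the noiseless case every shot should return the all-zeros bitstring. In Qiskit the mirror could plausibly be built with `qc.compose(qc.inverse())` before adding measurements. The hypothetical helper below only sketches the scoring step, estimating fidelity as the fraction of all-zeros shots from a counts dictionary. It assumes a single classical register, so bitstrings contain no spaces.

```python
def mirror_fidelity(counts):
    """Fraction of shots that return the all-zeros bitstring.

    A mirror circuit applies U followed by U^-1, so ideally every shot
    measures |0...0>. Any other outcome is attributed to noise.
    Assumes a single classical register (no spaces in the bitstrings).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one shot")
    num_bits = len(next(iter(counts)))
    ideal = "0" * num_bits
    return counts.get(ideal, 0) / total

# Hypothetical counts from a 4-qubit mirror circuit run with 1000 shots.
counts = {"0000": 870, "0001": 60, "1000": 45, "0110": 25}
print(mirror_fidelity(counts))  # 0.87
```

Because the ideal output is known in advance, this comparison needs no classical simulation of U, which is what makes it usable on the large-scale circuits run on hardware.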
"In addition to these standard metrics, we also record the 2-qubit gate depth, which is a particularly important metric for evaluating execution on quantum hardware. Unlike total depth, which includes all gates, the 2-qubit depth more accurately reflects the circuit's actual execution duration on hardware. This is because 2-qubit gates typically dominate the time and error budget in most quantum devices. As such, minimizing 2-qubit depth is critical for improving fidelity and reducing decoherence effects during execution.\n",
"\n",
"We will use this function to analyze the performance of the different compilation methods across multiple circuits."
"The following function transpiles a list of circuits using a given pass manager and records the key metrics (two-qubit depth, circuit size, and runtime) in a DataFrame."
We should avoid using a DataFrame to present the results. They don't render correctly on the platform.
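As a rough illustration of the two ideas in this thread (two-qubit depth as a metric, and storing results without a DataFrame), here is a pure-Python sketch. Everything in it is hypothetical: circuits are modeled as lists of `(gate_name, qubit_indices)` tuples, and `transpile_fn` stands in for SABRE, the AI transpiler, or Rustiq. In Qiskit itself, `QuantumCircuit.depth()` accepts a `filter_function` argument that can restrict the count to two-qubit operations.

```python
import time

def two_qubit_depth(gates, num_qubits):
    """Circuit depth counting only two-qubit gates.

    Single-qubit gates never add to this count and only touch one wire,
    so they can be skipped without changing the result.
    """
    level = [0] * num_qubits
    for _name, qubits in gates:
        if len(qubits) != 2:
            continue
        a, b = qubits
        new_level = max(level[a], level[b]) + 1
        level[a] = level[b] = new_level
    return max(level, default=0)

def run_benchmark(circuits, transpile_fn, method_name):
    """Transpile each circuit and record metrics as plain dicts (no pandas)."""
    results = []
    for name, gates, num_qubits in circuits:
        start = time.perf_counter()
        out_gates = transpile_fn(gates, num_qubits)
        runtime = time.perf_counter() - start
        results.append({
            "method": method_name,
            "circuit": name,
            "two_qubit_depth": two_qubit_depth(out_gates, num_qubits),
            "size": len(out_gates),
            "runtime_s": runtime,
        })
    return results

def identity_transpile(gates, num_qubits):
    # Hypothetical placeholder for a real compiler; returns the gates unchanged.
    return list(gates)

circuit = ("toy_3q",
           [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2)), ("rz", (2,)), ("cx", (0, 1))],
           3)
rows = run_benchmark([circuit], identity_transpile, "identity")
print(rows[0]["two_qubit_depth"], rows[0]["size"])  # 3 5
```

A list of dicts keeps each row self-describing and is straightforward to print as a plain-text table, which avoids the rendering problem mentioned above.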
"\n",
"### Two-qubit depth and gate count\n",
"\n",
"At large scale, SABRE and the AI transpiler produce similar results overall, with each having a slight edge in different areas: SABRE tends to achieve a slightly lower gate count on average, which aligns with how its heuristic is designed to minimize the number of inserted SWAP gates. The AI transpiler edges ahead slightly on two-qubit depth, consistent with the fact that part of its reinforcement learning training objective optimizes for circuit depth. Both methods are consistent and reliable across the full range of circuits.\n",
> SABRE tends to achieve a slightly lower gate count on average, which aligns with how its heuristic is designed to minimize the number of inserted SWAP gates.
This isn't what the plot shows: the AI transpiler has a slightly lower gate count on average, and Rustiq performs just about as well as the AI transpiler in terms of 2Q depth.
Thanks for catching this! I dug into it and I think the discrepancy comes from the hardware mirror-circuit plot, which only shows a single circuit (one 26-qubit tfim case), and that one happens to be an outlier where SABRE's two-qubit depth is unusually high.
Looking at the aggregate charts instead (% improvement over SABRE and best-performing method by metric), the trend across all the large-scale circuits is:
- SABRE wins gate count on most circuits (~73%)
- AI wins two-qubit depth on most circuits (~64%)
- Rustiq is best on only a small share and isn't comparable to AI on 2Q depth at scale (its averages are dominated by a few large outliers)
So the original statement was actually correct for the large-scale circuits. I've kept it, but reworded the section to make the per-metric split explicit and added a note clarifying that the mirror plot reflects just one circuit, not the overall results. I also explained this in my latest summary comment above; let me know if anything here should be changed.
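The "wins on most circuits" figures in this thread come from counting, per metric, which method achieves the best value on each circuit. A hypothetical sketch of that tally is below. Crediting ties to a separate `"tie"` bucket is one possible convention and may not match the notebook's exact tie tracking. Lower is treated as better for every metric, and the input numbers are made up, not benchmark results.

```python
from collections import Counter

def best_method_counts(results, metric, methods):
    """Count how many circuits each method is best on (lower is better).

    If several methods share the best value on a circuit, the circuit is
    credited to "tie" rather than to any single method.
    """
    by_circuit = {}
    for row in results:
        by_circuit.setdefault(row["circuit"], {})[row["method"]] = row[metric]
    tally = Counter()
    for values in by_circuit.values():
        present = {m: values[m] for m in methods if m in values}
        if not present:
            continue
        best = min(present.values())
        winners = [m for m, v in present.items() if v == best]
        tally[winners[0] if len(winners) == 1 else "tie"] += 1
    return tally

# Made-up gate counts for three circuits (not the benchmark data).
results = [
    {"circuit": "c1", "method": "sabre", "size": 100},
    {"circuit": "c1", "method": "ai", "size": 110},
    {"circuit": "c1", "method": "rustiq", "size": 120},
    {"circuit": "c2", "method": "sabre", "size": 90},
    {"circuit": "c2", "method": "ai", "size": 90},
    {"circuit": "c2", "method": "rustiq", "size": 95},
    {"circuit": "c3", "method": "sabre", "size": 80},
    {"circuit": "c3", "method": "ai", "size": 70},
    {"circuit": "c3", "method": "rustiq", "size": 75},
]
tally = best_method_counts(results, "size", ["sabre", "ai", "rustiq"])
total = sum(tally.values())
for label in ["sabre", "ai", "rustiq", "tie"]:
    print(f"{label}: {tally[label]}/{total} ({100 * tally[label] / total:.0f}%)")
```

Counting wins per circuit is less sensitive to a few large outliers than comparing averages, which helps explain why the averages and the win shares discussed above can point in different directions.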
…circuits.ipynb Co-authored-by: Kaelyn Ferris <43348706+kaelynj@users.noreply.github.com>
…utorial From kaelynj's review:
- Remove the pandas DataFrames used to present results, which don't render on the docs platform. Results are now stored as a list of dicts and shown via plain-text printed tables (print_summary_table, print_per_circuit_comparison), matching the AI transpiler tutorial. Removes pandas entirely and rewrites the three plot helpers in pure Python.
- Use "fidelity" instead of the unfamiliar term "survival probability" throughout (prose, plot labels, and variables).
- Clarify the large-scale analysis behind kaelynj's gate-count comment. Her note came from the single hardware mirror-circuit plot (one tfim circuit, an outlier where SABRE's depth is high). The aggregate best-performing-method and %-improvement charts show the actual trend, which the analysis already reflected: SABRE wins gate count on most circuits and the AI transpiler wins two-qubit depth on most. The analysis now states this explicitly and flags that the mirror plot reflects a single circuit, not the aggregate.

Additional changes:
- Reconcile the rest of the commentary with the re-run results, including the small-scale best-method analysis (all three methods are close except for AI on runtime; Rustiq has a slight overall edge) and Rustiq's outlier behavior at large scale.
- Convert absolute quantum.cloud.ibm.com doc links to relative paths for consistency with other tutorials.
- Remove the link to the deprecated Qiskit Transpiler Service from Next steps.
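A plain-text summary table with mean, standard deviation, and % improvement over SABRE, as described in the commit message above, might look something like the sketch below. `format_summary` is a hypothetical helper, not the notebook's `print_summary_table`. It assumes % improvement is computed from the per-method means as `100 * (baseline_mean - method_mean) / baseline_mean` (positive means lower, i.e. better, than SABRE); the notebook may instead average per-circuit improvements.

```python
from statistics import mean, stdev

def format_summary(results, metric, methods, baseline="sabre"):
    """Plain-text table: mean, stdev, and % improvement of the mean vs baseline."""
    values = {m: [r[metric] for r in results if r["method"] == m] for m in methods}
    base_mean = mean(values[baseline])
    lines = [f"{'method':<10}{'mean':>10}{'stdev':>10}{'vs ' + baseline:>12}"]
    for m in methods:
        vals = values[m]
        mu = mean(vals)
        # Sample standard deviation needs at least two values.
        sd = stdev(vals) if len(vals) > 1 else 0.0
        # Positive means lower (better) than the baseline on average.
        improvement = 100 * (base_mean - mu) / base_mean
        lines.append(f"{m:<10}{mu:>10.1f}{sd:>10.1f}{improvement:>11.1f}%")
    return "\n".join(lines)

# Made-up two-qubit depths for two circuits (not benchmark data).
results = [
    {"circuit": "c1", "method": "sabre", "two_qubit_depth": 40},
    {"circuit": "c2", "method": "sabre", "two_qubit_depth": 60},
    {"circuit": "c1", "method": "ai", "two_qubit_depth": 36},
    {"circuit": "c2", "method": "ai", "two_qubit_depth": 54},
]
print(format_summary(results, "two_qubit_depth", ["sabre", "ai"]))
```

Printed fixed-width text renders identically in a notebook and on a static docs page, which is the motivation given in the review for dropping DataFrames.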
Thanks for the review, @kaelynj! I've pushed changes addressing all of your comments:
While I was in there, I also:
Let me know if you'd like any further changes!
Summary
Revised compilation-methods-for-hamiltonian-simulation-circuits.ipynb to follow the Tutorial_Template structure, focusing exclusively on benchmarking SABRE, AI transpiler, and Rustiq compilation methods on Hamiltonian simulation circuits from the Hamlib collection.

Key changes from the old notebook:
efficient_su2 circuits. The revised version focuses entirely on Hamlib Hamiltonian simulation circuits built with PauliEvolutionGate.
ibm_torino to least_busy()