feat(decompiler): Rec 31 #31-3 RAII Stage 2B — XmlScan::lvalue unique_ptr migration #51
Merged
CodeQL's autobuilder has been failing on every PR (including doc-only ones; see #44, #47, #48) with:
    cpp/autobuilder: Incompatible operating system (expected Windows).
    cpp/autobuilder: No supported build system detected.
The autobuilder scans the repo root for a recognized build file (Makefile, CMakeLists.txt, etc.); the decompiler's Makefile lives at Ghidra/Features/Decompiler/src/decompile/cpp/, so the scan fails and the c-cpp matrix leg goes red even when no C++ source is touched. The actual Java/Kotlin, Actions, and Python legs have always been fine; only c-cpp was affected.
Fix: replace the `github/codeql-action/autobuild@v3` step with an explicit `make libdecomp_dbg.a` invocation that cd's into the decompiler tree. libdecomp_dbg.a is the static-archive target that compiles every LIBDECOMP_NAMES source into com_dbg/*.o then `ar qc`s them; it does NOT need BFD at link time (no -lbfd dependency), so binutils-dev / libiberty-dev drop out of the apt-get list too. CodeQL's tracer picks up the .o compile commands during the make invocation, which is exactly the input static analysis needs.
The autobuild step stays for the `actions` and `python` matrix legs (it works fine for both: actions is just YAML, python doesn't need a build at all).
Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_ptr migration
Converts XmlScan's per-token string buffer from a raw owning pointer
to unique_ptr<string>. This eliminates 7 raw `new string()` allocation
sites (one per scan* mode helper) and the manual `delete` in
~XmlScan / clearlvalue, replacing them with make_unique + the unique_ptr
destructor's automatic cleanup. The single ownership-transfer point
(lval(), called by bison's yylex to hand the string up to the parser
value stack) uses unique_ptr::release() so the bison side's existing
raw-pointer ownership convention is unchanged.
Before:
    class XmlScan {
      string *lvalue;                  // raw owning pointer
      ...
      string *lval(void) {
        string *ret = lvalue;
        lvalue = (string *)0;          // manual null-out
        return ret;
      }
      void clearlvalue(void) {
        if (lvalue != (string *)0)     // manual null-check
          delete lvalue;               // manual delete
      }
    };
    // Allocate (×7 across scan* helpers):
    lvalue = new string();
After:
    class XmlScan {
      unique_ptr<string> lvalue;       // owning RAII handle
      ...
      string *lval(void) {
        return lvalue.release();       // ownership transfer; lvalue → nullptr
      }
      void clearlvalue(void) {
        lvalue.reset();                // null-safe, automatic delete
      }
    };
    // Allocate (×7):
    lvalue = make_unique<string>();
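The ownership hand-off described above can be sketched in isolation. This is only an illustration, not the XmlScan code: `TokenBuffer`, `begin`, `append`, `empty`, and `scanAndConsume` are hypothetical names, and the caller here plays the role that the bison value stack plays in the real parser (it receives a raw pointer and is responsible for deleting it).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for XmlScan; only the lvalue handling is modeled.
class TokenBuffer {
  std::unique_ptr<std::string> lvalue;   // owning RAII handle
public:
  // Start a fresh token; any previous unclaimed string is freed by the assignment.
  void begin(void) { lvalue = std::make_unique<std::string>(); }
  void append(char c) { if (lvalue) lvalue->push_back(c); }
  // Transfer ownership to the caller; lvalue is null afterwards.
  std::string *lval(void) { return lvalue.release(); }
  // Drop any pending token; safe to call when lvalue is already null.
  void clearlvalue(void) { lvalue.reset(); }
  bool empty(void) const { return lvalue == nullptr; }
};

// Simulates the consumer side of the raw-pointer convention:
// take the pointer from lval(), use it, then delete it manually.
std::string scanAndConsume(const std::string &text) {
  TokenBuffer buf;
  buf.begin();
  for (char c : text) buf.append(c);
  std::string *raw = buf.lval();   // caller now owns the string
  assert(buf.empty());             // the buffer no longer holds it
  std::string copy = *raw;
  delete raw;                      // consumer-side cleanup, unchanged by the migration
  buf.clearlvalue();               // no-op on a null handle
  return copy;
}
```

The point of the sketch is that `release()` keeps the receiving side's contract intact, while a token that is never claimed is still freed by `reset()` or by the destructor.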
Bison sync: xml.y is the grammar source; xml.cc is the generated
parser. The edits are entirely in the prologue (%{...%}) and epilogue
regions that bison copies verbatim, so both files get the identical
hand-edit and stay in lockstep without invoking bison. The repo's
xml.cc was generated by bison 3.0.4; the local box has 3.8.2 and a
clean regeneration would produce a huge unrelated diff, so keeping the
parallel hand-edit avoids that. xml.hh picks up <memory> and the
make_unique / move / unique_ptr usings alongside the existing std::
aliases.
Scope intentionally limited:
- Semantic-action raw `new`s (xml.y:150, 153, 198, 200, 208 and the
  parallel sites in xml.cc's yyparse) are NOT touched in this PR.
  They live inside bison's table-driven yyparse and a clean
  migration requires actual bison-3.0.4 regeneration; that's a
  separate follow-up PR (Stage 2C).
- The big-object epilogue raw `new`s (line 525 `new XmlScan`,
  line 538 `new Element`, line 624 `new Document`) require API
  redesign across xml.hh + callers and are separate PRs of their
  own.
- cppRaiiAudit's PROTECTED_FILES does NOT yet include xml.cc /
  xml.hh / xml.y; adding them now would fail the gate on the
  still-present semantic-action raw `new`s. Add after Stage 2C
  lands.
Tests: this is functionally a no-op refactor. The only change is
that the manual delete is replaced by the unique_ptr destructor; both
forms free the string on exactly the same code paths. Marshal RAII
Stage 2A (PR #46) had the same shape and passed C++ unit tests +
ASan + UBSan.
Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 26, 2026
CryptoJones added a commit that referenced this pull request May 26, 2026
…ed + std::span deviation (#54)
The Sprint 6 row for Rec 31 #31-3 (RAII Stage 2) + Rec 32 #32-4 (std::span adoption) was a single open checkbox that no longer matched reality:
- marshal.cc RAII landed via PR #46 (Stage 2A).
- marshal std::span did NOT land, a deviation from the documented "same files, same PR" plan in docs/decompiler/CPP20_ADOPTION.md. marshal's public API is [start, end) pointer-pair ranges, not the (T*, size_t) shape that std::span naturally replaces; whether there's a natural std::span site here at all is now an explicit open question rather than implicit slippage.
- xml.y / xml.cc RAII is in flight as a multi-PR thread (PRs #51 + #52 stacked); bigger pieces still pending.
- xml std::span is open and best audited alongside the bison %union redesign needed for the semantic-action sites.
Replaced the single open checkbox with four bullets recording each piece's current status. Surfaced during the 2026-05-26 self-audit.
Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CryptoJones added a commit that referenced this pull request May 26, 2026
…gration (#71)
Closes audit finding #8 from the 2026-05-26 self-audit by replacing the "Stage 2C in flight" hand-wave with an honest scoping doc.
The original assumption was "regenerate xml.cc with bison-3.0.4 to apply RAII to the semantic actions." Investigation showed:
- bison-3.0.4 doesn't build on modern glibc (gnulib fseterr.c portability bug). bison-3.0.5 builds cleanly and produces a ~615-line mostly-cosmetic diff against the in-tree xml.cc.
- The real blocker is not the bison version. The %union holds raw pointers (string*, Attributes*, NameValue*) and a C-style union can't hold non-trivially-destructible types like unique_ptr.
- Migrating to RAII therefore requires switching xml.y to bison's C++ variant mode (`%define api.value.type variant`), which makes yyparse a method of an xml::parser class, changes the yylex contract, and wholesale-rewrites the generated xml.cc. That's a strategic sprint, not a multi-PR thread.
This doc lays out:
- what's left after Stage 2B (the seven specific raw-new sites and their exact line numbers in xml.y and xml.cc);
- why hand-edit-parallel (the 2B technique) doesn't extend to these sites (they cross the %union boundary);
- two architectural options:
  A. switch to `%define api.value.type variant`: clean RAII, wholesale rewrite of xml.cc, full xml_parse shim needed;
  B. keep %union of raw pointers, treat the five bison-value-stack sites as a documented exception in cppRaiiAudit, and clean up only the one obvious code-smell (a heap-allocated stack-scoped temporary in xml.y:208);
- recommendation: B for the next PR (small, shipping-ready), A for a future strategic sprint;
- a four-step plan ordering Stage 2C-min, the Element parse-tree ownership migration, the Document return-value migration, and the eventual Option A sprint.
Updates docs/decompiler/RAII_MIGRATION.md's #31-3 row to point at the new design doc and to reflect what's shipped (#46 marshal) vs. in-flight (#51, #52 xml epilogue) vs. scoped (the new doc).
No code change in this PR; design / scoping only.
Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
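The %union constraint can be illustrated with standard type traits. This is a sketch under stated assumptions, not the project's YYSTYPE: `RawValue` and `OwningValue` are hypothetical unions, and the helper function names are made up for the example. In standard C++ a union may declare a member such as unique_ptr, but its implicit destructor is then deleted, so code that creates and destroys the union as plain data no longer compiles without hand-written lifetime management.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical union of raw pointers, similar in spirit to a bison %union.
union RawValue {
  std::string *str;
  int *num;
};

// Hypothetical union with an owning member. Declaring the type is legal,
// but its implicit default constructor and destructor are deleted.
union OwningValue {
  std::unique_ptr<std::string> str;
  int num;
};

// Raw pointers need no cleanup, so the union can be treated as plain data.
constexpr bool rawUnionIsTrivial(void) {
  return std::is_trivially_destructible<RawValue>::value;
}

// False: the owning union cannot be destroyed without a user-written destructor.
constexpr bool owningUnionIsDestructible(void) {
  return std::is_destructible<OwningValue>::value;
}

static_assert(rawUnionIsTrivial(), "union of raw pointers is trivially destructible");
static_assert(!owningUnionIsDestructible(), "union with a unique_ptr member has a deleted destructor");
```

This matches the reasoning in the commit message: keeping raw pointers in the value stack (Option B) preserves the plain-data property, while owning types call for a tagged representation such as bison's variant mode (Option A).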
CryptoJones added a commit that referenced this pull request May 26, 2026
…n work (#72)
CHANGELOG.md's [Unreleased] section wasn't updated as PRs landed throughout the day. Adding a single dated section (2026-05-26) that records the 28 PRs merged this session, grouped by Rec:
- Rec 28: ignoreAudit Stage 2 strict, 17 author-declared-not-a-regression-test deletions, tracking-issue re-file, inventory honesty refresh.
- Rec 31: cppRaiiAudit per-file gate (Stage 1), marshal RAII Stage 2A, Stage 2C design doc.
- Rec 13/14: OSS-Fuzz primary_contact fill-in + in-tree/upstream sync + upstream PR (google/oss-fuzz#15545) submitted.
- CI / housekeeping: sync-labels live mode, 26-branch sweep.
- Doc sync: SprintPlanning marshal shipped + std::span deviation.
Also noted the three in-flight PRs (#50/#51/#52) that landed-as-CI but didn't merge yet, so they appear as "queued" rather than as shipped work.
Also fixes a stale "Work toward v26.1.10" header: v26.1.10 already shipped (per the Released section); [Unreleased] is now toward v26.1.11.
Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 26, 2026
CryptoJones added a commit that referenced this pull request May 26, 2026
PR #51's branch was inadvertently based on the in-flight PR #50 CodeQL-fix branch (not master), so PR #51's squash-merge included PR #50's broken first commit alongside the intended lvalue RAII change. Master at f41d8fc ended up with the broken CodeQL config; PR #50 couldn't merge as-is; PR #52 (xml global_scan, stacked on PR #51) also auto-closed when its base disappeared.
Mitigation:
- PR #74 cherry-picks PR #50's second commit (binutils-dev fix) onto current master cleanly.
- PR #73 cherry-picks PR #52's global_scan commit onto current master cleanly.
Adds an entry to Apologies.md at the top (per the log policy) recording cause + downstream damage + mitigation.
Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CryptoJones added a commit that referenced this pull request May 26, 2026
…unique_ptr (#73)
Stacked PR on top of the XmlScan::lvalue migration (PR #51). Same hand-edit pattern: epilogue change in xml.y + matching parallel edit in xml.cc, no bison regeneration required.
Converts xml_parse()'s pairing of `global_scan = new XmlScan(i)` / `delete global_scan;` to a scope-bound unique_ptr that holds ownership across the yyparse call:
    auto scan = make_unique<XmlScan>(i);
    global_scan = scan.get();      // raw observer pointer for grammar actions
    ...
    int4 res = yyparse();
    ...
    global_scan = (XmlScan *)0;    // null observer before scan goes out of scope
    return res;
`global_scan` stays a `static XmlScan *` (raw observer) because it's accessed from yyparse / yylex / grammar semantic actions; those see it through the symbol declared at file scope, and changing it to unique_ptr would require changing every accessor. The observer pattern keeps the surface area minimal: only xml_parse owns; everyone else just borrows during the parse.
The explicit `global_scan = nullptr` before return is defensive: if anything ever tries to dereference global_scan after xml_parse exits, crash on null is far better than use-after-free on dangling.
After this PR, the only raw `new`s remaining in the xml.y / xml.cc epilogue are:
- line 538 `new Element(cur)`: parse-tree node allocation, owned by parent's children list; needs Element class field refactor.
- line 624 `new Document()`: returned across xml_tree() API to XmlDecode; needs xml.hh + XmlDecode coordination.
Both are separate PRs. The bison semantic-action raw `new`s (xml.y:150, 153, 198, 200, 208) remain blocked on `%union` redesign.
Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
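The owner-plus-observer arrangement from the commit above can be sketched as a standalone program fragment. The names `Scanner`, `fakeParse`, and `parseOnce` are hypothetical stand-ins for XmlScan, yyparse, and xml_parse; the sketch only models who owns the object and who borrows it, not the actual parser.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for XmlScan.
struct Scanner {
  int tokens = 0;
  int next(void) { return ++tokens; }
};

// File-scope raw observer: it never owns the scanner, it only points at it
// while a parse is running.
static Scanner *global_scan = nullptr;

// Stand-in for yyparse: borrows the scanner through the file-scope pointer.
static int fakeParse(void) {
  assert(global_scan != nullptr);
  global_scan->next();
  global_scan->next();
  return global_scan->tokens;
}

// Stand-in for xml_parse: the only owner of the scanner.
int parseOnce(void) {
  auto scan = std::make_unique<Scanner>();   // owned for the scope of this call
  global_scan = scan.get();                  // publish the observer pointer
  int res = fakeParse();
  global_scan = nullptr;                     // clear the observer before scan is destroyed
  return res;
}
```

One design caveat worth noting, as an observation about the sketch rather than about the project: if the parse step could throw, the observer would be left dangling unless the reset is moved into a small scope guard.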
CryptoJones added a commit that referenced this pull request May 26, 2026
…leased] (#76)
The catch-up changelog PR (#72) listed #50, #51, #52 as in-flight. Now resolved:
- #51 (lvalue): merged (was the lone in-flight item that landed cleanly).
- #50 (CodeQL fix): superseded by #74 after the stacking mistake. #74 landed and Analyze (c-cpp) now passes on master.
- #52 (global_scan): superseded by #73 after the same stacking mistake. #73 landed.
- #75 (Apologies): landed alongside, recording the chain.
Removes the "in flight" footnote and replaces with a paragraph explaining the chain of events so readers understand why #50 / #52 are absent from the merged ledger and #73 / #74 are present covering the same scope.
Aaron's per-PR changelog feedback (feedback_changelog_per_pr.md) applied: this PR ships its own changelog touch alongside the actual state change, not as a catch-up.
Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>