feat(decompiler): Rec 31 #31-3 RAII Stage 2 — marshal.cc ByteChunk owns its buffer#46
Merged
Merged
Conversation
…ns its buffer
Converts PackedDecode's `inStream` chunks from raw-pointer-owning to
unique_ptr-owning. Pre-Stage-2 design pattern:
class ByteChunk { // Just two raw pointers
uint1 *start;
uint1 *end;
};
list<ByteChunk> inStream;
// allocateNextInputBuffer():
uint1 *buf = new uint1[BUFFER_SIZE + pad];
inStream.emplace_back(buf, buf+BUFFER_SIZE);
// endIngest() tail-padding:
uint1 *endbuf = new uint1[1];
inStream.emplace_back(endbuf, endbuf+1);
// ~PackedDecode():
for (auto iter = inStream.begin(); iter != inStream.end(); ++iter)
delete[] (*iter).start; // Manual cleanup of owning raw pointers
The ownership lives in the destructor's `delete[]` loop, which is
exactly the leak surface (any exception or early return that bypasses
the destructor would leak; any chunk inserted but not destroyed via
~PackedDecode leaks). After Stage 2 the ownership is local to the
chunk itself:
class ByteChunk {
unique_ptr<uint1[]> storage; // Owning storage
uint1 *start; // [start, end) window into storage
uint1 *end;
};
// allocateNextInputBuffer():
auto owned = make_unique<uint1[]>(BUFFER_SIZE + pad);
uint1 *buf = owned.get();
inStream.emplace_back(move(owned), buf, buf+BUFFER_SIZE);
// endIngest() tail-padding: same shape.
// ~PackedDecode():
// empty — unique_ptr cleans up per chunk
ByteChunk becomes move-only (unique_ptr has no copy ctor). `list::
emplace_back` constructs in-place, so no copy is ever required for
the inStream usage pattern.
Behavior preserved:
- start / end pointers still expose the same [start, end) byte
window; consumers (getNextByte, advancePosition, the inStream
iterator walk in Position) are read-only and unchanged.
- The endIngest tail-pad case still allocates a 1-byte chunk and
writes ELEMENT_END at offset 0.
- Lifetime of the buffer still matches the chunk's lifetime,
which still matches the PackedDecode's lifetime via inStream.
Audit gate: marshal.cc and marshal.hh added to cppRaiiAudit.gradle's
PROTECTED_FILES. Stage 1 (#31-2) already gated address.cc, space.cc,
rangeutil.cc; Stage 2's marshal pair joins them. xml.cc / xml.y (the
yacc-generated parser's bison-source-level edits) come in a follow-up
PR — those raw `new`s live in regenerated parser actions, not in
hand-written code, and need a separate bison-source-level approach.
Note: make_unique<uint1[]>(N) value-initializes (zeroes) the buffer,
whereas `new uint1[N]` did not. The buffer is overwritten by
`s.get((char *)buf, BUFFER_SIZE+1, '\0')` before any read; the zero-
init cost is one memset per chunk (BUFFER_SIZE bytes typically), which
is negligible relative to the I/O it precedes.
Closes #31-3 (marshal half) of Rec 31. xml.y / xml.cc remain open;
RAII_MIGRATION.md sequencing row to be updated as that lands.
Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 26, 2026
CryptoJones
added a commit
that referenced
this pull request
May 26, 2026
…ed + std::span deviation (#54) The Sprint 6 row for Rec 31 #31-3 (RAII Stage 2) + Rec 32 #32-4 (std::span adoption) was a single open checkbox that no longer matched reality: - marshal.cc RAII landed via PR #46 (Stage 2A). - marshal std::span did NOT land — deviation from the documented "same files, same PR" plan in docs/decompiler/CPP20_ADOPTION.md. marshal's public API is [start, end) pointer-pair ranges, not the (T*, size_t) shape that std::span naturally replaces; whether there's a natural std::span site here at all is now an explicit open question rather than implicit slippage. - xml.y / xml.cc RAII is in flight as a multi-PR thread (PRs #51 + #52 stacked); bigger pieces still pending. - xml std::span is open and best audited alongside the bison %union redesign needed for the semantic-action sites. Replaced the single open checkbox with four bullets recording each piece's current status. Surfaced during the 2026-05-26 self-audit. Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/ Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CryptoJones
added a commit
that referenced
this pull request
May 26, 2026
…gration (#71) Closes audit finding #8 from the 2026-05-26 self-audit by replacing the "Stage 2C in flight" hand-wave with an honest scoping doc. The original assumption was "regenerate xml.cc with bison-3.0.4 to apply RAII to the semantic actions." Investigation showed: - bison-3.0.4 doesn't build on modern glibc (gnulib fseterr.c portability bug). bison-3.0.5 builds cleanly and produces an ~615-line mostly-cosmetic diff against the in-tree xml.cc. - The real blocker is not the bison version. The %union holds raw pointers (string*, Attributes*, NameValue*) and a C-style union can't hold non-trivially-destructible types like unique_ptr. - Migrating to RAII therefore requires switching xml.y to bison's C++ variant mode (`%define api.value.type variant`), which makes yyparse a method of an xml::parser class, changes the yylex contract, and wholesale-rewrites the generated xml.cc. That's a strategic sprint, not a multi-PR thread. This doc lays out: - what's left after Stage 2B (the seven specific raw-new sites and their exact line numbers in xml.y and xml.cc); - why hand-edit-parallel (the 2B technique) doesn't extend to these sites (they cross the %union boundary); - two architectural options: A. switch to `%define api.value.type variant` — clean RAII, wholesale rewrite of xml.cc, full xml_parse shim needed; B. keep %union of raw pointers, treat the five bison-value-stack sites as a documented exception in cppRaiiAudit, and clean up only the one obvious code-smell (a heap-allocated stack-scoped temporary in xml.y:208); - recommendation: B for the next PR (small, shipping-ready), A for a future strategic sprint; - a four-step plan ordering Stage 2C-min, the Element parse-tree ownership migration, the Document return-value migration, and the eventual Option A sprint. Updates docs/decompiler/RAII_MIGRATION.md's #31-3 row to point at the new design doc and to reflect what's shipped (#46 marshal) vs. in-flight (#51, #52 xml epilogue) vs. scoped (the new doc). No code change in this PR — design / scoping only. Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/ Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CryptoJones
added a commit
that referenced
this pull request
May 26, 2026
…_ptr migration (#51) * ci(codeql): manual c-cpp build replaces autobuild CodeQL's autobuilder has been failing on every PR (including doc-only ones — see #44, #47, #48) with: cpp/autobuilder: Incompatible operating system (expected Windows). cpp/autobuilder: No supported build system detected. The autobuilder scans the repo root for a recognized build file (Makefile, CMakeLists.txt, etc.); the decompiler's Makefile lives at Ghidra/Features/Decompiler/src/decompile/cpp/, so the scan fails and the c-cpp matrix leg goes red even when no C++ source is touched. The actual Java/Kotlin, Actions, and Python legs have always been fine; only c-cpp was affected. Fix: replace the `github/codeql-action/autobuild@v3` step with an explicit `make libdecomp_dbg.a` invocation that cd's into the decompiler tree. libdecomp_dbg.a is the static-archive target that compiles every LIBDECOMP_NAMES source into com_dbg/*.o then `ar qc`s them; it does NOT need BFD at link time (no -lbfd dependency), so binutils-dev / libiberty-dev drop out of the apt-get list too. CodeQL's tracer picks up the .o compile commands during the make invocation, which is exactly the input static analysis needs. The autobuild step stays for the `actions` and `python` matrix legs (it works fine for both — actions is just YAML, python doesn't need a build at all). Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(decompiler): Rec 31 #31-3 RAII Stage 2B — XmlScan::lvalue unique_ptr migration Converts XmlScan's per-token string buffer from a raw owning pointer to unique_ptr<string>. This eliminates 7 raw `new string()` allocation sites (one per scan* mode helper) and the manual `delete` in ~XmlScan / clearlvalue, replacing them with make_unique + the unique_ptr destructor's automatic cleanup. The single ownership-transfer point (lval(), called by bison's yylex to hand the string up to the parser value stack) uses unique_ptr::release() so the bison side's existing raw-pointer ownership convention is unchanged. Before: class XmlScan { string *lvalue; // raw owning pointer ... string *lval(void) { string *ret = lvalue; lvalue = (string *)0; // manual null-out return ret; } void clearlvalue(void) { if (lvalue != (string *)0) // manual null-check delete lvalue; // manual delete } }; // Allocate (×7 across scan* helpers): lvalue = new string(); After: class XmlScan { unique_ptr<string> lvalue; // owning RAII handle ... string *lval(void) { return lvalue.release(); // ownership transfer; lvalue → nullptr } void clearlvalue(void) { lvalue.reset(); // null-safe, automatic delete } }; // Allocate (×7): lvalue = make_unique<string>(); Bison sync: xml.y is the grammar source; xml.cc is the generated parser. The edits are entirely in the epilogue / prologue %{...%} regions that bison copies verbatim, so both files get the identical hand-edit and stay in lockstep without invoking bison. The repo's xml.cc was generated by bison 3.0.4; the local box has 3.8.2 and a clean regeneration would produce huge unrelated diff — keeping the parallel hand-edit avoids that. xml.hh picks up <memory> and the make_unique / move / unique_ptr usings alongside the existing std:: aliases. Scope intentionally limited: - Semantic-action raw `new`s (xml.y:150, 153, 198, 200, 208 and the parallel sites in xml.cc's yyparse) are NOT touched in this PR. They live inside bison's table-driven yyparse and a clean migration requires actual bison-3.0.4 regeneration; that's a separate follow-up PR (Stage 2C). - The big-object epilogue raw new's (line 525 `new XmlScan`, line 538 `new Element`, line 624 `new Document`) require API redesign across xml.hh + callers and are separate PRs of their own. - cppRaiiAudit's PROTECTED_FILES does NOT yet include xml.cc / xml.hh / xml.y — adding them now would fail the gate on the still-present semantic-action raw new's. Add after Stage 2C lands. Tests: this is functionally a no-op refactor. The behavior change is "manual delete replaced by unique_ptr destructor"; both forms free the string on exactly the same code paths. Marshal RAII Stage 2A (PR #46) had the same shape and passed C++ unit tests + ASan + UBSan. Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Converts PackedDecode's
inStreamchunks from raw-pointer-owning tounique_ptr-owning. The ownership previously lived in the destructor's manualdelete[]loop — exactly the leak surface (any exception or early return that bypasses the destructor would leak). After Stage 2 the ownership is local to the chunk itself.Before:
After:
ByteChunkbecomes move-only (unique_ptr has no copy ctor).list::emplace_backconstructs in-place, so no copy is required for the inStream usage pattern.Behavior preserved:
start/endpointers still expose the same [start, end) byte window; consumers (getNextByte,advancePosition, theinStreamiterator walk inPosition) are read-only and unchanged.endIngesttail-pad case still allocates a 1-byte chunk and writesELEMENT_ENDat offset 0.PackedDecode's lifetime viainStream.Audit gate:
marshal.ccandmarshal.hhadded tocppRaiiAudit.gradle'sPROTECTED_FILES. Stage 1 (#31-2) already gatedaddress.cc,space.cc,rangeutil.cc; Stage 2's marshal pair joins them.xml.cc/xml.y(the yacc-generated parser's bison-source-level edits) come in a follow-up PR — those rawnews live in regenerated parser actions, not in hand-written code, and need a separate bison-source-level approach.Note:
make_unique<uint1[]>(N)value-initializes (zeroes) the buffer, whereasnew uint1[N]did not. The buffer is overwritten bys.get((char *)buf, BUFFER_SIZE+1, '\0')before any read; the zero-init cost is one memset per chunk (BUFFER_SIZE bytes typically), negligible relative to the I/O it precedes.Closes #31-3 (marshal half) of Rec 31.
xml.y/xml.ccremain open.Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/