Skip to content

feat(decompiler): Rec 31 #31-3 RAII Stage 2 — marshal.cc ByteChunk owns its buffer#46

Merged
CryptoJones merged 1 commit into
masterfrom
feat/rec-31-raii-stage2-marshal-bytechunk
May 26, 2026
Merged

feat(decompiler): Rec 31 #31-3 RAII Stage 2 — marshal.cc ByteChunk owns its buffer#46
CryptoJones merged 1 commit into
masterfrom
feat/rec-31-raii-stage2-marshal-bytechunk

Conversation

@CryptoJones
Copy link
Copy Markdown
Owner

Converts PackedDecode's inStream chunks from raw-pointer-owning to unique_ptr-owning. The ownership previously lived in the destructor's manual delete[] loop — exactly the leak surface (any exception or early return that bypasses the destructor would leak). After Stage 2 the ownership is local to the chunk itself.

Before:

class ByteChunk {                    // Just two raw pointers
  uint1 *start;
  uint1 *end;
};

// allocateNextInputBuffer():
uint1 *buf = new uint1[BUFFER_SIZE + pad];
inStream.emplace_back(buf, buf+BUFFER_SIZE);

// endIngest() tail-padding:
uint1 *endbuf = new uint1[1];
inStream.emplace_back(endbuf, endbuf+1);

// ~PackedDecode():
for (auto iter = inStream.begin(); iter != inStream.end(); ++iter)
  delete[] (*iter).start;            // Manual cleanup

After:

class ByteChunk {
  unique_ptr<uint1[]> storage;       // Owning storage
  uint1 *start;                      // [start, end) window into storage
  uint1 *end;
};

// allocateNextInputBuffer():
auto owned = make_unique<uint1[]>(BUFFER_SIZE + pad);
uint1 *buf = owned.get();
inStream.emplace_back(move(owned), buf, buf+BUFFER_SIZE);

// ~PackedDecode():
// empty — unique_ptr cleans up per chunk

ByteChunk becomes move-only (unique_ptr has no copy ctor). list::emplace_back constructs in-place, so no copy is required for the inStream usage pattern.

Behavior preserved:

  • start / end pointers still expose the same [start, end) byte window; consumers (getNextByte, advancePosition, the inStream iterator walk in Position) are read-only and unchanged.
  • The endIngest tail-pad case still allocates a 1-byte chunk and writes ELEMENT_END at offset 0.
  • Lifetime of the buffer still matches the chunk's lifetime, which still matches the PackedDecode's lifetime via inStream.

Audit gate: marshal.cc and marshal.hh added to cppRaiiAudit.gradle's PROTECTED_FILES. Stage 1 (#31-2) already gated address.cc, space.cc, rangeutil.cc; Stage 2's marshal pair joins them. xml.cc / xml.y (the yacc-generated parser's bison-source-level edits) come in a follow-up PR — those raw news live in regenerated parser actions, not in hand-written code, and need a separate bison-source-level approach.

Note: make_unique<uint1[]>(N) value-initializes (zeroes) the buffer, whereas new uint1[N] did not. The buffer is overwritten by s.get((char *)buf, BUFFER_SIZE+1, '\0') before any read; the zero-init cost is one memset per chunk (BUFFER_SIZE bytes typically), negligible relative to the I/O it precedes.

Closes #31-3 (marshal half) of Rec 31. xml.y / xml.cc remain open.

Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/

…ns its buffer

Converts PackedDecode's `inStream` chunks from raw-pointer-owning to
unique_ptr-owning. Pre-Stage-2 design pattern:

  class ByteChunk {                    // Just two raw pointers
    uint1 *start;
    uint1 *end;
  };
  list<ByteChunk> inStream;

  // allocateNextInputBuffer():
  uint1 *buf = new uint1[BUFFER_SIZE + pad];
  inStream.emplace_back(buf, buf+BUFFER_SIZE);

  // endIngest() tail-padding:
  uint1 *endbuf = new uint1[1];
  inStream.emplace_back(endbuf, endbuf+1);

  // ~PackedDecode():
  for (auto iter = inStream.begin(); iter != inStream.end(); ++iter)
    delete[] (*iter).start;            // Manual cleanup of owning raw pointers

The ownership lives in the destructor's `delete[]` loop, which is
exactly the leak surface (any exception or early return that bypasses
the destructor would leak; any chunk inserted but not destroyed via
~PackedDecode leaks). After Stage 2 the ownership is local to the
chunk itself:

  class ByteChunk {
    unique_ptr<uint1[]> storage;       // Owning storage
    uint1 *start;                      // [start, end) window into storage
    uint1 *end;
  };

  // allocateNextInputBuffer():
  auto owned = make_unique<uint1[]>(BUFFER_SIZE + pad);
  uint1 *buf = owned.get();
  inStream.emplace_back(move(owned), buf, buf+BUFFER_SIZE);

  // endIngest() tail-padding: same shape.

  // ~PackedDecode():
  // empty — unique_ptr cleans up per chunk

ByteChunk becomes move-only (unique_ptr has no copy ctor). `list::
emplace_back` constructs in-place, so no copy is ever required for
the inStream usage pattern.

Behavior preserved:
  - start / end pointers still expose the same [start, end) byte
    window; consumers (getNextByte, advancePosition, the inStream
    iterator walk in Position) are read-only and unchanged.
  - The endIngest tail-pad case still allocates a 1-byte chunk and
    writes ELEMENT_END at offset 0.
  - Lifetime of the buffer still matches the chunk's lifetime,
    which still matches the PackedDecode's lifetime via inStream.

Audit gate: marshal.cc and marshal.hh added to cppRaiiAudit.gradle's
PROTECTED_FILES. Stage 1 (#31-2) already gated address.cc, space.cc,
rangeutil.cc; Stage 2's marshal pair joins them. xml.cc / xml.y (the
yacc-generated parser's bison-source-level edits) come in a follow-up
PR — those raw `new`s live in regenerated parser actions, not in
hand-written code, and need a separate bison-source-level approach.

Note: make_unique<uint1[]>(N) value-initializes (zeroes) the buffer,
whereas `new uint1[N]` did not. The buffer is overwritten by
`s.get((char *)buf, BUFFER_SIZE+1, '\0')` before any read; the zero-
init cost is one memset per chunk (BUFFER_SIZE bytes typically), which
is negligible relative to the I/O it precedes.

Closes #31-3 (marshal half) of Rec 31. xml.y / xml.cc remain open;
RAII_MIGRATION.md sequencing row to be updated as that lands.

Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CryptoJones CryptoJones merged commit 4093d22 into master May 26, 2026
15 of 16 checks passed
@CryptoJones CryptoJones deleted the feat/rec-31-raii-stage2-marshal-bytechunk branch May 26, 2026 06:17
CryptoJones added a commit that referenced this pull request May 26, 2026
…ed + std::span deviation (#54)

The Sprint 6 row for Rec 31 #31-3 (RAII Stage 2) + Rec 32 #32-4
(std::span adoption) was a single open checkbox that no longer
matched reality:

  - marshal.cc RAII landed via PR #46 (Stage 2A).
  - marshal std::span did NOT land — deviation from the documented
    "same files, same PR" plan in docs/decompiler/CPP20_ADOPTION.md.
    marshal's public API is [start, end) pointer-pair ranges, not the
    (T*, size_t) shape that std::span naturally replaces; whether
    there's a natural std::span site here at all is now an explicit
    open question rather than implicit slippage.
  - xml.y / xml.cc RAII is in flight as a multi-PR thread (PRs #51
    + #52 stacked); bigger pieces still pending.
  - xml std::span is open and best audited alongside the bison
    %union redesign needed for the semantic-action sites.

Replaced the single open checkbox with four bullets recording each
piece's current status. Surfaced during the 2026-05-26 self-audit.

Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CryptoJones added a commit that referenced this pull request May 26, 2026
…gration (#71)

Closes audit finding #8 from the 2026-05-26 self-audit by replacing
the "Stage 2C in flight" hand-wave with an honest scoping doc. The
original assumption was "regenerate xml.cc with bison-3.0.4 to apply
RAII to the semantic actions." Investigation showed:

  - bison-3.0.4 doesn't build on modern glibc (gnulib fseterr.c
    portability bug). bison-3.0.5 builds cleanly and produces an
    ~615-line mostly-cosmetic diff against the in-tree xml.cc.
  - The real blocker is not the bison version. The %union holds raw
    pointers (string*, Attributes*, NameValue*) and a C-style union
    can't hold non-trivially-destructible types like unique_ptr.
  - Migrating to RAII therefore requires switching xml.y to bison's
    C++ variant mode (`%define api.value.type variant`), which makes
    yyparse a method of an xml::parser class, changes the yylex
    contract, and wholesale-rewrites the generated xml.cc.

That's a strategic sprint, not a multi-PR thread.

This doc lays out:
  - what's left after Stage 2B (the seven specific raw-new sites
    and their exact line numbers in xml.y and xml.cc);
  - why hand-edit-parallel (the 2B technique) doesn't extend to
    these sites (they cross the %union boundary);
  - two architectural options:
      A. switch to `%define api.value.type variant` — clean RAII,
         wholesale rewrite of xml.cc, full xml_parse shim needed;
      B. keep %union of raw pointers, treat the five
         bison-value-stack sites as a documented exception in
         cppRaiiAudit, and clean up only the one obvious code-smell
         (a heap-allocated stack-scoped temporary in xml.y:208);
  - recommendation: B for the next PR (small, shipping-ready),
    A for a future strategic sprint;
  - a four-step plan ordering Stage 2C-min, the Element parse-tree
    ownership migration, the Document return-value migration, and
    the eventual Option A sprint.

Updates docs/decompiler/RAII_MIGRATION.md's #31-3 row to point at
the new design doc and to reflect what's shipped (#46 marshal) vs.
in-flight (#51, #52 xml epilogue) vs. scoped (the new doc).

No code change in this PR — design / scoping only.

Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CryptoJones added a commit that referenced this pull request May 26, 2026
…_ptr migration (#51)

* ci(codeql): manual c-cpp build replaces autobuild

CodeQL's autobuilder has been failing on every PR (including doc-only
ones — see #44, #47, #48) with:

  cpp/autobuilder: Incompatible operating system (expected Windows).
  cpp/autobuilder: No supported build system detected.

The autobuilder scans the repo root for a recognized build file
(Makefile, CMakeLists.txt, etc.); the decompiler's Makefile lives at
Ghidra/Features/Decompiler/src/decompile/cpp/, so the scan fails and
the c-cpp matrix leg goes red even when no C++ source is touched.
The actual Java/Kotlin, Actions, and Python legs have always been
fine; only c-cpp was affected.

Fix: replace the `github/codeql-action/autobuild@v3` step with an
explicit `make libdecomp_dbg.a` invocation that cd's into the
decompiler tree. libdecomp_dbg.a is the static-archive target that
compiles every LIBDECOMP_NAMES source into com_dbg/*.o then `ar qc`s
them; it does NOT need BFD at link time (no -lbfd dependency), so
binutils-dev / libiberty-dev drop out of the apt-get list too.
CodeQL's tracer picks up the .o compile commands during the make
invocation, which is exactly the input static analysis needs.

The autobuild step stays for the `actions` and `python` matrix legs
(it works fine for both — actions is just YAML, python doesn't need
a build at all).

Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(decompiler): Rec 31 #31-3 RAII Stage 2B — XmlScan::lvalue unique_ptr migration

Converts XmlScan's per-token string buffer from a raw owning pointer
to unique_ptr<string>. This eliminates 7 raw `new string()` allocation
sites (one per scan* mode helper) and the manual `delete` in
~XmlScan / clearlvalue, replacing them with make_unique + the unique_ptr
destructor's automatic cleanup. The single ownership-transfer point
(lval(), called by bison's yylex to hand the string up to the parser
value stack) uses unique_ptr::release() so the bison side's existing
raw-pointer ownership convention is unchanged.

Before:
  class XmlScan {
    string *lvalue;                // raw owning pointer
    ...
    string *lval(void) {
      string *ret = lvalue;
      lvalue = (string *)0;        // manual null-out
      return ret;
    }
    void clearlvalue(void) {
      if (lvalue != (string *)0)   // manual null-check
        delete lvalue;             // manual delete
    }
  };
  // Allocate (×7 across scan* helpers):
  lvalue = new string();

After:
  class XmlScan {
    unique_ptr<string> lvalue;     // owning RAII handle
    ...
    string *lval(void) {
      return lvalue.release();     // ownership transfer; lvalue → nullptr
    }
    void clearlvalue(void) {
      lvalue.reset();              // null-safe, automatic delete
    }
  };
  // Allocate (×7):
  lvalue = make_unique<string>();

Bison sync: xml.y is the grammar source; xml.cc is the generated
parser. The edits are entirely in the epilogue / prologue %{...%}
regions that bison copies verbatim, so both files get the identical
hand-edit and stay in lockstep without invoking bison. The repo's
xml.cc was generated by bison 3.0.4; the local box has 3.8.2 and a
clean regeneration would produce huge unrelated diff — keeping the
parallel hand-edit avoids that. xml.hh picks up <memory> and the
make_unique / move / unique_ptr usings alongside the existing std::
aliases.

Scope intentionally limited:
  - Semantic-action raw `new`s (xml.y:150, 153, 198, 200, 208 and the
    parallel sites in xml.cc's yyparse) are NOT touched in this PR.
    They live inside bison's table-driven yyparse and a clean
    migration requires actual bison-3.0.4 regeneration; that's a
    separate follow-up PR (Stage 2C).
  - The big-object epilogue raw new's (line 525 `new XmlScan`,
    line 538 `new Element`, line 624 `new Document`) require API
    redesign across xml.hh + callers and are separate PRs of their
    own.
  - cppRaiiAudit's PROTECTED_FILES does NOT yet include xml.cc /
    xml.hh / xml.y — adding them now would fail the gate on the
    still-present semantic-action raw new's. Add after Stage 2C
    lands.

Tests: this is functionally a no-op refactor. The behavior change
is "manual delete replaced by unique_ptr destructor"; both forms
free the string on exactly the same code paths. Marshal RAII Stage
2A (PR #46) had the same shape and passed C++ unit tests + ASan +
UBSan.

Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant