Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/decompiler/RAII_MIGRATION.md
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,7 @@ can land at C++11 and C++20 later.
|---|---|---|
| #31-1 | This plan | shipped |
| #31-2 | Stage 1 — `address.cc`, `space.cc`, `rangeutil.cc` (audit's "range.cc" target) | shipped: the three files were already raw-`new`-free in tree; the `gradle cppRaiiAudit` per-file gate (see [`gradle/cppRaiiAudit.gradle`](../../gradle/cppRaiiAudit.gradle)) was added to fail CI on any regression. Wired into `.github/workflows/build-ghidra.yml` alongside `ignoreAudit` and `objectInputStreamAudit`. |
| #31-3 | Stage 2 — `marshal.cc`, `xml.cc` | open |
| #31-3 | Stage 2 — `marshal.cc`, `xml.cc` | partial: marshal shipped ([PR #46](https://github.com/CryptoJones/GayHydra/pull/46)), xml.y/xml.cc in flight ([PR #51](https://github.com/CryptoJones/GayHydra/pull/51) + [PR #52](https://github.com/CryptoJones/GayHydra/pull/52)); semantic-action sites scoped in [RAII_STAGE_2C_XML.md](RAII_STAGE_2C_XML.md) |
| #31-4 | Stage 3 — `database.cc`, `comment.cc`, `cover.cc` | open |
| #31-5 | Stage 4 — `type.cc`, `userop.cc` | open |
| #31-6 | Stage 5 — pcode core | open |
Expand Down
121 changes: 121 additions & 0 deletions docs/decompiler/RAII_STAGE_2C_XML.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
# Rec 31 Stage 2C — `xml.y` / `xml.cc` semantic-action RAII migration

*Status: design / scoping. No code change in this PR; describes the work that Stage 2C entails so a future PR can be scoped accurately.*

## What's left after Stage 2B

PRs [#51](https://github.com/CryptoJones/GayHydra/pull/51) (`XmlScan::lvalue`) and [#52](https://github.com/CryptoJones/GayHydra/pull/52) (`xml_parse::global_scan` lifetime) converted the *epilogue*-section raw `new`s in `xml.y` / `xml.cc` — the hand-written C++ that bison passes through verbatim. The migration was minimal-disruption because the edits were entirely outside the bison-generated table-driven `yyparse` body.

The remaining raw `new` sites live **inside bison semantic actions**:

xml.y:150 $$ = new string; // attsinglemid
xml.y:153 $$ = new string; // attdoublemid
xml.y:198 $$ = new Attributes($2); // stagstart
xml.y:200 $$ = new NameValue; // SAttribute
xml.y:208 string *tmp = new string(); ...
(Reference; tmp owned locally, deleted before return)
xml.y:538 Element *newel = new Element(cur); // TreeHandler::startElement
xml.y:624 Document *doc = new Document(); // xml_tree()

These get inlined into `yyparse`'s big switch statement when bison compiles the grammar. The corresponding parallel sites in `xml.cc` are:

xml.cc:1598 (yyval.str) = new string;
xml.cc:1616 (yyval.str) = new string;
xml.cc:1736 (yyval.attr) = new Attributes((yyvsp[0].str));
xml.cc:1748 (yyval.pair) = new NameValue; ...
xml.cc:1790 string *tmp = new string(); ...

## Why this isn't a hand-edit

Stages 2A (marshal) and 2B (xml.y epilogue) succeeded with the *parallel-hand-edit* technique: change `xml.y` and `xml.cc` consistently in the regions bison copies verbatim, no bison regeneration required.

The semantic-action sites can't be hand-edited safely because:

1. `xml.cc:1598` etc. live inside the bison-generated `yyparse()` body — a state machine driven by the `yytable` / `yydefact` / etc. arrays that bison computes from the grammar. Hand-edits there are robust as long as the surrounding code shape doesn't change.

2. **But the semantic actions cross the bison `%union` boundary.** The `%union` declaration in `xml.y` is:

%union {
int4 i;
string *str;
Attributes *attr;
NameValue *pair;
}

It's a C-style union of raw pointers. The RAII candidate replacements (`unique_ptr<string>`, `unique_ptr<Attributes>`, etc.) **cannot live in a C `union`** — unions require trivially-constructible / trivially-destructible members, which `unique_ptr` is not.

3. The grammar also declares destructors:

%destructor { delete $$; } <*>

bison generates calls to this whenever the parser discards a semantic value on its stack (e.g. during error recovery). If we change the value type to a smart pointer, the destructor body changes shape too.

So Stage 2C is **not** "change `new` to `make_unique` in five places and call it done." The grammar's value-type model itself has to change.

## Approach options

### Option A — switch to `%define api.value.type variant` (bison's C++ variants)

Bison 3.0 introduced an opt-in C++ tagged-variant value type:

%language "c++"
%require "3.0"
%define api.value.type variant
%token <unique_ptr<string>> CHARDATA CDATA ATTVALUE ...
%type <unique_ptr<string>> AttValue attsinglemid attdoublemid ...
%type <unique_ptr<Attributes>> EmptyElemTag STag stagstart
%type <unique_ptr<NameValue>> SAttribute

This *does* support non-trivially-destructible value types. The generated parser:

- becomes a C++ class (`xml::parser` instead of a free `yyparse` function);
- uses move semantics on the value stack;
- has the destructors automatically called when a value is popped — no `%destructor { delete $$; }` needed.

**Pros:** clean RAII end-to-end. Eliminates manual `delete` in actions (`delete $2; *$$ += *$2;` becomes `*$$ += std::move(*$2);` etc.). The bison maintainers' recommended path for new C++ grammars.

**Cons:** large API break in the generated code. `xml_parse()`'s body has to be rewritten: `int res = yyparse();` becomes something like `xml::parser p; int res = p.parse();`. `yylex` signature changes — instead of writing to global `yylval`, the lexer fills a parser-supplied `xml::parser::symbol_type`. `yyerror` becomes a member. The diff against the current `xml.cc` would be wholesale rewrite (~2700 of the ~2700 lines).

The wholesale rewrite is the right move long-term but requires:
- Pin the bison version (3.0.5 is locally buildable; 3.0.4 isn't on modern glibc). Document it in the Makefile + CONTRIBUTING.
- Write a from-scratch xml_parse shim that bridges the new C++ parser API to the existing `xml_parse(istream &, ContentHandler *, int4)` interface that decompiler callers depend on.
- Verify the bison-regenerated `xml.cc` produces the same parse trees on the existing XML corpus.

### Option B — keep `%union` of raw pointers; manage ownership tighter at the boundary

Leave the `%union` as-is (raw pointers everywhere on the parser value stack), but:

- Audit every semantic action that calls `new X(...)` and ensure the corresponding "ownership transfer in" / "consumer deletes" / "drop on error" rules are explicit.
- Convert the `string *tmp = new string(); ... delete tmp;` pattern in line 208 to a stack-local `string tmp;` (no `new` needed, no `delete` needed — it's used once within the action then goes out of scope).
- Accept that the `%destructor { delete $$; }` is the project's RAII story for the parser value stack.
- Add `xml.y` / `xml.cc` to `cppRaiiAudit`'s `PROTECTED_FILES` once the obvious sites (208's `tmp`) are cleaned, with the understanding that the `%union` raw-pointer pattern is grandfathered as bison's contract, not user-level raw ownership.

**Pros:** small, mechanical, no bison-version dance, no API break. Lines 208 (the `tmp`) is the only one that's an unambiguous code-smell; the rest are bison value-stack convention.

**Cons:** doesn't actually achieve "no raw `new` in `xml.cc`" — it just decides that some raw-`new`s are OK because bison wrote them. The audit gate has to carry an explicit exception for the `%union`-mediated sites. Slightly weakens the Rec 31 invariant.

## Recommendation

**Option B for the next PR; Option A as a separate strategic project.**

The `string *tmp = new string(); ... delete tmp;` at line 208 is a clear win — it's a stack-local that the author wrote as heap-allocated for no apparent reason. Convert it. The other five semantic-action `new`s are bison's value-stack contract; leaving them as raw pointers and adding `xml.y` / `xml.cc` to `cppRaiiAudit`'s `PROTECTED_FILES` with a documented exception is honest and shipping-ready.

Option A is the right *eventual* destination but its scope is "rewrite the xml parser front-end." That needs its own sprint, its own design review with the decompiler maintainer, its own corpus-equivalence verification. Not something to slip into a multi-PR thread.

## Concrete next actions (in order)

1. **PR — Stage 2C-min (Option B):** convert `xml.y:208`'s `string *tmp = new string(); ... delete tmp;` to a stack-local; mirror in `xml.cc:1790`. Add `xml.y` / `xml.cc` to `cppRaiiAudit`'s `PROTECTED_FILES` with an exception comment for the bison `%union` semantic-action sites. After this, `cppRaiiAudit` would gate any *new* raw `new` introduced in those files but accepts the five existing bison-mediated sites.

2. **PR — Element parse-tree ownership:** the `xml.y:538` `new Element(cur)` is parse-tree owned by the parent's `children` vector. Convert `Element::children` to `vector<unique_ptr<Element>>` and update the few callers. This is independent of the `%union` question.

3. **PR — Document return:** `xml.y:624` `new Document()` flows out through `xml_tree()` to `XmlDecode::ingestStream` where `XmlDecode::~XmlDecode` does `delete document;`. Convert `xml_tree` to return `unique_ptr<Document>` and `XmlDecode::document` to `unique_ptr<Document>`. Drop the manual destructor.

4. **Strategic sprint — Option A:** switch `xml.y` to `%define api.value.type variant`, regenerate `xml.cc` with bison-3.0.5, write the C++-parser shim. Coordinate with decompiler maintainer.

## Bison version note

Local builds of bison-3.0.4 fail on modern glibc due to a gnulib `fseterr.c` portability bug. **Bison-3.0.5** builds cleanly (verified in this session at `/tmp/bison30/bin/bison`) and the diff against `xml.cc` (which was generated by 3.0.4) is ~615 lines, almost all cosmetic (`#line` directives, version-string bumps, a `default:` case added in 3.0.5). If a regeneration ever lands, pin **bison-3.0.5** in `decompile/cpp/Makefile` and a `CONTRIBUTING` note.

---

*Proudly Made in Nebraska. Go Big Red! 🌽 <https://xkcd.com/2347/>*
Loading