diff --git a/CLAUDE.md b/CLAUDE.md index 2488bd760..2c6e18606 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -91,7 +91,9 @@ Multiple skill files may apply to a single task. For example, creating a new das When you discover something new about daslang syntax, semantics, or conventions — whether through compiler errors, user corrections, or experimentation — **update this file** with the new knowledge. If it relates to a specific skill area, update the relevant `skills/*.md` file instead. -**Doc improvements at stopping points.** When a task wraps and you spot a typo or factual error in CLAUDE.md or `skills/*.md` — fix it in-place and flag the edit in the end-of-turn summary. Anything more — clarifications, additions, restructuring, removing existing guidance, **or proposing a new skill file when you see a recurring pattern that no existing skill covers** — propose first. Default toward propose-first; doc edits direct future Claude behavior and silent diffs are not OK. +**Syntax and factual corrections are fix-in-place, always.** If a compiler error, probe, or user correction shows that a claim in CLAUDE.md or `skills/*.md` is wrong, incomplete, or stale, fix it in the same session and flag the edit in the end-of-turn summary — never defer it to a proposal. Verify the corrected claim before writing it (grammar truth is `src/parser/ds2_parser.ypp`; behavior truth is a probe-compile with the current binary). + +**Doc improvements at stopping points.** Propose-first applies only to what's left: restructuring, removing existing guidance, **or proposing a new skill file when you see a recurring pattern that no existing skill covers**. Doc edits direct future Claude behavior, so structural diffs still get review — but factual drift must be self-healing, not queued behind it. ## daslang Language — Gen2 Syntax (REQUIRED) @@ -106,8 +108,9 @@ All code MUST use gen2 syntax (add `options gen2` at the top of every file). 
Key - **Table literals:** `{ "k" => v, "k2" => v2 }` — NOT `{{ "k" => v; "k2" => v2 }}` - **Bare blocks:** `{ var x = 1; ... }` at statement level creates a lexical scope (NOT a table literal). Supports `finally`: `{ ... } finally { ... }` - **Named arguments:** `foo([name = value])` with square brackets -- **Block arguments:** block/lambda after `func()` pipes as last arg. No `$` for parameterless blocks: `defer() { ... }`. With params: `build_string() $(var writer) { ... }`. Lambdas: `emplace() @(x : int) { ... }` +- **Block arguments:** block/lambda after `func()` pipes as last arg. No `$` for parameterless blocks: `defer() { ... }`. With params: `build_string() $(var writer) { ... }`. Lambdas: `emplace() @(x : int) { ... }`. **Arrow shorthand for single-expression blocks:** `arr |> sort() $(a, b) => a < b`. Defaulted parameters sitting between the explicit args and a trailing block are padded automatically — don't spell them out - **Lambda:** `@(args) { body }` or `@@(args) { body }` (no-capture). **Inline arrow form:** `@(x) => expr` (capture lambda) and `@@(x) => expr` (no-capture function pointer) — preferred for short transforms passed as arguments: `sometimes(pat, @@(x) => fast(x, 2.0lf))` +- **Function/method arrow body:** `def add(a, b : int) : int => a + b` — single-expression body, return type optional (`def add(a, b : int) => a + b` infers). Works on class methods too: `def get() : int => count + 2` - **Generator:** `$() { yield value; }` or `$ { yield value; }` - **Tuple `=>`:** `a => b` creates `tuple` - **`typeinfo`:** `typeinfo trait_name(type)` — trait name outside parens @@ -123,6 +126,16 @@ All code MUST use gen2 syntax (add `options gen2` at the top of every file). 
Key - **Function pointer with explicit type:** `@@<(var self : T) : RetT> funcName` — specifies the exact parameter/return types of a function pointer literal - **OR types in params** (`T1 | T2 | …`) — a parameter may list alternative accepted types: `int | float | double`, or heterogeneous forms like `array | table | auto(NT)`. This is a **generic "OR" type, NOT a runtime tagged variant** — the function is monomorphized per the concrete argument type that matches one alternative, so each instantiation sees that concrete type with no per-call dispatch or unpacking cost. Don't "hoist the union cast out of the loop" — there is no union value; a cast like `float(n)` inside the body is just a trivial concrete cast in each instantiation. Use it to widen an overload set in one signature (e.g. `def fast(n : int | float | double)` accepting bare ints, floats, and doubles at the same call site) +### Fixed arrays (structural since 0.6.3) + +`int[10]` is a **structural type** (`Type::tFixedArray`), not a qualifier vector on the element: one node per dimension, element in `firstType`, size in `fixedDim`, outermost first (`int[3][4]` = FA(3, FA(4, int))); ref/const/temporary live on the chain head. Operations act on the **outermost level only** (the one-peel rule). 
+ +- **Generic binding:** `auto(TT)` binds the WHOLE array (`TT = int[3][4]`); `auto(TT)[]` peels one level (`TT = int[4]`, parameter constness inherited); `TT - []` in a return/alias removes ONE level — pre-0.6.3 docs saying "removes all dims" describe the deleted flattened world +- **`safe_addr(arr)` on a fixed array returns a pointer to the FIRST ELEMENT** (C-style decay; multi-dim peels one level: `int[2][3]` → `int[3]?#`) — this is what makes `glGetBooleanv(what, safe_addr(flags4))`-style C interop work +- **Typedefs compose:** `typedef M4 = float[4]` then `M4[10]` is `float[10][4]` — array-ness survives aliasing and generic substitution (the pre-0.6.3 flattening bugs are gone) +- **Runtime `TypeInfo` is still flattened** (`dim[]`/`dimSize`) — only the AST is structural. C++ `TypeDecl::isArray()` means "is a fixed array" in both pre- and post-rework daslang (useful for external modules spanning versions) +- Macro authors: das-side `TypeDecl` fields are `fixedDim`/`fixedDimExpr` (NOT `dim`/`dimExpr` — deleted); typemacro payloads live in `typeMacroExpr`; build chains with `make_fixed_array_type(total, element)` from `daslib/ast_boost` — details in `skills/das_macros.md` + ### Important defaults - No implicit type promotion: `int + float` is a compile error — both sides must match diff --git a/COVERAGE_GAP.md b/COVERAGE_GAP.md new file mode 100644 index 000000000..2e8110c45 --- /dev/null +++ b/COVERAGE_GAP.md @@ -0,0 +1,72 @@ +# Closing the local-vs-CI coverage gap + +Follow-up to the fixed-array rework (PR #3095). Every CI failure during that +PR's babysitting was an **oracle mismatch** — a gate CI enforces that no local +step mirrored — not a wrong change. This plan closes those gaps at four levels. +Process: per-stage plan → implement → review, same as FIXED_ARRAY_REWORK.md. 
+ +## Evidence base (from the #3095 session, 2026-06-11) + +| Incident | Gap | +|---|---| +| doc.yml failed twice: das2rst positional handmade-doc validation, then the `Uncategorized` grep | Only 2 of doc.yml's 6 gates were mirrored locally; CI stops at the FIRST das2rst panic, hiding the rest | +| doctest `CHECK` on TypeDecl bit-fields: MSVC tolerates, clang/gcc reject — 15 lanes red | No clang-family compile of tests-cpp before push; a syntax-only pass would catch it in seconds | +| `safe_addr(bool[4])` decay regression hid in `modules/dasOpenGL/opengl/*.das` | GLFW-gated das code is compiled by NOTHING local — only extended_checks' sequence smoke | +| extended_checks installs external dasImgui from ITS master → `no member named 'dim'` | ABI breaks vs external repos invisible until CI; resolved pre-merge only via the both-worlds `isArray()` spelling | +| (Not hit, same family) Debug lanes bypass fused interpreter permutations | Release-only local testing misses Debug-divergent paths | + +## Stage 0 — language-doc currency (CLAUDE.md + skills) + +- Sharpen the "Updating Instructions" rule: syntax/factual corrections are + **fix-in-place, always** (flag in end-of-turn summary); propose-first stays + for restructuring, removals, new skill files. +- Fixed-array sweep: purge stale `dim`/`dimExpr` guidance (das_macros.md, + macro examples, typemacro payload references → `typeMacroExpr`); document + the structural model where macro authors look (element in `firstType`, one + size per node in `fixedDim`, quals on the chain head, one-peel rule, + `make_fixed_array_type`); document the new user-facing semantics + (`auto(TT)` whole-bind, `auto(TT)[]` peel + const inheritance, one-level + `-[]`, `safe_addr` element-pointer decay). +- Gen2 currency review: extract every syntax claim from CLAUDE.md's gen2 + section + syntax-adjacent skills; verify each against `ds2_parser.ypp` AND a + probe-compile with the current binary; fix stale claims in place. 
Probe + corpus kept (re-runnable after future grammar changes). + +## Stage 1 — process docs (shared, in-repo) + +- `skills/preflight.md`: CI-lane ↔ local-mirror table for every lane in + build.yml / extended_checks.yml / doc.yml — exact local command, or an + honest "not mirrorable, see skills/wsl_ci_repro.md". +- `skills/make_pr.md`: step-4 trigger gains "removed fields / new enum + values" (das2rst validates positionally); gate list gains the Uncategorized + grep, the latex sphinx build, and a "type-system / daslib-generics change → + sequence smoke + externals sweep" trigger. +- New `skills/abi_break_sweep.md`: externals checklist — grep patterns, + both-worlds spellings (prefer a predicate that exists with the same meaning + in both worlds over feature macros — the `isArray()` precedent), junction + rebuild order, stale-`.shared_module` startup trap, daspkg-index as scope. + +## Stage 2 — tools + +- `utils/preflight/` (das, clargs): one command, tiered. `--fast` = format + + lint + clang syntax-only pass on changed C++. `--full` adds interp/AOT/JIT + suites, tests-cpp, all six doc gates, sequence smoke, CI-only-das compile + sweep. Parallelize on big machines. +- clang syntax-only checker for changed `.cpp`/`.h` (`clang-cl /Zs` with + project includes) — the 80/20 for the MSVC-vs-clang diagnostic gap. +- Docs-gate runner mirroring doc.yml steps 1–6 exactly. +- CI-only das surface list (dasOpenGL, sequence, release tooling …) compiled + via `compile_check` with proper mounts. + +## Stage 3 — local build matrix + +- clang-cl CMake preset in a gitignored alt build dir, wired into + `preflight --full` (compile-focused; full builds affordable but rarely needed). +- mingw preset: deferred unless a mingw-specific failure class appears. +- Debug-config build target for the fused-path divergence family. 
+ +## Stage 4 — CI + +- Nightly cron on daspkg-index building every index package against daslang + master — external ABI breakage surfaces as a nightly signal instead of + inside an unrelated PR's extended_checks. diff --git a/install/CLAUDE.md b/install/CLAUDE.md index d0a7d2baa..30833dc23 100644 --- a/install/CLAUDE.md +++ b/install/CLAUDE.md @@ -76,6 +76,10 @@ Task-specific instructions are in skill files under `skills/`. Read the relevant | `skills/linq.md` | Filter/map/sort/group/aggregate transforms — prefer comprehension → linq_boost → plain `for`; avoid `daslib/functional` for new code | | `skills/decs.md` | Programming with `daslib/decs` / `decs_boost` — entities, components, queries, `[decs_template]`, stages, bulk creation, `from_decs` linq bridge | | `skills/regex.md` | Writing regular expressions in `.das` code | +| `skills/strings.md` | Any `.das` string operation — `find`/`replace`/`split`/parsing/`build_string`/`peek_data` (covers `strings`, `daslib/strings_boost`, `daslib/strings_convert`) | +| `skills/glob.md` | Writing or reviewing any glob/wildcard pattern handling — file selection, include/exclude masks, pattern-match-on-paths (`*` / `?` / `**` / `[abc]`) | +| `skills/sql.md` | SQL via `dasSQLITE` — `[sql_table]` / `[sql_view]` / `[sql_fts5]` / `[sql_function]`, the `_sql(...)` LINQ-to-SQL flagship, custom-type adapters, `@sql_json` / `@sql_blob` columns, transactions, migrations | +| `skills/gc_migration.md` | Migrating older code (external repos, archived projects) from `smart_ptr` AST patterns to gc_node | | `skills/strudel_port.md` | Porting strudel.cc patterns into `dasStrudel` | Multiple skill files may apply to a single task. For example, embedding daslang and calling its standard library requires reading both `skills/cpp_integration.md` and `skills/daslib_modules.md`. @@ -93,21 +97,33 @@ All code MUST use gen2 syntax (add `options gen2` at the top of every file). 
Key - **Table literals:** `{ "k" => v, "k2" => v2 }` — NOT `{{ "k" => v; "k2" => v2 }}` - **Bare blocks:** `{ var x = 1; ... }` at statement level creates a lexical scope (NOT a table literal). Supports `finally`: `{ ... } finally { ... }` - **Named arguments:** `foo([name = value])` with square brackets -- **Block arguments:** block/lambda after `func()` pipes as last arg. No `$` for parameterless blocks: `defer() { ... }`. With params: `build_string() $(var writer) { ... }`. Lambdas: `emplace() @(x : int) { ... }` +- **Block arguments:** block/lambda after `func()` pipes as last arg. No `$` for parameterless blocks: `defer() { ... }`. With params: `build_string() $(var writer) { ... }`. Lambdas: `emplace() @(x : int) { ... }`. **Arrow shorthand for single-expression blocks:** `arr |> sort() $(a, b) => a < b`. Defaulted parameters sitting between the explicit args and a trailing block are padded automatically — don't spell them out - **Lambda:** `@(args) { body }` or `@@(args) { body }` (no-capture). **Inline arrow form:** `@(x) => expr` (capture lambda) and `@@(x) => expr` (no-capture function pointer) — preferred for short transforms passed as arguments: `sometimes(pat, @@(x) => fast(x, 2.0lf))` +- **Function/method arrow body:** `def add(a, b : int) : int => a + b` — single-expression body, return type optional (`def add(a, b : int) => a + b` infers). Works on class methods too: `def get() : int => count + 2` - **Generator:** `$() { yield value; }` or `$ { yield value; }` - **Tuple `=>`:** `a => b` creates `tuple` - **`typeinfo`:** `typeinfo trait_name(type)` — trait name outside parens - **`static_if`:** `static_if (condition) { ... }` — parentheses required - **Type function call:** `take(type<int>, 1, 2)` — NOT `take < int > (1, 2)` - **Newlines inside `(...)`, `[...]`, `{...}` are free** — long pipe chains, multi-arg calls, array/table literals can wrap freely. 
Statement-level (no surrounding bracket) still requires one statement per line, so wrap the RHS in `(...)` if a `let x = a |> b |> c` needs to break across lines -- **Inline literals over temp-var-and-push** — for short arrays consumed in one expression, write `stack([a, b, c])` rather than `var xs : array; xs |> emplace(a); xs |> emplace(b); stack(xs)`. Faster in interpreted mode and easier to read; same applies to table literals +- **Inline literals over temp-var-and-push** — for short arrays consumed in one expression, write `stack([a, b, c])` rather than `var xs : array; xs |> emplace(a); xs |> emplace(b); stack(xs)`. Faster in interpreted mode and easier to read; same applies to table literals and other bracketed constructors. Threshold: while it stays readable ### Type modifiers - **`==const`** on a parameter type — propagates the caller's constness (NOT "always non-const"): `def foo(self : MyStruct ==const)` accepts either `MyStruct` or `MyStruct const`, and inside the body `self`'s constness matches what the caller passed. Use plain `Foo?` for non-const-only, `Foo const?` for const-only, `Foo? ==const` when you want the callee to accept either and inherit the caller's view - **`-const`** strips constness in type expressions — used with `reinterpret` for interior mutability: `unsafe(reinterpret(addr(self)))` - **Function pointer with explicit type:** `@@<(var self : T) : RetT> funcName` — specifies the exact parameter/return types of a function pointer literal +- **OR types in params** (`T1 | T2 | …`) — a parameter may list alternative accepted types: `int | float | double`, or heterogeneous forms like `array | table | auto(NT)`. This is a **generic "OR" type, NOT a runtime tagged variant** — the function is monomorphized per the concrete argument type that matches one alternative, so each instantiation sees that concrete type with no per-call dispatch or unpacking cost. 
Don't "hoist the union cast out of the loop" — there is no union value; a cast like `float(n)` inside the body is just a trivial concrete cast in each instantiation. Use it to widen an overload set in one signature (e.g. `def fast(n : int | float | double)` accepting bare ints, floats, and doubles at the same call site) + +### Fixed arrays (structural since 0.6.3) + +`int[10]` is a **structural type** (`Type::tFixedArray`), not a qualifier vector on the element: one node per dimension, element in `firstType`, size in `fixedDim`, outermost first (`int[3][4]` = FA(3, FA(4, int))); ref/const/temporary live on the chain head. Operations act on the **outermost level only** (the one-peel rule). + +- **Generic binding:** `auto(TT)` binds the WHOLE array (`TT = int[3][4]`); `auto(TT)[]` peels one level (`TT = int[4]`, parameter constness inherited); `TT - []` in a return/alias removes ONE level — pre-0.6.3 docs saying "removes all dims" describe the deleted flattened world +- **`safe_addr(arr)` on a fixed array returns a pointer to the FIRST ELEMENT** (C-style decay; multi-dim peels one level: `int[2][3]` → `int[3]?#`) — this is what makes `glGetBooleanv(what, safe_addr(flags4))`-style C interop work +- **Typedefs compose:** `typedef M4 = float[4]` then `M4[10]` is `float[10][4]` — array-ness survives aliasing and generic substitution (the pre-0.6.3 flattening bugs are gone) +- **Runtime `TypeInfo` is still flattened** (`dim[]`/`dimSize`) — only the AST is structural. 
C++ `TypeDecl::isArray()` means "is a fixed array" in both pre- and post-rework daslang (useful for external modules spanning versions) +- Macro authors: das-side `TypeDecl` fields are `fixedDim`/`fixedDimExpr` (NOT `dim`/`dimExpr` — deleted); typemacro payloads live in `typeMacroExpr`; build chains with `make_fixed_array_type(total, element)` from `daslib/ast_boost` — details in `skills/das_macros.md` ### Important defaults @@ -125,6 +141,7 @@ All code MUST use gen2 syntax (add `options gen2` at the top of every file). Key - Structs/arrays/tables always pass by reference — no `&` needed. - Only **workhorse types** (`int`, `float`, `bool`, `string`, …, `isWorkhorseType` on the C++ side) pass by value. - **AST pointers (gc_node) pass by value** — copying the pointer, no refcount, no allocation. `def foo(p : ExpressionPtr)` shares the node; `var p` lets you reassign locally; `var p : ExpressionPtr&` propagates reassignment back. For mutable field access, take the param as `var`. +- **Lambdas are copyable.** A `lambda<…>` is a fat pointer to a heap-allocated capture frame; `=` and pass-by-value copy the pointer (creates an alias), and `push`/array storage works without `push_clone`. **`delete lam` requires `unsafe`** since other aliases may still be live — same rule as raw pointer / class `delete`. The unsafe-delete rule cascades: `array<lambda<…>>`, structs with a lambda field, tuple/variant containing a lambda — all inherit the unsafe-delete requirement. - **Strings:** `var s : string` is a writable local copy (no propagation). `var s : string&` propagates. `:=` clones into current context's heap (required across contexts); plain `=` copies the pointer. - **Residual `smart_ptr` types** (`ProgramPtr`, `ContextPtr`, `FileAccessPtr`, `DebugAgentPtr`, `VisitorAdapterPtr`) still use refcount semantics — variables holding them need `var inscope`. AST types do NOT — see below. 
@@ -138,7 +155,7 @@ Quick rules: - Tools that build AST at runtime (outside the compile pipeline) must wrap their scope in `ast_gc_guard() { ... }` from `daslib/ast`, or the leak detector reports `GC APP LEAK` at exit. - daslang has garbage collection, but plain `var arr : array` does NOT finalize on scope exit. Either declare with `var inscope` (smart_ptr only), call `delete` explicitly, or move out via `<-`. Per-frame leaks in hot paths usually trace to a local `var arr` never deleted. -For migrating older code that uses `var inscope`/`<-` on AST types, see `skills/das_macros.md`. +Full migration table (when reading older docs that say `var inscope` or `<-` for AST types): **`skills/gc_migration.md`**. ### Context heaps and threading @@ -152,22 +169,25 @@ For migrating older code that uses `var inscope`/`<-` on AST types, see `skills/ ### Unsafe -- **`unsafe(expr)`** — narrow-scope unsafe, preferred over `unsafe { block }`. Limits unsafe to the exact expression that needs it +- **`unsafe(expr)`** — narrow-scope unsafe, preferred over `unsafe { block }`. Limits unsafe to the exact expression that needs it. Lint backs this: STYLE024 flags `unsafe` wraps with no descendant needing unsafe; STYLE025 flags blocks where exactly one statement needs unsafe (narrow to expression form); STYLE026 flags nested `unsafe { ... }` - **Local reference binding is unsafe:** `let blk & = expr` requires `unsafe` whenever it creates a local reference to a non-local expression — `let blk & = unsafe(expr)` - **Variant `as` read access is safe:** `(v as _field).member` works without `unsafe` after an `is` check - **Variant field assignment is always unsafe:** `v._field = value` and `set_variant_index(v, N)` require `unsafe` - **`reinterpret(expr)`** requires `unsafe` — used for const-stripping on regular pointers: `unsafe(reinterpret(const_ptr))` +- **`typeinfo is_unsafe_when_uninitialized(type)`** — the trait that gates unsafe-ness per type in generic code. 
Pairs with field-level `@safe_when_uninitialized` and struct-level `[safe_when_uninitialized]` annotations. Canonical use in `daslib/builtin.das`: declare `var x : TT` inside `static_if (typeinfo is_unsafe_when_uninitialized(type)) { unsafe { ... } } else { ... }` — the unsafe block needs `// nolint:STYLE025` on the `unsafe {` line because STYLE025 sees only one statement needing unsafe at any instantiation and can't reason across the static_if branches ### Error handling - `try/recover` — NOT `try/catch` (`recover` is the keyword) - `panic("message")`, `assert(condition)`, `verify(condition)` (stays in release) - **Postfix conditional:** `return expr if (cond)`, `break if (cond)`, `continue if (cond)` — early-exit guard on one line +- **Braceless early-exit:** prefer `if (cond) return X` (or postfix `return X if (cond)`) over `if (cond) { return X }` — STYLE005 flags the braced single-terminator form as noise +- **Panic is fatal, not an exception.** daslang has no C++/JS-style exception model. A `panic` (or failed `assert` / `verify`) means the program is broken — the only correct response is to print diagnostics and exit. `try/recover` exists to capture the message before exit so you can log it nicely, NOT to recover-and-continue. Do not write code that relies on continuing after `recover`; do not design APIs around panic-as-control-flow. Corollary: `{ body } finally { cleanup }` deliberately skips `cleanup` on panic (the cleanup can't run safely on a broken program); this is not a bug. Don't try to "fix" it; don't use `finally` for cleanup that needs to run on panic. If you need post-statements that run after a block in the normal path, just put them after the block — panic skips everything, and that's the design. ### Generic function dispatch -- **`_::foo(x)`**: resolves in the **calling** module — caller's overloads visible. Use in library generics. -- **Unqualified** `foo(x)`: resolves in the **defining** module — caller's overloads NOT visible. 
+- **`_::foo(x)`**: resolves in the **calling** module — caller’s overloads visible. Use in library generics. +- **Unqualified** `foo(x)`: resolves in the **defining** module — caller’s overloads NOT visible. - This is why `:=` and `delete` emit `_::clone` / `_::finalize` ### Dot as pseudo-pipe @@ -175,8 +195,8 @@ For migrating older code that uses `var inscope`/`<-` on AST types, see `skills/ `a.foo(b)` is sugar for `foo(a, b)` — but **only when `a` is a struct/class value** (chains: `a.foo().bar(x)` ≡ `bar(foo(a), x)`). - **Works on:** struct / class values (incl. by-ref). -- **Does NOT work on:** primitives (`let n = 5; n.double()` → "can't get field 'double' of int const&"), tuples/arrays, and **lambda typedefs**. For lambda-typed values you must use `|>`. -- **When telling someone "use pipe here":** check the receiver type — for structs `.method()` is idiomatic, for lambdas only `|>` works. +- **Does NOT work on:** primitives (`let n = 5; n.double()` → "can't get field 'double' of int const&"), tuples/arrays, and **lambda typedefs** — most importantly strudel's `Pattern` type (`typedef Pattern = lambda<...>`); `s("bd").fast(2.0lf)` fails. Pattern chains must use `|>` (or direct call). +- **When telling someone "use pipe here":** check the receiver type — for structs `.method()` is idiomatic, for lambdas only `|>` works. Don't say "daslang uses pipes instead of method chains" without qualification. ### Table operations @@ -184,21 +204,25 @@ For migrating older code that uses `var inscope`/`<-` on AST types, see `skills/ - `key_exists(table, key)` — check without inserting - `table |> insert(key, value)` / `table |> erase(key)` - **Never use two `[]` lookups on the same table in one expression** — re-hashing can invalidate references +- `table[key]` (read or assign) is **safe** — do NOT wrap in `unsafe(...)`. 
Some legacy daslib code has `unsafe(tab[k])`; do not propagate that pattern - **Move-assign table literal:** `tab <- { "k" => v }` works for both `var tab <- { ... }` declarations and `tab <- { ... }` reassignment to existing variables - **Table comprehension move-assign:** `tab <- { for(x in range(5)); x => x*x }` — same move-assign rules apply -- `table[key]` (read or assign) is **safe** — do NOT wrap in `unsafe(...)`. Some legacy daslib code has `unsafe(tab[k])`; do not propagate that pattern +- **`table<K>` (one type param) is the set type** — value type elided. `var s : table<int>; s |> insert(5); key_exists(s, 5)`. Distinct from `table<K; V>` (the map form); both shapes coexist. Set-literal init: `let STOP_WORDS : table<string> <- { "a", "an", "the" }` — value-less braces, comma-separated. Use this instead of declaring `var X : table` and populating in an `[init]` function. ### Iterators and `each` - `[unsafe_outside_of_for] def each(x) : iterator` makes a type iterable in `for` loops - When the iterator is named `each`, the call can be omitted: `for (v in each(x))` is identical to `for (v in x)` - Other iterator names (e.g. `filter`, `map`) cannot be omitted +- **Iterator element-const variance is pointer-like:** `iterator` flows into `iterator` (mut → const), not the reverse. So a generic param declared as `iterator` takes both `each(array)` (yields `iterator`) and iterator-comprehension (yields `iterator`) sources via a single instantiation. The `const` qualifier alone is enough — do NOT add `&` (that's a separate ref-form modifier, not what you want for variance) +- **Generic-mangling pitfall:** instantiations of the same generic that differ only in inner element-const (`iterator` vs `iterator`) currently hash-collide in the instance registry, producing `error[50609]: multiple instances of …` when both arise in one module. 
Workaround at the library level: declare the iterator overload as `iterator` instead of `iterator` — both source flavors then converge on a single instance per the variance rule above. Caveat: the constify makes `it` inside the body const, so the body must not move from / mutate / call non-const operators on `it`. linq.das only constifies `all` and `contains` for this reason; the rest stay vulnerable until the mangler is fixed upstream ### String access functions - **`peek_data(str) $(arr) { ... }`** — safe O(1) per-element read access to string as `array const#`. One `strlen` call total. Preferred over `character_at` for iteration. - **`modify_data(str) $(var arr) { ... }`** — returns a modified copy; allocates new string, opens as mutable `array`. Use for character-level transformations. - **`character_at(s, i)`** — O(n) per call (`strlen` + bounds check). Fine for isolated checks, but use `peek_data` in loops or hot paths. +- Pointer-based string access (`reinterpret`) is for core library implementations only — user code should use `peek_data`/`modify_data` for safety. ### Common gotchas @@ -206,18 +230,21 @@ For migrating older code that uses `var inscope`/`<-` on AST types, see `skills/ - String builder requires `unsafe` or `options persistent_heap` if returned - Tuple field access: `t._0`, `t._1`, `t._2` - Annotations: `[export]`, `[test]`; `options no_aot`, `options rtti` +- **`options` are MODULE-LOCAL for pass-macros** (`[lint_macro]` / `AstPassMacro`). The macro fires once per module in the require chain, reading `prog._options` from THAT module's options table — not the program-root's. So `options _my_lint_off = true` in `foo.das` suppresses YOUR lint in `foo`, but `require foo` from `bar.das` does not inherit the flag — `bar` gets linted unless it sets its own. 
Don't confuse with runtime options (`gc`, `multiple_contexts`, `persistent_heap`, `rtti`) which DO unify across the program codegen and effectively cascade up to consumers - **Visibility is a prefix keyword, not an annotation:** `def private foo()`, `struct private Foo { ... }`, `enum private E { ... }`, `variable private x = 0`, `alias private X = Y`. There is **no** `[private]` annotation — it's a grammar error -- **Field/variable annotations use `@name` only:** `@safe_when_uninitialized at : LineInfo`, `@sql_primary_key id : int64`. The `[name]` form is reserved for struct/function/global-level annotations and does NOT parse on a struct field +- **Field/variable annotations use `@name` only:** `@safe_when_uninitialized at : LineInfo`, `@sql_primary_key id : int64`, `@do_not_delete ctx : Context?`. The `[name]` form is reserved for struct/function/global-level annotations and does NOT parse on a struct field - `require` uses forward slash: `require daslib/linq` — NOT backslash - `require foo public` — re-exports `foo` transitively +- **`require` path resolution:** bare `require name` resolves a module-root mount (`daslib/x`, a registered module) **or** a *same-directory* `name.das`. To require a file *elsewhere in the tree* by path, use a **file-relative path with the explicit `.das` extension** — `require ../../foo/bar.das`. Dropping the `.das`, or using a non-mounted root-relative path, fails with `error[20605] missing prerequisite … file not found`. Upshot: a file needing a shared fixture should keep the fixture in its *own* directory (bare same-dir require) rather than reaching across the tree - `[export] def main()` defaults to returning `void`, but you can declare it as `def main() : int { ... return rc }` when you need to surface a non-zero process exit code (e.g. CLI tools whose callers branch on exit). Don't reach for `panic` just to force a non-zero exit; declare `: int` and `return rc` instead. 
- `push` copies (fails for non-copyable types), `emplace` moves (zeros source), `push_clone` clones (preserves source) -- Non-copyable types (`array`, `table`, lambdas): use `:=`, `push_clone`, or `<-` +- Non-copyable types (`array`, `table`): use `:=`, `push_clone`, or `<-`. (Lambdas are copyable — see above.) - Blocks cannot be stored/returned/captured — use lambdas or function pointers - Class methods: `def const`, `def abstract const`, `def static`; call syntax `obj.method()`, `obj->method()`, `obj |> method()` - **`is`/`as` on handled types checks EXACT type**, not C++ inheritance — `expr is ExprField` is `false` when `expr` is `ExprSafeField`. `as` on wrong type crashes. Must handle each concrete type explicitly. - `#pragma optimize` in AOT-generated code must be wrapped in `#ifdef _MSC_VER` — Clang warns on unknown pragmas -- **Macro-generated struct variables** need `default<$t(st)>` initialization (not `var x : $t(st)`) — avoids "uninitialized variable" errors for structs without field defaults +- **Macro-generated `var x : $t(st)`** (no init) trips `error[31016]` "uninitialized variable is unsafe" for **any** struct result/local type — field defaults do **not** exempt it. Fixes: `= default<$t(st)>` when the type is default-constructible; but for handled/backend types `default<>` is `error[50503] unsupported variable type` — then set `td.flags.safeWhenUninitialized = true` on the (cloned) decl type when the uninitialized read is intentional and discarded (canonical: the `[flatten]` return accumulator in `daslib/flatten.das`) +- `print` is fine for application scripts. 
In library/tool code prefer `to_log(LOG_INFO|LOG_WARNING|LOG_ERROR)` — same stdout, but level-tagged and filterable ### Code style — prefer idiomatic forms @@ -227,12 +254,37 @@ For migrating older code that uses `var inscope`/`<-` on AST types, see `skills/ | `get_ptr(x) == null` / `get_ptr(x).field` | `x == null` / `x.field` | AST pointers auto-dereference; `get_ptr` is smart_ptr-era residue | | `string(das_str) == "lit"`, `!empty(string(das_str))` | drop the `string(...)` cast | `das_string` compares with `string` directly; `empty()` works on it | | `let v = string(x.name); $i(v)` / `var copy = val; $v(copy)` | `$i(x.name)` / `$v(val)` | qmacro tags accept `das_string`, `let` vars, loop vars directly | +| 6 qmacro arms differing only in the call target (`if isTry { qmacro(_::try_run_select(…)) } elif … { … }`) | `let fname = (isTry ? "try_run_select" : "run_select") + suffix; qmacro($c(fname)(…))` | `$c(stringVar)` splices a function name; resolution at splice site uses user's `require` chain. Note: `_::$c(…)` is a parse error — drop `_::` | | `if (true) { ... }` | `{ ... }` | bare blocks create lexical scope in gen2 | | `var inscope r <- expr; return <- r` | `return <- expr` | direct return avoids intermediate | | `unsafe { (reinterpret blk).list }` / `unsafe(reinterpret x)` | make param `var` + plain `x.list` | `var` param gives non-const field access without reinterpret | +| `if (cond) { return X }` (or `{ break }` / `{ continue }`) | `if (cond) return X` or postfix `return X if (cond)` | STYLE005: braces around a single-statement early-exit are noise | +| `for (i in range(length(arr))) { ... arr[i] ... }` where `i` is used only as `arr[i]` | `for (c in arr) { ... c ... }` | PERF018: direct iteration drops the index variable | +| `from_JV(v, type, 13)` | `v ?? 
13` | STYLE020: json_boost provides `operator ??` for every scalar `from_JV` overload | +| `var args : table; args \|> insert("k1", JV(v1)); args \|> insert("k2", JV(v2))` | `var args = JV((k1=v1, k2=v2))` | STYLE021: named-tuple JV form (json_boost.das:638) is one line instead of N | +| `int(BfT.a) \| int(BfT.b)` (same bitfield, or enum with `operator \|`) | `int(BfT.a \| BfT.b)` | PERF019: collapse two int casts to one. Const-foldable forms only surface under lint policies | +| `int64(a)` where `a : int64` (or any of the 15 workhorse casts: `int*`/`uint*`/`float`/`double`/`string`/`bitfield*`) | `a` | PERF020: same-type workhorse cast is a no-op `ExprCall`. Match is `baseType`-strict, so widening/narrowing/signedness/float↔int still fire as genuine work. User-named bitfield/enum ctors (`MyBitfield(x)`) and vector ctors (`int2(x,y)`) are out of scope | +| `foo \|= BfT.m` / `foo &= ~BfT.m` (bitfield `foo`, single named bit) | `foo.m = true` / `foo.m = false` | STYLE022: bitfield-as-field assignment reads bit-name-first, drops the `~` for clears | +| `uint(bf & BfT.m) != 0u` / `int(bf & BfT.m) == 0` (bitfield `bf`, single named bit) | `bf.m` / `!bf.m` | STYLE023: bitfield-as-field read; drop the int cast + `!= 0` / `== 0` compare | +| `unsafe(x + y)` / `unsafe { let d = x + y }` where nothing inside requires unsafe | drop the wrap | STYLE024: redundant `unsafe` — flagged when no descendant matches a known inherently-unsafe shape (reinterpret/upcast cast, `delete`, `addr`, table-index, variant-write, ExprCallFunc with `unsafeOperation`). Macro-generated subtrees (`genFlags.generated == true`) skipped per design | +| `unsafe { stmt1; stmt2; stmt_needing_unsafe }` (only ONE stmt actually needs unsafe) | `stmt1; stmt2; unsafe()` | STYLE025: narrow block-form unsafe to expression-form on the single unsafe-needing statement. Silent when ≥2 statements need unsafe (block is justified) | +| `unsafe { ...; unsafe { ... }; ... 
}` (nested `unsafe { }` block) | drop the inner wrap | STYLE026: outer `unsafe` already covers the whole inner scope, so the inner block is pure noise. Closure / lambda / generator bodies are NOT nested for this rule — they execute in a separate context where the outer wrap does not propagate | +| `for (s in A) { B \|> push(s) }` / `push_clone(s)` (iter-var only) | `B \|> push_from(A)` / `push_clone_from(A)` | PERF022: the bulk overload in builtin.das reserves combined capacity up front. Single name `push`/`push_clone` is overloaded between single-element and bulk (ambiguous when destination is `array`); the `_from` suffix names the bulk intent. Source must be `array` or C-array — range/iterator sources are not flagged. `emplace` is out of scope (const iter-var can't be moved) | +| `var a : array; for (x in SRC) { if (COND) { a \|> push(EXPR) } }` (or `table` + `insert`/`a[k]=v`) | `var a <- [for (x in SRC); EXPR; where COND]` (or `\{for (...); k => v; where ...\}`) | STYLE027: var with empty default-init followed by a for-loop that only push/insert into it. Accepts depth ≤ 2 nested fors and if-filters at any depth. `emplace` excluded — move-source-zeroing differs from comprehension element-construction. Iterator-comprehension form (`[$f ...]`) NOT suggested | +| `var X = clone_expression(E); ... $e(X) ...` (only-uses-are-qmacro-splice) | drop the pre-clone, inline `$e(E)` at each splice site | PERF023: `qmacro`/`qmacro_block`/`qmacro_expr`/`qmacro_block_to_array` go through `apply_template` (templates_boost.das:251), which calls `clone_expression` on every substitution input. Pre-cloning is wasted work. Detection: post-expansion `$e(X)` becomes `add_ptr_ref(X)` inside an `ExprMakeBlock`; visitor tracks splice-wrapper depth via preVisitExprCall/visitExprCall counter on `add_ptr_ref`, classifies each candidate `ExprVar` reference as "safe" when depth>0. Fires only when ALL uses are safe AND ≥1 is observed. 
Multi-clone-of-same-source flagged too — apply_template clones each substitution independently | For path/filename ops use `fio` helpers (`base_name`/`dir_name`/`path_join`/etc.) — see `skills/filesystem.md`. Never hand-roll `rfind("/")` / slice — misses Windows separators. +**Minimize `unsafe`:** Most `unsafe(reinterpret)` in macro code exists to strip `const` from raw-pointer field access. Fix the root cause: make the function parameter `var` so field access returns non-const pointers. Reserve `unsafe` for genuinely unsafe operations (pointer arithmetic, `reinterpret` across unrelated types). + +**Comment hygiene.** Comments are 1–2 lines max. Strict rules: + +1. **No banner comments above a documented function.** When a function carries `//!` inside its body, drop the `// ===== name — desc =====` block above. The banner duplicates the doc. +2. **No multi-paragraph architectural prose at the head of a section.** Don't write 10–30 line preambles explaining design decisions, surface examples, NULL handling, panic semantics, etc. above `// Section name`. Code reads well; design docs carry the WHY. If a reader genuinely needs that context, it goes in those docs, not the source. +3. **Private functions and types don't get public-style docs.** `//!` / `//!<` is for tooling-visible public API. On `def private`, `struct private`, `enum private`, `variant private`, drop the docstring entirely — the symbol isn't exported, so no doc generator ever sees it, and the docstring just restates the function name / field name to a reader who already has them. If a function or field genuinely needs a 1-line WHY (non-obvious invariant, surprising behavior), write a plain `// ...` line, not `//!`. The bar for keeping any comment on a private symbol is "a maintainer reading the symbol alone would be surprised." +4. 
**What stays:** terse 2-line section dividers (`// ===== Section name =====`), `//!` docstrings on PUBLIC functions/types (visible to tooling), and inline `//` comments that flag a non-obvious WHY at the *exact* line — a workaround for a specific bug, a subtle invariant, behavior that would surprise a reader. Don't restate what the code says. +5. **When in doubt:** delete. If reading the code + the relevant docstring(s) doesn't make the WHY clear, the comment was load-bearing. Otherwise it was noise. + ## SDK Directory Layout - `bin/` — Compiler binaries (`daslang`, `daslang-live`, `das-fmt`, `clang-cl`, `sqlite_shell`) diff --git a/install/skills.list b/install/skills.list index a926ec031..a7acf9902 100644 --- a/install/skills.list +++ b/install/skills.list @@ -13,6 +13,8 @@ decs.md detect_dupe.md dynamic_modules.md filesystem.md +gc_migration.md +glob.md jobque_debugging.md json.md linq.md @@ -20,6 +22,8 @@ mcp_tools.md memory_leak_detection.md project_overview.md regex.md +sql.md +strings.md strudel_port.md writing_tests.md xml.md diff --git a/skills/das_macros.md b/skills/das_macros.md index e0f6f1f9f..8599639c6 100644 --- a/skills/das_macros.md +++ b/skills/das_macros.md @@ -112,6 +112,16 @@ Note adapters can still *emit* code referencing the contributor's symbols by nam - **`typeDecl.argTypes`** — array of `TypeDeclPtr` representing function-type arguments (indices: 0 = first parameter). For interface method fields, `argTypes[0]` is the `self` parameter — check `.isConst` to determine if the method is const. - **Interior mutability pattern** — when a const getter needs to lazily mutate a cache: declare param as `self : T ==const`, then `var pS = unsafe(reinterpret(addr(self)))` to strip const for cache mutation. Used in `daslib/interfaces.das` for const-only interface proxy caching. 
+### Fixed arrays in the AST (tFixedArray, since 0.6.3) + +`int[3][4]` is a chain of `TypeDecl` nodes, NOT a `dim` vector on the element — the `dim`/`dimExpr` fields are **deleted**: + +- One node per dimension: `baseType == Type.tFixedArray`, element in `firstType`, size in `fixedDim`, **outermost first** (`int[3][4]` = FA(3, FA(4, int))). Operate on the head's `fixedDim`/`firstType` and recurse — never assume one node covers all dims (the one-peel rule). +- `fixedDim` sentinels pre-inference: `TypeDecl.dimAuto` (-1) for `[]`, `TypeDecl.dimConst` (-2) while `fixedDimExpr` awaits constant folding. Post-inference both are resolved; `fixedDim <= 0` reaching final verify is an error. +- ref/const/temporary qualifiers live on the **chain head only**. Build chains with `make_fixed_array_type(total, element)` from `daslib/ast_boost` — it hoists the element's qualifiers onto the new head for you. +- **Typemacro payloads moved**: `$mytag(args...)` argument expressions are in `typeMacroExpr`, not `dimExpr`. Update any pre-0.6.3 macro that read `t.dimExpr` for tag payloads. +- Walking to the element: `var leaf = t; while (leaf.baseType == Type.tFixedArray && leaf.firstType != null) { leaf = leaf.firstType; }` — collect `fixedDim` per level if you need the flattened dims (runtime `TypeInfo.dim[]` stays flattened; only the AST is structural). + ## Shared AST-match helpers `daslib/ast_match.das` exposes a small set of public helpers harvested from `linq_fold` + `sqlite_linq` during the 2026-05 refactor. Reach for these BEFORE writing a new `is X / as X` cascade — they capture the exact semantics each pattern was hand-rolling, with module-gating and generic-instantiation transparency baked in. diff --git a/skills/gc_migration.md b/skills/gc_migration.md index 763c833b1..7ab9e79cb 100644 --- a/skills/gc_migration.md +++ b/skills/gc_migration.md @@ -328,8 +328,8 @@ Without this, the leak detector reports "GC APP LEAK" at exit. After migrating `.das` code: -1. 
**Compile-only check:** `bin/Release/daslang.exe -compile-only your_file.das` -2. **Run tests:** `bin/Release/daslang.exe dastest/dastest.das -- --test your_tests/` +1. **Compile-only check:** `bin/daslang -compile-only your_file.das` +2. **Run tests:** `bin/daslang dastest/dastest.das -- --test your_tests/` 3. **Check for GC leaks:** look for "GC COMPILE LEAK" / "GC APP LEAK" in output 4. **Common compile errors after migration:** - `"can only move to from a reference"` → change `<-` to `=` diff --git a/skills/glob.md b/skills/glob.md index 1bd1b8c25..3a3b08a0c 100644 --- a/skills/glob.md +++ b/skills/glob.md @@ -94,7 +94,7 @@ parse_file_list("zzz.das,assets/**/*.png,aaa.das", files) // the glob's matches sit between them, sorted within their slice ``` -If you're rolling your own expansion (don't, unless `parse_file_list` doesn't fit), the same pattern: per-glob `sort()` + `append`, never a final global sort. See the regression test in [tests/fio/expand_glob_test.das](tests/fio/expand_glob_test.das) (`test_parse_file_list_order_preserved`). +If you're rolling your own expansion (don't, unless `parse_file_list` doesn't fit), the same pattern: per-glob `sort()` + `append`, never a final global sort. 
## Performance and correctness gotchas diff --git a/skills/linq.md b/skills/linq.md index 8be49b8db..4858105a4 100644 --- a/skills/linq.md +++ b/skills/linq.md @@ -131,8 +131,8 @@ The rule applies to any sequence-consuming high-order primitive, including strin Despite all the above, sometimes the right answer is `for`: ```das -for (itd in typeDecl.dim) { - write(writer, ",{itd}>") +for (argT in typeDecl.argTypes) { + write(writer, ",{describe(argT)}>") } ``` @@ -152,7 +152,7 @@ Do NOT shoehorn side effects into `_select` lambdas — they're meant to be pure let xs <- to_array(map(each(arr), @(x) { return f(x); })) // Functional + iterator surgery in one expression: -let args <- to_array(map(each(typeDecl.dim).reverse(), @(itd) { return ",{itd}>"; })) +let args <- to_array(map(each(typeDecl.argTypes).reverse(), @(argT) { return ",{describe(argT)}>"; })) reverse(args) write(writer, "{join(args, "")}") ``` @@ -170,8 +170,8 @@ let xs <- arr._where(_.flag)._select(_.name).to_array()._fold() let csv = (arr._select(_.name).to_array()._fold()) |> join(", ") // Plain for-loop — the body has a side effect: -for (itd in typeDecl.dim) { - write(writer, ",{itd}>") +for (argT in typeDecl.argTypes) { + write(writer, ",{describe(argT)}>") } ``` diff --git a/skills/sql.md b/skills/sql.md index a1561aecd..3c1bb7511 100644 --- a/skills/sql.md +++ b/skills/sql.md @@ -219,7 +219,7 @@ to_log(LOG_INFO, "SQL: {_sql_text(db |> select_from(type) |> _where(_.Price ### Composability -Inner `_where` / `_select` compose freely. User-defined `[call_macro]` wrappers cascade through the analyzer — write `[call_macro(name="when_price_lt")]` that expands to `_where(it, _.Price < val)` and use `_sql(db |> select_from(type) |> when_price_lt(200))`. The wrapper's body becomes the same `where_` shape the analyzer sees for a hand-written `_where`. Tested by `tests/dasSQLITE/test_07_sql_composability.das`. +Inner `_where` / `_select` compose freely. 
User-defined `[call_macro]` wrappers cascade through the analyzer — write `[call_macro(name="when_price_lt")]` that expands to `_where(it, _.Price < val)` and use `_sql(db |> select_from(type) |> when_price_lt(200))`. The wrapper's body becomes the same `where_` shape the analyzer sees for a hand-written `_where`. ## `_each_sql` — streaming @@ -697,8 +697,6 @@ Strings produced by `query` / `_sql` are allocated on the calling context's heap ## Reference -- Tutorials — every shipped feature has a runnable file under [tutorials/sql/](../tutorials/sql/) (45 files). Teaching order is documented in [modules/dasSQLITE/TUTORIALS.md](../modules/dasSQLITE/TUTORIALS.md). -- Implementation — [daslib/sqlite_boost.das](../modules/dasSQLITE/daslib/sqlite_boost.das), [daslib/sqlite_linq.das](../modules/dasSQLITE/daslib/sqlite_linq.das), [daslib/sqlite_migrate.das](../modules/dasSQLITE/daslib/sqlite_migrate.das). -- Design notes — [API_REWORK.md](../modules/dasSQLITE/API_REWORK.md) (the master plan; per-chunk decision log), [API_MIGRATION.md](../modules/dasSQLITE/API_MIGRATION.md) (migrations design walk), [API_CHECKED.md](../modules/dasSQLITE/API_CHECKED.md) (parity audit), [API_MISSING.md](../modules/dasSQLITE/API_MISSING.md) (deferred-feature list). -- Tests — `tests/dasSQLITE/` — every operator + macro has a focused test, plus `failed_*.das` files for compile-error cases. +- Tutorials — every shipped feature has a runnable file under `tutorials/sql/` (45 files). +- Implementation — `modules/dasSQLITE/daslib/sqlite_boost.das`, `sqlite_linq.das`, `sqlite_migrate.das`. - Related skills — `skills/json.md` (`@sql_json` columns), `skills/linq.md` (`_sql` is LINQ-shaped), `skills/das_macros.md` (`[sql_table]` / `_sql` are macros), `skills/gc_migration.md` (`SqlRunner` is one of the residual smart_ptr types).