Raise deterministic-core test coverage to 95%+ (Models 97.1%, Support 96.0%) by FuJacob · Pull Request #678 · FuJacob/cotabby

FuJacob · 2026-06-12T00:37:35Z

Summary

Builds the regression net the suggestion pipeline has been missing: 362 new tests (1,049 to 1,411) across 32 new and 31 extended suites, taking whole-target line coverage from 31% to 42.4% and the deterministic core past the 95% bar. Coverage was driven from a per-file xccov gap analysis of a baseline run, so every suite targets lines that were actually unexercised, with behavioral assertions (exact outputs, persisted state, which boundary got poked) rather than call-only coverage.

Area	Before	After
`Cotabby/Models`	58.4%	97.1%
`Cotabby/Support`	82.8%	96.0%
`Cotabby/App` (coordinators)	44.1%	69.5%
`Cotabby/Services`	21.4%	35.5%
`Cotabby/UI` (SwiftUI)	2.9%	2.9% (out of scope)
Whole target	31.0%	42.4%

What the new nets actually lock:

The coordinator state machine end to end through a shared recording rig (SuggestionCoordinatorTestSupport): debounce, request build, engine dispatch, every freshness gate in apply (stale generation, empty result, selected text, acceptance echo), the typo gate's suppress/offer/auto-fix routes, input-event routing, focus/permission/settings reactions, and lifecycle teardown.
The settings facade (SuggestionSettingsModel): a full setter persistence round-trip through a fresh model on the same defaults suite, keybinding conflict rules, power-profile seeding, normalization funnels, clamps, and the snapshot publisher's dedupe.
Live AX against the test host's own process: AXHelper (4.6% to 71.9%) and FocusSnapshotResolver (0.7% and now exercised end to end on real AppKit text fields, secure fields included). These suites XCTSkip cleanly where self-process AX is unavailable, so CI stays green either way.
Engine routing (SuggestionEngineRouter): per-engine dispatch, the Apple Intelligence locale-failure fallback to llama, cancellation passthrough, and performance-metric recording.
The JSONL debug sinks: record shape (the jq contract in CLAUDE.md), metadata flattening, and the 10 MB one-step rotation that preserves history.

Remaining gaps are deliberate, not silent: UI/ (13.6k lines of SwiftUI, needs UI-test infrastructure), live-system boundaries in Services/ (CGEvent taps, screenshots, the llama runtime, window controllers), AXHelper's Chromium text-marker internals (need a live browser), and the OS-gated SystemAvailabilityProvider.

Validation

xcodebuild test -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' \
  -skip-testing:CotabbyTests/FoundationModelDriftEvalTests \
  CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO
# ** TEST SUCCEEDED ** 1411 tests, 0 failures, 0 crashes

# Coverage measured with -enableCodeCoverage YES on the same run; table above is from
# xcrun xccov view --report on the result bundle.

swiftlint lint --quiet
# 0 findings on the files this PR touches

The five new tests that touch live AX run for real on a trusted machine (verified locally) and skip on untrusted/headless runners.

Linked issues

None filed; this is proactive hardening so regressions like the recent caret/acceptance ones get caught at test time.

Risk / rollout notes

Production changes are test seams only (+37/-4 across 5 files): FileLogWriter/LLMIOFileWriter gain an optional fileURL: init parameter (default behavior byte-identical), and five classes gain nonisolated deinit {} (SuggestionDebugLogger, FileLogWriter, LLMIOFileWriter, FieldStyleCache, DeepGeometryWalkThrottle).
Why the deinits matter beyond tests: the app target's SWIFT_DEFAULT_ACTOR_ISOLATION: MainActor makes even unannotated classes main-actor-isolated, so their deallocation routes through the back-deployment executor shim, which double-frees in StopLookupScope on macOS 26 (confirmed via crash reports during this work; same root cause as the existing InputSuppressionController fix). Production never hit it only because these objects are app-lifetime singletons. Any future code that deallocates one would crash; now it cannot.
New tests follow the repo's determinism rules: suite-scoped UserDefaults, temp-dir file I/O, no network, no machine-dependent assertions (system samplers assert sane bounds; spell-checker tests use words NSSpellChecker classifies stably).
Suite wall time grows by roughly 30 seconds locally; the live-AX suites build one offscreen-ish window each, once per suite.
project.pbxproj regenerated by XcodeGen (test-file additions only).

Greptile Summary

This PR adds 362 new tests (1,049 → 1,411) to lock the deterministic core of the suggestion pipeline, raising Cotabby/Models coverage to 97.1% and Cotabby/Support to 96.0%. Production changes are five nonisolated deinit {} additions and two fileURL: init parameters that are strictly test-seam work with byte-identical default behaviour.

Source changes (+37/-4 across 5 files): FileLogWriter and LLMIOFileWriter gain an optional fileURL: override for testable I/O; SuggestionDebugLogger, FieldStyleCache, and DeepGeometryWalkThrottle gain empty nonisolated deinit {} to prevent the macOS 26 back-deployment executor shim double-free when test-scoped instances are released.
New coordinator test harness (SuggestionCoordinatorTestSupport): a shared CoordinatorRig struct wires all major protocol doubles so the prediction, input, and lifecycle suites each assert which boundary was poked rather than just that a call happened.
Live AX suites (AXHelperTests, FocusSnapshotResolverLiveTests): run against the test host's own AppKit windows and XCTSkip cleanly on untrusted/headless runners, keeping CI green.

Confidence Score: 5/5

Safe to merge. The five production-file changes are minimal test seams with no behavioural effect on the default code paths, and the test suite grew by 362 assertions that all pass.

Production changes are limited to two optional init parameters (default value preserves existing behaviour) and five empty nonisolated deinit {} bodies that only affect deallocation order for objects that are app-lifetime singletons in production. All new tests are deterministic, use isolated UserDefaults suites or temp directories, and the live-AX cases skip cleanly on headless CI. No logic changes to the suggestion pipeline itself.

CotabbyTests/SuggestionEngineRouterTests.swift — UserDefaults domains created per rig are never removed in tearDown, unlike the analogous pattern in PerformanceMetricsStoreTests. Worth a follow-up cleanup but does not affect correctness.

Important Files Changed

Filename	Overview
Cotabby/Support/FileLogHandler.swift	Adds optional `fileURL:` init param (nil → existing default path) and a `nonisolated deinit {}` to prevent the macOS 26 back-deployment executor shim double-free; production behaviour is byte-identical.
Cotabby/Support/LLMIOFileHandler.swift	Mirrors FileLogHandler changes: optional `fileURL:` init param and `nonisolated deinit {}`. Default production path unchanged.
Cotabby/Services/Focus/DeepGeometryWalkThrottle.swift	Adds `nonisolated deinit {}` only; no logic change. Prevents macOS 26 crash when test-scoped resolver instances are released.
Cotabby/Services/Focus/FieldStyleCache.swift	Adds `nonisolated deinit {}` only; all stored state is value types, safe to release off-actor.
Cotabby/Services/Suggestion/SuggestionDebugLogger.swift	Adds `nonisolated deinit {}` only. Stored state is a `Bool` and an optional `String`, both safe to release from any thread.
CotabbyTests/SuggestionCoordinatorTestSupport.swift	Introduces shared coordinator rig doubles and the `waitUntil` polling helper. All rig types are explicit `@MainActor` (safe deallocation). `waitUntil` calls `XCTFail` on timeout but does not stop execution, so a single debounce regression can cascade into multiple misleading assertion failures.
CotabbyTests/FileLogHandlerTests.swift	Good coverage of JSONL shape, metadata collision, rotation, and append-across-instances. The append test relies on implicit ARC deallocation ordering for a shared file path (noted in thread).
CotabbyTests/SuggestionSettingsModelTests.swift	Comprehensive facade coverage: persistence round-trips, keybinding conflicts, power-profile seeding, disabled-app rules, and snapshot deduplication. Uses isolated UserDefaults suites with proper tearDown cleanup.
CotabbyTests/SuggestionEngineRouterTests.swift	Good routing, fallback, cancellation, and metric coverage. Uses process-lifetime static retain to avoid macOS 26 dealloc crash, but UserDefaults domains created per rig are never removed (unlike PerformanceMetricsStoreTests which cleans up in tearDown).
CotabbyTests/FocusSnapshotResolverLiveTests.swift	End-to-end live AX resolver coverage against real AppKit fields. `XCTSkip` properly guards unavailable AX environments. Tests each set their required state independently, so order sensitivity is not an issue.
CotabbyTests/SuggestionCoordinatorPredictionTests.swift	Exercises debounce, freshness gates, engine dispatch, and overlay teardown. Rig instances are stored in an instance-scoped array cleared in tearDown; coordinator and work controller are explicitly @mainactor so deallocation is safe.
CotabbyTests/RuntimeBootstrapModelTests.swift	Selection persistence and reconciliation tested against throwaway model directories. Correctly uses process-lifetime retain for @MainActor-isolated manager and properly cleans up both temp directories and UserDefaults suites in tearDown.
CotabbyTests/AXHelperTests.swift	Live AX bridging tests against the test host's own process. Window is built once per suite (static class state), XCTSkip guards headless environments. RunLoop spinning pattern is correct for waiting on AX tree registration.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Production["Production changes (test seams only)"]
        FLW["FileLogWriter\n+ fileURL: init param\n+ nonisolated deinit {}"]
        LLMIOW["LLMIOFileWriter\n+ fileURL: init param\n+ nonisolated deinit {}"]
        SDL["SuggestionDebugLogger\n+ nonisolated deinit {}"]
        FSC["FieldStyleCache\n+ nonisolated deinit {}"]
        DGWT["DeepGeometryWalkThrottle\n+ nonisolated deinit {}"]
    end

    subgraph TestHarness["CoordinatorRig (new shared rig)"]
        CRig["makeCoordinatorRig()"]
        CRig --> PP["RigPermissionProvider"]
        CRig --> FP["RigFocusProvider"]
        CRig --> IM["RigInputMonitor"]
        CRig --> OC["RigOverlayController"]
        CRig --> ENG["RigSuggestionEngine"]
        CRig --> COORD["SuggestionCoordinator"]
    end

    subgraph TestSuites["New test suites"]
        PRED["SuggestionCoordinatorPredictionTests\n(debounce / freshness gates)"]
        INPUT["SuggestionCoordinatorInputTests\n(key routing / focus / permissions)"]
        LIFE["SuggestionCoordinatorLifecycleTests\n(start / stop / settings reaction)"]
        ROUTER["SuggestionEngineRouterTests\n(per-engine dispatch / fallback / metrics)"]
        SETTINGS["SuggestionSettingsModelTests\n(persistence round-trips / policies)"]
        AX["AXHelperTests + FocusSnapshotResolverLiveTests\n(live AppKit AX; XCTSkip on headless)"]
        LOGS["FileLogHandlerTests + LLMIOFileHandlerTests\n(JSONL shape / rotation)"]
    end

    CRig -->|drives| PRED
    CRig -->|drives| INPUT
    CRig -->|drives| LIFE
    FLW -->|injected via fileURL| LOGS
    LLMIOW -->|injected via fileURL| LOGS
    COORD --> ROUTER
    COORD --> SETTINGS

_{Reviews (2): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile}

…96%) Adds 362 tests (1049 to 1411) across 32 new and 31 extended suites, lifting whole-target line coverage from 31% to 42.4%: Models 58% to 97.1%, Support 83% to 96.0%, App 44% to 69.5%, Services 21% to 35.5%. The coordinator state machine now has dedicated prediction/input/lifecycle suites built on a shared recording rig, the settings facade has a full persistence round-trip lock, and AXHelper plus FocusSnapshotResolver run live against the test host's own AX tree (XCTSkip-gated where self-process AX is unavailable). Production changes are test seams only: injectable log-file URLs for the two JSONL writers (defaults unchanged), and nonisolated deinit on five classes that the app target's default MainActor isolation otherwise routes through the back-deployment executor shim, which double-frees in StopLookupScope on macOS 26 the first time anything deallocates them (tests do; the production singletons never did).

greptile-apps · 2026-06-12T00:45:53Z

+    while !condition() {
+        guard Date() < deadline else {
+            XCTFail(message(), file: file, line: line)
+            return
+        }
+        try? await Task.sleep(nanoseconds: 5_000_000)
+    }
+}


waitUntil does not halt execution after timeout

XCTFail marks the test as failed but does not stop execution. In any caller that follows an await waitUntil(...) call with further assertions on state that was never reached, those assertions will run against a default/nil/zero value and either silently pass (masking the real failure) or produce a second, confusing XCTFail whose message gives no indication that a timeout was the root cause. The coordinator tests use this helper extensively, so a single debounce regression could produce 5–10 unrelated-looking assertion failures instead of one clear timeout message.

Consider returning Bool and having callers XCTAssert the result, or throwing a custom error so callers can try-chain and early-exit.

greptile-apps · 2026-06-12T00:45:54Z

+
+        let url = try XCTUnwrap(writer.fileURL)
+        let rotatedURL = url.deletingPathExtension().appendingPathExtension("jsonl.1")
+        XCTAssertEqual(lines(of: url), ["fresh"])
+        XCTAssertEqual(lines(of: rotatedURL).count, 2)
+        XCTAssertTrue(lines(of: rotatedURL).allSatisfy { $0 == String(repeating: "a", count: 40) })
+    }
+
+    func test_writer_appendsAcrossInstancesLikeARelaunch() throws {
+        let url = directory.appendingPathComponent("cotabby.jsonl")
+        FileLogWriter(sizeCapBytes: nil, fileURL: url).write("first\n")
+


Append test relies on implicit ARC deallocation timing

FileLogWriter(sizeCapBytes: nil, fileURL: url).write("first\n") creates a temporary whose FileHandle must be closed before the second writer calls openHandle() and seeks to end. With nonisolated deinit {}, Swift's ARC releases the handle synchronously at the end of the statement — reliable today. However, if this test ever becomes async or runs inside a Task, the runtime can extend temporary lifetimes across suspension points, leaving the first handle open when the second writer opens the same file. The second seekToEnd() would then return the offset already at "first\n" and the append assertion would still pass, but a subsequent rotation-boundary test on the same file could silently see wrong data. An explicit do { let w = ...; w.write(...) } scope makes the lifetime contract visible and future-proof.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

# Conflicts: # Cotabby.xcodeproj/project.pbxproj

greptile-apps Bot reviewed Jun 12, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into test-coverage

c52659a

# Conflicts: # Cotabby.xcodeproj/project.pbxproj

FuJacob merged commit c458da3 into main Jun 12, 2026
4 checks passed

FuJacob mentioned this pull request Jun 12, 2026

Eliminate all 68 Swift 6 forward-compat build warnings #684

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Raise deterministic-core test coverage to 95%+ (Models 97.1%, Support 96.0%)#678

Raise deterministic-core test coverage to 95%+ (Models 97.1%, Support 96.0%)#678
FuJacob merged 2 commits into
mainfrom
test-coverage

FuJacob commented Jun 12, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

greptile-apps Bot Jun 12, 2026

Uh oh!

greptile-apps Bot Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

FuJacob commented Jun 12, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Linked issues

Risk / rollout notes

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

FuJacob commented Jun 12, 2026 •

edited by greptile-apps Bot

Loading