Skip to content

Raise deterministic-core test coverage to 95%+ (Models 97.1%, Support 96.0%)#678

Merged
FuJacob merged 2 commits into
mainfrom
test-coverage
Jun 12, 2026
Merged

Raise deterministic-core test coverage to 95%+ (Models 97.1%, Support 96.0%)#678
FuJacob merged 2 commits into
mainfrom
test-coverage

Conversation

@FuJacob

@FuJacob FuJacob commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Summary

Builds the regression net the suggestion pipeline has been missing: 362 new tests (1,049 to 1,411) across 32 new and 31 extended suites, taking whole-target line coverage from 31% to 42.4% and the deterministic core past the 95% bar. Coverage was driven from a per-file xccov gap analysis of a baseline run, so every suite targets lines that were actually unexercised, with behavioral assertions (exact outputs, persisted state, which boundary got poked) rather than call-only coverage.

Area Before After
Cotabby/Models 58.4% 97.1%
Cotabby/Support 82.8% 96.0%
Cotabby/App (coordinators) 44.1% 69.5%
Cotabby/Services 21.4% 35.5%
Cotabby/UI (SwiftUI) 2.9% 2.9% (out of scope)
Whole target 31.0% 42.4%

What the new nets actually lock:

  • The coordinator state machine end to end through a shared recording rig (SuggestionCoordinatorTestSupport): debounce, request build, engine dispatch, every freshness gate in apply (stale generation, empty result, selected text, acceptance echo), the typo gate's suppress/offer/auto-fix routes, input-event routing, focus/permission/settings reactions, and lifecycle teardown.
  • The settings facade (SuggestionSettingsModel): a full setter persistence round-trip through a fresh model on the same defaults suite, keybinding conflict rules, power-profile seeding, normalization funnels, clamps, and the snapshot publisher's dedupe.
  • Live AX against the test host's own process: AXHelper (4.6% to 71.9%) and FocusSnapshotResolver (0.7% and now exercised end to end on real AppKit text fields, secure fields included). These suites XCTSkip cleanly where self-process AX is unavailable, so CI stays green either way.
  • Engine routing (SuggestionEngineRouter): per-engine dispatch, the Apple Intelligence locale-failure fallback to llama, cancellation passthrough, and performance-metric recording.
  • The JSONL debug sinks: record shape (the jq contract in CLAUDE.md), metadata flattening, and the 10 MB one-step rotation that preserves history.

Remaining gaps are deliberate, not silent: UI/ (13.6k lines of SwiftUI, needs UI-test infrastructure), live-system boundaries in Services/ (CGEvent taps, screenshots, the llama runtime, window controllers), AXHelper's Chromium text-marker internals (need a live browser), and the OS-gated SystemAvailabilityProvider.

Validation

xcodebuild test -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' \
  -skip-testing:CotabbyTests/FoundationModelDriftEvalTests \
  CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO
# ** TEST SUCCEEDED ** 1411 tests, 0 failures, 0 crashes

# Coverage measured with -enableCodeCoverage YES on the same run; table above is from
# xcrun xccov view --report on the result bundle.

swiftlint lint --quiet
# 0 findings on the files this PR touches

The five new tests that touch live AX run for real on a trusted machine (verified locally) and skip on untrusted/headless runners.

Linked issues

None filed; this is proactive hardening so regressions like the recent caret/acceptance ones get caught at test time.

Risk / rollout notes

  • Production changes are test seams only (+37/-4 across 5 files): FileLogWriter/LLMIOFileWriter gain an optional fileURL: init parameter (default behavior byte-identical), and five classes gain nonisolated deinit {} (SuggestionDebugLogger, FileLogWriter, LLMIOFileWriter, FieldStyleCache, DeepGeometryWalkThrottle).
  • Why the deinits matter beyond tests: the app target's SWIFT_DEFAULT_ACTOR_ISOLATION: MainActor makes even unannotated classes main-actor-isolated, so their deallocation routes through the back-deployment executor shim, which double-frees in StopLookupScope on macOS 26 (confirmed via crash reports during this work; same root cause as the existing InputSuppressionController fix). Production never hit it only because these objects are app-lifetime singletons. Any future code that deallocates one would crash; now it cannot.
  • New tests follow the repo's determinism rules: suite-scoped UserDefaults, temp-dir file I/O, no network, no machine-dependent assertions (system samplers assert sane bounds; spell-checker tests use words NSSpellChecker classifies stably).
  • Suite wall time grows by roughly 30 seconds locally; the live-AX suites build one offscreen-ish window each, once per suite.
  • project.pbxproj regenerated by XcodeGen (test-file additions only).

Greptile Summary

This PR adds 362 new tests (1,049 → 1,411) to lock the deterministic core of the suggestion pipeline, raising Cotabby/Models coverage to 97.1% and Cotabby/Support to 96.0%. Production changes are five nonisolated deinit {} additions and two fileURL: init parameters that are strictly test-seam work with byte-identical default behaviour.

  • Source changes (+37/-4 across 5 files): FileLogWriter and LLMIOFileWriter gain an optional fileURL: override for testable I/O; SuggestionDebugLogger, FieldStyleCache, and DeepGeometryWalkThrottle gain empty nonisolated deinit {} to prevent the macOS 26 back-deployment executor shim double-free when test-scoped instances are released.
  • New coordinator test harness (SuggestionCoordinatorTestSupport): a shared CoordinatorRig struct wires all major protocol doubles so the prediction, input, and lifecycle suites each assert which boundary was poked rather than just that a call happened.
  • Live AX suites (AXHelperTests, FocusSnapshotResolverLiveTests): run against the test host's own AppKit windows and XCTSkip cleanly on untrusted/headless runners, keeping CI green.

Confidence Score: 5/5

Safe to merge. The five production-file changes are minimal test seams with no behavioural effect on the default code paths, and the test suite grew by 362 assertions that all pass.

Production changes are limited to two optional init parameters (default value preserves existing behaviour) and five empty nonisolated deinit {} bodies that only affect deallocation order for objects that are app-lifetime singletons in production. All new tests are deterministic, use isolated UserDefaults suites or temp directories, and the live-AX cases skip cleanly on headless CI. No logic changes to the suggestion pipeline itself.

CotabbyTests/SuggestionEngineRouterTests.swift — UserDefaults domains created per rig are never removed in tearDown, unlike the analogous pattern in PerformanceMetricsStoreTests. Worth a follow-up cleanup but does not affect correctness.

Important Files Changed

Filename Overview
Cotabby/Support/FileLogHandler.swift Adds optional fileURL: init param (nil → existing default path) and a nonisolated deinit {} to prevent the macOS 26 back-deployment executor shim double-free; production behaviour is byte-identical.
Cotabby/Support/LLMIOFileHandler.swift Mirrors FileLogHandler changes: optional fileURL: init param and nonisolated deinit {}. Default production path unchanged.
Cotabby/Services/Focus/DeepGeometryWalkThrottle.swift Adds nonisolated deinit {} only; no logic change. Prevents macOS 26 crash when test-scoped resolver instances are released.
Cotabby/Services/Focus/FieldStyleCache.swift Adds nonisolated deinit {} only; all stored state is value types, safe to release off-actor.
Cotabby/Services/Suggestion/SuggestionDebugLogger.swift Adds nonisolated deinit {} only. Stored state is a Bool and an optional String, both safe to release from any thread.
CotabbyTests/SuggestionCoordinatorTestSupport.swift Introduces shared coordinator rig doubles and the waitUntil polling helper. All rig types are explicit @MainActor (safe deallocation). waitUntil calls XCTFail on timeout but does not stop execution, so a single debounce regression can cascade into multiple misleading assertion failures.
CotabbyTests/FileLogHandlerTests.swift Good coverage of JSONL shape, metadata collision, rotation, and append-across-instances. The append test relies on implicit ARC deallocation ordering for a shared file path (noted in thread).
CotabbyTests/SuggestionSettingsModelTests.swift Comprehensive facade coverage: persistence round-trips, keybinding conflicts, power-profile seeding, disabled-app rules, and snapshot deduplication. Uses isolated UserDefaults suites with proper tearDown cleanup.
CotabbyTests/SuggestionEngineRouterTests.swift Good routing, fallback, cancellation, and metric coverage. Uses process-lifetime static retain to avoid macOS 26 dealloc crash, but UserDefaults domains created per rig are never removed (unlike PerformanceMetricsStoreTests which cleans up in tearDown).
CotabbyTests/FocusSnapshotResolverLiveTests.swift End-to-end live AX resolver coverage against real AppKit fields. XCTSkip properly guards unavailable AX environments. Tests each set their required state independently, so order sensitivity is not an issue.
CotabbyTests/SuggestionCoordinatorPredictionTests.swift Exercises debounce, freshness gates, engine dispatch, and overlay teardown. Rig instances are stored in an instance-scoped array cleared in tearDown; coordinator and work controller are explicitly @mainactor so deallocation is safe.
CotabbyTests/RuntimeBootstrapModelTests.swift Selection persistence and reconciliation tested against throwaway model directories. Correctly uses process-lifetime retain for @MainActor-isolated manager and properly cleans up both temp directories and UserDefaults suites in tearDown.
CotabbyTests/AXHelperTests.swift Live AX bridging tests against the test host's own process. Window is built once per suite (static class state), XCTSkip guards headless environments. RunLoop spinning pattern is correct for waiting on AX tree registration.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Production["Production changes (test seams only)"]
        FLW["FileLogWriter\n+ fileURL: init param\n+ nonisolated deinit {}"]
        LLMIOW["LLMIOFileWriter\n+ fileURL: init param\n+ nonisolated deinit {}"]
        SDL["SuggestionDebugLogger\n+ nonisolated deinit {}"]
        FSC["FieldStyleCache\n+ nonisolated deinit {}"]
        DGWT["DeepGeometryWalkThrottle\n+ nonisolated deinit {}"]
    end

    subgraph TestHarness["CoordinatorRig (new shared rig)"]
        CRig["makeCoordinatorRig()"]
        CRig --> PP["RigPermissionProvider"]
        CRig --> FP["RigFocusProvider"]
        CRig --> IM["RigInputMonitor"]
        CRig --> OC["RigOverlayController"]
        CRig --> ENG["RigSuggestionEngine"]
        CRig --> COORD["SuggestionCoordinator"]
    end

    subgraph TestSuites["New test suites"]
        PRED["SuggestionCoordinatorPredictionTests\n(debounce / freshness gates)"]
        INPUT["SuggestionCoordinatorInputTests\n(key routing / focus / permissions)"]
        LIFE["SuggestionCoordinatorLifecycleTests\n(start / stop / settings reaction)"]
        ROUTER["SuggestionEngineRouterTests\n(per-engine dispatch / fallback / metrics)"]
        SETTINGS["SuggestionSettingsModelTests\n(persistence round-trips / policies)"]
        AX["AXHelperTests + FocusSnapshotResolverLiveTests\n(live AppKit AX; XCTSkip on headless)"]
        LOGS["FileLogHandlerTests + LLMIOFileHandlerTests\n(JSONL shape / rotation)"]
    end

    CRig -->|drives| PRED
    CRig -->|drives| INPUT
    CRig -->|drives| LIFE
    FLW -->|injected via fileURL| LOGS
    LLMIOW -->|injected via fileURL| LOGS
    COORD --> ROUTER
    COORD --> SETTINGS
Loading

Fix All in Codex Fix All in Claude Code

Reviews (2): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile

…96%)

Adds 362 tests (1049 to 1411) across 32 new and 31 extended suites, lifting
whole-target line coverage from 31% to 42.4%: Models 58% to 97.1%, Support
83% to 96.0%, App 44% to 69.5%, Services 21% to 35.5%. The coordinator state
machine now has dedicated prediction/input/lifecycle suites built on a shared
recording rig, the settings facade has a full persistence round-trip lock,
and AXHelper plus FocusSnapshotResolver run live against the test host's own
AX tree (XCTSkip-gated where self-process AX is unavailable).

Production changes are test seams only: injectable log-file URLs for the two
JSONL writers (defaults unchanged), and nonisolated deinit on five classes
that the app target's default MainActor isolation otherwise routes through
the back-deployment executor shim, which double-frees in StopLookupScope on
macOS 26 the first time anything deallocates them (tests do; the production
singletons never did).
Comment on lines +278 to +285
while !condition() {
guard Date() < deadline else {
XCTFail(message(), file: file, line: line)
return
}
try? await Task.sleep(nanoseconds: 5_000_000)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 waitUntil does not halt execution after timeout

XCTFail marks the test as failed but does not stop execution. In any caller that follows an await waitUntil(...) call with further assertions on state that was never reached, those assertions will run against a default/nil/zero value and either silently pass (masking the real failure) or produce a second, confusing XCTFail whose message gives no indication that a timeout was the root cause. The coordinator tests use this helper extensively, so a single debounce regression could produce 5–10 unrelated-looking assertion failures instead of one clear timeout message.

Consider returning Bool and having callers XCTAssert the result, or throwing a custom error so callers can try-chain and early-exit.

Fix in Codex Fix in Claude Code

Comment on lines +102 to +113

let url = try XCTUnwrap(writer.fileURL)
let rotatedURL = url.deletingPathExtension().appendingPathExtension("jsonl.1")
XCTAssertEqual(lines(of: url), ["fresh"])
XCTAssertEqual(lines(of: rotatedURL).count, 2)
XCTAssertTrue(lines(of: rotatedURL).allSatisfy { $0 == String(repeating: "a", count: 40) })
}

func test_writer_appendsAcrossInstancesLikeARelaunch() throws {
let url = directory.appendingPathComponent("cotabby.jsonl")
FileLogWriter(sizeCapBytes: nil, fileURL: url).write("first\n")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Append test relies on implicit ARC deallocation timing

FileLogWriter(sizeCapBytes: nil, fileURL: url).write("first\n") creates a temporary whose FileHandle must be closed before the second writer calls openHandle() and seeks to end. With nonisolated deinit {}, Swift's ARC releases the handle synchronously at the end of the statement — reliable today. However, if this test ever becomes async or runs inside a Task, the runtime can extend temporary lifetimes across suspension points, leaving the first handle open when the second writer opens the same file. The second seekToEnd() would then return the offset already at "first\n" and the append assertion would still pass, but a subsequent rotation-boundary test on the same file could silently see wrong data. An explicit do { let w = ...; w.write(...) } scope makes the lifetime contract visible and future-proof.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Codex Fix in Claude Code

# Conflicts:
#	Cotabby.xcodeproj/project.pbxproj
@FuJacob FuJacob merged commit c458da3 into main Jun 12, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant