Raise deterministic-core test coverage to 95%+ (Models 97.1%, Support 96.0%)#678
Conversation
…96%) Adds 362 tests (1049 to 1411) across 32 new and 31 extended suites, lifting whole-target line coverage from 31% to 42.4%: Models 58% to 97.1%, Support 83% to 96.0%, App 44% to 69.5%, Services 21% to 35.5%. The coordinator state machine now has dedicated prediction/input/lifecycle suites built on a shared recording rig, the settings facade has a full persistence round-trip lock, and AXHelper plus FocusSnapshotResolver run live against the test host's own AX tree (XCTSkip-gated where self-process AX is unavailable). Production changes are test seams only: injectable log-file URLs for the two JSONL writers (defaults unchanged), and nonisolated deinit on five classes that the app target's default MainActor isolation otherwise routes through the back-deployment executor shim, which double-frees in StopLookupScope on macOS 26 the first time anything deallocates them (tests do; the production singletons never did).
| while !condition() { | ||
| guard Date() < deadline else { | ||
| XCTFail(message(), file: file, line: line) | ||
| return | ||
| } | ||
| try? await Task.sleep(nanoseconds: 5_000_000) | ||
| } | ||
| } |
There was a problem hiding this comment.
waitUntil does not halt execution after timeout
XCTFail marks the test as failed but does not stop execution. In any caller that follows an await waitUntil(...) call with further assertions on state that was never reached, those assertions will run against a default/nil/zero value and either silently pass (masking the real failure) or produce a second, confusing XCTFail whose message gives no indication that a timeout was the root cause. The coordinator tests use this helper extensively, so a single debounce regression could produce 5–10 unrelated-looking assertion failures instead of one clear timeout message.
Consider returning Bool and having callers XCTAssert the result, or throwing a custom error so callers can try-chain and early-exit.
|
|
||
| let url = try XCTUnwrap(writer.fileURL) | ||
| let rotatedURL = url.deletingPathExtension().appendingPathExtension("jsonl.1") | ||
| XCTAssertEqual(lines(of: url), ["fresh"]) | ||
| XCTAssertEqual(lines(of: rotatedURL).count, 2) | ||
| XCTAssertTrue(lines(of: rotatedURL).allSatisfy { $0 == String(repeating: "a", count: 40) }) | ||
| } | ||
|
|
||
| func test_writer_appendsAcrossInstancesLikeARelaunch() throws { | ||
| let url = directory.appendingPathComponent("cotabby.jsonl") | ||
| FileLogWriter(sizeCapBytes: nil, fileURL: url).write("first\n") | ||
|
|
There was a problem hiding this comment.
Append test relies on implicit ARC deallocation timing
FileLogWriter(sizeCapBytes: nil, fileURL: url).write("first\n") creates a temporary whose FileHandle must be closed before the second writer calls openHandle() and seeks to end. With nonisolated deinit {}, Swift's ARC releases the handle synchronously at the end of the statement — reliable today. However, if this test ever becomes async or runs inside a Task, the runtime can extend temporary lifetimes across suspension points, leaving the first handle open when the second writer opens the same file. The second seekToEnd() would then return the offset already at "first\n" and the append assertion would still pass, but a subsequent rotation-boundary test on the same file could silently see wrong data. An explicit do { let w = ...; w.write(...) } scope makes the lifetime contract visible and future-proof.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
# Conflicts: # Cotabby.xcodeproj/project.pbxproj
Summary
Builds the regression net the suggestion pipeline has been missing: 362 new tests (1,049 to 1,411) across 32 new and 31 extended suites, taking whole-target line coverage from 31% to 42.4% and the deterministic core past the 95% bar. Coverage was driven from a per-file
xccovgap analysis of a baseline run, so every suite targets lines that were actually unexercised, with behavioral assertions (exact outputs, persisted state, which boundary got poked) rather than call-only coverage.Cotabby/ModelsCotabby/SupportCotabby/App(coordinators)Cotabby/ServicesCotabby/UI(SwiftUI)What the new nets actually lock:
SuggestionCoordinatorTestSupport): debounce, request build, engine dispatch, every freshness gate inapply(stale generation, empty result, selected text, acceptance echo), the typo gate's suppress/offer/auto-fix routes, input-event routing, focus/permission/settings reactions, and lifecycle teardown.SuggestionSettingsModel): a full setter persistence round-trip through a fresh model on the same defaults suite, keybinding conflict rules, power-profile seeding, normalization funnels, clamps, and the snapshot publisher's dedupe.AXHelper(4.6% to 71.9%) andFocusSnapshotResolver(0.7% and now exercised end to end on real AppKit text fields, secure fields included). These suitesXCTSkipcleanly where self-process AX is unavailable, so CI stays green either way.SuggestionEngineRouter): per-engine dispatch, the Apple Intelligence locale-failure fallback to llama, cancellation passthrough, and performance-metric recording.jqcontract in CLAUDE.md), metadata flattening, and the 10 MB one-step rotation that preserves history.Remaining gaps are deliberate, not silent:
UI/(13.6k lines of SwiftUI, needs UI-test infrastructure), live-system boundaries inServices/(CGEvent taps, screenshots, the llama runtime, window controllers), AXHelper's Chromium text-marker internals (need a live browser), and the OS-gatedSystemAvailabilityProvider.Validation
The five new tests that touch live AX run for real on a trusted machine (verified locally) and skip on untrusted/headless runners.
Linked issues
None filed; this is proactive hardening so regressions like the recent caret/acceptance ones get caught at test time.
Risk / rollout notes
FileLogWriter/LLMIOFileWritergain an optionalfileURL:init parameter (default behavior byte-identical), and five classes gainnonisolated deinit {}(SuggestionDebugLogger,FileLogWriter,LLMIOFileWriter,FieldStyleCache,DeepGeometryWalkThrottle).SWIFT_DEFAULT_ACTOR_ISOLATION: MainActormakes even unannotated classes main-actor-isolated, so their deallocation routes through the back-deployment executor shim, which double-frees inStopLookupScopeon macOS 26 (confirmed via crash reports during this work; same root cause as the existingInputSuppressionControllerfix). Production never hit it only because these objects are app-lifetime singletons. Any future code that deallocates one would crash; now it cannot.UserDefaults, temp-dir file I/O, no network, no machine-dependent assertions (system samplers assert sane bounds; spell-checker tests use words NSSpellChecker classifies stably).project.pbxprojregenerated by XcodeGen (test-file additions only).Greptile Summary
This PR adds 362 new tests (1,049 → 1,411) to lock the deterministic core of the suggestion pipeline, raising
Cotabby/Modelscoverage to 97.1% andCotabby/Supportto 96.0%. Production changes are fivenonisolated deinit {}additions and twofileURL:init parameters that are strictly test-seam work with byte-identical default behaviour.+37/-4across 5 files):FileLogWriterandLLMIOFileWritergain an optionalfileURL:override for testable I/O;SuggestionDebugLogger,FieldStyleCache, andDeepGeometryWalkThrottlegain emptynonisolated deinit {}to prevent the macOS 26 back-deployment executor shim double-free when test-scoped instances are released.SuggestionCoordinatorTestSupport): a sharedCoordinatorRigstruct wires all major protocol doubles so the prediction, input, and lifecycle suites each assert which boundary was poked rather than just that a call happened.AXHelperTests,FocusSnapshotResolverLiveTests): run against the test host's own AppKit windows andXCTSkipcleanly on untrusted/headless runners, keeping CI green.Confidence Score: 5/5
Safe to merge. The five production-file changes are minimal test seams with no behavioural effect on the default code paths, and the test suite grew by 362 assertions that all pass.
Production changes are limited to two optional init parameters (default value preserves existing behaviour) and five empty
nonisolated deinit {}bodies that only affect deallocation order for objects that are app-lifetime singletons in production. All new tests are deterministic, use isolated UserDefaults suites or temp directories, and the live-AX cases skip cleanly on headless CI. No logic changes to the suggestion pipeline itself.CotabbyTests/SuggestionEngineRouterTests.swift — UserDefaults domains created per rig are never removed in tearDown, unlike the analogous pattern in PerformanceMetricsStoreTests. Worth a follow-up cleanup but does not affect correctness.
Important Files Changed
fileURL:init param (nil → existing default path) and anonisolated deinit {}to prevent the macOS 26 back-deployment executor shim double-free; production behaviour is byte-identical.fileURL:init param andnonisolated deinit {}. Default production path unchanged.nonisolated deinit {}only; no logic change. Prevents macOS 26 crash when test-scoped resolver instances are released.nonisolated deinit {}only; all stored state is value types, safe to release off-actor.nonisolated deinit {}only. Stored state is aBooland an optionalString, both safe to release from any thread.waitUntilpolling helper. All rig types are explicit@MainActor(safe deallocation).waitUntilcallsXCTFailon timeout but does not stop execution, so a single debounce regression can cascade into multiple misleading assertion failures.XCTSkipproperly guards unavailable AX environments. Tests each set their required state independently, so order sensitivity is not an issue.Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD subgraph Production["Production changes (test seams only)"] FLW["FileLogWriter\n+ fileURL: init param\n+ nonisolated deinit {}"] LLMIOW["LLMIOFileWriter\n+ fileURL: init param\n+ nonisolated deinit {}"] SDL["SuggestionDebugLogger\n+ nonisolated deinit {}"] FSC["FieldStyleCache\n+ nonisolated deinit {}"] DGWT["DeepGeometryWalkThrottle\n+ nonisolated deinit {}"] end subgraph TestHarness["CoordinatorRig (new shared rig)"] CRig["makeCoordinatorRig()"] CRig --> PP["RigPermissionProvider"] CRig --> FP["RigFocusProvider"] CRig --> IM["RigInputMonitor"] CRig --> OC["RigOverlayController"] CRig --> ENG["RigSuggestionEngine"] CRig --> COORD["SuggestionCoordinator"] end subgraph TestSuites["New test suites"] PRED["SuggestionCoordinatorPredictionTests\n(debounce / freshness gates)"] INPUT["SuggestionCoordinatorInputTests\n(key routing / focus / permissions)"] LIFE["SuggestionCoordinatorLifecycleTests\n(start / stop / settings reaction)"] ROUTER["SuggestionEngineRouterTests\n(per-engine dispatch / fallback / metrics)"] SETTINGS["SuggestionSettingsModelTests\n(persistence round-trips / policies)"] AX["AXHelperTests + FocusSnapshotResolverLiveTests\n(live AppKit AX; XCTSkip on headless)"] LOGS["FileLogHandlerTests + LLMIOFileHandlerTests\n(JSONL shape / rotation)"] end CRig -->|drives| PRED CRig -->|drives| INPUT CRig -->|drives| LIFE FLW -->|injected via fileURL| LOGS LLMIOW -->|injected via fileURL| LOGS COORD --> ROUTER COORD --> SETTINGSReviews (2): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile