
feat(settings): make token-by-token streaming reveal opt-in (default off)#692

Merged
FuJacob merged 2 commits into main from fix/suggestion-streaming-char-by-char on Jun 12, 2026

@FuJacob (Owner) commented Jun 12, 2026

Summary

Suggestions were appearing token-by-token (read as "character by character") because PR #687 streams ghost text live as the model decodes. This adds a "Stream Suggestions While Generating" toggle (Appearance → Display), defaulting off, so suggestions appear once, fully formed, after generation finishes. Power users can opt back into the live streaming reveal.

The gate is at the prediction dispatch: when streaming is off, no onPartial handler is passed to the engine, so the engine skips its per-token main-actor hops entirely and the suggestion is presented once through apply(). When on, the existing streamed-partial behavior (each partial rendered as an acceptable session you can Tab into early) is preserved unchanged.
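A minimal sketch of the gate described above, assuming an engine whose partial callback is optional. `SketchEngine`, `runGeneration`, and the token list are hypothetical stand-ins for illustration, not the app's actual types or signatures.

```swift
// Hypothetical stand-in for the real engine; names and signatures are assumptions.
struct SketchEngine {
    /// Accumulates `tokens` one by one. `onPartial` fires per token only when non-nil.
    func generateSuggestion(tokens: [String], onPartial: ((String) -> Void)?) -> String {
        var text = ""
        for token in tokens {
            text += token
            onPartial?(text)  // skipped entirely when no handler is wired up
        }
        return text
    }
}

/// Runs one generation and records every reveal the UI would perform.
func runGeneration(streamWhileGenerating: Bool, tokens: [String]) -> [String] {
    var reveals: [String] = []

    // The gate: only build a partial handler when streaming is enabled.
    let onPartial: ((String) -> Void)?
    if streamWhileGenerating {
        onPartial = { partial in reveals.append(partial) }
    } else {
        onPartial = nil
    }

    let result = SketchEngine().generateSuggestion(tokens: tokens, onPartial: onPartial)
    reveals.append(result)  // stands in for the single apply() presentation
    return reveals
}
```

With streaming off, the only reveal is the final text; with it on, each partial is recorded before the final one.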

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build -derivedDataPath build/DerivedData
# ** BUILD SUCCEEDED **

xcodebuild ... test -only-testing:CotabbyTests/SettingsIndexTests \
  -only-testing:CotabbyTests/SuggestionSettingsStoreTests \
  -only-testing:CotabbyTests/SuggestionSettingsModelTests \
  -only-testing:CotabbyTests/SuggestionCoordinatorPredictionTests \
  -only-testing:CotabbyTests/StreamedGhostTextPolicyTests \
  -only-testing:CotabbyTests/LlamaSuggestionEngineStreamingTests \
  CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO
# ** TEST SUCCEEDED **  (all suites, 0 failures)

swiftlint lint --quiet   # exit 0

UI: new toggle in Appearance → Display, under "Suggestion Display".

Linked issues

Risk / rollout notes

  • New persisted setting: UserDefaults key cotabbyStreamSuggestionsWhileGenerating, defaults to false. This is a behavior change vs. current main, where #687 ("perf(stream): render ghost text while the model is still decoding") streams by default. That is the intent: revert the default to all-at-once and make streaming opt-in.
  • Threaded through the full settings stack (data → store load/write-back/save → model @Published/setter/snapshot/Combine publisher → snapshot struct → Appearance UI toggle → search index). The new Combine upstream is grouped into the existing acceptance-toggle slot (CombineLatest3) to stay under Combine's four-input cap.
  • The streaming code path (queueStreamedPartial / applyStreamedPartial / StreamedGhostTextPolicy) is untouched and fully exercised when the toggle is on; the gate only decides whether onPartial is wired up.
  • No project.yml/pbxproj changes.
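For illustration, the persisted default could look roughly like the sketch below. The key string comes from the notes above; `StreamSettingStore` and its `load`/`save` methods are hypothetical, and the real store's load/resolve logic may differ.

```swift
import Foundation

// Key name is taken from the PR text; the type and method names are hypothetical.
struct StreamSettingStore {
    static let key = "cotabbyStreamSuggestionsWhileGenerating"
    let defaults: UserDefaults

    /// `bool(forKey:)` returns false when nothing is stored, which matches the
    /// opt-in default: existing installs stay on the all-at-once reveal.
    func load() -> Bool {
        defaults.bool(forKey: Self.key)
    }

    /// Persists the user's choice.
    func save(_ enabled: Bool) {
        defaults.set(enabled, forKey: Self.key)
    }
}
```

Relying on the absent-key behavior avoids registering a separate default value, at the cost of making `false` the only possible implicit default.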

Greptile Summary

This PR makes token-by-token suggestion streaming opt-in (defaulting off), reversing the behavior introduced in #687. A new "Stream Suggestions While Generating" toggle is added to Appearance → Display, and the gate is implemented by conditionally omitting onPartial from the engine call.

  • Settings stack: streamSuggestionsWhileGenerating is threaded through all layers — SuggestionSettingsData, SuggestionSettingsStore (UserDefaults key cotabbyStreamSuggestionsWhileGenerating, default false), SuggestionSettingsModel (@Published + setter + Combine publisher via CombineLatest3), and SuggestionSettingsSnapshot.
  • Dispatch gate: dispatchGeneration reads the flag from the snapshot on the main actor before creating the work closure, capturing a plain Bool so the streaming decision is stable for the duration of a single generation.
  • UI & search: New toggle in the Display section with accurate "token-by-token" copy; SettingsIndex case with comprehensive search keywords for discoverability.
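The "capture a plain Bool" point can be sketched as follows. `SketchSettings` and `makeWork` are hypothetical; only the name `shouldStreamPartials` appears in the review text, and the real dispatch code is not shown here.

```swift
// Hypothetical mutable settings holder; the real snapshot type lives in the app.
final class SketchSettings {
    var streamSuggestionsWhileGenerating = false
}

/// Reads the flag once, then returns a work closure that only sees the copied value.
func makeWork(settings: SketchSettings) -> () -> Bool {
    // Copied by value here, before the closure is created.
    let shouldStreamPartials = settings.streamSuggestionsWhileGenerating
    // Later changes to `settings` do not affect this closure.
    return { shouldStreamPartials }
}
```

Because `Bool` is a value type, a toggle flipped while a generation is in flight only takes effect on the next dispatch.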

Confidence Score: 5/5

Safe to merge. The change is purely additive — a new opt-in toggle that defaults off — and the existing streaming code path is untouched.

The gate is read once on the main actor before a generation is dispatched, so the streaming decision is stable for the duration of any single generation and cannot change mid-flight. The new setting travels through every layer (data, store, model, snapshot, UI) in a pattern identical to adjacent settings. CombineLatest3 correctly replaces the prior CombineLatest without exceeding Combine's four-input cap, and the test fixture default matches the production default of false.
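A rough sketch of the nesting trick mentioned above, assuming six Boolean inputs. Combine's widest built-in combinator is `CombineLatest4`, so related inputs can share one slot through an inner `CombineLatest3`. `SketchSnapshot`, `Flag`, and the field names are hypothetical; the PR text does not show which publishers share the slot. The `canImport` guard is only there so the sketch compiles where Combine is unavailable.

```swift
#if canImport(Combine)
import Combine

// Hypothetical snapshot with more fields than a single combinator accepts.
struct SketchSnapshot: Equatable {
    var a: Bool, b: Bool, c: Bool
    var acceptX: Bool, acceptY: Bool, stream: Bool
}

typealias Flag = CurrentValueSubject<Bool, Never>

/// Combines six inputs by grouping three of them into one slot of CombineLatest4.
func snapshotPublisher(
    a: Flag, b: Flag, c: Flag,
    acceptX: Flag, acceptY: Flag, stream: Flag
) -> AnyPublisher<SketchSnapshot, Never> {
    // Inner group: occupies a single upstream slot in the outer combinator.
    let grouped = Publishers.CombineLatest3(acceptX, acceptY, stream)
    return Publishers.CombineLatest4(a, b, c, grouped)
        .map { a, b, c, group in
            SketchSnapshot(
                a: a, b: b, c: c,
                acceptX: group.0, acceptY: group.1, stream: group.2
            )
        }
        .eraseToAnyPublisher()
}
#endif
```

The outer combinator still has four inputs, so adding one more setting to the inner group does not require restructuring the rest of the pipeline.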

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/App/Coordinators/SuggestionCoordinator+Prediction.swift | Adds a shouldStreamPartials Bool captured from the snapshot on the main actor before dispatching work; wires it into the onPartial handler decision. Clean gate that preserves the streaming path fully when on. |
| Cotabby/Models/SuggestionEngineModels.swift | Adds streamSuggestionsWhileGenerating: Bool to SuggestionSettingsSnapshot. Well-documented and consistent with adjacent fields. |
| Cotabby/Models/SuggestionSettingsData.swift | Adds streamSuggestionsWhileGenerating: Bool to the durable data value type. Single initialization site updated in SuggestionSettingsStore; no default needed. |
| Cotabby/Models/SuggestionSettingsModel.swift | Threads the new @Published property through init, snapshot construction, setter, and Combine publisher (replacing CombineLatest with CombineLatest3 to stay within the four-input cap). Correct and consistent with the existing pattern. |
| Cotabby/Support/SuggestionSettingsStore.swift | Adds streamWhileGeneratingDefaultsKey, load/resolve, and save logic. UserDefaults key cotabbyStreamSuggestionsWhileGenerating defaults to false. Consistent with adjacent settings. |
| Cotabby/UI/Settings/Panes/AppearancePaneView.swift | Adds streamWhileGeneratingBinding and its Toggle into the Display section. Description text says "token-by-token" which is accurate. |
| Cotabby/UI/Settings/SettingsIndex.swift | Adds streamWhileGenerating case with correct pane routing, icon, and search keywords. Thorough keyword set for discoverability. |
| CotabbyTests/CotabbyTestFixtures.swift | Adds streamSuggestionsWhileGenerating: Bool = false default parameter to the snapshot fixture factory. Default matches production default. |

Sequence Diagram

sequenceDiagram
    participant User
    participant AppearancePaneView
    participant SuggestionSettingsModel
    participant SuggestionSettingsStore
    participant SuggestionCoordinator
    participant SuggestionEngine

    User->>AppearancePaneView: Toggle "Stream Suggestions While Generating"
    AppearancePaneView->>SuggestionSettingsModel: setStreamSuggestionsWhileGenerating(enabled)
    SuggestionSettingsModel->>SuggestionSettingsStore: saveStreamSuggestionsWhileGenerating(enabled)
    SuggestionSettingsStore->>SuggestionSettingsStore: userDefaults.set(enabled, forKey:)
    SuggestionSettingsModel->>SuggestionSettingsModel: Combine publisher fires → snapshot updated

    Note over SuggestionCoordinator: On next keystroke / focus event
    SuggestionCoordinator->>SuggestionCoordinator: dispatchGeneration() reads settingsSnapshot.streamSuggestionsWhileGenerating

    alt "streamSuggestionsWhileGenerating == true"
        SuggestionCoordinator->>SuggestionEngine: generateSuggestion(onPartial: queueStreamedPartial)
        SuggestionEngine-->>SuggestionCoordinator: onPartial(partial) per token
        SuggestionCoordinator->>SuggestionCoordinator: queueStreamedPartial → render ghost text live
        SuggestionEngine-->>SuggestionCoordinator: final result
        SuggestionCoordinator->>SuggestionCoordinator: apply(result)
    else "streamSuggestionsWhileGenerating == false (default)"
        SuggestionCoordinator->>SuggestionEngine: generateSuggestion(onPartial: nil)
        SuggestionEngine-->>SuggestionCoordinator: final result (no per-token hops)
        SuggestionCoordinator->>SuggestionCoordinator: apply(result) — one-shot reveal
    end

Reviews (2): Last reviewed commit: "docs(settings): describe streaming revea..."

…off)

PR #687 added streaming ghost text that reveals a suggestion token-by-token as
the model decodes. Some users read the incremental reveal as the suggestion
"coming out character by character" and prefer it to appear once, fully formed.

Add a "Stream Suggestions While Generating" toggle (Appearance > Display),
defaulting off. When off, the prediction path passes no onPartial handler, so
the engine skips its per-token main-actor hops and the suggestion is presented
once through apply(). When on, the existing streamed-partial behavior (each
partial an acceptable session you can Tab into early) is preserved.
Comment thread Cotabby/UI/Settings/Panes/AppearancePaneView.swift Outdated
…-by-word

Address Greptile P2: LLM decoding is token-by-token (sub-word fragments), so
the toggle copy should not promise word granularity.
@FuJacob FuJacob merged commit 646ad6e into main Jun 12, 2026
4 checks passed
@FuJacob FuJacob deleted the fix/suggestion-streaming-char-by-char branch June 12, 2026 05:15