Make generate() async to serialize back-to-back turns by stikves · Pull Request #80 · apple/coreai-models

stikves · 2026-07-02T20:37:44Z

Summary

Fixes a crash when reusing an engine for rapid multi-turn conversations:

CoreAIPipelinedEngine.swift:82: Fatal error: Trying to acquire engine when it's already in use

The InferenceEngine.generate() method is now async throws (was throws). The pipelined engine awaits the prior generation's Task before starting a new one, preventing the fatalError on rapid back-to-back calls.
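A minimal sketch of what the signature change means for callers. Only the throws to async throws change comes from the PR; the prompt: String parameter, the String return type, and the EchoEngine and runTurns names are hypothetical placeholders.

```swift
// Hypothetical protocol shape; only `throws` -> `async throws` is from the PR.
protocol InferenceEngine {
    // Before the PR (sketch): func generate(prompt: String) throws -> String
    func generate(prompt: String) async throws -> String
}

// Toy conformer: an async generate() is allowed to suspend,
// e.g. to wait for a prior turn instead of failing.
struct EchoEngine: InferenceEngine {
    func generate(prompt: String) async throws -> String {
        try await Task.sleep(nanoseconds: 1_000)  // stands in for real async work
        return "echo: " + prompt
    }
}

// Multi-turn caller: every call site now needs `try await`.
func runTurns(_ engine: some InferenceEngine, prompts: [String]) async throws -> [String] {
    var replies: [String] = []
    for prompt in prompts {
        let reply = try await engine.generate(prompt: prompt)
        replies.append(reply)
    }
    return replies
}
```

Because generate() can now suspend, the engine can wait for the previous turn inside the call rather than failing; the cost is that every call site has to add await.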

Root cause

PR #64 removed the per-turn engine.reset() call to enable multi-turn KV cache reuse. That reset() also served as the serialization point between consecutive turns (it called drain() internally). Without it, a new generate() call can race with the prior generation's GPU pipeline drain and try to acquire the engine while it is still in use, triggering the fatalError above.

Fix

The pipelined engine now cancels and awaits any in-flight generation at the top of generate():

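if let priorTask = _generationTask.withLock({ $0 }) {
    // Signal the in-flight generation to stop...
    _activeToken.withLock { $0?.cancel() }
    // ...then suspend until its Task finishes draining the GPU pipeline.
    await priorTask.value
}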
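The handoff can be exercised in isolation with a toy engine. This is a self-contained sketch, not the CoreAIPipelinedEngine code: only generate(), _generationTask, _activeToken, and the cancel-then-await handoff come from the PR. Locked (an NSLock wrapper standing in for whatever withLock type the engine actually uses), CancelToken, EngineError, ToyPipelinedEngine, the steps parameter, completedTurns, waitForIdle(), and the inUse flag (which models the "already in use" check but throws instead of calling fatalError) are all assumptions.

```swift
import Foundation

// Assumed stand-in for the engine's lock-protected properties.
final class Locked<Value>: @unchecked Sendable {
    private var value: Value
    private let lock = NSLock()
    init(_ value: Value) { self.value = value }

    @discardableResult
    func withLock<R>(_ body: (inout Value) throws -> R) rethrows -> R {
        lock.lock()
        defer { lock.unlock() }
        return try body(&value)
    }
}

// Hypothetical cooperative cancellation token polled by the generation loop.
final class CancelToken: @unchecked Sendable {
    private let flag = Locked(false)
    func cancel() { flag.withLock { $0 = true } }
    var isCancelled: Bool { flag.withLock { $0 } }
}

enum EngineError: Error { case alreadyInUse }

// Toy engine: `inUse` plays the role of the "already in use" check.
final class ToyPipelinedEngine: @unchecked Sendable {
    private let _generationTask = Locked<Task<Void, Never>?>(nil)
    private let _activeToken = Locked<CancelToken?>(nil)
    private let inUse = Locked(false)
    let completedTurns = Locked(0)

    func generate(steps: Int) async throws {
        // Serialization point: cancel the in-flight turn and wait for it to drain.
        if let priorTask = _generationTask.withLock({ $0 }) {
            _activeToken.withLock { $0?.cancel() }
            await priorTask.value
        }
        // Acquire the engine; this toy throws where the real engine traps.
        try inUse.withLock { (busy: inout Bool) throws -> Void in
            if busy { throw EngineError.alreadyInUse }
            busy = true
        }
        let token = CancelToken()
        _activeToken.withLock { $0 = token }
        // Generation keeps running in the background after generate() returns.
        let task = Task {
            for _ in 0..<steps {
                if token.isCancelled { break }
                try? await Task.sleep(nanoseconds: 1_000_000)  // simulated GPU step
            }
            self.completedTurns.withLock { $0 += 1 }
            self.inUse.withLock { $0 = false }  // release once the pipeline drains
        }
        _generationTask.withLock { $0 = task }
    }

    func cancel() { _activeToken.withLock { $0?.cancel() } }

    // Waits for the most recent turn to finish.
    func waitForIdle() async {
        if let task = _generationTask.withLock({ $0 }) { await task.value }
    }
}
```

Calling generate(steps:) ten times in a row and then waitForIdle() should leave completedTurns at 10; deleting the if-let block makes the second call throw EngineError.alreadyInUse whenever the first turn is still running. The sketch assumes turns arrive one after another, as in a multi-turn session; two tasks calling generate() concurrently could still collide at the acquire step.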

Cancelling and awaiting the prior task preserves KV cache state: prefix caching handles reuse automatically across turns.
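Why waiting for the prior turn (instead of resetting) keeps the cache useful can be illustrated with a generic prefix-caching sketch: the engine remembers the token IDs whose KV entries it still holds and, on the next turn, only prefills the tokens past the longest shared prefix. This shows the general technique, not the coreai-models implementation; reusablePrefixLength, PrefillPlan, and planPrefill are hypothetical, and token IDs are plain Ints.

```swift
// Length of the longest common prefix between cached tokens and the new prompt.
func reusablePrefixLength(cached: [Int], prompt: [Int]) -> Int {
    var n = 0
    while n < cached.count, n < prompt.count, cached[n] == prompt[n] {
        n += 1
    }
    return n
}

struct PrefillPlan: Equatable {
    let reused: Int       // tokens whose KV entries are kept from the prior turn
    let toPrefill: [Int]  // tokens that still need a forward pass
}

// Splits the new prompt into a reused prefix and a remainder to prefill.
func planPrefill(cached: [Int], prompt: [Int]) -> PrefillPlan {
    let n = reusablePrefixLength(cached: cached, prompt: prompt)
    return PrefillPlan(reused: n, toPrefill: Array(prompt[n...]))
}
```

For example, if turn one left tokens [1, 2, 3, 4, 5] in the cache and turn two's prompt is [1, 2, 3, 4, 5, 6, 7], only [6, 7] need a forward pass. Real engines may also re-run the last shared token to obtain fresh logits; the sketch omits that detail.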

Test plan

Existing CancelAPITests pass
Existing UnifiedGenerationAPITests pass (prefix caching still works)
New: back-to-back generate() calls do not crash
New: rapid-fire 10-turn stress test (zero delay between turns)
New: generate() after cancel() works cleanly
Manual: multi-turn LanguageModelSession with Qwen3-0.6B pipelined engine

InferenceEngine.generate() is now async throws instead of throws. The pipelined engine awaits the prior generation Task before starting a new one, preventing the fatalError on rapid multi-turn conversations. The serialization preserves KV cache state: prefix caching handles reuse automatically across turns. No data is lost; the engine just waits for the GPU pipeline to drain before restarting. Adds stress tests: back-to-back turns, rapid-fire 10-turn, and generate-after-cancel sequences.

carinapeng · 2026-07-03T00:20:09Z

Thanks for looking into this! The root-cause tracing is helpful :)

stikves and others added 2 commits July 2, 2026 13:08

Merge branch 'main' into sukru/engine-serialize-turns (a922c59)

Merge branch 'main' into sukru/engine-serialize-turns (244ace5)

stikves marked this pull request as ready for review July 3, 2026 01:36

stikves requested review from alejandro-isaza, carinapeng, kevchengcodes and tjia1818 July 3, 2026 01:37

stikves wants to merge 3 commits into apple:main from stikves:sukru/engine-serialize-turns


2 participants
