When streaming responses from an OpenAI-compatible or other local provider API endpoint (llama.cpp, ik_llama.cpp, llama-swap), the visible text output in the Cortex chat randomly freezes during generation. However, the underlying model keeps generating tokens, and the full response eventually arrives once generation completes; it just isn't displayed incrementally during the stall. The log shows "sendLLM: firing error".
This creates a misleading user experience: it looks like the model has stopped responding, when in fact it is still working.
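
For reference, here is a minimal streaming check (a sketch, not part of Cortex; the port, path, and model id are assumptions) that reads the SSE chunks directly from the local endpoint and logs the gap between them, which is how I confirmed the server keeps streaming even while the chat UI appears frozen:

```typescript
// Minimal streaming check against a local OpenAI-compatible endpoint (Node 18+).
// Assumptions: llama.cpp / llama-swap is listening on http://localhost:8080
// and "local-model" matches whatever model id the server exposes.
const ENDPOINT = "http://localhost:8080/v1/chat/completions"; // assumed port/path

async function streamCheck(): Promise<void> {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder model id
      stream: true,
      messages: [{ role: "user", content: "Write a short paragraph about rivers." }],
    }),
  });

  if (!res.ok || !res.body) {
    throw new Error(`Request failed: ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let lastChunkAt = Date.now();

  // Log the delay before each chunk: the server delivers tokens steadily
  // even during the periods when the chat view stops updating.
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const now = Date.now();
    console.log(`+${now - lastChunkAt}ms`, decoder.decode(value, { stream: true }).trim());
    lastChunkAt = now;
  }
}

streamCheck().catch(console.error);
```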
- VS Code Version: latest
- OS Version: Arch Linux