When streaming responses from an OpenAI-compatible or other local provider API endpoint (llama.cpp, ik_llama.cpp, llama-swap), the visible text output in the Cortex chat randomly freezes during generation. However, the underlying model keeps generating tokens, and the full response eventually arrives once generation completes; it just isn't displayed incrementally during the stall. The log shows "sendLLM: firing error".
This creates a misleading user experience: it looks like the model has stopped responding, when in fact it is still working.
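
For reference, here is a minimal streaming check (a sketch, not part of Cortex; the port, path, and model id are assumptions) that reads the SSE chunks directly from the local endpoint and logs the gap between them, which is how I confirmed the server keeps streaming even while the chat UI appears frozen:

```typescript
// Minimal streaming check against a local OpenAI-compatible endpoint (Node 18+).
// Assumptions: llama.cpp / llama-swap is listening on http://localhost:8080
// and "local-model" matches whatever model id the server exposes.
const ENDPOINT = "http://localhost:8080/v1/chat/completions"; // assumed port/path

async function streamCheck(): Promise<void> {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder model id
      stream: true,
      messages: [{ role: "user", content: "Write a short paragraph about rivers." }],
    }),
  });

  if (!res.ok || !res.body) {
    throw new Error(`Request failed: ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let lastChunkAt = Date.now();

  // Log the delay before each chunk: the server delivers tokens steadily
  // even during the periods when the chat view stops updating.
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const now = Date.now();
    console.log(`+${now - lastChunkAt}ms`, decoder.decode(value, { stream: true }).trim());
    lastChunkAt = now;
  }
}

streamCheck().catch(console.error);
```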
- VS Code Version: latest
- OS Version: Arch Linux