For production applications that serve many concurrent users, very fast or even instant text generation can be a drawback: it lets users fire off more requests to the AI without much thought, burning tokens in the process.
An optional argument to cap text streaming speed might help. It would limit output to a rate that is still fast but invites users to read the answer as it streams, instead of waiting for the full answer to appear. As a side effect, new messages would be spaced out, since users would have to wait an acceptable amount of time before getting the full answer.
This would be purely a cosmetic, UI-level change.
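
To make the idea concrete, here is a rough sketch of what such a cap could look like, assuming the response arrives as an iterable of text chunks. The names `throttle_stream` and `max_chars_per_second` are hypothetical and not part of any existing API. The `sleep` and `clock` parameters are only there so the behavior can be tested without real waiting.

```python
import time
from typing import Callable, Iterable, Iterator


def throttle_stream(
    chunks: Iterable[str],
    max_chars_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield chunks unchanged, but never faster than the given character rate.

    Hypothetical helper: it only delays display and does not touch generation.
    """
    if max_chars_per_second <= 0:
        raise ValueError("max_chars_per_second must be positive")

    start = clock()
    emitted = 0  # characters already handed to the UI
    for chunk in chunks:
        # Earliest moment this chunk may be shown without exceeding the cap.
        due = start + emitted / max_chars_per_second
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        yield chunk
        emitted += len(chunk)
```

With this approach the first chunk is shown immediately, and no delay is added when the upstream source is already slower than the cap. Something like `for piece in throttle_stream(response_chunks, max_chars_per_second=80): render(piece)` would be the intended usage, where `response_chunks` and `render` stand in for whatever the application already has.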