Is your feature request related to a problem? Please describe.
Currently the config only supports the text-generation use case. It has no support for other modalities such as audio or image.
Describe the solution you'd like
Make the config modality-agnostic. For example, the config should include a field that names the task it is being used for, i.e. text, stt, or tts.
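A rough sketch of what a task-aware config could look like, to make the request concrete. Everything here is hypothetical and not taken from the existing codebase: the names `Task`, `ModelConfig`, `model`, and `params`, as well as the example model names and parameter keys, are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class Task(str, Enum):
    """Hypothetical set of supported tasks, taken from the list in this request."""
    TEXT = "text"
    STT = "stt"   # speech-to-text
    TTS = "tts"   # text-to-speech


@dataclass
class ModelConfig:
    """Hypothetical modality-agnostic config: one shape for every task."""
    task: Task
    model: str
    # Task-specific options live in a free-form dict so the top-level
    # schema does not change when a new modality is added.
    params: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Accept plain strings such as "stt" and reject unknown tasks early.
        self.task = Task(self.task)


# Example usage with placeholder model names and parameters.
text_cfg = ModelConfig(task="text", model="example-text-model", params={"temperature": 0.2})
stt_cfg = ModelConfig(task="stt", model="example-stt-model", params={"language": "en"})
tts_cfg = ModelConfig(task="tts", model="example-tts-model", params={"voice": "default"})
```

With this shape, callers could branch on `cfg.task` instead of assuming text generation, and an unsupported task value fails at construction time with a `ValueError`.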
Additional context
Unified API Solution Doc