Description
The benchmark currently passes the temperature parameter to all models by default.
However, reasoning models such as gpt-5 do not support custom temperature values and only allow the default value. As a result, evaluation fails with errors like:
Error code: 400 - Unsupported value: 'temperature' does not support 0.0 with this model.
gpt-5 is the default model, so runs with the default configuration hit this error.
Expected behavior
The benchmark should not send temperature (or similar sampling parameters) to reasoning models.
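One possible shape for the fix, as a minimal sketch rather than the benchmark's actual code: filter the request arguments before they reach the API client. The names `REASONING_MODEL_PREFIXES`, `SAMPLING_PARAMS`, `is_reasoning_model`, and `build_request_kwargs` are hypothetical, and both the prefix list and the set of dropped parameters are assumptions that would need to match the models the benchmark actually supports.

```python
from typing import Any

# Assumption: reasoning models can be recognised by a name prefix.
# Only gpt-5 is named in this issue; extend the tuple as needed.
REASONING_MODEL_PREFIXES = ("gpt-5",)

# Assumption: these are the sampling parameters to withhold.
SAMPLING_PARAMS = ("temperature", "top_p")


def is_reasoning_model(model: str) -> bool:
    """Return True if the model name starts with a known reasoning-model prefix."""
    return model.lower().startswith(REASONING_MODEL_PREFIXES)


def build_request_kwargs(model: str, **params: Any) -> dict[str, Any]:
    """Build request arguments, dropping sampling parameters for reasoning models."""
    kwargs: dict[str, Any] = {"model": model, **params}
    if is_reasoning_model(model):
        for name in SAMPLING_PARAMS:
            # pop with a default so a missing parameter is not an error
            kwargs.pop(name, None)
    return kwargs
```

With this helper, `build_request_kwargs("gpt-5", temperature=0.0)` omits `temperature` entirely, while the same call for a non-reasoning model keeps it. A prefix check is the simplest option; an explicit per-model capability table would be more robust if model naming is not consistent.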