Add Qwen3 235B A22B Instruct 2507 model for Vertex AI #388
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: addc493296
| "max_input_tokens": 262144, | ||
| "max_output_tokens": 16384 | ||
| }, | ||
| "publishers/qwen/models/qwen3-235b-a22b-instruct-2507-maas": { |
Route new Qwen MaaS model via Vertex OpenAPI path
This model is registered with `format: "openai"` on Vertex, so calls go through `fetchOpenAI`. That function only routes `publishers/meta/...` models to `.../endpoints/openapi/chat/completions` and sends all other publishers to `:rawPredict` after rewriting the model name. Because this new entry is another `*-maas` model (`publishers/qwen/...-maas`), it will follow the non-Meta branch and be invoked with the wrong Vertex path and model rewrite for chat-completions requests, causing runtime request failures for users who select this model.
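A minimal sketch of the branch Codex is describing, assuming `fetchOpenAI` picks the Vertex route by publisher prefix; the names and structure here are illustrative, not the actual `proxy.ts` source:

```ts
// Illustrative sketch only -- not the actual proxy.ts code. It assumes the
// URL is chosen by publisher prefix, as the review comment above describes.
function vertexURL(model: string, baseURL: string, project: string, location: string): string {
  const prefix = `${baseURL}/v1beta1/projects/${project}/locations/${location}`;
  if (model.startsWith("publishers/meta/")) {
    // Meta MaaS models go to the OpenAPI chat-completions endpoint.
    return `${prefix}/endpoints/openapi/chat/completions`;
  }
  // Every other publisher falls through to :rawPredict after a model-name
  // rewrite -- the branch the new Qwen *-maas entry would wrongly take.
  return `${prefix}/${model}:rawPredict`;
}
```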
ibolmo left a comment
Looks good. I just had a small reservation: did you happen to load this locally and hit the Vertex API via the proxy?
`packages/proxy/src/proxy.ts` (Outdated)
```diff
  // Use the OpenAPI endpoint.
  fullURL = new URL(
-   `${baseURL}/v1beta1/projects/${project}/locations/${location}/endpoints/openapi/chat/completions`,
+   `${baseURL}/v1/projects/${project}/locations/${location}/endpoints/openapi/chat/completions`,
```
Not sure about this change. Are there any models that still require v1beta1? Should we keep the old behavior but add another conditional, to be safe?
Thanks! Yes, I tested this fix against a Meta model with the following:

```
% curl -s --max-time 5 -X POST http://localhost:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: $BRAINTRUST_API_KEY" \
    -d '{"model": "publishers/meta/models/llama-3.3-70b-instruct-maas", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Hello. How can I assist you today?","role":"assistant"}}],"created":1770834581,"id":"lcqMafL4OtD52PgP9qvZwQs","model":"meta/llama-3.3-70b-instruct-maas","object":"chat.completion","system_fingerprint":"","usage":{"completion_tokens":10,"extra_properties":{"google":{"traffic_type":"ON_DEMAND"}},"prompt_tokens":37,"total_tokens":47}}
```
I also did some digging, and it seems v1 is stable and v1beta1 is no longer needed. That said, I'm happy to split them out and keep sending `publishers/meta` to v1beta1 so we don't introduce anything new here.
You can leave it, but maybe try the old models and confirm we're good.
I ended up making it conditional to be on the safe side, so Meta stays on v1beta1 and Qwen3 goes through v1.
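A minimal sketch of what that conditional could look like, reusing the URL construction from the diff above; the exact predicate and surrounding code in the merged change may differ:

```ts
// Sketch of the conditional described above -- assumes the URL shape from
// the diff; not necessarily the exact merged implementation.
function openApiChatURL(model: string, baseURL: string, project: string, location: string): URL {
  // Keep Meta on the previously used v1beta1 surface; new Qwen MaaS uses v1.
  const apiVersion = model.startsWith("publishers/meta/") ? "v1beta1" : "v1";
  return new URL(
    `${baseURL}/${apiVersion}/projects/${project}/locations/${location}/endpoints/openapi/chat/completions`,
  );
}
```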
Adding the Qwen3 235B A22B Instruct 2507 model for Vertex AI, using the details in https://docs.cloud.google.com/vertex-ai/generative-ai/docs/maas/qwen/qwen3-235b. Per the Vertex AI docs, the supported regions are `global` and `us-south1`, and I added those to the model entry as well.
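For reference, a sketch of roughly what the new entry could look like, combining the token limits from the diff above with the regions from the docs. The `supported_regions` field name is an assumption, not taken from the actual model-list schema:

```ts
// Hypothetical sketch of the model-list entry; field names other than the
// token limits shown in the diff are assumptions.
const entry = {
  "publishers/qwen/models/qwen3-235b-a22b-instruct-2507-maas": {
    format: "openai", // per the review note above
    max_input_tokens: 262144, // from the diff
    max_output_tokens: 16384, // from the diff
    supported_regions: ["global", "us-south1"], // per the Vertex AI docs
  },
};
```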