
[fix] http_options leaking to model init & NPU tensor serialization failure over HTTP #109

Draft
kevssim wants to merge 3 commits into modelscope:main from kevssim:main

Conversation

@kevssim (Collaborator) commented Mar 6, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Fix two bugs in the HTTP serving path:

1. http_options leaking into model __init__

http_options was unconditionally injected into args for every app type in ServerLauncher._deploy_app(), so it was forwarded to Qwen3ForCausalLM.__init__() and triggered a TypeError. Fixed by guarding the injection with import_path == 'server'.
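A minimal sketch of the guard, assuming the names given in the description above; the launcher body here is a toy stand-in, not the actual src/twinkle/server/launcher.py code:

```python
class ServerLauncher:
    """Toy model of the launcher; only the arg-injection logic matters here."""

    def _deploy_app(self, import_path, args, http_options=None):
        # Before the fix, http_options was merged into args for every app
        # type, so model apps forwarded it into the model __init__
        # (e.g. Qwen3ForCausalLM) and raised a TypeError. Only the HTTP
        # server app actually understands http_options.
        if import_path == 'server' and http_options is not None:
            args = {**args, 'http_options': http_options}
        return args


launcher = ServerLauncher()
server_args = launcher._deploy_app('server', {'port': 8000}, {'host': '0.0.0.0'})
model_args = launcher._deploy_app('model', {'model_id': 'Qwen3'}, {'host': '0.0.0.0'})
```

With the guard in place, only the 'server' app's args carry the http_options key; model apps receive their original kwargs untouched.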

2. NPU tensor serialization failure on CPU-only client

The /forward_backward HTTP endpoint returned raw NPU tensors, which Ray could not deserialize on a CPU-only client (no NPU available). Fixed by adding forward_backward_http, which converts all tensors and arrays to plain Python scalars and lists via _to_cpu_safe_output() before returning.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses two critical bugs in the HTTP serving path. It resolves an issue where http_options were inadvertently passed to model initialization, causing TypeError exceptions. Additionally, it fixes a problem with NPU tensor serialization, ensuring that outputs from the /forward_backward HTTP endpoint are correctly converted to CPU-compatible Python objects, preventing deserialization failures on client machines without NPU hardware.

Highlights

  • http_options injection fix: Prevented http_options from being incorrectly passed to model __init__ methods by guarding its injection to only 'server' app types, resolving TypeError issues.
  • NPU tensor serialization for HTTP: Implemented a mechanism to convert NPU tensors and NumPy arrays to CPU-safe Python scalars/lists before returning them from the /forward_backward HTTP endpoint, addressing deserialization failures on CPU-only clients.


Changelog
  • cookbook/client/twinkle/self_congnition.py
    • Updated loss logging to extract 'loss' specifically from the output dictionary.
  • src/twinkle/model/transformers/transformers.py
    • Added a static method _to_cpu_safe_output to convert nested data structures containing PyTorch tensors or NumPy arrays into CPU-safe Python types.
    • Introduced forward_backward_http as a new remote function that wraps forward_backward and applies _to_cpu_safe_output to its results.
  • src/twinkle/server/launcher.py
    • Modified the logic for injecting http_options into application arguments, ensuring it only occurs for 'server' type applications.
  • src/twinkle/server/twinkle/model.py
    • Updated the /forward_backward HTTP endpoint to call the new forward_backward_http method, ensuring outputs are CPU-safe.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces two important bug fixes: it prevents http_options from leaking into model initializers, and it resolves a tensor serialization issue over HTTP by introducing a new forward_backward_http method. The bug fixes themselves are well implemented, but a critical security vulnerability was identified in the /upload_to_hub endpoint: when the twinkle:// prefix is not used, the lack of path validation allows arbitrary directory exfiltration. This requires immediate attention; all user-supplied paths should be strictly validated and restricted to authorized directories. There is also a minor suggestion to make the new _to_cpu_safe_output helper more concise.

Comment on lines +541 to +547
@remote_function(dispatch='slice_dp', collect='mean')
def forward_backward_http(self, *, inputs: Union[InputFeature, List[InputFeature], Trajectory, List[Trajectory]],
                          **kwargs):
    """HTTP-safe forward/backward that materializes outputs before they leave the worker."""
    outputs = self.forward_backward(inputs=inputs, **kwargs)
    return self._to_cpu_safe_output(outputs)

A collaborator replied:
forward_backward_http breaks the consistency of the interface, and transformers.py should not be modified directly for this. For special operations required by the server, inherit this class and override the method. You can refer to the writing style of tinker:

super().backward(adapter_name=adapter_name, **kwargs)
# shape (batch_size, seq_len, vocab_size)
logits = outputs['logits'].detach()
logps = outputs.get('logps', None)
if logps is not None:
    logps = logps.detach().cpu()
results = self._get_forward_output(inputs, logits, logps)
return [results, loss]
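The subclass-and-override approach the reviewer suggests could be sketched as follows. The class names and the toy base class are illustrative, not from the repository; the point is that the public forward_backward interface stays unchanged while the server-side subclass makes its outputs HTTP-safe:

```python
class TransformersModel:
    """Toy stand-in for the base model class in transformers.py."""

    def forward_backward(self, *, inputs, **kwargs):
        # Pretend these outputs contain device-bound (NPU) tensors.
        return {'loss': 0.25}


def _to_cpu_safe_output(obj):
    # Placeholder: the real helper recursively converts tensors/arrays
    # to plain Python scalars and lists.
    return obj


class HttpSafeTransformersModel(TransformersModel):
    """Server-side subclass: same interface, but outputs are
    materialized before they leave the worker."""

    def forward_backward(self, *, inputs, **kwargs):
        outputs = super().forward_backward(inputs=inputs, **kwargs)
        return _to_cpu_safe_output(outputs)
```

This keeps transformers.py generic: the HTTP-specific conversion lives only in the server's subclass, and callers of forward_backward see one consistent signature.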

@kevssim kevssim marked this pull request as draft March 6, 2026 08:31
