update yutori template to n1.5 #159
- bump model id from `n1-latest` to `n1.5-latest`
- send `extra_body.tool_set: browser_tools_core-20260403` to use the coordinate-based tool set
- send `extra_body.disable_tools` to explicitly exclude the DOM/Playwright tools (`extract_elements`, `find`, `set_element_value`, `execute_js`) since this template runs computer-use only
- rename `hover` to `mouse_move`
- rename `key_press` parameter `key_comb` to `key`
- drop `press_enter_after` and `clear_before_typing` from `type` (n1.5 emits separate `key_press` actions instead)
- add new actions: `middle_click`, `mouse_down`, `mouse_up`, `hold_key`, `go_forward`
- support optional `modifier` parameter on click actions via Kernel's `hold_keys`

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
n1.5's scroll action accepts an optional `modifier` (e.g., shift) that on browsers translates a vertical wheel into a horizontal scroll. Plumb it into Kernel's `ComputerScrollParams.hold_keys` so the OS-level modifier+wheel event is dispatched correctly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirror yutori-sdk-python's reference loop: deep-copy a request-only view of the messages and strip old image_url blocks once the JSON payload exceeds ~9.5 MB, while always preserving the most recent 6 screenshots and the very latest one. The caller's full history is left intact for the return value. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
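The trimming described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the `yutori-sdk-python` code: the function names (`payload_size`, `count_screenshots`, `trim_for_request`) are hypothetical, and messages are assumed to be OpenAI-style dicts whose `content` is a list of parts.

```python
import copy
import json

MAX_REQUEST_BYTES = 9_500_000  # value quoted in the PR description
KEEP_RECENT_SCREENSHOTS = 6    # value quoted in the PR description
PLACEHOLDER = "[screenshot omitted to fit request size]"


def payload_size(messages):
    """Size in bytes of the messages once serialized as JSON."""
    return len(json.dumps(messages).encode("utf-8"))


def count_screenshots(messages):
    """Count image_url parts across all list-style message contents."""
    return sum(
        1
        for msg in messages
        if isinstance(msg.get("content"), list)
        for part in msg["content"]
        if isinstance(part, dict) and part.get("type") == "image_url"
    )


def trim_for_request(messages, max_bytes=MAX_REQUEST_BYTES,
                     keep_recent=KEEP_RECENT_SCREENSHOTS):
    """Return a request-only view; the caller's list is never mutated."""
    if payload_size(messages) <= max_bytes:
        return messages  # nothing to trim, no copy needed
    trimmed = copy.deepcopy(messages)
    # Positions of every screenshot part, oldest first.
    slots = [
        (m_idx, p_idx)
        for m_idx, msg in enumerate(trimmed)
        if isinstance(msg.get("content"), list)
        for p_idx, part in enumerate(msg["content"])
        if isinstance(part, dict) and part.get("type") == "image_url"
    ]
    keep = max(keep_recent, 1)  # the latest screenshot always survives
    for m_idx, p_idx in slots[:-keep]:
        if payload_size(trimmed) <= max_bytes:
            break
        # Swap the image for a text part so the message stays valid.
        trimmed[m_idx]["content"][p_idx] = {"type": "text", "text": PLACEHOLDER}
    return trimmed
```

Re-serializing on every pass is simple but quadratic in the worst case; a real loop would probably track the running size instead.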
300ms after every action was conservative — at 50 iterations that is ~15s of pure wall-clock waiting before model calls. 150ms still gives the page enough time to settle for typical interactions while halving the per-step overhead. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
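The arithmetic behind this commit, as a small sketch. The function name is hypothetical, and the 50-iteration figure is the example run length from the message above.

```python
ITERATIONS = 50  # example run length used in the commit message


def settle_overhead_seconds(delay_ms, iterations=ITERATIONS):
    """Total wall-clock time spent in post-action settle waits."""
    return delay_ms * iterations / 1000


old_overhead = settle_overhead_seconds(300)  # 15.0 seconds
new_overhead = settle_overhead_seconds(150)  # 7.5 seconds
```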
key_press / hold_key compound keys already consulted MODIFIER_MAP, but
the new `modifier` parameter on click and scroll actions passes a bare
modifier name ("control", "meta", "command") that bypassed the lookup
and went out as-is. Kernel's hold_keys wants "ctrl" and "super" — so
ctrl-click and cmd-click silently dropped the modifier.
Unify the per-part mapping into a single helper applied to both the
compound and single-key paths in TS and Python.
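A minimal sketch of such a unified helper, assuming compound keys are joined with `+`. The map entries below cover only the names mentioned in this commit message; the real `MODIFIER_MAP` may contain more, and the helper names are hypothetical.

```python
# Only the mappings named in the commit message; the template's map may be larger.
MODIFIER_MAP = {
    "control": "ctrl",
    "meta": "super",
    "command": "super",
}


def map_key_part(part):
    """Translate one key name, leaving unknown names untouched."""
    name = part.strip()
    return MODIFIER_MAP.get(name.lower(), name)


def map_key(key):
    """Apply the same per-part mapping to bare and compound keys."""
    return "+".join(map_key_part(part) for part in key.split("+"))
```

Because a bare modifier such as `"command"` is just a compound key with one part, routing both paths through `map_key` avoids the bypass described above.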
Firetiger deploy monitoring skipped This PR didn't match the auto-monitor filter configured on your GitHub connection:
Reason: PR updates a Yutori template with model changes and action signature updates, but does not modify kernel API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) in the codebase. To monitor this PR anyway, reply with |
The openai-node SDK does not have a Python-style `extra_body` kwarg —
it serializes the body as-is. Passing `extra_body: {...}` as a body
field made Yutori receive a literal `{"extra_body": ...}` and silently
drop the tool_set pin and disable_tools defense-in-depth.
Hoist both fields and apply them via a typed spread (mirrors the
anthropic-computer-use loop pattern).
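The wire-level difference can be illustrated with plain dicts. This is a Python stand-in for the request body only, not the TypeScript fix itself; `build_body` is a hypothetical helper, and the field values are the ones quoted in this PR.

```python
YUTORI_EXTRAS = {
    "tool_set": "browser_tools_core-20260403",
    "disable_tools": ["extract_elements", "find", "set_element_value", "execute_js"],
}


def build_body(base, extras, hoist):
    """Build the JSON body either with extras nested or hoisted to top level."""
    if hoist:
        # Fixed behavior: fields sit next to `model` and `messages`.
        return {**base, **extras}
    # Buggy behavior: the server sees one unknown `extra_body` field.
    return {**base, "extra_body": extras}


base = {"model": "n1.5-latest", "messages": []}
nested = build_body(base, YUTORI_EXTRAS, hoist=False)
hoisted = build_body(base, YUTORI_EXTRAS, hoist=True)
```

In the Python SDK the `extra_body` keyword performs the hoisting for the caller, which is why the same-looking code behaved differently in the two templates.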
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit d2affba. Configure here.
`self.kernel.browsers.computer.move_mouse(`
`mapped_key = self._map_key(key)`
`duration = action.get("duration") or 1000`
Python hold_key duration guard inconsistent with other handlers
Low Severity
The Python _handle_hold_key uses action.get("duration") or 1000 to default the duration, but this lets negative values pass through to press_key. Every other duration handler in both the Python and TypeScript templates — _handle_wait, handleHoldKey, and handleWait — uses an explicit duration > 0 guard. A negative duration from the model would cause the Kernel SDK to receive a negative millisecond value.
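A sketch of what an explicit guard could look like on the Python side. `resolve_duration` and `DEFAULT_HOLD_MS` are hypothetical names, not code from this PR; the 1000 ms default is taken from the flagged line.

```python
DEFAULT_HOLD_MS = 1000  # default taken from the flagged line


def resolve_duration(action, default_ms=DEFAULT_HOLD_MS):
    """Return a positive duration in ms, falling back to the default."""
    duration = action.get("duration")
    is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
    if is_number and duration > 0:
        return int(duration)
    return default_ms


# The `or` idiom only catches falsy values (None, 0), so negatives slip through.
unguarded = {"duration": -500}.get("duration") or 1000  # -500
guarded = resolve_duration({"duration": -500})          # 1000
```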


Summary
Updates the Yutori CUA template (TS + Python) from `n1-latest` to `n1.5-latest`, plus speed and request-size improvements.

This is more than a model-name swap: n1.5 changes several action signatures, adds new actions, and introduces a tool-set selector. The template is computer-use only (no Playwright page exposed to the model), so we explicitly exclude the DOM/Playwright "expanded" tools.

Loop / API changes
- Model id: `n1-latest` to `n1.5-latest`
- Send `extra_body.tool_set: "browser_tools_core-20260403"` to select the coordinate-based tool set (the default, but pinned for stability)
- Send `extra_body.disable_tools: ["extract_elements", "find", "set_element_value", "execute_js"]` as defense-in-depth; these are the expanded tools that need a Playwright page

Action / handler changes
- `hover` → `mouse_move`
- `key_press`: parameter `key_comb` → `key`
- `type`: drop `press_enter_after` / `clear_before_typing` (n1.5 emits separate `key_press` actions instead)
- New actions: `middle_click`, `mouse_down`, `mouse_up`, `hold_key`, `go_forward`
- `modifier` parameter wired into Kernel's `hold_keys` for shift/ctrl-clicks
- `wait`: optional `duration` parameter honored
- `hold_key`: optional `duration` parameter honored via `pressKey`'s `duration`

Request-size trimming
Mirrors the official `yutori-sdk-python` `payload.py` algorithm to keep the per-request payload under the API ceiling on long sessions:
- `MAX_REQUEST_BYTES = 9_500_000`, `KEEP_RECENT_SCREENSHOTS = 6` (Yutori's published defaults)
- `conversation_messages` is preserved (deep-copied before mutation); only the trimmed copy is sent to the API
- `"[screenshot omitted to fit request size]"` placeholder is inserted so the message stays valid

Latency
- `SCREENSHOT_DELAY` halved from 300ms → 150ms (the settle wait after click/type/scroll before grabbing the next screenshot)
- `ACTION_DELAY` (the focus-settling wait inside `goto_url`'s ctrl+L → type → Enter flow) left at 300ms; halving it risks silent input misrouting if the address bar hasn't taken focus yet

Files
- `pkg/create/templates.go`: display name + description
- `pkg/templates/{typescript,python}/yutori/{loop,index|main,tools/computer,README}.*`: implementation, trim helpers, types, docs

Test plan
- `make build` and `make test` pass
- `kernel create -t yutori -l {typescript,python}` and deploy end-to-end against `n1.5-latest` in the Default project
- `key_press` Enter works on in-page search forms (DuckDuckGo, Google) and Wikipedia
- `scroll` action works on results pages
- `goto_url` (ctrl+L address-bar flow) still works post-delay changes

Note
Medium Risk
Medium risk because it changes the model version and the tool/action contract (new/renamed actions, new request parameters), which can alter automation behavior. Also adds request-trimming logic that could affect model context in long runs if mis-tuned.
Overview
Updates the Yutori CUA templates to use `n1.5-latest` (and renames the template to Yutori n1.5 Computer Use) across both TypeScript and Python.

Aligns the sampling loops and tool adapters with n1.5 by pinning `tool_set` to `browser_tools_core-20260403`, explicitly disabling DOM/Playwright "expanded" tools via `disable_tools`, and updating action schemas/handlers (e.g., `hover` → `mouse_move`, `key_comb` → `key`, new actions like `mouse_down/up`, `hold_key`, `go_forward`, plus `modifier` support for clicks/scroll).

Adds request-size protection by trimming older screenshot `image_url` parts from the messages sent to the API (while preserving full local history) and reduces per-step screenshot settle delay from 300ms to 150ms; READMEs are updated accordingly.