update yutori template to n1.5 #159

Open
dprevoznik wants to merge 8 commits into main from hypeship/yutori-n15-template

dprevoznik (Contributor) commented May 6, 2026

Summary

Updates the Yutori CUA template (TS + Python) from n1-latest to n1.5-latest, plus speed and request-size improvements.

This is more than a model-name swap — n1.5 changes several action signatures, adds new actions, and introduces a tool-set selector. The template is computer-use only (no Playwright page exposed to the model), so we explicitly exclude the DOM/Playwright "expanded" tools.

Loop / API changes

  • bump model id from n1-latest to n1.5-latest
  • send extra_body.tool_set: "browser_tools_core-20260403" to select the coordinate-based tool set (the default, but pinned for stability)
  • send extra_body.disable_tools: ["extract_elements", "find", "set_element_value", "execute_js"] as defense-in-depth — these are the expanded tools that need a Playwright page
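
A minimal sketch of how the pinned tool set and the disabled-tool list might be assembled on the Python side, assuming the loop calls an OpenAI-compatible `chat.completions.create`, whose `extra_body` keyword merges extra fields into the top level of the JSON body. The helper names `build_request_kwargs` and `wire_body` are hypothetical and only illustrate the shape of the request.

```python
from typing import Any

MODEL = "n1.5-latest"
TOOL_SET = "browser_tools_core-20260403"
# Expanded DOM/Playwright tools that a computer-use-only template cannot serve.
DISABLED_TOOLS = ["extract_elements", "find", "set_element_value", "execute_js"]


def build_request_kwargs(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Keyword arguments for an OpenAI-compatible chat.completions.create call."""
    return {
        "model": MODEL,
        "messages": messages,
        # The OpenAI Python SDK merges extra_body into the request JSON.
        "extra_body": {
            "tool_set": TOOL_SET,
            "disable_tools": list(DISABLED_TOOLS),
        },
    }


def wire_body(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Approximate the JSON body the server sees after extra_body is merged."""
    body = {k: v for k, v in kwargs.items() if k != "extra_body"}
    body.update(kwargs.get("extra_body", {}))
    return body
```

With the real SDK this would be used as `client.chat.completions.create(**build_request_kwargs(messages))`; `wire_body` exists only to make the expected top-level placement of `tool_set` and `disable_tools` easy to check.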

Action / handler changes

  • rename hover → mouse_move
  • key_press: parameter key_comb → key
  • type: drop press_enter_after / clear_before_typing (n1.5 emits separate key_press actions instead)
  • new actions: middle_click, mouse_down, mouse_up, hold_key, go_forward
  • click actions: optional modifier parameter wired into Kernel's hold_keys for shift/ctrl-clicks
  • wait: optional duration parameter honored
  • hold_key: optional duration parameter honored via pressKey's duration
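
To make the contract differences concrete, here is a hedged sketch of a shim that rewrites an n1-style action into its n1.5 equivalent. It is not part of the template. The `{"name": ..., "arguments": ...}` action shape, the `upgrade_action` helper, and the `"Enter"` key name are assumptions for illustration only.

```python
from typing import Any

Action = dict[str, Any]


def upgrade_action(action: Action) -> list[Action]:
    """Translate one n1-style action into a list of n1.5-style actions."""
    name = action["name"]
    args = dict(action.get("arguments", {}))  # copy so the input is not mutated

    if name == "hover":
        # hover was renamed to mouse_move
        return [{"name": "mouse_move", "arguments": args}]

    if name == "key_press" and "key_comb" in args:
        # key_comb was renamed to key
        args["key"] = args.pop("key_comb")
        return [{"name": "key_press", "arguments": args}]

    if name == "type":
        # n1.5 has no typing flags; Enter becomes a separate key_press.
        press_enter = args.pop("press_enter_after", False)
        args.pop("clear_before_typing", None)  # simply dropped in this sketch
        actions = [{"name": "type", "arguments": args}]
        if press_enter:
            actions.append({"name": "key_press", "arguments": {"key": "Enter"}})
        return actions

    return [{"name": name, "arguments": args}]
```

The one-to-many return type reflects the main behavioral change: a single n1 `type` call with `press_enter_after` corresponds to two n1.5 actions.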

Request-size trimming

Mirrors the official yutori-sdk-python payload.py algorithm to keep the per-request payload under the API ceiling on long sessions:

  • MAX_REQUEST_BYTES = 9_500_000, KEEP_RECENT_SCREENSHOTS = 6 (Yutori's published defaults)
  • size-triggered (not count-triggered) — only kicks in when the estimated request exceeds the threshold
  • two-phase strip: drop images outside the protected window first; if still too large, walk into the protected window but always preserve the latest screenshot
  • dual-list pattern: the caller's full conversation_messages is preserved (deep-copied before mutation); only the trimmed copy is sent to the API
  • when an image_url block is stripped from a tool result with no remaining text, a "[screenshot omitted to fit request size]" placeholder is inserted so the message stays valid
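
The trimming steps above can be sketched roughly as follows. This is an illustrative approximation, not the template's code or the yutori-sdk-python implementation: it assumes OpenAI-style messages whose `content` is a list of parts, estimates size from the serialized JSON, and inserts the placeholder for any message left without text. `trim_for_request` and the other helper names are hypothetical.

```python
import copy
import json
from typing import Any

MAX_REQUEST_BYTES = 9_500_000
KEEP_RECENT_SCREENSHOTS = 6
PLACEHOLDER = "[screenshot omitted to fit request size]"

Message = dict[str, Any]


def estimate_bytes(messages: list[Message]) -> int:
    """Approximate the request size by serializing the messages to JSON."""
    return len(json.dumps(messages).encode("utf-8"))


def _has_text(parts: list[Any]) -> bool:
    return any(
        isinstance(p, dict) and p.get("type") == "text" and p.get("text")
        for p in parts
    )


def _strip(msg: Message, part: dict[str, Any]) -> None:
    """Remove one image part; add a placeholder if no text remains."""
    msg["content"] = [p for p in msg["content"] if p is not part]
    if not _has_text(msg["content"]):
        msg["content"].append({"type": "text", "text": PLACEHOLDER})


def trim_for_request(
    messages: list[Message],
    max_bytes: int = MAX_REQUEST_BYTES,
    keep_recent: int = KEEP_RECENT_SCREENSHOTS,
) -> list[Message]:
    """Return a request-only view; the caller's list is never mutated."""
    if estimate_bytes(messages) <= max_bytes:
        return messages  # size-triggered: nothing to do below the threshold

    trimmed = copy.deepcopy(messages)
    images = [
        (msg, part)
        for msg in trimmed
        if isinstance(msg.get("content"), list)
        for part in msg["content"]
        if isinstance(part, dict) and part.get("type") == "image_url"
    ]
    cut = max(len(images) - max(keep_recent, 1), 0)
    # Phase 1: images outside the protected window, oldest first.
    # Phase 2: the protected window itself, but never the latest screenshot.
    candidates = images[:cut] + images[cut:][:-1]
    for msg, part in candidates:
        if estimate_bytes(trimmed) <= max_bytes:
            break
        _strip(msg, part)
    return trimmed
```

Re-estimating the size after every strip is quadratic in the worst case, which is acceptable for a sketch; a production version would more likely track the running size incrementally.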

Latency

  • post-action SCREENSHOT_DELAY halved from 300ms → 150ms (the settle wait after click/type/scroll before grabbing the next screenshot)
  • ACTION_DELAY (the focus-settling wait inside goto_url's ctrl+L → type → Enter flow) left at 300ms — halving it risks silent input misrouting if the address bar hasn't taken focus yet
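
As a rough back-of-the-envelope check on the saving, assuming exactly one settle wait per loop iteration and a 50-iteration run (both assumptions, as is the helper name `settle_overhead_seconds`):

```python
OLD_SCREENSHOT_DELAY_MS = 300
NEW_SCREENSHOT_DELAY_MS = 150


def settle_overhead_seconds(iterations: int, delay_ms: int) -> float:
    """Total wall-clock time spent in post-action settle waits."""
    return iterations * delay_ms / 1000


before = settle_overhead_seconds(50, OLD_SCREENSHOT_DELAY_MS)  # 15.0 seconds
after = settle_overhead_seconds(50, NEW_SCREENSHOT_DELAY_MS)   # 7.5 seconds
```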

Files

  • pkg/create/templates.go — display name + description
  • pkg/templates/{typescript,python}/yutori/{loop,index|main,tools/computer,README}.* — implementation, trim helpers, types, docs

Test plan

  • make build and make test pass
  • scaffold the template (kernel create -t yutori -l {typescript,python}) and deploy end-to-end against n1.5-latest in the Default project
  • smoke test: navigate + interact on example.com (both TS and Python)
  • verify key_press Enter works on in-page search forms (DuckDuckGo, Google) and Wikipedia
  • verify scroll action works on results pages
  • verify goto_url (ctrl+L address-bar flow) still works post-delay changes
  • confirm trim helpers preserve the full caller history while shrinking only the request copy (verified with synthetic 20-step × 600KB history: 12.9MB → 9.2MB, last 6 screenshots byte-identical, tool_call_ids intact)

Note

Medium Risk
Medium risk because it changes the model version and the tool/action contract (new/renamed actions, new request parameters), which can alter automation behavior. Also adds request-trimming logic that could affect model context in long runs if mis-tuned.

Overview
Updates the Yutori CUA templates to use n1.5-latest (and renames the template to Yutori n1.5 Computer Use) across both TypeScript and Python.

Aligns the sampling loops and tool adapters with n1.5 by pinning tool_set to browser_tools_core-20260403, explicitly disabling DOM/Playwright “expanded” tools via disable_tools, and updating action schemas/handlers (e.g., hover → mouse_move, key_comb → key, new actions like mouse_down/mouse_up, hold_key, go_forward, plus modifier support for clicks/scroll).

Adds request-size protection by trimming older screenshot image_url parts from the messages sent to the API (while preserving full local history) and reduces per-step screenshot settle delay from 300ms to 150ms; READMEs are updated accordingly.

Reviewed by Cursor Bugbot for commit d2affba.

dprevoznik and others added 6 commits May 6, 2026 22:47
- bump model id from `n1-latest` to `n1.5-latest`
- send `extra_body.tool_set: browser_tools_core-20260403` to use the
  coordinate-based tool set
- send `extra_body.disable_tools` to explicitly exclude the DOM/Playwright
  tools (`extract_elements`, `find`, `set_element_value`, `execute_js`)
  since this template runs computer-use only
- rename `hover` to `mouse_move`
- rename `key_press` parameter `key_comb` to `key`
- drop `press_enter_after` and `clear_before_typing` from `type` (n1.5
  emits separate `key_press` actions instead)
- add new actions: `middle_click`, `mouse_down`, `mouse_up`, `hold_key`,
  `go_forward`
- support optional `modifier` parameter on click actions via Kernel's
  `hold_keys`

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

n1.5's scroll action accepts an optional `modifier` (e.g., shift) that
on browsers translates a vertical wheel into a horizontal scroll. Plumb
it into Kernel's `ComputerScrollParams.hold_keys` so the OS-level
modifier+wheel event is dispatched correctly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Mirror yutori-sdk-python's reference loop: deep-copy a request-only
view of the messages and strip old image_url blocks once the JSON
payload exceeds ~9.5 MB, while always preserving the most recent
6 screenshots and the very latest one. The caller's full history
is left intact for the return value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

300ms after every action was conservative: at 50 iterations that is
~15s of pure wall-clock waiting before model calls. 150ms still gives
the page enough time to settle for typical interactions while halving
the per-step overhead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

key_press / hold_key compound keys already consulted MODIFIER_MAP, but
the new `modifier` parameter on click and scroll actions passes a bare
modifier name ("control", "meta", "command") that bypassed the lookup
and went out as-is. Kernel's hold_keys wants "ctrl" and "super" — so
ctrl-click and cmd-click silently dropped the modifier.

Unify the per-part mapping into a single helper applied to both the
compound and single-key paths in TS and Python.
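
A hedged sketch of what such a unified helper could look like. The map below contains only the three modifier names mentioned in this commit message; the real MODIFIER_MAP may hold more entries, and the `+` separator for compound keys and the helper names `map_key_part`, `map_key`, and `map_modifier` are assumptions.

```python
from typing import Optional

# Model-side modifier names mapped to the names Kernel's hold_keys expects.
MODIFIER_MAP = {
    "control": "ctrl",
    "meta": "super",
    "command": "super",
}


def map_key_part(part: str) -> str:
    """Map a single key name, leaving non-modifier keys unchanged."""
    cleaned = part.strip()
    return MODIFIER_MAP.get(cleaned.lower(), cleaned)


def map_key(key: str) -> str:
    """Map a bare or compound key such as 'control+a' part by part."""
    return "+".join(map_key_part(p) for p in key.split("+"))


def map_modifier(modifier: Optional[str]) -> list[str]:
    """Turn an optional click/scroll modifier into a hold_keys list."""
    return [map_key_part(modifier)] if modifier else []
```

Routing both the compound path and the bare `modifier` path through `map_key_part` is what keeps the two from drifting apart again.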
dprevoznik marked this pull request as ready for review May 12, 2026 02:39
@firetiger-agent

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR updates a Yutori template with model changes and action signature updates, but does not modify kernel API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) in the codebase.

To monitor this PR anyway, reply with @firetiger monitor this.

dprevoznik requested a review from masnwilliams May 12, 2026 02:42
dprevoznik and others added 2 commits May 11, 2026 22:42
The openai-node SDK does not have a Python-style `extra_body` kwarg —
it serializes the body as-is. Passing `extra_body: {...}` as a body
field made Yutori receive a literal `{"extra_body": ...}` and silently
drop the tool_set pin and disable_tools defense-in-depth.

Hoist both fields and apply them via a typed spread (mirrors the
anthropic-computer-use loop pattern).
Comment thread pkg/templates/typescript/yutori/loop.ts Outdated
Comment thread pkg/templates/typescript/yutori/tools/computer.ts

cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



self.kernel.browsers.computer.move_mouse(
mapped_key = self._map_key(key)
duration = action.get("duration") or 1000

Python hold_key duration guard inconsistent with other handlers

Low Severity

The Python _handle_hold_key uses action.get("duration") or 1000 to default the duration, but this lets negative values pass through to press_key. Every other duration handler in both the Python and TypeScript templates — _handle_wait, handleHoldKey, and handleWait — uses an explicit duration > 0 guard. A negative duration from the model would cause the Kernel SDK to receive a negative millisecond value.
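
A minimal sketch of the guard the comment asks for, assuming durations are in milliseconds and that 1000 remains the fallback. Note that `-5 or 1000` evaluates to `-5` in Python, which is how a negative value slips through the current expression. The helper name `resolve_duration_ms` is hypothetical.

```python
from typing import Any

DEFAULT_HOLD_MS = 1000


def resolve_duration_ms(action: dict[str, Any], default_ms: int = DEFAULT_HOLD_MS) -> int:
    """Return a positive duration in ms, falling back to the default otherwise."""
    duration = action.get("duration")
    is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
    if is_number and duration > 0:
        return int(duration)
    # Missing, zero, negative, or non-numeric values all use the default.
    return default_ms
```

Using the same helper in `_handle_wait` and `_handle_hold_key` would keep the Python handlers consistent with each other.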

