Commit 69d66b4

masnwilliams and claude authored
fix(cua): convert Gemini scroll magnitude pixels to wheel notches (#154)
## Summary

Stacks on top of the unified CUA template branch. Fixes the Gemini scroll handler bug Danny flagged in review.

Gemini's computer-use API reports scroll `magnitude` in **pixels** (default ~400), but `computer.scroll`'s `delta_x` / `delta_y` expects **wheel notches**. The cua adapter was passing `magnitude` through unchanged, so a default Gemini scroll fired ~400 notches instead of ~7.

The standalone `gemini-computer-use` template already does the right thing; this just brings the unified adapter in line:

- default magnitude: `3` → `400` (pixels, matching Gemini's spec)
- divide by `PX_PER_NOTCH` (60) and clamp to `MAX_NOTCHES_PER_ACTION` (17)
- applied symmetrically in TS (`providers/gemini.ts`) and Python (`providers/gemini.py`)

The `anthropic` and `openai` adapters already match their standalone equivalents; no changes needed there.

## Test plan

- [ ] `go build ./...` passes (verified locally)
- [ ] `go test ./pkg/create/...` passes (verified locally)
- [ ] Deploy CUA template with Gemini provider, ask it to scroll a long page; confirm scroll distance is page-sized, not catastrophic
- [ ] Repeat for Python template

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 822ebd0 commit 69d66b4
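
The conversion described in the summary can be checked with a few lines of arithmetic. The following is a minimal sketch, not the adapter code itself: the helper name `px_to_notches` is hypothetical, while the two constants and the divide-round-clamp expression are taken from the commit.

```python
# Constants as introduced by this commit.
PX_PER_NOTCH = 60
MAX_NOTCHES_PER_ACTION = 17


def px_to_notches(magnitude_px: float) -> int:
    """Hypothetical helper: convert a pixel magnitude to wheel notches.

    Divides by the per-notch pixel budget, rounds, and clamps the result
    to the range [1, MAX_NOTCHES_PER_ACTION].
    """
    raw = round(magnitude_px / PX_PER_NOTCH)
    return min(MAX_NOTCHES_PER_ACTION, max(1, raw))


print(px_to_notches(400))   # 7  (400 / 60 = 6.67, rounded up to 7)
print(px_to_notches(3))     # 1  (rounds to 0, then raised to the floor of 1)
print(px_to_notches(2000))  # 17 (33 notches, clamped to the maximum)
```

With these constants a single action covers at most 17 × 60 = 1020 pixels, and even a very small magnitude still scrolls by one notch.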

2 files changed

Lines changed: 21 additions & 6 deletions

pkg/templates/python/cua/providers/gemini.py

Lines changed: 9 additions & 3 deletions
@@ -24,6 +24,11 @@
 DEFAULT_WIDTH = 1200
 DEFAULT_HEIGHT = 800
 
+# Gemini reports scroll magnitude in pixels; computer.scroll expects wheel
+# notches. Convert with a per-notch pixel budget and clamp to a sane max.
+PX_PER_NOTCH = 60
+MAX_NOTCHES_PER_ACTION = 17
+
 def _system_prompt() -> str:
     date = datetime.now().strftime("%A, %B %d, %Y")
     return (
@@ -169,10 +174,11 @@ async def _execute_action(
                 y = self._denorm(args.get("y"), height)
             else:
                 x, y = width // 2, height // 2
-            magnitude = args.get("magnitude", 3)
+            magnitude_px = args.get("magnitude", 400)
+            notches = min(MAX_NOTCHES_PER_ACTION, max(1, round(magnitude_px / PX_PER_NOTCH)))
             direction = args.get("direction", "down")
-            dy = -magnitude if direction == "up" else magnitude if direction == "down" else 0
-            dx = -magnitude if direction == "left" else magnitude if direction == "right" else 0
+            dy = -notches if direction == "up" else notches if direction == "down" else 0
+            dx = -notches if direction == "left" else notches if direction == "right" else 0
             await asyncio.to_thread(
                 computer.scroll, options.session_id, x=x, y=y, delta_x=dx, delta_y=dy,
             )
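
The direction handling in this hunk can be isolated into a small pure function for testing. This is only a sketch: `scroll_deltas` is a hypothetical name that does not appear in the commit, and the sign convention (negative `delta_y` for "up", negative `delta_x` for "left") is read off the diff rather than from `computer.scroll`'s documentation.

```python
def scroll_deltas(direction: str, notches: int) -> tuple[int, int]:
    """Hypothetical helper: map a direction and notch count to (delta_x, delta_y).

    Mirrors the chained conditionals in the diff: only one axis is ever
    non-zero, and an unrecognised direction yields (0, 0).
    """
    dy = -notches if direction == "up" else notches if direction == "down" else 0
    dx = -notches if direction == "left" else notches if direction == "right" else 0
    return dx, dy


print(scroll_deltas("down", 7))      # (0, 7)
print(scroll_deltas("left", 7))      # (-7, 0)
print(scroll_deltas("diagonal", 7))  # (0, 0)
```

Note that an unrecognised direction silently becomes a zero-delta scroll rather than raising an error, which matches the behaviour of the expressions in the diff.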

pkg/templates/typescript/cua/providers/gemini.ts

Lines changed: 12 additions & 3 deletions
@@ -18,6 +18,11 @@ const COORDINATE_SCALE = 1000;
 const DEFAULT_WIDTH = 1200;
 const DEFAULT_HEIGHT = 800;
 
+// Gemini reports scroll magnitude in pixels; computer.scroll expects wheel
+// notches. Convert with a per-notch pixel budget and clamp to a sane max.
+const PX_PER_NOTCH = 60;
+const MAX_NOTCHES_PER_ACTION = 17;
+
 const PREDEFINED_ACTIONS = [
   'click_at', 'hover_at', 'type_text_at', 'scroll_document',
   'scroll_at', 'wait_5_seconds', 'go_back', 'go_forward',
@@ -180,10 +185,14 @@ export class GeminiProvider implements CuaProvider {
       case 'scroll_at': {
         const x = name === 'scroll_at' ? this.denormalize(args.x, width) : width / 2;
         const y = name === 'scroll_at' ? this.denormalize(args.y, height) : height / 2;
-        const magnitude = args.magnitude ?? 3;
+        const magnitudePx = args.magnitude ?? 400;
+        const notches = Math.min(
+          MAX_NOTCHES_PER_ACTION,
+          Math.max(1, Math.round(magnitudePx / PX_PER_NOTCH)),
+        );
         const dir = args.direction ?? 'down';
-        const deltaY = dir === 'up' ? -magnitude : dir === 'down' ? magnitude : 0;
-        const deltaX = dir === 'left' ? -magnitude : dir === 'right' ? magnitude : 0;
+        const deltaY = dir === 'up' ? -notches : dir === 'down' ? notches : 0;
+        const deltaX = dir === 'left' ? -notches : dir === 'right' ? notches : 0;
         await computer.scroll(sessionId, { x, y, delta_x: deltaX, delta_y: deltaY });
         break;
       }
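
One detail worth checking when comparing the two implementations is rounding. Python's built-in `round` rounds exact halves to the nearest even integer, whereas JavaScript's `Math.round` rounds exact halves toward positive infinity. The sketch below is an illustration rather than code from the commit: it emulates the JavaScript rule with `math.floor(x + 0.5)`, and both function names are hypothetical.

```python
import math

# Constants as introduced by this commit.
PX_PER_NOTCH = 60
MAX_NOTCHES_PER_ACTION = 17


def notches_half_even(magnitude_px: float) -> int:
    """Same expression as the Python diff: round() uses round-half-to-even."""
    return min(MAX_NOTCHES_PER_ACTION, max(1, round(magnitude_px / PX_PER_NOTCH)))


def notches_half_up(magnitude_px: float) -> int:
    """Emulates Math.round from the TypeScript diff via floor(x + 0.5)."""
    raw = math.floor(magnitude_px / PX_PER_NOTCH + 0.5)
    return min(MAX_NOTCHES_PER_ACTION, max(1, raw))


print(notches_half_even(400), notches_half_up(400))  # 7 7
print(notches_half_even(150), notches_half_up(150))  # 2 3

# Magnitudes in the unclamped range where the two rules disagree.
mismatches = [
    px for px in range(0, 1021, 30)
    if notches_half_even(px) != notches_half_up(px)
]
print(mismatches)
```

The two rules agree on the default magnitude of 400 and can differ by at most one notch, only when the magnitude divided by 60 lands exactly on a half. Whether that difference matters in practice depends on how precisely the templates are expected to match.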
