Screenshot tool fails from background processes (systemd services, MCP parent shells) on GNOME Wayland #20

@1RB

Environment

  • OS: Ubuntu 24.04.4 LTS (Noble Numbat)
  • Desktop: GNOME (Wayland session)
  • computer-use-linux version: v0.2.5 (prebuilt binary, also reproducible via npm install)
  • Context: MCP server spawned by Hermes gateway (systemd user service / background shell)

Problem
When computer-use-linux runs as a child process from a background service (e.g., systemd --user, non-interactive shell), the screenshot tool consistently fails with:

Error: GNOME Shell screenshot failed: GNOME Shell Screenshot call failed; XDG portal screenshot failed: XDG portal screenshot was denied or cancelled with response code 2

The same command works perfectly when run from an interactive gnome-terminal window.

Root Cause Analysis
I investigated the failure path and found two issues:

  1. GNOME Shell API rejects computer-use-linux via explicit allowlist: GNOME Shell's org.gnome.Shell.Screenshot service uses a DBusSenderChecker with a hardcoded allowlist of trusted bus names. The allowlist contains only:

    • org.gnome.SettingsDaemon.MediaKeys
    • org.freedesktop.impl.portal.desktop.gtk
    • org.freedesktop.impl.portal.desktop.gnome
    • org.gnome.Screenshot (the bus name claimed by gnome-screenshot)

    Source: gnome-shell/js/ui/screenshot.js (GNOME 46) and gnome-shell/js/misc/util.js

    computer-use-linux does not own any of these bus names, so its DBus call is rejected with Gio.DBusError.ACCESS_DENIED and the message "Screenshot is not allowed". The DBusSenderChecker logic is:

    async checkInvocation(invocation) {
        if (global.context.unsafe_mode)
            return;
        if (await this._isSenderAllowed(invocation.get_sender()))
            return;
        throw new GLib.Error(Gio.DBusError,
            Gio.DBusError.ACCESS_DENIED,
            `${invocation.get_method_name()} is not allowed`);
    }

    From an interactive terminal, gnome-screenshot succeeds because it claims the org.gnome.Screenshot bus name, which is in the allowlist. computer-use-linux does not claim this bus name, so it is rejected even when calling the exact same DBus method with the same arguments.

  2. XDG portal returns response code 2 for background processes: The fallback capture_with_portal() uses org.freedesktop.portal.Screenshot. When computer-use-linux passes interactive: false and an empty parent window (""), the GNOME portal backend (xdg-desktop-portal-gnome) calls screenshot_dialog_new with a fake_parent window. From a background process with no focused window, however, the portal returns response code 2, which the portal Request API defines as "the interaction was ended in some other way" (0 is success, 1 is user cancellation). The exact mechanism may vary by GNOME version, but the practical result is that the portal fallback also fails for background services.

    Source: xdg-desktop-portal-gnome/src/screenshot.c

    gnome-screenshot bypasses the portal entirely and uses the trusted Shell API, which is why it works from the same background shell context.
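
To make the allowlist behaviour in item 1 concrete, here is a simplified Python model of the check. It is a sketch, not the GJS implementation: it assumes the checker compares the caller's unique bus name against the current owners of the allowlisted well-known names. The function name, the `name_owners` mapping, and the `:1.NN` values are hypothetical.

```python
# Simplified model of GNOME Shell's DBusSenderChecker (illustrative only).
ALLOWED_NAMES = [
    "org.gnome.SettingsDaemon.MediaKeys",
    "org.freedesktop.impl.portal.desktop.gtk",
    "org.freedesktop.impl.portal.desktop.gnome",
    "org.gnome.Screenshot",
]


def is_sender_allowed(sender, name_owners, unsafe_mode=False):
    """Return True if `sender` (a unique bus name such as ":1.42") may call.

    `name_owners` maps well-known bus names to the unique name that
    currently owns them (assumed to mirror what the bus daemon reports).
    """
    if unsafe_mode:
        return True
    trusted = {name_owners[n] for n in ALLOWED_NAMES if n in name_owners}
    return sender in trusted


# Hypothetical session: gnome-screenshot is :1.80, computer-use-linux is :1.91.
owners = {
    "org.gnome.Screenshot": ":1.80",
    "org.freedesktop.impl.portal.desktop.gnome": ":1.33",
}
print(is_sender_allowed(":1.80", owners))  # True: owns an allowlisted name
print(is_sender_allowed(":1.91", owners))  # False: owns none of them
```

Under this model the method arguments never enter the decision; only the identity of the caller does, which is consistent with the rejection described above.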

Reproduction Steps

  1. Start computer-use-linux mcp from a systemd user service or a background shell.
  2. Send a tools/call request for the screenshot tool.
  3. Observe the error.
  4. Run gnome-screenshot -f /tmp/test.png from the same shell — it succeeds.
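
For step 2, the request can be generated with a small script. This is a sketch that assumes the server speaks newline-delimited JSON-RPC over stdio and expects the usual MCP initialize handshake before tools/call; the protocol version string, the client name, and the empty `arguments` object are assumptions, not values taken from computer-use-linux.

```python
import json


def build_repro_messages(request_id=2):
    """Return the newline-delimited JSON-RPC lines for a screenshot call."""
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed; use what the server supports
            "capabilities": {},
            "clientInfo": {"name": "repro-client", "version": "0.0.0"},
        },
    }
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    call = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "screenshot", "arguments": {}},
    }
    return [json.dumps(m) for m in (initialize, initialized, call)]


if __name__ == "__main__":
    for line in build_repro_messages():
        print(line)
```

Feeding these lines to the stdin of computer-use-linux mcp from the background context should produce the error quoted in the Problem section as the response to the request with id 2.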

Suggested Fixes

  1. Add gnome-screenshot as a third fallback (preferred): When both capture_with_gnome_shell() and capture_with_portal() fail, spawn gnome-screenshot as a subprocess and read the file it writes. This matches the existing fallback pattern and sidesteps both failure modes: gnome-screenshot owns an allowlisted bus name, so the Shell API accepts it, and it never goes through the portal. It is the most robust fix on any system where gnome-screenshot is installed, which covers most GNOME setups.

  2. Environment variable to force backend (secondary): Allow COMPUTER_USE_LINUX_SCREENSHOT_BACKEND=gnome-screenshot to skip the DBus/portal attempts entirely. Useful for debugging or for users who know their environment requires it.
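
The two suggestions combine naturally into one ordered fallback chain with an optional override. The sketch below is in Python for consistency with the workaround script and is not the project's actual code; only the COMPUTER_USE_LINUX_SCREENSHOT_BACKEND variable and the gnome-screenshot value come from the proposal above, while the other backend names, `CaptureError`, and both function names are hypothetical.

```python
import os


class CaptureError(Exception):
    """Raised when no screenshot backend succeeds."""


def select_backends(backends, env=None):
    """Return backend names in the order they should be tried."""
    env = os.environ if env is None else env
    forced = env.get("COMPUTER_USE_LINUX_SCREENSHOT_BACKEND")
    if forced:
        if forced not in backends:
            raise CaptureError(f"unknown screenshot backend: {forced}")
        return [forced]  # override: skip every other attempt
    return list(backends)  # dict insertion order is the fallback order


def capture(backends, env=None):
    """Try each backend in turn; return (name, png_bytes) from the first success."""
    errors = []
    for name in select_backends(backends, env):
        try:
            return name, backends[name]()
        except Exception as exc:
            errors.append(f"{name} failed: {exc}")
    raise CaptureError("; ".join(errors))


def _denied():
    # Stand-in for a backend that is rejected in a background context.
    raise RuntimeError("access denied")


# Stub backends for illustration; real ones would perform the capture.
stub_backends = {
    "gnome-shell": _denied,
    "portal": _denied,
    "gnome-screenshot": lambda: b"\x89PNG\r\n\x1a\n",
}
print(capture(stub_backends, env={})[0])  # gnome-screenshot
```

Joining the per-backend errors keeps the current style of error message, so a total failure still reports why each attempt was rejected.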

Verified Workaround
I implemented a Python proxy wrapper that intercepts screenshot tool calls (both CLI and MCP JSON-RPC) and delegates to gnome-screenshot. The proxy has been tested and works:

  • CLI computer-use-linux screenshot returns the correct JSON payload from gnome-screenshot
  • MCP tools/call for screenshot returns a valid CallToolResult with ImageContent (matching the real binary's format: image content first, then text metadata)
  • All other tools (e.g., list_windows, get_app_state) are transparently forwarded to the real binary

Proxy script:

#!/usr/bin/env python3
"""Proxy wrapper that falls back to gnome-screenshot for background contexts."""
import json
import os
import subprocess
import sys
import struct
import base64
import threading
import tempfile

REAL_BINARY = os.path.join(
    os.path.dirname(os.path.abspath(__file__)),
    "computer-use-linux-real"
)


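def get_png_dimensions(path):
    # Width and height are big-endian uint32 values at bytes 16-24 (IHDR chunk).
    with open(path, "rb") as f:
        header = f.read(24)
    # A truncated file would make struct.unpack raise, so check the length too.
    if len(header) < 24 or header[:8] != b"\x89PNG\r\n\x1a\n":
        return None, None
    w, h = struct.unpack(">II", header[16:24])
    return w, h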


def take_gnome_screenshot():
    fd, path = tempfile.mkstemp(suffix=".png", prefix="cul-screenshot-")
    os.close(fd)
    try:
        result = subprocess.run(
            ["gnome-screenshot", "-f", path],
            capture_output=True,
            text=True,
            timeout=15,
        )
        if result.returncode != 0:
            err = result.stderr.strip() or result.stdout.strip() or "unknown error"
            raise RuntimeError(f"gnome-screenshot failed: {err}")
        # mkstemp already created the file, so test for content, not existence.
        if os.path.getsize(path) == 0:
            raise RuntimeError("gnome-screenshot did not write any image data")
        with open(path, "rb") as f:
            png_bytes = f.read()
        w, h = get_png_dimensions(path)
        return png_bytes, w or 0, h or 0
    finally:
        try:
            os.remove(path)
        except Exception:
            pass


def build_screenshot_result(png_bytes, width, height):
    encoded = base64.b64encode(png_bytes).decode("ascii")
    data_url = f"data:image/png;base64,{encoded}"
    metadata = {
        "mime_type": "image/png",
        "data_url": data_url,
        "source": "gnome-screenshot",
        "width": width,
        "height": height,
        "coordinate_width": width,
        "coordinate_height": height,
        "scale": 1.0,
        "resized": False,
        "bytes": len(png_bytes),
        "original_bytes": len(png_bytes),
        "max_bytes": 2 * 1024 * 1024,
        "format": "png",
        "quality": None,
        "cropped_to_window": False,
        "window_title": None,
    }
    return encoded, metadata


def handle_cli_screenshot():
    try:
        png_bytes, w, h = take_gnome_screenshot()
        _, metadata = build_screenshot_result(png_bytes, w, h)
        print(json.dumps(metadata, indent=2))
        return 0
    except Exception as e:
        print(json.dumps({"error": str(e)}), file=sys.stderr)
        return 1


def handle_mcp_screenshot(request_id, params=None):
    try:
        png_bytes, w, h = take_gnome_screenshot()
        encoded, metadata = build_screenshot_result(png_bytes, w, h)
        # Return matching real binary's MCP CallToolResult format:
        # image content first, then text metadata
        response = {
            "jsonrpc": "2.0",
            "id": request_id,
            "result": {
                "content": [
                    {"type": "image", "data": encoded, "mimeType": "image/png"},
                    {"type": "text", "text": json.dumps(metadata)}
                ],
                "isError": False,
            }
        }
        print(json.dumps(response), flush=True)
    except Exception as e:
        response = {
            "jsonrpc": "2.0",
            "id": request_id,
            "error": {
                "code": -32603,
                "message": f"screenshot proxy error: {str(e)}",
            }
        }
        print(json.dumps(response), flush=True)


def forward_stream(src, dst):
    try:
        for line in src:
            dst.write(line)
            dst.flush()
    except Exception:
        pass


def run_mcp_proxy():
    env = os.environ.copy()
    cosmic_helper = os.path.join(
        os.path.dirname(REAL_BINARY),
        "computer-use-linux-cosmic"
    )
    if os.path.exists(cosmic_helper) and not env.get("COMPUTER_USE_LINUX_COSMIC_HELPER"):
        env["COMPUTER_USE_LINUX_COSMIC_HELPER"] = cosmic_helper

    if not os.path.exists(REAL_BINARY):
        print(
            json.dumps({
                "jsonrpc": "2.0",
                "id": None,
                "error": {
                    "code": -32603,
                    "message": f"real binary not found: {REAL_BINARY}",
                }
            }),
            file=sys.stderr,
            flush=True,
        )
        sys.exit(127)

    proc = subprocess.Popen(
        [REAL_BINARY, "mcp"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,
        env=env,
    )

    t_out = threading.Thread(
        target=forward_stream, args=(proc.stdout, sys.stdout), daemon=True
    )
    t_err = threading.Thread(
        target=forward_stream, args=(proc.stderr, sys.stderr), daemon=True
    )
    t_out.start()
    t_err.start()

    try:
        for line in sys.stdin:
            line = line.strip()
            if not line:
                continue
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                proc.stdin.write(line + "\n")
                proc.stdin.flush()
                continue

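            # Guard against batch arrays and non-object params, which would
            # otherwise raise AttributeError and kill the proxy loop.
            if (
                isinstance(msg, dict)
                and msg.get("method") == "tools/call"
                and isinstance(msg.get("params"), dict)
                and msg["params"].get("name") == "screenshot"
            ):
                handle_mcp_screenshot(msg.get("id"))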
            else:
                proc.stdin.write(line + "\n")
                proc.stdin.flush()
    except KeyboardInterrupt:
        pass
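    finally:
        try:
            proc.stdin.close()
        except Exception:
            pass
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            # The real binary did not exit after stdin closed; force it down.
            proc.kill()
            proc.wait()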


def main():
    if len(sys.argv) >= 2 and sys.argv[1] == "mcp":
        run_mcp_proxy()
        return 0
    elif len(sys.argv) >= 2 and sys.argv[1] == "screenshot":
        return handle_cli_screenshot()
    else:
        os.execv(REAL_BINARY, sys.argv)


if __name__ == "__main__":
    sys.exit(main())

To use the workaround:

  1. Rename computer-use-linux to computer-use-linux-real
  2. Place the proxy script at computer-use-linux and make it executable
  3. The proxy transparently intercepts screenshot calls and falls back to gnome-screenshot, while forwarding all other requests to the real binary.
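
To check that a proxy response has the shape described above (image content first, then text metadata), a small validator can be run on each response line. This is a sketch: the function name is hypothetical, and it checks only the fields the proxy script itself emits, not the full MCP schema.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_screenshot_response(line):
    """Validate one JSON-RPC response line; return the decoded PNG bytes."""
    msg = json.loads(line)
    if "error" in msg:
        raise ValueError(f"server returned error: {msg['error'].get('message')}")
    content = msg["result"]["content"]
    image, text = content[0], content[1]
    if image["type"] != "image" or image["mimeType"] != "image/png":
        raise ValueError("first content item is not a PNG image")
    if text["type"] != "text":
        raise ValueError("second content item is not text metadata")
    png = base64.b64decode(image["data"])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("image data does not start with the PNG signature")
    json.loads(text["text"])  # the metadata must itself be valid JSON
    return png
```

The function raises ValueError for an error response or a malformed result, so it can serve as a quick pass/fail check after installing the wrapper.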
