CloudPhone Plugin for OpenClaw

Chinese README

OpenClaw CloudPhone is a plugin that gives AI agents cloud phone automation capabilities through natural language.

With a single instruction, an agent can submit any cloud phone task to the backend AI Agent, which handles the full execution loop — screen observation, LLM planning, and UI actions — and streams the result back in real time.

Quick Start

1. Install the plugin

openclaw plugins install @suqiai/cloudphone

To update the plugin later, run:

openclaw plugins update @suqiai/cloudphone

2. Configure the plugin

Set apikey in plugins.entries.cloudphone.config. It is the only required setting; the plugin uses built-in defaults for the optional ones.

Option A: Configuration file (openclaw.json)

Add the following configuration to openclaw.json, replacing the placeholder with your own key:

  • apikey: Log in or sign up at https://ai.suqi.tech, then copy your API Key from your account settings.
{
  "plugins": {
    "entries": {
      "cloudphone": {
        "enabled": true,
        "config": {
          "apikey": "<your-api-key>"
        }
      }
    }
  }
}
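
Before restarting the Gateway, it can help to confirm that the file has the shape shown above. The following is a minimal sketch, not part of the plugin: checkCloudPhoneConfig is a hypothetical helper, and it assumes openclaw.json is strict JSON that you have already read into a string.

```typescript
// Hypothetical helper for illustration; the plugin does not ship this function.
interface CloudPhoneEntry {
  enabled?: boolean;
  config?: { apikey?: string };
}

interface OpenClawConfig {
  plugins?: { entries?: { [name: string]: CloudPhoneEntry } };
}

// Returns a list of human-readable problems.
// An empty list means the cloudphone entry looks usable.
function checkCloudPhoneConfig(raw: string): string[] {
  const parsed = JSON.parse(raw) as OpenClawConfig; // assumes strict JSON
  const entry = parsed.plugins?.entries?.["cloudphone"];
  if (!entry) {
    return ["plugins.entries.cloudphone is missing"];
  }
  const problems: string[] = [];
  if (entry.enabled !== true) {
    problems.push("plugins.entries.cloudphone.enabled is not true");
  }
  const apikey = entry.config?.apikey;
  if (typeof apikey !== "string" || apikey.trim() === "") {
    problems.push("plugins.entries.cloudphone.config.apikey is empty");
  }
  return problems;
}
```

The check only looks at structure. It cannot tell whether the key itself is valid; an invalid key shows up later as an authorization failure on the first tool call.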

Option B: OpenClaw Console UI

  1. Open the OpenClaw console in your browser.
  2. Go to the Plugins section, find CloudPhone and enable it.
  3. Set apikey to the API Key from your account settings at https://ai.suqi.tech (log in or sign up first).

Screenshots (images not shown): OpenClaw Console, Plugins page; OpenClaw Console, CloudPhone config.

3. Restart the Gateway

openclaw gateway restart

How It Works

This plugin exposes the CloudPhone backend AI Agent as three high-level tools:

  1. cloudphone_execute — Submit a natural language instruction to the backend. The backend handles LLM interpretation, cloud phone UI automation (observe → plan → act loop), and dispatches all actions automatically. Returns a task_id immediately.

  2. cloudphone_execute_and_wait — Chained call: runs cloudphone_execute, then automatically runs one cloudphone_task_result poll and returns the result of the first 10-second window.

  3. cloudphone_task_result — Subscribes to the task's SSE stream. Each call consumes one 10-second window and returns the thinking delta for that window; repeat the call until the task reaches a terminal status.

The agent no longer needs to directly control UI coordinates, manage screenshots, or call individual tap/swipe/input tools. The backend AI Agent handles the full automation loop.
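
For readers unfamiliar with SSE, the sketch below shows how standard event:/data: framing is split into events. It illustrates the wire format only and is not the plugin's parser. The function name parseSse and the event name used in the usage note are assumptions, and the backend's JSON-embedded event shapes are not covered.

```typescript
interface SseEvent {
  event: string; // "message" when the frame carries no event: field
  data: string;  // multiple data: lines joined with "\n"
}

// Parses a complete chunk of SSE text into events.
// A frame that is not terminated by a blank line is discarded.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventName = "";
  let dataLines: string[] = [];

  for (const line of text.split(/\r\n|\n|\r/)) {
    if (line === "") {
      // Blank line: dispatch the buffered frame, if it has any data.
      if (dataLines.length > 0) {
        events.push({ event: eventName || "message", data: dataLines.join("\n") });
      }
      eventName = "";
      dataLines = [];
      continue;
    }
    if (line.charAt(0) === ":") {
      continue; // comment line, often used as a keep-alive
    }
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") {
      value = value.slice(1); // one leading space after the colon is not data
    }
    if (field === "event") {
      eventName = value;
    } else if (field === "data") {
      dataLines.push(value);
    }
    // Other fields (id, retry) are ignored in this sketch.
  }
  return events;
}
```

For example, parseSse('event: thinking\ndata: hello\n\n') yields one event named "thinking" with data "hello". The name "thinking" is made up for the example; this README does not list the backend's event names.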

Configuration

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| apikey | string | Yes | - | Authorization credential (ApiKey) |

Obtain your API Key by logging in or signing up at https://ai.suqi.tech, then find it in your account/settings.

Optional fields such as baseUrl and timeout are documented in openclaw.plugin.json and use built-in defaults when omitted.

Tool Overview

After the plugin is installed, the agent automatically gets the following tools.

User and device management

| Tool | Description |
| --- | --- |
| cloudphone_get_user_profile | Get the current user's basic information |
| cloudphone_list_devices | List cloud phone devices with pagination, keyword search, and status filters |
| cloudphone_get_device_info | Get detailed information for a specific device |

AI Agent task execution

| Tool | Description |
| --- | --- |
| cloudphone_execute | Submit a natural language instruction; returns task_id immediately |
| cloudphone_execute_and_wait | Auto-chain execute + first task_result poll |
| cloudphone_task_result | Return 10s-window thinking delta and current task status |

Usage Examples

After installation and configuration, you can control cloud phones through natural language prompts.

Run a UI automation task

Open WeChat on the cloud phone, search for the "OpenClaw" public account, and follow it

The agent will:

  1. Call cloudphone_list_devices to get the device ID
  2. Call cloudphone_execute_and_wait to submit and trigger the first poll automatically
  3. If status is running, continue calling cloudphone_task_result every ~10 seconds until success/done/error
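
The submit-then-poll part of these steps can be sketched as a loop. This is an illustration under assumptions, not the agent runtime's code: the CloudPhoneTools interface is a hypothetical stand-in for however your runtime invokes tools, cloudphone_execute is assumed to return an object with task_id, and the maxPolls cap of 30 is arbitrary. For simplicity the sketch uses cloudphone_execute plus cloudphone_task_result rather than cloudphone_execute_and_wait.

```typescript
// Assumed result shape, based on the response fields listed in this README.
interface TaskWindow {
  ok: boolean;
  task_id: number;
  status: string;
  thinking?: string[];
  result?: unknown;
  message?: string;
}

// Hypothetical stand-in for the agent runtime's tool invocation.
interface CloudPhoneTools {
  execute(args: { instruction: string; device_id: string }): Promise<{ task_id: number }>;
  taskResult(args: { task_id: number }): Promise<TaskWindow>;
}

async function runToCompletion(
  tools: CloudPhoneTools,
  deviceId: string,
  instruction: string,
  maxPolls = 30, // arbitrary safety cap, not a plugin setting
): Promise<{ final: TaskWindow; thinking: string[] }> {
  const { task_id } = await tools.execute({ instruction, device_id: deviceId });
  const thinking: string[] = [];

  for (let i = 0; i < maxPolls; i++) {
    // No extra sleep here: each call is described as consuming one 10-second window.
    const window = await tools.taskResult({ task_id });
    thinking.push(...(window.thinking ?? [])); // each window carries only new lines
    if (window.status !== "running") {
      return { final: window, thinking };
    }
  }
  throw new Error(`task ${task_id} still running after ${maxPolls} polls`);
}
```

Capping the number of polls keeps a stuck task from looping forever, in the spirit of the bounded retries mentioned in the changelog.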

Check device status

Show me my cloud phone devices

The agent will call cloudphone_list_devices and return the device list.

Submit a task and wait for completion

Agent: cloudphone_execute_and_wait
  instruction: "Open Douyin, search for food videos, and like the first one"
  device_id: "abc123"
→ returns: { ok: false, task_result: { status: "running", thinking: [...] } }

Agent: cloudphone_task_result
  task_id: 42
→ returns 10s-window delta until terminal: { ok: true, status: "done", result: {...} }

Tool Parameters

cloudphone_execute

instruction    : string  - Natural language task instruction (required)
device_id      : string  - Device unique ID (recommended)
user_device_id : number  - User device ID (kept for compatibility; device_id takes priority)
session_id     : string  - Optional session ID for streaming persistence
lang           : string  - Language hint: "cn" (default) or "en"
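
As a rough illustration of how these parameters relate, the sketch below normalizes an argument object before submission. normalizeExecuteArgs is a hypothetical client-side helper; the plugin may validate differently. Dropping user_device_id when device_id is present is one reading of "device_id takes priority", not confirmed behavior.

```typescript
interface ExecuteArgs {
  instruction: string;
  device_id?: string;
  user_device_id?: number;
  session_id?: string;
  lang?: "cn" | "en";
}

// Hypothetical helper: trims the instruction, applies the documented
// default for lang, and keeps only one device identifier.
function normalizeExecuteArgs(args: ExecuteArgs): ExecuteArgs {
  const instruction = args.instruction.trim();
  if (instruction === "") {
    throw new Error("instruction is required");
  }
  const out: ExecuteArgs = { instruction, lang: args.lang ?? "cn" };
  if (args.device_id !== undefined) {
    out.device_id = args.device_id; // preferred identifier
  } else if (args.user_device_id !== undefined) {
    out.user_device_id = args.user_device_id; // compatibility fallback
  }
  if (args.session_id !== undefined) {
    out.session_id = args.session_id;
  }
  return out;
}
```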

cloudphone_task_result

task_id    : number - Task ID from cloudphone_execute (required)

Response fields:

ok         : boolean - Whether the operation succeeded
task_id    : number  - Echo of the input task_id
status     : string  - "running" | "done" | "success" | "error" | "timeout"
thinking   : string[] - New thinking lines from the current 10-second polling window (delta)
result     : object  - Final task result from the backend
message    : string  - Error message (when status is "error" or "timeout")
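
Because thinking is a delta, a caller that wants the full trace has to concatenate the windows itself. The sketch below folds a sequence of responses into one summary. The type and function names are illustrative. "running" is included as a status because the FAQ describes it, and treating "done" and "success" as the successful outcomes is an assumption based on their names.

```typescript
type TaskStatus = "running" | "done" | "success" | "error" | "timeout";

interface TaskResultResponse {
  ok: boolean;
  task_id: number;
  status: TaskStatus;
  thinking?: string[];
  result?: object;
  message?: string;
}

interface TaskSummary {
  finished: boolean;    // false while the latest window is still "running"
  succeeded: boolean;   // true only for "done" or "success" (assumption)
  transcript: string[]; // all thinking lines, in arrival order
  error?: string;       // set when the task finished without succeeding
}

// Folds the responses of successive cloudphone_task_result calls.
function summarize(responses: TaskResultResponse[]): TaskSummary {
  const transcript: string[] = [];
  let last: TaskResultResponse | undefined;
  for (const response of responses) {
    transcript.push(...(response.thinking ?? []));
    last = response;
  }
  const status: TaskStatus = last?.status ?? "running";
  const finished = status !== "running";
  const succeeded = status === "done" || status === "success";
  const summary: TaskSummary = { finished, succeeded, transcript };
  if (finished && !succeeded) {
    summary.error = last?.message ?? "no message provided";
  }
  return summary;
}
```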

cloudphone_list_devices

keyword : string  - Search keyword (device name or device ID)
status  : string  - Status filter: "online" | "offline"
page    : integer - Page number, default 1
size    : integer - Items per page, default 20
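
To walk every page, a caller can keep requesting until a page comes back short. The sketch below shows that pattern. This README does not document the response shape of cloudphone_list_devices, so fetchPage is a hypothetical function that is assumed to return the items of one page as an array; the maxPages cap is also an assumption.

```typescript
interface ListDevicesArgs {
  keyword?: string;
  status?: "online" | "offline";
  page?: number;
  size?: number;
}

// Hypothetical: wraps a cloudphone_list_devices call and returns one page of items.
type FetchPage<T> = (args: ListDevicesArgs) => Promise<T[]>;

async function listAllDevices<T>(
  fetchPage: FetchPage<T>,
  filter: { keyword?: string; status?: "online" | "offline" } = {},
  size = 20,     // documented default page size
  maxPages = 50, // arbitrary safety cap
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const items = await fetchPage({ ...filter, page, size });
    all.push(...items);
    if (items.length < size) {
      break; // a short page is taken to mean there are no more results
    }
  }
  return all;
}
```

If the real response includes a total count, comparing against that would be more robust than the short-page heuristic.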

cloudphone_get_device_info

user_device_id : number - User device ID (required)

FAQ

Q: The agent cannot find the CloudPhone tools after installation.

Make sure plugins.entries.cloudphone.enabled is set to true in openclaw.json, then restart the Gateway.

Q: Why does cloudphone_task_result return running?

This is expected when the current 10-second polling window has not reached terminal status. Keep calling cloudphone_task_result every ~10 seconds until success/done/error.

Q: A tool call fails with a request error or authorization failure.

  • Check whether apikey is valid and that you restarted the Gateway after changing config
  • Check network connectivity and whether the CloudPhone service is reachable
  • 401 errors indicate an invalid or expired apikey

Q: How do I get an apikey?

Log in or sign up at https://ai.suqi.tech and get your API Key from your account/settings.

Q: Does cloudphone_execute support concurrent tasks?

No, not within the same agent context. The plugin enforces serial execution per agent key, which resolves to session_id if set, then device_id, then user_device_id, and otherwise default.
If you call cloudphone_execute before the previous task reaches a terminal status in cloudphone_task_result, the call returns code: "AGENT_BUSY" with blocking_task_id.

Required call order:

  1. cloudphone_execute_and_wait (auto-runs the first poll)
  2. cloudphone_task_result (if status is running, continue polling until terminal: success/done/error)
  3. Next cloudphone_execute
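
The gating rule can be pictured with a small in-memory model. This is a sketch of the described behavior, not the plugin's implementation: SerialGate, resolveAgentKey, and the key prefixes are illustrative names, and any status other than "running" is treated as terminal.

```typescript
interface AgentScope {
  session_id?: string;
  device_id?: string;
  user_device_id?: number;
}

// Resolution order from the FAQ: session_id, then device_id,
// then user_device_id, otherwise "default". Prefixes are illustrative.
function resolveAgentKey(scope: AgentScope): string {
  if (scope.session_id) return `session:${scope.session_id}`;
  if (scope.device_id) return `device:${scope.device_id}`;
  if (scope.user_device_id !== undefined) return `user_device:${scope.user_device_id}`;
  return "default";
}

type Admission =
  | { ok: true }
  | { ok: false; code: "AGENT_BUSY"; blocking_task_id: number };

class SerialGate {
  private inFlight = new Map<string, number>(); // agent key -> running task_id

  // Would a new cloudphone_execute be accepted for this scope?
  admit(scope: AgentScope): Admission {
    const blocking = this.inFlight.get(resolveAgentKey(scope));
    if (blocking !== undefined) {
      return { ok: false, code: "AGENT_BUSY", blocking_task_id: blocking };
    }
    return { ok: true };
  }

  // Record a task once it has been submitted.
  started(scope: AgentScope, taskId: number): void {
    this.inFlight.set(resolveAgentKey(scope), taskId);
  }

  // Feed in each status seen by cloudphone_task_result.
  observed(scope: AgentScope, status: string): void {
    if (status !== "running") {
      this.inFlight.delete(resolveAgentKey(scope));
    }
  }
}
```

In this model, tasks on different devices or sessions do not block each other, because each one resolves to its own key.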

Changelog

Current version: v2026.4.2

v2026.4.2

  • Enforced per-device/session serial task execution: cloudphone_execute returns AGENT_BUSY while a task is in flight, until cloudphone_task_result reaches a terminal status
  • Improved SSE parsing for both standard event:/data: framing and backend JSON-embedded event shapes
  • Strengthened tool descriptions with explicit guardrails (no autonomous extra steps, no screenshot-only requests, bounded retries)
  • Renamed npm package scope to @suqiai/cloudphone (commit 3c50f95)
  • Added src/tools.serial-gating.test.ts (Node test runner); excluded *.test.ts from tsc output so dist/ stays publish-clean
  • Updated built-in skill docs and README guidance for the execute → poll workflow
  • Synced package/plugin/doc version references to v2026.4.2

v2026.4.1

  • Added cloudphone_execute_and_wait to auto-chain task submission and the first result polling
  • Clarified tool behavior and call sequence documentation for task execution and polling
  • Updated .gitignore with docs/ and openspec/ entries for cleaner project management
  • Synced package/plugin/doc version references to v2026.4.1

v2026.3.31

  • Enhanced task execution and result handling flow in plugin tools
  • Improved task-related documentation and reference examples in built-in skills
  • Synced package/plugin/doc version references to v2026.3.31

v2026.3.30

  • Replaced 12 fine-grained UI automation tools (tap, swipe, snapshot, etc.) with 2 high-level backend-delegated tools
  • Added cloudphone_execute: submit natural language instructions to the backend AI Agent
  • Added cloudphone_task_result: stream agent thinking and final result via SSE
  • Removed AutoGLM direct integration (backend now handles the full observe → plan → act loop)
  • Simplified plugin config: removed all autoglm* fields, only apikey, baseUrl, timeout remain
  • Updated skills, README, and reference docs to reflect new architecture

v2026.3.27

  • Summarized and aligned release notes based on target commit 1da1031
  • Synced package/plugin/doc version references to v2026.3.27

v1.1.0

  • Enhanced screenshot handling in cloudphone_render_image for improved compatibility
  • Added the cloudphone-snapshot-url skill

v1.0.6

  • Added the built-in basic-skill skill distributed with the plugin
  • Added reference.md as a tool parameter quick reference

License

This plugin follows the license terms of the repository it belongs to.
