Commit c93bc5b

release: bump version to v2026.3.26
- summarize changes from 2e81fa1
- sync plugin/package/doc version references
- add changelog updates for v2026.3.26

Made-with: Cursor
1 parent 2e81fa1 commit c93bc5b

6 files changed

Lines changed: 185 additions & 18 deletions

README.md

Lines changed: 60 additions & 4 deletions
@@ -6,7 +6,7 @@ OpenClaw CloudPhone is a plugin that gives AI agents device management and UI au
 
 With natural language instructions, an agent can list devices, power them on or off, capture screenshots, tap, swipe, type text, and perform other UI actions without writing manual scripts.
 
-Starting from `v1.0.7`, the package also ships with a built-in skill, `basic-skill`, which helps agents combine these tools in a more reliable way.
+Starting from `v1.1.0`, the package ships with built-in skills (including `basic-skill`) that help agents combine these tools in a more reliable way.
 
 ## Quick Start
 
@@ -75,7 +75,7 @@ Once the plugin is loaded successfully, the agent can use all CloudPhone tools.
 
 This repository is first and foremost an **OpenClaw plugin**. Its job is to expose the CloudPhone OpenAPI as tools that an agent can call.
 
-Starting from `v1.0.7`, the package also includes an **OpenClaw skill**:
+Starting from `v1.1.0`, the package includes **OpenClaw skills**:
 
 - Plugin: defines **what the agent can do** by providing `cloudphone_*` tools
 - Skill: defines **how the agent should do it reliably** by teaching call order, recovery steps, and safer workflows
@@ -164,6 +164,56 @@ After the plugin is installed, the agent automatically gets the following capabi
 | `cloudphone_snapshot` | Capture a screenshot or UI tree snapshot from the device |
 | `cloudphone_render_image` | Render a screenshot URL as an image directly in chat |
 
+## planActionTool (`cloudphone_plan_action`)
+
+`planActionTool` maps to `cloudphone_plan_action`. It lets the agent call an AutoGLM model to analyze the current screenshot and goal, then return a structured next-action plan for CloudPhone UI automation.
+
+Typical scenarios:
+- uncertain next step on a dynamic UI
+- deciding tap/swipe/input intent before execution
+- recovering when repeated direct actions fail
+
+### Prerequisites
+
+Configure these plugin fields before using `cloudphone_plan_action`:
+- required: `autoglmBaseUrl`, `autoglmApiKey`, `autoglmModel`
+- optional: `autoglmMaxTokens` (default `3000`), `autoglmLang` (default `cn`)
+
+Example (`plugins.entries.cloudphone.config`):
+
+```json
+{
+  "autoglmBaseUrl": "https://open.bigmodel.cn/api/paas/v4",
+  "autoglmApiKey": "your-api-key",
+  "autoglmModel": "autoglm-phone",
+  "autoglmMaxTokens": 3000,
+  "autoglmLang": "cn"
+}
+```
+
+### Parameters and minimal example
+
+Core input:
+- `device_id`: target cloud phone device ID
+- `goal`: natural language task goal
+
+Minimal example:
+
+```text
+device_id: "your-device-id"
+goal: "Open WeChat and enter the search page"
+```
+
+Expected output:
+- model reasoning summary for the current screen
+- a suggested next action that can be executed with `cloudphone_*` tools
+
+### Notes
+
+- If required `autoglm*` fields are missing, the tool returns a config error.
+- Recommended flow: `cloudphone_snapshot` -> `cloudphone_plan_action` -> execute with `cloudphone_tap`/`cloudphone_swipe`/`cloudphone_input_text` -> verify with a new snapshot.
+- Keep each goal focused on one immediate UI objective for better planning quality.
+
 ## Usage Examples
 
 After installation and configuration, you can control cloud phones through natural language prompts.
@@ -283,7 +333,7 @@ Make sure `plugins.entries.cloudphone.enabled` is set to `true` in `openclaw.jso
 
 **Q: The tools work, but the agent is not very stable when operating a cloud phone UI.**
 
-Starting from `v1.0.7`, the package ships with the `basic-skill` skill. It teaches the agent to use the tools in a short loop: observe -> act -> verify -> observe again. Make sure you installed a recent version and restarted the Gateway so the latest skill was loaded.
+Starting from `v1.1.0`, the package ships with built-in skills such as `basic-skill`. They teach the agent to use the tools in a short loop: observe -> act -> verify -> observe again. Make sure you installed a recent version and restarted the Gateway so the latest skills were loaded.
 
 **Q: A tool call fails with a request error or timeout.**
 
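The "observe -> act -> verify -> observe again" loop that `basic-skill` teaches can be sketched as a plain TypeScript skeleton. The `observe`/`act`/`verify` callbacks stand in for `cloudphone_snapshot`, the action tools, and a follow-up snapshot check; nothing here is the plugin's real API:

```typescript
// Hypothetical skeleton of the observe -> act -> verify loop taught by basic-skill.
// The callbacks stand in for cloudphone_* tool calls and are assumptions for illustration.
async function observeActVerify<S>(
  observe: () => Promise<S>,          // e.g. cloudphone_snapshot
  act: (state: S) => Promise<void>,   // e.g. cloudphone_tap / swipe / input_text
  verify: (state: S) => boolean,      // goal check against the fresh observation
  maxRounds = 5
): Promise<boolean> {
  for (let round = 0; round < maxRounds; round++) {
    const state = await observe();    // observe
    if (verify(state)) return true;   // verify before acting again
    await act(state);                 // act
  }
  return false; // give up after maxRounds so the agent can re-plan
}
```

Bounding the rounds keeps a stuck agent from repeating the same failing action indefinitely.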
@@ -301,7 +351,13 @@ The agent should call `cloudphone_render_image` automatically to turn that URL i
 
 ## Changelog
 
-Current version: **v1.1.0**
+Current version: **v2026.3.26**
+
+### v2026.3.26
+
+- Added verbose step-by-step logs for `cloudphone_plan_action` to improve debugging and failure tracing
+- Expanded planActionTool documentation with prerequisites, usage flow, and safety notes in both the English and Chinese READMEs
+- Synced built-in skills wording and release docs to align with the current v1.1.0+ behavior
 
 ### v1.1.0
 
README.zh-CN.md

Lines changed: 60 additions & 4 deletions
@@ -6,7 +6,7 @@ OpenClaw cloud phone plugin that gives AI agents device management and UI au
 
 Through natural-language conversation, you can query cloud phones, power them on or off, take screenshots, tap, swipe, type, and more, without writing scripts by hand.
 
-Starting from `v1.0.7`, the plugin also ships the built-in skill `basic-skill`, which teaches the agent to combine these tools more reliably.
+Starting from `v1.1.0`, the plugin ships built-in skills (including `basic-skill`) that teach the agent to combine these tools more reliably.
 
 ## Quick Start
 
@@ -75,7 +75,7 @@ openclaw gateway restart
 
 This repository is first and foremost an **OpenClaw plugin**; its job is to expose the CloudPhone OpenAPI as tools the agent can call.
 
-Starting from `v1.0.7`, the repository also ships one **OpenClaw skill** with the package:
+Starting from `v1.1.0`, the repository ships **OpenClaw skills** with the package:
 
 - Plugin: answers "what can be done" by providing `cloudphone_*` tools
 - Skill: answers "how to do it more reliably" by teaching the agent when to call tools, how to order the calls, and how to recover after failures
@@ -164,6 +164,56 @@ skills/basic-skill/
 | `cloudphone_snapshot` | Capture a device screenshot or UI tree snapshot |
 | `cloudphone_render_image` | Render a screenshot URL as an image shown directly in chat |
 
+## planActionTool (`cloudphone_plan_action`)
+
+`planActionTool` corresponds to the tool name `cloudphone_plan_action`. It calls an AutoGLM model with the current screenshot and task goal to produce a structured plan for the next action, helping CloudPhone UI automation make more stable decisions.
+
+Typical scenarios:
+- the page state is complex and the next action is unclear
+- deciding what to tap/swipe/type before executing
+- making a recovery decision after several failed direct actions
+
+### Prerequisites
+
+Before using `cloudphone_plan_action`, set in the plugin config:
+- required: `autoglmBaseUrl`, `autoglmApiKey`, `autoglmModel`
+- optional: `autoglmMaxTokens` (default `3000`), `autoglmLang` (default `cn`)
+
+Example (`plugins.entries.cloudphone.config`):
+
+```json
+{
+  "autoglmBaseUrl": "https://open.bigmodel.cn/api/paas/v4",
+  "autoglmApiKey": "your-api-key",
+  "autoglmModel": "autoglm-phone",
+  "autoglmMaxTokens": 3000,
+  "autoglmLang": "cn"
+}
+```
+
+### Parameters and minimal example
+
+Core input:
+- `device_id`: target cloud phone device ID
+- `goal`: natural-language task goal
+
+Minimal example:
+
+```text
+device_id: "your-device-id"
+goal: "Open WeChat and enter the search page"
+```
+
+Expected output:
+- an analysis summary of the current page
+- a suggested next action that can be executed with `cloudphone_*` tools
+
+### Notes
+
+- If required `autoglm*` config is missing, the tool returns a config error.
+- Recommended chain: `cloudphone_snapshot` -> `cloudphone_plan_action` -> execute with `cloudphone_tap`/`cloudphone_swipe`/`cloudphone_input_text` -> take another screenshot to verify.
+- Keep each `goal` focused on one short objective to improve planning quality and stability.
+
 ## Usage Examples
 
 After installation and configuration, you can control the cloud phone directly through natural-language conversation.
@@ -283,7 +333,7 @@ image_url : string - HTTPS image URL (required)
 
 **Q: The tools work, but the agent is not very stable when operating the cloud phone UI?**
 
-Starting from `v1.0.7`, the plugin ships the `basic-skill` skill with the package. It teaches the agent to use the tools in the "observe -> act -> verify -> observe again" loop. Make sure a recent version is installed and the Gateway has been restarted so the latest skill is loaded.
+Starting from `v1.1.0`, the plugin ships built-in skills (such as `basic-skill`) with the package. They teach the agent to use the tools in the "observe -> act -> verify -> observe again" loop. Make sure a recent version is installed and the Gateway has been restarted so the latest skills are loaded.
 
 **Q: A tool call fails with a request error or timeout?**
 
@@ -301,7 +351,13 @@ The agent should automatically call `cloudphone_render_image` to turn that URL into a displayable
 
 ## Changelog
 
-Current version: **v1.1.0**
+Current version: **v2026.3.26**
+
+### v2026.3.26
+
+- Added detailed step-by-step logs for `cloudphone_plan_action` to speed up debugging and failure triage
+- Improved the planActionTool documentation with prerequisites, call flow, and notes (synced across the English and Chinese READMEs)
+- Synced the built-in skills wording and release docs with the current v1.1.0+ behavior
 
 ### v1.1.0
 
openclaw.plugin.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
   "id": "cloudphone",
   "name": "CloudPhone Plugin",
-  "version": "1.1.0",
+  "version": "2026.3.26",
   "description": "OpenClaw CloudPhone plugin that exposes CloudPhone OpenAPI capabilities for user info, device management, and UI automation as agent tools.",
   "configSchema": {
     "type": "object",

package-lock.json

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@suqiai/cloudphone",
-  "version": "1.1.0",
+  "version": "2026.3.26",
   "license": "MIT",
   "description": "OpenClaw CloudPhone plugin that gives AI agents cloud device management and UI automation capabilities through natural language, including device queries, power actions, screenshots, taps, swipes, and text input.",
   "main": "dist/index.js",
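This bump moves both manifests from semver `1.1.0` to the date-based `2026.3.26`. One practical consequence worth noting: under the numeric dotted-segment comparison that semver-style tooling applies to `MAJOR.MINOR.PATCH`, the date acts as a very large major version, so the switch is effectively one-way (every later date sorts after any previous semver release). A minimal sketch of that comparison:

```typescript
// Illustrative dotted-version comparison (semver-style numeric segments, no
// pre-release handling): returns -1, 0, or 1. Shows why "2026.3.26" sorts
// after every 1.x release.
function compareDotted(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const d = (pa[i] ?? 0) - (pb[i] ?? 0); // missing segments count as 0
    if (d !== 0) return Math.sign(d);
  }
  return 0;
}
```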

src/tools.ts

Lines changed: 61 additions & 6 deletions
@@ -88,6 +88,11 @@ function getApiErrorMessage(body: Record<string, unknown>): string {
 
 const LOG_PREFIX = "[cloudphone]";
 
+function summarizeTextForLog(value: string, limit = 120): string {
+  if (!value) return "";
+  return value.length > limit ? `${value.slice(0, limit)}…` : value;
+}
+
 /** Safe for logs: origin + pathname only (no query — pre-signed URLs must not be logged in full). */
 function safeUrlForLog(url: string): string {
   try {
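The new `summarizeTextForLog` helper caps log payloads at a fixed length, appending an ellipsis only when truncation actually happened. The standalone copy below mirrors the diff so its behavior can be exercised in isolation:

```typescript
// Standalone copy of the helper added in this commit: truncate long values for
// logs; values at or under the limit pass through unchanged.
function summarizeTextForLog(value: string, limit = 120): string {
  if (!value) return "";
  return value.length > limit ? `${value.slice(0, limit)}…` : value;
}
```

Note the ellipsis is a single character (U+2026), so a truncated result is `limit + 1` characters long.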
@@ -1311,11 +1316,26 @@ const planActionTool: ToolDefinition = {
   },
   optional: true,
   execute: async (_id, params) => {
+    const traceId = `planAction:${Date.now()}:${Math.random().toString(36).slice(2, 8)}`;
+    const startedAll = Date.now();
     const autoglmBaseUrl = runtimeConfig.autoglmBaseUrl;
     const autoglmApiKey = runtimeConfig.autoglmApiKey;
     const autoglmModel = runtimeConfig.autoglmModel;
+    const deviceId = String(params.device_id ?? "");
+    const task = String(params.task ?? "");
+    const context = params.context ? String(params.context) : undefined;
+    const maxTokens = Number(runtimeConfig.autoglmMaxTokens ?? 3000);
+    const lang = String(runtimeConfig.autoglmLang ?? "cn");
+
+    console.log(
+      `${LOG_PREFIX} planAction start trace=${traceId} device_id=${deviceId || "(empty)"} task_len=${task.length} context_len=${context?.length ?? 0} lang=${lang} max_tokens=${maxTokens}`
+    );
+    console.log(
+      `${LOG_PREFIX} planAction config trace=${traceId} has_base_url=${!!autoglmBaseUrl} has_api_key=${!!autoglmApiKey} has_model=${!!autoglmModel}`
+    );
 
     if (!autoglmBaseUrl || !autoglmApiKey || !autoglmModel) {
+      console.error(`${LOG_PREFIX} planAction config missing trace=${traceId}`);
       return toJsonText({
         ok: false,
         message:
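The `traceId` introduced above combines a millisecond timestamp with a short random base-36 suffix so that concurrent `planAction` calls can be told apart in the logs. A standalone copy of the expression, wrapped in `makeTraceId` (an illustrative name, not part of the commit):

```typescript
// Standalone copy of the trace-id expression from the diff, wrapped in a
// hypothetical helper. Math.random().toString(36).slice(2, 8) skips the "0."
// prefix and keeps up to six base-36 characters as a cheap disambiguator.
function makeTraceId(): string {
  return `planAction:${Date.now()}:${Math.random().toString(36).slice(2, 8)}`;
}
```

The id is only for log correlation; it carries no security guarantees and may occasionally be shorter than six characters.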
@@ -1327,45 +1347,65 @@
       });
     }
 
-    const deviceId = String(params.device_id);
-    const task = String(params.task ?? "");
-    const context = params.context ? String(params.context) : undefined;
-    const maxTokens = Number(runtimeConfig.autoglmMaxTokens ?? 3000);
-    const lang = String(runtimeConfig.autoglmLang ?? "cn");
-
     // 1. Take snapshot
+    const startedSnapshot = Date.now();
+    console.log(`${LOG_PREFIX} planAction step1 snapshot start trace=${traceId}`);
     const snapshotResult = await apiRequest("POST", "/devices/snapshot", { device_id: deviceId }, 15000);
+    console.log(
+      `${LOG_PREFIX} planAction step1 snapshot done trace=${traceId} elapsed=${Date.now() - startedSnapshot}ms content_items=${snapshotResult.content.length}`
+    );
     const first = snapshotResult.content[0];
     if (!first || first.type !== "text") {
+      console.error(`${LOG_PREFIX} planAction step1 snapshot invalid_content trace=${traceId}`);
       return toJsonText({ ok: false, message: "Snapshot did not return text content" });
     }
 
     let snapshotData: Record<string, unknown>;
     try {
       snapshotData = JSON.parse(first.text);
     } catch {
+      console.error(`${LOG_PREFIX} planAction step1 snapshot parse_failed trace=${traceId}`);
       return toJsonText({ ok: false, message: "Failed to parse snapshot response" });
     }
 
     if (snapshotData.ok === false) {
+      console.error(
+        `${LOG_PREFIX} planAction step1 snapshot failed trace=${traceId} message=${summarizeTextForLog(String(snapshotData.message ?? ""))}`
+      );
       return toJsonText({ ok: false, message: String(snapshotData.message ?? "Snapshot failed") });
     }
 
     const screenshotUrl = String(snapshotData.screenshot_url ?? "");
     if (!screenshotUrl) {
+      console.error(`${LOG_PREFIX} planAction step1 snapshot missing_url trace=${traceId}`);
       return toJsonText({ ok: false, message: "Snapshot did not return a screenshot_url" });
     }
+    console.log(
+      `${LOG_PREFIX} planAction step1 snapshot success trace=${traceId} screenshot=${safeUrlForLog(screenshotUrl)}`
+    );
 
     // 2. Fetch image as base64
+    const startedImgFetch = Date.now();
+    console.log(`${LOG_PREFIX} planAction step2 image_fetch start trace=${traceId}`);
     const imgResult = await fetchImageAsBase64(screenshotUrl);
     if ("error" in imgResult) {
+      console.error(
+        `${LOG_PREFIX} planAction step2 image_fetch failed trace=${traceId} elapsed=${Date.now() - startedImgFetch}ms error=${summarizeTextForLog(imgResult.error)}`
+      );
       return toJsonText({ ok: false, message: `Image fetch error: ${imgResult.error}` });
     }
+    console.log(
+      `${LOG_PREFIX} planAction step2 image_fetch success trace=${traceId} elapsed=${Date.now() - startedImgFetch}ms mime=${imgResult.mimeType} base64_len=${imgResult.base64.length} width=${imgResult.width ?? "?"} height=${imgResult.height ?? "?"}`
+    );
 
     // 3. Call autoglm model for action decision
     let thinking: string;
     let actionStr: string;
     let rawContent: string;
+    const startedAutoglm = Date.now();
+    console.log(
+      `${LOG_PREFIX} planAction step3 autoglm start trace=${traceId} base_url=${safeUrlForLog(autoglmBaseUrl)} model=${autoglmModel} task_preview=${summarizeTextForLog(task, 80)} context_preview=${summarizeTextForLog(context ?? "", 80)}`
+    );
     try {
       ({ thinking, actionStr, rawContent } = await callAutoglmForAction(
         imgResult.base64,
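Every step in the diff follows the same logging pattern: record `Date.now()` before the call, then log `elapsed=${Date.now() - started}ms` together with the outcome. That shared shape could be factored into a small wrapper; the commit deliberately inlines the logs instead (each step logs step-specific fields), so the helper below is a hypothetical generalization, not part of the change:

```typescript
// Hypothetical generalization of the per-step timing logs added in this commit.
// The commit inlines console.log/console.error per step; this shows the shared shape.
const LOG_PREFIX = "[cloudphone]";

async function timedStep<T>(traceId: string, step: string, fn: () => Promise<T>): Promise<T> {
  const started = Date.now();
  console.log(`${LOG_PREFIX} planAction ${step} start trace=${traceId}`);
  try {
    const result = await fn();
    console.log(`${LOG_PREFIX} planAction ${step} done trace=${traceId} elapsed=${Date.now() - started}ms`);
    return result;
  } catch (err) {
    console.error(`${LOG_PREFIX} planAction ${step} failed trace=${traceId} elapsed=${Date.now() - started}ms`);
    throw err; // preserve the caller's error handling
  }
}
```

Inlining, as the commit does, keeps each log line free to carry step-specific fields (`content_items`, `mime`, `task_preview`, ...) at the cost of some repetition.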
@@ -1378,15 +1418,26 @@
         maxTokens,
         lang
       ));
+      console.log(
+        `${LOG_PREFIX} planAction step3 autoglm success trace=${traceId} elapsed=${Date.now() - startedAutoglm}ms thinking_len=${thinking.length} action_len=${actionStr.length} raw_len=${rawContent.length}`
+      );
     } catch (err) {
       const errMsg = err instanceof Error ? err.message : String(err);
+      console.error(
+        `${LOG_PREFIX} planAction step3 autoglm failed trace=${traceId} elapsed=${Date.now() - startedAutoglm}ms error=${summarizeTextForLog(errMsg)}`
+      );
       return toJsonText({ ok: false, message: `AutoGLM call failed: ${errMsg}` });
     }
 
     // 4. Parse action string into structured object
+    const startedParse = Date.now();
     const action = parseAutoglmAction(actionStr);
+    console.log(
+      `${LOG_PREFIX} planAction step4 parse_action trace=${traceId} elapsed=${Date.now() - startedParse}ms action_type=${action.type} has_element=${!!action.element} has_start=${!!action.start} has_end=${!!action.end}`
+    );
 
     // 5. Look up resolution and convert normalized 0-999 coords to device pixels
+    const startedConvert = Date.now();
     const resolution = await getDeviceResolutionByDeviceId(deviceId);
 
     if (resolution) {
@@ -1409,6 +1460,9 @@
         ];
       }
     }
+    console.log(
+      `${LOG_PREFIX} planAction step5 convert_coords trace=${traceId} elapsed=${Date.now() - startedConvert}ms resolution=${resolution ? `${resolution.width}x${resolution.height}` : "unknown"} coord_system=${resolution ? "pixel" : "normalized"}`
+    );
 
     const out: Record<string, unknown> = {
       ok: true,
@@ -1424,6 +1478,7 @@
       out.resolution_height = resolution.height;
     }
 
+    console.log(`${LOG_PREFIX} planAction done trace=${traceId} elapsed=${Date.now() - startedAll}ms`);
     return toJsonText(out);
   },
 };
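Step 5 converts AutoGLM's normalized 0-999 coordinates to device pixels when a resolution is known; otherwise the normalized values pass through, which is what the new `coord_system=normalized` log field records. The exact conversion arithmetic is outside this hunk, so the proportional mapping below is an assumption for illustration:

```typescript
// Sketch of the normalized -> pixel conversion behind step 5. The 0-999 grid
// and the pass-through fallback come from the diff; the exact rounding is assumed.
function normalizedToPixel(
  coord: [number, number],
  resolution: { width: number; height: number } | null
): { point: [number, number]; coordSystem: "pixel" | "normalized" } {
  if (!resolution) {
    // No resolution available: keep the model's normalized coordinates.
    return { point: coord, coordSystem: "normalized" };
  }
  const [nx, ny] = coord;
  return {
    point: [
      Math.round((nx / 999) * resolution.width),   // 999 maps to the right/bottom edge
      Math.round((ny / 999) * resolution.height),
    ],
    coordSystem: "pixel",
  };
}
```

Returning the coordinate system alongside the point mirrors the diff's log line, letting callers know whether a tap target is already in device pixels.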
