diff --git a/docs/STATUS.md b/docs/STATUS.md
index 699e4c4a..5f31a8ce 100644
--- a/docs/STATUS.md
+++ b/docs/STATUS.md
@@ -1,9 +1,21 @@
 SINGLE SOURCE OF TRUTH for cross-agent handoff.
-Last updated: 2026-06-14 ~12:10 BST, @taOS (active).
+Last updated: 2026-06-14 ~13:30 BST, @taOS (PAUSED for a fresh session).
+
+▶▶ SESSION PAUSED 2026-06-14 ~13:30 BST (Jay asked to pause + update handoff). NEW SESSION START HERE:
+   - master=51837bed, dev=118409a5. Working tree clean. NO uncommitted work anywhere.
+   - TWO PRs IN FLIGHT, both now CLEAN + FULLY GREEN as of ~13:35 BST (all checks + Gitar/Kilo/CodeRabbit SUCCESS) -- READY TO MERGE, left for the fresh session per the pause:
+     • PR #884 feat(agent) agent-controlled image generation. Branch feat/agent-image-gen, tip ddeb1bec. Commits this session: 443e70ff canvas wiring + e94de444 describe_image_capabilities + 165e0b83/10f4732c/a578a870 bot-review hardening + ddeb1bec image-prompting manual. WHEN GREEN: merge to dev, then DEPLOY Pi and drive the storybook flow.
+     • PR #886 fix(store) rkllama install entry (#844). Branch fix/rkllama-store-install, tip d6960af0 (cleanly off origin/dev, 3 code files + tests + manual). WHEN GREEN: merge to dev, then dev->master (Jay wanted #844 fixed for the target audience).
+   - REMAINING BOT NITS on both PRs are MINOR + non-blocking (judged, not yet actioned, left for your call): #884 kilo wants _image_backends_from_worker hardened per-entry (worker-level guard already contains it; symmetric 1-line isinstance guard would fully satisfy). #886 kilo flags the install-rkllama.sh `"models"` short-circuit on `{"models":[]}` (that is CORRECT: an empty-but-running rkllama IS installed; models are a separate concern) and non-string model names in verify (can't false-match a string app_id, safe). Decide per-nit; none block merge.
+   - MERGE GATE (handoff 0f): green CI + Kilo + Qodo + Gitar + author. CodeRabbit is legacy/rate-limited, do not block on it. Check INLINE bot comments, not just the check summary.
+   - Tasks #30 (rkllama/#844, in_progress -> close when #886 merges) and #35 (NEW: ~19 other catalog manifests reference missing install scripts; separate follow-up) capture the store-install debt.
+   - 3060 SD BACKEND UNBLOCKED (task #34): @taOSmd relayed Jay's GO 2026-06-14 ~12:30 -- the Fedora RTX 3060 window is OURS to install the SD backend ourselves (stable-diffusion.cpp or ComfyUI, our pick); @taOSmd manages nothing outside taOSmd, so we own the SD backend + its model + pointing the controller's image_backend_url at it. Do this AFTER #884 merges so the storybook image step has a real GPU backend. Box access = resolve the Fedora node via our own tailscale (NEVER commit the IP / put it on the bus).
+   - Re-arm on arrival: freshness cron (:08/:38), A2A SSE monitor, repo-watch (:23). Resume pair for the 15:40Z window is armed (primary 16:42, retry 17:01 local).
 
 
 ▶ RELEASED TO MASTER 2026-06-14 (#883, master=c9c5b0c9, Jay asked "merge dev to main so all users get updates"): the whole overnight body of work is now on master — agent OS control framework (#877-882), macOS-dark theme + purple purge (#879), App Store/real-desktop/Agents/chat redesigns, mobile chat #880 + chat-pwa theme #881. Merge-commit (history preserved), dev NOT deleted. master strict-mode + behind required an admin merge.
 ▶ IN FLIGHT: PR #884 agent-controlled image generation (the storybook demo's image step): generate_image now returns image_ref (fixed a broken b64-of-JSON bug) + canvas_add_image copies the workspace PNG into the project canvas files so art renders on the board; + describe_image_capabilities (read-only cluster tier/tool awareness, agent picks model by intent, system owns load/unload/queue). 26 tests. Baking -> merge to dev -> deploy Pi -> drive the FULL storybook flow to verify. DEMO BLOCKER TO CHECK: an image backend (sd-cpp on 3060 / rkllama on Pi NPU) must be installed+reachable or generate_image fails. Cross-worker image routing (Pi->3060) is a SEPARATE greenlight (TaskRouter exists, not auto-invoked).
+  -- 2026-06-14 ~12:50: pushed bot-review fixes to #884 (commit 10f4732c): skills.py now REFRESHES builtin skill rows after INSERT OR IGNORE (so the Pi, seeded by #882 with the old file_id canvas schema, converges on image_ref); strict filename type-check in image_tool + honest fallback docstring; project_tools canvas-path slug guard (reject non-slugify slug + assert inside projects_root); cluster_tools includes ram_mb. 67 touched-suite tests green. Re-baking for bot re-review. CAUTION LEARNED: local `dev` had 3 unpushed #884 commits -> a branch cut from it (the rkllama branch) accidentally bundled them; fixed by rebasing #886 --onto origin/dev and resetting local dev to origin/dev. Always branch from origin/dev.
 ▶ X POST (Jay, premium): drafted, honest framing (NOT first-ever; Goose/Open-Interpreter/Self-Operating-Computer exist). Angle = agent-native OS so a 4B local model drives the whole thing offline; post WITH the demo video. Add the win to README+website too (draft for approval). Private reasoning only.
 
 ▶▶ MORNING MUST-DO (Jay overnight ask, asleep): features tested+working by morning; agent OS control DONE simple; **offline agent RESULTS by morning**.
@@ -36,15 +48,17 @@ Last updated: 2026-06-14 ~12:10 BST, @taOS (active).
 4. PROMO HERO PROGRAM (memory [[promo-hero-initiative]]): only the agent CHAT + a demo PROJECT stay mock; build everything else REAL. Hero = multi-window (chat + project canvas + store), 5:2 X-cut on all promo. Needs store (#871), project canvas/mind-map (#16, net-new), demo seed (#17), agent window-mgmt API (#18). Mock data PRIVATE on local `marketing` branch (never push/merge; MARKETING.md).
 5. Also queued: store popularity LIVE stars backend (#13), per-app install telemetry -> the now-secured stats page (#15), widget redesign (#19, NOT in the shot), mobile audit, wallpaper picker #864, island v2 #854, GitHub #858 ph2, live-wallpaper package brainstorm.
 
-Branch tips: master=6394a3ed. dev=67dceb64 (#877-#882 agent OS control + mobile + theme MERGED). Merged overall this session: #867 #868 #869 #870 (theme/wallpaper), #871 (store redesign), #873 (real desktop: dock right-click + inline New Folder + FS-backed icons + rename API), #874 (window.taosDesktop control API + docs/desktop-control.md); taos-website #5 (stats Basic Auth -> main, set STATS_USER/STATS_PASS in Coolify). Local-only `marketing` branch (private, no upstream; NEVER push/merge).
+Branch tips: master=51837bed (#887 released #885 to master), dev=d5c089e9 (#885 mobile branch-dropdown fix merged). Merged overall this session: #867 #868 #869 #870 (theme/wallpaper), #871 (store redesign), #873 (real desktop: dock right-click + inline New Folder + FS-backed icons + rename API), #874 (window.taosDesktop control API + docs/desktop-control.md); taos-website #5 (stats Basic Auth -> main, set STATS_USER/STATS_PASS in Coolify). Local-only `marketing` branch (private, no upstream; NEVER push/merge).
 
-Session state: ACTIVE (autonomous overnight). ALL baking PRs MERGED to dev (tip=4ecc7961): #872 (tsParticles wallpaper + sliders), #873 (real desktop), #874 (agent OS controls). Open-PR queue drained (only draft #476 remains; #846 already CLOSED). #872 SWAPS the animated wallpaper renderer from the hand-rolled canvas NeuralWallpaper (component "neural") to tsParticles ParticlesWallpaper (component "particles"); theme-store registers id "neural-live" w/ component "particles" -- VERIFY the tsParticles look LIVE on Pi (headless can't rasterize it). #25 (tiled double-header) CLOSED: not a bug, was the 32px top-bar chrome. SECURITY: dependabot alert #5 (esbuild RCE < 0.28.1) is STALE -- desktop already pins esbuild 0.28.1 via overrides (lockfile + installed both 0.28.1); leave for dependabot to auto-close, no code change. #19 widget redesign HELD for Jay (taste + depends on the desktop/widget/dash mode-switcher brainstorm [[project_desktop_modes]]). FEDORA MODEL TESTS (Jay 2026-06-14 ~02:00): eval harness + runbook built PRIVATE (~/tinyagentos-private/specs/storybook-demo/storybook_toolcall_eval.py) -- scores local models on the storybook tool-call flow incl ID-threading; A2A sent to @taOSmd (msg 431) to coordinate Fedora box (it's mid E-009 sweep, do NOT interrupt); awaiting its ping + local-model list. tsParticles look + Safari dark<->light + live-wallpaper animation + desktop icons/thumbnails are all best checked LIVE on the Pi (preview has no backend; tsParticles canvas does not rasterize headless).
+Session state: ACTIVE (autonomous overnight). OPEN PRs in flight: #884 (agent image-gen, review fixes pushed, baking), #886 (rkllama store fix #844, off origin/dev, baking), #876 (dependabot SPA deps), draft #476. #885 merged dev->master via #887. #872 SWAPS the animated wallpaper renderer from the hand-rolled canvas NeuralWallpaper (component "neural") to tsParticles ParticlesWallpaper (component "particles"); theme-store registers id "neural-live" w/ component "particles" -- VERIFY the tsParticles look LIVE on Pi (headless can't rasterize it). #25 (tiled double-header) CLOSED: not a bug, was the 32px top-bar chrome. SECURITY: dependabot alert #5 (esbuild RCE < 0.28.1) is STALE -- desktop already pins esbuild 0.28.1 via overrides (lockfile + installed both 0.28.1); leave for dependabot to auto-close, no code change. #19 widget redesign HELD for Jay (taste + depends on the desktop/widget/dash mode-switcher brainstorm [[project_desktop_modes]]). FEDORA MODEL TESTS (Jay 2026-06-14 ~02:00): eval harness + runbook built PRIVATE (~/tinyagentos-private/specs/storybook-demo/storybook_toolcall_eval.py) -- scores local models on the storybook tool-call flow incl ID-threading; A2A sent to @taOSmd (msg 431) to coordinate Fedora box (it's mid E-009 sweep, do NOT interrupt); awaiting its ping + local-model list. tsParticles look + Safari dark<->light + live-wallpaper animation + desktop icons/thumbnails are all best checked LIVE on the Pi (preview has no backend; tsParticles canvas does not rasterize headless).
 
 WEBSITE: taos.my live. All 4 taos-website PRs merged (stats/changelog/nav/accessibility).
 
 CI: test suite parallelized via #839 (xdist -n auto). CodeRabbit may be out of credits -- do not merge on a fake rate-limit pass. Use @coderabbitai full review to retrigger; manual review OK for tiny already-reviewed PRs.
 
 OPEN PRs:
+- #886 fix(store): rkllama service install entry (#844) -- 3 files off origin/dev, baking; merge dev->master when green
+- #884 feat(agent): agent-controlled image generation -- bot-review fixes pushed (10f4732c), baking; merge to dev then deploy Pi
 - #876 chore(deps): dependabot SPA deps group bump (32 updates) -- review and merge when CI green
 
 
@@ -52,7 +66,7 @@ OPEN PRs:
 (#872/#871 MERGED to dev; #846 SUPERSEDED by #849 on dev; taos-website #5 merged to main.)
 
 Notable open issues (bugs first):
-- #844 rkllama store-UI install chain broken (wrong script + non-interactive false-success) -- unresolved
+- #844 rkllama store-UI install chain broken (wrong script + non-interactive false-success) -- FIX IN PR #886 (off origin/dev): adds scripts/install-rkllama.sh (idempotent headless wrapper -> delegates to install-rknpu.sh with TAOS_RKNPU_SETUP=1 so it can't take the false-success exit-0; short-circuits when rkllama already answers 7833/8080) + hardens RkllamaInstaller /api/tags verify (retry then fail, no more swallowed false success) + regression guard test. NOTE found while auditing: ~19 OTHER catalog manifests (stable-diffusion-cpp, wan2gp, dify, agents...) reference install scripts that also don't exist at repo root -- separate follow-up, NOT in #886.
 - #841 update check shows no updates when local branch diverged from origin -- unresolved
 - #825 taOS agent model swap breaks routing (stale per-agent key preferred over master key)
 - #840 chat: per-agent framework slash commands (Telegram-style) in DMs and via @agent /
diff --git a/docs/agent-manual/09-os-control.md b/docs/agent-manual/09-os-control.md
index 69d955f0..5ecbafb7 100644
--- a/docs/agent-manual/09-os-control.md
+++ b/docs/agent-manual/09-os-control.md
@@ -20,10 +20,16 @@ update the open Projects app in real time):
   Returns a `project_id` to use in the next calls.
 - **add_task** — add a to-do task to a project's board. Args: `project_id`, `title`.
 - **canvas_add_image** — place a generated image on a project's ideas board. Args:
-  `project_id`, `file_id` (from `generate_image`), optional `alt`.
-
-A typical flow: open the Projects app, create_project, add a few tasks, generate
-an image, then canvas_add_image it onto the board.
+  `project_id`, `image_ref` (the `image_ref` returned by `generate_image`), optional `alt`.
+- **describe_image_capabilities** — see the hardware tiers (this host + any cluster
+  workers, e.g. an NVIDIA box) and which image tools/models each has loaded. Use it
+  to pick the right model before `generate_image`: an NPU model for a fast draft, a
+  GPU model for a quality cover. The system loads/unloads and queues for you — you
+  just choose the model.
+
+A typical flow: open the Projects app, create_project, add a few tasks, call
+generate_image and keep its `image_ref`, then canvas_add_image(project_id, image_ref)
+to drop it on the board.
 
 These drive the user's own desktop in their session. Use them to make your work
 visible: open the relevant app so the user can watch, then carry out the task with
diff --git a/docs/agent-manual/10-image-prompting.md b/docs/agent-manual/10-image-prompting.md
new file mode 100644
index 00000000..5f7923dd
--- /dev/null
+++ b/docs/agent-manual/10-image-prompting.md
@@ -0,0 +1,89 @@
+<!-- How to write good prompts for the generate_image tool. -->
+
+# Generating good images
+
+When you call `generate_image`, the quality of the result depends mostly on the
+prompt. A vague prompt gives a generic image; a specific, well-ordered one gives
+what the user actually asked for. Spend a sentence getting it right rather than
+regenerating five times.
+
+## Structure a prompt
+
+Lead with the subject, then layer detail. A reliable order:
+
+1. **Subject** — what it is. "a small red sailboat", "a friendly cartoon fox".
+2. **Descriptors** — appearance, colour, material, mood. "weathered wooden hull,
+   bright red sail".
+3. **Setting / background** — where it is. "on a calm blue lake at sunrise".
+4. **Composition** — framing and viewpoint. "wide shot, centred, low angle".
+5. **Style** — the look. "watercolour children's book illustration", "flat vector
+   art", "photorealistic", "oil painting". Naming a concrete style matters more
+   than any other single word.
+6. **Lighting / quality** — "soft warm light, gentle shadows, highly detailed".
+
+Example: `a friendly cartoon fox reading a book under a tree, autumn leaves,
+warm soft light, watercolour children's book illustration, centred, highly detailed`.
+
+## Principles
+
+- **Be specific, not long.** Concrete nouns and adjectives beat a wall of vague
+  words. "golden retriever puppy on grass" beats "a nice cute lovely beautiful
+  amazing dog".
+- **Front-load what matters.** Earlier words carry more weight. Put the subject
+  and the must-have details first.
+- **One clear scene.** Don't pack several unrelated ideas into one prompt; the
+  model blends them into mush. Generate separate images instead.
+- **Name the style explicitly.** If the user wants a storybook look, say
+  "children's book illustration" or "storybook watercolour". If they want a logo,
+  say "flat minimalist vector logo".
+- **Match the user's intent.** Ask yourself what they pictured and describe that,
+  not a generic version of it. For a book cover, say "book cover, title space at
+  the top, central character".
+
+## Use negative_prompt to remove faults
+
+`negative_prompt` lists what to avoid (comma-separated). It is the fix for common
+defects:
+
+- General cleanup: `blurry, low quality, jpeg artifacts, watermark, text, signature`.
+- People/animals: add `deformed hands, extra fingers, extra limbs, mutated`.
+- Keep a clean style: add `cluttered, busy background` if you want simplicity.
+
+Reach for it when a first result has a recurring flaw rather than rewriting the
+whole prompt.
+
+## Parameters (what the tool exposes)
+
+- **size** — `256x256`, `384x384`, or `512x512`. Use 512x512 for the final
+  artwork; a smaller size is only worth it for a quick rough draft.
+- **steps** — 1 to 8 (default 4). These backends are tuned for few-step
+  generation; 4 is a good balance, 6 to 8 for a bit more detail. More is not
+  always better here.
+- **guidance_scale** — 1 to 20 (default 7.5). How strictly the image follows the
+  prompt. Lower (2 to 5) is looser and more artistic; higher (8 to 12) sticks to
+  the prompt harder. Raise it when the model ignores a detail you asked for;
+  lower it if results look over-baked or harsh.
+- **seed** — omit for a fresh random image. To make small edits to an image the
+  user liked, reuse its returned `seed` and tweak the prompt so the composition
+  stays close.
+- **model** — call `describe_image_capabilities` first and pick a model that fits
+  the task: a fast NPU draft model for iterating, a GPU model for the final cover.
+  Omit it to let the scheduler choose.
+
+## Picking a model by intent
+
+Different model families respond to prompts differently:
+
+- **FLUX-style models** follow natural-language sentences well and render text
+  reasonably. Write a full descriptive sentence.
+- **SDXL-style models** respond well to comma-separated descriptive phrases and
+  strong style keywords.
+- **Text in the image** (a title, a sign, a label) is unreliable on most models;
+  prefer a model noted for text if one is loaded, keep the text very short, and
+  put it in quotes, e.g. `a poster with the title "Brave Little Fox"`.
+
+## Iterate deliberately
+
+If the first image is close but not right, change one thing at a time: adjust the
+style word, add a missing detail, or add a negative term for the defect, keeping
+the same seed. Tell the user what you changed so they can steer.
diff --git a/docs/agent-manual/index.md b/docs/agent-manual/index.md
index b756db5f..7b44fa3a 100644
--- a/docs/agent-manual/index.md
+++ b/docs/agent-manual/index.md
@@ -18,3 +18,4 @@ Run `python3 scripts/build-agent-manual.py` to compile these into `docs/taos-age
 | `07-after-update.md` | Breakage-log-first troubleshooting for post-update reports |
 | `08-answer-templates.md` | Canned answer shapes for common questions |
 | `09-os-control.md` | Driving the desktop: open_app / arrange_windows tools |
+| `10-image-prompting.md` | Writing good prompts for the generate_image tool |
diff --git a/docs/taos-agent-manual.md b/docs/taos-agent-manual.md
index 1dbfebb2..5fba10db 100644
--- a/docs/taos-agent-manual.md
+++ b/docs/taos-agent-manual.md
@@ -162,10 +162,16 @@ update the open Projects app in real time):
   Returns a `project_id` to use in the next calls.
 - **add_task** — add a to-do task to a project's board. Args: `project_id`, `title`.
 - **canvas_add_image** — place a generated image on a project's ideas board. Args:
-  `project_id`, `file_id` (from `generate_image`), optional `alt`.
+  `project_id`, `image_ref` (the `image_ref` returned by `generate_image`), optional `alt`.
+- **describe_image_capabilities** — see the hardware tiers (this host + any cluster
+  workers, e.g. an NVIDIA box) and which image tools/models each has loaded. Use it
+  to pick the right model before `generate_image`: an NPU model for a fast draft, a
+  GPU model for a quality cover. The system loads/unloads and queues for you — you
+  just choose the model.
 
-A typical flow: open the Projects app, create_project, add a few tasks, generate
-an image, then canvas_add_image it onto the board.
+A typical flow: open the Projects app, create_project, add a few tasks, call
+generate_image and keep its `image_ref`, then canvas_add_image(project_id, image_ref)
+to drop it on the board.
 
 These drive the user's own desktop in their session. Use them to make your work
 visible: open the relevant app so the user can watch, then carry out the task with
@@ -173,3 +179,92 @@ that app's own tools and your other skills.
 
 Keep it purposeful: open what you need, don't rearrange the user's windows without
 reason, and tell the user what you're doing as you do it.
+---
+
+# Generating good images
+
+When you call `generate_image`, the quality of the result depends mostly on the
+prompt. A vague prompt gives a generic image; a specific, well-ordered one gives
+what the user actually asked for. Spend a sentence getting it right rather than
+regenerating five times.
+
+## Structure a prompt
+
+Lead with the subject, then layer detail. A reliable order:
+
+1. **Subject** — what it is. "a small red sailboat", "a friendly cartoon fox".
+2. **Descriptors** — appearance, colour, material, mood. "weathered wooden hull,
+   bright red sail".
+3. **Setting / background** — where it is. "on a calm blue lake at sunrise".
+4. **Composition** — framing and viewpoint. "wide shot, centred, low angle".
+5. **Style** — the look. "watercolour children's book illustration", "flat vector
+   art", "photorealistic", "oil painting". Naming a concrete style matters more
+   than any other single word.
+6. **Lighting / quality** — "soft warm light, gentle shadows, highly detailed".
+
+Example: `a friendly cartoon fox reading a book under a tree, autumn leaves,
+warm soft light, watercolour children's book illustration, centred, highly detailed`.
+
+## Principles
+
+- **Be specific, not long.** Concrete nouns and adjectives beat a wall of vague
+  words. "golden retriever puppy on grass" beats "a nice cute lovely beautiful
+  amazing dog".
+- **Front-load what matters.** Earlier words carry more weight. Put the subject
+  and the must-have details first.
+- **One clear scene.** Don't pack several unrelated ideas into one prompt; the
+  model blends them into mush. Generate separate images instead.
+- **Name the style explicitly.** If the user wants a storybook look, say
+  "children's book illustration" or "storybook watercolour". If they want a logo,
+  say "flat minimalist vector logo".
+- **Match the user's intent.** Ask yourself what they pictured and describe that,
+  not a generic version of it. For a book cover, say "book cover, title space at
+  the top, central character".
+
+## Use negative_prompt to remove faults
+
+`negative_prompt` lists what to avoid (comma-separated). It is the fix for common
+defects:
+
+- General cleanup: `blurry, low quality, jpeg artifacts, watermark, text, signature`.
+- People/animals: add `deformed hands, extra fingers, extra limbs, mutated`.
+- Keep a clean style: add `cluttered, busy background` if you want simplicity.
+
+Reach for it when a first result has a recurring flaw rather than rewriting the
+whole prompt.
+
+## Parameters (what the tool exposes)
+
+- **size** — `256x256`, `384x384`, or `512x512`. Use 512x512 for the final
+  artwork; a smaller size is only worth it for a quick rough draft.
+- **steps** — 1 to 8 (default 4). These backends are tuned for few-step
+  generation; 4 is a good balance, 6 to 8 for a bit more detail. More is not
+  always better here.
+- **guidance_scale** — 1 to 20 (default 7.5). How strictly the image follows the
+  prompt. Lower (2 to 5) is looser and more artistic; higher (8 to 12) sticks to
+  the prompt harder. Raise it when the model ignores a detail you asked for;
+  lower it if results look over-baked or harsh.
+- **seed** — omit for a fresh random image. To make small edits to an image the
+  user liked, reuse its returned `seed` and tweak the prompt so the composition
+  stays close.
+- **model** — call `describe_image_capabilities` first and pick a model that fits
+  the task: a fast NPU draft model for iterating, a GPU model for the final cover.
+  Omit it to let the scheduler choose.
+
+## Picking a model by intent
+
+Different model families respond to prompts differently:
+
+- **FLUX-style models** follow natural-language sentences well and render text
+  reasonably. Write a full descriptive sentence.
+- **SDXL-style models** respond well to comma-separated descriptive phrases and
+  strong style keywords.
+- **Text in the image** (a title, a sign, a label) is unreliable on most models;
+  prefer a model noted for text if one is loaded, keep the text very short, and
+  put it in quotes, e.g. `a poster with the title "Brave Little Fox"`.
+
+## Iterate deliberately
+
+If the first image is close but not right, change one thing at a time: adjust the
+style word, add a missing detail, or add a negative term for the defect, keeping
+the same seed. Tell the user what you changed so they can steer.
diff --git a/scripts/install-rkllama.sh b/scripts/install-rkllama.sh
new file mode 100755
index 00000000..0ab8b104
--- /dev/null
+++ b/scripts/install-rkllama.sh
@@ -0,0 +1,50 @@
+#!/usr/bin/env bash
+# Store entrypoint for the rkllama (RK3588 NPU LLM) service.
+#
+# This is the script the App Store's `rkllama` service manifest points at
+# (install.method: script). The store's ScriptInstaller invokes it
+# non-interactively as `bash install-rkllama.sh <project_dir>`, so this
+# wrapper must be headless, idempotent, and must never report success
+# without actually installing.
+#
+# It is a thin wrapper over the verified NPU installer (install-rknpu.sh):
+#   1. If rkllama already answers locally, exit 0 (idempotent no-op).
+#   2. Otherwise delegate to install-rknpu.sh in headless mode. We set
+#      TAOS_RKNPU_SETUP=1 explicitly so install-rknpu.sh does NOT take its
+#      "non-interactive shell, nothing to confirm -> exit 0" path, which
+#      would otherwise return success while installing nothing.
+#
+# install-rknpu.sh handles board detection (it dies on non-RK3588 hosts)
+# and uses sudo only for the privileged librknnrt + systemd steps; in a
+# store context without a TTY those sudo calls fail loudly (non-zero),
+# which ScriptInstaller correctly surfaces as an install failure.
+set -euo pipefail
+
+PROJECT_DIR="${1:-$(pwd)}"
+PORT="${TAOS_RKLLAMA_PORT:-7833}"
+LEGACY_PORT=8080
+
+# 1. Idempotent short-circuit: a live rkllama already satisfies the install.
+#    Require an rkllama/Ollama-shaped /api/tags body (a "models" key), not just
+#    any HTTP 200 -- another local service on these ports must not be mistaken
+#    for an installed rkllama. Mirrors _port_responds_with_rkllama() in the
+#    Python installer.
+for p in "$PORT" "$LEGACY_PORT"; do
+    body="$(curl -fsS --max-time 2 "http://localhost:${p}/api/tags" 2>/dev/null || true)"
+    if printf '%s' "$body" | grep -q '"models"'; then
+        echo "rkllama already running on port ${p} — nothing to install"
+        exit 0
+    fi
+done
+
+NPU_SCRIPT="${PROJECT_DIR}/scripts/install-rknpu.sh"
+if [[ ! -f "$NPU_SCRIPT" ]]; then
+    echo "install-rkllama.sh: expected NPU installer at ${NPU_SCRIPT}" >&2
+    exit 1
+fi
+
+# 2. Delegate to the verified installer in headless mode. TAOS_RKNPU_SETUP=1
+#    skips the interactive confirmation AND the false-success exit-0 path.
+echo "rkllama not detected — running NPU installer (${NPU_SCRIPT})"
+exec env TAOS_RKNPU_SETUP=1 TAOS_RKLLAMA_PORT="${PORT}" \
+    bash "$NPU_SCRIPT" --yes
diff --git a/tests/test_cluster_tools.py b/tests/test_cluster_tools.py
new file mode 100644
index 00000000..7270cbd2
--- /dev/null
+++ b/tests/test_cluster_tools.py
@@ -0,0 +1,125 @@
+import types
+
+import pytest
+
+from tinyagentos.tools.cluster_tools import execute_describe_image_capabilities
+
+
+class _Backend:
+    def __init__(self, name, type_, models, lifecycle="running"):
+        self.name = name
+        self.type = type_
+        self.models = models
+        self.lifecycle_state = lifecycle
+
+
+class _Catalog:
+    def __init__(self, backends):
+        self._b = backends
+
+    def backends_with_capability(self, cap):
+        return self._b if cap == "image-generation" else []
+
+
+class _Worker:
+    def __init__(self, name, hardware, backends, status="online"):
+        self.name = name
+        self.hardware = hardware
+        self.backends = backends
+        self.status = status
+
+
+class _Cluster:
+    def __init__(self, workers):
+        self._w = workers
+
+    def get_workers(self):
+        return self._w
+
+
+def _req(catalog=None, cluster=None, hardware=None):
+    state = types.SimpleNamespace(
+        backend_catalog=catalog, cluster_manager=cluster, hardware_profile=hardware
+    )
+    return types.SimpleNamespace(app=types.SimpleNamespace(state=state))
+
+
+@pytest.mark.asyncio
+async def test_local_image_backends_listed_with_tier_and_loaded():
+    catalog = _Catalog([_Backend("sd", "sd-cpp", [{"id": "sdxl"}], "running")])
+    res = await execute_describe_image_capabilities({}, _req(catalog=catalog, hardware={"gpu": "RTX 3060", "vram": "12GB"}))
+    local = res["tiers"][0]
+    assert local["node"] == "local"
+    assert local["hardware"]["gpu"] == "RTX 3060"
+    be = local["image_backends"][0]
+    assert be["type"] == "sd-cpp" and be["tier"] == "cpu/gpu" and be["loaded"] is True
+    assert be["models"] == ["sdxl"]
+
+
+@pytest.mark.asyncio
+async def test_cluster_workers_included():
+    worker = _Worker("nvidia-box", {"gpu": "3060", "vram": "12GB"},
+                     [{"name": "sd", "type": "sd-cpp", "capabilities": ["image-generation"], "models": ["sdxl"]}])
+    res = await execute_describe_image_capabilities({}, _req(cluster=_Cluster([worker])))
+    nodes = [t["node"] for t in res["tiers"]]
+    assert "nvidia-box" in nodes
+    w = next(t for t in res["tiers"] if t["node"] == "nvidia-box")
+    assert w["image_backends"][0]["type"] == "sd-cpp"
+
+
+@pytest.mark.asyncio
+async def test_offline_worker_skipped():
+    worker = _Worker("down", {}, [], status="offline")
+    res = await execute_describe_image_capabilities({}, _req(cluster=_Cluster([worker])))
+    assert all(t["node"] != "down" for t in res["tiers"])
+
+
+@pytest.mark.asyncio
+async def test_empty_state_is_safe():
+    res = await execute_describe_image_capabilities({}, _req())
+    assert res["tiers"][0]["node"] == "local"
+    assert res["tiers"][0]["image_backends"] == []
+
+
+class _BadBackend:
+    """A backend whose .models raises when iterated for ids."""
+    name = "bad"
+    type = "sd-cpp"
+    lifecycle_state = "running"
+
+    @property
+    def models(self):
+        raise RuntimeError("boom")
+
+
+@pytest.mark.asyncio
+async def test_one_malformed_backend_does_not_drop_the_rest():
+    catalog = _Catalog([
+        _BadBackend(),
+        _Backend("good", "sd-cpp", [{"id": "sdxl"}], "running"),
+    ])
+    res = await execute_describe_image_capabilities({}, _req(catalog=catalog))
+    names = [b["name"] for b in res["tiers"][0]["image_backends"]]
+    assert "good" in names  # the healthy backend survives the bad one
+
+
+@pytest.mark.asyncio
+async def test_object_hardware_profile_is_json_safe():
+    """A real hardware_profile is an object with nested objects; the summary must
+    stay JSON-serialisable (else the tool 500s when returned as JSON)."""
+    import json
+
+    class _Gpu:
+        def __repr__(self):
+            return "RTX 3060 12GB"
+
+    class _HW:
+        gpu = _Gpu()
+        npu = None
+        cpu = "x86"
+        vram = 12
+
+    res = await execute_describe_image_capabilities({}, _req(hardware=_HW()))
+    hw = res["tiers"][0]["hardware"]
+    assert hw["gpu"] == "RTX 3060 12GB" and hw["cpu"] == "x86" and hw["vram"] == 12
+    json.dumps(res)  # must not raise
diff --git a/tests/test_image_tool.py b/tests/test_image_tool.py
index 1c272065..8ea6c25e 100644
--- a/tests/test_image_tool.py
+++ b/tests/test_image_tool.py
@@ -161,6 +161,9 @@ async def test_image_generation_forwards_new_params():
     mock_resp.status_code = 200
     mock_resp.content = fake_png
     mock_resp.raise_for_status = MagicMock()
+    # Scheduler route returns JSON metadata (filename + path), which the tool
+    # now requires to honour the image_ref contract.
+    mock_resp.json = MagicMock(return_value={"filename": "gen.png", "path": "/api/images/files/gen.png"})
 
     captured_payload: dict = {}
     captured_url: dict = {}
@@ -205,6 +208,9 @@ async def test_image_generation_default_routes_via_scheduler():
     mock_resp.status_code = 200
     mock_resp.content = fake_png
     mock_resp.raise_for_status = MagicMock()
+    # Scheduler route returns JSON metadata (filename + path), which the tool
+    # now requires to honour the image_ref contract.
+    mock_resp.json = MagicMock(return_value={"filename": "gen.png", "path": "/api/images/files/gen.png"})
 
     captured: dict = {}
 
@@ -245,6 +251,9 @@ async def test_image_generation_blank_model_treated_as_omitted():
     mock_resp.status_code = 200
     mock_resp.content = fake_png
     mock_resp.raise_for_status = MagicMock()
+    # Scheduler route returns JSON metadata (filename + path), which the tool
+    # now requires to honour the image_ref contract.
+    mock_resp.json = MagicMock(return_value={"filename": "gen.png", "path": "/api/images/files/gen.png"})
 
     captured: dict = {}
 
diff --git a/tests/test_project_tools.py b/tests/test_project_tools.py
index e6397ab9..bb6af926 100644
--- a/tests/test_project_tools.py
+++ b/tests/test_project_tools.py
@@ -22,7 +22,7 @@ async def create_project(self, **kw):
     async def get_project(self, project_id):
         if project_id == "missing":
             return None
-        return {"id": project_id, "user_id": self._owner}
+        return {"id": project_id, "user_id": self._owner, "slug": "luna"}
 
 
 class _FakeTaskStore:
@@ -43,18 +43,30 @@ async def add_element(self, **kw):
         return {"id": "el_1"}
 
 
-def _req(user_id="user-1", owner="user-1", is_admin=False):
+def _req(user_id="user-1", owner="user-1", is_admin=False, base=None):
     state = types.SimpleNamespace(
         project_store=_FakeProjectStore(owner=owner),
         project_task_store=_FakeTaskStore(),
         project_canvas_store=_FakeCanvasStore(),
     )
+    if base is not None:
+        # config_path.parent is the data dir; projects live under projects_root.
+        state.config_path = str(base / "config.json")
+        state.projects_root = base / "projects"
     app = types.SimpleNamespace(state=state)
     return types.SimpleNamespace(
         app=app, state=types.SimpleNamespace(user_id=user_id, is_admin=is_admin)
     )
 
 
+def _seed_generated_image(base, name="img_cover.png"):
+    """Create a fake generated image where generate_image would have saved it."""
+    d = base / "workspace" / "images" / "generated"
+    d.mkdir(parents=True, exist_ok=True)
+    (d / name).write_bytes(b"\x89PNG\r\n\x1a\n fake")
+    return name
+
+
 def test_slugify():
     assert _slugify("Luna and the Lighthouse") == "luna-and-the-lighthouse"
     assert _slugify("  ") == "project"
@@ -92,15 +104,26 @@ async def test_add_task_requires_fields():
 
 
 @pytest.mark.asyncio
-async def test_canvas_add_image():
-    req = _req()
-    res = await execute_canvas_add_image({"project_id": "proj_1", "file_id": "img_cover", "alt": "cover"}, req)
+async def test_canvas_add_image(tmp_path):
+    ref = _seed_generated_image(tmp_path)
+    req = _req(base=tmp_path)
+    res = await execute_canvas_add_image({"project_id": "proj_1", "image_ref": ref, "alt": "cover"}, req)
     assert res["ok"] and res["element_id"] == "el_1"
+    # the generated image was copied into the project's canvas files
+    canvas_dir = tmp_path / "projects" / "luna" / "files" / "canvas"
+    copied = list(canvas_dir.glob("*.png"))
+    assert len(copied) == 1
     call = req.app.state.project_canvas_store.calls[0]
-    assert call["project_id"] == "proj_1"
     assert call["author_kind"] == "agent" and call["author_id"] == "user-1"
     el = call["element"]
-    assert el["kind"] == "image" and el["payload"]["file_id"] == "img_cover"
+    assert el["kind"] == "image" and el["payload"]["file_id"] == copied[0].name
+
+
+@pytest.mark.asyncio
+async def test_canvas_add_image_missing_file(tmp_path):
+    req = _req(base=tmp_path)
+    res = await execute_canvas_add_image({"project_id": "proj_1", "image_ref": "nope.png"}, req)
+    assert "error" in res and "not found" in res["error"]
 
 
 @pytest.mark.asyncio
@@ -115,7 +138,7 @@ async def test_add_task_denied_on_other_users_project():
 @pytest.mark.asyncio
 async def test_canvas_add_image_denied_on_other_users_project():
     req = _req(user_id="attacker", owner="victim")
-    res = await execute_canvas_add_image({"project_id": "proj_1", "file_id": "f"}, req)
+    res = await execute_canvas_add_image({"project_id": "proj_1", "image_ref": "f"}, req)
     assert res.get("error") == "not your project"
     assert req.app.state.project_canvas_store.calls == []
 
@@ -137,4 +160,4 @@ async def test_add_task_missing_project():
 async def test_tools_refuse_without_user():
     assert "error" in await execute_create_project({"name": "x"}, _req(user_id=None))
     assert "error" in await execute_add_task({"project_id": "p", "title": "t"}, _req(user_id=None))
-    assert "error" in await execute_canvas_add_image({"project_id": "p", "file_id": "f"}, _req(user_id=None))
+    assert "error" in await execute_canvas_add_image({"project_id": "p", "image_ref": "f"}, _req(user_id=None))
diff --git a/tests/test_rkllama_installer.py b/tests/test_rkllama_installer.py
index 2d743802..527b1c44 100644
--- a/tests/test_rkllama_installer.py
+++ b/tests/test_rkllama_installer.py
@@ -6,9 +6,12 @@
 """
 from __future__ import annotations
 
+import httpx
 import pytest
+import respx
 
 from tinyagentos.installers.rkllama_installer import (
+    RkllamaInstaller,
     parse_hf_resolve_url,
     resolve_rkllama_url,
     rkllama_is_running,
@@ -117,3 +120,128 @@ def test_remote_name_becomes_url_7833(self):
 
     def test_ip_address_7833(self):
         assert resolve_rkllama_url("192.168.1.10") == "http://192.168.1.10:7833"
+
+
+_VARIANT = {
+    "id": "qwen2.5-3b",
+    "download_url": (
+        "https://huggingface.co/c01zaut/Qwen2.5-3B-Instruct-rk3588-1.1.1/"
+        "resolve/main/Qwen2.5-3B-Instruct-rk3588-w8a8.rkllm"
+    ),
+}
+
+
+class TestInstallVerification:
+    """install() must only report success once /api/tags confirms the model.
+
+    A 200 from /api/pull alone is necessary but not sufficient -- a model the
+    agent can't load is worse than a clear error, so an unconfirmable pull
+    fails rather than returning a false success.
+    """
+
+    def _installer(self):
+        # Pass an explicit URL so __init__ doesn't probe the network.
+        return RkllamaInstaller(rkllama_url="http://localhost:7833")
+
+    @respx.mock
+    @pytest.mark.asyncio
+    async def test_success_when_tags_lists_model(self):
+        respx.post("http://localhost:7833/api/pull").mock(
+            return_value=httpx.Response(200, text='{"status":"success"}\n')
+        )
+        respx.get("http://localhost:7833/api/tags").mock(
+            return_value=httpx.Response(200, json={"models": [{"name": "rkllama-x"}]})
+        )
+        res = await self._installer().install("rkllama-x", {}, variant=_VARIANT)
+        assert res["success"] is True
+        assert res["model_name"] == "rkllama-x"
+
+    @respx.mock
+    @pytest.mark.asyncio
+    async def test_failure_when_model_absent_from_tags(self, monkeypatch):
+        monkeypatch.setattr(rkllama_installer.asyncio, "sleep", _no_sleep)
+        respx.post("http://localhost:7833/api/pull").mock(
+            return_value=httpx.Response(200, text='{"status":"success"}\n')
+        )
+        respx.get("http://localhost:7833/api/tags").mock(
+            return_value=httpx.Response(200, json={"models": [{"name": "other"}]})
+        )
+        res = await self._installer().install("rkllama-x", {}, variant=_VARIANT)
+        assert res["success"] is False
+        assert "could not confirm" in res["error"]
+
+    @respx.mock
+    @pytest.mark.asyncio
+    async def test_failure_when_tags_unreachable(self, monkeypatch):
+        # Previously this path returned a false success. Now an unreachable
+        # /api/tags (after retries) is a clean failure.
+        monkeypatch.setattr(rkllama_installer.asyncio, "sleep", _no_sleep)
+        respx.post("http://localhost:7833/api/pull").mock(
+            return_value=httpx.Response(200, text='{"status":"success"}\n')
+        )
+        respx.get("http://localhost:7833/api/tags").mock(
+            side_effect=httpx.ConnectError("refused")
+        )
+        res = await self._installer().install("rkllama-x", {}, variant=_VARIANT)
+        assert res["success"] is False
+        assert "could not confirm" in res["error"]
+
+    @respx.mock
+    @pytest.mark.asyncio
+    async def test_failure_when_tags_returns_non_json(self, monkeypatch):
+        # A 200 with a non-JSON body must be treated as a failed check, not
+        # raise an uncaught JSONDecodeError out of install().
+        monkeypatch.setattr(rkllama_installer.asyncio, "sleep", _no_sleep)
+        respx.post("http://localhost:7833/api/pull").mock(
+            return_value=httpx.Response(200, text='{"status":"success"}\n')
+        )
+        respx.get("http://localhost:7833/api/tags").mock(
+            return_value=httpx.Response(200, text="<html>nginx</html>")
+        )
+        res = await self._installer().install("rkllama-x", {}, variant=_VARIANT)
+        assert res["success"] is False
+        assert "could not confirm" in res["error"]
+
+    @respx.mock
+    @pytest.mark.asyncio
+    async def test_retries_then_succeeds_when_model_appears_late(self, monkeypatch):
+        # Registration can lag the pull's 200; the verify loop retries on an
+        # absent model and succeeds once it appears.
+        monkeypatch.setattr(rkllama_installer.asyncio, "sleep", _no_sleep)
+        respx.post("http://localhost:7833/api/pull").mock(
+            return_value=httpx.Response(200, text='{"status":"success"}\n')
+        )
+        respx.get("http://localhost:7833/api/tags").mock(
+            side_effect=[
+                httpx.Response(200, json={"models": []}),
+                httpx.Response(200, json={"models": [{"name": "rkllama-x"}]}),
+            ]
+        )
+        res = await self._installer().install("rkllama-x", {}, variant=_VARIANT)
+        assert res["success"] is True
+
+
+async def _no_sleep(*_a, **_k):
+    return None
+
+
+class TestRkllamaServiceManifest:
+    """The rkllama service manifest (install.method: script) must point at a
+    script that actually exists. ScriptInstaller resolves install.script
+    relative to the repo root (its cwd), so a missing file means the store
+    install fails with 'script not found'. This is the #844 regression guard.
+    """
+
+    def test_install_script_exists(self):
+        import pathlib
+        import yaml
+
+        repo = pathlib.Path(__file__).resolve().parent.parent
+        manifest = yaml.safe_load(
+            (repo / "app-catalog" / "services" / "rkllama" / "manifest.yaml").read_text()
+        )
+        install = manifest.get("install") or {}
+        assert install.get("method") == "script"
+        script = install.get("script")
+        assert script, "rkllama manifest declares no install.script"
+        assert (repo / script).is_file(), f"rkllama install script missing: {script}"
diff --git a/tinyagentos/installers/rkllama_installer.py b/tinyagentos/installers/rkllama_installer.py
index 73aea082..5c0ee486 100644
--- a/tinyagentos/installers/rkllama_installer.py
+++ b/tinyagentos/installers/rkllama_installer.py
@@ -14,6 +14,7 @@
 """
 from __future__ import annotations
 
+import asyncio
 import logging
 import re
 import socket
@@ -184,26 +185,52 @@ async def install(
             }
 
         # Verify the model now appears in /api/tags so we know rkllama
-        # successfully registered it.
-        try:
-            async with httpx.AsyncClient(timeout=10) as client:
-                tags = await client.get(f"{self.rkllama_url}/api/tags")
-                tags.raise_for_status()
-                names = {m.get("name") for m in tags.json().get("models", [])}
-                if app_id not in names:
-                    return {
-                        "success": False,
-                        "error": (
-                            f"rkllama pull returned 200 but {app_id!r} is not in "
-                            f"/api/tags. Known models: {sorted(names)[:5]}"
-                        ),
-                    }
-        except httpx.HTTPError as exc:
+        # successfully registered it. The pull returning 200 is necessary but
+        # not sufficient: only /api/tags confirms the weight is loadable. We
+        # retry a few times to tolerate a transient blip AND registration lag
+        # (the model can take a moment to appear after pull returns), but if the
+        # check never confirms the model we report failure rather than a false
+        # success -- a model the agent can't actually load is worse than a clear
+        # error. Malformed/unexpected /api/tags bodies are treated as a failed
+        # check, not allowed to raise out of the installer.
+        last_problem = "verification did not run"
+        verified = False
+        for attempt in range(3):
+            try:
+                async with httpx.AsyncClient(timeout=10) as client:
+                    tags = await client.get(f"{self.rkllama_url}/api/tags")
+                    tags.raise_for_status()
+                    payload = tags.json()
+            except (httpx.HTTPError, ValueError) as exc:
+                # ValueError covers a 200 with a non-JSON body (json.JSONDecodeError).
+                last_problem = f"/api/tags unreachable or not JSON: {exc}"
+            else:
+                models = payload.get("models") if isinstance(payload, dict) else None
+                names = (
+                    {m.get("name") for m in models if isinstance(m, dict)}
+                    if isinstance(models, list)
+                    else set()
+                )
+                if app_id in names:
+                    verified = True
+                    break
+                known = sorted(n for n in names if n)[:5]
+                last_problem = f"{app_id!r} not yet in /api/tags (known: {known})"
             logger.warning(
-                "rkllama install: /api/tags verification failed: %s", exc
+                "rkllama install: /api/tags verification attempt %d/3: %s",
+                attempt + 1, last_problem,
             )
-            # Non-fatal -- pull succeeded; verification problem is likely transient.
+            if attempt < 2:
+                await asyncio.sleep(1.0 * (attempt + 1))
 
+        if not verified:
+            return {
+                "success": False,
+                "error": (
+                    f"rkllama pull returned 200 but could not confirm {app_id!r} "
+                    f"installed after 3 checks: {last_problem}. Retry the install."
+                ),
+            }
         return {"success": True, "app_id": app_id, "model_name": app_id}
 
     async def uninstall(self, app_id: str) -> dict:
diff --git a/tinyagentos/routes/skill_exec.py b/tinyagentos/routes/skill_exec.py
index 4e40c9c8..bb46909e 100644
--- a/tinyagentos/routes/skill_exec.py
+++ b/tinyagentos/routes/skill_exec.py
@@ -225,6 +225,16 @@ async def _skill_canvas_add_image(args: dict, request: Request) -> dict:
         return {"error": str(exc)}
 
 
+async def _skill_describe_image_capabilities(args: dict, request: Request) -> dict:
+    """Describe the cluster's image-gen tiers + tools (agent OS control)."""
+    try:
+        from tinyagentos.tools.cluster_tools import execute_describe_image_capabilities
+
+        return await execute_describe_image_capabilities(args, request)
+    except Exception as exc:
+        return {"error": str(exc)}
+
+
 SKILL_IMPLEMENTATIONS = {
     "memory_search": _skill_memory_search,
     "file_read": _skill_file_read,
@@ -239,6 +249,7 @@ async def _skill_canvas_add_image(args: dict, request: Request) -> dict:
     "create_project": _skill_create_project,
     "add_task": _skill_add_task,
     "canvas_add_image": _skill_canvas_add_image,
+    "describe_image_capabilities": _skill_describe_image_capabilities,
 }
 
 
diff --git a/tinyagentos/skills.py b/tinyagentos/skills.py
index d80fd4fb..9e479052 100644
--- a/tinyagentos/skills.py
+++ b/tinyagentos/skills.py
@@ -348,15 +348,15 @@ async def _seed_defaults(self):
                 "description": "Place a generated image on a project's canvas",
                 "tool_schema": {
                     "name": "canvas_add_image",
-                    "description": "Place a generated image (by file_id from generate_image) on a project's ideas board.",
+                    "description": "Place a generated image on a project's ideas board.",
                     "input_schema": {
                         "type": "object",
                         "properties": {
                             "project_id": {"type": "string", "description": "Id from create_project."},
-                            "file_id": {"type": "string", "description": "Image file id from generate_image."},
+                            "image_ref": {"type": "string", "description": "The image_ref returned by generate_image."},
                             "alt": {"type": "string", "description": "Alt text."},
                         },
-                        "required": ["project_id", "file_id"],
+                        "required": ["project_id", "image_ref"],
                     },
                 },
                 "frameworks": {
@@ -367,6 +367,24 @@ async def _seed_defaults(self):
                 "install_method": "builtin",
                 "install_target": "tinyagentos.tools.project_tools",
             },
+            {
+                "id": "describe_image_capabilities",
+                "name": "Describe Image Capabilities",
+                "category": "media",
+                "description": "See the cluster's image-generation tiers and tools (NPU/GPU/CPU)",
+                "tool_schema": {
+                    "name": "describe_image_capabilities",
+                    "description": "List the hardware tiers (this host + cluster workers) and which image-generation tools/models each has loaded, so you can pick the best one before generate_image.",
+                    "input_schema": {"type": "object", "properties": {}},
+                },
+                "frameworks": {
+                    "smolagents": "adapter", "openclaw": "adapter", "pocketflow": "adapter",
+                    "langroid": "adapter", "hermes": "adapter", "agent-zero": "adapter",
+                    "openai-agents-sdk": "adapter", "generic": "adapter",
+                },
+                "install_method": "builtin",
+                "install_target": "tinyagentos.tools.cluster_tools",
+            },
         ]
 
         for skill in defaults:
@@ -382,6 +400,24 @@ async def _seed_defaults(self):
                     time.time(),
                 ),
             )
+            # INSERT OR IGNORE leaves an existing row untouched, so an install
+            # seeded by an earlier release keeps its stale tool_schema (e.g. the
+            # pre-image_ref canvas_add_image contract). Refresh the code-owned
+            # fields for builtin skills so existing installs converge on the
+            # current definition. Scoped to install_method='builtin' so a user's
+            # installed/customised skills are never overwritten.
+            await self._db.execute(
+                """UPDATE skills
+                   SET name = ?, category = ?, description = ?, tool_schema = ?,
+                       frameworks = ?, requires_services = ?, install_target = ?
+                   WHERE id = ? AND install_method = 'builtin'""",
+                (
+                    skill["name"], skill["category"], skill["description"],
+                    json.dumps(skill["tool_schema"]), json.dumps(skill["frameworks"]),
+                    json.dumps(skill.get("requires_services", [])),
+                    skill["install_target"], skill["id"],
+                ),
+            )
         await self._db.commit()
 
     async def list_skills(self, category: str | None = None) -> list[dict]:
diff --git a/tinyagentos/tools/cluster_tools.py b/tinyagentos/tools/cluster_tools.py
new file mode 100644
index 00000000..2cf18292
--- /dev/null
+++ b/tinyagentos/tools/cluster_tools.py
@@ -0,0 +1,129 @@
+"""Agent tool: describe the cluster's image-generation capabilities.
+
+Gives the agent read-only awareness of what hardware tiers exist (this host's
+NPU/GPU/CPU plus any cluster workers like an NVIDIA box) and which image tools
+live on each tier, including what's loaded right now. The agent uses this to
+pick the best tool by intent — a fast NPU draft vs the good GPU model for a
+cover — and to tell the user what it's doing.
+
+The agent does NOT manage queues/load/unload; the scheduler + lifecycle manager
+do that. This tool is the menu, not the controls.
+"""
+from __future__ import annotations
+
+from fastapi import Request
+
+# Map a backend type to the hardware tier it runs on, for the agent's benefit.
+_TIER = {
+    "rkllama": "npu",
+    "rk-llama-cpp": "npu",
+    "ezrknpu": "npu",
+    "sd-cpp": "cpu/gpu",
+    "comfyui": "gpu",
+    "ollama": "gpu/cpu",
+}
+
+
+def _json_safe(v):
+    """Coerce a value to something JSON-serialisable (the tool result is
+    returned as JSON, so nested dataclasses/objects would 500)."""
+    if v is None or isinstance(v, (str, int, float, bool)):
+        return v
+    if isinstance(v, dict):
+        return {str(k): _json_safe(x) for k, x in v.items()}
+    if isinstance(v, (list, tuple)):
+        return [_json_safe(x) for x in v]
+    return str(v)
+
+
+def _hw_summary(hw) -> dict:
+    """Best-effort, JSON-safe summary of a hardware profile (object or dict)."""
+    if hw is None:
+        return {}
+    # HardwareProfile stores total RAM as `ram_mb`; include both that and the
+    # generic `ram`/`vram` keys so dict- and dataclass-shaped profiles both
+    # surface memory, the main tier-selection signal.
+    keys = ("cpu", "gpu", "npu", "vram", "ram", "ram_mb", "tier", "platform")
+    if isinstance(hw, dict):
+        return {k: _json_safe(hw.get(k)) for k in keys if hw.get(k) is not None}
+    return {k: _json_safe(getattr(hw, k)) for k in keys if getattr(hw, k, None) is not None}
+
+
+def _model_id(m):
+    """Best-effort model identifier from a dict-or-str model entry."""
+    if isinstance(m, dict):
+        return m.get("id") or m.get("name")
+    return m
+
+
+def _image_backends_from_catalog(catalog) -> list[dict]:
+    out = []
+    if not catalog:
+        return out
+    try:
+        backends = catalog.backends_with_capability("image-generation")
+    except Exception:
+        return out
+    # Guard each backend independently: one malformed entry must not drop the
+    # whole capability list (the agent relies on this menu to pick a tier).
+    for be in backends or []:
+        try:
+            out.append({
+                "name": be.name,
+                "type": be.type,
+                "tier": _TIER.get(be.type, "unknown"),
+                "loaded": getattr(be, "lifecycle_state", "running") == "running",
+                "models": [_model_id(m) for m in (be.models or [])][:10],
+            })
+        except Exception:
+            continue
+    return out
+
+
+def _image_backends_from_worker(worker) -> list[dict]:
+    out = []
+    for b in (getattr(worker, "backends", None) or []):
+        caps = b.get("capabilities") or []
+        if "image-generation" in caps or b.get("type") in ("sd-cpp", "rkllama", "comfyui"):
+            ls = b.get("lifecycle_state")
+            out.append({
+                "name": b.get("name"),
+                "type": b.get("type"),
+                "tier": _TIER.get(b.get("type"), "unknown"),
+                # mirror the 'loaded' field local backends report; None = unknown
+                "loaded": b.get("loaded") if "loaded" in b else (ls == "running" if ls else None),
+                "models": [_model_id(m) for m in (b.get("models") or [])][:10],
+            })
+    return out
+
+
+async def execute_describe_image_capabilities(args: dict, request: Request) -> dict:
+    state = request.app.state
+    tiers = [{
+        "node": "local",
+        "hardware": _hw_summary(getattr(state, "hardware_profile", None)),
+        "image_backends": _image_backends_from_catalog(getattr(state, "backend_catalog", None)),
+    }]
+    cluster = getattr(state, "cluster_manager", None)
+    if cluster is not None:
+        try:
+            workers = cluster.get_workers()
+        except Exception:
+            workers = []
+        # Guard each worker independently so one bad worker entry doesn't drop
+        # the rest of the cluster from the menu.
+        for w in workers or []:
+            try:
+                if getattr(w, "status", "online") != "online":
+                    continue
+                tiers.append({
+                    "node": w.name,
+                    "hardware": _hw_summary(getattr(w, "hardware", None)),
+                    "image_backends": _image_backends_from_worker(w),
+                })
+            except Exception:
+                continue
+    return {
+        "tiers": tiers,
+        "hint": "Pick a model on the tier that fits the task (npu = fast draft, gpu = best quality), then call generate_image with that model. The system loads/unloads and queues for you.",
+    }
diff --git a/tinyagentos/tools/image_tool.py b/tinyagentos/tools/image_tool.py
index fe0db58e..2264bba3 100644
--- a/tinyagentos/tools/image_tool.py
+++ b/tinyagentos/tools/image_tool.py
@@ -10,7 +10,15 @@
         "properties": {
             "prompt": {
                 "type": "string",
-                "description": "Text description of the image to generate",
+                "description": (
+                    "Text description of the image. Lead with the subject, then "
+                    "layer descriptors, setting, composition, and an explicit style "
+                    "(e.g. 'children's book watercolour', 'flat vector', "
+                    "'photorealistic'). Be specific, not long; front-load what "
+                    "matters; keep to one clear scene. Example: 'a friendly cartoon "
+                    "fox reading under a tree, autumn leaves, warm soft light, "
+                    "watercolour children's book illustration, centred'."
+                ),
             },
             "size": {
                 "type": "string",
@@ -172,10 +180,16 @@ async def execute_image_generation(
     fallback omits the model field so the local backend uses whatever
     checkpoint it has loaded rather than a pinned model name.
 
-    Returns dict with 'success', 'image_b64' (base64 PNG), and 'error'
-    if failed.
+    Returns dict with 'success', 'image_ref' (the saved filename, usable by
+    canvas_add_image), 'url' (web path to the PNG), and 'error' if failed.
+
+    Note: the connect-failure fallback (used only when the controller itself
+    is unreachable, e.g. an LXC agent that can't see localhost:6969) returns
+    'image_b64' instead of 'image_ref' -- it has no controller workspace to
+    save into. In-process tool calls always take the scheduler path above and
+    get an 'image_ref', so canvas_add_image works; a caller relying on the
+    fallback must handle the bytes itself.
     """
-    import base64
     import httpx
     import random
 
@@ -203,14 +217,18 @@ async def execute_image_generation(
         async with httpx.AsyncClient(timeout=120) as client:
             resp = await client.post(target_url, json=payload)
             resp.raise_for_status()
-            # /api/images/generate returns raw PNG bytes
-            image_bytes = resp.content
+            # The scheduler route saves the PNG and returns JSON metadata.
+            data = resp.json()
+            filename = data.get("filename")
+            if not isinstance(filename, str) or not filename:
+                return {"success": False, "error": f"image backend returned no filename: {str(data)[:200]}"}
             return {
                 "success": True,
-                "image_b64": base64.b64encode(image_bytes).decode(),
-                "seed": seed,
-                "model": model or "",
-                "size": size,
+                "image_ref": filename,
+                "url": data.get("path", ""),
+                "seed": data.get("seed", seed),
+                "model": data.get("model", model or ""),
+                "size": data.get("size", size),
             }
     except httpx.ConnectError:
         controller_unreachable = True  # fall through to direct path below
diff --git a/tinyagentos/tools/project_tools.py b/tinyagentos/tools/project_tools.py
index e1697e01..be477605 100644
--- a/tinyagentos/tools/project_tools.py
+++ b/tinyagentos/tools/project_tools.py
@@ -10,6 +10,8 @@
 from __future__ import annotations
 
 import re
+from pathlib import Path
+from uuid import uuid4
 
 from fastapi import Request
 
@@ -18,6 +20,14 @@ def _user_id(request: Request) -> str | None:
     return getattr(request.state, "user_id", None) or None
 
 
+def _data_dir(request: Request) -> Path:
+    """Workspace data dir, resolved the same way images.py does."""
+    config_path = getattr(request.app.state, "config_path", None)
+    if config_path is not None:
+        return Path(config_path).parent
+    return Path(__file__).parent.parent.parent / "data"
+
+
 async def _owned_project(request: Request, project_id: str, user_id: str):
     """Return (project, None) if the caller owns project_id (or is admin), else
     (None, error_dict). Prevents writing tasks/images into another user's project."""
@@ -71,9 +81,10 @@ async def execute_add_task(args: dict, request: Request) -> dict:
 
 async def execute_canvas_add_image(args: dict, request: Request) -> dict:
     project_id = (args or {}).get("project_id")
-    file_id = (args or {}).get("file_id")
-    if not isinstance(project_id, str) or not project_id or not isinstance(file_id, str) or not file_id:
-        return {"error": "canvas_add_image requires 'project_id' and 'file_id' strings"}
+    # `image_ref` is the filename returned by generate_image (a workspace file).
+    image_ref = (args or {}).get("image_ref")
+    if not isinstance(project_id, str) or not project_id or not isinstance(image_ref, str) or not image_ref:
+        return {"error": "canvas_add_image requires 'project_id' and 'image_ref' strings"}
     try:
         x = float((args or {}).get("x", 80))
         y = float((args or {}).get("y", 80))
@@ -82,9 +93,31 @@ async def execute_canvas_add_image(args: dict, request: Request) -> dict:
     user_id = _user_id(request)
     if not user_id:
         return {"error": "no authenticated user"}
-    _, err = await _owned_project(request, project_id, user_id)
+    project, err = await _owned_project(request, project_id, user_id)
     if err:
         return err
+
+    # Copy the generated image (saved by generate_image under the workspace) into
+    # the project's canvas files, where the canvas renders it from
+    # /api/projects/{slug}/files/canvas/{file_id}. `.name` strips any path part.
+    src = _data_dir(request) / "workspace" / "images" / "generated" / Path(image_ref).name
+    if not src.is_file():
+        return {"error": f"image not found: {image_ref}"}
+    # The slug is the on-disk directory AND the key the canvas render route
+    # (/api/projects/{slug}/files/canvas/{file_id}) reads back, so it must stay
+    # the project's real slug. New projects slugify safely, but reject a legacy
+    # row or fallback id that carries separators rather than escape projects_root.
+    slug = project.get("slug") or project_id
+    if slug != _slugify(slug):
+        return {"error": f"unsafe project slug: {slug!r}"}
+    projects_root = Path(request.app.state.projects_root).resolve()
+    canvas_dir = (projects_root / slug / "files" / "canvas").resolve()
+    if not canvas_dir.is_relative_to(projects_root):
+        return {"error": "resolved canvas path escapes projects_root"}
+    canvas_dir.mkdir(parents=True, exist_ok=True)
+    file_id = f"{uuid4().hex}{src.suffix or '.png'}"
+    (canvas_dir / file_id).write_bytes(src.read_bytes())
+
     store = request.app.state.project_canvas_store
     el = await store.add_element(
         project_id=project_id,
@@ -99,4 +132,4 @@ async def execute_canvas_add_image(args: dict, request: Request) -> dict:
         author_kind="agent",
         author_id=user_id,
     )
-    return {"ok": True, "element_id": el["id"]}
+    return {"ok": True, "element_id": el["id"], "file_id": file_id}