Conversation
The demo's image step was broken two ways:

- generate_image b64-encoded the route's JSON response as if it were PNG bytes, so 'image_b64' was garbage. It now parses the JSON and returns 'image_ref' (the saved filename) + 'url' (the served path). No consumer used image_b64.
- canvas_add_image took a 'file_id' that had to already be a project canvas file, but generate_image saves to the workspace -> the image never rendered. It now takes 'image_ref', copies the workspace PNG into the project's canvas files (projects_root/<slug>/files/canvas/<uuid>.png, where ImageShape renders it), then creates the element. Still ownership-checked; '.name' strips path parts.

So the agent flow works end to end: generate_image -> canvas_add_image(image_ref) -> the cover art appears live on the ideas board. Manual + skill schema updated; project_tools/image_tool tests green (19).
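The copy step described above can be sketched roughly as follows. This is a minimal illustration, not the code in project_tools.py: the function name `copy_into_canvas` and its parameters are hypothetical, and the ownership check is left out.

```python
import shutil
import uuid
from pathlib import Path


def copy_into_canvas(workspace_dir: Path, canvas_dir: Path, image_ref: str) -> str:
    """Copy a workspace image into a canvas directory and return the new file id.

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    # Path(...).name drops directory parts, so "../../x.png" becomes "x.png".
    safe_name = Path(image_ref).name
    if not safe_name:
        raise ValueError("image_ref has no filename component")

    source = workspace_dir / safe_name
    if not source.is_file():
        raise FileNotFoundError(f"no workspace image named {safe_name!r}")

    # The canvas copy gets a fresh uuid-based name, as in <uuid>.png.
    canvas_dir.mkdir(parents=True, exist_ok=True)
    file_id = f"{uuid.uuid4()}.png"
    shutil.copyfile(source, canvas_dir / file_id)
    return file_id
```

Taking only the final path component means a crafted image_ref cannot point the copy at a file outside the workspace directory.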
Read-only tool so the agent knows what hardware tiers exist (this host's NPU/GPU/CPU + cluster workers like an NVIDIA box) and which image tools/models each has loaded, before calling generate_image. Maps backend type -> tier (rkllama=npu, sd-cpp=cpu/gpu, comfyui=gpu) and reports loaded state from the backend catalog + cluster manager. Defensive when cluster/catalog absent. The agent picks the model by intent (npu draft vs gpu cover) and generate_image routes there; the scheduler + lifecycle manager own load/unload/queue. Seeded as a skill, wired into skill_exec, manual updated; 4 tool tests + 26 related green.
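As a rough sketch of the backend-type-to-tier mapping, assuming each backend is a dict with hypothetical 'type' and 'loaded' keys (the real catalog and cluster manager objects are not shown in this PR text):

```python
# Tier mapping taken from the commit message; the dict layout is an assumption.
BACKEND_TIERS = {
    "rkllama": ["npu"],
    "sd-cpp": ["cpu", "gpu"],
    "comfyui": ["gpu"],
}


def tiers_for(backend_type: str) -> list:
    """Return the hardware tiers a backend type can run on."""
    return BACKEND_TIERS.get(backend_type, ["unknown"])


def capability_menu(backends: list) -> list:
    """Build a read-only summary of backend, tiers and loaded state."""
    menu = []
    for backend in backends:
        menu.append(
            {
                "backend": backend["type"],
                "tiers": tiers_for(backend["type"]),
                # Treat a missing 'loaded' flag as not loaded.
                "loaded": bool(backend.get("loaded", False)),
            }
        )
    return menu
```

An agent reading such a menu could pick an npu-tier model for a quick draft and a gpu-tier model for final cover art.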
- describe_image_capabilities returned dataclass fields from a real hardware_profile -> JSON-serialise would 500. Coerce hardware values to JSON primitives (_json_safe) + test with an object profile (Gitar).
- generate_image now fails instead of false-succeeding when the scheduler response has no filename (CodeRabbit/Kilo).
- canvas_add_image: drop the misleading file_id fallback (a real canvas file_id isn't in the workspace); require image_ref (Kilo).
- worker image_backends now report a 'loaded' field for parity with local backends (Gitar).

24 tests green.
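One way the coercion to JSON primitives could look is sketched below. The name `json_safe` and the example dataclasses are illustrative assumptions; the PR's own _json_safe may differ.

```python
import dataclasses
import json


def json_safe(value):
    """Recursively coerce a value into something json.dumps accepts."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    # Dataclass instances (not classes) become plain dicts.
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return {
            f.name: json_safe(getattr(value, f.name))
            for f in dataclasses.fields(value)
        }
    if isinstance(value, dict):
        return {str(k): json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [json_safe(v) for v in value]
    # Anything else falls back to its string form.
    return str(value)


@dataclasses.dataclass
class ExampleNpu:  # hypothetical stand-in for a hardware sub-profile
    cores: int


@dataclasses.dataclass
class ExampleProfile:  # hypothetical stand-in for a hardware profile
    ram_mb: int
    npu: ExampleNpu


payload = json_safe(ExampleProfile(ram_mb=8192, npu=ExampleNpu(cores=3)))
encoded = json.dumps(payload)  # serialises without raising TypeError
```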
- skills.py: refresh code-owned (builtin) skill rows after INSERT OR IGNORE so installs seeded by an earlier release (e.g. the Pi, with the pre-image_ref canvas_add_image schema) converge on the current tool_schema. Scoped to install_method='builtin' so user-installed skills are never overwritten.
- image_tool.py: type-validate the scheduler 'filename' (not just truthiness) before claiming success; document that the controller-unreachable fallback returns image_b64 (no controller workspace to save into), so the image_ref contract only applies to the in-process scheduler path.
- project_tools.py: reject a project slug that isn't its own slugify and verify the resolved canvas dir stays within projects_root (defense-in-depth against a legacy/odd slug escaping the tree).
- cluster_tools.py: include HardwareProfile's real 'ram_mb' field in the hardware summary so total RAM (a tier signal) isn't dropped.
- tests: scheduler mocks now return real filename/path metadata.

Refs #884
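The builtin-row refresh in the first bullet can be illustrated with sqlite3 as below. The table layout and the function name `seed_builtin_skill` are assumptions made for this sketch; only the INSERT OR IGNORE plus scoped UPDATE pattern comes from the commit message.

```python
import sqlite3

# Hypothetical minimal schema; the real skills table likely has more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS skills (
    name TEXT PRIMARY KEY,
    install_method TEXT NOT NULL,
    tool_schema TEXT NOT NULL
)
"""


def seed_builtin_skill(conn: sqlite3.Connection, name: str, tool_schema: str) -> None:
    """Insert a builtin skill, or refresh its schema if an older row exists."""
    # A row seeded by an earlier release makes this insert a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO skills (name, install_method, tool_schema) "
        "VALUES (?, 'builtin', ?)",
        (name, tool_schema),
    )
    # Refresh only code-owned rows, so user-installed skills stay untouched.
    conn.execute(
        "UPDATE skills SET tool_schema = ? "
        "WHERE name = ? AND install_method = 'builtin'",
        (tool_schema, name),
    )
    conn.commit()
```

Without the UPDATE, an install seeded before the image_ref change would keep serving the old canvas_add_image schema indefinitely.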
The rkllama service manifest declared install.method: script pointing at scripts/install-rkllama.sh, but that file never existed -- only scripts/install-rknpu.sh did. ScriptInstaller resolves install.script relative to the repo root, so clicking Install in the store failed with 'script not found'. rkllama is the NPU LLM backend our RK3588 target audience needs from the first boot, so this broke their primary path.

- Add scripts/install-rkllama.sh: an idempotent, headless wrapper the store's ScriptInstaller can run non-interactively. It short-circuits to exit 0 when a live rkllama already answers (7833 or legacy 8080), and otherwise delegates to the verified install-rknpu.sh with TAOS_RKNPU_SETUP=1 set explicitly. That env var is required: without it install-rknpu.sh takes its 'non-interactive shell, nothing to confirm -> exit 0' path and would report success while installing nothing. Keeping the heavy NPU runtime install behind a store-triggered script (cloned at install time, not bundled) keeps the arms-length, source-available posture intact.
- Harden RkllamaInstaller's /api/tags verification. A 200 from /api/pull is necessary but not sufficient; only /api/tags confirms the weight is loadable. Previously an unreachable /api/tags was swallowed and the install returned a false success. Now it retries a few times and, if the check never succeeds, returns success: False with an actionable error -- a model the agent can't load is worse than a clear failure.
- Tests: install() verification (confirmed / absent / unreachable) plus a regression guard asserting the rkllama manifest's install.script exists.
- RkllamaInstaller verify loop: catch ValueError so a 200 with a non-JSON /api/tags body fails the check instead of raising JSONDecodeError out of install(); validate the response shape (dict with a list of dict models); retry on an absent model too, not just on HTTPError, so registration lag after the pull's 200 no longer reports a false failure.
- install-rkllama.sh: the idempotent short-circuit now requires an rkllama/Ollama-shaped /api/tags body (a "models" key) rather than any HTTP 200, so an unrelated service on 7833/8080 can't be mistaken for an installed rkllama. Mirrors _port_responds_with_rkllama().
- tests: non-JSON /api/tags and late-appearing model.

Refs #844 #886
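The verify loop described above might be shaped roughly like this. It is a sketch under assumptions: `fetch_tags` is a hypothetical injected callable returning the /api/tags body as text, and models are assumed to be identified by an Ollama-style "name" key.

```python
import json
import time
from typing import Callable


def tags_list_model(body: str, model: str) -> bool:
    """True only for a {"models": [{"name": ...}, ...]} body that lists model."""
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    if not isinstance(data, dict):
        return False
    models = data.get("models")
    if not isinstance(models, list):
        return False
    return any(isinstance(m, dict) and m.get("name") == model for m in models)


def verify_model(
    fetch_tags: Callable[[], str],
    model: str,
    attempts: int = 3,
    delay: float = 1.0,
) -> bool:
    """Retry the tags check; an absent model and a network error both retry."""
    for attempt in range(attempts):
        try:
            if tags_list_model(fetch_tags(), model):
                return True
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Retrying on an absent model as well as on errors covers the case where the model registers slightly after the pull returns 200.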
One malformed backend or worker entry no longer drops the whole capability menu (CodeRabbit). Guard each backend/worker independently instead of wrapping the entire loop in a single try/except, and extract a shared _model_id helper for dict-or-str model entries. Refs #884
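A minimal sketch of the per-entry guard and the dict-or-str model helper follows. The function names and the "id"/"name"/"models" keys are assumptions for illustration; only the "image-generation" capability string appears in the PR itself.

```python
from typing import Optional


def model_id(entry) -> Optional[str]:
    """Return a model identifier from a str entry or a dict entry, else None."""
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict):
        for key in ("id", "name"):  # assumed key names
            value = entry.get(key)
            if isinstance(value, str) and value:
                return value
    return None


def collect_image_backends(backends) -> list:
    """Collect image-capable backends, skipping only the entries that are bad."""
    out = []
    for backend in backends or []:
        try:
            caps = backend.get("capabilities") or []
            if "image-generation" not in caps:
                continue
            ids = [model_id(m) for m in backend.get("models") or []]
            out.append(
                {"name": backend.get("name"), "models": [i for i in ids if i]}
            )
        except Exception:
            # One malformed entry is dropped; the rest of the menu survives.
            continue
    return out
```

Placing the try/except inside the loop is what keeps a single bad entry from emptying the whole result.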
The agent drives generate_image; result quality is mostly the prompt. Add agent-manual/10-image-prompting.md (compiled into taos-agent-manual.md): prompt structure (subject -> descriptors -> setting -> composition -> style -> lighting), be-specific/front-load/one-scene principles, negative_prompt for common defects, the tool's actual parameters (size/steps/guidance_scale/seed/model with sensible ranges), model-family differences (FLUX sentences vs SDXL phrases, text-in-image caveat), and deliberate iteration. Also enrich the generate_image 'prompt' field description with a compact inline hint so guidance is present at call time. Refs #884
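The prompt ordering above could be assembled with a small helper like the one below. The helper and its argument names are hypothetical and are not part of the manual or the tool; it only illustrates front-loading the subject and keeping the documented order.

```python
def build_prompt(
    subject: str,
    descriptors=(),
    setting: str = "",
    composition: str = "",
    style: str = "",
    lighting: str = "",
) -> str:
    """Join prompt parts in subject-first order, skipping empty parts."""
    parts = [subject, *descriptors, setting, composition, style, lighting]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "a lighthouse",
    descriptors=["weathered stone"],
    setting="on a cliff at dusk",
    style="oil painting",
)
```

The comma-separated output suits phrase-style models; for a sentence-style model family the same parts would more naturally be written as prose.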
- add a SESSION PAUSED 'NEW SESSION START HERE' block: branch tips, the two in-flight PRs (#884 image-gen tip ddeb1be, #886 rkllama #844 tip d6960af, both baking on tests only), the remaining minor non-blocking bot nits with my assessment, the merge gate, and re-arm reminders
- record the 3060 SD-backend GO from @taOSmd (task #34 unblocked; do after #884)
…r tier awareness) feat(agent): agent-controlled image generation (canvas wiring + cluster tier awareness)
Walkthrough
This PR introduces an …
Changes
Image Generation Workflow (image_ref + describe_image_capabilities)
rkllama Installer Hardening
Sequence Diagram(s)
sequenceDiagram
rect rgba(173, 216, 230, 0.5)
Note over Agent,canvas_add_image: Image generation and placement flow
Agent->>execute_describe_image_capabilities: call (no args)
execute_describe_image_capabilities->>backend_catalog: backends_with_capability("image-generation")
execute_describe_image_capabilities->>cluster_manager: get_workers()
execute_describe_image_capabilities-->>Agent: {tiers: [{node, hw, image_backends}…], hint}
end
rect rgba(144, 238, 144, 0.5)
Note over Agent,canvas_add_image: Agent selects model, generates image
Agent->>execute_image_generation: call(prompt, model, size, …)
execute_image_generation->>SchedulerController: POST /api/images/generate
SchedulerController-->>execute_image_generation: JSON {filename, path}
execute_image_generation-->>Agent: {image_ref, url, seed, model, size}
end
rect rgba(255, 222, 173, 0.5)
Note over Agent,canvas_add_image: Agent places image on project board
Agent->>canvas_add_image: call(project_id, image_ref)
canvas_add_image->>project_tools: _data_dir → locate workspace image
canvas_add_image->>project_tools: validate slug + path, copy bytes
canvas_add_image-->>Agent: {element_id, file_id}
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
    def _image_backends_from_worker(worker) -> list[dict]:
        out = []
        for b in (getattr(worker, "backends", None) or []):
            caps = b.get("capabilities") or []
WARNING: Malformed worker backend entries can still drop the entire worker
The local catalog path guards each backend independently, but this worker path only guards at worker level. If worker.backends contains a non-dict entry, b.get(...) raises and the outer try skips the whole worker, so one bad backend can hide otherwise usable worker backends.
Code Review Summary
Status: 3 Issues Found | Recommendation: Address before merge
Files Reviewed (16 files)
Reviewed by nex-n2-pro:free · 2,438,374 tokens
…x in flight, 3060 backend progress
Promotes the two needed fixes from dev to master so all users get them.
Both green on dev (CI + Gitar + Kilo + CodeRabbit).
Summary by CodeRabbit
New Features
describe_image_capabilities tool to inspect available image generation models and hardware tiers across local and cluster resources.
Documentation
Bug Fixes