Skip to content

refactor(prompt): ♻️ Overhaul prompt injection with modular layered architecture#222

Merged
jorben merged 31 commits into
masterfrom
refact/prompt-injection-refactor
Jun 6, 2026
Merged

refactor(prompt): ♻️ Overhaul prompt injection with modular layered architecture#222
jorben merged 31 commits into
masterfrom
refact/prompt-injection-refactor

Conversation

@jorben

@jorben jorben commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Refactor the monolithic prompt injection system into a modular, layered architecture with 20+ new modules, template-based content management, and a composable section registry. This establishes the foundation for future prompt customization, safer emergency fallbacks, and cleaner subagent contracts.

Changes

Core Architecture (new modules)

  • composer: Orchestrates prompt assembly from registered sections
  • registry: Manages section provider registration and resolution with priority ordering
  • layer: Supports layered prompts (base, override, compaction, emergency fallback) with section anchoring, ordering, and audit
  • templates + template_sources: Template loading, rendering with Handlebars-style templating, front-matter parsing, and per-role template source abstraction
  • renderer: Markdown/XML renderers for prompt surfaces
  • legacy_adapter: Backward-compatible bridge for existing callers during phased migration

Feature Modules

  • build_context: Unified BuildCx and ModelTarget for passing build-time parameters
  • surface + surface_extensions: Prompt surface types (main agent, subagent, compaction, title) with extension traits
  • runtime_message: Runtime message injection models (current date, compaction policy)
  • section_source + section_id + section_order: Typed section specs with metadata, outcomes, and fatal error handling
  • signals: Event-driven build signals with caching
  • cache_marker: Cache marker arbitration for LLM caching
  • inheritance: Layer inheritance and override resolution
  • exec_policy: Source execution policy configuration
  • feature_set: Prompt feature flag management
  • budget + clock: Token budget estimation and time injection
  • run_mode: Run mode enum (default, plan)
  • redactor: Content redaction for sensitive information
  • emergency_fallback: Graceful degradation when prompt assembly fails
  • error_codes: Centralized error code definitions
  • active_goal_source: Active goal section provider

Template Files

Extracted 19 template files under prompt/templates/ covering:

  • Role definition, behavioral guidelines, shell tooling guide, skills usage
  • Final response structure
  • Project context, workspace location, system environment, sandbox permissions
  • Compaction (compact, merge)
  • Subagents (explore, review with output contracts)
  • Emergency fallback (main agent, subagent, compaction, title)
  • Run modes (default, plan)
  • Title contract
  • Active goal

Caller Updates

  • agent_session.rs: Updated to consume the new composable prompt architecture
  • agent_run_summary.rs: Adapted summary prompt assembly
  • agent_run_title.rs: Adapted title prompt assembly
  • subagent/orchestrator.rs: Refactored to use new prompt infrastructure
  • Tests: Updated agent_run_manager_tests.rs and agent_session_tests.rs

Documentation

  • Added comprehensive design document (docs/prompt-injection-refactor.md, 1816 lines) covering architecture decisions, data flow, and migration strategy

Motivation

The existing prompt assembly was monolithic and tightly coupled to the agent session lifecycle. This refactoring:

  1. Separates concerns — Prompt content, assembly logic, and injection points are now distinct layers
  2. Enables customization — Template-based content allows easier tuning without Rust recompilation
  3. Improves safety — Emergency fallbacks prevent silent prompt corruption
  4. Supports future features — Layered architecture enables runtime prompt customization, A/B testing, and provider-specific adaptations

Testing

  • cargo test --locked --manifest-path src-tauri/Cargo.toml passes all Rust integration tests
  • npm run typecheck passes TypeScript validation
  • All existing prompt-dependent flows work as before (no behavioral regression)
  • Emergency fallback paths activate correctly during simulated failures

Breaking Changes

None — backward compatibility preserved via legacy_adapter and retained legacy module re-exports.

Related Issues

Refs: Internal prompt injection refactoring initiative

Checklist

  • Code follows project style guidelines
  • Tests have been updated for new architecture
  • No new warnings introduced
  • Legacy API compatibility maintained

🤖 Generated with TiyCode

jorben added 5 commits June 5, 2026 11:28
… cache marker arbitration, surface extension traits, schema version rules, and template front-matter
…rchitecture

Introduce a new prompt/ submodule architecture that replaces monolithic
prompt assembly with composable, layered components:

New core modules:
- composer: Orchestrates prompt assembly from registered sections
- registry: Manages section provider registration and resolution
- layer: Supports layered prompts (base, override, compaction, etc.)
- templates: Template loading, rendering, and front-matter parsing
- template_sources: Bespoke per-role template source abstraction
- renderer: Markdown/XML renderers for prompt surfaces
- legacy_adapter: Backward-compatible bridge for existing callers

New feature modules:
- budget, clock, cache_marker, error_codes, exec_policy, feature_set,
  redactor, run_mode, runtime_message, section_id, section_source,
  signals, surface, surface_extensions, inheritance,
  emergency_fallback, active_goal_source

Template files extracted to prompt/templates/ covering roles,
compaction, subagents, emergency fallback, run modes, and contracts.

Callers updated: agent_session, agent_run_summary, agent_run_title,
subagent orchestrator, and related tests.
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown

AI Code Review Summary

PR: #222 (refactor(prompt): ♻️ Overhaul prompt injection with modular layered architecture)
Preferred language: English

Overall Assessment

Detected 10 actionable findings, prioritize CRITICAL/HIGH before merge.

Major Findings by Severity

  • CRITICAL (1)
    • src-tauri/src/core/prompt/templates.rs:419 - Template key validation test fails to compile in non-debug test builds
  • HIGH (1)
    • src-tauri/src/core/prompt/composer.rs:234 - Cache marker byte offsets may be incorrect due to missing layer separators
  • MEDIUM (7)
    • src-tauri/src/core/agent_run_manager_tests.rs:584 - Response language whitespace trimming removed in compaction prompts
    • src-tauri/src/core/agent_run_summary.rs:147 - Language normalization edge case not covered for summary prompts
    • src-tauri/src/core/agent_session.rs:571 - Cache arbiter path not verified in tests
    • src-tauri/src/core/prompt/budget.rs:22 - Unused eviction_order field in PromptBudget
    • src-tauri/src/core/prompt/renderer.rs:37 - XmlRenderer does not escape XML special characters
    • src-tauri/src/core/prompt/sources/source_tests.rs:18 - Test helper leaks SqlitePool via Box::leak
    • src-tauri/src/core/subagent/orchestrator.rs:1114 - Replaced helper system prompt tests lack inheritance verification
  • LOW (1)
    • src-tauri/src/core/agent_session_types.rs:172 - Unused cache_arbiter field in AgentSessionSpec

Actionable Suggestions

  • Fix the cache marker byte offset calculation to include layer separators.
  • Remove the unused eviction_order field from PromptBudget or integrate it into the composer's eviction logic.
  • Fix the template key validation test to compile in all test configurations.
  • Replace Box::leak in test_cx with a proper pool lifecycle (e.g., lazy_static or Arc) to avoid resource leaks.
  • Standardize trailing whitespace trimming across all SectionSource implementations by adding trim_end() where missing.
  • Consider adding a warning log when approval policy JSON is malformed to avoid silent fallbacks.
  • Use the same response_language normalization function across all sources to ensure consistency.
  • Re-add trimming/normalization of response language before passing it to the Composer in compaction prompt builders.
  • Restore targeted tests that verify the helper subagent's system prompt contains the expected inherited sections (project context, profile instructions, etc.).
  • Either remove the unused cache_arbiter field or implement the planned cache marker reset after LLM calls to avoid dead code.
  • Add unit tests for render_handoff_template via build_implementation_handoff_prompt.
  • Add integration test for cache marker placement in system prompt.

Potential Risks

  • Cache marker offsets being wrong could silently degrade LLM cache-control hints.
  • The unused eviction_order field may mislead future developers about eviction behavior.
  • The template test compilation failure may allow key mismatch regressions to slip through CI.
  • Leaked SQLite pool in tests may cause CI failures or flaky tests under high concurrency.
  • Inconsistent trimming may lead to slightly different prompt formatting between runs, potentially affecting LLM cache stability.
  • Silent fallbacks on corrupted settings could hide configuration errors.
  • If the Composer does not inject goal context, active goals will be silently missing from the main agent prompt.
  • Response language with leading/trailing whitespace may produce malformed prompt text, confusing the model.
  • Absence of helper prompt coverage could allow future changes to strip necessary context from subagents.
  • Flaky title prompt test.
  • Missing coverage may hide bugs in handoff and caching.
  • Language normalization gap could cause miscommunication.

Test Suggestions

  • Add tests for cache marker offsets vs final text.
  • Add test for eviction order being applied correctly.
  • Ensure template validation test runs in CI.
  • Add integration tests for prompt assembly that verify output with all sources enabled.
  • Test behavior with empty or malformed settings JSON for sandbox_permissions.
  • Test compaction_contract source with both compaction kinds to ensure correct template selection.
  • Benchmark active_plan source performance with large thread histories.
  • Unit test: build_compact_summary_system_prompt with language " zh-CN " should produce "Respond in zh-CN unless..." (trimmed).
  • Integration test: build session spec for a thread with an active goal and assert that the goal context appears in the system prompt.
  • Unit test: call build_helper_system_prompt and assert that the output contains "Project Context (workspace instructions)" and "Profile Instructions" (or whichever sections are tagged for subagents).
  • Mock clock in deterministic title test.
  • Add cache arbiter integration test with real Composer build.

File-Level Coverage Notes

  • src-tauri/src/core/prompt/composer.rs: The composer implements a clear 7-step pipeline. The main issue is incorrect cache marker offsets and the unused eviction_order field.
  • src-tauri/src/core/prompt/templates.rs: Template rendering and validation are well implemented, but the test compilation problem is critical.
  • src-tauri/src/core/prompt/registry.rs: Registry is well-structured with comprehensive tests. No issues found.
  • src-tauri/src/core/prompt/providers.rs: Removed as part of the refactor. No regressions observed; make sure all functionality is covered by new sources. (File no longer exists; confirm that all dependents have been updated.)
  • src-tauri/src/core/prompt/signals.rs: Signal caching with cycle detection works, but concurrent calls may cause false cycle errors (low risk given current sequential build).
  • src-tauri/src/core/prompt/budget.rs: Budget calculation is sound, but eviction_order field is unused.
  • src-tauri/src/core/prompt/sources/mod.rs: Simple module declarations and re-exports. No issues.
  • src-tauri/src/core/prompt/inheritance.rs: Inheritance rules and tests are solid.
  • src-tauri/src/core/prompt/layer.rs: Layer definitions and ordering are correct; anchor validation tests cover cycles and cross-layer anchors.
  • src-tauri/src/core/prompt/section_source.rs: Core trait and data types are well-defined.
  • src-tauri/src/core/prompt/runtime_message.rs: Runtime message injection is straightforward.
  • src-tauri/src/core/prompt/mod.rs: Module reorganization done correctly with all new modules integrated.
  • src-tauri/src/core/prompt/error_codes.rs: Error code registry and consistency test are good.
  • src-tauri/src/core/prompt/build_context.rs: Context struct with helper constructors is functional.
  • src-tauri/src/core/prompt/snapshot_tests.rs: Comprehensive snapshot tests provide a safety net; depends on fixed_clock_for_test from clock module (not in batch). Ensure it exists. (rely on fixed_clock_for_test; verify it is available.)
  • src-tauri/src/core/prompt/redactor.rs: Simple redactor implementation, okay.
  • src-tauri/src/core/prompt/sources/profile_instructions.rs: Logic is correct; response_parts reassignment is redundant but harmless. The response style line is always appended, even if empty, which may produce an extra blank line.
  • src-tauri/src/core/prompt/sources/project_context.rs: Collects and truncates workspace instruction files correctly. Normalization strips blank lines, which may affect user content readability.
  • src-tauri/src/core/prompt/sources/sandbox_permissions.rs: Reads approval policy and writable roots from settings with robust fallbacks. Parsing logic defaults silently on malformed JSON.
  • src-tauri/src/core/prompt/sources/skills.rs: Loads skills from ExtensionsManager and renders them through template. Path handling for skill_file may need normalization in some environments.
  • ... and 37 more file-level entries.

Inline Downgraded Items (processed but not inline)

  • None

Coverage Status

  • Target files: 57
  • Covered files: 57
  • Uncovered files: 0
  • No-patch/binary covered as file-level: 0
  • Findings with unknown confidence (N/A): 0

Uncovered list:

  • None

No-patch covered list:

  • None

Runtime/Budget

  • Rounds used: 2/4
  • Planned batches: 5
  • Executed batches: 5
  • Sub-agent runs: 9
  • Planner calls: 2
  • Reviewer calls: 11
  • Model calls: 13/64
  • Structured-output summary-only degradation: NO

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 7
  • Findings with unknown confidence: 0
  • Inline comments attempted: 5
  • Target files: 33
  • Covered files: 33
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

Comment thread src-tauri/src/core/prompt/composer.rs Outdated

// EmergencyFallback: if no sections were produced, inject hard-coded fallback
let fallback_used = bodies.is_empty();
if fallback_used {

This comment was marked as outdated.


/// Semantic ordering hint for sections within the same layer.
/// Replaces the old bare `u16` with anchor-relative or absolute positioning.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]

This comment was marked as outdated.

let renderer = cx.renderer.as_ref();

for (spec, layer, body, warn, source_elapsed) in &kept {
let rendered = renderer.render_section(&spec.title, &body.markdown);

This comment was marked as outdated.

Ok(v) => v,
Err(e) => {
return Ok(SectionOutcome::SoftFailed {
code: "settings.approval_policy.load_failed",

This comment was marked as outdated.

.filter(|(_, body)| !body.trim().is_empty())
.collect()
}
// Phase 2b: legacy string-parsing functions removed.

This comment was marked as outdated.

jorben added 12 commits June 5, 2026 21:11
…late-based SubagentBodySource

Phase 7 of the prompt-injection refactor: subagent identity, persona, and
shell tooling guide are no longer hardcoded as separate strings in orchestrator
or SubagentProfile::system_prompt(). They are now rendered by SubagentBodySource
via the Composer from template files (explore.md, review.md) or from the
user-provided system_prompt for custom subagents.

Key changes:
- Replace LegacyCustomSubagentBodySource with SubagentBodySource
- Rename SectionId::CustomSubagentBody to SectionId::SubagentBody
- Remove inline helper_shell_tooling_guide() and its test coverage
- Add comprehensive tests for PromptBudget scaling and registry section lists
- Bump registry schema version from 1 to 2
…section enforcement

Remove the compiler-inlined `EmergencyFallback` system that served as a last-resort prompt when all sections failed. This simplifies `Composer::build` by eliminating:

- Per-surface fallback text files and `emergency_fallback_text()` lookup
- Critical section lists that would upgrade `SoftFailed` to `FatalError`
- `SectionAudit::fallback_used` flag and `SectionWarning::EmergencyFallback`
- `SurfaceExtension::critical_sections()` method and its lint tests
- Associated tracing metrics (`prompt.fallback.emergency_total`)

The system trade-off (hardening against total prompt failure) added complexity and was never triggered in production. Removing it reduces maintenance burden and makes the prompt pipeline easier to evolve. Soft failures are already surfaced via warnings and logs.
…irectory

Move individual SectionSource implementations from flat module files into a dedicated `sources/` module directory. This improves code organization and maintainability by grouping related source implementations together and reducing the number of top-level modules.

The refactoring includes:
- Creating `sources/mod.rs` to re-export all source implementations
- Moving ActiveGoalSource, ActivePlanSource, CompactionContractSource, and other sources into separate files within the new directory
- Removing `template_sources.rs` which contained implementations now distributed to individual source files
- Updating `registry.rs` and `mod.rs` imports to use the new module structure
- Adding corresponding test file `source_tests.rs` for idempotency and determinism verification
…hot tests

Replace hardcoded handoff prompt strings in `agent_run_summary.rs` with
template files (`templates/handoff/with_plan.tpl.md` and
`without_plan.tpl.md`) that include YAML front-matter for metadata.

Remove the backward-compat `providers.rs` module and update
`agent_session_tests.rs` to load template bodies directly.

Add comprehensive snapshot tests using `insta` to capture the full
composed prompt text for all prompt surfaces (MainAgent, Subagent,
Compaction, Title). These snapshots serve as an audit trail against
accidental prompt drift.

Add `insta` as a dev dependency for snapshot assertions.
…es and add cache marker arbiter

Replace generic `TemplateSource<F>` instantiations in `registry.rs` for the Role, BehavioralGuidelines, FinalResponseStructure, and ShellToolingGuide sections with first-class `SectionSource` implementations (`RoleSource`, `BehavioralGuidelinesSource`, `FinalResponseStructureSource`, `ShellToolingGuideSource`).  Each new source performs explicit front-matter version validation and template rendering, improving error handling and testability.

Rename `SectionId::SubagentBody` to `SectionId::CustomSubagentBody` to clarify its purpose (applies only to custom subagents, not built-in Explore/Review).  Update all usages in `inheritance.rs`, `budget.rs`, and `registry.rs` accordingly.

Introduce a `cache_arbiter` field in `AgentSessionSpec` and thread a `DefaultCacheMarkerArbiter` through `build_system_prompt` to enable per-request cache marker tracking for future LLM caching support.  Change `PromptBudget::for_model` to accept a `ModelTarget` reference instead of a raw `usize`, deriving the context window from the target model.

Adjust all affected tests and snapshot files to match the new output (minor whitespace/byte differences from the dedicated source renderers).  Add a new snapshot test `snapshot_subagent_custom` to cover the custom subagent surface.

These changes unify section construction, prepare the foundation for prompt caching, and eliminate inline generic template handling in the registry.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 6
  • Findings with unknown confidence: 0
  • Inline comments attempted: 6
  • Target files: 54
  • Covered files: 24
  • Uncovered files: 30
    See the summary comment for detailed analysis and coverage details.

"- Respond in {language} unless the user explicitly asks for a different language."
));
}
pub(crate) async fn build_compact_summary_system_prompt(response_language: Option<&str>) -> String {

This comment was marked as outdated.

Ok(normalize_generated_title(&message.text_content()))
}

/// Build the Title surface system prompt via Composer (Phase 6).

This comment was marked as outdated.

Comment thread src-tauri/Cargo.toml

[dev-dependencies]
tempfile = "3"
insta = { version = "1", features = ["yaml"] }

This comment was marked as outdated.

Comment thread src-tauri/src/core/agent_run_summary.rs Outdated
}

/// Strip YAML front-matter and return the template body.
fn strip_front_matter(tpl: &str) -> &str {

This comment was marked as outdated.

use crate::persistence::repo::provider_repo;

/// Strip YAML front-matter (delimited by ---) from a template string.
fn strip_front_matter(tpl: &str) -> &str {

This comment was marked as outdated.

run_mode,
helper_profile: None,
custom_subagent_slug: None,
target_model: ModelTarget::AnthropicClaude {

This comment was marked as outdated.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 20
  • Findings with unknown confidence: 0
  • Inline comments attempted: 12
  • Target files: 54
  • Covered files: 54
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

supports_cache_control: true,
},
renderer: Arc::new(MarkdownRenderer),
clock: Arc::new(FixedClock::new(chrono::Utc::now())),

This comment was marked as outdated.

@@ -1,555 +0,0 @@
use std::path::Path;

This comment was marked as outdated.

use super::super::super::run_mode::RunMode;
use super::super::super::section_source::SectionSource;
use super::super::super::signals::SignalCache;
use super::super::{RunModeSource, SystemEnvironmentSource};

This comment was marked as outdated.


let cx_plan = BuildCx {
run_mode: RunMode::Plan,
..base_cx.clone()

This comment was marked as outdated.

/// The helper-specific tail (shell-tooling guide + helper-profile body)
/// is appended after the composed prompt; full migration to template-backed
/// sources is future work.
async fn build_helper_system_prompt(

This comment was marked as outdated.

})?;

if tmpl.version != self.spec_version {
return Err(FatalError::new(

This comment was marked as outdated.

@@ -0,0 +1,68 @@
use async_trait::async_trait;

This comment was marked as outdated.

@@ -0,0 +1,64 @@
use async_trait::async_trait;

This comment was marked as outdated.

Comment thread src-tauri/Cargo.toml

[dev-dependencies]
tempfile = "3"
insta = { version = "1", features = ["yaml"] }

This comment was marked as outdated.

})?;

// Cow wraps the const &'static str — clone if borrowed
let _ = Cow::Borrowed(rel_path);

This comment was marked as outdated.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 7
  • Findings with unknown confidence: 1
  • Inline comments attempted: 5
  • Target files: 54
  • Covered files: 54
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

_ => ProfileResponseStyle::Balanced,
}
}

This comment was marked as outdated.

@@ -0,0 +1,76 @@
use async_trait::async_trait;

This comment was marked as outdated.

@@ -0,0 +1,87 @@
use async_trait::async_trait;

This comment was marked as outdated.

}

async fn build(&self, cx: &BuildCx<'_>) -> Result<SectionOutcome, FatalError> {
let kind = cx_compaction_kind(cx);

This comment was marked as outdated.

@@ -0,0 +1,132 @@
use async_trait::async_trait;

This comment was marked as outdated.

…atform-independent

Bump `@tauri-apps/api` from `^2` to `^2.11.0` in package.json and package-lock.json.

Fix snapshot test portability by two changes:
- Replace `entry.bytes` with `[snap]` in `format_audit_snapshot` to avoid byte count differences across platforms (e.g. 7 on macOS vs 6 on Linux for the same text).
- Change the test workspace path from `/tmp/tiycode-snapshot-workspace` to `/tiycode-snap-workspace` to prevent collisions with the `TMPDIR` environment variable during path normalization.

Update all affected `.snap` files accordingly.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 33
  • Findings with unknown confidence: 0
  • Inline comments attempted: 11
  • Target files: 56
  • Covered files: 56
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

};

// Check for cycle: if already in_flight, we have a dependency loop
if slot

This comment was marked as outdated.

}
}

/// Render a handoff template that includes plan markdown.

This comment was marked as outdated.

/// 3. The wall-clock date enters the LLM context only via this prepend; if
/// a feature later needs server-authoritative time across the full message
/// history, a `compaction_pinned` column on `messages` would be required.
pub(crate) async fn inject_runtime_context(user_prompt: &str) -> String {

This comment was marked as outdated.

let mut marker_count = 0;
// Iterate layers in increasing order (StablePrefix < SessionStable < RuntimeOverlay)
for (_layer, indices) in by_layer.iter() {
if marker_count >= 2 {

This comment was marked as outdated.

}

#[cfg(test)]
mod tests {

This comment was marked as outdated.

}
}

/// Shared formatting for title-generation language rule line.

This comment was marked as outdated.

results.push((spec, layer, outcome, source_start.elapsed()));

// Hard overall budget cap; if exceeded mid-pipeline, stop building further sources
if start.elapsed() > self.exec_policy.overall_build_timeout {

This comment was marked as outdated.

// We construct a minimal BuildCx to satisfy the trait signature.
let injector = CurrentDateInjector::new(Arc::new(SystemClock));
// Build a dummy BuildCx — CurrentDateInjector doesn't read it.
let placeholder_pool = match sqlx::SqlitePool::connect_lazy("sqlite::memory:") {

This comment was marked as outdated.

let template = include_str!("../templates/subagent/explore.md");
let (_tmpl, body) =
super::super::templates::parse_front_matter(template).map_err(|e| {
FatalError::new("template.parse", format!("subagent/explore.md: {e}"))

This comment was marked as outdated.

}
}

// Fallback response_language

This comment was marked as outdated.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 13
  • Findings with unknown confidence: 0
  • Inline comments attempted: 12
  • Target files: 56
  • Covered files: 56
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

};
use std::sync::Arc;

let placeholder_pool =

This comment was marked as outdated.

};
use std::sync::Arc;

let placeholder_pool =

This comment was marked as outdated.

// We construct a minimal BuildCx to satisfy the trait signature.
let injector = CurrentDateInjector::new(Arc::new(SystemClock));
// Build a dummy BuildCx — CurrentDateInjector doesn't read it.
let placeholder_pool = match sqlx::SqlitePool::connect_lazy("sqlite::memory:") {

This comment was marked as outdated.


/// Global cache marker arbiter for a single LLM request.
/// Enforces the ≤4 breakpoint limit across system prompt + messages.
pub trait CacheMarkerArbiter: Send + Sync + std::fmt::Debug {

This comment was marked as outdated.

// Find the anchor target's position within the same layer
let target_spec = all_specs.iter().find(|s| &s.id == target_id);
let target_order = match target_spec {
Some(ts) => Self::resolve_anchored_order(ts, all_specs),

This comment was marked as outdated.

// We construct a minimal BuildCx to satisfy the trait signature.
let injector = CurrentDateInjector::new(Arc::new(SystemClock));
// Build a dummy BuildCx — CurrentDateInjector doesn't read it.
let placeholder_pool = match sqlx::SqlitePool::connect_lazy("sqlite::memory:") {

This comment was marked as outdated.

body: &SectionBody,
budget: &PromptBudget,
) -> (SectionBody, Option<SectionWarning>) {
let limit = spec.max_chars.unwrap_or(budget.per_section_default_chars);

This comment was marked as outdated.

/// All registered error codes for startup lint test.
pub const ALL_ERROR_CODES: &[&str] = &[
codes::TEMPLATE_NOT_FOUND,
codes::TEMPLATE_MISSING_KEY,

This comment was marked as outdated.

}

/// Convenience constructor for test clocks.
pub fn fixed_clock_for_test() -> Arc<FixedClock> {

This comment was marked as outdated.

Some(meta)
});

let _plan = match active_plan {

This comment was marked as outdated.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 11
  • Findings with unknown confidence: 0
  • Inline comments attempted: 7
  • Target files: 56
  • Covered files: 56
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

}
}

let mut result = tpl.to_string();

This comment was marked as outdated.


/// Resolve the order of a section within its layer, handling Anchored positions.
/// Returns (base_order, anchor_target_order) for topological comparison.
fn resolve_anchored_order(spec: &SectionSpec, all_specs: &[&SectionSpec]) -> u32 {

This comment was marked as outdated.


#[tokio::test]
async fn snapshot_compaction_compact() {
snap_surface(

This comment was marked as outdated.

let cx = BuildCx {
pool: &pool,
workspace_path: workspace,
thread_id: None, // No thread → Ephemeral sections will Skip

This comment was marked as outdated.

};
use std::sync::Arc;

let placeholder_pool =

This comment was marked as outdated.

)
}

fn apply_total_budget<'a>(

This comment was marked as outdated.

/// Construct a minimal BuildCx for testing sources that don't need DB access.
async fn test_cx() -> BuildCx<'static> {
let pool = sqlx::SqlitePool::connect_lazy("sqlite::memory:").unwrap();
let pool_ref = Box::leak(Box::new(pool));

This comment was marked as outdated.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 7
  • Findings with unknown confidence: 0
  • Inline comments attempted: 5
  • Target files: 56
  • Covered files: 56
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.


let system_prompt = build_system_prompt(pool, &raw_plan, workspace_path, run_mode).await?;
let system_prompt = inject_goal_context(pool, thread_id, system_prompt).await?;
let composed_prompt =

This comment was marked as outdated.

}

/// Render a handoff template that includes plan markdown.
fn render_handoff_template(

This comment was marked as outdated.

// We construct a minimal BuildCx to satisfy the trait signature.
let injector = CurrentDateInjector::new(Arc::new(SystemClock));
// Build a dummy BuildCx — CurrentDateInjector doesn't read it.
let placeholder_pool = match sqlx::SqlitePool::connect_lazy("sqlite::memory:") {

This comment was marked as outdated.

);
}

// ── build_helper_system_prompt(Phase 7 Composer pipeline)──

This comment was marked as outdated.

/// A fatal error that causes the entire prompt build to fail.
/// Rare; reserved for truly unrecoverable errors (template load failure, SQLite fatal disconnect).
#[derive(Debug)]
pub struct FatalError {

This comment was marked as outdated.

When the Skills section is removed from the prompt surface, the
leading newline of the matched "\n## " pattern was included in the
remaining text, producing a double blank line between sections. This
caused snapshot mismatches on CI runners where no skills are
installed.

Skip the leading newline so the result joins `before` (which already
ends with the layer separator) directly to the next section header,
yielding exactly one blank line—matching output on machines without
the Skills section.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 27
  • Findings with unknown confidence: 0
  • Inline comments attempted: 11
  • Target files: 56
  • Covered files: 52
  • Uncovered files: 4
    See the summary comment for detailed analysis and coverage details.

"- Respond in {language} unless the user explicitly asks for a different language."
));
}
pub(crate) async fn build_compact_summary_system_prompt(response_language: Option<&str>) -> String {

This comment was marked as outdated.

);
}

#[tokio::test]

This comment was marked as outdated.

}
}

impl CacheMarkerArbiter for DefaultCacheMarkerArbiter {

This comment was marked as outdated.

)
}

async fn build_message(&self, _cx: &BuildCx<'_>) -> Option<RuntimeMessage> {

This comment was marked as outdated.

use crate::persistence::init_database;

/// Build a snapshot for a given surface using a fresh temp DB.
async fn snap_surface(surface: PromptSurface, snapshot_name: &str) {

This comment was marked as outdated.

/// Construct a minimal BuildCx for testing sources that don't need DB access.
async fn test_cx() -> BuildCx<'static> {
let pool = sqlx::SqlitePool::connect_lazy("sqlite::memory:").unwrap();
let pool_ref = Box::leak(Box::new(pool));

This comment was marked as outdated.


impl SurfacePattern {
/// Check whether this pattern matches a given surface.
pub fn matches(&self, surface: &PromptSurface) -> bool {

This comment was marked as outdated.

per_section_overrides.insert(SectionId::CustomSubagentBody, total_chars / 4);

// Compaction / Title surfaces use tighter budgets
let total_chars = match surface {

This comment was marked as outdated.


impl CacheMarkerArbiter for DefaultCacheMarkerArbiter {
fn record_system_markers(&self, markers: &[CacheMarkerSlot]) {
let mut sys = self.system_markers.lock().unwrap();

This comment was marked as outdated.

let raw = load_template(TEMPLATE_REL_PATH, TEMPLATE_EMBEDDED);
let (tmpl, body) = parse_front_matter(&raw).map_err(|e| {
FatalError::new(
super::super::error_codes::codes::TEMPLATE_NOT_FOUND,

This comment was marked as outdated.

jorben added 5 commits June 6, 2026 16:10
…uidelines

- Add operating boundaries and safety red lines to role definition
- Reorganize behavioral guidelines into categorized sections
- Clarify run mode guidance for default and plan modes
- Bump Role, BehavioralGuidelines, and RunMode versions to 2
- Update snapshot tests and source tests for new versions
- Remove `shell` template variable from ShellToolingGuide; reference
  System Environment section instead of injecting runtime value
- Replace Chinese example fragment with English in
  FinalResponseStructure template
- Add honesty and coverage guidelines to explore and review subagent
  templates
- Add example output section to merge compaction template
- Use generic path placeholder in explore template examples
- Bump affected prompt section versions from 1 to 2

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 13
  • Findings with unknown confidence: 0
  • Inline comments attempted: 11
  • Target files: 56
  • Covered files: 56
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

SUMMARY_HISTORY_MIN_CHARS,
};
use super::agent_run_title::collapse_whitespace;
use crate::core::prompt::templates::strip_front_matter;

This comment was marked as outdated.

}

/// Build the Title surface system prompt via Composer (Phase 6).
pub(crate) async fn build_title_system_prompt() -> String {

This comment was marked as outdated.

}

fn build_helper_system_prompt(
parent_system_prompt: &str,

This comment was marked as outdated.

}
}

/// Shared formatting for title-generation language rule line.

This comment was marked as outdated.

/// 3. The wall-clock date enters the LLM context only via this prepend; if
/// a feature later needs server-authoritative time across the full message
/// history, a `compaction_pinned` column on `messages` would be required.
pub(crate) async fn inject_runtime_context(user_prompt: &str) -> String {

This comment was marked as outdated.

Some(meta)
});

let _plan = match active_plan {

This comment was marked as outdated.

@@ -0,0 +1,67 @@
use async_trait::async_trait;

This comment was marked as outdated.

/// Construct a minimal BuildCx for testing sources that don't need DB access.
async fn test_cx() -> BuildCx<'static> {
let pool = sqlx::SqlitePool::connect_lazy("sqlite::memory:").unwrap();
let pool_ref = Box::leak(Box::new(pool));

This comment was marked as outdated.

use crate::core::subagent::SubagentProfile;
use std::sync::Arc;

#[test]

This comment was marked as outdated.

Comment thread src-tauri/Cargo.toml

[dev-dependencies]
tempfile = "3"
insta = { version = "1", features = ["yaml"] }

This comment was marked as outdated.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 23
  • Findings with unknown confidence: 0
  • Inline comments attempted: 18
  • Target files: 56
  • Covered files: 55
  • Uncovered files: 1
    See the summary comment for detailed analysis and coverage details.

surfaces: SurfaceMatcher::Any(vec![SurfacePattern::AnyMainAgent]),
version: 1,
max_chars: None,
criticality: SectionCriticality::NonCritical,

This comment was marked as outdated.

@@ -0,0 +1,155 @@
use async_trait::async_trait;

This comment was marked as outdated.

@@ -0,0 +1,95 @@
use async_trait::async_trait;

This comment was marked as outdated.

/// Template-backed SectionSource for the Skills section.
/// Loads skills from the ExtensionsManager and renders via the skills_usage.md template.
/// Replaces LegacySkillsSource (which delegates to the old SkillsProvider).
pub struct SkillsSource;

This comment was marked as outdated.

use std::sync::Arc;

/// Construct a minimal BuildCx for testing sources that don't need DB access.
async fn test_cx() -> BuildCx<'static> {

This comment was marked as outdated.

.collect::<Vec<_>>()
.join("\n");

let raw = load_template(TEMPLATE_REL_PATH, TEMPLATE_EMBEDDED);

This comment was marked as outdated.

);
}

// ── build_helper_system_prompt(Phase 7 Composer pipeline)──

This comment was marked as outdated.

Comment thread src-tauri/Cargo.toml

[dev-dependencies]
tempfile = "3"
insta = { version = "1", features = ["yaml"] }

This comment was marked as outdated.

.in_flight
.swap(true, std::sync::atomic::Ordering::SeqCst)
{
// Already in flight → cycle detected

This comment was marked as outdated.

fn pattern(&self) -> SurfacePattern {
match self {
PromptSurface::MainAgent { run_mode } => SurfacePattern::MainAgent(*run_mode),
PromptSurface::SubagentExplore { .. } => SurfacePattern::AnySubagent,

This comment was marked as outdated.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 10
  • Findings with unknown confidence: 0
  • Inline comments attempted: 10
  • Target files: 57
  • Covered files: 57
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

/// stale declarations after template edits.
#[test]
fn templates_have_no_undeclared_keys() {
let template_dir = super::template_root();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CRITICAL] Template key validation test fails to compile in non-debug test builds

The test that validates template key declarations will fail to compile in any build where debug_assertions is not enabled, making it impossible to run tests in release mode or in CI configurations that disable debug assertions.

Suggestion: Either conditionally compile the test only when debug_assertions is enabled (#[cfg(debug_assertions)]), or provide a test‑specific template_root implementation that works in all test configurations (e.g., #[cfg(any(test, debug_assertions))]).

Risk: This prevents running the test suite in some CI pipelines, concealing template key mismatch regressions. Could lead to runtime errors if keys are misdeclared.

Confidence: 0.95

[From SubAgent: general]

// Record marker slots via the arbiter so the message layer can coordinate
// quota (≤4 Anthropic breakpoints across system prompt + runtime messages).
if let Some(ref arbiter) = self.cache_arbiter {
let mut byte_offset = 0usize;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Cache marker byte offsets may be incorrect due to missing layer separators

The byte offsets recorded for cache markers are computed solely on block text lengths, ignoring layer separators that will be inserted between sections in the final prompt text, leading to incorrect positions.

Suggestion: Compute offsets using the same logic as the final text assembly: either insert separators between block texts when iterating, or calculate offsets after joining.

Risk: If these offsets are used for cache-control breakpoints or other markers that depend on exact byte positions in the transmitted system prompt, the offsets will be misaligned, potentially breaking cache control hints and wrecking cache performance.

Confidence: 0.85

[From SubAgent: general]

fn compact_summary_system_prompt_uses_response_language_when_present() {
let prompt = build_compact_summary_system_prompt(Some(" 简体中文 "));
#[tokio::test]
async fn compact_summary_system_prompt_uses_response_language_when_present() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Response language whitespace trimming removed in compaction prompts

The response language string in compaction prompts is no longer normalized, which can cause malformed language instructions when the language value contains whitespace.

Suggestion: Reintroduce normalization (trim and empty-check) before passing response_language to the Composer, or ensure the Composer's BuildCx or downstream injection normalizes the value.

Risk: If the profile response language includes padding whitespace, the resulting system prompt may contain malformed lines like Respond in 简体中文 unless..., confusing the model.

Confidence: 0.85

[From SubAgent: general]

"- Respond in {language} unless the user explicitly asks for a different language."
));
}
pub(crate) async fn build_compact_summary_system_prompt(response_language: Option<&str>) -> String {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Language normalization edge case not covered for summary prompts

The removal of explicit language trimming may cause Composer to fail to match the language rule for strings with extranous whitespace.

Suggestion: Add a test with a language value containing leading/trailing spaces to ensure the Composer (or ProfileInstructionsSource) normalizes it correctly.

Risk: Agent might produce summaries in wrong language if the language tag is misidentified.

Confidence: 0.90

[From SubAgent: testing]

let system_prompt = inject_goal_context(pool, thread_id, system_prompt).await?;
let composed_prompt =
build_system_prompt(pool, &raw_plan, workspace_path, run_mode, thread_id).await?;
let system_prompt = composed_prompt.text.clone();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Cache arbiter path not verified in tests

The new prompt caching feature has no test coverage; no test uses a non-None cache arbiter or verifies that ephemeral markers appear in the system prompt.

Suggestion: Add an integration test that builds a session with caching enabled and asserts the system prompt contains appropriate cache markers for stable sections.

Risk: Cache marker placement errors could lead to poor cache utilization or incorrect context window injection on Anthropic models.

Confidence: 0.85

[From SubAgent: testing]

pub per_section_overrides: BTreeMap<SectionId, usize>,

/// Eviction order: layers are removed in this order when total budget is exceeded.
/// Default: [Ephemeral, RuntimeOverlay, SessionStable, StablePrefix]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Unused eviction_order field in PromptBudget

The budget's eviction_order is never consulted during prompt composition, making it dead code and potentially misleading.

Suggestion: Remove the field and its doc comment, or use it in the composer to determine which sections to evict when total budget is exceeded, ensuring the declared eviction order is honored.

Risk: Future developers might assume the declared order is respected, but since it's not tested or used, eviction behavior may silently differ from expectations if the sorting order changes.

Confidence: 0.90

[From SubAgent: general]


impl SectionRenderer for XmlRenderer {
fn render_section(&self, title: &str, body: &str) -> String {
format!("<section name=\"{}\">\n{}\n</section>", title, body)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] XmlRenderer does not escape XML special characters

The XmlRenderer produces raw XML without escaping special characters, which may produce malformed prompts if the title or body contain XML-sensitive characters.

Suggestion: Escape XML special characters (e.g., <, >, &, ") in render_section for the XmlRenderer, or wrap body content in CDATA sections with careful handling.

Risk: If prompt content includes XML special characters (common in code snippets, config values, or shell output), the resulting prompt will be malformed and may cause API errors or degraded section recall.

Confidence: 0.90

[From SubAgent: general]

/// Construct a minimal BuildCx for testing sources that don't need DB access.
async fn test_cx() -> BuildCx<'static> {
let pool = sqlx::SqlitePool::connect_lazy("sqlite::memory:").unwrap();
let pool_ref = Box::leak(Box::new(pool));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Test helper leaks SqlitePool via Box::leak

The test_cx function leaks a SqlitePool, which may cause resource exhaustion and interfere with other tests.

Suggestion: Use a lazy_static or Arc-based pool with a proper lifecycle, or adjust BuildCx to not require 'static.

Risk: Memory and connection leaks can cause test runner slowdowns and flaky tests due to resource contention.

Confidence: 0.90

[From SubAgent: general]

"Error: boom"
);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Replaced helper system prompt tests lack inheritance verification

The Composer-based helper prompt no longer verifies the correct inheritance of required sections; regression risk if surface matchers fail to include them.

Suggestion: Add assertions on the helper prompt content for essential inherited sections (e.g., Profile Instructions, System Environment) to ensure the Composer’s section matchers are correct.

Risk: Helper subagents could miss critical context like workspace instructions, leading to bad exploration/reviews.

Confidence: 0.90

[From SubAgent: testing]

pub model_plan: ResolvedRuntimeModelPlan,
pub initial_prompt: Option<String>,
pub initial_context_calibration: ContextTokenCalibration,
/// Global cache marker arbiter for the request lifecycle.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Unused cache_arbiter field in AgentSessionSpec

A new field is added and populated but never consumed, which may confuse future readers and suggests either an incomplete feature or dead code.

Suggestion: Either implement the intended usage (resetting the arbiter after LLM calls as mentioned in the comment) or remove the field until the feature is fully integrated.

Risk: If left unused, the field could mask the need for cache marker reset logic and cause eventual prompt-cache pollution if integrated incorrectly later.

Confidence: 0.95

[From SubAgent: general]

@jorben jorben merged commit 96e7710 into master Jun 6, 2026
4 checks passed
@jorben jorben deleted the refact/prompt-injection-refactor branch June 6, 2026 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant