refactor(prompt): ♻️ Overhaul prompt injection with modular layered architecture by jorben · Pull Request #222 · tiylabs/tiycode

jorben · 2026-06-05T09:48:42Z

Summary

Refactor the monolithic prompt injection system into a modular, layered architecture with 20+ new modules, template-based content management, and a composable section registry. This establishes the foundation for future prompt customization, safer emergency fallbacks, and cleaner subagent contracts.

Changes

Core Architecture (new modules)

composer: Orchestrates prompt assembly from registered sections
registry: Manages section provider registration and resolution with priority ordering
layer: Supports layered prompts (base, override, compaction, emergency fallback) with section anchoring, ordering, and audit
templates + template_sources: Template loading, rendering with Handlebars-style templating, front-matter parsing, and per-role template source abstraction
renderer: Markdown/XML renderers for prompt surfaces
legacy_adapter: Backward-compatible bridge for existing callers during phased migration

Feature Modules

build_context: Unified BuildCx and ModelTarget for passing build-time parameters
surface + surface_extensions: Prompt surface types (main agent, subagent, compaction, title) with extension traits
runtime_message: Runtime message injection models (current date, compaction policy)
section_source + section_id + section_order: Typed section specs with metadata, outcomes, and fatal error handling
signals: Event-driven build signals with caching
cache_marker: Cache marker arbitration for LLM caching
inheritance: Layer inheritance and override resolution
exec_policy: Source execution policy configuration
feature_set: Prompt feature flag management
budget + clock: Token budget estimation and time injection
run_mode: Run mode enum (default, plan)
redactor: Content redaction for sensitive information
emergency_fallback: Graceful degradation when prompt assembly fails
error_codes: Centralized error code definitions
active_goal_source: Active goal section provider

Template Files

Extracted 19 template files under prompt/templates/ covering:

Role definition, behavioral guidelines, shell tooling guide, skills usage
Final response structure
Project context, workspace location, system environment, sandbox permissions
Compaction (compact, merge)
Subagents (explore, review with output contracts)
Emergency fallback (main agent, subagent, compaction, title)
Run modes (default, plan)
Title contract
Active goal

Caller Updates

agent_session.rs: Updated to consume the new composable prompt architecture
agent_run_summary.rs: Adapted summary prompt assembly
agent_run_title.rs: Adapted title prompt assembly
subagent/orchestrator.rs: Refactored to use new prompt infrastructure
Tests: Updated agent_run_manager_tests.rs and agent_session_tests.rs

Documentation

Added comprehensive design document (docs/prompt-injection-refactor.md, 1816 lines) covering architecture decisions, data flow, and migration strategy

Motivation

The existing prompt assembly was monolithic and tightly coupled to the agent session lifecycle. This refactoring:

Separates concerns — Prompt content, assembly logic, and injection points are now distinct layers
Enables customization — Template-based content allows easier tuning without Rust recompilation
Improves safety — Emergency fallbacks prevent silent prompt corruption
Supports future features — Layered architecture enables runtime prompt customization, A/B testing, and provider-specific adaptations

Testing

cargo test --locked --manifest-path src-tauri/Cargo.toml passes all Rust integration tests
npm run typecheck passes TypeScript validation
All existing prompt-dependent flows work as before (no behavioral regression)
Emergency fallback paths activate correctly during simulated failures

Breaking Changes

None — backward compatibility preserved via legacy_adapter and retained legacy module re-exports.

Related Issues

Refs: Internal prompt injection refactoring initiative

Checklist

Code follows project style guidelines
Tests have been updated for new architecture
No new warnings introduced
Legacy API compatibility maintained

🤖 Generated with TiyCode

…tion

… cache marker arbitration, surface extension traits, schema version rules, and template front-matter

…rchitecture Introduce a new prompt/ submodule architecture that replaces monolithic prompt assembly with composable, layered components: New core modules: - composer: Orchestrates prompt assembly from registered sections - registry: Manages section provider registration and resolution - layer: Supports layered prompts (base, override, compaction, etc.) - templates: Template loading, rendering, and front-matter parsing - template_sources: Bespoke per-role template source abstraction - renderer: Markdown/XML renderers for prompt surfaces - legacy_adapter: Backward-compatible bridge for existing callers New feature modules: - budget, clock, cache_marker, error_codes, exec_policy, feature_set, redactor, run_mode, runtime_message, section_id, section_source, signals, surface, surface_extensions, inheritance, emergency_fallback, active_goal_source Template files extracted to prompt/templates/ covering roles, compaction, subagents, emergency fallback, run modes, and contracts. Callers updated: agent_session, agent_run_summary, agent_run_title, subagent orchestrator, and related tests.

github-actions · 2026-06-05T09:58:13Z

AI Code Review Summary

PR: #222 (refactor(prompt): ♻️ Overhaul prompt injection with modular layered architecture)
Preferred language: English

Overall Assessment

Detected 10 actionable findings, prioritize CRITICAL/HIGH before merge.

Major Findings by Severity

CRITICAL (1)
- src-tauri/src/core/prompt/templates.rs:419 - Template key validation test fails to compile in non-debug test builds
HIGH (1)
- src-tauri/src/core/prompt/composer.rs:234 - Cache marker byte offsets may be incorrect due to missing layer separators
MEDIUM (7)
- src-tauri/src/core/agent_run_manager_tests.rs:584 - Response language whitespace trimming removed in compaction prompts
- src-tauri/src/core/agent_run_summary.rs:147 - Language normalization edge case not covered for summary prompts
- src-tauri/src/core/agent_session.rs:571 - Cache arbiter path not verified in tests
- src-tauri/src/core/prompt/budget.rs:22 - Unused eviction_order field in PromptBudget
- src-tauri/src/core/prompt/renderer.rs:37 - XmlRenderer does not escape XML special characters
- src-tauri/src/core/prompt/sources/source_tests.rs:18 - Test helper leaks SqlitePool via Box::leak
- src-tauri/src/core/subagent/orchestrator.rs:1114 - Replaced helper system prompt tests lack inheritance verification
LOW (1)
- src-tauri/src/core/agent_session_types.rs:172 - Unused cache_arbiter field in AgentSessionSpec

Actionable Suggestions

Fix the cache marker byte offset calculation to include layer separators.
Remove the unused eviction_order field from PromptBudget or integrate it into the composer's eviction logic.
Fix the template key validation test to compile in all test configurations.
Replace Box::leak in test_cx with a proper pool lifecycle (e.g., lazy_static or Arc) to avoid resource leaks.
Standardize trailing whitespace trimming across all SectionSource implementations by adding trim_end() where missing.
Consider adding a warning log when approval policy JSON is malformed to avoid silent fallbacks.
Use the same response_language normalization function across all sources to ensure consistency.
Re-add trimming/normalization of response language before passing it to the Composer in compaction prompt builders.
Restore targeted tests that verify the helper subagent's system prompt contains the expected inherited sections (project context, profile instructions, etc.).
Either remove the unused cache_arbiter field or implement the planned cache marker reset after LLM calls to avoid dead code.
Add unit tests for render_handoff_template via build_implementation_handoff_prompt.
Add integration test for cache marker placement in system prompt.

Potential Risks

Cache marker offsets being wrong could silently degrade LLM cache-control hints.
The unused eviction_order field may mislead future developers about eviction behavior.
The template test compilation failure may allow key mismatch regressions to slip through CI.
Leaked SQLite pool in tests may cause CI failures or flaky tests under high concurrency.
Inconsistent trimming may lead to slightly different prompt formatting between runs, potentially affecting LLM cache stability.
Silent fallbacks on corrupted settings could hide configuration errors.
If the Composer does not inject goal context, active goals will be silently missing from the main agent prompt.
Response language with leading/trailing whitespace may produce malformed prompt text, confusing the model.
Absence of helper prompt coverage could allow future changes to strip necessary context from subagents.
Flaky title prompt test.
Missing coverage may hide bugs in handoff and caching.
Language normalization gap could cause miscommunication.

Test Suggestions

Add tests for cache marker offsets vs final text.
Add test for eviction order being applied correctly.
Ensure template validation test runs in CI.
Add integration tests for prompt assembly that verify output with all sources enabled.
Test behavior with empty or malformed settings JSON for sandbox_permissions.
Test compaction_contract source with both compaction kinds to ensure correct template selection.
Benchmark active_plan source performance with large thread histories.
Unit test: build_compact_summary_system_prompt with language " zh-CN " should produce "Respond in zh-CN unless..." (trimmed).
Integration test: build session spec for a thread with an active goal and assert that the goal context appears in the system prompt.
Unit test: call build_helper_system_prompt and assert that the output contains "Project Context (workspace instructions)" and "Profile Instructions" (or whichever sections are tagged for subagents).
Mock clock in deterministic title test.
Add cache arbiter integration test with real Composer build.

File-Level Coverage Notes

src-tauri/src/core/prompt/composer.rs: The composer implements a clear 7-step pipeline. The main issue is incorrect cache marker offsets and the unused eviction_order field.
src-tauri/src/core/prompt/templates.rs: Template rendering and validation are well implemented, but the test compilation problem is critical.
src-tauri/src/core/prompt/registry.rs: Registry is well-structured with comprehensive tests. No issues found.
src-tauri/src/core/prompt/providers.rs: Removed as part of the refactor. No regressions observed; make sure all functionality is covered by new sources. (File no longer exists; confirm that all dependents have been updated.)
src-tauri/src/core/prompt/signals.rs: Signal caching with cycle detection works, but concurrent calls may cause false cycle errors (low risk given current sequential build).
src-tauri/src/core/prompt/budget.rs: Budget calculation is sound, but eviction_order field is unused.
src-tauri/src/core/prompt/sources/mod.rs: Simple module declarations and re-exports. No issues.
src-tauri/src/core/prompt/inheritance.rs: Inheritance rules and tests are solid.
src-tauri/src/core/prompt/layer.rs: Layer definitions and ordering are correct; anchor validation tests cover cycles and cross-layer anchors.
src-tauri/src/core/prompt/section_source.rs: Core trait and data types are well-defined.
src-tauri/src/core/prompt/runtime_message.rs: Runtime message injection is straightforward.
src-tauri/src/core/prompt/mod.rs: Module reorganization done correctly with all new modules integrated.
src-tauri/src/core/prompt/error_codes.rs: Error code registry and consistency test are good.
src-tauri/src/core/prompt/build_context.rs: Context struct with helper constructors is functional.
src-tauri/src/core/prompt/snapshot_tests.rs: Comprehensive snapshot tests provide a safety net; depends on fixed_clock_for_test from clock module (not in batch). Ensure it exists. (rely on fixed_clock_for_test; verify it is available.)
src-tauri/src/core/prompt/redactor.rs: Simple redactor implementation, okay.
src-tauri/src/core/prompt/sources/profile_instructions.rs: Logic is correct; response_parts reassignment is redundant but harmless. The response style line is always appended, even if empty, which may produce an extra blank line.
src-tauri/src/core/prompt/sources/project_context.rs: Collects and truncates workspace instruction files correctly. Normalization strips blank lines, which may affect user content readability.
src-tauri/src/core/prompt/sources/sandbox_permissions.rs: Reads approval policy and writable roots from settings with robust fallbacks. Parsing logic defaults silently on malformed JSON.
src-tauri/src/core/prompt/sources/skills.rs: Loads skills from ExtensionsManager and renders them through template. Path handling for skill_file may need normalization in some environments.
... and 37 more file-level entries.

Inline Downgraded Items (processed but not inline)

None

Coverage Status

Target files: 57
Covered files: 57
Uncovered files: 0
No-patch/binary covered as file-level: 0
Findings with unknown confidence (N/A): 0

Uncovered list:

None

No-patch covered list:

None

Runtime/Budget

Rounds used: 2/4
Planned batches: 5
Executed batches: 5
Sub-agent runs: 9
Planner calls: 2
Reviewer calls: 11
Model calls: 13/64
Structured-output summary-only degradation: NO

github-actions

Automated PR review completed.

Findings kept: 7
Findings with unknown confidence: 0
Inline comments attempted: 5
Target files: 33
Covered files: 33
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

Sign in to view

+
+        // EmergencyFallback: if no sections were produced, inject hard-coded fallback
+        let fallback_used = bodies.is_empty();
+        if fallback_used {


Sign in to view

+
+/// Semantic ordering hint for sections within the same layer.
+/// Replaces the old bare `u16` with anchor-relative or absolute positioning.
+#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]


Sign in to view

+        let renderer = cx.renderer.as_ref();
+
+        for (spec, layer, body, warn, source_elapsed) in &kept {
+            let rendered = renderer.render_section(&spec.title, &body.markdown);


Sign in to view

+            Ok(v) => v,
+            Err(e) => {
+                return Ok(SectionOutcome::SoftFailed {
+                    code: "settings.approval_policy.load_failed",


Sign in to view

-        .filter(|(_, body)| !body.trim().is_empty())
-        .collect()
-}
+// Phase 2b: legacy string-parsing functions removed.


…late-based SubagentBodySource Phase 7 of the prompt-injection refactor: subagent identity, persona, and shell tooling guide are no longer hardcoded as separate strings in orchestrator or SubagentProfile::system_prompt(). They are now rendered by SubagentBodySource via the Composer from template files (explore.md, review.md) or from the user-provided system_prompt for custom subagents. Key changes: - Replace LegacyCustomSubagentBodySource with SubagentBodySource - Rename SectionId::CustomSubagentBody to SectionId::SubagentBody - Remove inline helper_shell_tooling_guide() and its test coverage - Add comprehensive tests for PromptBudget scaling and registry section lists - Bump registry schema version from 1 to 2

…lementations and add anchored ordering

…section enforcement Remove the compiler-inlined `EmergencyFallback` system that served as a last-resort prompt when all sections failed. This simplifies `Composer::build` by eliminating: - Per-surface fallback text files and `emergency_fallback_text()` lookup - Critical section lists that would upgrade `SoftFailed` to `FatalError` - `SectionAudit::fallback_used` flag and `SectionWarning::EmergencyFallback` - `SurfaceExtension::critical_sections()` method and its lint tests - Associated tracing metrics (`prompt.fallback.emergency_total`) The system trade-off (hardening against total prompt failure) added complexity and was never triggered in production. Removing it reduces maintenance burden and makes the prompt pipeline easier to evolve. Soft failures are already surfaced via warnings and logs.

…ation scaffolding

… source and add key validation linter

… template version validation

…irectory Move individual SectionSource implementations from flat module files into a dedicated `sources/` module directory. This improves code organization and maintainability by grouping related source implementations together and reducing the number of top-level modules. The refactoring includes: - Creating `sources/mod.rs` to re-export all source implementations - Moving ActiveGoalSource, ActivePlanSource, CompactionContractSource, and other sources into separate files within the new directory - Removing `template_sources.rs` which contained implementations now distributed to individual source files - Updating `registry.rs` and `mod.rs` imports to use the new module structure - Adding corresponding test file `source_tests.rs` for idempotency and determinism verification

…hot tests Replace hardcoded handoff prompt strings in `agent_run_summary.rs` with template files (`templates/handoff/with_plan.tpl.md` and `without_plan.tpl.md`) that include YAML front-matter for metadata. Remove the backward-compat `providers.rs` module and update `agent_session_tests.rs` to load template bodies directly. Add comprehensive snapshot tests using `insta` to capture the full composed prompt text for all prompt surfaces (MainAgent, Subagent, Compaction, Title). These snapshots serve as an audit trail against accidental prompt drift. Add `insta` as a dev dependency for snapshot assertions.

…es and add cache marker arbiter Replace generic `TemplateSource<F>` instantiations in `registry.rs` for the Role, BehavioralGuidelines, FinalResponseStructure, and ShellToolingGuide sections with first-class `SectionSource` implementations (`RoleSource`, `BehavioralGuidelinesSource`, `FinalResponseStructureSource`, `ShellToolingGuideSource`). Each new source performs explicit front-matter version validation and template rendering, improving error handling and testability. Rename `SectionId::SubagentBody` to `SectionId::CustomSubagentBody` to clarify its purpose (applies only to custom subagents, not built-in Explore/Review). Update all usages in `inheritance.rs`, `budget.rs`, and `registry.rs` accordingly. Introduce a `cache_arbiter` field in `AgentSessionSpec` and thread a `DefaultCacheMarkerArbiter` through `build_system_prompt` to enable per-request cache marker tracking for future LLM caching support. Change `PromptBudget::for_model` to accept a `ModelTarget` reference instead of a raw `usize`, deriving the context window from the target model. Adjust all affected tests and snapshot files to match the new output (minor whitespace/byte differences from the dedicated source renderers). Add a new snapshot test `snapshot_subagent_custom` to cover the custom subagent surface. These changes unify section construction, prepare the foundation for prompt caching, and eliminate inline generic template handling in the registry.

github-actions

Automated PR review completed.

Findings kept: 6
Findings with unknown confidence: 0
Inline comments attempted: 6
Target files: 54
Covered files: 24
Uncovered files: 30
See the summary comment for detailed analysis and coverage details.

Sign in to view

-            "- Respond in {language} unless the user explicitly asks for a different language."
-        ));
-    }
+pub(crate) async fn build_compact_summary_system_prompt(response_language: Option<&str>) -> String {


Sign in to view

    Ok(normalize_generated_title(&message.text_content()))
 }

+/// Build the Title surface system prompt via Composer (Phase 6).


Sign in to view


 [dev-dependencies]
 tempfile = "3"
+insta = { version = "1", features = ["yaml"] }


Sign in to view

 }

+/// Strip YAML front-matter and return the template body.
+fn strip_front_matter(tpl: &str) -> &str {


Sign in to view

    use crate::persistence::repo::provider_repo;

+    /// Strip YAML front-matter (delimited by ---) from a template string.
+    fn strip_front_matter(tpl: &str) -> &str {


Sign in to view

+            run_mode,
+            helper_profile: None,
+            custom_subagent_slug: None,
+            target_model: ModelTarget::AnthropicClaude {


…main_agent

github-actions

Automated PR review completed.

Findings kept: 20
Findings with unknown confidence: 0
Inline comments attempted: 12
Target files: 54
Covered files: 54
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

Sign in to view

+                    supports_cache_control: true,
+                },
+                renderer: Arc::new(MarkdownRenderer),
+                clock: Arc::new(FixedClock::new(chrono::Utc::now())),


Sign in to view

@@ -1,555 +0,0 @@
-use std::path::Path;


Sign in to view

+    use super::super::super::run_mode::RunMode;
+    use super::super::super::section_source::SectionSource;
+    use super::super::super::signals::SignalCache;
+    use super::super::{RunModeSource, SystemEnvironmentSource};


Sign in to view

+
+        let cx_plan = BuildCx {
+            run_mode: RunMode::Plan,
+            ..base_cx.clone()


Sign in to view

+/// The helper-specific tail (shell-tooling guide + helper-profile body)
+/// is appended after the composed prompt; full migration to template-backed
+/// sources is future work.
+async fn build_helper_system_prompt(


Sign in to view

+        })?;
+
+        if tmpl.version != self.spec_version {
+            return Err(FatalError::new(


Sign in to view

@@ -0,0 +1,68 @@
+use async_trait::async_trait;


Sign in to view

@@ -0,0 +1,64 @@
+use async_trait::async_trait;


Sign in to view


 [dev-dependencies]
 tempfile = "3"
+insta = { version = "1", features = ["yaml"] }


Sign in to view

+            })?;
+
+        // Cow wraps the const &'static str — clone if borrowed
+        let _ = Cow::Borrowed(rel_path);


…trator tests

github-actions

Automated PR review completed.

Findings kept: 7
Findings with unknown confidence: 1
Inline comments attempted: 5
Target files: 54
Covered files: 54
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

Sign in to view

-        _ => ProfileResponseStyle::Balanced,
-    }
-}
-


Sign in to view

@@ -0,0 +1,76 @@
+use async_trait::async_trait;


Sign in to view

@@ -0,0 +1,87 @@
+use async_trait::async_trait;


Sign in to view

+    }
+
+    async fn build(&self, cx: &BuildCx<'_>) -> Result<SectionOutcome, FatalError> {
+        let kind = cx_compaction_kind(cx);


Sign in to view

@@ -0,0 +1,132 @@
+use async_trait::async_trait;


…atform-independent Bump `@tauri-apps/api` from `^2` to `^2.11.0` in package.json and package-lock.json. Fix snapshot test portability by two changes: - Replace `entry.bytes` with `[snap]` in `format_audit_snapshot` to avoid byte count differences across platforms (e.g. 7 on macOS vs 6 on Linux for the same text). - Change the test workspace path from `/tmp/tiycode-snapshot-workspace` to `/tiycode-snap-workspace` to prevent collisions with the `TMPDIR` environment variable during path normalization. Update all affected `.snap` files accordingly.

github-actions

Automated PR review completed.

Findings kept: 33
Findings with unknown confidence: 0
Inline comments attempted: 11
Target files: 56
Covered files: 56
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

Sign in to view

+        };
+
+        // Check for cycle: if already in_flight, we have a dependency loop
+        if slot


Sign in to view

    }
 }

+/// Render a handoff template that includes plan markdown.


Sign in to view

+/// 3. The wall-clock date enters the LLM context only via this prepend; if
+///    a feature later needs server-authoritative time across the full message
+///    history, a `compaction_pinned` column on `messages` would be required.
+pub(crate) async fn inject_runtime_context(user_prompt: &str) -> String {


Sign in to view

+        let mut marker_count = 0;
+        // Iterate layers in increasing order (StablePrefix < SessionStable < RuntimeOverlay)
+        for (_layer, indices) in by_layer.iter() {
+            if marker_count >= 2 {


Sign in to view

-}
-
-#[cfg(test)]
-mod tests {


Sign in to view

    }
 }

+/// Shared formatting for title-generation language rule line.


Sign in to view

+            results.push((spec, layer, outcome, source_start.elapsed()));
+
+            // Hard overall budget cap; if exceeded mid-pipeline, stop building further sources
+            if start.elapsed() > self.exec_policy.overall_build_timeout {


Sign in to view

+    // We construct a minimal BuildCx to satisfy the trait signature.
+    let injector = CurrentDateInjector::new(Arc::new(SystemClock));
+    // Build a dummy BuildCx — CurrentDateInjector doesn't read it.
+    let placeholder_pool = match sqlx::SqlitePool::connect_lazy("sqlite::memory:") {


Sign in to view

+                let template = include_str!("../templates/subagent/explore.md");
+                let (_tmpl, body) =
+                    super::super::templates::parse_front_matter(template).map_err(|e| {
+                        FatalError::new("template.parse", format!("subagent/explore.md: {e}"))


Sign in to view

+                    }
+                }
+
+                // Fallback response_language


github-actions

Automated PR review completed.

Findings kept: 13
Findings with unknown confidence: 0
Inline comments attempted: 12
Target files: 56
Covered files: 56
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

Sign in to view

+    };
+    use std::sync::Arc;
+
+    let placeholder_pool =


Sign in to view

+    };
+    use std::sync::Arc;
+
+    let placeholder_pool =


Sign in to view

+    // We construct a minimal BuildCx to satisfy the trait signature.
+    let injector = CurrentDateInjector::new(Arc::new(SystemClock));
+    // Build a dummy BuildCx — CurrentDateInjector doesn't read it.
+    let placeholder_pool = match sqlx::SqlitePool::connect_lazy("sqlite::memory:") {


Sign in to view

+
+/// Global cache marker arbiter for a single LLM request.
+/// Enforces the ≤4 breakpoint limit across system prompt + messages.
+pub trait CacheMarkerArbiter: Send + Sync + std::fmt::Debug {


Sign in to view

+                // Find the anchor target's position within the same layer
+                let target_spec = all_specs.iter().find(|s| &s.id == target_id);
+                let target_order = match target_spec {
+                    Some(ts) => Self::resolve_anchored_order(ts, all_specs),


Sign in to view

+    // We construct a minimal BuildCx to satisfy the trait signature.
+    let injector = CurrentDateInjector::new(Arc::new(SystemClock));
+    // Build a dummy BuildCx — CurrentDateInjector doesn't read it.
+    let placeholder_pool = match sqlx::SqlitePool::connect_lazy("sqlite::memory:") {


Sign in to view

+        body: &SectionBody,
+        budget: &PromptBudget,
+    ) -> (SectionBody, Option<SectionWarning>) {
+        let limit = spec.max_chars.unwrap_or(budget.per_section_default_chars);


Sign in to view

+/// All registered error codes for startup lint test.
+pub const ALL_ERROR_CODES: &[&str] = &[
+    codes::TEMPLATE_NOT_FOUND,
+    codes::TEMPLATE_MISSING_KEY,


Sign in to view

+}
+
+/// Convenience constructor for test clocks.
+pub fn fixed_clock_for_test() -> Arc<FixedClock> {


Sign in to view

+            Some(meta)
+        });
+
+        let _plan = match active_plan {


github-actions

Automated PR review completed.

Findings kept: 11
Findings with unknown confidence: 0
Inline comments attempted: 7
Target files: 56
Covered files: 56
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

Sign in to view

+        }
+    }
+
+    let mut result = tpl.to_string();


Sign in to view

+
+    /// Resolve the order of a section within its layer, handling Anchored positions.
+    /// Returns (base_order, anchor_target_order) for topological comparison.
+    fn resolve_anchored_order(spec: &SectionSpec, all_specs: &[&SectionSpec]) -> u32 {


Sign in to view

+
+    #[tokio::test]
+    async fn snapshot_compaction_compact() {
+        snap_surface(


Sign in to view

+        let cx = BuildCx {
+            pool: &pool,
+            workspace_path: workspace,
+            thread_id: None, // No thread → Ephemeral sections will Skip


Sign in to view

+    };
+    use std::sync::Arc;
+
+    let placeholder_pool =


Sign in to view

+        )
+    }
+
+    fn apply_total_budget<'a>(


Sign in to view

+    /// Construct a minimal BuildCx for testing sources that don't need DB access.
+    async fn test_cx() -> BuildCx<'static> {
+        let pool = sqlx::SqlitePool::connect_lazy("sqlite::memory:").unwrap();
+        let pool_ref = Box::leak(Box::new(pool));


… remove TMPDIR from writable roots

github-actions

Automated PR review completed.

Findings kept: 7
Findings with unknown confidence: 0
Inline comments attempted: 5
Target files: 56
Covered files: 56
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

Sign in to view


-    let system_prompt = build_system_prompt(pool, &raw_plan, workspace_path, run_mode).await?;
-    let system_prompt = inject_goal_context(pool, thread_id, system_prompt).await?;
+    let composed_prompt =


Sign in to view

 }

+/// Render a handoff template that includes plan markdown.
+fn render_handoff_template(


Sign in to view

+    // We construct a minimal BuildCx to satisfy the trait signature.
+    let injector = CurrentDateInjector::new(Arc::new(SystemClock));
+    // Build a dummy BuildCx — CurrentDateInjector doesn't read it.
+    let placeholder_pool = match sqlx::SqlitePool::connect_lazy("sqlite::memory:") {


Sign in to view

        );
    }
+
+    // ── build_helper_system_prompt（Phase 7 Composer pipeline）──


Sign in to view

+/// A fatal error that causes the entire prompt build to fail.
+/// Rare; reserved for truly unrecoverable errors (template load failure, SQLite fatal disconnect).
+#[derive(Debug)]
+pub struct FatalError {


When the Skills section is removed from the prompt surface, the leading newline of the matched "\n## " pattern was included in the remaining text, producing a double blank line between sections. This caused snapshot mismatches on CI runners where no skills are installed. Skip the leading newline so the result joins `before` (which already ends with the layer separator) directly to the next section header, yielding exactly one blank line—matching output on machines without the Skills section.

github-actions

Automated PR review completed.

Findings kept: 27
Findings with unknown confidence: 0
Inline comments attempted: 11
Target files: 56
Covered files: 52
Uncovered files: 4
See the summary comment for detailed analysis and coverage details.

Sign in to view

-            "- Respond in {language} unless the user explicitly asks for a different language."
-        ));
-    }
+pub(crate) async fn build_compact_summary_system_prompt(response_language: Option<&str>) -> String {


Sign in to view

        );
    }
+
+    #[tokio::test]


Sign in to view

+    }
+}
+
+impl CacheMarkerArbiter for DefaultCacheMarkerArbiter {


Sign in to view

+        )
+    }
+
+    async fn build_message(&self, _cx: &BuildCx<'_>) -> Option<RuntimeMessage> {


Sign in to view

+    use crate::persistence::init_database;
+
+    /// Build a snapshot for a given surface using a fresh temp DB.
+    async fn snap_surface(surface: PromptSurface, snapshot_name: &str) {


Sign in to view

+    /// Construct a minimal BuildCx for testing sources that don't need DB access.
+    async fn test_cx() -> BuildCx<'static> {
+        let pool = sqlx::SqlitePool::connect_lazy("sqlite::memory:").unwrap();
+        let pool_ref = Box::leak(Box::new(pool));


Sign in to view

+
+impl SurfacePattern {
+    /// Check whether this pattern matches a given surface.
+    pub fn matches(&self, surface: &PromptSurface) -> bool {


Sign in to view

+        per_section_overrides.insert(SectionId::CustomSubagentBody, total_chars / 4);
+
+        // Compaction / Title surfaces use tighter budgets
+        let total_chars = match surface {


Sign in to view

+
+impl CacheMarkerArbiter for DefaultCacheMarkerArbiter {
+    fn record_system_markers(&self, markers: &[CacheMarkerSlot]) {
+        let mut sys = self.system_markers.lock().unwrap();


Sign in to view

+        let raw = load_template(TEMPLATE_REL_PATH, TEMPLATE_EMBEDDED);
+        let (tmpl, body) = parse_front_matter(&raw).map_err(|e| {
+            FatalError::new(
+                super::super::error_codes::codes::TEMPLATE_NOT_FOUND,


…nd fix unused variable warnings

…uidelines - Add operating boundaries and safety red lines to role definition - Reorganize behavioral guidelines into categorized sections - Clarify run mode guidance for default and plan modes - Bump Role, BehavioralGuidelines, and RunMode versions to 2 - Update snapshot tests and source tests for new versions

- Remove `shell` template variable from ShellToolingGuide; reference System Environment section instead of injecting runtime value - Replace Chinese example fragment with English in FinalResponseStructure template - Add honesty and coverage guidelines to explore and review subagent templates - Add example output section to merge compaction template - Use generic path placeholder in explore template examples - Bump affected prompt section versions from 1 to 2

github-actions

Automated PR review completed.

Findings kept: 13
Findings with unknown confidence: 0
Inline comments attempted: 11
Target files: 56
Covered files: 56
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

Sign in to view

    SUMMARY_HISTORY_MIN_CHARS,
 };
 use super::agent_run_title::collapse_whitespace;
+use crate::core::prompt::templates::strip_front_matter;


Sign in to view

 }

+/// Build the Title surface system prompt via Composer (Phase 6).
+pub(crate) async fn build_title_system_prompt() -> String {


Sign in to view

-}
-
-fn build_helper_system_prompt(
-    parent_system_prompt: &str,


Sign in to view

    }
 }

+/// Shared formatting for title-generation language rule line.


Sign in to view

+/// 3. The wall-clock date enters the LLM context only via this prepend; if
+///    a feature later needs server-authoritative time across the full message
+///    history, a `compaction_pinned` column on `messages` would be required.
+pub(crate) async fn inject_runtime_context(user_prompt: &str) -> String {


Sign in to view

+            Some(meta)
+        });
+
+        let _plan = match active_plan {


Sign in to view

@@ -0,0 +1,67 @@
+use async_trait::async_trait;


Sign in to view

+    /// Construct a minimal BuildCx for testing sources that don't need DB access.
+    async fn test_cx() -> BuildCx<'static> {
+        let pool = sqlx::SqlitePool::connect_lazy("sqlite::memory:").unwrap();
+        let pool_ref = Box::leak(Box::new(pool));


Sign in to view

    use crate::core::subagent::SubagentProfile;
    use std::sync::Arc;

-    #[test]


Sign in to view


 [dev-dependencies]
 tempfile = "3"
+insta = { version = "1", features = ["yaml"] }


github-actions

Automated PR review completed.

Findings kept: 23
Findings with unknown confidence: 0
Inline comments attempted: 18
Target files: 56
Covered files: 55
Uncovered files: 1
See the summary comment for detailed analysis and coverage details.

Sign in to view

+        surfaces: SurfaceMatcher::Any(vec![SurfacePattern::AnyMainAgent]),
+        version: 1,
+        max_chars: None,
+        criticality: SectionCriticality::NonCritical,


Sign in to view

@@ -0,0 +1,155 @@
+use async_trait::async_trait;


Sign in to view

@@ -0,0 +1,95 @@
+use async_trait::async_trait;


Sign in to view

+/// Template-backed SectionSource for the Skills section.
+/// Loads skills from the ExtensionsManager and renders via the skills_usage.md template.
+/// Replaces LegacySkillsSource (which delegates to the old SkillsProvider).
+pub struct SkillsSource;


Sign in to view

+    use std::sync::Arc;
+
+    /// Construct a minimal BuildCx for testing sources that don't need DB access.
+    async fn test_cx() -> BuildCx<'static> {


Sign in to view

+            .collect::<Vec<_>>()
+            .join("\n");
+
+        let raw = load_template(TEMPLATE_REL_PATH, TEMPLATE_EMBEDDED);


Sign in to view

        );
    }
+
+    // ── build_helper_system_prompt（Phase 7 Composer pipeline）──


Sign in to view


 [dev-dependencies]
 tempfile = "3"
+insta = { version = "1", features = ["yaml"] }


Sign in to view

+            .in_flight
+            .swap(true, std::sync::atomic::Ordering::SeqCst)
+        {
+            // Already in flight → cycle detected


Sign in to view

+    fn pattern(&self) -> SurfacePattern {
+        match self {
+            PromptSurface::MainAgent { run_mode } => SurfacePattern::MainAgent(*run_mode),
+            PromptSurface::SubagentExplore { .. } => SurfacePattern::AnySubagent,


github-actions

Automated PR review completed.

Findings kept: 10
Findings with unknown confidence: 0
Inline comments attempted: 10
Target files: 57
Covered files: 57
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

github-actions · 2026-06-06T10:37:43Z

+    /// stale declarations after template edits.
+    #[test]
+    fn templates_have_no_undeclared_keys() {
+        let template_dir = super::template_root();


[CRITICAL] Template key validation test fails to compile in non-debug test builds

The test that validates template key declarations will fail to compile in any build where debug_assertions is not enabled, making it impossible to run tests in release mode or in CI configurations that disable debug assertions.

Suggestion: Either conditionally compile the test only when debug_assertions is enabled (#[cfg(debug_assertions)]), or provide a test‑specific template_root implementation that works in all test configurations (e.g., #[cfg(any(test, debug_assertions))]).

Risk: This prevents running the test suite in some CI pipelines, concealing template key mismatch regressions. Could lead to runtime errors if keys are misdeclared.

Confidence: 0.95

[From SubAgent: general]

github-actions · 2026-06-06T10:37:43Z

+        // Record marker slots via the arbiter so the message layer can coordinate
+        // quota (≤4 Anthropic breakpoints across system prompt + runtime messages).
+        if let Some(ref arbiter) = self.cache_arbiter {
+            let mut byte_offset = 0usize;


[HIGH] Cache marker byte offsets may be incorrect due to missing layer separators

The byte offsets recorded for cache markers are computed solely on block text lengths, ignoring layer separators that will be inserted between sections in the final prompt text, leading to incorrect positions.

Suggestion: Compute offsets using the same logic as the final text assembly: either insert separators between block texts when iterating, or calculate offsets after joining.

Risk: If these offsets are used for cache-control breakpoints or other markers that depend on exact byte positions in the transmitted system prompt, the offsets will be misaligned, potentially breaking cache control hints and wrecking cache performance.

Confidence: 0.85

[From SubAgent: general]

github-actions · 2026-06-06T10:37:43Z

-    fn compact_summary_system_prompt_uses_response_language_when_present() {
-        let prompt = build_compact_summary_system_prompt(Some(" 简体中文 "));
+    #[tokio::test]
+    async fn compact_summary_system_prompt_uses_response_language_when_present() {


[MEDIUM] Response language whitespace trimming removed in compaction prompts

The response language string in compaction prompts is no longer normalized, which can cause malformed language instructions when the language value contains whitespace.

Suggestion: Reintroduce normalization (trim and empty-check) before passing response_language to the Composer, or ensure the Composer's BuildCx or downstream injection normalizes the value.

Risk: If the profile response language includes padding whitespace, the resulting system prompt may contain malformed lines like Respond in 简体中文 unless..., confusing the model.

Confidence: 0.85

[From SubAgent: general]

github-actions · 2026-06-06T10:37:43Z

-            "- Respond in {language} unless the user explicitly asks for a different language."
-        ));
-    }
+pub(crate) async fn build_compact_summary_system_prompt(response_language: Option<&str>) -> String {


[MEDIUM] Language normalization edge case not covered for summary prompts

The removal of explicit language trimming may cause Composer to fail to match the language rule for strings with extranous whitespace.

Suggestion: Add a test with a language value containing leading/trailing spaces to ensure the Composer (or ProfileInstructionsSource) normalizes it correctly.

Risk: Agent might produce summaries in wrong language if the language tag is misidentified.

Confidence: 0.90

[From SubAgent: testing]

github-actions · 2026-06-06T10:37:43Z

-    let system_prompt = inject_goal_context(pool, thread_id, system_prompt).await?;
+    let composed_prompt =
+        build_system_prompt(pool, &raw_plan, workspace_path, run_mode, thread_id).await?;
+    let system_prompt = composed_prompt.text.clone();


[MEDIUM] Cache arbiter path not verified in tests

The new prompt caching feature has no test coverage; no test uses a non-None cache arbiter or verifies that ephemeral markers appear in the system prompt.

Suggestion: Add an integration test that builds a session with caching enabled and asserts the system prompt contains appropriate cache markers for stable sections.

Risk: Cache marker placement errors could lead to poor cache utilization or incorrect context window injection on Anthropic models.

Confidence: 0.85

[From SubAgent: testing]

github-actions · 2026-06-06T10:37:43Z

+    pub per_section_overrides: BTreeMap<SectionId, usize>,
+
+    /// Eviction order: layers are removed in this order when total budget is exceeded.
+    /// Default: [Ephemeral, RuntimeOverlay, SessionStable, StablePrefix]


[MEDIUM] Unused eviction_order field in PromptBudget

The budget's eviction_order is never consulted during prompt composition, making it dead code and potentially misleading.

Suggestion: Remove the field and its doc comment, or use it in the composer to determine which sections to evict when total budget is exceeded, ensuring the declared eviction order is honored.

Risk: Future developers might assume the declared order is respected, but since it's not tested or used, eviction behavior may silently differ from expectations if the sorting order changes.

Confidence: 0.90

[From SubAgent: general]

github-actions · 2026-06-06T10:37:43Z

+
+impl SectionRenderer for XmlRenderer {
+    fn render_section(&self, title: &str, body: &str) -> String {
+        format!("<section name=\"{}\">\n{}\n</section>", title, body)


[MEDIUM] XmlRenderer does not escape XML special characters

The XmlRenderer produces raw XML without escaping special characters, which may produce malformed prompts if the title or body contain XML-sensitive characters.

Suggestion: Escape XML special characters (e.g., <, >, &, ") in render_section for the XmlRenderer, or wrap body content in CDATA sections with careful handling.

Risk: If prompt content includes XML special characters (common in code snippets, config values, or shell output), the resulting prompt will be malformed and may cause API errors or degraded section recall.

Confidence: 0.90

[From SubAgent: general]

github-actions · 2026-06-06T10:37:43Z

+    /// Construct a minimal BuildCx for testing sources that don't need DB access.
+    async fn test_cx() -> BuildCx<'static> {
+        let pool = sqlx::SqlitePool::connect_lazy("sqlite::memory:").unwrap();
+        let pool_ref = Box::leak(Box::new(pool));


[MEDIUM] Test helper leaks SqlitePool via Box::leak

The test_cx function leaks a SqlitePool, which may cause resource exhaustion and interfere with other tests.

Suggestion: Use a lazy_static or Arc-based pool with a proper lifecycle, or adjust BuildCx to not require 'static.

Risk: Memory and connection leaks can cause test runner slowdowns and flaky tests due to resource contention.

Confidence: 0.90

[From SubAgent: general]

github-actions · 2026-06-06T10:37:43Z

            "Error: boom"
        );
    }
+


[MEDIUM] Replaced helper system prompt tests lack inheritance verification

The Composer-based helper prompt no longer verifies the correct inheritance of required sections; regression risk if surface matchers fail to include them.

Suggestion: Add assertions on the helper prompt content for essential inherited sections (e.g., Profile Instructions, System Environment) to ensure the Composer’s section matchers are correct.

Risk: Helper subagents could miss critical context like workspace instructions, leading to bad exploration/reviews.

Confidence: 0.90

[From SubAgent: testing]

github-actions · 2026-06-06T10:37:43Z

    pub model_plan: ResolvedRuntimeModelPlan,
    pub initial_prompt: Option<String>,
    pub initial_context_calibration: ContextTokenCalibration,
+    /// Global cache marker arbiter for the request lifecycle.


[LOW] Unused cache_arbiter field in AgentSessionSpec

A new field is added and populated but never consumed, which may confuse future readers and suggests either an incomplete feature or dead code.

Suggestion: Either implement the intended usage (resetting the arbiter after LLM calls as mentioned in the comment) or remove the field until the feature is fully integrated.

Risk: If left unused, the field could mask the need for cache marker reset logic and cause eventual prompt-cache pollution if integrated incorrectly later.

Confidence: 0.95

[From SubAgent: general]

jorben added 5 commits June 5, 2026 11:28

docs: 📝 add prompt injection refactoring design document

58cb76a

docs(prompt): 📝 add comprehensive refactoring design for prompt injec…

f4ed219

…tion

docs(prompt-injection-refactor): 📝 add sections on source exec model,…

19a4d78

… cache marker arbitration, surface extension traits, schema version rules, and template front-matter

docs(prompt-refactor): 📝 update design with new sections and details

bf7fcf7

github-actions Bot reviewed Jun 5, 2026

View reviewed changes

jorben added 12 commits June 5, 2026 21:11

feat(prompt): ✨ migrate legacy section sources to template-backed imp…

ff6197d

…lementations and add anchored ordering

refactor(prompt): ♻️ remove PromptFeatureSet module and all references

492e795

refactor(prompt): ♻️ remove legacy compat sources and dual-track migr…

f874f57

…ation scaffolding

refactor(profile): ♻️ migrate profile instructions to template-backed…

622d6d4

… source and add key validation linter

refactor(prompt): ♻️ remove legacy prompt build modules and integrate…

bf02fd6

… template version validation

feat(prompt): ✨ enable Anthropic prompt caching with arbiter

173e446

docs: 📝 update WeChat group QR code URL

94277f1

github-actions Bot reviewed Jun 6, 2026

View reviewed changes

jorben added 2 commits June 6, 2026 12:19

style(core): 🎨 format Rust imports

7158a08

refactor(core): ♻️ relocate strip_front_matter and remove unused for_…

bc026de

…main_agent

github-actions Bot reviewed Jun 6, 2026

View reviewed changes

test: ✅ add cross-platform snapshot normalization and subagent orches…

ef70c26

…trator tests

github-actions Bot reviewed Jun 6, 2026

View reviewed changes

jorben added 2 commits June 6, 2026 15:16

test(prompt): ✅ strip Skills section from snapshot tests

617c7a8

style(snapshot-tests): 🎨 remove unused mutable variable

7b61e4c

github-actions Bot reviewed Jun 6, 2026

View reviewed changes

refactor(prompt): ♻️ update snapshot tests to redact token counts and…

759edd8

… remove TMPDIR from writable roots

github-actions Bot reviewed Jun 6, 2026

View reviewed changes

jorben added 5 commits June 6, 2026 16:10

refactor(prompt): ♻️ remove deprecated grayscale rollout references a…

e45fc25

…nd fix unused variable warnings

docs(prompt): 📝 replace detailed refactoring plan with concise README

f141543

test(agent): ✅ update prompt assertion strings to match revised prompts

c81850c

github-actions Bot reviewed Jun 6, 2026

View reviewed changes

test(agent): ✅ update system prompt assertions to match revised wording

7bb82d6

github-actions Bot reviewed Jun 6, 2026

View reviewed changes

jorben merged commit 96e7710 into master Jun 6, 2026
4 checks passed

jorben deleted the refact/prompt-injection-refactor branch June 6, 2026 10:47

Conversation

jorben commented Jun 5, 2026

Summary

Changes

Core Architecture (new modules)

Feature Modules

Template Files

Caller Updates

Documentation

Motivation

Testing

Breaking Changes

Related Issues

Checklist

Uh oh!

github-actions Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Code Review Summary

Overall Assessment

Major Findings by Severity

Actionable Suggestions

Potential Risks

Test Suggestions

File-Level Coverage Notes

Inline Downgraded Items (processed but not inline)

Coverage Status

Runtime/Budget

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

github-actions Bot left a comment

github-actions Bot commented Jun 5, 2026 •

edited

Loading