feat: SDK optimization — lifecycle, stall detection, communication, performance#187

Open
michael-wojcik wants to merge 13 commits into main from feat/sdk-optimization

michael-wojcik (Collaborator) commented Feb 19, 2026

Summary

Full audit of PACT plugin's Task SDK and Agent Teams SDK usage, identifying and implementing optimizations across lifecycle management, stall detection, communication, cost, and performance.

  • Add TeamDelete to wrap-up and session_end cleanup (fixes resource leak)
  • Use last_assistant_message in validate_handoff (less fragile than transcript parsing)
  • Enrich worktree_guard error with corrected path suggestion (faster agent retry)
  • Add TeammateIdle hook for stall detection and idle cleanup (closes critical gap)
  • Make non-critical hooks async (reduced latency)
  • Use broadcast for HALT algedonic signals (team-wide emergency stop)
  • Add Terminate outcome to imPACT with TaskStop (handle unrecoverable agents)
  • Set model: haiku for memory-agent (cost optimization for mechanical tasks)
  • Clarify pact-memory vs SDK memory scope in agent documentation

Test plan

  • 1083 tests pass (160 new + 923 existing), zero regressions
  • New test_hooks_json.py validates hooks.json structure and async flags
  • TeammateIdle hook: 54 tests (36 coder + 18 test engineer) covering stall detection, idle cleanup, concurrent tracking
  • validate_handoff: 31 tests covering last_assistant_message preference and fallback
  • worktree_guard: 20 tests including path suggestion edge cases
  • session_end cleanup: 16 tests with mocked filesystem
  • Verify TeammateIdle hook fires on agent idle events
  • Verify broadcast HALT stops all running teammates

Closes design items #1-8, #10, #11, #12 from docs/plans/2026-02-19-sdk-optimization-design.md
Deferred items tracked in #186

Add team cleanup section to wrap-up command (shutdown teammates,
then TeamDelete). Add best-effort cleanup_stale_teams() to
session_end.py as safety net for sessions ending without wrap-up.
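
A minimal sketch of what a best-effort cleanup like this could look like. The signature, the `teams_dir/<team>` and `tasks_dir/<team>` layout, and the return value are assumptions for illustration, not the code in this PR.

```python
import shutil
from pathlib import Path


def cleanup_stale_teams(team_name, teams_dir, tasks_dir):
    """Best-effort removal of one team's directories; never raises.

    Hypothetical signature and directory layout, for illustration only.
    """
    removed = []
    if not team_name:
        # Without a team name there is nothing safe to scope the cleanup to.
        return removed
    for base in (Path(teams_dir), Path(tasks_dir)):
        target = base / team_name
        try:
            if target.is_dir():
                shutil.rmtree(target)
                removed.append(str(target))
        except OSError:
            # Safety net only: a failed cleanup must not break session end.
            continue
    return removed
```

Swallowing `OSError` keeps the hook from failing the session when a directory is already gone or not removable.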

Prefer last_assistant_message field over transcript parsing in
SubagentStop hook, with fallback for older SDK versions. Add 16
new tests covering both paths.
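
The preference-with-fallback logic can be sketched roughly as below. The function name, the `transcript_path` key, and the injected `read_transcript` callable are assumptions for illustration; the actual transcript parsing is not shown here.

```python
def extract_handoff_text(hook_input, read_transcript):
    """Return the agent's final message, preferring the direct field.

    hook_input: dict parsed from the hook's JSON input.
    read_transcript: hypothetical callable(path) -> str used as the fallback.
    """
    message = hook_input.get("last_assistant_message")
    if isinstance(message, str) and message.strip():
        return message
    # Older SDK versions do not provide the field: fall back to the transcript.
    transcript_path = hook_input.get("transcript_path")  # assumed key name
    if not transcript_path:
        return ""
    return read_transcript(transcript_path)
```

Treating a blank or non-string field as missing keeps the fallback path reachable when the field is present but empty.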

When blocking edits outside the worktree, include "Did you mean:
{corrected_path}" in the error message to help agents retry faster.
Add 3 new tests for path suggestion logic.
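
One way the suggestion could be derived, as a sketch: re-root the blocked path from the main checkout onto the worktree. Both function names, the parameters, and the exact message wording other than "Did you mean:" are hypothetical.

```python
from pathlib import PurePosixPath


def suggest_worktree_path(target, project_root, worktree_root):
    """Map a path under project_root to the same relative path in the worktree.

    Returns None when target is not under project_root (no safe suggestion).
    """
    try:
        relative = PurePosixPath(target).relative_to(project_root)
    except ValueError:
        return None
    return str(PurePosixPath(worktree_root) / relative)


def blocked_edit_message(target, project_root, worktree_root):
    """Build the error text, appending a suggestion only when one exists."""
    message = f"Edit blocked: {target} is outside the worktree {worktree_root}."
    suggestion = suggest_worktree_path(target, project_root, worktree_root)
    if suggestion is not None:
        message += f" Did you mean: {suggestion}"
    return message
```

Returning `None` instead of guessing keeps the message free of misleading paths when the file belongs to a different project.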

New TeammateIdle hook with two responsibilities:
- Stall detection: alerts orchestrator when an agent goes idle with
  an in_progress task (enables automated imPACT triggering)
- Idle cleanup: tracks consecutive idle events, suggests shutdown
  at 3, instructs orchestrator to send shutdown_request at 5

Also marks session_end, file_size_check, and file_tracker hooks
as async for reduced latency on non-critical paths.

Includes 36 new tests.
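
The two responsibilities can be pictured as a small decision function. This is a sketch only: the action names, the use of `>=` for the thresholds, and the choice to let a stall alert take precedence over idle cleanup are all assumptions.

```python
SUGGEST_SHUTDOWN_AT = 3   # consecutive idle events before suggesting shutdown
REQUEST_SHUTDOWN_AT = 5   # consecutive idle events before a shutdown_request


def idle_action(consecutive_idle, has_in_progress_task):
    """Decide what the hook tells the orchestrator for one idle event."""
    if has_in_progress_task:
        # Idle while owning in_progress work is treated as a stall.
        return "stall_alert"
    if consecutive_idle >= REQUEST_SHUTDOWN_AT:
        return "request_shutdown"
    if consecutive_idle >= SUGGEST_SHUTDOWN_AT:
        return "suggest_shutdown"
    return "none"
```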

- HALT algedonic signals now use SendMessage broadcast to stop all
  teammates simultaneously instead of manual relay
- Add Terminate as fifth imPACT outcome using TaskStop for
  unrecoverable agents (infinite loops, context exhaustion)
- Set model: haiku for pact-memory-agent (purely mechanical tasks)
- Clarify in all agent skill tables and pact-agent-teams SKILL.md
  that pact-memory is for project-wide institutional knowledge only,
  not agent-level domain expertise (handled by SDK memory frontmatter)

Edge cases and error paths across all modified modules:
- session_end cleanup_stale_teams: 16 tests (filesystem mocking)
- teammate_idle stall/cleanup interaction: 18 tests
- validate_handoff last_assistant_message: 15 tests
- worktree_guard path suggestion: 17 tests
- hooks.json structural validation: 13 tests (new file)

1079 total tests pass, zero regressions.
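
As an illustration of the kind of structural check test_hooks_json.py could perform, the sketch below validates pipe-separated matchers and collects hooks flagged async. The config shape (event name, list of entries, nested "hooks" list with "command" and "async" keys) and the allowed token characters are assumptions, not taken from this PR's hooks.json.

```python
import re

# Assumed token grammar: tool or agent names, optionally with wildcards.
MATCHER_TOKEN = re.compile(r"[A-Za-z0-9_.*-]+")


def invalid_matcher_tokens(matcher):
    """Return the pipe-separated tokens that are empty or malformed."""
    return [t for t in matcher.split("|") if not MATCHER_TOKEN.fullmatch(t)]


def async_commands(config):
    """Collect the commands of every hook explicitly marked async."""
    found = []
    for entries in config.get("hooks", {}).values():
        for entry in entries:
            for hook in entry.get("hooks", []):
                if hook.get("async") is True:
                    found.append(hook.get("command", ""))
    return found
```

A stray space or doubled pipe in a matcher shows up as an invalid token instead of silently never matching.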

- Fix task ID string comparison in teammate_idle.py (lexicographic
  "3" > "20" bug): use int() with fallback for non-numeric IDs
- Scope cleanup_stale_teams to current session's team only via
  CLAUDE_CODE_TEAM_NAME env var (prevents cross-session deletion)
- Fix TOCTOU race in write_idle_counts: acquire lock before truncating

1078 tests pass.
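
The task ID bug comes from comparing IDs as strings, where "3" sorts after "20". A minimal sketch of a numeric comparison with a fallback for non-numeric IDs follows; the helper name and the choice to order non-numeric IDs after numeric ones are assumptions.

```python
def task_id_sort_key(task_id):
    """Order numeric IDs by value; non-numeric IDs sort after them by text."""
    try:
        return (0, int(task_id), "")
    except (TypeError, ValueError):
        return (1, 0, str(task_id))


# String comparison is lexicographic, so this is True even though 3 < 20.
lexicographic_bug = "3" > "20"

# With the key, the highest task ID is found by numeric value.
latest = max(["3", "20"], key=task_id_sort_key)
```

Using a tuple key avoids comparing an int against a str, which would raise a TypeError in Python 3.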

- Reset idle count to 0 when teammate is assigned new work (detects
  task ID change via structured idle_counts.json format)
- Validate file belongs to same project root before suggesting
  worktree path (prevents misleading suggestions from nested repos)
- Require project marker at common ancestor in fallback path
  suggestion (prevents false matches across unrelated projects)
- Test summary truncation >80 chars in session_end snapshot
- Validate pipe-separated matcher syntax in hooks.json
- Validate SubagentStart matcher covers all 11 agent types
  (auto-detects new agents from agents/ directory)

1083 tests pass.
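
The reset-on-new-work behavior can be sketched as follows. The entry shape (`count` plus `task_id`), the helper names, and the handling of bare-int entries from an older format are assumptions for illustration.

```python
def normalize_entry(entry):
    """Coerce a stored entry into the structured {count, task_id} form."""
    if isinstance(entry, dict):
        return {"count": int(entry.get("count", 0)), "task_id": entry.get("task_id")}
    if isinstance(entry, int):
        # Assumed legacy format: a bare count with no task recorded.
        return {"count": entry, "task_id": None}
    return {"count": 0, "task_id": None}


def record_idle(entry, current_task_id):
    """Return the updated entry after one idle event for a teammate."""
    state = normalize_entry(entry)
    known = state["task_id"]
    if known is not None and known != current_task_id:
        # New work was assigned since the last idle event: start over.
        state["count"] = 0
    return {"count": state["count"] + 1, "task_id": current_task_id}
```

In this sketch an entry with no recorded task keeps its count and adopts the current task ID, so a format change alone does not reset a teammate's idle streak.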

Add _atomic_update_idle_counts() that locks the file (flock) around the
full read-modify-write cycle, preventing concurrent TeammateIdle events
from losing increments. Add top-level try/except to main() for defensive
error handling, matching the session_end.py pattern.
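
A rough sketch of a flock-guarded read-modify-write, assuming a POSIX system and a flat `{teammate: count}` JSON file. The function name, signature, and file shape are simplifications for illustration and differ from the structured format this PR uses.

```python
import fcntl
import json
import os


def atomic_update_idle_counts(path, teammate):
    """Increment one teammate's idle count under an exclusive file lock."""
    # Open without truncating so no data is touched before the lock is held.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            raw = handle.read()
            try:
                counts = json.loads(raw) if raw.strip() else {}
            except json.JSONDecodeError:
                counts = {}  # corrupt file: start from an empty mapping
            if not isinstance(counts, dict):
                counts = {}
            counts[teammate] = counts.get(teammate, 0) + 1
            # Rewrite in place while still holding the lock.
            handle.seek(0)
            handle.truncate()
            json.dump(counts, handle)
            handle.flush()
            return counts[teammate]
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Opening with `O_RDWR | O_CREAT` rather than mode "w" is what avoids the truncate-before-lock window: a second process blocks on the lock and then reads the first process's completed write.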

- Simplify redundant condition in worktree_guard _find_project_root
- Add explicit tasks_dir param to cleanup_stale_teams for testability
- Document snapshot-before-cleanup ordering dependency in session_end
- Add 2 legacy idle count migration tests (int-to-dict format)
- Clarify TaskStop as force-stop in imPACT and pact-workflows protocols

Copilot AI left a comment

Pull request overview

This PR implements comprehensive optimizations to PACT's Task SDK and Agent Teams SDK usage, focusing on lifecycle management, stall detection, communication efficiency, and cost optimization. The changes address critical gaps in resource cleanup and agent monitoring while improving error messages and reducing latency.

Changes:

  • Added TeammateIdle hook for stall detection and idle cleanup (closes critical monitoring gap)
  • Enhanced worktree_guard with corrected path suggestions for faster agent retry
  • Made non-critical hooks async (session_end, file_size_check, file_tracker) to reduce latency
  • Added TeamDelete cleanup to wrap-up and session_end for proper resource cleanup
  • Introduced Terminate outcome to imPACT for handling unrecoverable agents
  • Set model: haiku for memory-agent (cost optimization for mechanical tasks)
  • Clarified pact-memory vs agent-level SDK memory scope in documentation
  • Added broadcast support for HALT algedonic signals (team-wide emergency stop)
  • Updated validate_handoff to prefer last_assistant_message over transcript (less fragile)

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pact-plugin/hooks/teammate_idle.py | New hook implementing stall detection and idle cleanup with atomic file updates |
| pact-plugin/hooks/session_end.py | Added cleanup_stale_teams to remove team/task directories on session end |
| pact-plugin/hooks/worktree_guard.py | Added _suggest_worktree_path for corrected path suggestions in error messages |
| pact-plugin/hooks/validate_handoff.py | Updated to prefer last_assistant_message over transcript (SDK v2.1.47+) |
| pact-plugin/hooks/hooks.json | Added TeammateIdle entry, marked non-critical hooks as async |
| pact-plugin/tests/test_teammate_idle.py | 819 lines of comprehensive tests for stall detection and idle cleanup |
| pact-plugin/tests/test_validate_handoff.py | 437 lines testing last_assistant_message preference and validation |
| pact-plugin/tests/test_worktree_guard.py | Added tests for path suggestion logic and edge cases |
| pact-plugin/tests/test_session_end.py | Added 164 lines testing cleanup_stale_teams functionality |
| pact-plugin/tests/test_hooks_json.py | 298 lines validating hooks.json structure and async configuration |
| pact-plugin/protocols/pact-workflows.md | Added Terminate outcome to imPACT section |
| pact-plugin/protocols/algedonic.md | Added broadcast HALT handling instructions |
| pact-plugin/commands/wrap-up.md | Added Team Cleanup section with TeamDelete |
| pact-plugin/commands/orchestrate.md | Added HALT broadcast handling |
| pact-plugin/commands/imPACT.md | Added Terminate outcome with TaskStop details |
| pact-plugin/skills/pact-agent-teams/SKILL.md | Clarified pact-memory vs SDK memory scopes |
| pact-plugin/agents/*.md | Updated memory scope clarifications across 10 agent files |
| pact-plugin/agents/pact-memory-agent.md | Added model: haiku for cost optimization |
| Version files | Bumped version from 3.4.2 to 3.5.0 |
