refactor: make agent loop support parallel and update docs #2503
yinwm merged 1 commit into sipeed:main
Conversation
yinwm left a comment
Thanks for the thorough refactor! The worker pool architecture is clean and well-documented. However, I found two critical issues with the shared placeholderTurnState singleton that need to be addressed before merging:
1. Worker placeholder cleanup can delete another worker's placeholder (breaks session serialization)
In loop.go:546-553, the safety-net defer checks ts.turnID == "pending" to detect a stale placeholder. Since placeholderTurnState is a package-level singleton shared across all sessions, Worker A's cleanup defer can accidentally delete Worker B's freshly-stored placeholder for the same session:
Worker A finishes → clearActiveTurn deletes session key
Run() main loop → LoadOrStore stores new placeholder → spawns Worker B
Worker A's cleanup defer → Load finds placeholder (Worker B's) → deletes it
Run() main loop → LoadOrStore succeeds again → spawns Worker C
→ Worker B and Worker C run concurrently for the same session ❌
Fix: Create a unique placeholder instance per LoadOrStore call (e.g., &turnState{turnID: "pending-" + sessionKey}), or use a unique ID to distinguish "my placeholder" from "someone else's placeholder" in the cleanup defer.
2. HardAbort/InterruptHard can mutate the shared placeholder singleton
HardAbort() (steering.go:489) does a type assertion tsInterface.(*turnState) which succeeds for placeholderTurnState since it's also *turnState. This calls ts.Finish(true) which permanently marks the global singleton as finished, affecting all future sessions that use it. Similarly, InterruptHard() via getAnyActiveTurnState() can call requestHardAbort() on the placeholder.
Fix: Add a guard in HardAbort and InterruptHard:
if ts.turnID == "pending" {
    return fmt.Errorf("turn is still initializing for session %s", sessionKey)
}
Additional non-blocking suggestions
- Continue() placeholder handling: In steering.go:355-357, GetActiveTurnBySession returns non-nil for the placeholder (turnID="pending"), causing Continue() to return an error instead of gracefully yielding. This can prematurely break the steering drain loop in runTurnWithSteering.
- sentTargets memory leak: In message.go:68, ResetSentInRound truncates the slice but never deletes the map key. Over time, with many unique sessions, the map grows unbounded.
- Dead code: drainBusToSteering is no longer called from Run() (it is only referenced in steering_test.go).
These issues don't manifest when max_parallel_turns=1 (default), but will break session serialization when parallelism is enabled.
all done.
yinwm left a comment
LGTM. Both CRITICAL issues from the first round are properly fixed:
- Placeholder is now a unique per-claim instance (makePendingTurnID with sequence number)
- HardAbort guards against pending turns with prefix check
- sentTargets memory leak fixed with delete()
- Dead code (drainBusToSteering, requeueInboundMessage) cleaned up
The extra cleanup (vision retry inline, code reorganization) is a nice bonus.
📝 Description
Based on PR #2481; please review after that PR has been merged.
Refactored the AgentLoop in /Users/billy/github/picoclaw/pkg/agent/loop.go to support parallel processing of user messages with correct response routing. Here's what was implemented:
🗣️ Type of Change
🤖 AI Code Generation
🔗 Related Issue
📚 Technical Context (Skip for Docs)
🧪 Test Environment
📸 Evidence (Optional)
☑️ Checklist