Prerequisites
Problem Description
The task system (task_create / task_update) currently allows two things that undermine task state integrity in real workflows:
- No deduplication on creation: the same agent can create multiple near-identical tasks with the same subject.
- No completion proof on status transition: any task can be marked completed without evidence that work was actually done.
Concrete failure modes
Duplicate task storms
task_create always generates a fresh T-{uuid} id with no subject or content matching. During long sessions, the model creates multiple tasks for the same logical work item. The task list inflates with duplicates, and the model reports progress on tasks that overlap.
Source: src/tools/task/task-create.ts — no deduplication logic.
Unearned completion
task_update sets status: "completed" via direct assignment with no validation:
    if (validatedArgs.status !== undefined) {
      task.status = validatedArgs.status; // Direct, no guard
    }
No check that blockedBy tasks are resolved. No check that any artifact was produced. No check that the described work was performed. The model can mark a task complete the instant it is created.
Source: src/tools/task/task-update.ts lines 105-107.
Existing completion proof patterns are flow-specific, not general
The codebase already has completion proof mechanisms, but only in narrow contexts:
- Ralph-loop / ultrawork (src/hooks/ralph-loop/pending-verification-handler.ts): Requires Oracle agent to emit <promise>VERIFIED</promise> before accepting completion.
- Atlas boulder sessions (src/hooks/atlas/verification-reminders.ts): Requires editing a plan file checkbox and re-reading to verify the count changed. "Your completion is NOT tracked until the checkbox is marked."
These patterns work well within their scope but do not protect the general task_update flow.
Why This Matters
For any multi-step workflow that relies on task state (audits, migrations, long refactors), task state becomes meaningless if it is not tied to evidence. The model can create the appearance of healthy progress while the actual output remains incomplete.
This is not hypothetical. #1921 reports that agents bypass Ralph/ULW loops by outputting completion promises without doing work. The same pattern applies to the task system at large.
Proposed Solution
A. Near-duplicate detection on task creation
When task_create is called, check for existing tasks within the same thread or plan scope that have high subject similarity. Options:
- Exact subject match → reject or return existing task id
- High similarity (configurable threshold) → warn and suggest continuing the existing task instead
Keep the implementation simple. A subject string comparison within the same threadID scope is sufficient for a first pass.
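A minimal sketch of what such a first pass could look like. The TaskRecord shape, the findDuplicate helper, the word-overlap (Jaccard) similarity measure, and the 0.8 default threshold are all assumptions made for illustration; none of them come from the existing code in src/tools/task/task-create.ts. The sketch also assumes that only open tasks are compared, so that re-creating work that was already completed is not flagged.

```typescript
// Hypothetical task shape; the real schema may differ.
interface TaskRecord {
  id: string;
  threadID: string;
  subject: string;
  status: "pending" | "in_progress" | "completed";
}

type DuplicateCheck =
  | { kind: "none" }
  | { kind: "exact"; existingId: string }
  | { kind: "similar"; existingId: string; score: number };

// Lowercase, replace punctuation with spaces, collapse whitespace.
function normalizeSubject(subject: string): string {
  return subject
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .trim()
    .replace(/\s+/g, " ");
}

// Unique words of a normalized subject.
function uniqueWords(subject: string): string[] {
  const words = normalizeSubject(subject).split(" ").filter((w) => w.length > 0);
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|.
function subjectSimilarity(a: string, b: string): number {
  const wordsA = uniqueWords(a);
  const wordsB = uniqueWords(b);
  const shared = wordsA.filter((w) => wordsB.indexOf(w) !== -1).length;
  const union = wordsA.length + wordsB.length - shared;
  return union === 0 ? 0 : shared / union;
}

// Compare a new subject against open tasks in the same thread.
// An exact (normalized) match wins immediately; otherwise the best
// match at or above the threshold is reported as "similar".
function findDuplicate(
  existing: TaskRecord[],
  threadID: string,
  subject: string,
  threshold = 0.8
): DuplicateCheck {
  const target = normalizeSubject(subject);
  let best: DuplicateCheck = { kind: "none" };
  let bestScore = 0;
  for (const task of existing) {
    if (task.threadID !== threadID || task.status === "completed") continue;
    if (normalizeSubject(task.subject) === target) {
      return { kind: "exact", existingId: task.id };
    }
    const score = subjectSimilarity(task.subject, subject);
    if (score >= threshold && score > bestScore) {
      best = { kind: "similar", existingId: task.id, score };
      bestScore = score;
    }
  }
  return best;
}
```

Under these assumptions, task_create could return the existing id on an "exact" result and attach a warning on a "similar" result, which matches the two options listed above.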
B. Optional completion guards on status: "completed" transition
Add an opt-in mechanism for completion validation when a task transitions to completed. Possible guard types:
| Guard | Description |
| --- | --- |
| artifact_exists | Specified file path must exist on disk |
| file_changed | Specified file must have been modified since task creation (mtime check) |
| verification_command_passed | A shell command must exit with code 0 |
| field_populated | A required metadata field must be non-empty |
Implementation sketch:
    // In task-update.ts, before setting status
    if (validatedArgs.status === "completed" && task.completionContract) {
      const results = await evaluateGuards(task.completionContract);
      if (!results.allPassed) {
        return `Completion blocked: ${results.failures.join("; ")}`;
      }
    }
Contracts could be declared at task creation time via a completion_contract field, or configured globally per task class.
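One possible shape for evaluateGuards, sketched under stated assumptions. The Guard union mirrors the four guard types in the table, but the field names (path, command, field) and the GuardContext interface are hypothetical. The sketch takes a second context argument that the implementation sketch above does not have: file system and shell access are injected through it so the guard logic can be tested without touching disk. A real implementation would back fileMtimeMs and runCommand with actual file and process calls.

```typescript
// Hypothetical guard declarations, one variant per guard type in the table.
type Guard =
  | { type: "artifact_exists"; path: string }
  | { type: "file_changed"; path: string }
  | { type: "verification_command_passed"; command: string }
  | { type: "field_populated"; field: string };

// Hypothetical context: everything the guards need from the outside world.
interface GuardContext {
  taskCreatedAtMs: number;
  metadata: Record<string, unknown>;
  // Resolves to the file's mtime in ms, or null when the file does not exist.
  fileMtimeMs(path: string): Promise<number | null>;
  // Resolves to the command's exit code.
  runCommand(command: string): Promise<number>;
}

interface GuardResults {
  allPassed: boolean;
  failures: string[];
}

// Evaluate every guard and collect all failures instead of stopping at the
// first one, so the caller can report everything that is still missing.
async function evaluateGuards(
  contract: Guard[],
  ctx: GuardContext
): Promise<GuardResults> {
  const failures: string[] = [];
  for (const guard of contract) {
    switch (guard.type) {
      case "artifact_exists": {
        const mtime = await ctx.fileMtimeMs(guard.path);
        if (mtime === null) failures.push(`artifact missing: ${guard.path}`);
        break;
      }
      case "file_changed": {
        const mtime = await ctx.fileMtimeMs(guard.path);
        if (mtime === null || mtime <= ctx.taskCreatedAtMs) {
          failures.push(`file not modified since task creation: ${guard.path}`);
        }
        break;
      }
      case "verification_command_passed": {
        const code = await ctx.runCommand(guard.command);
        if (code !== 0) {
          failures.push(`command exited with ${code}: ${guard.command}`);
        }
        break;
      }
      case "field_populated": {
        const value = ctx.metadata[guard.field];
        const empty =
          value === undefined ||
          value === null ||
          (typeof value === "string" && value.trim() === "");
        if (empty) failures.push(`metadata field empty: ${guard.field}`);
        break;
      }
    }
  }
  return { allPassed: failures.length === 0, failures };
}
```

Collecting all failures fits the `results.failures.join("; ")` message in the sketch above: the model sees every unmet guard in one response rather than discovering them one retry at a time.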
C. BlockedBy enforcement
When a task with non-empty blockedBy is marked completed, verify that all blocking tasks are already completed. This is a minimal integrity check that should be on by default.
Currently task-list.ts filters out completed blockers for display (lines 58-62), but task-update.ts does not check blockers at all.
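The check itself can be small. The sketch below assumes a hypothetical BlockableTask shape and a lookup function that resolves a task id to its record; the actual storage access in task-update.ts may look different. It also assumes that a blocker id that cannot be resolved should count as unresolved, which is the conservative choice but is not something the current code specifies.

```typescript
// Hypothetical minimal task shape for the blocker check.
interface BlockableTask {
  id: string;
  status: string;
  blockedBy?: string[];
}

// Return the ids of blockers that are not yet completed.
// Assumption: a blocker that cannot be found is treated as unresolved.
function findUnresolvedBlockers(
  task: BlockableTask,
  lookup: (id: string) => BlockableTask | undefined
): string[] {
  return (task.blockedBy ?? []).filter((id) => {
    const blocker = lookup(id);
    return blocker === undefined || blocker.status !== "completed";
  });
}

// Return an error message when the transition must be rejected, else null.
// Only the transition to "completed" is guarded; other statuses pass through.
function checkCompletionAllowed(
  task: BlockableTask,
  nextStatus: string,
  lookup: (id: string) => BlockableTask | undefined
): string | null {
  if (nextStatus !== "completed") return null;
  const unresolved = findUnresolvedBlockers(task, lookup);
  return unresolved.length === 0
    ? null
    : `Completion blocked: unresolved blockers ${unresolved.join(", ")}`;
}
```

Calling checkCompletionAllowed before the direct status assignment would be enough to enforce the rule, and returning a message rather than throwing keeps it consistent with the string-returning style of the completion guard sketch.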
Success Criteria
- A model cannot mark substantial work complete without producing the required artifact; if the artifact is missing, the completion check fails explicitly.
- Duplicate task storms are blocked or surfaced early via warnings.
- Tasks with unresolved blockedBy cannot be marked completed.
- Existing completion proof patterns (ralph-loop Oracle verification, atlas CompletionGate) remain unaffected.
Scope
This does not need to land all at once. Suggested phases:
- Phase 1: BlockedBy enforcement on completed transition (smallest, highest value)
- Phase 2: Near-duplicate detection on task creation
- Phase 3: Optional completion guards with declaration API
Related