Merged
6 changes: 3 additions & 3 deletions OPENSPEC-RALPH-BP.md
@@ -343,13 +343,13 @@ Authoring rules:
- **Resolve or explicitly defer policy before writing tasks.** Phrases like "may be shared or tenant-specific," "one option is," or "could support later" are fine while exploring; they are blockers once the loop starts. Resolve algorithms, fallback behavior, retention math, config shape, failure taxonomy, and compatibility-window behavior in `design.md`.
- **Specs must be deterministic.** If two good implementers could read the spec and make materially different choices, the spec is not loop-safe yet.
- **If a dedicated coverage artifact exists** (such as a `figma-route-map.md`), route and shared-surface tasks should reuse it as the durable source of truth instead of rediscovering coverage each iteration.
- **Run with full OpenSpec context when available.** Repo guidance favors `./scripts/ralph-run.sh tasks <change>` over raw `tasks.md` mode because `opsx-apply` reloads proposal, design, specs, and tasks each iteration. If you run raw `prd.json` or raw `tasks.md` mode, push more detail down into each item because the companion docs will not be reloaded.
- **Run with full OpenSpec context when available.** Repo guidance favors `./scripts/ralph-run.sh tasks <change>` over raw `tasks.md` mode because `opsx-apply` provides the agent with a manifest of OpenSpec artifact paths (`## OpenSpec Artifacts`) so the agent can read proposal, design, and specs as needed each iteration. If you run raw `prd.json` or raw `tasks.md` mode, push more detail down into each item because the companion docs will not be listed in the manifest.

### Loop-prompt / wrapper instructions

At minimum, the loop prompt must tell the agent to:

- Read `proposal.md`, `design.md`, `specs/**`, and `tasks.md` at the start of every iteration.
- Read the OpenSpec artifacts listed in `## OpenSpec Artifacts` (proposal, design, specs) before implementing the current task.
- Inspect prior iteration state before starting new work.
- Implement exactly one task per iteration.
- Run the exact validators relevant to that task.
@@ -508,7 +508,7 @@ The `tenant-scoped-content-versioning` example and subsequent reviews produced t

4. **Every wide task needs explicit "done when" signals.** Verbs like `ensure`, `validate`, `keep`, or `support` are too soft on their own.

5. **Full OpenSpec context is better than raw task-file mode.** Repo guidance favors `./scripts/ralph-run.sh tasks <change>` over raw `tasks.md` mode because `opsx-apply` reloads proposal, design, specs, and tasks. A task list can be shorter when the design/specs fully resolve tricky decisions, but only if the loop actually reloads those artifacts each iteration.
5. **Full OpenSpec context is better than raw task-file mode.** Repo guidance favors `./scripts/ralph-run.sh tasks <change>` over raw `tasks.md` mode because `opsx-apply` provides a manifest (`## OpenSpec Artifacts`) listing artifact paths so the agent can read proposal, design, and specs as needed. A task list can be shorter when the design/specs fully resolve tricky decisions, but only if the loop actually references those artifacts each iteration.

6. **"Done when" gates are hard stops, not soft guidelines.** The most common single-task quality failure is a loop marking a task complete after a `Done when` check failed, with a rationalization note. The bypass is self-authorized: the loop decides the gate does not apply, skips it, and moves on, recording a completion claim that the stated verifier never confirmed.

38 changes: 23 additions & 15 deletions RALPH-METHODOLOGY-ASSESSMENT.md
@@ -60,7 +60,7 @@ OpenSpec specs → docs (README/QUICKSTART/BOTW) → archived artifacts.
| P2 | Iterative loop with limits | verified | high |
| P3 | tasks.md as single source of truth | verified | high |
| P4 | Symlink architecture for task sharing | verified | high |
| P5 | Fresh context per iteration (PRD snapshot + live task context)| verified | high |
| P5 | Fresh context per iteration (manifest-style OpenSpec Artifacts + bounded task context)| verified | high |
| P6 | Iteration numbering aligned with tasks| partially-verified | medium |
| P7 | Structured git commit format | verified | high |
| P8 | Auto-resume on restart | verified | high |
@@ -137,18 +137,23 @@ file state simultaneously" confirms the shared-access invariant holds at
runtime. `tests/integration/test-symlink-macos.bats` provides platform-specific
end-to-end coverage.

#### P5 — Fresh context per iteration (PRD snapshot + live task context)
#### P5 — Fresh context per iteration (manifest-style OpenSpec Artifacts + bounded task context)

`lib/mini-ralph/runner.js:95` calls `prompt.render(options, iterationCount)`
inside the loop on every iteration. `lib/mini-ralph/prompt.js:82-89` reads
`tasksFile` content fresh on every call and exposes the loop-start prompt body as
`{{base_prompt}}`; `lib/mini-ralph/tasks.js:152-180` — `taskContext()` always
reads live `tasks.md`. The bash side generates the PRD once at loop start in
`scripts/ralph-run.sh:968-979`, then reuses it for the rest of the run.
inside the loop on every iteration. The iteration prompt uses a manifest shape:
`scripts/ralph-run.sh:create_prompt_template()` writes a `## OpenSpec Artifacts`
section that lists artifact file paths (proposal, design, specs, plus
`.ralph/PRD.md` as a convenience copy), and when a repo-root `AGENTS.md` is
present, includes it in the same manifest. The task-context surface is bounded to
`## Current Task` + `## Progress` (a single `N of M tasks complete` bullet) via
`lib/mini-ralph/tasks.js:taskContext()`. The bash side generates `.ralph/PRD.md`
once at loop start in `scripts/ralph-run.sh`, then reuses it for the rest of the
run as a pre-concatenated convenience copy of the artifacts.
`tests/unit/javascript/mini-ralph-prompt.test.js:149` — "injects fresh
task_context when tasksFile is present" and
`tests/unit/bash/test-prd-task-context-injection.bats` confirm that each
iteration receives up-to-date task state with no stale context carry-over.
`tests/unit/bash/test-prd-omits-task-context.bats` confirm that the PRD does not
carry task-context injection, and that each iteration receives only the bounded
current-task and progress context with no stale carry-over.

#### P7 — Structured git commit format with task numbers

@@ -449,17 +454,20 @@ same file.

---

### P5 — Fresh context per iteration via PRD snapshot + live task context
### P5 — Fresh context per iteration via manifest-style OpenSpec Artifacts + bounded task context

**Full claim:** The loop re-renders prompt context every iteration from a
loop-start PRD snapshot plus live `tasks.md`, current-task context, recent loop
signals, and pending injected context.
manifest that lists OpenSpec artifact paths (agent reads them as needed), a
bounded task-context surface (`## Current Task` + `## Progress`), recent loop
signals, and pending injected context. `.ralph/PRD.md` is still generated once
at loop start as a pre-concatenated convenience copy of proposal/specs/design.
When a repo-root `AGENTS.md` is present it is surfaced in the same manifest.

| Field | Value |
|-------|-------|
| Verdict | `verified` — `prompt.render()` is called inside the runner loop on every iteration and reads live `tasks.md` content each time, while `PRD.md` is generated once at loop start and then reused. Confirmed by unit tests for prompt rendering and PRD generation. Confidence: **high**. |
| Implementation evidence | `scripts/ralph-run.sh:404-444` — `generate_prd()` reads proposal, specs, and design and writes `$ralph_dir/PRD.md`; `ralph-run.sh:968-979` — PRD is generated before the loop starts; `lib/mini-ralph/prompt.js:83-107` — `render()` reads `tasksFile` content and `taskContext` fresh on every iteration call, exposes `{{base_prompt}}`, and injects commit-contract text; `lib/mini-ralph/tasks.js:152-180` — `taskContext()` always reads live `tasks.md`; `lib/mini-ralph/runner.js:95` — `prompt.render(options, iterationCount)` called inside the while loop |
| Test evidence | `tests/unit/javascript/mini-ralph-prompt.test.js` — `render()` suite (lines 104–217): `renders template with iteration variables` (line 110), `injects tasks content when tasksFile is present` (line 131), `injects fresh task_context when tasksFile is present` (line 149); `tests/unit/bash/test-generate-prd.bats` — `generate_prd: generates PRD with all required sections` (line 16), `generate_prd: includes current task context when available` (line 162), `generate_prd: includes completed tasks in context` (line 377); `tests/unit/bash/test-prd-task-context-injection.bats` — validates task context is injected per-call |
| Verdict | `verified` — `prompt.render()` is called inside the runner loop on every iteration and reads live `tasks.md` content each time; the iteration prompt lists OpenSpec artifact paths in `## OpenSpec Artifacts` rather than inlining their content; `taskContext()` emits only current-task + progress. Confirmed by unit tests for prompt rendering, PRD generation, and task-context shape. Confidence: **high**. |
| Implementation evidence | `scripts/ralph-run.sh:create_prompt_template()` — manifest heredoc lists artifact paths under `## OpenSpec Artifacts` and probes `AGENTS.md` via `probe_agents_md()`; `ralph-run.sh:generate_prd()` — generates `.ralph/PRD.md` before the loop as a convenience copy, no task-context appended; `lib/mini-ralph/tasks.js:taskContext()` — emits only `## Current Task` and `## Progress` (with a single `N of M tasks complete` bullet); `lib/mini-ralph/runner.js:95` — `prompt.render(options, iterationCount)` called inside the while loop |
| Test evidence | `tests/unit/javascript/mini-ralph-tasks.test.js` — `taskContext()` suite: `returns current task heading` (bounded shape); `tests/unit/bash/test-create-prompt-template.bats` — `includes OpenSpec Artifacts manifest section`, `AGENTS.md present adds entry to manifest`; `tests/unit/bash/test-prd-omits-task-context.bats` — asserts PRD does NOT contain `## Current Task Context` or `## Completed Tasks for Git Commit`; `tests/unit/bash/test-generate-prd.bats` — `does not include current task context section` |

---

97 changes: 72 additions & 25 deletions lib/mini-ralph/runner.js
@@ -557,12 +557,48 @@ function _formatAutoCommitMessage(iteration, completedTasks) {
* @param {Array<object>} recentHistory
* @returns {string}
*/
function _firstNonEmptyLine(text, limit) {
  if (!text) return '';
  const lines = text.split('\n');
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed.length > 0) {
      return trimmed.slice(0, limit);
    }
  }
  return '';
}

function _failureFingerprint(entry, errorEntries) {
  let stderrHead = '';
  if (errorEntries) {
    const match = errors.matchIteration(errorEntries, entry.iteration);
    stderrHead = _firstNonEmptyLine(match && match.stderr, 120);
  }
  return JSON.stringify({
    failureStage: entry.failureStage || '',
    exitCode: entry.exitCode,
    stderrHead,
  });
}

function _isEmptyFingerprint(fingerprint) {
  try {
    const obj = JSON.parse(fingerprint);
    return !obj.failureStage && obj.exitCode === 0 && !obj.stderrHead;
  } catch {
    return false;
  }
}

function _buildIterationFeedback(recentHistory, errorEntries) {
if (!Array.isArray(recentHistory) || recentHistory.length === 0) {
return '';
}

const problemLines = [];
// Track fingerprint -> first iteration number for dedup
const fingerprintSeen = new Map();

for (const entry of recentHistory) {
const issues = [];
@@ -579,37 +615,46 @@ function _buildIterationFeedback(recentHistory, errorEntries) {
issues.push(`commit anomaly: ${entry.commitAnomaly}`);
}

if (!entry.filesChanged || entry.filesChanged.length === 0) {
issues.push('no files changed');
}

if (!entry.completionDetected && !entry.taskDetected) {
issues.push('no loop promise emitted');
}

if (issues.length > 0) {
let line = `- Iteration ${entry.iteration}: ${issues.join('; ')}.`;

if (_isFailedIteration(entry) && errorEntries) {
const errorDetails = _extractErrorForIteration(errorEntries, entry.iteration);
if (errorDetails) {
line += '\n Error output:';
if (errorDetails.signal) {
line += `\n signal: ${errorDetails.signal}`;
}
if (errorDetails.failureStage) {
line += `\n failure stage: ${errorDetails.failureStage}`;
}
if (errorDetails.stderr) {
line += `\n ${errorDetails.stderr}`;
}
if (errorDetails.stdout) {
line += `\n stdout: ${errorDetails.stdout}`;
// Compute fingerprint for dedup
const fp = _failureFingerprint(entry, errorEntries);
const isRealFailure = !_isEmptyFingerprint(fp);

if (isRealFailure && fingerprintSeen.has(fp)) {
const firstIteration = fingerprintSeen.get(fp);
problemLines.push(
`- Iteration ${entry.iteration}: same failure as iteration ${firstIteration} (see above).`
);
} else {
if (isRealFailure) fingerprintSeen.set(fp, entry.iteration);

let line = `- Iteration ${entry.iteration}: ${issues.join('; ')}.`;

if (_isFailedIteration(entry) && errorEntries) {
const errorDetails = _extractErrorForIteration(errorEntries, entry.iteration);
if (errorDetails) {
line += '\n Error output:';
if (errorDetails.signal) {
line += `\n signal: ${errorDetails.signal}`;
}
if (errorDetails.failureStage) {
line += `\n failure stage: ${errorDetails.failureStage}`;
}
if (errorDetails.stderr) {
line += `\n ${errorDetails.stderr}`;
}
if (errorDetails.stdout) {
line += `\n stdout: ${errorDetails.stdout}`;
}
}
}
}

problemLines.push(line);
problemLines.push(line);
}
}
}

@@ -632,8 +677,8 @@ function _extractErrorForIteration(errorEntries, iteration) {
let stderr = match.stderr || '';
let stdout = match.stdout || '';

if (stderr.length > 2000) stderr = stderr.substring(0, 2000) + '...';
if (stdout.length > 500) stdout = stdout.substring(0, 500) + '...';
if (stderr.length > 500) stderr = stderr.substring(0, 500) + '...';
if (stdout.length > 200) stdout = stdout.substring(0, 200) + '...';

return {
stderr,
@@ -799,4 +844,6 @@ module.exports = {
_failureStageForError,
_errorText,
_appendFatalIterationFailure,
_failureFingerprint,
_firstNonEmptyLine,
};
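
The fingerprint dedup added to `_buildIterationFeedback` above can be illustrated with a standalone sketch. It is a simplified illustration, not the module's code: `errorByIteration` (a plain `Map`) is a hypothetical stand-in for the `errors.matchIteration` lookup, `summarize` is a hypothetical name, and the non-duplicate line format is invented for the example. It skips the `_isEmptyFingerprint` check and assumes every entry is a real failure.

```javascript
// Return the first non-blank line of `text`, trimmed and capped at `limit` chars.
function firstNonEmptyLine(text, limit) {
  if (!text) return '';
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed.length > 0) return trimmed.slice(0, limit);
  }
  return '';
}

// Build a stable string key from failure stage, exit code, and stderr head.
// `errorByIteration` is a hypothetical Map standing in for the error log lookup.
function failureFingerprint(entry, errorByIteration) {
  const match = errorByIteration.get(entry.iteration);
  return JSON.stringify({
    failureStage: entry.failureStage || '',
    exitCode: entry.exitCode,
    stderrHead: firstNonEmptyLine(match && match.stderr, 120),
  });
}

// Emit one full line per distinct failure; repeats collapse to a back-reference.
function summarize(history, errorByIteration) {
  const seen = new Map(); // fingerprint -> first iteration that produced it
  const lines = [];
  for (const entry of history) {
    const fp = failureFingerprint(entry, errorByIteration);
    if (seen.has(fp)) {
      lines.push(
        `- Iteration ${entry.iteration}: same failure as iteration ${seen.get(fp)} (see above).`
      );
    } else {
      seen.set(fp, entry.iteration);
      lines.push(
        `- Iteration ${entry.iteration}: ${entry.failureStage} failed (exit ${entry.exitCode}).`
      );
    }
  }
  return lines;
}

const history = [
  { iteration: 3, failureStage: 'validate', exitCode: 1 },
  { iteration: 4, failureStage: 'validate', exitCode: 1 },
  { iteration: 5, failureStage: 'commit', exitCode: 128 },
];
const errorByIteration = new Map([
  [3, { stderr: '\nTypeError: x is not a function\n  at foo' }],
  [4, { stderr: 'TypeError: x is not a function\n  at bar' }],
  [5, { stderr: 'fatal: not a git repository' }],
]);

const feedback = summarize(history, errorByIteration);
console.log(feedback.join('\n'));
```

Iterations 3 and 4 differ only below the first stderr line, so they share a fingerprint and the second collapses to a one-line back-reference. Iteration 5 has a different stage and exit code and keeps its own line.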
15 changes: 5 additions & 10 deletions lib/mini-ralph/tasks.js
@@ -157,24 +157,19 @@ function taskContext(tasksFile) {
all.find((task) => task.status === 'in_progress') ||
all.find((task) => task.status === 'incomplete') ||
null;
const completed = all.filter((task) => task.status === 'completed');
const completedCount = all.filter((task) => task.status === 'completed').length;
const total = all.length;

const sections = [];

if (current) {
sections.push('## Current Task');
sections.push(`- ${current.fullDescription || current.description}`);
sections.push('');
}

if (completed.length > 0) {
if (sections.length > 0) {
sections.push('');
}
sections.push('## Completed Tasks for Git Commit');
sections.push(
...completed.map((task) => `- [x] ${task.fullDescription || task.description}`)
);
}
sections.push('## Progress');
sections.push(`- ${completedCount} of ${total} tasks complete`);

return sections.join('\n');
}
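
To make the bounded output shape concrete, here is a minimal sketch of the post-change logic on an in-memory task list. `taskContextFromTasks` and the sample tasks are hypothetical; the real `taskContext()` takes a `tasksFile` path, and how it parses that file is not shown in this hunk.

```javascript
// Hypothetical in-memory variant: takes already-parsed tasks instead of a file path.
function taskContextFromTasks(all) {
  // Prefer an in-progress task, then fall back to the first incomplete one.
  const current =
    all.find((task) => task.status === 'in_progress') ||
    all.find((task) => task.status === 'incomplete') ||
    null;
  const completedCount = all.filter((task) => task.status === 'completed').length;
  const total = all.length;

  const sections = [];

  if (current) {
    sections.push('## Current Task');
    sections.push(`- ${current.fullDescription || current.description}`);
    sections.push('');
  }

  // Progress is a single count line; completed tasks are no longer listed.
  sections.push('## Progress');
  sections.push(`- ${completedCount} of ${total} tasks complete`);

  return sections.join('\n');
}

const sampleTasks = [
  { status: 'completed', description: 'Add schema' },
  { status: 'in_progress', description: 'Wire API' },
  { status: 'incomplete', description: 'Docs' },
];

const context = taskContextFromTasks(sampleTasks);
console.log(context);
```

With this shape the context stays the same size no matter how many tasks have been completed, since completed tasks contribute only to the count.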