55 changes: 48 additions & 7 deletions skills/content-refinement-agent/SKILL.md
@@ -1,6 +1,6 @@
---
name: content-refinement-agent
description: Step 5 of the PaperOrchestra pipeline (arXiv:2604.05018). Iteratively refine drafts/paper.tex by simulating peer review and applying targeted revisions, with strict accept/revert halt rules. Maintains a worklog and snapshots each iteration so revert is real, not symbolic. TRIGGER when the orchestrator delegates Step 5 or when the user asks to "refine the draft", "iterate on the paper", or "run peer review on this paper".
description: Step 5 of the PaperOrchestra pipeline (arXiv:2604.05018). Iteratively refine drafts/paper.tex by simulating peer review and applying targeted revisions, with strict accept/revert halt rules, deterministic 0-100 decision bands (Accept/Minor/Major/Reject) that drive a target-met early stop, and a Devil's Advocate concession-threshold guard that blocks acceptance on unresolved critical findings. Maintains a worklog and snapshots each iteration so revert is real, not symbolic. TRIGGER when the orchestrator delegates Step 5 or when the user asks to "refine the draft", "iterate on the paper", or "run peer review on this paper".
data_access_level: verified_only
---

@@ -127,6 +127,24 @@ issues a CRITICAL finding that remains unaddressed after all reviewers weigh in,
that finding blocks the "refinement accepted" decision regardless of rubric scores.
Log DA CRITICAL findings in worklog.json: `{da_critical: true, finding: "..."}`.

Record the DA's per-round findings and concession decisions in
`workspace/refinement/da_concessions.json` (schema in `references/da-reviewer.md`)
and enforce the concession-threshold protocol deterministically — this stops the
simulated DA from sycophantically caving:

```bash
python skills/content-refinement-agent/scripts/concession_guard.py \
--log workspace/refinement/da_concessions.json \
--out workspace/refinement/iter<N>/da_guard.json
# exit 0 = clear; exit 1 = standing CRITICAL → force REVERT this iteration;
# exit 2 = a concession was rejected (caving/consecutive) → DA must restate;
# exit 3 = schema error.
```

The guard rejects any concession made at `rebuttal_score < 4` or in a round
immediately following another concession, and restores the affected finding to
"standing". A standing CRITICAL (exit 1) overrides an ACCEPT into a REVERT.

Save to `workspace/refinement/iter<N>/review.json`.

### 2. Score the draft
@@ -144,6 +162,7 @@ The reviewer call produces both qualitative feedback and a per-axis score:
"academic_style": {"score": 68, "justification": "..."}
},
"overall_score": 64.5,
"decision_band": "Major Revision",
"strengths": [...],
"weaknesses": [...],
"questions": [...]
@@ -153,6 +172,12 @@ The reviewer call produces both qualitative feedback and a per-axis score:
Save to `iter<N>/score.json`. (Combined with `review.json` if your host
emits one document; the schemas overlap.)

`decision_band` is derived deterministically from `overall_score` — Accept
(≥80) / Minor Revision (65–79) / Major Revision (50–64) / Reject (<50). Fill it
in with `python skills/content-refinement-agent/scripts/decision_band.py
--score-json iter<N>/score.json` rather than by hand, so it can never disagree
with the number. The bands drive the target-met halt in Step 5.
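
As a rough illustration of the band derivation described above, the mapping can be sketched as a pure function. This is not the contents of `decision_band.py`; the function name is hypothetical, and treating the bands as half-open intervals (so 79.5 still counts as Minor Revision) is an assumption, since the text only lists integer ranges.

```python
def decision_band(overall_score: float) -> str:
    """Map a 0-100 overall score to a decision band.

    Thresholds follow the text above. Fractional scores are assigned by
    half-open intervals, which is an assumption of this sketch.
    """
    if overall_score >= 80:
        return "Accept"
    if overall_score >= 65:
        return "Minor Revision"
    if overall_score >= 50:
        return "Major Revision"
    return "Reject"
```

Under these assumptions, the sample `overall_score` of 64.5 maps to "Major Revision", which matches the `decision_band` value in the example score JSON.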

### 3. Apply revision

Load the **verbatim Content Refinement Agent prompt** at `references/prompt.md`.
@@ -195,6 +220,7 @@ python skills/content-refinement-agent/scripts/score_delta.py \
--curr workspace/refinement/iter<N>/score.json \
--plateau-threshold 1.0 \
--plateau-streak 3 \
--accept-threshold 80 \
--consecutive-small $CONSECUTIVE_SMALL \
> workspace/refinement/iter<N>/delta.json

@@ -208,10 +234,11 @@ print(d['consecutive_small'])
```

Exit codes:
- `0` — ACCEPT (overall improved or tied with non-negative net sub-axis, no plateau)
- `0` — ACCEPT (overall improved or tied with non-negative net sub-axis, below the Accept band, no plateau)
- `1` — REVERT (overall decreased)
- `2` — REVERT (tied overall, but net sub-axis change negative)
- `4` — HALT_PLATEAU (accepted but N consecutive iterations below threshold — stop early)
- `5` — HALT_TARGET_MET (accepted AND reached the Accept band, overall ≥ 80 — stop)

Behavior:

@@ -221,6 +248,15 @@ Behavior:
iterations are unlikely to yield meaningful gains. In practice ~85% of
refinement gain comes in iteration 1; the plateau fires when subsequent
iterations improve by less than 1 point for 3 consecutive rounds.
- **HALT_TARGET_MET (exit 5)**: keep current (it was accepted), but stop — the
paper has reached the Accept band (overall ≥ 80), so there is no reason to
keep iterating and risk a regression. The `delta.json` carries
`decision_band_prev` / `decision_band_curr` for the run report.

**Override — DA CRITICAL.** If `concession_guard.py` (Step 1) returned exit 1
for this iteration, treat the outcome as **REVERT** even when `score_delta.py`
says ACCEPT: roll back to `iter<N-1>/paper.tex` and require the next revision to
address the standing CRITICAL finding.
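
One way a host loop might combine the two exit codes is sketched below. The function and the outcome labels are hypothetical. Applying the override to exits 4 and 5 as well as exit 0 is an assumption, based on those outcomes also keeping an accepted draft.

```python
# Outcome labels for score_delta.py exit codes, as listed above.
SCORE_DELTA_OUTCOMES = {
    0: "ACCEPT",
    1: "REVERT",
    2: "REVERT",
    4: "HALT_PLATEAU",
    5: "HALT_TARGET_MET",
}


def resolve_outcome(score_delta_exit: int, guard_exit: int) -> tuple[str, bool]:
    """Return (final_outcome, da_must_restate) for one iteration.

    Hypothetical helper: a standing DA CRITICAL (guard exit 1) turns any
    accepted outcome into REVERT; guard exit 2 only flags a restatement.
    """
    if guard_exit == 3:
        raise ValueError("concession log has a schema error; fix the log first")
    if guard_exit not in (0, 1, 2):
        raise ValueError(f"unknown concession_guard exit code: {guard_exit}")
    if score_delta_exit not in SCORE_DELTA_OUTCOMES:
        raise ValueError(f"unknown score_delta exit code: {score_delta_exit}")

    da_must_restate = guard_exit == 2
    if guard_exit == 1:
        return "REVERT", da_must_restate
    return SCORE_DELTA_OUTCOMES[score_delta_exit], da_must_restate
```

For example, `resolve_outcome(0, 1)` yields a REVERT even though the score comparison alone would have accepted the draft.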

Always log the decision via `apply_worklog.py --decision ...`.

@@ -229,10 +265,13 @@ Always log the decision via `apply_worklog.py --decision ...`.
Halt the loop when ANY of these is true:

1. Iteration count reaches `ITER_CAP` (default 3).
2. `score_delta.py` returned exit code 1 or 2 (REVERT).
2. `score_delta.py` returned exit code 1 or 2 (REVERT), OR `concession_guard.py`
returned exit 1 (standing DA CRITICAL → forced REVERT).
3. The simulated reviewer's `weaknesses` list is empty (no actionable
feedback to apply).
4. `score_delta.py` returned exit code 4 (HALT_PLATEAU — plateau early-stop).
5. `score_delta.py` returned exit code 5 (HALT_TARGET_MET — reached the Accept
band, overall ≥ 80; promote the current draft).

### 7. Promote the best snapshot

@@ -247,9 +286,9 @@ cp workspace/refinement/iter<best>/paper.pdf workspace/final/paper.pdf

Then in the final report, tell the user:
- How many iterations were run
- The final overall score
- The score trajectory (e.g., "iter0 64.5 → iter1 67.3 (accept) → iter2 69.1 (accept) → iter3 68.9 (revert, halt)")
- Which iteration was promoted
- The final overall score and its decision band (Accept / Minor / Major / Reject)
- The score trajectory with bands (e.g., "iter0 58.0 Major → iter1 67.3 Minor (accept) → iter2 81.0 Accept (halt: target met)")
- Which iteration was promoted, and the halt reason (revert / plateau / target met / iter cap / DA critical)

## Critical safety constraints (App. F.1 page 50–51)

@@ -286,7 +325,9 @@ These rules prevent reward hacking and keep the refinement loop honest.
- `references/writing-quality-check.md` — 5-category anti-AI-prose checklist (pointer to shared)
- `references/ai-failure-modes.md` — 7-mode integrity gate run before first iteration (pointer to shared)
- `references/da-reviewer.md` — Devil's Advocate reviewer protocol and concession rules
- `scripts/score_delta.py` — accept/revert decision from two score JSONs
- `scripts/score_delta.py` — accept/revert/halt decision from two score JSONs; emits decision bands + target-met halt (exit 5)
- `scripts/decision_band.py` — map an overall score to a canonical decision band (Accept/Minor/Major/Reject)
- `scripts/concession_guard.py` — enforce the DA concession-threshold protocol; blocks accept on a standing CRITICAL
- `scripts/score_trajectory.py` — per-dimension score history, regression and plateau detection
- `scripts/apply_worklog.py` — append iteration entries to worklog.json
- `scripts/snapshot.py` — copy paper.tex/paper.pdf into iter<N>/ for rollback
57 changes: 57 additions & 0 deletions skills/content-refinement-agent/references/da-reviewer.md
@@ -44,3 +44,60 @@ If the DA issues a CRITICAL finding, `score_delta.py` exit code is overridden to
continuing.

Log in worklog.json: `{da_critical: true, finding: "..."}`

## Deterministic enforcement: `scripts/concession_guard.py`

The concession threshold and the no-consecutive-concessions iron rule are easy
for a simulated reviewer to quietly relax, which is how it ends up caving. To make them
non-negotiable, record the DA's findings and concession decisions in a
**concession log** and run `concession_guard.py` each iteration. The script
re-derives which concessions are valid and whether any CRITICAL is still
standing; the host agent must obey its verdict over the LLM's prose.

Concession log schema (`workspace/refinement/da_concessions.json`):

```json
{
  "rounds": [
    {
      "round": 1,
      "findings": [
        {
          "id": "F1",
          "severity": "critical",
          "attack": "Sec 4 claims X *causes* Y from correlation only.",
          "rebuttal_score": 2,
          "conceded": false,
          "resolved": false
        }
      ]
    }
  ]
}
```

- `rebuttal_score` (1–5) — the DA's score of the author/revision rebuttal,
using the concession-threshold scale above.
- `conceded` — did the DA drop the attack this round?
- `resolved` — was the underlying issue actually fixed in the revision?

```bash
python skills/content-refinement-agent/scripts/concession_guard.py \
--log workspace/refinement/da_concessions.json \
--out workspace/refinement/iter<N>/da_guard.json
```

Verdict → loop action:

| Guard exit | Meaning | Host action |
|---|---|---|
| 0 | CLEAR — no standing critical, no violations | accept may proceed |
| 1 | BLOCK — a critical is still standing | treat the iteration as **REVERT** (force `score_delta.py` outcome to exit 2) and require the next revision to address it |
| 2 | WARN — a concession was rejected (caving or consecutive) but no critical is blocked | the DA must restate the attack; do not let the rejected concession stand |
| 3 | input / schema error | fix the log |

The guard rejects (does not honor) any concession made at `rebuttal_score < 4`
or in a round immediately following another conceding round, and restores the
affected finding to "standing". A standing CRITICAL blocks acceptance
regardless of rubric scores — this is the deterministic backstop behind the
prose rules above.
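
The verdict rules above can be sketched in a few lines of Python. This is a minimal illustration against the log schema shown earlier, not the actual `concession_guard.py`; the function name is hypothetical. Three choices here are assumptions: "following another conceding round" is read as "the previous round contained at least one honored concession", each finding's status is taken from its latest appearance, and exit 2 looks only at rejections in the most recent round.

```python
def guard_verdict(log: dict) -> int:
    """Return 0 (clear), 1 (standing critical), 2 (rejected concession), 3 (schema error)."""
    try:
        rounds = sorted(log["rounds"], key=lambda r: r["round"])
        prev_round_conceded = False
        status = {}           # finding id -> (severity, still standing?)
        rejected_last = []    # ids whose concession was rejected in the latest round
        for rnd in rounds:
            honored_this_round = False
            rejected_last = []
            for f in rnd["findings"]:
                conceded = bool(f["conceded"])
                # Reject caving (score below 4) and consecutive concessions.
                if conceded and (f["rebuttal_score"] < 4 or prev_round_conceded):
                    conceded = False
                    rejected_last.append(f["id"])
                honored_this_round = honored_this_round or conceded
                standing = not (conceded or f["resolved"])
                status[f["id"]] = (f["severity"], standing)
            prev_round_conceded = honored_this_round
    except (KeyError, TypeError):
        return 3

    if any(sev == "critical" and standing for sev, standing in status.values()):
        return 1
    if rejected_last:
        return 2
    return 0
```

Applied to the example log above (F1 critical, `rebuttal_score` 2, neither conceded nor resolved), this sketch returns 1, so the iteration would be treated as a REVERT.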
34 changes: 34 additions & 0 deletions skills/content-refinement-agent/references/halt-rules.md
@@ -46,10 +46,32 @@ The script exits with:
| 0 | ACCEPT_TIED_NON_NEGATIVE | keep new draft, continue loop |
| 1 | REVERT_OVERALL_DECREASED | rollback to prev, halt loop |
| 2 | REVERT_TIED_NEGATIVE_SUBAXIS | rollback to prev, halt loop |
| 4 | HALT_PLATEAU | keep new draft (accepted), halt loop |
| 5 | HALT_TARGET_MET | keep new draft (accepted), halt loop |

The script also prints a one-line decision string and a JSON object on
stdout for the host agent to log.

## Decision bands and the target-met halt

`score_delta.py` annotates every comparison with the prev/curr **decision band**
(`decision_band.py`): Accept (≥80), Minor Revision (65–79), Major Revision
(50–64), Reject (<50). These give the loop an *absolute* quality target on top of
the *relative* delta rules.

```
if DECISION in {ACCEPT_IMPROVED, ACCEPT_TIED_NON_NEGATIVE}:
    if curr.overall >= accept_threshold (default 80):
        DECISION = HALT_TARGET_MET   # exit 5: keep current draft, stop
    elif consecutive_small >= plateau_streak:
        DECISION = HALT_PLATEAU      # exit 4: keep current draft, stop
```

Target-met takes precedence over plateau: once the paper reaches the Accept
band there is no reason to keep iterating and risk a regression. Both 4 and 5
**keep** the just-accepted draft (unlike REVERT, which rolls back). Disable the
target-met halt with `--no-target-halt` (the band is still reported).
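
A runnable sketch of the decision order described in this section follows. It is an illustration, not the source of `score_delta.py`; the function signature is hypothetical. The streak update (increment when the delta is below the plateau threshold, otherwise reset to 0) is an assumption about how `consecutive_small` is maintained.

```python
def decide(prev_overall: float, curr_overall: float, net_subaxis_delta: float,
           consecutive_small: int, plateau_threshold: float = 1.0,
           plateau_streak: int = 3, accept_threshold: float = 80.0,
           target_halt: bool = True) -> tuple[str, int, int]:
    """Return (decision, exit_code, updated_consecutive_small)."""
    delta = curr_overall - prev_overall

    # Relative rules first: a regression always reverts.
    if delta < 0:
        return "REVERT_OVERALL_DECREASED", 1, consecutive_small
    if delta == 0 and net_subaxis_delta < 0:
        return "REVERT_TIED_NEGATIVE_SUBAXIS", 2, consecutive_small

    decision = "ACCEPT_IMPROVED" if delta > 0 else "ACCEPT_TIED_NON_NEGATIVE"

    # Assumed streak bookkeeping: count accepted iterations with a small gain.
    consecutive_small = consecutive_small + 1 if delta < plateau_threshold else 0

    # Absolute target takes precedence over the plateau check.
    if target_halt and curr_overall >= accept_threshold:
        return "HALT_TARGET_MET", 5, consecutive_small
    if consecutive_small >= plateau_streak:
        return "HALT_PLATEAU", 4, consecutive_small
    return decision, 0, consecutive_small
```

Passing `target_halt=False` mirrors the effect described for `--no-target-halt`: a draft at or above the threshold is accepted but the loop continues unless the plateau rule fires.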

## Loop-level halt conditions

In addition to the per-iteration accept/revert decision, the loop halts
@@ -66,6 +88,18 @@ when ANY of these is true:
`overall_delta < threshold`. Default: threshold=1.0 points, N=3.
Configurable via `--plateau-threshold` and `--plateau-streak`.

5. **Target met (exit code 5).** `score_delta.py` returns `HALT_TARGET_MET`
when an accepted iteration reaches the Accept band (overall ≥ 80). The
current draft is promoted; the loop stops rather than risk a regression.

6. **DA CRITICAL standing (concession guard).** `concession_guard.py` exit 1
means a Devil's Advocate CRITICAL finding is still standing (unresolved and
not validly conceded). This **overrides an ACCEPT into a REVERT**: roll back
and require the next revision to address the finding before continuing. Exit
2 (a rejected concession with no blocked critical) is a WARN — the DA must
restate the attack, but it does not by itself force a revert. See
`da-reviewer.md`.

The calling loop must pass `--consecutive-small <count>` to
`score_delta.py` to track the streak across iterations:

29 changes: 28 additions & 1 deletion skills/content-refinement-agent/references/reviewer-rubric.md
@@ -60,14 +60,34 @@ Then identify:
- Questions: 2-4 specific questions the paper should answer for a
reader to be convinced.
- Decision: one of "Strong Accept", "Accept", "Borderline", "Reject",
"Strong Reject".
"Strong Reject". This is your qualitative judgment; it must be consistent
with the decision band the overall score falls into (see below).
- Overall Score: weighted average 0-100. Use:
overall = 0.20*depth + 0.20*execution + 0.15*flow
+ 0.15*clarity + 0.20*evidence + 0.10*style

Output STRICT JSON only. No prose outside the JSON.
```
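
For reference, the weighted average in the prompt above can be checked with a short sketch. The axis keys are the shorthand names from the formula and the function name is hypothetical; the real score JSON may use longer keys (for example `academic_style`), so a key mapping would be needed in practice.

```python
# Weights from the rubric formula; they sum to 1.0.
WEIGHTS = {
    "depth": 0.20,
    "execution": 0.20,
    "flow": 0.15,
    "clarity": 0.15,
    "evidence": 0.20,
    "style": 0.10,
}


def overall_score(axis_scores: dict) -> float:
    """Weighted average of the six per-axis scores (each 0-100)."""
    missing = WEIGHTS.keys() - axis_scores.keys()
    if missing:
        raise KeyError(f"missing axis scores: {sorted(missing)}")
    return sum(WEIGHTS[axis] * axis_scores[axis] for axis in WEIGHTS)


example = {"depth": 60, "execution": 60, "flow": 70,
           "clarity": 70, "evidence": 60, "style": 68}
print(round(overall_score(example), 1))  # 63.8
```

Recomputing the overall score this way is a cheap consistency check on a reviewer's JSON before the score is compared across iterations.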

## Decision bands (canonical, derived from overall score)

The free-form `decision` above is advisory. The refinement loop reasons about a
**canonical decision band** computed deterministically from `overall_score` by
`scripts/decision_band.py`, so the band can never drift from the number it
summarizes:

| Overall score | Decision band | Loop meaning |
|---|---|---|
| ≥ 80 | **Accept** | Clears the acceptance bar — loop may stop (target met) |
| 65–79 | **Minor Revision** | Close; keep refining presentation |
| 50–64 | **Major Revision** | Substantive gaps remain |
| < 50 | **Reject** | Far from publishable |

The reviewer's qualitative `decision` should agree with the band (e.g. don't
write "Accept" with an overall of 62). The thresholds are configurable on
`decision_band.py` / `score_delta.py` (`--accept-min` etc.) but default to the
table above. See `halt-rules.md` for how the Accept band triggers an early halt.

## Output JSON schema

```json
@@ -99,10 +119,17 @@ Output STRICT JSON only. No prose outside the JSON.
"How does the temporal branch behave on videos longer than the training distribution?"
],
"decision": "Borderline",
"decision_band": "Major Revision",
"overall_score": 64.5
}
```

`decision_band` is filled in deterministically — run
`python scripts/decision_band.py --score-json iter<N>/score.json` and copy the
result, or let `score_delta.py` report it (it emits `decision_band_prev` /
`decision_band_curr` on every comparison). Never hand-set it inconsistently with
`overall_score`.

## How the loop uses this output

The `score_delta.py` script reads two consecutive score JSONs and applies