This file is the canonical tracker for unfinished work and future ambitions. Other documents provide design context, historical phase notes, and limitations, but task status must be updated here.
tracker_version: 1
last_updated: 2026-06-19
status_values: [in_progress, not_started, blocked, done, cancelled]
priority_values: [P0, P1, P2, P3]
scope_values: [scoped, partially_scoped, unscoped]

- Keep each task ID stable. Never reuse an ID.
- Update the existing task instead of creating a duplicate.
- Use exactly one value from each controlled vocabulary above.
- Keep `Status`, `Priority`, and `Scope` as separate fields. `P0` is urgent/correctness-critical, `P1` is the next meaningful work, `P2` is important but non-blocking, and `P3` is long-term research scope.
- A task may be marked `done` only when every `Done when` item is satisfied.
- When completing a task, set `Status: done`, add `Completed: YYYY-MM-DD`, add concise verification evidence, and move it to `Completed Tasks`.
- When adding a task, use the next unused ID in the appropriate range: `CRL-001` through `CRL-099` for active or near-term work and `CRL-101` onward for unscoped ambitions.
- Do not convert explicit non-goals into tasks without a deliberate scope decision. In particular, replay rendering must not recompute simulation, and NLP must not directly mutate simulator state or emit raw actions.
- Update `last_updated` whenever task state or content changes.
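
The vocabulary and ID rules above lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the project: the task record shape (a dict with `id`, `status`, `priority`, `scope`, and `completed` keys), both function names, and the upper bound of 999 implied by the three-digit ID format are assumptions made for illustration.

```python
import re

# Controlled vocabularies copied from the tracker metadata.
STATUS_VALUES = {"in_progress", "not_started", "blocked", "done", "cancelled"}
PRIORITY_VALUES = {"P0", "P1", "P2", "P3"}
SCOPE_VALUES = {"scoped", "partially_scoped", "unscoped"}

ID_PATTERN = re.compile(r"^CRL-(\d{3})$")


def validate_task(task: dict) -> list[str]:
    """Return rule violations for one task record (hypothetical dict shape)."""
    errors = []
    if not ID_PATTERN.match(str(task.get("id", ""))):
        errors.append(f"malformed id: {task.get('id')!r}")
    checks = (
        ("status", STATUS_VALUES),
        ("priority", PRIORITY_VALUES),
        ("scope", SCOPE_VALUES),
    )
    for field, allowed in checks:
        if task.get(field) not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}")
    # A done task must carry its completion date.
    if task.get("status") == "done" and not task.get("completed"):
        errors.append("done tasks need a Completed date")
    return errors


def next_unused_id(existing_ids: set[str], ambition: bool) -> str:
    """Return the ID after the highest one already used in the chosen range.

    Pass every ID ever issued, including done and cancelled tasks, so that
    no ID is reused.
    """
    low, high = (101, 999) if ambition else (1, 99)  # 999 is an assumed cap
    used = [int(m.group(1)) for i in existing_ids if (m := ID_PATTERN.match(i))]
    in_range = [n for n in used if low <= n <= high]
    candidate = max(in_range, default=low - 1) + 1
    if candidate > high:
        raise ValueError("ID range exhausted")
    return f"CRL-{candidate:03d}"
```

The sketch takes the highest issued ID plus one rather than the lowest free slot, so a gap left by a removed entry is never filled by accident.
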
## Task Index

| ID | Priority | Status | Scope | Title |
|---|---|---|---|---|
| CRL-001 | P1 | done | scoped | Harden replay viewer ingestion |
| CRL-002 | P1 | not_started | unscoped | Scope the P11 backend and dashboard |
| CRL-003 | P1 | not_started | partially_scoped | Improve training distribution robustness |
| CRL-004 | P1 | not_started | partially_scoped | Validate reward-shaping removal |
| CRL-005 | P2 | not_started | partially_scoped | Improve teamwork intent and metrics |
| CRL-006 | P2 | done | scoped | Finish replay viewer product polish |
| CRL-007 | P2 | not_started | partially_scoped | Automate browser viewer regression coverage |
| CRL-008 | P2 | not_started | partially_scoped | Profile large-replay performance |
| CRL-101 | P2 | not_started | unscoped | Objective-control mode |
| CRL-102 | P2 | not_started | unscoped | Support/healer role |
| CRL-103 | P2 | not_started | unscoped | Pathfinding and richer arenas |
| CRL-104 | P3 | not_started | unscoped | Explicit targeting and skillshots |
| CRL-105 | P3 | not_started | unscoped | Advanced multi-agent RL |
| CRL-106 | P3 | not_started | unscoped | Live LLM profile translation service |
| CRL-107 | P3 | not_started | unscoped | Fog of war |
## Active and Near-Term Tasks

These tasks represent started subsystems, known gaps, or the next official phase. Resolve these before expanding into lower-priority ambitions unless a task is explicitly deprioritized.
### CRL-002: Scope the P11 backend and dashboard

- Status: `not_started`
- Priority: `P1`
- Scope: `unscoped`
- Area: `product/backend/frontend`
- Current state: P11 is the next roadmap phase. A replay-only frontend exists, but there is no backend, replay catalog, experiment browser, or full dashboard.
- Remaining work:
  - Write a focused P11 design that defines users, workflows, API boundaries, local artifact discovery, security constraints, and non-goals.
  - Decide whether FastAPI is justified and define the minimal API if it is.
  - Separate replay viewing from training/evaluation control surfaces.
  - Define acceptance tests before implementation begins.
- Done when:
  - An approved P11 scope document and implementation plan exist.
  - Backend and frontend ownership boundaries are explicit.
  - Tasks derived from the plan have acceptance criteria and dependencies.
- Dependencies: CRL-001 should inform replay-related API requirements.
- Sources: `README.md` roadmap, `docs/phase_p10.md`, `docs/nlp.md`
### CRL-003: Improve training distribution robustness

- Status: `not_started`
- Priority: `P1`
- Scope: `partially_scoped`
- Area: `training/evaluation`
- Current state: The curriculum-trained ranged policy performs strongly on a narrow scenario with fixed spawns and limited opponent variation. The tank role is not comparably trained.
- Remaining work:
  - Add controlled spawn randomization without weakening determinism.
  - Evaluate against stronger and mixed opponent policies across enough seeds.
  - Train and evaluate the tank-controlled slot.
  - Define generalization gates before making broader learning claims.
- Done when:
  - Evaluation covers randomized spawns, mixed opponents, and both controlled roles.
  - Results include replay inspection and at least 30 fixed seeds per claim.
  - Performance and failure modes are documented without overstating generality.
- Dependencies: Existing curriculum and evaluation framework.
- Sources: `README.md` limitations, `docs/rl_training.md`
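
One way to reconcile spawn randomization with determinism is to derive every spawn from a per-episode seed through a local random generator, so the same seed always reproduces the same layout. The sketch below is illustrative only: `sample_spawns`, its parameters, the square arena, and the choice of seeds 0 through 29 are assumptions, not the project's API or configuration.

```python
import random


def sample_spawns(seed: int, n_agents: int, arena_size: float,
                  margin: float = 1.0) -> list[tuple[float, float]]:
    """Return spawn positions that depend only on the seed (hypothetical helper).

    Assumes a square arena spanning [0, arena_size] on both axes.
    """
    if arena_size <= 2 * margin:
        raise ValueError("arena too small for the requested margin")
    # A local generator keeps results independent of global random state.
    rng = random.Random(seed)
    low, high = margin, arena_size - margin
    return [(rng.uniform(low, high), rng.uniform(low, high))
            for _ in range(n_agents)]


# A fixed seed list makes each claim reproducible; 30 matches the
# "at least 30 fixed seeds per claim" completion condition.
EVAL_SEEDS = list(range(30))

if __name__ == "__main__":
    for seed in EVAL_SEEDS[:3]:
        print(seed, sample_spawns(seed, n_agents=4, arena_size=20.0))
```

Committing the seed list alongside the evaluation config would let any reported result be replayed with identical spawns.
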
### CRL-004: Validate reward-shaping removal

- Status: `not_started`
- Priority: `P1`
- Scope: `partially_scoped`
- Area: `training/rewards`
- Current state: Reward shaping remains enabled in the final training stage; evaluation metrics are shaping-independent, but sustained behavior under an annealed or sparse objective has not been demonstrated.
- Remaining work:
  - Design an annealing or fine-tuning experiment that approaches zero shaping.
  - Compare shaped, annealed, and canonical sparse objectives on fixed seeds.
  - Inspect action histograms and representative replays for regression.
- Done when:
  - The experiment is reproducible from committed configs and commands.
  - Results show whether combat behavior survives shaping reduction.
  - Findings and limitations are documented.
- Dependencies: CRL-003 may supply broader evaluation scenarios.
- Sources: `README.md` limitations, `docs/rl_training.md`
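
As a starting point for the annealing experiment, a linear schedule that scales the shaping term down to zero is one simple option. This is a sketch under stated assumptions: the function names, the step-based schedule, and the additive combination of sparse and shaping rewards are all hypothetical and may not match how the project composes its reward.

```python
def shaping_coefficient(step: int, anneal_start: int, anneal_end: int,
                        initial: float = 1.0) -> float:
    """Linearly anneal the shaping weight from `initial` to 0.0.

    The weight stays at `initial` until `anneal_start`, then falls
    linearly and reaches 0.0 at `anneal_end`.
    """
    if anneal_end <= anneal_start:
        raise ValueError("anneal_end must be greater than anneal_start")
    if step <= anneal_start:
        return initial
    if step >= anneal_end:
        return 0.0
    fraction = (step - anneal_start) / (anneal_end - anneal_start)
    return initial * (1.0 - fraction)


def total_reward(sparse_reward: float, shaping_reward: float,
                 coefficient: float) -> float:
    """Combine the sparse objective with a weighted shaping term (assumed additive)."""
    return sparse_reward + coefficient * shaping_reward


if __name__ == "__main__":
    for step in (0, 100, 150, 200, 300):
        c = shaping_coefficient(step, anneal_start=100, anneal_end=200)
        print(step, c, total_reward(1.0, 0.2, c))
```

With a coefficient of 0.0 the combined reward equals the sparse objective, so the shaped, annealed, and sparse conditions can share one code path and differ only in the schedule.
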
### CRL-005: Improve teamwork intent and metrics

- Status: `not_started`
- Priority: `P2`
- Scope: `partially_scoped`
- Area: `replay/evaluation`
- Current state: Some teamwork metrics are best-effort because replay events do not always expose rich target or tactical intent.
- Remaining work:
  - Identify metrics that cannot be made reliable from existing saved data.
  - Propose additive event payloads without changing existing event semantics.
  - Add validator, round-trip, and evaluation tests for approved payloads.
- Done when:
  - Teamwork metrics have documented definitions and required evidence.
  - Metrics report unavailable data explicitly instead of inferring unsupported intent.
  - Any schema evolution is versioned and backward-compatible.
- Dependencies: Replay schema design review.
- Sources: `docs/profiles.md`, `docs/evaluation.md`, `docs/replay_schema.md`
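
The requirement that metrics report unavailable data explicitly can be illustrated with a result type that carries an availability flag and a reason instead of a guessed value. Everything named here is hypothetical: the `MetricResult` type, the `focus_fire_rate` metric, and the `target_id` event field are illustrative assumptions, not the project's replay schema or metric definitions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricResult:
    name: str
    value: Optional[float]  # None whenever the metric is unavailable
    available: bool
    reason: str = ""        # explains why the metric could not be computed


def focus_fire_rate(attack_events: list[dict]) -> MetricResult:
    """Share of attacks aimed at the most-targeted enemy (hypothetical metric).

    Requires a `target_id` on every event; the metric is reported as
    unavailable rather than inferred when any event lacks it.
    """
    name = "focus_fire_rate"
    if not attack_events:
        return MetricResult(name, None, False, "no attack events recorded")
    missing = sum(1 for event in attack_events if "target_id" not in event)
    if missing:
        return MetricResult(
            name, None, False,
            f"target_id missing from {missing} of {len(attack_events)} events",
        )
    counts = Counter(event["target_id"] for event in attack_events)
    top_count = counts.most_common(1)[0][1]
    return MetricResult(name, top_count / len(attack_events), True)


if __name__ == "__main__":
    print(focus_fire_rate([{"target_id": "a"}, {"target_id": "a"}, {"target_id": "b"}]))
    print(focus_fire_rate([{"target_id": "a"}, {}]))
```

Returning a reason string alongside the flag gives evaluation reports something concrete to print when older replays lack the required payload.
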
### CRL-007: Automate browser viewer regression coverage

- Status: `not_started`
- Priority: `P2`
- Scope: `partially_scoped`
- Area: `frontend/testing`
- Current state: Parser, validation, interpolation, and shortcut behavior have unit coverage. Browser behavior still relies on manual acceptance because the in-app browser runner is unavailable in the current Windows sandbox.
- Remaining work:
  - Select a lightweight browser-test harness compatible with Yarn Plug'n'Play.
  - Cover bundled loading, local directory fixtures, recoverable invalid input, playback shortcuts, agent selection/follow, and compact viewports.
  - Add the browser suite to the supported local or CI verification workflow.
- Done when:
  - Browser acceptance scenarios run repeatably without manual file-picker steps.
  - Failures capture actionable DOM state or screenshots.
  - The suite does not require cloud services or live simulation.
- Dependencies: CRL-001 and CRL-006.
- Sources: `docs/3d_replay_viewer.md`
### CRL-008: Profile large-replay performance

- Status: `not_started`
- Priority: `P2`
- Scope: `partially_scoped`
- Area: `frontend/replay/performance`
- Current state: Replay files are read and parsed fully in memory, which is appropriate for current artifacts but has no documented size budget.
- Remaining work:
  - Define representative replay sizes and load/render performance budgets.
  - Measure parsing latency, memory use, timeline updates, and scene rebuild cost.
  - Introduce streaming parsing, indexing, or a Web Worker only if measurements demonstrate a need.
- Done when:
  - Repeatable benchmark inputs and thresholds are documented.
  - Large replay behavior is measured on supported browsers.
  - Any optimization preserves replay fidelity and existing schema semantics.
- Dependencies: CRL-001.
- Sources: `docs/3d_replay_viewer.md`, `docs/replay_schema.md`
## Future Ambitions

These are recognized future directions, not approved implementation plans. Before coding, change `Scope` to `partially_scoped` or `scoped`, define explicit non-goals, split large items into near-term tasks, and update the task index.
### CRL-101: Objective-control mode

- Status: `not_started`
- Priority: `P2`
- Scope: `unscoped`
- Area: `simulation/environment/evaluation`
- Ambition: Add objective zones and objective-aware policies, profiles, observations, rewards, replays, and metrics.
- Required scoping: Win conditions, deterministic resolution, schema changes, reward boundaries, tests, and migration strategy.
- Dependencies: Stable elimination-mode behavior must remain backward-compatible.
- Sources: `docs/CombatRL_Canonical_Project_Spec.md` section 24.2, `docs/profiles.md`
### CRL-102: Support/healer role

- Status: `not_started`
- Priority: `P2`
- Scope: `unscoped`
- Area: `simulation/agents/environment`
- Ambition: Implement the reserved support role with deterministic ally support behavior and complete observation, action, replay, renderer, and evaluation coverage.
- Required scoping: Ability semantics, targeting, cooldowns, balance assumptions, action-space compatibility, and tests.
- Dependencies: Likely CRL-104 for explicit ally targeting decisions.
- Sources: `docs/CombatRL_Canonical_Project_Spec.md` section 24.1, `docs/agents.md`
### CRL-103: Pathfinding and richer arenas

- Status: `not_started`
- Priority: `P2`
- Scope: `unscoped`
- Area: `simulation/geometry`
- Ambition: Add deterministic obstacle-aware movement and scenarios where arena geometry creates meaningful tactical choices.
- Required scoping: Navigation algorithm, collision semantics, determinism, observation impact, performance budgets, and replay compatibility.
- Dependencies: None identified; must not silently alter existing scenarios.
- Sources: `README.md` limitations, simulation config obstacle fields
### CRL-104: Explicit targeting and skillshots

- Status: `not_started`
- Priority: `P3`
- Scope: `unscoped`
- Area: `simulation/actions/agents`
- Ambition: Move beyond `ATTACK_NEAREST` with explicit target IDs and eventually deterministic directional or point-targeted skillshots.
- Required scoping: Action schema versioning, invalid-target behavior, Gymnasium encoding, bot APIs, observations, replay events, and renderer support.
- Dependencies: Foundational for richer support and combat abilities.
- Sources: `docs/agents.md`, `docs/CombatRL_Canonical_Project_Spec.md` section 24.4
### CRL-105: Advanced multi-agent RL

- Status: `not_started`
- Priority: `P3`
- Scope: `unscoped`
- Area: `environment/training/evaluation`
- Ambition: Explore PettingZoo, simultaneous multi-agent learning, shared team policies, centralized critics, self-play, opponent pools, larger teams, and advanced MARL only after simpler baselines justify the complexity.
- Required scoping: Research question, baseline, API choice, compute budget, reproducibility gates, opponent sampling, and evaluation protocol.
- Dependencies: CRL-003 and stable single-agent baselines; do not introduce RLlib or distributed infrastructure prematurely.
- Sources: `docs/evaluation.md`, `docs/rl_environment.md`, `docs/CombatRL_Canonical_Project_Spec.md` sections 24.5-24.6
### CRL-106: Live LLM profile translation service

- Status: `not_started`
- Priority: `P3`
- Scope: `unscoped`
- Area: `nlp/backend`
- Ambition: Expose the existing validated natural-language-to-profile translator through an optional live service without making the LLM a controller.
- Required scoping: Provider-neutral interface, structured outputs, timeouts, cost controls, privacy, deterministic fallback, and offline tests.
- Dependencies: CRL-002 backend scope.
- Sources: `docs/nlp.md`, `docs/phase_p10.md`
- Non-goal: Direct raw-action generation or simulator mutation from LLM output.
### CRL-107: Fog of war

- Status: `not_started`
- Priority: `P3`
- Scope: `unscoped`
- Area: `simulation/observations/rendering`
- Ambition: Add deterministic visibility and partial observability for research scenarios after full-observability baselines are mature.
- Required scoping: Visibility geometry, observation masking, hidden-state replay policy, renderer behavior, evaluation fairness, and compatibility.
- Dependencies: Richer arena geometry may depend on CRL-103.
- Sources: `docs/CombatRL_Canonical_Project_Spec.md` section 24.3
## Completed Tasks

### CRL-001: Harden replay viewer ingestion

- Status: `done`
- Priority: `P1`
- Scope: `scoped`
- Area: `frontend/replay`
- Completed: `2026-06-19`
- Final state:
  - Users can select one local replay directory without an upload or backend.
  - Runtime validation reports missing files, malformed JSON/JSONL, unsupported schema versions, invalid fields, identity mismatches, count mismatches, and invalid frame/event ranges with file/line/field context.
  - Failed imports preserve the currently open replay and offer demo recovery.
- Done when:
  - A user can open an arbitrary valid CombatRL replay without modifying code.
  - Invalid replay input produces actionable errors and does not crash the UI.
  - Loader and validation tests cover success and failure paths.
- Verification:
  - Frontend test suite covers static and local loading, incomplete/duplicate directories, malformed JSONL, schema mismatch, invalid agents, and cross-file consistency.
  - The unchanged bundled replay remains compatible with the authoritative Python validator.
- Dependencies: None.
- Sources: `docs/3d_replay_viewer.md`, `docs/replay_schema.md`
### CRL-006: Finish replay viewer product polish

- Status: `done`
- Priority: `P2`
- Scope: `scoped`
- Area: `frontend/rendering`
- Completed: `2026-06-19`
- Final state:
  - Follow mode tracks the selected agent while preserving camera offset.
  - Playback, seeking, speed, camera, follow, range, and target controls have documented keyboard shortcuts and visible focus/pressed/disabled states.
  - Compact layouts use larger touch targets and scrollable control groups.
  - The scene is lazy-loaded; the initial chunk is separated from the Three.js renderer under a documented 560 kB minified budget.
  - Audio and richer effects remain deferred because current effects already communicate replay state without additional assets or controls.
- Done when:
  - Core controls are keyboard-accessible and usable at supported viewport sizes.
  - The production bundle warning is resolved or accepted with rationale.
  - Optional polish does not introduce live simulation or game-rule logic.
- Verification:
  - Shortcut mapping has unit coverage and TypeScript/build checks pass.
  - Renderer code splitting is visible in production build output.
- Dependencies: CRL-001.
- Sources: `docs/3d_replay_viewer.md`
## Task Template

### CRL-NNN: Short action-oriented title

- Status: `not_started`
- Priority: `P2`
- Scope: `unscoped`
- Area: `subsystem/name`
- Current state or ambition: What exists and what is missing.
- Remaining work or required scoping:
  - Concrete item.
- Done when:
  - Verifiable completion condition.
- Dependencies: Task IDs or `None`.
- Sources: Relevant files, issues, reports, or decisions.