Integrate Visual Execution Model (VEM): planning priors, eval harness, docs, and repo index #5
    policy_logits = torch.zeros(1, device=state.device)  # placeholder
    # Estimate action priors from a learned policy-query head.
    pooled_next_state = next_state.mean(dim=1) if next_state.ndim > 2 else next_state
    policy_query = self.model.policy_query_head(pooled_next_state)
Suggestion: `_imagine_future` now unconditionally calls `policy_query_head`, but `MCTS`'s model contract/documentation only requires a predictor and value head, and `_expand` still contains fallback logic for models without policy priors. This unconditional call will raise `AttributeError` for compatible models that do not define `policy_query_head`; gate this call with `hasattr` and return an empty/None query when unavailable. [api mismatch]
Severity Level: Major ⚠️
- ❌ MCTS planning crashes with user models lacking policy_query_head.
- ⚠️ Breaks compatibility with models that follow documented MCTS interface.
- ⚠️ Future integrations must add unused heads just to satisfy planner.
Steps of Reproduction ✅
1. Inspect the documented model contract in `MCTS.__init__` at
`models/vjepa/planning.py:120-137`, which states the `model` argument "must have predictor
and value_head" but does not mention a policy prior head.
2. Note that `_expand` is already defensive: it only uses `self.model.policy_query_head`
when `hasattr(self.model, "policy_query_head")` (see `models/vjepa/planning.py:251-255`),
and otherwise falls back to uniform priors (lines 263-266), meaning the planner is
intended to work even when the model lacks a policy query head.
3. Observe that `_imagine_future` now unconditionally calls
`self.model.policy_query_head(pooled_next_state)` at `models/vjepa/planning.py:186-188`,
and returns `(next_state, value, policy_query)` at `line 190`. This helper is used inside
`_expand` when creating children: `next_state, value, _ = self._imagine_future(node.state,
action)` at `lines 15-16` of the second chunk (`models/vjepa/planning.py:260-279`).
4. If a caller instantiates `MCTS` with a custom model that follows the documented
contract (exposes `.predictor.physics_engine` and `.value_head` but no
`.policy_query_head`), then on the first expansion `_imagine_future` will execute
`policy_query = self.model.policy_query_head(pooled_next_state)` (`line 188`), raising
`AttributeError: 'CustomModel' object has no attribute 'policy_query_head'` and causing
any planning call (e.g., `mcts.plan(...)` as used in `check_integrations.py:44-47`) to
fail for otherwise compatible models.
**Path:** models/vjepa/planning.py
**Line:** 188:188
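
One way to read the suggested fix is sketched below. It assumes nothing about the real `MCTS` class: `compute_policy_query`, `ModelWithHead`, and `ModelWithoutHead` are hypothetical names used only for illustration, and the head is stubbed with a plain callable so the example runs without the repository.

```python
def compute_policy_query(model, pooled_state):
    """Return the policy query, or None when the model has no policy_query_head.

    Hypothetical helper: the caller is expected to fall back to uniform
    priors when None is returned, as _expand is described as doing.
    """
    head = getattr(model, "policy_query_head", None)
    if head is None:
        return None
    return head(pooled_state)


class ModelWithHead:
    """Stand-in for a model that exposes the optional head."""

    def policy_query_head(self, pooled_state):
        # Stub: doubles every element instead of running a learned layer.
        return [2.0 * x for x in pooled_state]


class ModelWithoutHead:
    """Stand-in for a model that only follows the documented contract."""


query_present = compute_policy_query(ModelWithHead(), [0.5, 1.0])
query_absent = compute_policy_query(ModelWithoutHead(), [0.5, 1.0])
```

Using `getattr` with a default keeps the lookup and the availability check in one place, so a model without the head never triggers `AttributeError`.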
    logits = torch.matmul(available_actions, policy_query.squeeze(0))
    priors = F.softmax(logits / self.temperature, dim=0)
Suggestion: `self.temperature` can be zero (the planner already supports `temperature == 0` in action selection), but this new prior computation divides by `self.temperature` unconditionally. That will produce `inf`/`nan` logits and unstable priors during expansion. Add the same zero-temperature handling here (e.g., skip scaling or use an argmax prior path) before calling softmax. [falsy zero check]
Severity Level: Major ⚠️
- ❌ MCTS planning with greedy (temperature 0) yields NaN priors.
- ⚠️ Deterministic evaluation runs can produce unstable, non-reproducible plans.
- ⚠️ Any future use of temperature=0 silently corrupts tree search behavior.
Steps of Reproduction ✅
1. Instantiate the MCTS planner with zero temperature using the VisualExecutionModel world
model (as in `check_integrations.py:15-20`):
`mcts = MCTS(model=model, n_simulations=4, temperature=0.0)` (constructor at
`models/vjepa/planning.py:139-157`).
2. Prepare a non-empty action set as done in `check_integrations.py:44-47` (e.g., `actions
= torch.randn(8, cfg.get("action_dim", 128))`) and a latent root state tensor with shape
`(1, D)` or `(1, seq, D)` (see `check_integrations.py:45`).
3. Call `mcts.plan(root_state, actions)` (`models/vjepa/planning.py:90-140`). Inside
`plan`, the root node is initialized (`line 111`) and `_expand(root, available_actions)`
is invoked for the initial expansion (`line 114`).
4. `_expand` at `models/vjepa/planning.py:222-262` computes the policy-query-based priors.
With a model that has `policy_query_head` (VJEPA at `models/vjepa/vjepa_model.py:64-71`),
it obtains `policy_query` (lines 252-255) and then executes `logits =
torch.matmul(available_actions, policy_query.squeeze(0))` and `priors = F.softmax(logits /
self.temperature, dim=0)` at `lines 261-262`. Since `self.temperature == 0.0`, `logits /
self.temperature` produces `inf`/`nan`, and PyTorch's numerically stable softmax over
`[inf, ...]` (after subtracting `max`) yields `nan` probabilities, giving invalid priors
and destabilizing the subsequent search (e.g., `priors.sort()` and `priors[idx].item()` at
`lines 11 and 21` in the same region operate on NaNs).
**Path:** models/vjepa/planning.py
**Line:** 261:262
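
A minimal sketch of the zero-temperature guard, assuming the greedy case should put all prior mass on the highest-scoring action. `action_priors` is a hypothetical standalone function, not the planner's actual method, and the one-hot choice is only one of the options the suggestion mentions.

```python
import torch
import torch.nn.functional as F


def action_priors(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Softmax priors over actions, with a one-hot path when temperature is 0."""
    if temperature <= 0.0:
        # Dividing by zero would give inf/nan logits, so select the argmax directly.
        priors = torch.zeros_like(logits)
        priors[torch.argmax(logits)] = 1.0
        return priors
    return F.softmax(logits / temperature, dim=0)


logits = torch.tensor([0.5, 2.0, -1.0])
greedy = action_priors(logits, temperature=0.0)
soft = action_priors(logits, temperature=1.0)
```

A one-hot prior gives every other action zero prior mass, which may suppress exploration in the tree. If that is undesirable, skipping the scaling (plain softmax over the raw logits) is the gentler alternative.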
        default="config/vjepa_micro.yaml",
        help="Path to YAML config file (e.g., config/vjepa_micro.yaml or config/vjepa_10b.yaml)",
    )
    args = parser.parse_args()
Suggestion: Using strict `parse_args()` can terminate the training script when launched via distributed runners that inject extra CLI flags (commonly `--local-rank`), which is likely in this file since it includes distributed/FSDP paths. Accept known launcher args or ignore unknown args to keep distributed startup working. [api mismatch]
Severity Level: Critical 🚨
- ❌ Multi-GPU FSDP training fails under torchrun/launch invocations.
- ⚠️ Users must avoid standard distributed launchers or patch script.
- ⚠️ CI or cluster jobs using launcher tooling will abort at startup.
Steps of Reproduction ✅
1. Confirm that `vjepa_train.py` is designed for distributed/FSDP training: it imports
`torch.distributed` and FSDP (`lines 7-9`), conditionally wraps the model in `FSDP` when
`dist.is_initialized()` and `world_size > 1` (`lines 137-143`), and uses distributed-aware
optimizers (`build_optimizer` at `lines 19-90`).
2. At the bottom of `vjepa_train.py`, note the CLI setup: the script defines only a
`--config` argument (`parser.add_argument("--config", ...)` at `lines 215-218`) and parses
arguments strictly with `args = parser.parse_args()` at `line 220`.
3. Launch training via a distributed launcher that appends a rank flag to the script's
arguments. The legacy `python -m torch.distributed.launch` adds `--local-rank=<rank>`
(spelled `--local_rank` in older PyTorch releases) unless it is told to use environment
variables; `torchrun` instead exports `LOCAL_RANK` as an environment variable and adds no
script arguments. For example:
`python -m torch.distributed.launch --nproc_per_node=2 vjepa_train.py --config config/vjepa_micro.yaml`.
4. When `vjepa_train.py` executes, `argparse.ArgumentParser.parse_args()` at `line 220`
receives the undeclared flag (e.g., `--local-rank=0`). Argparse treats it as an error,
prints an "unrecognized arguments" message,
and exits with `SystemExit`, preventing `train()` from running and effectively blocking
distributed training despite the FSDP logic being wired up.
**Path:** vjepa_train.py
**Line:** 220:220
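
The sketch below shows one tolerant parsing setup using only the standard library. `parse_cli` is a hypothetical wrapper, and the declared `--local-rank` option is an assumption about which launcher flag needs accepting. Note that `torchrun` passes the rank through the `LOCAL_RANK` environment variable, so the extra flag mainly matters for the older `torch.distributed.launch` entry point.

```python
import argparse


def parse_cli(argv=None):
    """Parse --config and tolerate launcher-injected flags (hypothetical wrapper)."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--config", default="config/vjepa_micro.yaml")
    # Accept both spellings that launchers have used for the rank flag.
    parser.add_argument("--local-rank", "--local_rank", dest="local_rank",
                        type=int, default=0)
    # parse_known_args returns (namespace, leftovers) instead of exiting
    # when it meets a flag that was never declared.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown


args, unknown = parse_cli(
    ["--config", "cfg.yaml", "--local-rank=1", "--extra-flag", "x"]
)
```

Logging the `unknown` list at startup is a cheap way to keep typos in real options from being silently ignored.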
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 057e2568ad
    logits = torch.matmul(available_actions, policy_query.squeeze(0))
    priors = F.softmax(logits / self.temperature, dim=0)
Preserve action indices when using policy-query priors
Once priors are computed from `policy_query`, child expansion is no longer in the original `available_actions` order; however, `_get_action_probabilities` still maps visits back by child creation position (`root.children.index(child)`), not by the original action index. This can make `action_probs` (and therefore `best_action`) point to the wrong action whenever priors reorder candidates, which directly degrades planning correctness.
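
To illustrate the index bookkeeping this comment asks for, here is a small self-contained sketch. The `Node` dataclass, `expand_sorted`, and `action_probabilities` are hypothetical stand-ins, not the planner's real classes; the only point is that each child stores the index of the action it came from.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    action_index: int = -1  # index into the original available_actions
    visit_count: int = 0
    children: List["Node"] = field(default_factory=list)


def expand_sorted(root: Node, priors: List[float]) -> None:
    """Create children in descending-prior order, remembering each original index."""
    order = sorted(range(len(priors)), key=lambda i: priors[i], reverse=True)
    root.children = [Node(action_index=i) for i in order]


def action_probabilities(root: Node, num_actions: int) -> List[float]:
    """Map visit counts back by stored action index, not by child position."""
    probs = [0.0] * num_actions
    total = sum(child.visit_count for child in root.children)
    for child in root.children:
        probs[child.action_index] = child.visit_count / total if total else 0.0
    return probs


root = Node()
expand_sorted(root, priors=[0.1, 0.7, 0.2])  # child order: actions 1, 2, 0
root.children[0].visit_count = 6  # most visits land on action 1
root.children[1].visit_count = 3
root.children[2].visit_count = 1
probs = action_probabilities(root, num_actions=3)
best_action = max(range(3), key=lambda i: probs[i])
```

With position-based mapping the six visits would have been credited to action 0. Storing the index on the node keeps the result correct regardless of expansion order.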
    pt, ph, pw = cfg["encoder"]["patch_size"]
    seq_len = (t // pt) * (h // ph) * (w // pw)
    num_mask = max(1, seq_len // 4)
    mask = torch.randperm(seq_len)[:num_mask]
Construct a boolean full-length mask in integration check
`apply_mask` expects a boolean mask over all `seq_len` patches, but this script passes a shortened `LongTensor` of selected indices. In that case, `~mask` becomes negative integer indexing and the model sees arbitrary patch selections instead of true visible/masked partitions, so this smoke test can report success even if masking integration is broken.
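
A possible way to build the full-length boolean mask is sketched here. `random_patch_mask` is a hypothetical helper, and the one-dimensional `(seq_len,)` shape is an assumption, since the exact signature of `apply_mask` is not shown in this thread.

```python
import torch


def random_patch_mask(seq_len: int, mask_ratio: float = 0.25) -> torch.Tensor:
    """Boolean mask of shape (seq_len,), True at the masked patch positions."""
    num_mask = max(1, int(seq_len * mask_ratio))
    masked_idx = torch.randperm(seq_len)[:num_mask]
    mask = torch.zeros(seq_len, dtype=torch.bool)
    mask[masked_idx] = True
    return mask


seq_len = 16
mask = random_patch_mask(seq_len)
tokens = torch.arange(seq_len)
# With a boolean mask, ~mask is a logical NOT, so the two selections
# partition the sequence into visible and hidden patches.
visible, hidden = tokens[~mask], tokens[mask]
```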
    model = VJEPA(
        cfg["encoder"],
        cfg["predictor"],
        cfg["training"]["ema_momentum"],
        action_dim=128,
    ).to(device).eval()
Suggestion: This evaluation also runs on a newly initialized network and never restores trained parameters, so robustness metrics will reflect random weights rather than actual model invariance. Load a checkpoint before computing latent-consistency scores. [incomplete implementation]
Severity Level: Major ⚠️
- ❌ Perception robustness eval script measures randomly initialized VJEPA weights.
- ⚠️ Phase-2 robustness gating cannot assess actual trained model invariance.
Steps of Reproduction ✅
1. From the repository root, run the perception eval harness: `python
evaluate_perception.py --config config/vjepa_micro.yaml` (entrypoint guarded by `if
__name__ == "__main__":` at evaluate_perception.py:88-89).
2. In `main()` (evaluate_perception.py:46-51), the script parses `--config`, `--seed`, and
`--save-dir`, then loads the YAML config into `cfg` at lines 55-56.
3. The model is instantiated at lines 58-63: `model = VJEPA(cfg["encoder"],
cfg["predictor"], cfg["training"]["ema_momentum"], action_dim=128).to(device).eval()`,
with no call to `load_state_dict()`, no checkpoint path argument, and no other
weight-loading logic anywhere in evaluate_perception.py.
4. A synthetic random `video` tensor is generated at lines 65-67 and passed to
`model.context_encoder` inside `latent_consistency()` (evaluate_perception.py:39-43), so
all latent consistency metrics written at lines 69-82 and printed at line 85 are computed
using a freshly initialized VJEPA with random weights rather than any trained checkpoint.
**Path:** evaluate_perception.py
**Line:** 58:63
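
As a rough sketch of the missing weight-restore step, the helper below loads a state dict before evaluation. `load_weights` and the `"model"` wrapper key are assumptions made for illustration; the repository's real checkpoint layout is not shown here, and a small `nn.Linear` stands in for `VJEPA`.

```python
import io

import torch
import torch.nn as nn


def load_weights(model: nn.Module, checkpoint) -> nn.Module:
    """Restore parameters from a path or file-like object, then switch to eval mode."""
    state = torch.load(checkpoint, map_location="cpu")
    # Accept a bare state dict or one wrapped under a "model" key (assumed layout).
    if isinstance(state, dict) and "model" in state:
        state = state["model"]
    model.load_state_dict(state)
    return model.eval()


# Stand-in for a trained network, saved to an in-memory buffer.
trained = nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save({"model": trained.state_dict()}, buffer)
buffer.seek(0)

fresh = load_weights(nn.Linear(4, 2), buffer)
```

In the evaluation script this would sit between model construction and the metric computation, with the path supplied by a new command-line argument.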
    available_actions = torch.randn(num_actions, 128, device=device)
    pooled = traj_a[-1].mean(dim=1)
    query = model.policy_query_head(pooled).squeeze(0)
    logits = torch.matmul(available_actions, query)
    probs = torch.softmax(logits, dim=0)
    max_prior = float(probs.max().item())
Suggestion: `--num-actions` is used without validation, so passing `0` (or a negative value) will produce an empty/invalid action tensor and then fail at `probs.max()`/softmax path. Enforce `num_actions > 0` before metric computation to avoid runtime crashes. [incorrect condition logic]
Severity Level: Major ⚠️
- ❌ World-model eval crashes when `--num-actions` is zero.
- ⚠️ Automated evaluation pipelines fail under invalid CLI input.
Steps of Reproduction ✅
1. From the repository root, run the world-model eval harness with zero actions: `python
evaluate_world_model.py --config config/vjepa_micro.yaml --num-actions 0` (CLI defined in
`main()` at evaluate_world_model.py:116-123).
2. `argparse` parses `--num-actions` into `args.num_actions` at lines 117-123, and
`main()` passes this value directly into `evaluate_metrics()` at lines 137-142 without any
validation or clamping.
3. Inside `evaluate_metrics()` (evaluate_world_model.py:76-107), `available_actions =
torch.randn(num_actions, 128, device=device)` at line 100 creates a tensor of shape `(0,
128)` when `num_actions == 0`, and `logits = torch.matmul(available_actions, query)` at
line 103 yields an empty logits tensor.
4. `probs = torch.softmax(logits, dim=0)` at line 104 returns an empty probability tensor,
and `probs.max().item()` at line 105 then raises a runtime error because max-reduction on
an empty tensor is undefined, causing the evaluation script to crash instead of producing
metrics.
**Path:** evaluate_world_model.py
**Line:** 100:105
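
One standard-library way to enforce `num_actions > 0` at parse time is a custom argparse `type`. In this sketch `positive_int` is a hypothetical helper and the default of 8 is an assumption, since the script's actual default is not quoted in this thread.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type that rejects zero and negative values."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError(
            f"expected a positive integer, got {value}"
        )
    return value


parser = argparse.ArgumentParser()
parser.add_argument("--num-actions", type=positive_int, default=8)

args = parser.parse_args(["--num-actions", "4"])

try:
    parser.parse_args(["--num-actions", "0"])
    rejected = False
except SystemExit:
    # argparse prints a usage error and exits instead of returning.
    rejected = True
```

Validating in the parser means the failure is reported as a usage error before any model is built, rather than surfacing later as a reduction over an empty tensor.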
    model = VJEPA(
        config["encoder"],
        config["predictor"],
        config["training"]["ema_momentum"],
        action_dim=128,
    ).to(device)
Suggestion: The script reports world-model evaluation metrics from a freshly initialized model because it never loads trained weights; this makes the output non-representative for regression gating or model quality checks. Add a checkpoint argument and load state dict before running metrics. [incomplete implementation]
Severity Level: Major ⚠️
- ❌ World-model metrics reflect untrained random network, not production.
- ⚠️ Regression gates misjudge model quality and temporal consistency.
Steps of Reproduction ✅
1. Run the world-model evaluation harness from the project root: `python
evaluate_world_model.py --config config/vjepa_micro.yaml` (entrypoint at
evaluate_world_model.py:165-166).
2. In `main()` (evaluate_world_model.py:116-123), the script parses CLI arguments and
loads the YAML config into `config` at lines 127-128, but it does not parse or accept any
checkpoint path parameter.
3. The model is constructed at lines 130-135: `model = VJEPA(config["encoder"],
config["predictor"], config["training"]["ema_momentum"], action_dim=128).to(device)`, with
no corresponding `load_state_dict()`, `torch.load()`, or other weight-restore call
anywhere in evaluate_world_model.py.
4. `evaluate_metrics()` (evaluate_world_model.py:76-113) then uses a random latent `z0`
(lines 80-82) and random action sequences (lines 84-85) together with this freshly
initialized VJEPA to compute rollout drift, trajectory divergence, and action prior
metrics, so the JSON manifest written at lines 144-159 and printed at 161-162 reports
metrics for a random, untrained network rather than the trained model that
evaluation/regression gating is intended to check.
**Path:** evaluate_world_model.py
**Line:** 130:135
User description
Motivation
Description
- `CODE_ADDRESS_INDEX.md`, `AUDIT_RECHECK.md`, and `RECTIFICATION_STATUS.md` to record line-level mapping, recheck steps, and rectification summary.
- `evaluate_world_model.py`, `evaluate_perception.py`, and `check_integrations.py` for smoke metrics and wiring checks.
- `policy_query_head` to `models/vjepa/vjepa_model.py`, updated `models/vjepa/planning.py` to return/use policy-query vectors and compute action priors by similarity, and preserved backward-compatible `VisualExecutionModel` alias.
- `vjepa_train.py` now accepts `--config`, detects `ffmpeg` before generating synthetic videos, uses `training.epochs` defaulting to 100, and improved data-directory handling.
- `models/adaptive_depth.py` now constructs returned carry using `type(carry)` instead of a concrete class, `models/topological.py` fixes shapes and pooling logic for Betti estimates, and other wiring updates across `models/*` referenced by docs and the index.
Testing
- `python -m compileall -q .` completed successfully with no compile errors.
- `python check_integrations.py` which performed a forward pass and a small MCTS plan and printed an "Integration check passed" message.
- `python evaluate_world_model.py --config config/vjepa_micro.yaml --seed 42` which produced and saved JSON metrics (rollout drift, trajectory divergence, action-prior stats).
- `python evaluate_perception.py --config config/vjepa_micro.yaml --seed 42` which produced and saved latent-consistency metrics; all scripted smoke checks completed without runtime failures.
CodeAnt-AI Description
Add VEM documentation, evaluation smoke checks, and learned planning priors
What Changed
Impact
- ✅ Clearer model setup and training steps
- ✅ More reliable planning action ranking
- ✅ Fewer setup failures when ffmpeg is missing