
Integrate Visual Execution Model (VEM): planning priors, eval harness, docs, and repo index #6

Open
jeevesh415 wants to merge 1 commit into main from codex/review-project-vision-status-7koix8

Conversation

@jeevesh415
Owner

@jeevesh415 jeevesh415 commented May 17, 2026

User description

Motivation

  • Unify V-JEPA + HRM into a single Visual Execution Model (VEM) stack and restore important repository artifacts removed accidentally.
  • Improve latent planning by replacing placeholder priors with a learned action-prior query to yield model-informed MCTS search.
  • Provide lightweight, reproducible evaluation and integration smoke checks to satisfy Phase-1 rigorous development gates.

Description

  • Renamed / reworked top-level README to present the repository as the "Visual Execution Model (VEM)" and document training/eval workflows and configs.
  • Added repository index and audit artifacts: CODE_ADDRESS_INDEX.md, AUDIT_RECHECK.md, and RECTIFICATION_STATUS.md to record line-level mapping, recheck steps, and rectification summary.
  • Introduced evaluation and integration harnesses: evaluate_world_model.py, evaluate_perception.py, and check_integrations.py for smoke metrics and wiring checks.
  • Implemented planning/action-prior changes: added policy_query_head to models/vjepa/vjepa_model.py, updated models/vjepa/planning.py to return/use policy-query vectors and compute action priors by similarity, and preserved backward-compatible VisualExecutionModel alias.
  • Restored and/or synchronized important training and dataset components and added practical fixes: vjepa_train.py now accepts --config, detects ffmpeg before generating synthetic videos, reads training.epochs (defaulting to 100), and handles data directories more robustly.
  • Small model fixes and robustness changes: models/adaptive_depth.py now constructs returned carry using type(carry) instead of a concrete class, models/topological.py fixes shapes and pooling logic for Betti estimates, and other wiring updates across models/* referenced by docs and the index.
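The `type(carry)` change noted for models/adaptive_depth.py can be illustrated with a minimal sketch. The names `BaseCarry`, `ExtendedCarry`, and `advance` are hypothetical and not taken from the repository; the sketch only assumes that the carry is a simple record whose fields can be passed back to its constructor.

```python
from dataclasses import dataclass


@dataclass
class BaseCarry:
    """Hypothetical carry record; the real class in the repo may differ."""
    hidden: list
    steps: int


@dataclass
class ExtendedCarry(BaseCarry):
    """Hypothetical subclass that adds a field the base class lacks."""
    halted: bool = False


def advance(carry):
    """Return an updated carry of the same concrete type as the input.

    Building the result with type(carry) instead of BaseCarry keeps any
    subclass fields (here: `halted`) instead of silently dropping them.
    """
    fields = {**vars(carry), "steps": carry.steps + 1}
    return type(carry)(**fields)


updated = advance(ExtendedCarry(hidden=[0.0], steps=0, halted=True))
print(type(updated).__name__, updated.steps, updated.halted)
```

Constructing the result from a hard-coded `BaseCarry` would return the wrong type for `ExtendedCarry` inputs, which is presumably the kind of mismatch the change avoids.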

Testing

  • Ran repository sanity checks: python -m compileall -q . completed successfully with no compile errors.
  • Executed integration smoke via python check_integrations.py which performed a forward pass and a small MCTS plan and printed an "Integration check passed" message.
  • Executed world-model smoke via python evaluate_world_model.py --config config/vjepa_micro.yaml --seed 42 which produced and saved JSON metrics (rollout drift, trajectory divergence, action-prior stats).
  • Executed perception smoke via python evaluate_perception.py --config config/vjepa_micro.yaml --seed 42 which produced and saved latent-consistency metrics; all scripted smoke checks completed without runtime failures.

Codex Task


CodeAnt-AI Description

Add VEM smoke checks, policy-guided planning, and repo recovery notes

What Changed

  • Added lightweight checks for model forward passes, latent planning, world-model rollouts, and perception robustness, with results saved as JSON run reports.
  • Replaced placeholder planning priors with learned action scoring from the model, so MCTS can prefer actions that match the current latent state.
  • Made training easier to run from the command line and more resilient when ffmpeg is missing or when epoch count is set in config.
  • Fixed carry and topology shape handling so adaptive-depth and topology modules return consistent outputs.
  • Updated the README and added audit/status documents and a code index to record the restored repository state and recheck results.

Impact

✅ Faster model smoke checks
✅ Clearer planning action selection
✅ Fewer training setup failures


@codeant-ai

codeant-ai Bot commented May 17, 2026

CodeAnt AI is reviewing your PR.



@coderabbitai

coderabbitai Bot commented May 17, 2026

Warning

Rate limit exceeded

@jeevesh415 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 50 minutes and 31 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 83f03798-1b16-4895-9a49-19b3a362851d

📥 Commits

Reviewing files that changed from the base of the PR and between 2c66f89 and 5ef3b40.

📒 Files selected for processing (15)
  • AUDIT_RECHECK.md
  • CODE_ADDRESS_INDEX.md
  • README.md
  • RECTIFICATION_STATUS.md
  • check_integrations.py
  • docs/FRONTIER_GAP_ANALYSIS.md
  • docs/HUMAN_VISION_EXECUTION_EVAL_SPEC.md
  • docs/RIGOROUS_DEVELOPMENT_PROTOCOL.md
  • evaluate_perception.py
  • evaluate_world_model.py
  • models/adaptive_depth.py
  • models/topological.py
  • models/vjepa/planning.py
  • models/vjepa/vjepa_model.py
  • vjepa_train.py

@codeant-ai codeant-ai Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files label May 17, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ef3b40771


Comment thread check_integrations.py
pt, ph, pw = cfg["encoder"]["patch_size"]
seq_len = (t // pt) * (h // ph) * (w // pw)
num_mask = max(1, seq_len // 4)
mask = torch.randperm(seq_len)[:num_mask]

P2: Build boolean mask for the integration forward pass

The smoke test creates mask as index positions (torch.randperm(...)) instead of a boolean patch mask, but the training path feeds boolean masks from dataset/video_dataset.py. In apply_mask, ~mask on an integer tensor is interpreted as bitwise inversion, so the “visible” patches are selected from unintended negative indices and the test no longer validates real masking behavior. This can let integration checks pass even when the boolean-mask path is broken.
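One way to build the boolean mask this comment asks for, as a minimal sketch. The values of `seq_len` and the feature width are illustrative, and the indexing at the end only mimics the visible/masked split that `apply_mask` is described as performing; it is not the repository's implementation.

```python
import torch

seq_len = 16                      # illustrative; the script derives this from the config
num_mask = max(1, seq_len // 4)   # same masking ratio as the snippet under review

# Start from an all-False mask and flip num_mask random positions to True
# (True = masked), instead of passing the index tensor itself.
mask = torch.zeros(seq_len, dtype=torch.bool)
mask[torch.randperm(seq_len)[:num_mask]] = True

# With a boolean mask, ~mask is a logical NOT, so the split is exact.
x = torch.randn(seq_len, 8)       # stand-in for per-patch latents
visible = x[~mask]
masked = x[mask]
print(visible.shape, masked.shape)
```

The two selections are disjoint and together cover every patch, which is the property the smoke test needs in order to exercise the same path as training.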


Comment thread check_integrations.py
model.eval()

bsz = 1
t = cfg["encoder"].get("max_t", 8)

P2: Use full clip length when generating synthetic video

The integration clip uses t = max_t frames, but the encoder config defines max_t in patch units and training uses clip_len = max_t * patch_size[0]. With the current shorter clip, sequence length is reduced and a correct boolean mask of that length causes an index mismatch against rotary embeddings sized for max_t, so the smoke test avoids (rather than exercises) the production forward-path assumptions.
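The arithmetic behind this comment can be sketched as follows. The helper name `patch_seq_len` and all numeric values are hypothetical; only the relations `clip_len = max_t * patch_size[0]` and `seq_len = (t // pt) * (h // ph) * (w // pw)` come from the comment and the reviewed snippet.

```python
def patch_seq_len(frames: int, height: int, width: int, patch_size) -> int:
    """Number of patch tokens for a clip, mirroring the script's formula."""
    pt, ph, pw = patch_size
    return (frames // pt) * (height // ph) * (width // pw)


# Illustrative config values, not read from the repository.
max_t = 8                  # temporal extent in patch units
patch_size = (2, 16, 16)   # (pt, ph, pw)
height = width = 64

short_clip = max_t                  # what the smoke test generates
full_clip = max_t * patch_size[0]   # what training is said to use

print(patch_seq_len(short_clip, height, width, patch_size))  # 64
print(patch_seq_len(full_clip, height, width, patch_size))   # 128
```

Under these assumed values the short clip yields half as many tokens as the full clip, so a mask sized for one would not line up with tensors sized for the other.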


Comment thread models/vjepa/planning.py
Comment on lines +187 to +188
pooled_next_state = next_state.mean(dim=1) if next_state.ndim > 2 else next_state
policy_query = self.model.policy_query_head(pooled_next_state)

Suggestion: _imagine_future now unconditionally calls self.model.policy_query_head, but MCTS is documented to require only predictor and value_head. Any existing model implementation without policy_query_head will now crash with AttributeError during expansion. Keep this path backward-compatible by guarding the call (as done in _expand) and returning a nullable/empty policy query when the head is absent. [api mismatch]

Severity Level: Major ⚠️
- ⚠️ Latent MCTS planner unusable with models sans policy head.
- ⚠️ Contract in MCTS docstring no longer matches implementation.
Steps of Reproduction ✅
1. Inspect the MCTS contract in `models/vjepa/planning.py:120-137`, which documents
`model: the VJEPA model (must have predictor and value_head)` and does not mention any
`policy_query_head` requirement.

2. Observe `_imagine_future` in `models/vjepa/planning.py:20-51`, which unconditionally
executes `policy_query = self.model.policy_query_head(pooled_next_state)` at lines 47-49,
without checking for the attribute.

3. Instantiate `MCTS` with any model object that follows the documented minimum contract
(has `predictor.physics_engine` and `value_head` but no `policy_query_head`), then call
`plan()` as implemented in `models/vjepa/planning.py:10-60`; during expansion, `_expand()`
at lines 83-135 calls `_imagine_future(node.state, action)` and hits the unconditional
`policy_query_head` access.

4. At runtime, when `_imagine_future` executes for such a model, Python raises
`AttributeError: '<Model>' object has no attribute 'policy_query_head'` at
`models/vjepa/planning.py:47-49`, demonstrating that the planner now requires an
undocumented head and breaks backward compatibility, unlike `_expand`, which correctly
guards on `hasattr(self.model, "policy_query_head")` at lines 112-117.
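The guard suggested above could look roughly like the sketch below. The helper `maybe_policy_query` and the two stand-in model classes are hypothetical; the real `_imagine_future` takes different arguments and works on tensors, so this only shows the optional-head pattern.

```python
from typing import Any, Optional


def maybe_policy_query(model: Any, pooled_state: Any) -> Optional[Any]:
    """Return the policy query if the model has the head, else None.

    Callers are expected to treat None as "no learned prior available"
    and fall back to whatever default they used before.
    """
    head = getattr(model, "policy_query_head", None)
    if head is None:
        return None
    return head(pooled_state)


class LegacyModel:
    """Stand-in for a model that follows only the documented contract."""


class ModelWithHead:
    """Stand-in for a model that also exposes policy_query_head."""

    def policy_query_head(self, pooled_state):
        return [2.0 * v for v in pooled_state]


print(maybe_policy_query(LegacyModel(), [1.0, 2.0]))    # None
print(maybe_policy_query(ModelWithHead(), [1.0, 2.0]))  # [2.0, 4.0]
```

Using `getattr` with a default does the attribute lookup once and avoids the `AttributeError` described in step 4.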


Comment thread check_integrations.py
pt, ph, pw = cfg["encoder"]["patch_size"]
seq_len = (t // pt) * (h // ph) * (w // pw)
num_mask = max(1, seq_len // 4)
mask = torch.randperm(seq_len)[:num_mask]

Suggestion: mask is built as an index tensor, but the model's apply_mask path expects a boolean mask (True = masked). Passing indices here causes invalid boolean/integer inversion behavior in masking (~mask) and can produce incorrect indexing or runtime failures. Build a boolean mask tensor of length seq_len instead of a list of selected indices. [type error]

Severity Level: Major ⚠️
- ❌ Integration script misuses masking, not exercising boolean mask path.
- ⚠️ Potential future failures if batch size becomes greater than one.
Steps of Reproduction ✅
1. Run the integration harness via `python check_integrations.py` which executes `main()`
defined in `check_integrations.py:13-56`.

2. Inside `main()`, a random mask tensor is created at `check_integrations.py:29-32` as
`mask = torch.randperm(seq_len)[:num_mask]`, yielding a 1D LongTensor of indices with
shape `(num_mask,)`, not a boolean mask of length `seq_len`.

3. The `batch` dict is constructed at `check_integrations.py:34-39` with this index-valued
`mask` and passed into `VisualExecutionModel` from `models/vjepa/vjepa_model.py` via the
call `out = model(batch)` at `check_integrations.py:41`.

4. In `VisualExecutionModel.forward` (`models/vjepa/vjepa_model.py:86-101`), `mask =
batch["mask"]` is forwarded to `apply_mask(all_latents, mask)` and
`apply_mask(target_latents, mask)`; `apply_mask` is implemented in
`models/vjepa/utils.py:21-44` and documented to expect a boolean mask (`get_block_mask()`
at `utils.py:3-19` returns `dtype=torch.bool` and `apply_mask` comments at
`utils.py:23-25` state `mask: (seq_len,) or (bs, seq_len)` with boolean semantics).

5. Because `mask` is a LongTensor of indices, `apply_mask` treats it incorrectly: it uses
`~mask` at `utils.py:32` and `utils.py:41` intending boolean inversion, but bitwise
negation on integer indices produces negative values, so `x[i, ~mask[i]]` and `x[i,
mask[i]]` index using arbitrary integer positions instead of selecting visible vs masked
patches, breaking the masking logic used by the model in this integration check.
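The behaviour described in step 5 can be reproduced in isolation with a small tensor. This is a standalone demonstration with made-up indices and does not call the repository's `apply_mask`.

```python
import torch

x = torch.arange(8)              # positions 0..7 stand in for patches
idx = torch.tensor([0, 3, 5])    # index-style "mask", as in the script

# On an integer tensor, ~ is bitwise NOT: ~n == -n - 1.
print((~idx).tolist())           # [-1, -4, -6]

# Indexing with those values wraps around from the end of x.
print(x[~idx].tolist())          # [7, 4, 2]

# The boolean form of the same mask gives the intended complement.
bool_mask = torch.zeros(8, dtype=torch.bool)
bool_mask[idx] = True
print(x[~bool_mask].tolist())    # [1, 2, 4, 6, 7]
```

The integer version selects three positions instead of the five unmasked ones, and does so without raising an error, which is why the check can pass silently.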


Comment thread evaluate_world_model.py
Comment on lines +100 to +105
available_actions = torch.randn(num_actions, 128, device=device)
pooled = traj_a[-1].mean(dim=1)
query = model.policy_query_head(pooled).squeeze(0)
logits = torch.matmul(available_actions, query)
probs = torch.softmax(logits, dim=0)
max_prior = float(probs.max().item())

Suggestion: The script accepts --num-actions without enforcing it to be positive, but later computes probs.max(). When num_actions is 0, available_actions and probs are empty and probs.max() raises a runtime error. Reject non-positive values (or handle the empty-action case) before computing priors. [logic error]

Severity Level: Major ⚠️
- ❌ World-model eval crashes when --num-actions is zero.
- ⚠️ Evaluation harness unusable for configurations with no candidate actions.
Steps of Reproduction ✅
1. Invoke the world-model evaluation harness with zero actions, e.g. `python
evaluate_world_model.py --num-actions 0`, which runs `main()` in
`evaluate_world_model.py:116-166`.

2. Argument parsing at `evaluate_world_model.py:117-123` stores `args.num_actions = 0`;
there is no validation enforcing this to be positive before `args` is used.

3. `main()` calls `evaluate_metrics(model=model, device=device,
rollout_steps=args.rollout_steps, num_actions=args.num_actions)` at
`evaluate_world_model.py:137-142`, passing `num_actions=0` into `evaluate_metrics` defined
at `evaluate_world_model.py:76-113`.

4. Inside `evaluate_metrics`, `available_actions = torch.randn(num_actions, 128,
device=device)` at line 100 creates an empty tensor of shape `(0, 128)`, leading to
`logits = torch.matmul(available_actions, query)` at line 103 and `probs =
torch.softmax(logits, dim=0)` at line 104 also being empty; the subsequent reduction
`probs.max()` at line 105 is applied to an empty tensor and raises a runtime
`RuntimeError` (max reduction on zero elements), causing the evaluation script to crash
before writing the manifest or printing metrics.
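Rejecting non-positive values at parse time is one way to address this. The sketch below assumes the script uses `argparse`; the helper name `positive_int` and the default of 8 are hypothetical, and the script's other flags are omitted.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type callable that accepts only integers greater than zero."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected an integer, got {text!r}")
    if value <= 0:
        raise argparse.ArgumentTypeError(f"must be positive, got {value}")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="world-model smoke evaluation (sketch)")
    # The default here is an assumption for illustration only.
    parser.add_argument("--num-actions", type=positive_int, default=8)
    return parser


args = build_parser().parse_args(["--num-actions", "4"])
print(args.num_actions)  # 4
```

With this in place, `--num-actions 0` makes argparse print a usage error and exit before any tensors are created, so `probs.max()` is never reached with an empty input.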


@codeant-ai

codeant-ai Bot commented May 17, 2026

CodeAnt AI finished reviewing your PR.


Labels

codex size:XXL This PR changes 1000+ lines, ignoring generated files
