forked from sapientinc/HRM
Add world-model evaluation harness, learned MCTS action priors, docs, and README/training updates #4
Open: jeevesh415 wants to merge 7 commits into `main` from `codex/review-project-vision-status-1bv5jo`
Commits (7):
- `0db3f49` Add phase-1 rigorous world-model evaluation harness and protocol (jeevesh415)
- `07b8481` Add perception evaluation track and harden training bootstrap fallback (jeevesh415)
- `52e879b` Unify naming under Visual Execution Model framework (jeevesh415)
- `9429fdf` Remove puzzle-era components and harden VEM integrations (jeevesh415)
- `a6414ad` Regenerate full line-level CODE_ADDRESS_INDEX after repo changes (jeevesh415)
- `a3e1075` Restore previously removed legacy modules and resync CODE_ADDRESS_INDEX (jeevesh415)
- `4f6bdc8` Add explicit post-PR audit record for conflict/integration recheck (jeevesh415)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | | |
| | 1 | # Post-PR Recheck Audit |
| | 2 | |
| | 3 | Date: 2026-05-17 (UTC) |
| | 4 | |
| | 5 | ## Why this file |
| | 6 | User requested a visible commit-level proof that the repository was rechecked for mistakes/conflicts after the VEM integration PR. |
| | 7 | |
| | 8 | ## Recheck steps performed |
| | 9 | 1. Searched for unresolved merge markers (`<<<<<<<`, `=======`, `>>>>>>>`). |
| | 10 | 2. Ran repository compile sanity (`python -m compileall -q .`). |
| | 11 | 3. Ran integration smoke (`python check_integrations.py`). |
| | 12 | 4. Ran world-model eval smoke (`evaluate_world_model.py`). |
| | 13 | 5. Ran perception eval smoke (`evaluate_perception.py`). |
| | 14 | |
| | 15 | ## Outcome |
| | 16 | - No merge conflict markers found. |
| | 17 | - Compile sanity passed. |
| | 18 | - Integration smoke passed. |
| | 19 | - Both evaluation scripts executed and emitted metrics. |
| | 20 | |
| | 21 | ## Note |
| | 22 | This commit is intentionally documentation-only to provide an explicit, auditable record in git history. |
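Step 1 of the audit (searching for unresolved merge markers) does not record which tool performed the search. A minimal Python sketch of such a scan, assuming markers appear exactly as Git writes them: seven identical characters at the start of a line, optionally followed by whitespace and a label such as a branch name. `find_conflict_markers` is a hypothetical helper, not part of the repository. A line consisting of exactly seven `=` characters that is not a conflict marker (for example a short reStructuredText heading underline) would be reported as a false positive.

```python
import re
from pathlib import Path

# Git writes conflict markers as exactly seven identical characters at the
# start of a line, optionally followed by whitespace and a label (e.g. HEAD).
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})(?:\s|$)")


def find_conflict_markers(root: str = ".") -> list[tuple[str, int, str]]:
    """Hypothetical helper: return (path, line_number, line) for every
    suspected conflict marker under `root`, skipping the .git directory."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file: cannot hold a text conflict
        for lineno, line in enumerate(text.splitlines(), start=1):
            if CONFLICT_MARKER.match(line):
                hits.append((str(path), lineno, line))
    return hits
```

Calling `find_conflict_markers(".")` from the repository root and treating a non-empty result as a failure would correspond to the "No merge conflict markers found" outcome.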
Use explicit Python invocation for reproducible audit steps (Line 12, Line 13).
These two commands are documented as bare script names, unlike Line 11. Invoked that way, they fail in many environments unless the scripts have executable permissions and a shebang line. Use an explicit `python ...` invocation for consistency and reproducibility.
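The same advice applies if the recheck steps are scripted. A minimal sketch, assuming the three scripts named in the audit take no arguments and signal failure with a non-zero exit code; `AUDIT_STEPS`, `build_command`, and `run_audit` are hypothetical names, not part of the repository. Each step is launched through `sys.executable`, the interpreter running the sketch itself, so no step depends on executable bits, shebangs, or which `python` comes first on `PATH`.

```python
import subprocess
import sys

# Hypothetical step table mirroring steps 2-5 of the audit. Every entry holds
# only the interpreter arguments; the interpreter itself is added later.
AUDIT_STEPS = [
    ("compile sanity", ["-m", "compileall", "-q", "."]),
    ("integration smoke", ["check_integrations.py"]),
    ("world-model eval smoke", ["evaluate_world_model.py"]),
    ("perception eval smoke", ["evaluate_perception.py"]),
]


def build_command(args: list[str]) -> list[str]:
    """Prefix the arguments with the currently running Python interpreter."""
    return [sys.executable, *args]


def run_audit(steps: list[tuple[str, list[str]]] = AUDIT_STEPS,
              cwd: str = ".") -> dict[str, int]:
    """Run each step in `cwd` and return its exit code keyed by step name."""
    results = {}
    for name, args in steps:
        completed = subprocess.run(build_command(args), cwd=cwd)
        results[name] = completed.returncode
    return results
```

A zero exit code only shows that a step ran without error; confirming that the evaluation scripts "emitted metrics" would need an additional check on their output.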
LanguageTool findings:
- [style] Line 12: Three successive sentences begin with the same word. Consider rewording the sentence or using a thesaurus to find a synonym. Context: "...ke (python check_integrations.py). 4. Ran world-model eval smoke (evaluate_world..." (ENGLISH_WORD_REPEAT_BEGINNING_RULE)
- [style] Line 13: Three successive sentences begin with the same word. Consider rewording the sentence or using a thesaurus to find a synonym. Context: "...l smoke (evaluate_world_model.py). 5. Ran perception eval smoke (evaluate_percep..." (ENGLISH_WORD_REPEAT_BEGINNING_RULE)