Conversation
…ith a MESSAGE prefix match
```python
prev_turn_completion_ids = prev_turn_tokens["completion_ids"]
prev_turn_ids = prev_turn_prompt_ids + prev_turn_completion_ids
```
```python
def normalize_for_comparison(value: Any) -> Any:
```
should we make this a general message_util? seems useful in other places too. also, i vaguely remember we already have a similar util to this, but might be wrong
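For reference, a minimal sketch of what such a general util could look like (the recursion and the None-dropping behavior here are assumptions for illustration, not the PR's actual implementation):

```python
from typing import Any


def normalize_for_comparison(value: Any) -> Any:
    """Hypothetical sketch: recursively normalize message structures so
    that semantically equivalent representations compare equal."""
    if isinstance(value, dict):
        # drop None-valued keys so {"content": None} equals an omitted key
        return {
            k: normalize_for_comparison(v) for k, v in value.items() if v is not None
        }
    if isinstance(value, (list, tuple)):
        return [normalize_for_comparison(v) for v in value]
    return value
```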
```python
return 0
```
```python
# we add suffix_ids to prev_turn_ids. suffix_ids are tokens that are added
```
i know this is unrelated to this pr, but i think we might be able to remove the suffix part, since we compute env_response_ids = full_ids[len(prev_turn_ids) :] and don't tokenize the env response ids in isolation
yep, looks like some circular logic. it might be kinda confusing because env_response_ids now might contain suffix/delimiters of the assistant message, but functionally it will be the same
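A toy illustration of the point above (token ids are made up): since env response tokens are derived by slicing the full tokenization past prev_turn_ids, any assistant-message suffix/delimiter tokens simply land in env_response_ids, and the stitched sequence is unchanged:

```python
# made-up token ids for illustration only
prev_turn_ids = [1, 2, 3, 4]       # prompt + completion ids from the matched step
full_ids = [1, 2, 3, 4, 99, 5, 6]  # 99 stands in for an assistant suffix/delimiter token

# env response tokens are whatever the full tokenization adds beyond the previous turn
env_response_ids = full_ids[len(prev_turn_ids) :]
print(env_response_ids)  # [99, 5, 6] -- the delimiter is absorbed, stitching unchanged
```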
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
if best_step_tokens is None:
    return None
return best_step_tokens["prompt_ids"] + best_step_tokens["completion_ids"]
```
Prefix match can miss equivalent messages
Medium Severity
get_prompt_ids’s new message-level prefix matcher compares normalized message objects for strict equality, which can differ across representations (e.g., to_native_prompt emitting {"content": None} while incoming prompt_messages omits content, or other default/extra fields). This can produce false “no prefix match”, disabling the token route unexpectedly.
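A minimal illustration of the failure mode described here (message shapes are hypothetical): strict equality distinguishes an explicit null field from an omitted one, even though the two messages are semantically the same:

```python
# one side serialized content explicitly as None, the other omitted the key
native_msg = {"role": "assistant", "content": None, "tool_calls": [{"id": "t1"}]}
incoming_msg = {"role": "assistant", "tool_calls": [{"id": "t1"}]}

# a strict prefix matcher would report "no prefix match" here
print(native_msg == incoming_msg)  # False
```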
this isn't valid, right? because we expect the trajectory step list to be essentially append-only, so even if there are some anomalies in a message, they would be the same across all steps
yeah, i don't think we need to worry about this. the flow is: native response (OAI types) -> custom types (ToolCall, AssistantMessage, etc.), which get stored in TrajectoryStep. as long as get_prompt_messages doesn't do anything too crazy and is extending from some existing TrajectoryStep (like for RLMs, where we have the individual subagent trajectories; for true compaction we don't do this extension, so we wouldn't find any message prefix), those prompt messages go from custom types -> native prompt (OAI types). then in this code snippet we are comparing against previous trajectory steps, which also get transformed to native types. so everything goes from custom types -> native types through to_native_response, and we shouldn't really be missing anything.
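The symmetry argument above can be sketched with stand-in types (everything here is hypothetical; the real code uses to_native_response and richer message classes): because both sides of the comparison pass through the same conversion, any serialization quirks are applied identically to both:

```python
from dataclasses import dataclass


@dataclass
class AssistantMessage:  # stand-in for the custom message type
    content: str


def to_native(msg: AssistantMessage) -> dict:
    # stand-in for the custom-type -> native-OAI-type conversion
    return {"role": "assistant", "content": msg.content}


stored = AssistantMessage("hi")    # as recorded in a previous TrajectoryStep
incoming = AssistantMessage("hi")  # as rebuilt for the new prompt

# both sides went through the same converter, so the shapes match exactly
print(to_native(stored) == to_native(incoming))  # True
```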
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
prev_turn_ids = await find_largest_prefix_match_tokens()
if prev_turn_ids is None:
    return None
```
There was a problem hiding this comment.
Prefix match ignores tool-dependent tokenization
Medium Severity
find_largest_prefix_match_tokens() selects a trajectory step using only a message-level prefix comparison, but the stitched prev_turn_ids are later combined with full_ids produced by tokenize(..., tools=oai_tools). If the effective tool set differs from when the matched step’s tokens were produced, prev_turn_ids may not align with full_ids, yielding incorrect env_response_ids and an invalid prompt for /chat/completions/tokens.
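A toy sketch of the misalignment being pointed out (the tokenizer here is a fake stand-in; real chat-template tokenization is far more involved): when tool schemas contribute tokens to the prompt, ids cached under one tool set won't prefix-align with ids produced under another:

```python
def fake_tokenize(messages: list[str], tools: list[str]) -> list[str]:
    # stand-in tokenizer: tool schemas become tokens ahead of the messages
    return [f"tool:{t}" for t in tools] + messages


# tokens cached when the matched step ran with one tool available
prev_turn_ids = fake_tokenize(["sys", "user"], tools=["search"])
# full tokenization of the new turn with an extra tool enabled
full_ids = fake_tokenize(["sys", "user", "asst"], tools=["search", "calc"])

# the cached ids are no longer a prefix, so the stitched slice would be garbage
print(full_ids[: len(prev_turn_ids)] == prev_turn_ids)  # False
```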


Description
Best-effort TITO. Instead of assuming extension and looking back at strictly the last trajectory step, we walk backward until we find a MESSAGES-level prefix hit.
Tested on both wiki-search and eligottlieb/poker-multiagent; the latter was previously giving a loud failure with TITO because it does explicit rewriting of history (like context folding).
Wiki-search shows no regression:

tests on true branching envs (multi-agent setups):

What this shows: for each agent trajectory (extension per agent turn, but because they are all stuffed into the same trajectory we have "branching"), we successfully retrieve the latest matching trajectory step and do token-in from that point.
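The walk-backward selection could be sketched roughly as follows (function and field names are assumptions, not the PR's actual code):

```python
from typing import Any, Optional


def find_largest_prefix_match(
    trajectory: list[dict[str, Any]], prompt_messages: list[Any]
) -> Optional[dict[str, Any]]:
    """Walk the trajectory backwards and return the latest step whose
    messages form a prefix of the new prompt, or None if nothing matches."""
    for step in reversed(trajectory):
        step_messages = step["messages"]
        if prompt_messages[: len(step_messages)] == step_messages:
            return step
    return None  # no prefix hit: caller falls back to the non-token route
```

On the branching example above, each agent's latest step is still a prefix of that agent's continued prompt, so the backward walk finds it even when the last step overall belongs to a different agent.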
Type of Change
Testing
uv run pytest locally.
Checklist
Additional Notes
Note
Medium Risk
Changes how token-stitching selects prior-turn tokens and adds a new fallback path; mistakes could cause silent behavior changes (more MITO calls) or incorrect token prefixes in production.
Overview
Makes token-stitching (TITO) best-effort:
- get_prompt_ids now scans the trajectory backwards to find the largest message-level prefix match (normalizing message structures) instead of assuming the last step matches, and returns None when no match exists.
- Updates get_native_response to fall back to the standard /chat/completions path when get_prompt_ids returns None, avoiding broken /chat/completions/tokens requests on history rewrites/context folding.
- Adds async tests covering largest-prefix selection, no-prefix behavior, and ensuring the correct route is used depending on whether prompt token IDs are available.

Written by Cursor Bugbot for commit 6ae6aff. This will update automatically on new commits.
get_native_responseto fall back to the standard/chat/completionspath whenget_prompt_idsreturnsNone, avoiding broken/chat/completions/tokensrequests on history rewrites/context folding. Adds async tests covering largest-prefix selection, no-prefix behavior, and ensuring the correct route is used depending on whether prompt token IDs are available.Written by Cursor Bugbot for commit 6ae6aff. This will update automatically on new commits. Configure here.