Force clarifier to search before asking clarification questions (#234) by torkian · Pull Request #245 · NVIDIA-AI-Blueprints/aiq

torkian · 2026-05-19T02:32:24Z

Closes #234.

Summary

The clarifier agent already had tool-calling bound to a search loop, but weak or non-frontier models routinely skipped the search and asked the user a clarification question immediately. The expected behavior — and what the issue asks for — is to gather context with the available tools first, and only ask the user once a genuine ambiguity remains.

This fixes the bug at the agent level (the issue explicitly says "this should not be overly prompt engineered for one model"), with a small prompt update as a supporting hint.

Changes

Inline search-before-clarify retry in agent_node. On the first turn, if the model returns a clarification request without using its bound search tools, the node nudges it once with FORCE_SEARCH_GUIDANCE and re-invokes the LLM inline, returning both the original response and the retry. No extra graph node or state field is needed.
One-shot by construction. The retry fires only when iteration == 0 (so it never fires after the user has replied) and not _has_tool_invocations(state.messages) (so once any tool call is in history — e.g. after a successful forced search, even while iteration is still 0 — it never re-nudges). A model that stubbornly refuses to search after the single nudge simply falls through to asking the user; no loop.
Guidance stays ephemeral. FORCE_SEARCH_GUIDANCE is sent only in the local retry_messages and is never returned to state, so it cannot leak into get_latest_user_query or other history lookups (which would otherwise surface internal scaffolding in fallback text).
prompts/research_clarification.j2 Tool Usage section rewritten as "Search first, ask second" so frontier models follow this voluntarily and skip the nudge entirely.

Design note: an earlier revision used a dedicated force_search graph node plus a ClarifierAgentState.force_search_used flag. Per review (thanks @cdgamarose-nv), this was simplified to the inline retry above — same behavior, ~70 fewer source lines, no added graph state.

How I verified (end-to-end, against a real LLM + web search)

I exercised the full clarifier graph against a real NVIDIA NIM-hosted LLM and a real Tavily web search tool, with model nvidia/nemotron-3-nano-30b-a3b (the repo default in configs/config_cli_default.yml) and the query "Research the latest NVIDIA AI announcements from 2026":

Branch	tool_calls emitted	user prompted?
`develop`	0 (bug — asked the user immediately)	yes
this PR	1 (searched first)	no

Deterministic across three runs. (The throwaway reproducer script is not included in the PR, per review — these are its results.)

Tests

150 clarifier tests pass. Key additions in tests/aiq_agent/agents/clarifier/test_agent.py:

TestHasToolInvocations — the helper that detects prior tool calls in the message history.
TestClarifierForceSearch:
- test_force_search_fires_when_llm_skips_tools — the bug path: model clarifies → nudge + inline retry → model emits a tool call → completes; user never prompted.
- test_force_search_skipped_when_no_tools — no behavior change when no tools are configured.
- test_force_search_fires_at_most_once — no loop if the model stubbornly refuses tools after the nudge.
- test_force_search_not_triggered_when_llm_searches_first — no extra detour for models that already search.
- test_force_search_guidance_not_in_state_messages — regression test: the internal guidance must never appear in user-facing fallback text.
- test_force_search_guidance_not_injected_after_user_reply — regression test: the nudge does not re-fire after the user has answered.
- test_forced_retry_does_not_emit_consecutive_assistant_messages — regression test for the message-ordering fix below.
TestClarifierSkipMessageOrdering — regression tests for the skip-path ordering fix (persisted history + planner sequence under enable_plan_approval).

Message-ordering correctness (from review)

Per @cdgamarose-nv's review the implementation was simplified to an inline retry; a follow-up Greptile review then flagged that returning both the skipped and retried responses produced two consecutive assistant-role messages once the retry carried a tool call, which the OpenAI/Anthropic APIs reject with a 400 (NVIDIA NIM tolerates it). Fixed by returning only the retry response. Auditing the same invariant across the graph surfaced a separate pre-existing instance on the skip-clarification path (skip emitted a completion without persisting the user reply, and graph re-entry appended a duplicate completion) — also fixed, with regression tests for both.

Test plan

pytest tests/aiq_agent/agents/clarifier/ — 150/150 passing
ruff check and ruff format clean
Reproduces the bug on develop against the repo's default model
Verifies the fix on this branch against the same model + query
Maintainer smoke test via the CLI / Web UI workflows

Closes NVIDIA-AI-Blueprints#234. The clarifier agent had tool-calling bound to a search loop, but weak or non-frontier models routinely skipped the search and asked the user a clarification question immediately. The expected behavior is to gather context with the available tools first and only ask once a genuine ambiguity remains. This is fixed at the graph level so it works model-agnostically rather than relying on prompt engineering for any one model: - New force_search node in the LangGraph workflow. When the LLM tries to ask for clarification on the first turn, tools are configured, and no tool call has happened yet, decide_route sends the run through force_search before ask_for_clarification. - Guidance is injected ephemerally in agent_node (same pattern as the existing JSON_REMINDER_AFTER_TOOLS) rather than written into state.messages, so it cannot leak into helpers like get_latest_user_query that surface text back to the user. - A force_search_used boolean on ClarifierAgentState makes the nudge fire at most once per run, so a model that ignores the nudge falls through to ask_for_clarification instead of looping. - prompts/research_clarification.j2 is updated so the Tool Usage section instructs the model to "Search first, ask second" for any unfamiliar entity, acronym, or term. Frontier models follow this voluntarily and skip the structural nudge entirely. End-to-end reproducer added at scripts/e2e_clarifier_search_test.py. Against nvidia/nemotron-3-nano-30b-a3b (the project's default clarifier model) with a real Tavily web search tool: develop: 0 tool calls, user prompted immediately (bug) fix: 1 tool call, user never prompted (fixed) The mock-based unit suite covers the new helper, the four routing branches, and the regression flagged during review where the guidance message could otherwise leak into the invalid-format fallback text. All 148 clarifier tests pass. Signed-off-by: Torkian <torkian@mac.com>

greptile-apps · 2026-05-19T02:37:11Z

Greptile Summary

This PR fixes the bug where the clarifier agent would prompt the user for clarification before using its bound search tools, by injecting an inline force-search retry inside agent_node on the first turn only. It also fixes two adjacent API-contract bugs: the skip-command branch now persists the user's HumanMessage before emitting AIMessage(complete), and a re-entry guard prevents a duplicate completion message from being appended when the clarification budget is already exhausted.

Force-search nudge: when iteration == 0 and no prior tool invocations exist and the model returns a clarification without tool calls, FORCE_SEARCH_GUIDANCE is injected ephemerally (not stored in state) and the LLM is retried once; only retry_response is returned to avoid consecutive assistant messages in history.
Skip-command fix: ask_clarification now returns [HumanMessage(user_reply), AIMessage(complete)] so the two assistant-role turns are always interleaved with a human turn.
Budget-exhaustion guard: agent_node detects when the last message is already a completion and returns {} instead of appending a second AIMessage(complete).

Confidence Score: 5/5

Safe to merge — the inline retry is tightly guarded by iteration and tool-invocation checks, the two API-contract bugs from prior review rounds are both addressed, and 147 tests including dedicated regression tests for each fix all pass.

The force-search nudge fires exactly once per conversation (gated on iteration==0 and absence of prior tool calls), is ephemeral (FORCE_SEARCH_GUIDANCE never enters state), and only retry_response is persisted — eliminating the consecutive-assistant-message violation. The skip-command HumanMessage interleaving and the budget-exhaustion early-exit guard close the other two previously identified issues. No new execution paths are left unguarded.

No files require special attention; the one note is a test-assertion coverage gap that does not affect production behavior.

Important Files Changed

Filename	Overview
src/aiq_agent/agents/clarifier/agent.py	Adds inline force-search retry in agent_node (iteration==0 guard + _has_tool_invocations check), fixes skip-command to persist HumanMessage before AIMessage(complete), and adds early-exit guard for budget-exhausted re-entry. Both previously-reported API-violation bugs are addressed.
src/aiq_agent/agents/clarifier/prompts/research_clarification.j2	Tool Usage section rewritten to 'Search first, ask second' with explicit rules for search-before-clarify; no logic changes.
tests/aiq_agent/agents/clarifier/test_agent.py	Adds TestHasToolInvocations, TestClarifierForceSearch (6 scenarios), and TestClarifierSkipMessageOrdering (2 scenarios). Minor: the completion_ais >= 1 assertion in test_skip_persists_reply_and_no_consecutive_assistants is too broad and would not catch a missing completion message.

_{Reviews (5): Last reviewed commit: "Fix consecutive assistant messages on th..." | Re-trigger Greptile}

cdgamarose-nv · 2026-05-26T19:47:13Z

@torkian Thank you for your contribution. We will review soon.

ruff (project config: force-single-line=true) flagged the combined `from langchain_core.messages import HumanMessage, ToolMessage` line in scripts/e2e_clarifier_search_test.py. Split it into two single-line imports to match the rest of the codebase, and re-run ruff format. Signed-off-by: Torkian <torkian@mac.com>

Greptile review on PR NVIDIA-AI-Blueprints#245 flagged two real issues: P1 (agent.py): the ephemeral force_search guidance injection in agent_node was missing the iteration==0 guard that decide_route already applies. After force_search fired once and a stubborn model still refused to call a tool, ask_for_clarification ran, the user replied, and iteration advanced to 1 - but the next agent_node call still saw force_search_used=True with no tool invocations in state.messages and re-injected the nudge. The model then received "issue one focused tool call now" immediately after the user's clarifying answer, causing a gratuitous search instead of synthesizing the reply. Fixed by adding state.iteration == 0 to the injection condition. P2 (e2e script): the PASS verdict gated on tool_calls >= 1 only, without verifying len(user_prompts) == 0. A run where the LLM both searched AND prompted the user would have falsely printed "before any user prompt" and exited 0. Now we require both conditions and print distinct FAIL messages for the two failure modes. Also cleaned up an unused mock side_effect in test_force_search_guidance_not_in_state_messages (the user replies "skip" which forces completion, so only two LLM calls actually happen), and added test_force_search_guidance_not_injected_after_user_reply as a regression test for the P1 fix. 149 clarifier tests pass. Signed-off-by: Torkian <torkian@mac.com>

torkian · 2026-05-28T17:00:11Z

Thanks for the review! Addressed both findings in 8f18f80:

P1 (agent.py): added state.iteration == 0 to the agent_node ephemeral-injection condition so the nudge can't be re-applied after the user has already replied. Added test_force_search_guidance_not_injected_after_user_reply as a regression test.
P2 (e2e script): PASS verdict now requires both tool_calls >= 1 AND len(user_prompts) == 0, with distinct FAIL messages for the two failure modes.

Also cleaned up an unused mock side_effect in test_force_search_guidance_not_in_state_messages that Greptile flagged. 149 clarifier tests pass.

cdgamarose-nv

Thanks for your PR! Left some comments

cdgamarose-nv · 2026-06-02T00:15:55Z

@@ -0,0 +1,135 @@
+"""End-to-end test for issue #234 fix.


Don't include this script in the PR. You can add test and results in PR description.

Done — removed scripts/e2e_clarifier_search_test.py in 1377209. Moved the end-to-end test approach and before/after results into the PR description.

cdgamarose-nv · 2026-06-02T01:04:12Z

+
            return "ask_for_clarification"

+        async def force_search_node(state: ClarifierAgentState):


Wondering if we need to add a node or a variable in state. I was thinking something like this would be sufficient:

response = await bound_llm.ainvoke(messages) if ( self.tools and state.iteration == 0 and not self._has_tool_invocations(state.messages) and not getattr(response, "tool_calls", None) and self._is_needed(response.content) ): logger.info("Clarifier: model skipped search; injecting guidance and retrying once") retry_messages = messages + [response, HumanMessage(content=FORCE_SEARCH_GUIDANCE)] retry_response = await bound_llm.ainvoke(retry_messages) return {"messages": [response, retry_response]} return {"messages": [response]}

Good call — adopted your inline approach in 1377209. Dropped the force_search node, its routing branch/edges, and the ClarifierAgentState.force_search_used field; the nudge-and-retry now lives directly in agent_node exactly as you sketched.

The one-shot guarantee holds without the flag from two conditions alone:

iteration == 0 — only the first turn; once the user replies, iteration advances and it never fires again.

not _has_tool_invocations(state.messages) — after a successful forced search the tool-call message is in history (even while iteration is still 0), so it won't re-nudge.

FORCE_SEARCH_GUIDANCE is passed only in the local retry_messages and never returned to state, so it can't leak into get_latest_user_query. Net −70 source lines, behavior unchanged (same LLM call sequence + final message history). All 147 clarifier tests pass, including the force-search and ephemeral-guidance regression tests.

Address review feedback on NVIDIA-AI-Blueprints#245: replace the dedicated force_search graph node and the ClarifierAgentState.force_search_used flag with a single inline retry inside agent_node. When, on the first turn, the model asks for clarification without first using its bound search tools, agent_node now nudges it once with FORCE_SEARCH_GUIDANCE and re-invokes inline, returning both the original response and the retry. This keeps the behavior model-agnostic without adding graph state or an extra node hop. The one-shot guarantee holds from two conditions alone, so the force_search_used flag is no longer needed: - iteration == 0: only fires on the first turn; once the user replies, iteration advances and it never fires again. - not _has_tool_invocations(state.messages): once any tool call is in history (e.g. after a successful forced search, even while iteration is still 0), it never re-nudges. FORCE_SEARCH_GUIDANCE is sent only in the local retry_messages and is never returned to state, so it cannot leak into get_latest_user_query or other history lookups. Also remove scripts/e2e_clarifier_search_test.py from the PR per review; the end-to-end test approach and results are documented in the PR description instead. Behavior is unchanged (same LLM call sequence and final message history); all 147 clarifier tests pass, including the force-search behavioral and ephemeral-guidance regression tests. Signed-off-by: Torkian <torkian@mac.com>

copy-pr-bot · 2026-06-02T04:58:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Greptile review on NVIDIA-AI-Blueprints#245: the inline search-before-clarify retry returned both the skipped clarification response and the retry response, {"messages": [response, retry_response]}. When the retry carries a tool call (the happy path), the persisted history became [..., AIMessage(clarification), AIMessage(tool_call), ToolMessage, ...] — two consecutive assistant-role messages, which the OpenAI Chat Completions and Anthropic Messages APIs reject with a 400. NVIDIA NIM tolerates it, so it did not surface in the NIM-backed e2e run, and the mocked unit tests did not enforce message ordering. Return only the retry response. The first attempt was already shown to the model inside the local retry_messages (ephemeral, never persisted), so dropping it from state loses nothing and keeps a valid sequence regardless of whether the retry is a tool call or another clarification. Add a regression test asserting no two consecutive AIMessages are ever sent to the LLM across a forced-search run; it fails against the old [response, retry_response] return and passes with the fix. Signed-off-by: Torkian <torkian@mac.com>

While auditing the message ordering for the forced-search fix, a separate pre-existing bug surfaced in the skip-command branch. When a user types a skip command, ask_clarification returned an AIMessage(complete) without persisting the user's reply, leaving it directly after the prior AIMessage(clarification) — two consecutive assistant messages. The unconditional ask_for_clarification -> agent edge then re-entered agent_node with the clarification budget exhausted, which appended a third AIMessage(complete). With enable_plan_approval enabled this corrupted history flowed into the planner's ainvoke, producing a 400 from the OpenAI/Anthropic APIs (NVIDIA NIM tolerates it). Two coordinated fixes: - The skip branch now persists the user's reply as a HumanMessage before the completion AIMessage, so the prior and new assistant turns are separated by a human message. - agent_node's budget-exhausted branch no longer emits a completion when the last message is already a completion AIMessage (the skip branch's), avoiding the duplicate on graph re-entry. Add regression tests covering both the persisted history ordering and the planner ainvoke sequence under enable_plan_approval; both fail against the prior behavior and pass with the fix. Signed-off-by: Torkian <torkian@mac.com>

torkian mentioned this pull request May 19, 2026

Force clarifier agent to trigger searches before asking questions #234

Open

greptile-apps Bot reviewed May 19, 2026

View reviewed changes

Comment thread src/aiq_agent/agents/clarifier/agent.py Outdated

Comment thread scripts/e2e_clarifier_search_test.py Outdated

AjayThorve added the external-contribution label May 27, 2026

torkian added 2 commits May 27, 2026 20:28

cdgamarose-nv reviewed Jun 2, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread src/aiq_agent/agents/clarifier/agent.py Outdated

torkian added 2 commits June 1, 2026 22:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Force clarifier to search before asking clarification questions (#234)#245

Force clarifier to search before asking clarification questions (#234)#245
torkian wants to merge 6 commits into
NVIDIA-AI-Blueprints:developfrom
torkian:fix/clarifier-search-before-questions

torkian commented May 19, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

cdgamarose-nv commented May 26, 2026

Uh oh!

torkian commented May 28, 2026

Uh oh!

cdgamarose-nv left a comment

Uh oh!

cdgamarose-nv Jun 2, 2026

Uh oh!

torkian Jun 2, 2026

Uh oh!

cdgamarose-nv Jun 2, 2026

Uh oh!

torkian Jun 2, 2026

Uh oh!

copy-pr-bot Bot commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		return "ask_for_clarification"

		async def force_search_node(state: ClarifierAgentState):

Conversation

torkian commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

How I verified (end-to-end, against a real LLM + web search)

Tests

Message-ordering correctness (from review)

Test plan

Uh oh!

greptile-apps Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

cdgamarose-nv commented May 26, 2026

Uh oh!

torkian commented May 28, 2026

Uh oh!

cdgamarose-nv left a comment

Choose a reason for hiding this comment

Uh oh!

cdgamarose-nv Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

torkian Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

cdgamarose-nv Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

torkian Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

copy-pr-bot Bot commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

torkian commented May 19, 2026 •

edited

Loading

greptile-apps Bot commented May 19, 2026 •

edited

Loading