Skip to content

Force clarifier to search before asking clarification questions (#234)#245

Open
torkian wants to merge 6 commits into
NVIDIA-AI-Blueprints:developfrom
torkian:fix/clarifier-search-before-questions
Open

Force clarifier to search before asking clarification questions (#234)#245
torkian wants to merge 6 commits into
NVIDIA-AI-Blueprints:developfrom
torkian:fix/clarifier-search-before-questions

Conversation

@torkian
Copy link
Copy Markdown

@torkian torkian commented May 19, 2026

Closes #234.

Summary

The clarifier agent already had tool-calling bound to a search loop, but weak or non-frontier models routinely skipped the search and asked the user a clarification question immediately. The expected behavior — and what the issue asks for — is to gather context with the available tools first, and only ask the user once a genuine ambiguity remains.

This fixes the bug at the agent level (the issue explicitly says "this should not be overly prompt engineered for one model"), with a small prompt update as a supporting hint.

Changes

  • Inline search-before-clarify retry in agent_node. On the first turn, if the model returns a clarification request without using its bound search tools, the node nudges it once with FORCE_SEARCH_GUIDANCE and re-invokes the LLM inline, returning both the original response and the retry. No extra graph node or state field is needed.
  • One-shot by construction. The retry fires only when iteration == 0 (so it never fires after the user has replied) and not _has_tool_invocations(state.messages) (so once any tool call is in history — e.g. after a successful forced search, even while iteration is still 0 — it never re-nudges). A model that stubbornly refuses to search after the single nudge simply falls through to asking the user; no loop.
  • Guidance stays ephemeral. FORCE_SEARCH_GUIDANCE is sent only in the local retry_messages and is never returned to state, so it cannot leak into get_latest_user_query or other history lookups (which would otherwise surface internal scaffolding in fallback text).
  • prompts/research_clarification.j2 Tool Usage section rewritten as "Search first, ask second" so frontier models follow this voluntarily and skip the nudge entirely.

Design note: an earlier revision used a dedicated force_search graph node plus a ClarifierAgentState.force_search_used flag. Per review (thanks @cdgamarose-nv), this was simplified to the inline retry above — same behavior, ~70 fewer source lines, no added graph state.

How I verified (end-to-end, against a real LLM + web search)

I exercised the full clarifier graph against a real NVIDIA NIM-hosted LLM and a real Tavily web search tool, with model nvidia/nemotron-3-nano-30b-a3b (the repo default in configs/config_cli_default.yml) and the query "Research the latest NVIDIA AI announcements from 2026":

Branch tool_calls emitted user prompted?
develop 0 (bug — asked the user immediately) yes
this PR 1 (searched first) no

Deterministic across three runs. (The throwaway reproducer script is not included in the PR, per review — these are its results.)

Tests

150 clarifier tests pass. Key additions in tests/aiq_agent/agents/clarifier/test_agent.py:

  • TestHasToolInvocations — the helper that detects prior tool calls in the message history.
  • TestClarifierForceSearch:
    • test_force_search_fires_when_llm_skips_tools — the bug path: model clarifies → nudge + inline retry → model emits a tool call → completes; user never prompted.
    • test_force_search_skipped_when_no_tools — no behavior change when no tools are configured.
    • test_force_search_fires_at_most_once — no loop if the model stubbornly refuses tools after the nudge.
    • test_force_search_not_triggered_when_llm_searches_first — no extra detour for models that already search.
    • test_force_search_guidance_not_in_state_messages — regression test: the internal guidance must never appear in user-facing fallback text.
    • test_force_search_guidance_not_injected_after_user_reply — regression test: the nudge does not re-fire after the user has answered.
    • test_forced_retry_does_not_emit_consecutive_assistant_messages — regression test for the message-ordering fix below.
  • TestClarifierSkipMessageOrdering — regression tests for the skip-path ordering fix (persisted history + planner sequence under enable_plan_approval).

Message-ordering correctness (from review)

Per @cdgamarose-nv's review the implementation was simplified to an inline retry; a follow-up Greptile review then flagged that returning both the skipped and retried responses produced two consecutive assistant-role messages once the retry carried a tool call, which the OpenAI/Anthropic APIs reject with a 400 (NVIDIA NIM tolerates it). Fixed by returning only the retry response. Auditing the same invariant across the graph surfaced a separate pre-existing instance on the skip-clarification path (skip emitted a completion without persisting the user reply, and graph re-entry appended a duplicate completion) — also fixed, with regression tests for both.

Test plan

  • pytest tests/aiq_agent/agents/clarifier/ — 150/150 passing
  • ruff check and ruff format clean
  • Reproduces the bug on develop against the repo's default model
  • Verifies the fix on this branch against the same model + query
  • Maintainer smoke test via the CLI / Web UI workflows

Closes NVIDIA-AI-Blueprints#234.

The clarifier agent had tool-calling bound to a search loop, but weak or
non-frontier models routinely skipped the search and asked the user a
clarification question immediately. The expected behavior is to gather
context with the available tools first and only ask once a genuine
ambiguity remains.

This is fixed at the graph level so it works model-agnostically rather
than relying on prompt engineering for any one model:

- New force_search node in the LangGraph workflow. When the LLM tries
  to ask for clarification on the first turn, tools are configured, and
  no tool call has happened yet, decide_route sends the run through
  force_search before ask_for_clarification.
- Guidance is injected ephemerally in agent_node (same pattern as the
  existing JSON_REMINDER_AFTER_TOOLS) rather than written into
  state.messages, so it cannot leak into helpers like
  get_latest_user_query that surface text back to the user.
- A force_search_used boolean on ClarifierAgentState makes the nudge
  fire at most once per run, so a model that ignores the nudge falls
  through to ask_for_clarification instead of looping.
- prompts/research_clarification.j2 is updated so the Tool Usage
  section instructs the model to "Search first, ask second" for any
  unfamiliar entity, acronym, or term. Frontier models follow this
  voluntarily and skip the structural nudge entirely.

End-to-end reproducer added at scripts/e2e_clarifier_search_test.py.
Against nvidia/nemotron-3-nano-30b-a3b (the project's default
clarifier model) with a real Tavily web search tool:

  develop:  0 tool calls, user prompted immediately   (bug)
  fix:      1 tool call,  user never prompted          (fixed)

The mock-based unit suite covers the new helper, the four routing
branches, and the regression flagged during review where the guidance
message could otherwise leak into the invalid-format fallback text.
All 148 clarifier tests pass.

Signed-off-by: Torkian <torkian@mac.com>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR fixes the bug where the clarifier agent would prompt the user for clarification before using its bound search tools, by injecting an inline force-search retry inside agent_node on the first turn only. It also fixes two adjacent API-contract bugs: the skip-command branch now persists the user's HumanMessage before emitting AIMessage(complete), and a re-entry guard prevents a duplicate completion message from being appended when the clarification budget is already exhausted.

  • Force-search nudge: when iteration == 0 and no prior tool invocations exist and the model returns a clarification without tool calls, FORCE_SEARCH_GUIDANCE is injected ephemerally (not stored in state) and the LLM is retried once; only retry_response is returned to avoid consecutive assistant messages in history.
  • Skip-command fix: ask_clarification now returns [HumanMessage(user_reply), AIMessage(complete)] so the two assistant-role turns are always interleaved with a human turn.
  • Budget-exhaustion guard: agent_node detects when the last message is already a completion and returns {} instead of appending a second AIMessage(complete).

Confidence Score: 5/5

Safe to merge — the inline retry is tightly guarded by iteration and tool-invocation checks, the two API-contract bugs from prior review rounds are both addressed, and 147 tests including dedicated regression tests for each fix all pass.

The force-search nudge fires exactly once per conversation (gated on iteration==0 and absence of prior tool calls), is ephemeral (FORCE_SEARCH_GUIDANCE never enters state), and only retry_response is persisted — eliminating the consecutive-assistant-message violation. The skip-command HumanMessage interleaving and the budget-exhaustion early-exit guard close the other two previously identified issues. No new execution paths are left unguarded.

No files require special attention; the one note is a test-assertion coverage gap that does not affect production behavior.

Important Files Changed

Filename Overview
src/aiq_agent/agents/clarifier/agent.py Adds inline force-search retry in agent_node (iteration==0 guard + _has_tool_invocations check), fixes skip-command to persist HumanMessage before AIMessage(complete), and adds early-exit guard for budget-exhausted re-entry. Both previously-reported API-violation bugs are addressed.
src/aiq_agent/agents/clarifier/prompts/research_clarification.j2 Tool Usage section rewritten to 'Search first, ask second' with explicit rules for search-before-clarify; no logic changes.
tests/aiq_agent/agents/clarifier/test_agent.py Adds TestHasToolInvocations, TestClarifierForceSearch (6 scenarios), and TestClarifierSkipMessageOrdering (2 scenarios). Minor: the completion_ais >= 1 assertion in test_skip_persists_reply_and_no_consecutive_assistants is too broad and would not catch a missing completion message.

Reviews (5): Last reviewed commit: "Fix consecutive assistant messages on th..." | Re-trigger Greptile

Comment thread src/aiq_agent/agents/clarifier/agent.py Outdated
Comment thread scripts/e2e_clarifier_search_test.py Outdated
@cdgamarose-nv
Copy link
Copy Markdown
Collaborator

@torkian Thank you for your contribution. We will review soon.

torkian added 2 commits May 27, 2026 20:28
ruff (project config: force-single-line=true) flagged the combined
`from langchain_core.messages import HumanMessage, ToolMessage` line in
scripts/e2e_clarifier_search_test.py. Split it into two single-line
imports to match the rest of the codebase, and re-run ruff format.

Signed-off-by: Torkian <torkian@mac.com>
Greptile review on PR NVIDIA-AI-Blueprints#245 flagged two real issues:

P1 (agent.py): the ephemeral force_search guidance injection in
agent_node was missing the iteration==0 guard that decide_route already
applies. After force_search fired once and a stubborn model still
refused to call a tool, ask_for_clarification ran, the user replied,
and iteration advanced to 1 - but the next agent_node call still saw
force_search_used=True with no tool invocations in state.messages and
re-injected the nudge. The model then received "issue one focused tool
call now" immediately after the user's clarifying answer, causing a
gratuitous search instead of synthesizing the reply. Fixed by adding
state.iteration == 0 to the injection condition.

P2 (e2e script): the PASS verdict gated on tool_calls >= 1 only,
without verifying len(user_prompts) == 0. A run where the LLM both
searched AND prompted the user would have falsely printed "before any
user prompt" and exited 0. Now we require both conditions and print
distinct FAIL messages for the two failure modes.

Also cleaned up an unused mock side_effect in
test_force_search_guidance_not_in_state_messages (the user replies
"skip" which forces completion, so only two LLM calls actually happen),
and added test_force_search_guidance_not_injected_after_user_reply as a
regression test for the P1 fix. 149 clarifier tests pass.

Signed-off-by: Torkian <torkian@mac.com>
@torkian
Copy link
Copy Markdown
Author

torkian commented May 28, 2026

Thanks for the review! Addressed both findings in 8f18f80:

  • P1 (agent.py): added state.iteration == 0 to the agent_node ephemeral-injection condition so the nudge can't be re-applied after the user has already replied. Added test_force_search_guidance_not_injected_after_user_reply as a regression test.
  • P2 (e2e script): PASS verdict now requires both tool_calls >= 1 AND len(user_prompts) == 0, with distinct FAIL messages for the two failure modes.

Also cleaned up an unused mock side_effect in test_force_search_guidance_not_in_state_messages that Greptile flagged. 149 clarifier tests pass.

Copy link
Copy Markdown
Collaborator

@cdgamarose-nv cdgamarose-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR! Left some comments

Comment thread scripts/e2e_clarifier_search_test.py Outdated
@@ -0,0 +1,135 @@
"""End-to-end test for issue #234 fix.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't include this script in the PR. You can add test and results in PR description.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — removed scripts/e2e_clarifier_search_test.py in 1377209. Moved the end-to-end test approach and before/after results into the PR description.

Comment thread src/aiq_agent/agents/clarifier/agent.py Outdated

return "ask_for_clarification"

async def force_search_node(state: ClarifierAgentState):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we need to add a node or a variable in state. I was thinking something like this would be sufficient:

response = await bound_llm.ainvoke(messages)

if (
    self.tools
    and state.iteration == 0
    and not self._has_tool_invocations(state.messages)
    and not getattr(response, "tool_calls", None)
    and self._is_needed(response.content)
):
    logger.info("Clarifier: model skipped search; injecting guidance and retrying once")
    retry_messages = messages + [response, HumanMessage(content=FORCE_SEARCH_GUIDANCE)]
    retry_response = await bound_llm.ainvoke(retry_messages)
    return {"messages": [response, retry_response]}

return {"messages": [response]}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — adopted your inline approach in 1377209. Dropped the force_search node, its routing branch/edges, and the ClarifierAgentState.force_search_used field; the nudge-and-retry now lives directly in agent_node exactly as you sketched.

The one-shot guarantee holds without the flag from two conditions alone:

  • iteration == 0 — only the first turn; once the user replies, iteration advances and it never fires again.
  • not _has_tool_invocations(state.messages) — after a successful forced search the tool-call message is in history (even while iteration is still 0), so it won't re-nudge.

FORCE_SEARCH_GUIDANCE is passed only in the local retry_messages and never returned to state, so it can't leak into get_latest_user_query. Net −70 source lines, behavior unchanged (same LLM call sequence + final message history). All 147 clarifier tests pass, including the force-search and ephemeral-guidance regression tests.

Address review feedback on NVIDIA-AI-Blueprints#245: replace the dedicated force_search
graph node and the ClarifierAgentState.force_search_used flag with a
single inline retry inside agent_node.

When, on the first turn, the model asks for clarification without first
using its bound search tools, agent_node now nudges it once with
FORCE_SEARCH_GUIDANCE and re-invokes inline, returning both the original
response and the retry. This keeps the behavior model-agnostic without
adding graph state or an extra node hop.

The one-shot guarantee holds from two conditions alone, so the
force_search_used flag is no longer needed:
- iteration == 0: only fires on the first turn; once the user replies,
  iteration advances and it never fires again.
- not _has_tool_invocations(state.messages): once any tool call is in
  history (e.g. after a successful forced search, even while iteration
  is still 0), it never re-nudges.

FORCE_SEARCH_GUIDANCE is sent only in the local retry_messages and is
never returned to state, so it cannot leak into get_latest_user_query
or other history lookups.

Also remove scripts/e2e_clarifier_search_test.py from the PR per review;
the end-to-end test approach and results are documented in the PR
description instead.

Behavior is unchanged (same LLM call sequence and final message
history); all 147 clarifier tests pass, including the force-search
behavioral and ephemeral-guidance regression tests.

Signed-off-by: Torkian <torkian@mac.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jun 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment thread src/aiq_agent/agents/clarifier/agent.py Outdated
torkian added 2 commits June 1, 2026 22:38
Greptile review on NVIDIA-AI-Blueprints#245: the inline search-before-clarify retry returned
both the skipped clarification response and the retry response,
{"messages": [response, retry_response]}. When the retry carries a tool
call (the happy path), the persisted history became
[..., AIMessage(clarification), AIMessage(tool_call), ToolMessage, ...]
— two consecutive assistant-role messages, which the OpenAI Chat
Completions and Anthropic Messages APIs reject with a 400. NVIDIA NIM
tolerates it, so it did not surface in the NIM-backed e2e run, and the
mocked unit tests did not enforce message ordering.

Return only the retry response. The first attempt was already shown to
the model inside the local retry_messages (ephemeral, never persisted),
so dropping it from state loses nothing and keeps a valid sequence
regardless of whether the retry is a tool call or another clarification.

Add a regression test asserting no two consecutive AIMessages are ever
sent to the LLM across a forced-search run; it fails against the old
[response, retry_response] return and passes with the fix.

Signed-off-by: Torkian <torkian@mac.com>
While auditing the message ordering for the forced-search fix, a
separate pre-existing bug surfaced in the skip-command branch.

When a user types a skip command, ask_clarification returned an
AIMessage(complete) without persisting the user's reply, leaving it
directly after the prior AIMessage(clarification) — two consecutive
assistant messages. The unconditional ask_for_clarification -> agent
edge then re-entered agent_node with the clarification budget exhausted,
which appended a third AIMessage(complete). With enable_plan_approval
enabled this corrupted history flowed into the planner's ainvoke,
producing a 400 from the OpenAI/Anthropic APIs (NVIDIA NIM tolerates it).

Two coordinated fixes:
- The skip branch now persists the user's reply as a HumanMessage before
  the completion AIMessage, so the prior and new assistant turns are
  separated by a human message.
- agent_node's budget-exhausted branch no longer emits a completion when
  the last message is already a completion AIMessage (the skip branch's),
  avoiding the duplicate on graph re-entry.

Add regression tests covering both the persisted history ordering and
the planner ainvoke sequence under enable_plan_approval; both fail
against the prior behavior and pass with the fix.

Signed-off-by: Torkian <torkian@mac.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Force clarifier agent to trigger searches before asking questions

3 participants