src/google/adk/evaluation/rubric_based_final_response_quality_v1.py
Lines changed: 11 additions & 2 deletions
@@ -25,6 +25,7 @@
 from .eval_case import InvocationEvents
 from .eval_metrics import EvalMetric
 from .eval_metrics import RubricsBasedCriterion
+from .llm_as_judge_utils import get_grounding_metadata_as_json_str
 from .llm_as_judge_utils import get_text_from_content
 from .llm_as_judge_utils import get_tool_calls_and_responses_as_json_str
 from .llm_as_judge_utils import get_tool_declarations_as_json_str
@@ -45,8 +46,9 @@
 
 # Key Evaluation Principles
 Your evaluation must follow a two-part process: first, collect trusted evidence from the agent's work, and second, judge the final answer against it.
-1. **Establish Trusted Evidence from Tool Calls**: You must first examine the agent's tool calls to determine if they are procedurally sound, meaning that the agent used the appropriate tools with logical parameters to address the user's prompt.
-* Your ONLY sources of truth are the <user_prompt> and the direct output ('tool_response') from PROCEDURALLY SOUND tool calls found in the <response_steps>. Examples of procedural flaws include:
+1. **Establish Trusted Evidence from Tool Calls and Grounding**: You must first examine the agent's tool calls to determine if they are procedurally sound, meaning that the agent used the appropriate tools with logical parameters to address the user's prompt.
+* Your ONLY sources of truth are the <user_prompt>, the direct output ('tool_response') from PROCEDURALLY SOUND tool calls found in the <response_steps>, and model-supplied grounding metadata found in <grounding_metadata>.
+* Grounding metadata is trusted evidence for model-internal tools such as google_search whose raw search results may not appear as function tool responses. Examples of procedural flaws include:
 * The agent failed to call a tool that will enable it to answer the user's prompt despite having all the necessary parameters to do so.
 * The agent called the tool with incorrect or missing parameters.
 * The agent called a tool that does not exist, or called a tool with a parameter that does not exist.
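As a rough illustration of the first step in the prompt above, the sketch below keeps only tool responses from calls that pass a mechanical soundness check and then adds grounding metadata as a second trusted source. It is not the ADK implementation: `ToolDeclaration`, `procedural_flaws`, `collect_trusted_evidence`, and the dictionary shapes are all hypothetical names assumed for this example. It covers only the flaws that can be checked mechanically (unknown tool, unknown parameter, missing parameter); whether the agent failed to call a tool it should have called is left to the judge model.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolDeclaration:
    """Hypothetical stand-in for a tool declaration; not an ADK type."""
    name: str
    parameters: set[str]
    required: set[str] = field(default_factory=set)


def procedural_flaws(call: dict, declarations: dict[str, ToolDeclaration]) -> list[str]:
    """Return the mechanically detectable flaws in one tool call (empty if sound)."""
    decl = declarations.get(call["name"])
    if decl is None:
        return [f"unknown tool: {call['name']}"]
    args = set(call.get("args", {}))
    flaws = [f"unknown parameter: {p}" for p in sorted(args - decl.parameters)]
    flaws += [f"missing parameter: {p}" for p in sorted(decl.required - args)]
    return flaws


def collect_trusted_evidence(steps, declarations, grounding_metadata=None) -> str:
    """Serialize the evidence a judge may trust as a JSON string.

    Assumes each step is a dict with "name", "args", and "response" keys.
    """
    evidence = {
        # Only responses from procedurally sound calls count as evidence.
        "tool_responses": [
            {"name": s["name"], "tool_response": s["response"]}
            for s in steps
            if not procedural_flaws(s, declarations)
        ],
        # Grounding metadata is trusted as supplied, e.g. for google_search.
        "grounding_metadata": grounding_metadata or [],
    }
    return json.dumps(evidence, indent=2)
```

A call such as `collect_trusted_evidence(steps, declarations, grounding_metadata)` would then produce the text placed in the judge prompt alongside the user prompt; a response from a call to an undeclared tool is dropped rather than treated as a source of truth.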