Harden planner, validation, and analysis handling #19
This is a big surface-area change (~1.9k LOC across agent, prompts, semantic model, API, UI). It should be split into smaller PRs (schema/executor validation → evidence/claims → narrative validation → planner UI).

The change introduces tighter coupling to one data source (premise step, period comparison, distinct periods, grouping columns). That is a specific methodology, not “whatever shape the customer’s data is in.” More checks mean more paths that never produce a narrative, even when one could still have been useful (e.g. exploratory breakdowns, single-period insights).
abhinavsingh9714
left a comment
Needs more clarity, discussion and result comparisons
def _render_fallback_analysis(claims: list[ApprovedClaim], answer_status: str) -> str:
What is this fallback for?
_render_fallback_analysis(...) is a deterministic safety path used when the normal LLM rendering step cannot produce a validated final answer. In the standard flow, we build approved claims from validated evidence and then ask the model to turn those claims into user-facing analysis. If that rendering step fails validation, returns unusable output, or if the workflow only has incomplete or caveat-level evidence, this function generates the response directly on the backend instead. It is not a single fixed fallback message; it still uses the approved claims that were already grounded in validated evidence, so it preserves whatever the system has actually established, such as a contradicted premise, a partial answer, or unresolved caveats. The purpose is to stay as close as possible to a reliable grounded conclusion while avoiding any unvalidated LLM wording or internal validator/orchestration details in the final user response.
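To make the idea concrete, here is a minimal sketch of what a deterministic fallback renderer could look like. It is not the code in this PR: the `ApprovedClaim` fields, the `kind` values, and the verdict sentences are all assumptions made up for illustration. The only point it shows is that the output is assembled from approved claims alone, with no LLM call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedClaim:
    # Hypothetical shape; the real ApprovedClaim fields are not shown in this thread.
    text: str
    kind: str = "finding"  # assumed values: "finding", "contradiction", "caveat"


# Assumed verdict sentences, keyed by the answer_status values named in the PR description.
_VERDICTS = {
    "answered": "The available evidence supports the following:",
    "partial_answer": "Only part of the question could be answered:",
    "contradicted_premise": "The data contradicts the premise of the question:",
    "insufficient_evidence": "There is not enough validated evidence to answer this question.",
    "conflicting_evidence": "The validated evidence points in conflicting directions:",
}

# Contradictions are rendered first and caveats last (an assumed ordering).
_KIND_ORDER = {"contradiction": 0, "finding": 1, "caveat": 2}


def render_fallback_analysis(claims: list[ApprovedClaim], answer_status: str) -> str:
    """Build a user-facing answer from approved claims only, without calling a model."""
    # Unknown statuses fall back to the most conservative verdict.
    verdict = _VERDICTS.get(answer_status, _VERDICTS["insufficient_evidence"])
    # sorted() is stable, so claims of the same kind keep their original order.
    ordered = sorted(claims, key=lambda c: _KIND_ORDER.get(c.kind, 1))
    return "\n".join([verdict] + [f"- {c.text}" for c in ordered])
```

Because every sentence comes either from a fixed verdict table or from an already approved claim, nothing in the output can leak validator or orchestration details.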
def _validate_step_expectation(expectation: StepExpectation) -> str | None:
Can you explain what `expectation` is and where it comes from?
Here, `expectation` is the structured contract for what a plan step is supposed to return. It is represented by `StepExpectation` and includes things like the step category, comparison type, expected grouping columns, expected metric columns, expected period column, minimum row count, and whether distinct periods are required.
It originates from the planner output. When the compiled plan is created, each step includes an expectation block alongside the SQL. That block is defined in the planner schema and prompt, then carried through execution. In executor.py, we convert each compiled step into the internal execution shape, parse that expectation into a StepExpectation object, and use it to validate that the SQL result is not just syntactically valid, but also analytically valid for the intended purpose.
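As a rough illustration of that flow, the sketch below parses a planner step's expectation block into a typed object and runs a basic self-consistency check. The field names paraphrase the list above and the dictionary keys, defaults, and error messages are assumptions; the actual schema in the PR may differ.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class StepExpectation:
    # Assumed field names, paraphrasing the description in this thread.
    category: str
    comparison_type: Optional[str] = None
    grouping_columns: tuple[str, ...] = ()
    metric_columns: tuple[str, ...] = ()
    period_column: Optional[str] = None
    min_rows: int = 1
    require_distinct_periods: bool = False


def parse_expectation(step: dict[str, Any]) -> StepExpectation:
    """Turn the planner's raw `expectation` block into a typed object."""
    raw = step.get("expectation") or {}
    return StepExpectation(
        category=raw.get("category", "exploratory"),  # assumed default
        comparison_type=raw.get("comparison_type"),
        grouping_columns=tuple(raw.get("grouping_columns", [])),
        metric_columns=tuple(raw.get("metric_columns", [])),
        period_column=raw.get("period_column"),
        min_rows=int(raw.get("min_rows", 1)),
        require_distinct_periods=bool(raw.get("require_distinct_periods", False)),
    )


def validate_step_expectation(expectation: StepExpectation) -> Optional[str]:
    """Return an error message if the expectation contradicts itself, else None."""
    if expectation.min_rows < 0:
        return "min_rows must be non-negative"
    if expectation.require_distinct_periods and not expectation.period_column:
        return "distinct periods required but no period_column given"
    return None
```

Returning `None` for success and a message for failure mirrors the `str | None` return type of `_validate_step_expectation` shown in the diff.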
_CANONICAL_METRIC_ALIASES = {
Why are we aliasing one specific metric that may not be present in some other data?
Good call. I updated this so metric alias normalization is now schema-aware rather than relying on a static hardcoded alias map. The planner and executor resolve canonical aggregate metric names from the normalized schema manifest and the SQL aggregate shape, which keeps the behavior aligned with the data-agnostic architecture we want.
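For reviewers who want a picture of what "schema-aware" could mean here, the following is a simplified sketch, not the implementation in this PR. It assumes the schema manifest can be reduced to a set of lowercase column names, and the `canonical_metric_name` helper and its `func_column` naming scheme are hypothetical.

```python
import re
from typing import Optional

# Matches a single aggregate call such as SUM(orders.amount) or count(*).
_AGGREGATE_RE = re.compile(
    r"^\s*(sum|avg|min|max|count)\s*\(\s*([\w.*]+)\s*\)\s*$", re.IGNORECASE
)


def canonical_metric_name(aggregate_sql: str, schema_columns: set[str]) -> Optional[str]:
    """Derive a metric name from the aggregate shape and the known schema columns.

    `schema_columns` is assumed to hold lowercase column names from the manifest.
    Returns None when the expression is not a simple aggregate or the column is unknown.
    """
    match = _AGGREGATE_RE.match(aggregate_sql)
    if match is None:
        return None
    func = match.group(1).lower()
    # Drop any table qualifier: "orders.amount" becomes "amount".
    column = match.group(2).split(".")[-1].lower()
    if column == "*":
        return "count_rows" if func == "count" else None
    if column not in schema_columns:
        return None
    return f"{func}_{column}"
```

Nothing in this approach names a particular metric up front, so a dataset without a revenue column simply never produces a revenue alias.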
Can you explain all the changes in this file?
try:
    state = run_analysis(request.query)
    base_response = AnalyzeResponse(
        answer_status=state.get("answer_status", "insufficient_evidence"),
We are not using the '/analyze' endpoint now.
We are still using POST /analyze at the moment. The UI submits real chat prompts through that endpoint in ui/src/api/chat.ts, and the backend route is defined in app/api/routes.py. If we plan to move away from that endpoint, I agree we should clean up this response path as part of that change, but in the current code it is still active.
What are we doing with the data here? Are we creating a new view of data?
At this point we are not creating a new persisted view of the underlying data. What we are doing here is validating and shaping the query result in memory after execution so we can check whether it satisfies the analytical expectation for that step. For example, for a period comparison we inspect the returned columns and rows to confirm that the result actually contains the required metric, period field, and comparable periods before we treat it as valid evidence. So this is post-execution result validation, not creation of a new durable database view.
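A minimal sketch of that kind of in-memory check is below. It assumes the query result arrives as a list of row dictionaries, and the function name, parameters, and messages are illustrative rather than taken from the PR.

```python
from typing import Any, Optional


def validate_period_comparison(
    rows: list[dict[str, Any]],
    metric_column: str,
    period_column: str,
    min_distinct_periods: int = 2,
) -> Optional[str]:
    """Return the reason a result cannot serve as evidence, or None if it can."""
    if not rows:
        return "query returned no rows"
    # Check the result shape first: both required columns must be present.
    missing = [c for c in (metric_column, period_column) if c not in rows[0]]
    if missing:
        return f"missing required columns: {', '.join(missing)}"
    # A comparison needs at least two different, non-null period values.
    periods = {row.get(period_column) for row in rows} - {None}
    if len(periods) < min_distinct_periods:
        return f"need at least {min_distinct_periods} distinct periods, found {len(periods)}"
    if any(row.get(metric_column) is None for row in rows):
        return f"metric column {metric_column!r} contains nulls"
    return None
```

The function only reads the rows it is given; it writes nothing back to the database, which is the distinction drawn above between result validation and creating a view.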
Not required anymore
This PR hardens Planera’s backend analytics pipeline, with a focus on schema-grounded planning, semantic execution validation, and safer analysis generation.
Key improvements:
- strengthened the normalized schema contract used by the planner, including richer relation metadata, relationships, and protected schema-subset selection
- added deterministic step expectations so compiled plans carry required comparison shape, grouping, metric, and period requirements
- tightened planner and repair behavior so invalid identifiers, weakened repairs, and expectation drift are rejected before misleading execution proceeds
- upgraded executor validation from “query ran and returned rows” to semantic/result-shape checks, including period-comparison enforcement and grouped-comparison validation
- improved workflow state handling so partial execution is preserved and valid earlier evidence is not discarded when later steps fail
- expanded answer status handling to support answered, partial_answer, contradicted_premise, insufficient_evidence, and conflicting_evidence end to end
- improved grounded analysis generation with deterministic evidence building, approved claims, contradiction-first behavior, verdict-first rendering, normalized period labels, and cleaner user-safe fallback messaging
- exposed answer_status through the backend and frontend response contracts so these outcomes are visible in the product
- added regression coverage for invalid identifier rejection, schema subset preservation, repair validation, contradiction handling, grouped-period validation, partial evidence preservation, and internal error leakage prevention
Overall, this makes the system more reliable, more transparent, and more resistant to both hallucinated analysis and semantically weak SQL repairs.
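The five answer statuses listed above could be modeled as an enumeration so that backend and frontend contracts share one closed set of values. This is a sketch under assumptions: `AnswerStatus` and `coerce_answer_status` are hypothetical names, and the conservative default only mirrors the `state.get("answer_status", "insufficient_evidence")` call visible in the diff.

```python
from enum import Enum


class AnswerStatus(str, Enum):
    # Values are the status strings named in the PR description.
    ANSWERED = "answered"
    PARTIAL_ANSWER = "partial_answer"
    CONTRADICTED_PREMISE = "contradicted_premise"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"
    CONFLICTING_EVIDENCE = "conflicting_evidence"


def coerce_answer_status(value: object) -> AnswerStatus:
    """Map a raw workflow-state value to a known status, defaulting conservatively."""
    try:
        return AnswerStatus(value)
    except ValueError:
        # Unknown or missing values are treated as insufficient evidence.
        return AnswerStatus.INSUFFICIENT_EVIDENCE
```

Subclassing `str` keeps the members JSON-friendly, so a response model can serialize them as the same plain strings the UI already expects.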