Harden planner, validation, and analysis handling #19

Closed
Ay2012 wants to merge 2 commits into main from feature/analysis-hardening

Conversation

@Ay2012 Ay2012 commented Apr 5, 2026

Owner

This PR hardens Planera’s backend analytics pipeline, with a focus on schema-grounded planning, semantic execution validation, and safer analysis generation.

Key improvements:

- strengthened the normalized schema contract used by the planner, including richer relation metadata, relationships, and protected schema-subset selection
- added deterministic step expectations so compiled plans carry required comparison shape, grouping, metric, and period requirements
- tightened planner and repair behavior so invalid identifiers, weakened repairs, and expectation drift are rejected before misleading execution proceeds
- upgraded executor validation from “query ran and returned rows” to semantic/result-shape checks, including period-comparison enforcement and grouped-comparison validation
- improved workflow state handling so partial execution is preserved and valid earlier evidence is not discarded when later steps fail
- expanded answer status handling to support answered, partial_answer, contradicted_premise, insufficient_evidence, and conflicting_evidence end to end
- improved grounded analysis generation with deterministic evidence building, approved claims, contradiction-first behavior, verdict-first rendering, normalized period labels, and cleaner user-safe fallback messaging
- exposed answer_status through the backend and frontend response contracts so these outcomes are visible in the product
- added regression coverage for invalid identifier rejection, schema subset preservation, repair validation, contradiction handling, grouped-period validation, partial evidence preservation, and internal error leakage prevention
Overall, this makes the system more reliable, more transparent, and more resistant to both hallucinated analysis and semantically weak SQL repairs.
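
For readers skimming the thread, the five answer statuses listed above could be modeled roughly as follows. This is a minimal sketch, not the code in this PR: the `AnswerStatus` enum and the `coerce_answer_status` helper are hypothetical names, and the fallback to `insufficient_evidence` simply mirrors the default visible in the `app/api/routes.py` snippet further down the thread.

```python
from enum import Enum


class AnswerStatus(str, Enum):
    """The five outcomes named in the PR description."""

    ANSWERED = "answered"
    PARTIAL_ANSWER = "partial_answer"
    CONTRADICTED_PREMISE = "contradicted_premise"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"
    CONFLICTING_EVIDENCE = "conflicting_evidence"


def coerce_answer_status(state: dict) -> AnswerStatus:
    """Hypothetical helper: read answer_status from the workflow state.

    Missing or unrecognized values fall back to insufficient_evidence,
    so the response contract never carries an unknown status string.
    """
    raw = state.get("answer_status", AnswerStatus.INSUFFICIENT_EVIDENCE.value)
    try:
        return AnswerStatus(raw)
    except ValueError:
        return AnswerStatus.INSUFFICIENT_EVIDENCE
```

Because the enum subclasses `str`, its members compare equal to the plain status strings, which would keep an existing JSON contract unchanged if something like this were adopted.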

@abhinavsingh9714

Collaborator

This is a big surface-area change (~1.9k LOC across agent, prompts, semantic model, API, UI). It should be split into smaller PRs (schema/executor validation → evidence/claims → narrative validation → planner UI).

@abhinavsingh9714

Collaborator

This introduces tighter coupling to one data source (premise step, period comparison, distinct periods, grouping columns). That’s a specific methodology, not “whatever shape the customer’s data is in.” More checks ⇒ more paths that never produce a narrative, even when one could still have been useful (e.g. exploratory breakdowns, single-period insights).

@abhinavsingh9714 abhinavsingh9714 left a comment

Collaborator

Needs more clarity, discussion and result comparisons

Comment thread app/agent/analysis.py
)


def _render_fallback_analysis(claims: list[ApprovedClaim], answer_status: str) -> str:

Collaborator

What is this fallback for?

Owner Author

`_render_fallback_analysis(...)` is a deterministic safety path used when the normal LLM rendering step cannot produce a validated final answer. In the standard flow, we build approved claims from validated evidence and then ask the model to turn those claims into user-facing analysis. If that rendering step fails validation or returns unusable output, or if the workflow only has incomplete or caveat-level evidence, this function generates the response directly on the backend instead.

It is not a single fixed fallback message; it still uses the approved claims that were already grounded in validated evidence, so it preserves whatever the system has actually established, such as a contradicted premise, a partial answer, or unresolved caveats. The purpose is to stay as close as possible to a reliable grounded conclusion while avoiding any unvalidated LLM wording or internal validator/orchestration details in the final user response.
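
To make the idea concrete, here is a rough sketch of what a deterministic, claims-only renderer could look like. It is not the function in `app/agent/analysis.py`: the `ApprovedClaim` fields (`text`, `kind`), the verdict wording, and the public name `render_fallback_analysis` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedClaim:
    # Assumed shape; the real ApprovedClaim may carry different fields.
    text: str
    kind: str = "finding"  # hypothetical values: "finding" or "caveat"


# Hypothetical verdict lines, one per answer_status value.
_VERDICTS = {
    "answered": "The evidence supports the following:",
    "partial_answer": "Only part of the question could be answered:",
    "contradicted_premise": "The data contradicts the premise of the question:",
    "insufficient_evidence": "There is not enough validated evidence to answer.",
    "conflicting_evidence": "The validated evidence is conflicting:",
}


def render_fallback_analysis(claims: list[ApprovedClaim], answer_status: str) -> str:
    """Build the response from approved claims only, with no LLM call."""
    # Verdict first; unknown statuses degrade to the most conservative line.
    verdict = _VERDICTS.get(answer_status, _VERDICTS["insufficient_evidence"])
    findings = [c.text for c in claims if c.kind != "caveat"]
    caveats = [c.text for c in claims if c.kind == "caveat"]
    lines = [verdict]
    lines += [f"- {text}" for text in findings]
    if caveats:
        lines.append("Caveats:")
        lines += [f"- {text}" for text in caveats]
    return "\n".join(lines)
```

Since the output depends only on the claims and the status, the same inputs always yield the same text, and nothing from the validator or orchestration layer can leak into it.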

Comment thread app/agent/executor.py
return result


def _validate_step_expectation(expectation: StepExpectation) -> str | None:

Collaborator

Can you explain what `expectation` is and where it comes from?

Owner Author

Here, `expectation` is the structured contract for what a plan step is supposed to return. It is represented by `StepExpectation` and includes things like the step category, comparison type, expected grouping columns, expected metric columns, expected period column, minimum row count, and whether distinct periods are required.

It originates from the planner output. When the compiled plan is created, each step includes an expectation block alongside the SQL. That block is defined in the planner schema and prompt, then carried through execution. In `executor.py`, we convert each compiled step into the internal execution shape, parse that expectation into a `StepExpectation` object, and use it to validate that the SQL result is not just syntactically valid, but also analytically valid for the intended purpose.
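
As an illustration of that contract, the sketch below shows one way the expectation block could be parsed from a compiled step. The field names, the defaults, and the `parse_expectation` helper are assumptions derived from the description above, not the definitions in `executor.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepExpectation:
    # Field names are assumptions based on the description in this thread.
    category: str
    comparison_type: Optional[str] = None
    grouping_columns: tuple = ()
    metric_columns: tuple = ()
    period_column: Optional[str] = None
    min_rows: int = 1
    require_distinct_periods: bool = False


def parse_expectation(step: dict) -> StepExpectation:
    """Parse the expectation block that travels alongside a step's SQL."""
    raw = step.get("expectation") or {}
    return StepExpectation(
        category=raw.get("category", "unknown"),
        comparison_type=raw.get("comparison_type"),
        grouping_columns=tuple(raw.get("grouping_columns", ())),
        metric_columns=tuple(raw.get("metric_columns", ())),
        period_column=raw.get("period_column"),
        min_rows=int(raw.get("min_rows", 1)),
        require_distinct_periods=bool(raw.get("require_distinct_periods", False)),
    )
```

Keeping the parsed object frozen means the expectation cannot drift between planning and validation, which is in the spirit of the "expectation drift" checks mentioned in the PR description.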

Comment thread app/agent/metric_aliases.py Outdated

import re

_CANONICAL_METRIC_ALIASES = {

Collaborator

Why are we aliasing for one specific metric that may not be present in some other data?

Owner Author

Good call. I updated this so metric alias normalization is now schema-aware rather than relying on a static hardcoded alias map. The planner and executor resolve canonical aggregate metric names from the normalized schema manifest and the SQL aggregate shape, which keeps the behavior aligned with the data-agnostic architecture we want.
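
A rough sketch of what schema-aware alias resolution might look like is below. Everything here is hypothetical: the manifest is assumed to be a mapping from table name to column names, the `canonical_metric_name` function and the `<function>_<column>` naming scheme are invented for illustration, and only simple single-column aggregates are handled.

```python
import re
from typing import Optional

# Matches simple aggregates such as SUM(amount) or AVG(orders.amount).
_AGGREGATE = re.compile(
    r"^\s*(SUM|AVG|MIN|MAX|COUNT)\s*\(\s*(?:(\w+)\.)?(\w+)\s*\)\s*$",
    re.IGNORECASE,
)


def canonical_metric_name(expression: str, manifest: dict) -> Optional[str]:
    """Return a name like 'sum_amount' if the aggregated column is in the manifest.

    `manifest` is assumed to map table name -> set of column names.
    """
    match = _AGGREGATE.match(expression)
    if match is None:
        return None  # not a simple aggregate; leave it to other handling
    func, table, column = match.group(1), match.group(2), match.group(3)
    # A qualified column is checked against its table; otherwise any table may match.
    tables = [table] if table else list(manifest)
    if not any(column in manifest.get(name, set()) for name in tables):
        return None  # unknown identifier: refuse to invent an alias
    return f"{func.lower()}_{column.lower()}"
```

The point of the sketch is only that the canonical name is derived from the schema and the aggregate shape, so no metric name is baked into the code.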

Comment thread app/agent/planner.py

Collaborator

Can you explain all the changes in this file?

Comment thread app/api/routes.py
try:
    state = run_analysis(request.query)
    base_response = AnalyzeResponse(
        answer_status=state.get("answer_status", "insufficient_evidence"),

Collaborator

We are not using the '/analyse' endpoint now.

Owner Author

We are still using POST /analyze at the moment. The UI submits real chat prompts through that endpoint in ui/src/api/chat.ts, and the backend route is defined in app/api/routes.py. If we plan to move away from that endpoint, I agree we should clean up this response path as part of that change, but in the current code it is still active.

Collaborator

What are we doing with the data here? Are we creating a new view of data?

Owner Author

At this point we are not creating a new persisted view of the underlying data. What we are doing here is validating and shaping the query result in memory after execution so we can check whether it satisfies the analytical expectation for that step. For example, for a period comparison we inspect the returned columns and rows to confirm that the result actually contains the required metric, period field, and comparable periods before we treat it as valid evidence. So this is post-execution result validation, not creation of a new durable database view.
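
To show what "post-execution result validation" means in practice, here is a minimal in-memory check for a period comparison. It is a sketch under assumptions: rows are taken to be a list of dicts keyed by column name, and the function name, parameters, and error messages are hypothetical rather than taken from the executor.

```python
from typing import Optional


def validate_period_comparison(
    rows: list,
    metric_column: str,
    period_column: str,
    min_distinct_periods: int = 2,
) -> Optional[str]:
    """Return an error message, or None when the rows are usable evidence.

    Nothing is written back to the database; the rows are only inspected.
    """
    if not rows:
        return "query returned no rows"
    missing = [c for c in (metric_column, period_column) if c not in rows[0]]
    if missing:
        return f"missing expected columns: {', '.join(missing)}"
    # Count comparable periods, ignoring null period values.
    periods = {row.get(period_column) for row in rows} - {None}
    if len(periods) < min_distinct_periods:
        return f"need {min_distinct_periods} distinct periods, found {len(periods)}"
    if all(row.get(metric_column) is None for row in rows):
        return f"metric column {metric_column!r} is entirely null"
    return None
```

A result that passes would be treated as evidence, while one that returns a message could trigger repair or be recorded as a failed step without discarding earlier valid steps.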

@saranshkr saranshkr closed this Apr 6, 2026
@saranshkr

Collaborator

Not required anymore

3 participants