Harden planner, validation, and analysis handling by Ay2012 · Pull Request #19 · Ay2012/planera

Ay2012 · 2026-04-05T01:20:56Z

This PR hardens Planera’s backend analytics pipeline, with a focus on schema-grounded planning, semantic execution validation, and safer analysis generation.

Key improvements:

strengthened the normalized schema contract used by the planner, including richer relation metadata, relationships, and protected schema-subset selection
added deterministic step expectations so compiled plans carry required comparison shape, grouping, metric, and period requirements
tightened planner and repair behavior so invalid identifiers, weakened repairs, and expectation drift are rejected before misleading execution proceeds
upgraded executor validation from “query ran and returned rows” to semantic/result-shape checks, including period-comparison enforcement and grouped-comparison validation
improved workflow state handling so partial execution is preserved and valid earlier evidence is not discarded when later steps fail
expanded answer status handling to support answered, partial_answer, contradicted_premise, insufficient_evidence, and conflicting_evidence end to end
improved grounded analysis generation with deterministic evidence building, approved claims, contradiction-first behavior, verdict-first rendering, normalized period labels, and cleaner user-safe fallback messaging
exposed answer_status through the backend and frontend response contracts so these outcomes are visible in the product
added regression coverage for invalid identifier rejection, schema subset preservation, repair validation, contradiction handling, grouped-period validation, partial evidence preservation, and internal error leakage prevention
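As a rough illustration of the partial-evidence and answer-status items above, here is a sketch of how validated step outcomes could be kept and mapped to the five statuses. `StepOutcome`, `collect_evidence`, `derive_answer_status`, and the precedence order (contradicted premise first) are assumptions, not the PR's code. Conflict detection is reduced to a boolean flag here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Literal, Optional

AnswerStatus = Literal[
    "answered",
    "partial_answer",
    "contradicted_premise",
    "insufficient_evidence",
    "conflicting_evidence",
]


@dataclass
class StepOutcome:
    # Hypothetical per-step record produced by the executor.
    step_id: str
    ok: bool  # True only if the step ran AND passed semantic/result-shape validation
    rows: list[dict[str, Any]] = field(default_factory=list)
    premise_holds: Optional[bool] = None  # set only by a premise-check step
    error: Optional[str] = None


def collect_evidence(outcomes: list[StepOutcome]) -> dict[str, list[dict[str, Any]]]:
    # Keep every validated step's rows even when a later step failed.
    return {o.step_id: o.rows for o in outcomes if o.ok}


def derive_answer_status(outcomes: list[StepOutcome], conflicting: bool = False) -> AnswerStatus:
    validated = [o for o in outcomes if o.ok]
    # Contradiction first: a validated premise check that fails overrides everything else.
    if any(o.premise_holds is False for o in validated):
        return "contradicted_premise"
    if conflicting:
        return "conflicting_evidence"
    if not validated:
        return "insufficient_evidence"
    if len(validated) < len(outcomes):
        return "partial_answer"
    return "answered"
```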
Overall, this makes the system more reliable, more transparent, and more resistant to both hallucinated analysis and semantically weak SQL repairs.
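One way to picture the "weakened repair" rejection is to compare the shape a step promised before repair with the shape the repaired SQL produces, and refuse any repair that silently drops a metric, a grouping column, or the period column. `ResultShape` and `weakened_repair_reason` are hypothetical names; the PR does not show how its repair validation is structured.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResultShape:
    # Hypothetical summary of what a step's SQL is expected to produce.
    metric_columns: frozenset[str]
    grouping_columns: frozenset[str]
    period_column: Optional[str] = None


def weakened_repair_reason(original: ResultShape, repaired: ResultShape) -> str | None:
    """Return why a repaired step is weaker than the original, or None if it is acceptable."""
    # Adding columns is allowed; removing anything the original promised is not.
    dropped_metrics = original.metric_columns - repaired.metric_columns
    if dropped_metrics:
        return f"repair dropped metric columns: {sorted(dropped_metrics)}"
    dropped_groups = original.grouping_columns - repaired.grouping_columns
    if dropped_groups:
        return f"repair dropped grouping columns: {sorted(dropped_groups)}"
    if original.period_column is not None and repaired.period_column != original.period_column:
        return "repair changed or removed the period column"
    return None
```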

abhinavsingh9714 · 2026-04-05T03:12:11Z

This is a big surface-area change (~1.9k LOC across agent, prompts, semantic model, API, UI). It should be split into smaller PRs (schema/executor validation → evidence/claims → narrative validation → planner UI).

abhinavsingh9714 · 2026-04-05T03:21:50Z

Tighter coupling to one data source (premise step, period comparison, distinct periods, grouping columns). That’s a specific methodology, not “whatever shape the customer’s data is in.” More checks ⇒ more paths that never produce a narrative, even when one could still have been useful (e.g. exploratory breakdowns, single-period insights).

abhinavsingh9714

Needs more clarity, discussion and result comparisons

abhinavsingh9714 · 2026-04-05T02:55:00Z

    )


+def _render_fallback_analysis(claims: list[ApprovedClaim], answer_status: str) -> str:


What is this fallback for?

`_render_fallback_analysis(...)` is a deterministic safety path used when the normal LLM rendering step cannot produce a validated final answer. In the standard flow, we build approved claims from validated evidence and then ask the model to turn those claims into user-facing analysis. If that rendering step fails validation or returns unusable output, or if the workflow only has incomplete or caveat-level evidence, this function generates the response directly on the backend instead. It is not a single fixed fallback message: it still uses the approved claims that were already grounded in validated evidence, so it preserves whatever the system has actually established, such as a contradicted premise, a partial answer, or unresolved caveats. The purpose is to stay as close as possible to a reliable grounded conclusion while keeping unvalidated LLM wording and internal validator or orchestration details out of the final user response.

abhinavsingh9714 · 2026-04-05T02:56:18Z

+    return result
+
+
+def _validate_step_expectation(expectation: StepExpectation) -> str | None:


Can you explain what an expectation is, and where it comes from?

Here, `expectation` is the structured contract for what a plan step is supposed to return. It is represented by `StepExpectation` and includes things like the step category, comparison type, expected grouping columns, expected metric columns, expected period column, minimum row count, and whether distinct periods are required.

It originates from the planner output. When the compiled plan is created, each step includes an `expectation` block alongside the SQL. That block is defined in the planner schema and prompt, then carried through execution. In `executor.py`, we convert each compiled step into the internal execution shape, parse that expectation into a `StepExpectation` object, and use it to validate that the SQL result is not just syntactically valid but also analytically valid for its intended purpose.

abhinavsingh9714 · 2026-04-05T02:57:36Z

+
+import re
+
+_CANONICAL_METRIC_ALIASES = {


Why are we aliasing one specific metric that may not be present in some other data?

Good call. I updated this so metric alias normalization is now schema-aware rather than relying on a static hardcoded alias map. The planner and executor resolve canonical aggregate metric names from the normalized schema manifest and the SQL aggregate shape, which keeps the behavior aligned with the data-agnostic architecture we want.
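A minimal sketch of schema-aware normalization, assuming the manifest exposes a set of known column names and that canonical names follow a hypothetical `<function>_<column>` convention. The name is derived from the aggregate's shape, and only when the column exists in the schema, so no alias is hard-coded for a particular dataset. `canonical_metric_name` is an illustrative helper, not the PR's function.

```python
from __future__ import annotations

import re

# Matches simple aggregates such as SUM(col), AVG(t.col), COUNT(DISTINCT col), COUNT(*).
_AGGREGATE_RE = re.compile(
    r"^\s*(sum|avg|count|min|max)\s*\(\s*(distinct\s+)?([A-Za-z_][A-Za-z0-9_.]*|\*)\s*\)\s*$",
    re.IGNORECASE,
)


def canonical_metric_name(expression: str, schema_columns: set[str]) -> str | None:
    """Derive a canonical metric name from an aggregate, or None if it cannot be grounded."""
    match = _AGGREGATE_RE.match(expression)
    if match is None:
        return None  # not a simple aggregate; leave the alias to the planner
    func = match.group(1).lower()
    distinct = match.group(2) is not None
    target = match.group(3)
    if target == "*":
        return "row_count" if func == "count" and not distinct else None
    column = target.split(".")[-1].lower()  # drop any table qualifier
    known = {c.lower() for c in schema_columns}
    if column not in known:
        return None  # column is not in the manifest: do not invent a name
    prefix = f"{func}_distinct" if distinct else func
    return f"{prefix}_{column}"
```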

abhinavsingh9714 · 2026-04-05T02:59:17Z

Can you explain all the changes in this file?

abhinavsingh9714 · 2026-04-05T02:59:53Z

    try:
        state = run_analysis(request.query)
        base_response = AnalyzeResponse(
+            answer_status=state.get("answer_status", "insufficient_evidence"),


We are not using the '/analyze' endpoint now.

We are still using POST /analyze at the moment. The UI submits real chat prompts through that endpoint in ui/src/api/chat.ts, and the backend route is defined in app/api/routes.py. If we plan to move away from that endpoint, I agree we should clean up this response path as part of that change, but in the current code it is still active.
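For reference, a sketch of how the `answer_status` value from the workflow state could be carried into the `/analyze` response, using the same `"insufficient_evidence"` default as the diff. The extra guard that maps unknown values to that default is an assumption, not something the diff shows. The dataclass stands in for the real `AnalyzeResponse` model; its `analysis` field and the `build_response` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# The five statuses listed in the PR description.
_VALID_STATUSES = frozenset({
    "answered",
    "partial_answer",
    "contradicted_premise",
    "insufficient_evidence",
    "conflicting_evidence",
})


@dataclass
class AnalyzeResponse:
    # Simplified stand-in for the real response model; other fields omitted.
    answer_status: str
    analysis: str = ""


def build_response(state: dict[str, Any]) -> AnalyzeResponse:
    # Same default as the diff: a missing status means insufficient evidence.
    status = state.get("answer_status", "insufficient_evidence")
    if status not in _VALID_STATUSES:
        status = "insufficient_evidence"  # assumed guard: never expose an unknown internal value
    return AnalyzeResponse(answer_status=status, analysis=state.get("analysis", ""))
```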

abhinavsingh9714 · 2026-04-05T03:00:57Z

What are we doing with the data here? Are we creating a new view of data?

At this point we are not creating a new persisted view of the underlying data. What we are doing here is validating and shaping the query result in memory after execution so we can check whether it satisfies the analytical expectation for that step. For example, for a period comparison we inspect the returned columns and rows to confirm that the result actually contains the required metric, period field, and comparable periods before we treat it as valid evidence. So this is post-execution result validation, not creation of a new durable database view.
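The post-execution check described above can be sketched as a pure function over the returned columns and rows, so nothing is written back to the database. The function name, its parameters, and the rule for grouped comparisons (at least one group must span two or more periods) are assumptions for illustration.

```python
from __future__ import annotations

from typing import Any


def check_period_comparison(
    columns: list[str],
    rows: list[dict[str, Any]],
    metric_column: str,
    period_column: str,
    grouping_columns: tuple[str, ...] = (),
    min_distinct_periods: int = 2,
) -> str | None:
    """Return why an in-memory result is not usable comparison evidence, or None if it is."""
    required = (metric_column, period_column, *grouping_columns)
    missing = [c for c in required if c not in columns]
    if missing:
        return f"result is missing required columns: {missing}"
    periods = {row[period_column] for row in rows if row.get(period_column) is not None}
    if len(periods) < min_distinct_periods:
        return f"expected at least {min_distinct_periods} distinct periods, got {len(periods)}"
    if grouping_columns:
        # At least one group must span enough periods; otherwise nothing is actually compared.
        periods_by_group: dict[tuple[Any, ...], set[Any]] = {}
        for row in rows:
            period = row.get(period_column)
            if period is None:
                continue
            key = tuple(row.get(c) for c in grouping_columns)
            periods_by_group.setdefault(key, set()).add(period)
        if not any(len(p) >= min_distinct_periods for p in periods_by_group.values()):
            return "no group has values in enough distinct periods to compare"
    return None
```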

saranshkr · 2026-04-06T03:52:16Z

Not required anymore

Harden planner, validation, and analysis handling

56b97dc

Ay2012 requested review from abhinavsingh9714 and saranshkr April 5, 2026 01:20

saranshkr assigned Ay2012 Apr 5, 2026

abhinavsingh9714 reviewed Apr 5, 2026


Make metric alias normalization schema-aware

619b9aa

saranshkr closed this Apr 6, 2026



3 participants