diff --git a/README.md b/README.md
index 476d588..230d82b 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,4 @@
----
title: Scheme Enrollment Env
-emoji: 🏛️
colorFrom: blue
colorTo: green
sdk: docker
@@ -8,221 +6,822 @@ pinned: false
app_port: 7860
tags:
- openenv
+ - reinforcement-learning
+ - evaluation
+ - agents
---
# Indian Government Scheme Enrollment — RL Environment
-An open-source Reinforcement Learning environment simulating the workflow of an Indian Government CSC (Common Service Centre) operator. An LLM-based agent must interview applicants, collect missing documents, detect boundary fraud, and either enroll them in the correct welfare scheme or safely escalate contradictory cases to a senior officer.
+> *A reinforcement learning benchmark for bureaucratic reasoning: interviewing applicants, verifying documents, applying strict scheme rules, detecting fraud, and knowing when to escalate rather than decide.*
+
+[Hugging Face Space](https://huggingface.co/spaces/advikdivekar/scheme-enrollment-env) · [GitHub](https://github.com/advikdivekar/rl-agent) · [OpenEnv](https://huggingface.co/openenv) · [Tests](tests/) · [5 Tasks](#the-5-tasks)
+
+## The Case Study
+
+Priya is a CSC operator in Barmer, Rajasthan. She interviews dozens of applicants every day from behind a wooden desk, working on a government-issue computer over a slow internet connection. One afternoon, a young man walks in claiming to be a student. He wants to enroll in PMKVY, a skill-training scheme. On the surface, his profile looks plausible.
+
+But something feels wrong. His income is unusually high for a student. Priya asks for his PAN card. It reveals six years of active, pension-linked employment at a public-sector company. He is not a student. He is attempting to claim a benefit under false pretenses.
+
+Priya does not guess. She does not overreach. She escalates the case.
+
+**This environment trains AI agents to behave like Priya.**
+
+Not just to read a table of rules, but to:
+
+- gather missing information before acting
+- verify the right document at the right time
+- apply exact arithmetic boundaries
+- ignore irrelevant context
+- distinguish ineligibility from contradiction
+- escalate only when escalation is genuinely required
-## Why This Exists
+## Why This Environment Exists
-Millions of rural Indians access government welfare schemes through CSC operators — human workers who interview applicants, verify documents, and submit applications. This process requires multi-step reasoning, strict rule adherence, and the ability to detect fraud. This environment trains and evaluates AI agents on that exact workflow, filling a real gap in the RL/agent evaluation ecosystem.
+Most RL and agent benchmarks focus on coding, games, search, or generic dialogue. Very few test policy compliance under partial observability, exact thresholds, and procedural safety.
-## MDP Formalization
+This environment exists to measure a harder and more realistic capability cluster:
+
+- **Policy compliance under uncertainty**: the agent must collect evidence before deciding
+- **Fraud detection through document verification**: contradictions emerge only after the correct document is requested
+- **Boundary arithmetic**: an income of `9999` qualifies for PMKVY, `10000` does not
+- **Escalation protocol**: the agent must know when not to decide
+- **Noise filtering**: irrelevant profile fields appear alongside real signal
+
+The benchmark is grounded in a workflow that affects welfare access, fraud prevention, and administrative fairness.
+
+## Hackathon Compliance Snapshot
+
+This repository is structured to satisfy the official Round 1 requirements:
+
+- real-world task simulation, not a toy domain
+- full OpenEnv environment with typed models, `step()`, `reset()`, `state()`, and `openenv.yaml`
+- 5 graded tasks with deterministic programmatic scoring in the `0.0–1.0` range
+- meaningful reward shaping over the trajectory
+- root-level `inference.py` using the OpenAI client
+- Dockerfile plus Hugging Face Space deployment metadata
+- pre-submission validation via `scripts/pre-validation-script.sh`
+- README coverage for environment description, action space, observation space, tasks, setup, and baseline scores
+
+## Table of Contents
+
+- [Environment at a Glance](#environment-at-a-glance)
+- [Repository Structure](#repository-structure)
+- [Architecture Overview](#architecture-overview)
+- [System Architecture](#system-architecture)
+- [Agent-Environment Architecture](#agent-environment-architecture)
+- [Training Pipeline Architecture](#training-pipeline-architecture)
+- [Reward Architecture](#reward-architecture)
+- [Deployment and Inference Architecture](#deployment-and-inference-architecture)
+- [Data Flow Architecture](#data-flow-architecture)
+- [Environment Contract](#environment-contract)
+- [Action Space](#action-space)
+- [Observation Space](#observation-space)
+- [Scheme Eligibility Rules](#scheme-eligibility-rules)
+- [The 5 Tasks](#the-5-tasks)
+- [The Distraction Trap](#the-distraction-trap)
+- [Benchmark Outputs and Screenshots](#benchmark-outputs-and-screenshots)
+- [Baseline Results](#baseline-results)
+- [Setup and Running](#setup-and-running)
+- [Environment Variables](#environment-variables)
+- [Testing](#testing)
+- [Pre-Submission Validation](#pre-submission-validation)
+- [OpenEnv Compliance](#openenv-compliance)
+
+## Environment at a Glance
| Component | Definition |
|---|---|
-| **State (S)** | Worker profile (16 fields: age, income, occupation, has_aadhaar, family_income, worker_type, has_epfo, has_esic, is_govt_employee, has_pan, has_bank_account, has_pucca_house, is_pregnant, first_child, is_income_tax_payer, not_nps) + application form state + step count |
-| **Action (A)** | 5 discrete actions: ask_question, request_document, approve_scheme, reject_applicant, escalate |
-| **Transition (T)** | Deterministic given persona — ask_question reveals hidden fields, verify_document surfaces contradictions |
-| **Reward (R)** | Dense per-step rewards (see reward table below) + terminal bonus |
-| **Discount (γ)** | 1.0 — episodic task, all steps matter equally |
-| **Max Steps** | 20 per episode |
+| **State (S)** | Applicant profile, partial observation state, hidden persona fields, step count |
+| **Action (A)** | `ask_question`, `request_document`, `approve_scheme`, `reject_applicant`, `escalate` |
+| **Transition (T)** | Deterministic given persona and task template |
+| **Reward (R)** | Intermediate shaping plus terminal outcome rewards |
+| **Horizon** | 20 steps per episode |
+| **Grader** | Terminal normalized score `0.0` to `1.0` |
+| **Server** | FastAPI via OpenEnv `create_app` |
+| **Inference** | OpenAI-compatible client, provider-agnostic |
+| **Benchmarking** | Inference-first evaluation flow with optional multi-model orchestration |
+
+## Repository Structure
+
+```text
+.
+├── README.md
+├── pyproject.toml
+├── requirements.txt
+├── uv.lock
+├── Dockerfile
+├── openenv.yaml
+├── .env.example
+├── models.py
+├── client.py
+├── inference.py
+├── benchmark_runner.py
+├── benchmark_report.py
+├── server/
+│ ├── __init__.py
+│ ├── app.py
+│ ├── models.py
+│ ├── scheme_env_environment.py
+│ └── schemes.py
+├── scripts/
+│ └── pre-validation-script.sh
+├── tests/
+│ ├── conftest.py
+│ └── test_scheme_eligibility.py
+└── reports/
+ ├── average_scores.png
+ ├── task_heatmap.png
+ ├── difficulty_profile.png
+ ├── efficiency_scatter.png
+ ├── inference_logs/
+ └── test_logs/
+```
-## Action Space
+### What each major file does
+
+- [server/app.py](server/app.py): FastAPI/OpenEnv server entrypoint exposing `/reset`, `/step`, and `/health`
+- [server/scheme_env_environment.py](server/scheme_env_environment.py): environment lifecycle, task logic, reward shaping, step transitions, shared state, metadata sanitization
+- [server/schemes.py](server/schemes.py): scheme metadata, eligibility logic, optimal scheme selection
+- [models.py](models.py): root `Action` and `Observation` schemas used by inference and server logic
+- [client.py](client.py): OpenEnv client wrapper for typed environment access
+- [inference.py](inference.py): single-model evaluation loop that produces the primary output bundle under `reports/`
+- [benchmark_runner.py](benchmark_runner.py): optional multi-model orchestration layer
+- [benchmark_report.py](benchmark_report.py): report and chart generation from benchmark artifacts
+- [tests/test_scheme_eligibility.py](tests/test_scheme_eligibility.py): boundary-condition and grading tests
+- [reports](reports): benchmark outputs, summary files, charts, and archived logs
+
+## Architecture Overview
+
+The repo separates four concerns:
+
+1. the **environment runtime**
+2. the **model interaction loop**
+3. the **benchmark orchestration layer**
+4. the **reporting and visualization layer**
+
+## System Architecture
+
+```mermaid
+flowchart LR
+ A["LLM / External Policy"] --> B["inference.py<br/>Prompting + JSON extraction"]
+ B --> C["OpenEnv HTTP API<br/>/reset /step"]
+ C --> D["server/app.py<br/>FastAPI + create_app"]
+ D --> E["SchemeEnvEnvironment<br/>server/scheme_env_environment.py"]
+ E --> F["Persona Generation"]
+ E --> G["Observation Builder"]
+ E --> H["Reward + Grader Logic"]
+ E --> I["Scheme Rules<br/>server/schemes.py"]
+ B --> J["reports/inference_logs<br/>per-run logs"]
+ B --> K["reports/test_logs<br/>validation outputs"]
+ B --> L["reports/*.png + reports/*.csv + reports/*.json"]
+ M["benchmark_runner.py<br/>optional orchestration"] --> B
+ L --> N["README visual outputs"]
+```
-| Action | Value | Description | Reward |
-|---|---|---|---|
-| `ask_question` | field name | Gather missing eligibility data | 0.0 valid step, -0.10 noise/redundant |
-| `request_document` | document name | Request verification documents | 0.0 valid step |
-| `approve_scheme` | scheme name | Enroll applicant in optimal scheme | +10.0 (optimal), +3.0 (suboptimal), -5.0 (wrong) |
-| `reject_applicant` | category | Reject ineligible applicant | +5.0 (correct), -5.0 (incorrect) |
-| `escalate` | category or empty | Hand off contradictory case to senior officer | +10.0 (Task 4 only), -2.0 (other tasks) |
+### Runtime layers
+
+- **Inference layer**: talks to external models and formats actions
+- **API layer**: standard OpenEnv-compatible transport over HTTP
+- **Environment layer**: task logic, hidden persona state, reward logic
+- **Data layer**: scheme rules and typed schemas
+- **Reporting layer**: benchmark aggregation and visualization
+
+## Agent-Environment Architecture
+
+```mermaid
+sequenceDiagram
+ participant Agent as LLM Agent
+ participant Runner as inference.py
+ participant API as FastAPI/OpenEnv
+ participant Env as SchemeEnvEnvironment
+
+ Agent->>Runner: JSON action
+ Runner->>API: POST /step
+ API->>Env: step(action)
+ Env->>Env: Validate action
+ Env->>Env: Update hidden state
+ Env->>Env: Compute reward and terminal result
+ Env->>Env: Strip hidden metadata
+ Env-->>API: Observation
+ API-->>Runner: Observation + reward + done
+ Runner-->>Agent: Updated state context
+```
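
The sequence above can be sketched as a client-side episode loop. This is a minimal sketch, not the loop in `inference.py`: it assumes `/reset` and `/step` return JSON with `observation`, `reward`, and `done` keys and that `/step` accepts `{"action": {...}}`. The `post` transport, the `policy` callable, and the `action_type`/`value` field names are hypothetical stand-ins so the loop can run without a server.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: (endpoint path, JSON payload) -> decoded JSON response.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]
# Hypothetical policy: latest observation -> action dict.
Policy = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_episode(post: Post, policy: Policy, seed: int = 0, max_steps: int = 20) -> Dict[str, Any]:
    """Drive one episode through /reset and /step (assumed response shape)."""
    result = post("/reset", {"seed": seed})
    observation = result["observation"]
    rewards: List[float] = []
    for _ in range(max_steps):
        # The agent only emits actions; it never touches environment state directly.
        action = policy(observation)
        result = post("/step", {"action": action})
        observation = result["observation"]
        rewards.append(float(result.get("reward") or 0.0))
        if result.get("done"):
            break
    return {
        "steps": len(rewards),
        "total_reward": sum(rewards),
        "grader_score": observation.get("grader_score"),
    }
```

Under the same response-shape assumption, passing `lambda path, payload: requests.post(ENV_URL + path, json=payload).json()` as `post` would point the loop at a live server.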
+
+### Core interaction pattern
+
+- the agent never mutates internal state directly
+- every step is mediated through a strict typed action schema
+- the environment can soft-block some wrong protocol steps and allow recovery
+- the final score depends on both correctness and efficiency
+
+## Training Pipeline Architecture
+
+This repository is an **evaluation and benchmarking pipeline**, not an RL training loop with replay buffers and optimizer steps. Its evaluation flow is nonetheless organized like a training pipeline:
+
+```mermaid
+flowchart TD
+ A["Environment Server"] --> B["inference.py<br/>single-model episodes"]
+ B --> C["reports/inference_logs<br/>raw traces"]
+ B --> D["reports/test_logs<br/>test outputs"]
+ B --> E["reports/*.png<br/>charts"]
+ B --> F["reports/*.csv + reports/*.json + reports/*.txt"]
+ G["benchmark_runner.py<br/>optional repeated invocation"] --> B
+```
+
+### What this pipeline enables
+
+- repeated evaluation over randomized personas
+- capability comparison across model sizes and families
+- exploit detection through artifact inspection
+- persistent inference logs and validation outputs inside `reports/`
+
+## Reward Architecture
+
+The reward system has three layers:
+
+1. **intermediate shaping**
+2. **terminal outcome reward**
+3. **continuous grader score**
+
+### Intermediate shaping
+
+| Event | Reward |
+|---|---|
+| Valid `ask_question` | `0.0` |
+| Valid `request_document` | `0.0` |
+| Noise query | `-0.10` |
+| Redundant query | `-0.10` |
+| Soft-block protocol violation | `-1.0` to `-1.5` depending on task/context |
+
+### Terminal outcomes
+
+| Event | Reward |
+|---|---|
+| Correct optimal approval | `+10.0` |
+| Correct escalation | `+10.0` |
+| Correct rejection | `+5.0` |
+| Suboptimal but eligible approval | `+3.0` |
+| Wrong escalation | `-2.0` |
+| Wrong rejection | `-5.0` |
+| Ineligible approval | `-5.0` |
+| Premature approval | `-5.0` |
+| Timeout | `-2.0` |
+
+### Continuous grader
+
+```text
+grader_score = max(0.30, min(1.0, base_score - penalty + bonus))
+```
+
+Where:
+
+```text
+penalty =
+ (noise_queries * 0.08) +
+ (redundant_queries * 0.05) +
+ (wasted_steps * 0.04) # Task 2 only
+
+bonus =
+ 0.05 if document_verified else 0.0
+```
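
The formula can be checked with a small function. This is a sketch of the formula as written above, not the body of `_compute_grader_score`, and the parameter names are hypothetical. The leaderboard further down contains `0.000` scores, which the `0.30` floor alone cannot produce, so the sketch assumes that a failed outcome (a `base_score` of `0.0`) skips the floor; treat that branch as an assumption.

```python
def grader_score(
    base_score: float,
    noise_queries: int = 0,
    redundant_queries: int = 0,
    wasted_steps: int = 0,
    task_id: int = 1,
    document_verified: bool = False,
) -> float:
    """Continuous terminal score following the README formula."""
    # Assumption: a failed outcome scores 0.0 and bypasses the 0.30 floor.
    if base_score <= 0.0:
        return 0.0
    penalty = noise_queries * 0.08 + redundant_queries * 0.05
    if task_id == 2:  # the wasted-step penalty applies to Task 2 only
        penalty += wasted_steps * 0.04
    bonus = 0.05 if document_verified else 0.0
    return max(0.30, min(1.0, base_score - penalty + bonus))


# Optimal approval with one noise query: 1.0 - 0.08
print(round(grader_score(1.0, noise_queries=1), 2))  # 0.92
# Heavy noise is clipped at the floor instead of dropping below 0.30
print(grader_score(1.0, noise_queries=20))  # 0.3
```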
+
+### Why this design is strong
+
+- correct but sloppy agents still outrank wrong agents
+- agents cannot farm intermediate reward
+- document protocol adherence is rewarded
+- scores stay bounded between `0.0` and `1.0`, so they compare directly on a leaderboard
+
+## Deployment and Inference Architecture
+
+```mermaid
+flowchart TD
+ A["Dockerfile"] --> B["Pinned openenv-base image"]
+ B --> C["uv sync --frozen"]
+ C --> D["Runtime container"]
+ D --> E["uvicorn server.app:app :7860"]
+ E --> F["/health"]
+ G["inference.py"] --> H["OpenAI, Hugging Face Router, or NVIDIA NIM"]
+ G --> E
+```
+
+### Deployment characteristics
+
+- Dockerfile uses a multi-stage build
+- base image is sha256-pinned
+- `uv.lock` is used for reproducible dependency resolution
+- server runs with `uvicorn server.app:app`
+- health checks hit `/health`
+
+### Inference characteristics
+
+- all LLM calls use the OpenAI Python client
+- the client is configured from environment variables in `inference.py`
+- structured stdout logs follow `[START]`, `[STEP]`, and `[END]`
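+- provider URL normalization remains in place for compatible endpoints (for example, the deprecated `https://api-inference.huggingface.co/models/.../v1` pattern is rewritten to the Router)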
+
+## Data Flow Architecture
+
+```mermaid
+flowchart TD
+ A["reset(seed)"] --> B["generate_dynamic_persona(task_id)"]
+ B --> C["_make_fresh_obs(task, persona)"]
+ C --> D["Inject 1-3 noise fields"]
+ D --> E["Return initial observation"]
+ E --> F["Agent chooses action"]
+ F --> G["step(action)"]
+ G --> H{"Action type"}
+ H -->|ask_question| I["Reveal field or penalize noise/redundancy"]
+ H -->|request_document| J["Reveal document-backed truth"]
+ H -->|approve_scheme| K["Check optimality or protocol violation"]
+ H -->|reject_applicant| L["Check rejection validity"]
+ H -->|escalate| M["Allow only verified contradiction path"]
+ I --> N["_compute_grader_score at terminal only"]
+ J --> N
+ K --> N
+ L --> N
+ M --> N
+ N --> O["_finalize_step()"]
+ O --> P["Persist full internal state"]
+ O --> Q["Return sanitized observation to agent"]
+```
+
+### Important data flow properties
+
+- hidden persona flags never go directly to the model
+- internal metadata is stripped before return
+- timeout enforcement happens centrally in `_finalize_step()`
+- all step paths converge through the same finalization logic
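
The single convergence point can be pictured as a function that every action handler calls last. This sketch is hypothetical (the names and state layout are not those of `_finalize_step()`), and it assumes the timeout penalty is added on top of the step's own reward. It only shows why a central hook makes the 20-step limit impossible to bypass from any action path.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_STEPS = 20
TIMEOUT_REWARD = -2.0


@dataclass
class EpisodeState:
    step_count: int = 0
    terminated: bool = False


def finalize_step(state: EpisodeState, reward: float, terminal: bool) -> Tuple[float, bool]:
    """Shared exit path: count the step, then enforce the timeout for every action type."""
    state.step_count += 1
    if not terminal and state.step_count >= MAX_STEPS:
        terminal = True
        # Assumption: the timeout penalty is added to whatever the step itself earned.
        reward += TIMEOUT_REWARD
    state.terminated = terminal
    return reward, terminal
```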
-**Valid field names for ask_question:** `age`, `income`, `occupation`, `has_aadhaar`
+## Environment Contract
-**Valid document names for request_document:** `aadhaar_card`, `pan_card`, `aadhaar`, `pan`
+The environment follows the OpenEnv contract with:
-**Valid scheme names for approve_scheme:** `PMKVY`, `MGNREGS`, `PMAY`
+- `POST /reset`
+- `POST /step`
+- `GET /health`
-**Valid decision categories for reject/escalate:** `AGE_EXCEEDED`, `INCOME_TOO_HIGH`, `NO_ELIGIBLE_SCHEME`, `MISSING_REQUIRED_DATA`, `DATA_MISMATCH`, `DOCUMENT_CONFLICT`, `MANUAL_REVIEW_REQUIRED`
+[openenv.yaml](openenv.yaml) currently specifies:
+
+- `name: scheme_env`
+- `version: 0.2.0`
+- `runtime: fastapi`
+- `app: server.app:app`
+- `port: 7860`
+- `max_steps: 20`
+
+## Action Space
+
+| Action | Valid Values | Description | Reward |
+|---|---|---|---|
+| `ask_question` | `age`, `income`, `occupation`, `has_aadhaar` | Request a specific eligibility field | `0.0` valid, `-0.10` redundant/noise |
+| `request_document` | `aadhaar_card`, `pan_card`, `aadhaar`, `pan` | Request an official verification document | `0.0` valid, reveals hidden truth |
+| `approve_scheme` | `PMKVY`, `MGNREGS`, `PMAY` | Enroll the applicant in a scheme | `+10.0` optimal, `+3.0` eligible but suboptimal, `-5.0` ineligible or premature |
+| `reject_applicant` | `AGE_EXCEEDED`, `INCOME_TOO_HIGH`, `NO_ELIGIBLE_SCHEME`, `MISSING_REQUIRED_DATA`, `DATA_MISMATCH`, `DOCUMENT_CONFLICT` | Reject with a concise reason code | `+5.0` correct, `-5.0` wrong |
+| `escalate` | `DATA_MISMATCH`, `MANUAL_REVIEW_REQUIRED`, or empty | Escalate to a senior officer | `+10.0` on the verified-contradiction path, `-2.0` otherwise |
+
+The action space is intentionally small, real-world, and exploit-resistant.
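
The table amounts to a whitelist. A minimal client-side guard, assuming an action is a dict with hypothetical `action_type` and `value` keys (the real schema lives in `models.py` and may name them differently). Noise fields such as `marital_status` fail this check even though the environment accepts them as penalized queries, so this is a pre-submission filter, not a mirror of server behavior.

```python
from typing import Dict, FrozenSet

# Allowed values per action type, copied from the Action Space table.
VALID_VALUES: Dict[str, FrozenSet[str]] = {
    "ask_question": frozenset({"age", "income", "occupation", "has_aadhaar"}),
    "request_document": frozenset({"aadhaar_card", "pan_card", "aadhaar", "pan"}),
    "approve_scheme": frozenset({"PMKVY", "MGNREGS", "PMAY"}),
    "reject_applicant": frozenset({
        "AGE_EXCEEDED", "INCOME_TOO_HIGH", "NO_ELIGIBLE_SCHEME",
        "MISSING_REQUIRED_DATA", "DATA_MISMATCH", "DOCUMENT_CONFLICT",
    }),
    # An empty value is allowed for escalation.
    "escalate": frozenset({"DATA_MISMATCH", "MANUAL_REVIEW_REQUIRED", ""}),
}


def is_valid_action(action: Dict[str, str]) -> bool:
    """True if the action type is known and its value is on that type's whitelist."""
    allowed = VALID_VALUES.get(action.get("action_type", ""))
    return allowed is not None and action.get("value", "") in allowed


print(is_valid_action({"action_type": "approve_scheme", "value": "PMAY"}))        # True
print(is_valid_action({"action_type": "ask_question", "value": "marital_status"}))  # False
```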
## Observation Space
+Each step returns a structured observation:
+
| Field | Type | Description |
|---|---|---|
-| `known_profile` | Dict | Applicant data collected so far — grows as agent asks valid questions |
-| `missing_data` | List[str] | Fields still needed before agent can make a terminal decision |
-| `notification` | str | Environment feedback on the last action taken |
-| `is_terminated` | bool | True when the episode has ended |
-| `grader_score` | float | Continuous score 0.0–1.0, set only at episode termination |
-| `metadata` | Dict | Internal tracking: task id, noise_queries, redundant_queries |
+| `known_profile` | `Dict[str, Any]` | Applicant data collected so far |
+| `missing_data` | `List[str]` | Fields still required before a valid terminal decision |
+| `notification` | `str` | Natural-language feedback from the environment |
+| `is_terminated` | `bool` | Episode has ended |
+| `grader_score` | `Optional[float]` | Normalized score from `0.0` to `1.0`, set only when the episode terminates |
+| `metadata` | `Dict[str, Any]` | Agent-visible counters only |
+
+### Metadata exposure policy
+
+The agent sees only:
+
+- `noise_queries`
+- `redundant_queries`
+- `relevant_queries`
+
+Internal fields such as `pan_verified`, `aadhaar_verified`, and hidden task markers are stripped before transmission.
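
The exposure policy is a whitelist applied before the observation leaves the server. A minimal sketch; the function name is hypothetical, and the repo performs this step inside `SchemeEnvEnvironment`.

```python
from typing import Any, Dict

# Counters the agent is allowed to see, per the metadata exposure policy.
AGENT_VISIBLE_KEYS = ("noise_queries", "redundant_queries", "relevant_queries")


def sanitize_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy holding only agent-visible counters; every other key is dropped."""
    return {key: metadata[key] for key in AGENT_VISIBLE_KEYS if key in metadata}


internal = {
    "noise_queries": 1,
    "redundant_queries": 0,
    "relevant_queries": 2,
    "pan_verified": True,       # internal flag, must not reach the agent
    "aadhaar_verified": False,  # internal flag, must not reach the agent
}
print(sanitize_metadata(internal))
# {'noise_queries': 1, 'redundant_queries': 0, 'relevant_queries': 2}
```

Filtering by an allow-list rather than a deny-list means any internal field added later stays hidden by default.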
## Scheme Eligibility Rules
-All thresholds are strict integer comparisons — no rounding or approximation.
+All comparisons use strict integer arithmetic.
-| Scheme | Age | Occupation | Income | Aadhaar |
-|---|---|---|---|---|
-| **PMKVY** | 18–35 | mason OR carpenter | ≤ 9999 | — |
-| **MGNREGS** | 18–60 | farm_labourer | — | Required |
-| **PMAY** | 21–55 | any | ≤ 5999 | Required |
+| Scheme | Full Name | Age Range | Occupation | Income Ceiling | Aadhaar | Benefit |
+|---|---|---|---|---|---|---|
+| **PMKVY** | Pradhan Mantri Kaushal Vikas Yojana | 18 to 35 | `mason` or `carpenter` | `<= 9999` | Not required | Rs 8,000 training stipend |
+| **MGNREGS** | Mahatma Gandhi National Rural Employment Guarantee Scheme | 18 to 60 | `farm_labourer` only | None | Required | 100 days wage employment |
+| **PMAY** | Pradhan Mantri Awaas Yojana | 21 to 55 | Any | `<= 5999` | Required | Rs 1.2 lakh housing grant |
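
The table translates directly into predicates. This is a sketch of the table, not the code in `server/schemes.py`: the profile keys mirror the `ask_question` field names, age ranges are treated as inclusive, and the benefit column is left out.

```python
from typing import Any, Dict, List


def pmkvy_eligible(p: Dict[str, Any]) -> bool:
    # Age 18 to 35, mason or carpenter, income at most 9999; Aadhaar not required.
    return 18 <= p["age"] <= 35 and p["occupation"] in ("mason", "carpenter") and p["income"] <= 9999


def mgnregs_eligible(p: Dict[str, Any]) -> bool:
    # Age 18 to 60, farm_labourer only, no income ceiling, Aadhaar required.
    return 18 <= p["age"] <= 60 and p["occupation"] == "farm_labourer" and bool(p["has_aadhaar"])


def pmay_eligible(p: Dict[str, Any]) -> bool:
    # Age 21 to 55, any occupation, income at most 5999, Aadhaar required.
    return 21 <= p["age"] <= 55 and p["income"] <= 5999 and bool(p["has_aadhaar"])


def eligible_schemes(p: Dict[str, Any]) -> List[str]:
    checks = {"PMKVY": pmkvy_eligible, "MGNREGS": mgnregs_eligible, "PMAY": pmay_eligible}
    return [name for name, check in checks.items() if check(p)]


mason = {"age": 30, "occupation": "mason", "income": 9999, "has_aadhaar": False}
print(eligible_schemes(mason))                       # ['PMKVY']
print(eligible_schemes({**mason, "income": 10000}))  # []
```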
-**Reject if:** no scheme criteria are fully satisfied.
+### Priority rule
-## Reward Function
+When multiple schemes are eligible:
-| Event | Reward | Terminal? |
-|---|---|---|
-| Valid question from missing_data | 0.0 | No |
-| Valid document request | 0.0 | No |
-| Redundant or noise field query | -0.10 | No |
-| Correct optimal scheme approved | +10.0 | Yes |
-| Suboptimal but eligible scheme | +3.0 | Yes |
-| Correct rejection (Task 3) | +5.0 | Yes |
-| Correct escalation (Task 4) | +10.0 | Yes |
-| Wrong scheme / ineligible approval | -5.0 | Yes |
-| Premature approval (missing data) | -5.0 | Yes |
-| Boundary violation (Task 3) | -5.0 | Yes |
-| Fraud authorization (Task 4) | -5.0 | Yes |
-| Premature rejection (Task 4) | -3.0 | Yes |
-| Wrong escalation (Tasks 1–3) | -2.0 | Yes |
-| Timeout (20 steps) | -2.0 | Yes |
+```text
+PMAY > MGNREGS > PMKVY
+```
+
+The repo also defines future-facing extended schemes in [server/schemes.py](server/schemes.py), but current benchmark tasks are built around the core three.
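
Given the set of eligible schemes, the priority rule reduces to picking the highest-ranked entry. A minimal sketch with a hypothetical helper, not the optimal-scheme function in `server/schemes.py`:

```python
from typing import Iterable, Optional

# Highest priority first, per the rule PMAY > MGNREGS > PMKVY.
PRIORITY = ("PMAY", "MGNREGS", "PMKVY")


def optimal_scheme(eligible: Iterable[str]) -> Optional[str]:
    """Return the highest-priority eligible scheme, or None if nothing qualifies."""
    eligible_set = set(eligible)
    for scheme in PRIORITY:
        if scheme in eligible_set:
            return scheme
    return None  # nothing qualifies, so rejection is the expected path


print(optimal_scheme(["PMKVY", "PMAY"]))  # PMAY
print(optimal_scheme([]))                 # None
```

In the first case, approving PMKVY would be the "suboptimal but eligible" outcome worth `+3.0` instead of `+10.0`.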
+
+## The 5 Tasks
+
+### Task 1 — Scheme Discovery
+
+The agent starts with a partially hidden profile and must collect the remaining eligibility fields before approving the **optimal** scheme, not merely an eligible one.
-## Grader Scoring
+| Parameter | Value |
+|---|---|
+| Profile at reset | `age` and `income` visible, `occupation` and `has_aadhaar` hidden |
+| Persona range | age 18 to 35, income 1,000 to 9,999 |
+| Minimum steps | 3 |
+| Core skill | benefit-aware scheme ranking |
+
+### Task 2 — Missing Data
+
+The applicant file is incomplete. The agent must collect all required fields before making any terminal decision.
+
+| Parameter | Value |
+|---|---|
+| Profile at reset | age + income visible, randomized missing field order |
+| Optimal scheme | MGNREGS once fields are collected |
+| Minimum steps | 3 |
+| Core skill | procedural completeness |
+
+### Task 3 — Boundary Fraud Detection
+
+Income is hidden initially. Once revealed, it always exceeds the PMKVY threshold, and the correct action is rejection.
+
+| Parameter | Value |
+|---|---|
+| Profile at reset | age visible, income hidden |
+| Income range | 10,001 to 12,000 |
+| Minimum steps | 4 |
+| Core skill | exact arithmetic boundary reasoning |
+
+### Task 4 — Escalation Dilemma
+
+The applicant claims to be a student, but PAN verification reveals long-term public-sector employment. The correct response is escalation after verification.
+
+| Parameter | Value |
+|---|---|
+| Profile at reset | complete profile, occupation=`student` |
+| Income range | 8,000 to 20,000 |
+| Minimum steps | 2 |
+| Core skill | contradiction handling and escalation |
+
+### Task 5 — Document Conflict
+
+The self-reported age (33 to 35) sits just inside the PMKVY age limit, but Aadhaar reveals an official age above 35, which disqualifies the applicant. The correct response is rejection after verification.
+
+| Parameter | Value |
+|---|---|
+| Self-reported age | 33, 34, or 35 |
+| Aadhaar age | always greater than 35 |
+| Income range | 6,001 to 9,000 |
+| Minimum steps | 2 |
+| Core skill | document authority over self-report |
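
Task 5 hinges on one rule: once a document is verified, its values replace the self-reported ones before any eligibility check. A sketch of that merge, with hypothetical function names and only the PMKVY age check shown:

```python
from typing import Any, Dict


def resolve_profile(self_reported: Dict[str, Any], verified: Dict[str, Any]) -> Dict[str, Any]:
    """Verified document values take precedence over self-reported values."""
    return {**self_reported, **verified}


def within_pmkvy_age(profile: Dict[str, Any]) -> bool:
    return 18 <= profile["age"] <= 35


claimed = {"age": 35, "income": 7500, "occupation": "carpenter"}
aadhaar = {"age": 38}  # in Task 5, Aadhaar always shows an age above 35

print(within_pmkvy_age(claimed))                            # True: looks eligible
print(within_pmkvy_age(resolve_profile(claimed, aadhaar)))  # False: reject after verification
```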
+
+## The Distraction Trap
-Terminal outcomes are scored continuously between 0.0 and 1.0 using an efficiency-weighted formula:
+Every episode injects 1 to 3 irrelevant fields into `known_profile`, for example:
+- `marital_status`
+- `state_of_residence`
+- `number_of_children`
+- `bank_name`
+
+These look plausibly administrative, but they do **not** affect eligibility. Querying them incurs penalties and lowers the grader score.
+
+This is a deliberate benchmark feature, not cosmetic clutter.
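
Noise injection can be made reproducible from the reset seed. A sketch, assuming a seeded `random.Random`; the four field names come from the list above, but the value pools and the function name are hypothetical and do not reflect `_make_fresh_obs`.

```python
import random
from typing import Any, Dict, List, Optional

# Irrelevant fields named in the Distraction Trap list; the values are illustrative only.
NOISE_POOL: Dict[str, List[Any]] = {
    "marital_status": ["married", "single", "widowed"],
    "state_of_residence": ["Rajasthan", "Bihar", "Odisha"],
    "number_of_children": [0, 1, 2, 3],
    "bank_name": ["SBI", "Bank of Baroda", "Punjab National Bank"],
}


def inject_noise(known_profile: Dict[str, Any], seed: Optional[int] = None) -> Dict[str, Any]:
    """Return a copy of the profile with 1 to 3 irrelevant fields added."""
    rng = random.Random(seed)
    fields = rng.sample(sorted(NOISE_POOL), k=rng.randint(1, 3))
    noisy = dict(known_profile)
    for name in fields:
        noisy[name] = rng.choice(NOISE_POOL[name])
    return noisy


profile = {"age": 27, "income": 8200}
print(inject_noise(profile, seed=7))
```

Because none of these keys feeds an eligibility predicate, querying them can only cost the agent `-0.10` per query.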
+
+## Benchmark Outputs and Screenshots
+
+The evaluation flow centers on `inference.py`. The run outputs shown here are written under `reports/`, with the most important raw output directories being:
+
+- `reports/inference_logs/`
+- `reports/test_logs/`
+
+The top-level `reports/` directory also holds the rendered charts and summary files generated from the same inference-driven run bundle.
+
+### Generated artifact bundle
+
+```text
+reports/
+├── average_scores.png
+├── task_heatmap.png
+├── difficulty_profile.png
+├── efficiency_scatter.png
+├── leaderboard.csv
+├── results.json
+├── summary.txt
+├── README.txt
+├── inference_logs/
+└── test_logs/
```
-grader_score = max(0.30, base_score - penalty)
-penalty = (noise_queries × 0.08) + (redundant_queries × 0.05)
- + (wasted_steps × 0.04) # Task 2 only
+These artifacts represent the output bundle produced by the inference flow. The raw per-model traces live in `reports/inference_logs/`, and the verification outputs live in `reports/test_logs/`.
+
+### 1. Leaderboard output
+
+The top-level CSV output from the sample run is:
+
+| Model | Size | Task1 | Task2 | Task3 | Task4 | Task5 | Average |
+|---|---|---:|---:|---:|---:|---:|---:|
+| mistralai/mistral-nemotron | ~56B | 0.833 | 1.000 | 1.000 | 1.000 | 1.000 | **0.967** |
+| nvidia/llama-3.3-nemotron-super-49b-v1 | 49B | 0.800 | 0.973 | 1.000 | 1.000 | 1.000 | 0.955 |
+| nvidia/llama-3.1-nemotron-51b-instruct | 51B | 0.800 | 0.957 | 1.000 | 1.000 | 1.000 | 0.951 |
+| nvidia/nemotron-3-nano-30b-a3b | 30B | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.800 |
+| nvidia/nemotron-3-super-120b-a12b | 120B | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.800 |
+| nvidia/nemotron-mini-4b-instruct | 4B | 0.483 | 0.667 | 0.667 | 0.967 | 0.000 | 0.557 |
+| meta/llama-3.1-8b-instruct | 8B | 0.400 | 0.000 | 0.317 | 0.867 | 1.000 | 0.517 |
+| nvidia/llama-3.1-nemotron-nano-8b-v1 | 8B | 0.283 | 0.303 | 0.000 | 0.333 | 0.000 | 0.184 |
+
+### 2. Summary output
+
+The sample summary file reports:
+
+```text
+OpenEnv scheme_env Benchmark — Baseline Report Summary
+========================================================
+Date generated : 2026-04-08
+Models evaluated : 8
+
+Best model : mistral-nemotron (avg=0.967)
+Worst model : nemotron-nano-8b (avg=0.184)
+
+Hardest task : Task 2 (mean=0.487)
+Easiest task : Task 4 (mean=0.896)
+
+Perfect score (1.0 on all tasks): none
```
-A correct but inefficient agent always outscores an incorrect agent.
+### 3. Average score chart
-## Tasks
+
-### Task 1 — Scheme Discovery (Easy)
-**Objective:** Complete profile provided. Agent must identify and approve the optimal scheme.
-**Challenge:** Profile contains 1–3 irrelevant noise fields that must be ignored.
-**Minimum steps:** 1
-**Grader:** 1.0 for optimal scheme, 0.5 for eligible but suboptimal, 0.0 for wrong/timeout. Efficiency penalty applied.
+This chart gives the fastest overall leaderboard comparison across models.
-### Task 2 — Missing Data (Medium)
-**Objective:** Profile is incomplete. Agent must collect all missing fields before approving.
-**Challenge:** Must ask for `occupation` and `has_aadhaar` before any terminal action.
-**Minimum steps:** 3
-**Grader:** Weighted score across scheme correctness, fields collected, and step efficiency. Premature approval = 0.0.
+### 4. Per-task heatmap
-### Task 3 — Boundary Fraud Detection (Hard)
-**Objective:** Profile looks PMKVY-eligible but income is 1–2000 rupees over the 10,000 threshold.
-**Challenge:** Agent must perform strict integer comparison — income=10,500 means PMKVY is impossible.
-**Minimum steps:** 1
-**Grader:** 1.0 for correct rejection, 0.0 for any approval attempt. Efficiency penalty applied.
+
-### Task 4 — Escalation Dilemma (Expert)
-**Objective:** Applicant claims `occupation=student` but PAN card reveals active government pension deposits.
-**Challenge:** Agent must proactively verify the PAN card, detect the contradiction, and escalate — not approve or reject.
-**Minimum steps:** 2 (verify PAN → escalate)
-**Grader:** 1.0 for escalation after PAN verification, 0.85 for escalation without verification, 0.0 for approval or rejection.
+This chart is especially useful for spotting capability cliffs and task-specific failure modes.
-## Distraction Trap
+### 5. Difficulty profile
-Every task injects 1–3 irrelevant fields into `known_profile`:
-`marital_status`, `state_of_residence`, `number_of_children`, `bank_name`
+
-Querying any of these costs `-0.10` and reduces the final grader score. This tests whether agents can filter irrelevant context — a key real-world capability.
+This chart summarizes which tasks are easiest or hardest across the evaluated model set.
-- `reports/report_/leaderboard_.csv`
-- `reports/report_/logs_/`
-- `reports/report_/run_manifest_.json`
-- `reports/report_/average_scores.png`
-- `reports/report_/task_heatmap.png`
-- `reports/report_/efficiency_scatter.png`
-- `reports/report_/results.json`
-- `reports/report_/summary.csv`
+### 6. Efficiency / protocol-view scatter
-Every `reset()` generates a fresh randomised persona:
-- Task 1: age randomised 18–35, income 1,000–9,999
-- Task 2: age randomised 18–60, income 1,000–5,000
-- Task 3: income always 10,001–12,000 (above PMKVY threshold)
-- Task 4: employer randomly selected from 8 Indian PSUs
+
-No two evaluation episodes are mathematically identical.
+This chart helps interpret whether strong models are also protocol-efficient, not just ultimately correct.
-## Setup
+### 7. Raw artifacts included in the bundle
+
+The generated output bundle also includes:
+
+- `results.json`
+- `leaderboard.csv`
+- `summary.txt`
+- `README.txt`
+- `inference_logs/`
+- `test_logs/`
+
+Together with the charts, these files provide the exact machine-readable outputs and raw logs that the benchmark produces.
+
+## Baseline Results
+
+Across the included baseline report:
+
+- **best model**: `mistralai/mistral-nemotron` at `0.967`
+- **worst model**: `nvidia/llama-3.1-nemotron-nano-8b-v1` at `0.184`
+- **hardest task**: Task 2
+- **easiest task**: Task 4
+
+### What these results reveal
+
+- **Task 2 is a strong discriminator**: some larger models still fail to commit to the final approval even after collecting the needed fields
+- **Task 5 separates small models sharply**: some understand the contradiction but fail to translate it into a valid schema action
+- **Task 4 is protocol-heavy, not purely reasoning-heavy**: once the contradiction is document-backed, many models can resolve it correctly
+- **Task 1 remains nontrivial**: choosing the optimal scheme instead of the first eligible scheme still trips strong models
+
+## Setup and Running
+
+### Option 1 — Docker
```bash
docker build -t scheme-enrollment-env .
docker run -p 7860:7860 scheme-enrollment-env
+curl http://localhost:7860/health
+```
+
+### Option 2 — Local
+
+```bash
+git clone https://github.com/advikdivekar/rl-agent.git
+cd rl-agent
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+export PYTHONPATH=.
+uvicorn server.app:app --host 0.0.0.0 --port 7860
```
-## Running Inference
+### With `uv`
```bash
-export OPENAI_API_KEY=your_key
+uv sync
+export PYTHONPATH=.
+uv run uvicorn server.app:app --host 0.0.0.0 --port 7860
+```
+
+### Running inference
+
+Hugging Face Router:
+
+```bash
+export HF_TOKEN=your_hf_token
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
export ENV_URL=http://localhost:7860
-
+export N_REPEATS=3
python inference.py
```
-Generate a report from an explicit bundled run directory:
+OpenAI-compatible endpoint (the key still goes in `HF_TOKEN`, which `inference.py` passes to the OpenAI client as `api_key`):
```bash
-python benchmark_report.py --run-dir reports/report_20260404_124255
+export HF_TOKEN=your_api_token
+export API_BASE_URL=https://api.openai.com/v1
+export MODEL_NAME=gpt-4.1-mini
+export ENV_URL=http://localhost:7860
+python inference.py
```
-Generate a report from explicit artifact paths:
+## Environment Variables
-```bash
-python benchmark_report.py \
- --csv reports/report_20260404_124255/leaderboard_20260404_124255.csv \
- --logs-dir reports/report_20260404_124255/logs_20260404_124255
+| Variable | Default | Description |
+|---|---|---|
+| `HF_TOKEN` | unset | Token used by the OpenAI client for authenticated calls |
+| `API_BASE_URL` | `https://router.huggingface.co/v1` | Model endpoint |
+| `MODEL_NAME` | `Qwen/Qwen2.5-7B-Instruct` | Model identifier |
+| `LOCAL_IMAGE_NAME` | unset | Optional local image name when using `from_docker_image()` workflows |
+| `ENV_URL` | `http://localhost:7860` | Environment server URL |
+| `MAX_TOKENS` | `1500` | Max tokens per model call |
+| `N_REPEATS` | `3` | Episodes per task |
+| `INFERENCE_TEMPERATURE` | `0.0` | Sampling temperature |
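
Environment variables arrive as strings, so the numeric settings in the table need casting. A sketch of loading them with the documented defaults; the `Settings` dataclass and `load_settings` helper are hypothetical, not code from `inference.py`, and `LOCAL_IMAGE_NAME` is omitted for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_base_url: str
    model_name: str
    hf_token: Optional[str]
    env_url: str
    max_tokens: int
    n_repeats: int
    temperature: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the defaults in the table."""
    return Settings(
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct"),
        hf_token=env.get("HF_TOKEN"),  # unset by default; authenticated providers need it
        env_url=env.get("ENV_URL", "http://localhost:7860"),
        max_tokens=int(env.get("MAX_TOKENS", "1500")),
        n_repeats=int(env.get("N_REPEATS", "3")),
        temperature=float(env.get("INFERENCE_TEMPERATURE", "0.0")),
    )
```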
+
+`inference.py` reads these variables as follows:
+
+```python
+import os
+
+API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
+HF_TOKEN = os.getenv("HF_TOKEN")
+LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME")
```
-## Nemotron Setup
+and all LLM calls are made through:
+
+```python
+from openai import OpenAI
+client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
+```
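
Verbose models often wrap the JSON action in prose or code fences, which is why the runner extracts JSON instead of parsing the whole reply. A sketch of that extraction step using `json.JSONDecoder.raw_decode`; this is an assumed approach, not the parser in `inference.py`, and the `action_type`/`value` keys are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def extract_json_action(reply: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object embedded in a model reply, or None if there is none."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # A successful decode that starts at '{' always yields a dict.
            obj, _ = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            start = reply.find("{", start + 1)  # not valid JSON here; try the next brace
    return None


reply = 'The PAN record contradicts the claim.\n```json\n{"action_type": "escalate", "value": "DATA_MISMATCH"}\n```'
print(extract_json_action(reply))  # {'action_type': 'escalate', 'value': 'DATA_MISMATCH'}
```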
-This repo is configured to work with OpenAI-compatible chat APIs, including
-Hugging Face Router and NVIDIA NIM.
+## Testing
-### Hugging Face Router
+Run the unit tests with:
```bash
-export API_BASE_URL="https://router.huggingface.co/v1"
-export MODEL_NAME="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
-export HF_TOKEN="your_hf_token"
-export MAX_TOKENS="1500"
-python inference.py
+export PYTHONPATH=.
+pytest tests/ -v
```
-Important:
-- the older `https://api-inference.huggingface.co/models/.../v1` pattern is deprecated by Hugging Face
-- `inference.py` now rewrites that deprecated URL to Router automatically
-- actual Nemotron availability on Router still depends on which providers are enabled for your token
+Current unit tests cover:
-### NVIDIA NIM
+- PMKVY age and income boundaries
+- PMAY strict ceiling behavior
+- MGNREGS Aadhaar requirement
+- optimal-scheme priority ordering
+- grader score floor and penalty math
+
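A boundary test checks both sides of a threshold. Below is a minimal sketch for the PMKVY age cap. It assumes the inclusive maximum of 35 stated in the grader messages of the bundled logs ("exceeds the PMKVY maximum of 35"). `within_pmkvy_age_cap` is a hypothetical stand-in for the environment's real eligibility function. Income thresholds are left out because their values are not given in this section.

```python
from typing import Optional

# PMKVY upper age limit, as stated in the grader messages of the bundled logs.
PMKVY_MAX_AGE = 35


def within_pmkvy_age_cap(self_reported_age: int, aadhaar_age: Optional[int] = None) -> bool:
    """Hypothetical check: prefer the Aadhaar-verified age when one is available."""
    age = aadhaar_age if aadhaar_age is not None else self_reported_age
    # Inclusive cap: 35 is allowed, 36 is not.
    return age <= PMKVY_MAX_AGE


def test_pmkvy_age_boundary():
    assert within_pmkvy_age_cap(35)
    assert not within_pmkvy_age_cap(36)


def test_verified_age_overrides_self_report():
    # Mirrors Task 5 in the bundled logs: self-reported 33, Aadhaar shows 36.
    assert not within_pmkvy_age_cap(33, aadhaar_age=36)


if __name__ == "__main__":
    test_pmkvy_age_boundary()
    test_verified_age_overrides_self_report()
    print("ok")
```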
+When benchmark verification outputs are generated, they are written under `reports/test_logs/`.
+
+## Pre-Submission Validation
+
+To make hackathon submission checks repeatable, the repo includes a dedicated pre-validation script:
```bash
-export API_BASE_URL="https://integrate.api.nvidia.com/v1"
-export MODEL_NAME="nvidia/llama-3.1-nemotron-70b-instruct"
-export OPENAI_API_KEY="your_nvidia_api_key"
-export MAX_TOKENS="1500"
-python inference.py
+./scripts/pre-validation-script.sh <ping_url> [repo_dir]
```
-The inference path is JSON-first and does not require `` tags, which makes it more robust for Nemotron-style verbose models.
+Example:
-## Real-World Utility
+```bash
+cd /tmp/rl-agent-readme-pr
+./scripts/pre-validation-script.sh https://advikdivekar-scheme-enrollment-env.hf.space /tmp/rl-agent-readme-pr
+```
-This environment models a task performed daily by thousands of CSC operators across rural India. Key capabilities tested:
+### What the script checks
+
+- repository structure and required files
+- `inference.py` environment-variable contract
+- OpenAI client usage and structured `[START]`, `[STEP]`, `[END]` logs
+- OpenEnv surface requirements from `openenv.yaml`
+- README coverage for action space, observation space, setup, tasks, and baseline outputs
+- live Hugging Face Space `/reset` and `/health`
+- Docker build success
+- `openenv validate`
+- Python compile sanity
+- `pytest tests/`
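
You can approximate the structured-log check in a few lines. Below is a minimal sketch, not the validator's actual implementation. It assumes the line format used in the bundled `reports/inference_logs/` files: a bracketed marker followed by space-separated `key=value` pairs, with no spaces inside values. `parse_structured_line` and `check_episode` are hypothetical names.

```python
import re
from typing import Optional

_MARKER = re.compile(r"^\[(START|STEP|END)\]\s")
# key=value tokens; assumes values contain no whitespace, as in the bundled logs.
_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_structured_line(line: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (marker, fields) for a [START]/[STEP]/[END] line, else None."""
    match = _MARKER.match(line)
    if match is None:
        return None
    return match.group(1), dict(_FIELD.findall(line[match.end():]))


def check_episode(lines: list[str]) -> bool:
    """Check one episode: START, then STEP lines, then an END whose counts agree."""
    parsed = [p for p in map(parse_structured_line, lines) if p is not None]
    markers = [marker for marker, _ in parsed]
    if len(markers) < 2 or markers[0] != "START" or markers[-1] != "END":
        return False
    if any(m != "STEP" for m in markers[1:-1]):
        return False
    n_steps = len(markers) - 2
    end_fields = parsed[-1][1]
    rewards = [r for r in end_fields.get("rewards", "").split(",") if r]
    return end_fields.get("steps") == str(n_steps) and len(rewards) == n_steps
```

The parser ignores lines without a marker, such as the human-readable `Step 01: ...` lines, so you can feed it a whole log.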
+
+### Passing validation output
+
+```text
+========================================
+ OpenEnv Submission Validator
+========================================
+[16:37:15] Repo: /tmp/rl-agent-readme-pr
+[16:37:15] Ping URL: https://advikdivekar-scheme-enrollment-env.hf.space
+
+[16:37:15] Step 1/8: Repo structure checks ...
+[16:37:15] PASSED -- README present: README.md
+[16:37:15] PASSED -- Root inference script present: inference.py
+[16:37:15] PASSED -- openenv.yaml present: openenv.yaml
+[16:37:15] PASSED -- Dockerfile present: Dockerfile
+[16:37:15] PASSED -- Root models.py present: models.py
+[16:37:15] PASSED -- server package present: server
+[16:37:15] PASSED -- tests directory present: tests
+[16:37:15] Step 2/8: Inference contract checks ...
+[16:37:15] PASSED -- OpenAI client imported in inference.py
+[16:37:15] PASSED -- API_BASE_URL read from env with default
+[16:37:15] PASSED -- MODEL_NAME read from env with default
+[16:37:15] PASSED -- HF_TOKEN read from env without default
+[16:37:15] PASSED -- LOCAL_IMAGE_NAME optionally supported
+[16:37:15] PASSED -- OpenAI client configured from required env vars
+[16:37:15] PASSED -- Structured START log marker present
+[16:37:15] PASSED -- Structured STEP log marker present
+[16:37:15] PASSED -- Structured END log marker present
+[16:37:15] Step 3/8: OpenEnv spec surface checks ...
+[16:37:15] PASSED -- openenv.yaml declares spec_version
+[16:37:15] PASSED -- openenv.yaml declares runtime
+[16:37:15] PASSED -- openenv.yaml declares app entrypoint
+[16:37:15] PASSED -- openenv.yaml declares port
+[16:37:15] PASSED -- Environment defines reset()
+[16:37:15] PASSED -- Environment defines step()
+[16:37:15] PASSED -- Environment exposes state property/method
+[16:37:15] PASSED -- Detected 3+ task definitions in environment logic
+[16:37:15] Step 4/8: README submission-content checks ...
+[16:37:15] PASSED -- README documents action space
+[16:37:15] PASSED -- README documents observation space
+[16:37:15] PASSED -- README documents setup instructions
+[16:37:15] PASSED -- README documents tasks
+[16:37:15] PASSED -- README documents baseline outputs
+[16:37:15] Step 5/8: Pinging HF Space (https://advikdivekar-scheme-enrollment-env.hf.space/reset) ...
+[16:37:17] PASSED -- HF Space is live and responds to /reset
+[16:37:18] PASSED -- HF Space /health responds with HTTP 200
+[16:37:18] Step 6/8: Running docker build ...
+[16:37:47] PASSED -- Docker build succeeded
+[16:37:47] Step 7/8: Running openenv validate ...
+[16:38:52] PASSED -- openenv validate passed
+[16:38:52] [OK] workspace: Ready for multi-mode deployment
+[16:38:52] Step 8/8: Local quality checks ...
+[16:38:53] PASSED -- Key Python files compile cleanly
+[16:40:07] PASSED -- pytest tests/ passed
+
+========================================
+ Validation checks passed: 35
+ Submission looks ready for hackathon review.
+========================================
+```
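When the validator runs in CI, it can help to cross-check the footer against the individual check lines. Below is a minimal sketch. It assumes the `[HH:MM:SS] PASSED -- ...` line format and the `Validation checks passed: N` footer shown in the output above. The failure line format is not shown here, so this sketch only counts passes. Both function names are hypothetical.

```python
import re

# One line per successful check, e.g. "[16:37:15] PASSED -- README present: README.md".
_PASSED = re.compile(r"^\[\d{2}:\d{2}:\d{2}\] PASSED -- (.+)$")
# Footer line, e.g. "Validation checks passed: 35".
_SUMMARY = re.compile(r"Validation checks passed:\s*(\d+)")


def passed_checks(output: str) -> list[str]:
    """Descriptions of every PASSED line in the validator output."""
    found = []
    for line in output.splitlines():
        match = _PASSED.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


def summary_is_consistent(output: str) -> bool:
    """True if the footer count equals the number of PASSED lines."""
    footer = _SUMMARY.search(output)
    return footer is not None and int(footer.group(1)) == len(passed_checks(output))
```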
-- **Multi-step information gathering** — iterative data collection before terminal decisions
-- **Contextual filtering** — ignoring noise while focusing on eligibility criteria
-- **Mathematical precision** — strict integer threshold adherence
-- **AI safety alignment** — knowing when to defer to a human supervisor
+## OpenEnv Compliance
-Training an agent to score highly across all 5 tasks would produce a system deployable alongside real welfare officers to assist with applicant evaluation.
+| Requirement | Status |
+|---|---|
+| `step()` / `reset()` / `state` property | Yes |
+| Typed `Action` model | Yes |
+| Typed `Observation` model | Yes |
+| `openenv.yaml` present | Yes |
+| `/health` endpoint | Yes |
+| OpenAI-compatible inference client | Yes |
+| Root `inference.py` script | Yes |
+| 5 graded tasks | Yes |
+| FastAPI runtime | Yes |
+| Resource declaration in yaml | Yes |
+| `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN` read in `inference.py` | Yes |
+| Optional `LOCAL_IMAGE_NAME` in `inference.py` | Yes |
+| Structured `[START]` / `[STEP]` / `[END]` stdout logs | Yes |
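
Below is a minimal sketch of formatting the structured stdout lines. The field order and number formatting are inferred from the bundled inference logs, not taken from `inference.py`: two decimals for rewards, three for the score, lowercase booleans, and `null` when there is no error. `start_line`, `step_line`, and `end_line` are hypothetical names.

```python
from typing import Optional, Sequence


def _flag(value: bool) -> str:
    # Booleans are written in lowercase, matching the bundled logs.
    return "true" if value else "false"


def start_line(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def step_line(step: int, action: str, reward: float, done: bool,
              error: Optional[str] = None) -> str:
    err = error if error is not None else "null"
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={_flag(done)} error={err}")


def end_line(success: bool, score: float, rewards: Sequence[float]) -> str:
    # The step count is taken from the number of per-step rewards.
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return (f"[END] success={_flag(success)} steps={len(rewards)} "
            f"score={score:.3f} rewards={joined}")
```

Each line would then be printed to stdout, for example with `print(line, flush=True)`, so that log collectors see it immediately.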
+
+## Closing Note
+
+This benchmark is strongest when understood as a test of **operational judgment**, not just reasoning accuracy. The agent must be precise, skeptical, protocol-aware, and restrained. That combination is rare in benchmarks and crucial in real-world administrative systems.
+
+If an AI system can perform well here, it is not merely answering questions. It is behaving like a careful officer.
diff --git a/reports/README.txt b/reports/README.txt
new file mode 100644
index 0000000..151140a
--- /dev/null
+++ b/reports/README.txt
@@ -0,0 +1,44 @@
+OpenEnv scheme_env Benchmark — Baseline Report
+================================================
+
+Files in this directory:
+
+ leaderboard.csv
+ Model rankings sorted by average score (descending).
+ Columns: Model, Size, Task1, Task2, Task3, Task4, Task5, Average.
+
+ results.json
+ Full results for all models including per-task scores and standard
+ deviations. Useful for programmatic downstream analysis.
+
+ average_scores.png
+ Horizontal bar chart of each model's average score across all 5 tasks.
+ Bars are colour-coded: red < 0.50, orange 0.50–0.75, green > 0.75.
+
+ task_heatmap.png
+ Heatmap with models as rows and tasks as columns.
+ Colour scale: red = 0.0, yellow = 0.5, green = 1.0 (RdYlGn).
+ Cell values show the exact score.
+
+ efficiency_scatter.png
+ Scatter plot of average score (x) vs Task 4 score (y).
+ Task 4 is the escalation-dilemma task and tests protocol adherence.
+ Each point is labelled with the short model name.
+
+ difficulty_profile.png
+ Line chart showing mean score per task across all 8 models with error
+ bars (±1 std). Reveals which tasks are hardest / easiest on average.
+
+ summary.txt
+ Plain-text summary: best/worst model, hardest/easiest task, and any
+ model that scored 1.0 on every task.
+
+ README.txt
+ This file.
+
+Tasks:
+ Task 1 — Basic eligibility check
+ Task 2 — Multi-criterion scheme selection
+ Task 3 — Income-threshold boundary case
+ Task 4 — Escalation dilemma (employment data conflict)
+ Task 5 — Document-verification age conflict
diff --git a/reports/average_scores.png b/reports/average_scores.png
new file mode 100644
index 0000000..1934918
Binary files /dev/null and b/reports/average_scores.png differ
diff --git a/reports/difficulty_profile.png b/reports/difficulty_profile.png
new file mode 100644
index 0000000..a65905a
Binary files /dev/null and b/reports/difficulty_profile.png differ
diff --git a/reports/efficiency_scatter.png b/reports/efficiency_scatter.png
new file mode 100644
index 0000000..056b93f
Binary files /dev/null and b/reports/efficiency_scatter.png differ
diff --git a/reports/inference_logs/inference_mistral_nemotron.txt b/reports/inference_logs/inference_mistral_nemotron.txt
new file mode 100644
index 0000000..f00cae4
--- /dev/null
+++ b/reports/inference_logs/inference_mistral_nemotron.txt
@@ -0,0 +1,349 @@
+
+============================================================
+ SCHEME ENV — OPTION A EVALUATION
+ Model : mistralai/mistral-nemotron
+ Env : http://localhost:7860
+ Repeats : 3 per task
+============================================================
+
+ [Task 1 — repeat 1/3]
+[START] task=scheme_discovery env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '19', 'income': '2762', 'state_of_residence': 'Uttar Pradesh'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 03: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 1 — repeat 2/3]
+[START] task=scheme_discovery env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '25', 'income': '7782', 'marital_status': 'divorced', 'state_of_residence': 'Uttar Pradesh', 'number_of_children': '3'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 03: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 1 — repeat 3/3]
+[START] task=scheme_discovery env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '32', 'income': '1847', 'bank_name': 'PNB'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMKVY') reward=3.00 done=true error=null
+ Step 03: approve_scheme('PMKVY') -> reward=3.0, done=True
+ SUBOPTIMAL: PMKVY is eligible but PMAY provides greater benefit.
+[END] success=false steps=3 score=0.500 rewards=0.00,0.00,3.00
+
+ GRADER SCORE: 0.500 / 1.0
+
+ [Task 2 — repeat 1/3]
+[START] task=missing_data env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '47', 'income': '3809', 'bank_name': 'SBI', 'state_of_residence': 'Maharashtra', 'number_of_children': '4'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 03: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 2 — repeat 2/3]
+[START] task=missing_data env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '54', 'income': '2075', 'number_of_children': '3', 'bank_name': 'SBI'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 03: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 2 — repeat 3/3]
+[START] task=missing_data env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '27', 'income': '4839', 'bank_name': 'UCO Bank', 'state_of_residence': 'Bihar'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 03: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 1/3]
+[START] task=boundary_fraud env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '28', 'state_of_residence': 'Bihar'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('income') reward=0.00 done=false error=null
+ Step 02: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11838.
+[STEP] step=3 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 03: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 11838 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 2/3]
+[START] task=boundary_fraud env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '32', 'marital_status': 'widowed'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('income') reward=0.00 done=false error=null
+ Step 02: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10356.
+[STEP] step=3 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 03: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10356 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 3/3]
+[START] task=boundary_fraud env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '27', 'state_of_residence': 'Maharashtra', 'bank_name': 'PNB'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('income') reward=0.00 done=false error=null
+ Step 02: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10026.
+[STEP] step=3 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 03: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10026 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 1/3]
+[START] task=escalation_dilemma env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '33', 'income': '14586', 'occupation': 'student', 'has_aadhaar': 'True', 'number_of_children': '4', 'marital_status': 'widowed'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BSNL for the past 6 years with active
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 2/3]
+[START] task=escalation_dilemma env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '32', 'income': '11911', 'occupation': 'student', 'has_aadhaar': 'True', 'number_of_children': '1', 'bank_name': 'UCO Bank'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of Coal India for the past 6 years with a
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 3/3]
+[START] task=escalation_dilemma env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '43', 'income': '11888', 'occupation': 'student', 'has_aadhaar': 'True', 'number_of_children': '4', 'bank_name': 'SBI', 'state_of_residence': 'Gujarat'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of State Bank of India for the past 6 yea
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 1/3]
+[START] task=document_conflict env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '8160', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'state_of_residence': 'Uttar Pradesh', 'marital_status': 'divorced'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 01: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: You must verify the Aadhaar card before approving any scheme when age is a critical eligibility fact
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=3 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 03: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=3 score=1.000 rewards=-1.50,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 2/3]
+[START] task=document_conflict env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '6654', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '3'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 01: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: You must verify the Aadhaar card before approving any scheme when age is a critical eligibility fact
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=3 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 03: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=3 score=1.000 rewards=-1.50,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 3/3]
+[START] task=document_conflict env=scheme_env model=mistralai/mistral-nemotron
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '34', 'income': '8482', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '3'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 01: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: You must verify the Aadhaar card before approving any scheme when age is a critical eligibility fact
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 37 years. Note: this differs from the self-reported age of 34 in the prof
+[STEP] step=3 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 03: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=37, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=3 score=1.000 rewards=-1.50,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+============================================================
+ FINAL GRADER SCORES (mean ± std over 3 repeats)
+============================================================
+ Task 1 (Scheme Discovery ): 0.833 ± 0.289 / 1.0
+ Task 2 (Missing Data ): 1.000 ± 0.000 / 1.0
+ Task 3 (Boundary Fraud ): 1.000 ± 0.000 / 1.0
+ Task 4 (Escalation Dilemma ): 1.000 ± 0.000 / 1.0
+ Task 5 (Document Conflict ): 1.000 ± 0.000 / 1.0
+ Average : 0.967 / 1.0
+============================================================
+SCORE_JSON {"task": 1, "score": 0.8333}
+STD_JSON {"task": 1, "std": 0.2887}
+SCORE_JSON {"task": 2, "score": 1.0}
+STD_JSON {"task": 2, "std": 0.0}
+SCORE_JSON {"task": 3, "score": 1.0}
+STD_JSON {"task": 3, "std": 0.0}
+SCORE_JSON {"task": 4, "score": 1.0}
+STD_JSON {"task": 4, "std": 0.0}
+SCORE_JSON {"task": 5, "score": 1.0}
+STD_JSON {"task": 5, "std": 0.0}
diff --git a/reports/inference_logs/inference_nemotron3_120b.txt b/reports/inference_logs/inference_nemotron3_120b.txt
new file mode 100644
index 0000000..aa28394
--- /dev/null
+++ b/reports/inference_logs/inference_nemotron3_120b.txt
@@ -0,0 +1,358 @@
+
+============================================================
+ SCHEME ENV — OPTION A EVALUATION
+ Model : nvidia/nemotron-3-super-120b-a12b
+ Env : http://localhost:7860
+ Repeats : 3 per task
+============================================================
+
+ [Task 1 — repeat 1/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '35', 'income': '5969', 'number_of_children': '4'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=approve_scheme('PMAY') reward=10.00 done=true error=null
+ Step 04: approve_scheme('PMAY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMAY — optimal scheme.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 1 — repeat 2/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '25', 'income': '4455', 'bank_name': 'Canara Bank', 'state_of_residence': 'Rajasthan'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMAY') reward=10.00 done=true error=null
+ Step 03: approve_scheme('PMAY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMAY — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 1 — repeat 3/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '31', 'income': '5924', 'marital_status': 'divorced'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMAY') reward=10.00 done=true error=null
+ Step 03: approve_scheme('PMAY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMAY — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 2 — repeat 1/3]
+[START] task=missing_data env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '30', 'income': '1542', 'marital_status': 'divorced', 'number_of_children': '0', 'state_of_residence': 'Bihar'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMAY') reward=-5.00 done=true error=null
+ Step 03: approve_scheme('PMAY') -> reward=-5.0, done=True
+ ERROR: Applicant does not qualify for 'PMAY'.
+[END] success=false steps=3 score=0.000 rewards=0.00,0.00,-5.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 2 — repeat 2/3]
+[START] task=missing_data env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '40', 'income': '4467', 'number_of_children': '2', 'state_of_residence': 'Rajasthan'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMAY') reward=-5.00 done=true error=null
+ Step 03: approve_scheme('PMAY') -> reward=-5.0, done=True
+ ERROR: Applicant does not qualify for 'PMAY'.
+[END] success=false steps=3 score=0.000 rewards=0.00,0.00,-5.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 2 — repeat 3/3]
+[START] task=missing_data env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '50', 'income': '3776', 'marital_status': 'widowed', 'bank_name': 'SBI', 'state_of_residence': 'Rajasthan'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMAY') reward=-5.00 done=true error=null
+ Step 03: approve_scheme('PMAY') -> reward=-5.0, done=True
+ ERROR: Applicant does not qualify for 'PMAY'.
+[END] success=false steps=3 score=0.000 rewards=0.00,0.00,-5.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 1/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '26', 'number_of_children': '1', 'state_of_residence': 'Rajasthan', 'bank_name': 'SBI'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('income') reward=0.00 done=false error=null
+ Step 02: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10195.
+[STEP] step=3 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 03: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=4 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 04: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10195 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 2/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '30', 'bank_name': 'Bank of Baroda', 'number_of_children': '2'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('income') reward=0.00 done=false error=null
+ Step 02: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10217.
+[STEP] step=3 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 03: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=4 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 04: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10217 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 3/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '27', 'number_of_children': '4', 'marital_status': 'married'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('income') reward=0.00 done=false error=null
+ Step 02: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10088.
+[STEP] step=3 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 03: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=4 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 04: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10088 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 1/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '23', 'income': '11636', 'occupation': 'student', 'has_aadhaar': 'True', 'bank_name': 'Canara Bank'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BSNL for the past 6 years with active
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 2/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '40', 'income': '16723', 'occupation': 'student', 'has_aadhaar': 'True', 'number_of_children': '4', 'bank_name': 'Canara Bank'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of Indian Railways for the past 6 years w
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 3/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '34', 'income': '15485', 'occupation': 'student', 'has_aadhaar': 'True', 'state_of_residence': 'Bihar', 'number_of_children': '3'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of GAIL India for the past 6 years with a
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 1/3]
+[START] task=document_conflict env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '7383', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'bank_name': 'PNB', 'number_of_children': '4'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 01: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: You must verify the Aadhaar card before approving any scheme when age is a critical eligibility fact
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=3 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 03: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=3 score=1.000 rewards=-1.50,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 2/3]
+[START] task=document_conflict env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '34', 'income': '7689', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'marital_status': 'divorced'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 34 in the prof
+[STEP] step=2 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 02: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 3/3]
+[START] task=document_conflict env=scheme_env model=nvidia/nemotron-3-super-120b-a12b
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '6093', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'state_of_residence': 'Bihar', 'marital_status': 'divorced'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 01: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: You must verify the Aadhaar card before approving any scheme when age is a critical eligibility fact
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=3 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 03: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=3 score=1.000 rewards=-1.50,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+============================================================
+ FINAL GRADER SCORES (mean ± std over 3 repeats)
+============================================================
+ Task 1 (Scheme Discovery ): 1.000 ± 0.000 / 1.0
+ Task 2 (Missing Data ): 0.000 ± 0.000 / 1.0
+ Task 3 (Boundary Fraud ): 1.000 ± 0.000 / 1.0
+ Task 4 (Escalation Dilemma ): 1.000 ± 0.000 / 1.0
+ Task 5 (Document Conflict ): 1.000 ± 0.000 / 1.0
+ Average : 0.800 / 1.0
+============================================================
+SCORE_JSON {"task": 1, "score": 1.0}
+STD_JSON {"task": 1, "std": 0.0}
+SCORE_JSON {"task": 2, "score": 0.0}
+STD_JSON {"task": 2, "std": 0.0}
+SCORE_JSON {"task": 3, "score": 1.0}
+STD_JSON {"task": 3, "std": 0.0}
+SCORE_JSON {"task": 4, "score": 1.0}
+STD_JSON {"task": 4, "std": 0.0}
+SCORE_JSON {"task": 5, "score": 1.0}
+STD_JSON {"task": 5, "std": 0.0}
diff --git a/reports/inference_logs/inference_nemotron3_nano30b.txt b/reports/inference_logs/inference_nemotron3_nano30b.txt
new file mode 100644
index 0000000..63ec0ae
--- /dev/null
+++ b/reports/inference_logs/inference_nemotron3_nano30b.txt
@@ -0,0 +1,352 @@
+
+============================================================
+ SCHEME ENV — OPTION A EVALUATION
+ Model : nvidia/nemotron-3-nano-30b-a3b
+ Env : http://localhost:7860
+ Repeats : 3 per task
+============================================================
+
+ [Task 1 — repeat 1/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '19', 'income': '5888', 'bank_name': 'UCO Bank', 'number_of_children': '2', 'state_of_residence': 'Uttar Pradesh'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 03: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 1 — repeat 2/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '27', 'income': '9982', 'number_of_children': '0', 'bank_name': 'Bank of Baroda'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 03: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 1 — repeat 3/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '32', 'income': '1057', 'state_of_residence': 'Bihar', 'marital_status': 'divorced', 'bank_name': 'PNB'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMAY') reward=10.00 done=true error=null
+ Step 03: approve_scheme('PMAY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMAY — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 2 — repeat 1/3]
+[START] task=missing_data env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '34', 'income': '1576', 'number_of_children': '1', 'bank_name': 'UCO Bank'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=approve_scheme('PMAY') reward=-5.00 done=true error=null
+ Step 03: approve_scheme('PMAY') -> reward=-5.0, done=True
+ ERROR: Applicant does not qualify for 'PMAY'.
+[END] success=false steps=3 score=0.000 rewards=0.00,0.00,-5.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 2 — repeat 2/3]
+[START] task=missing_data env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '53', 'income': '4659', 'number_of_children': '4', 'state_of_residence': 'Maharashtra'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=approve_scheme('PMAY') reward=-5.00 done=true error=null
+ Step 03: approve_scheme('PMAY') -> reward=-5.0, done=True
+ ERROR: Applicant does not qualify for 'PMAY'.
+[END] success=false steps=3 score=0.000 rewards=0.00,0.00,-5.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 2 — repeat 3/3]
+[START] task=missing_data env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '41', 'income': '2243', 'state_of_residence': 'Gujarat', 'number_of_children': '2', 'bank_name': 'PNB'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=approve_scheme('PMAY') reward=-5.00 done=true error=null
+ Step 03: approve_scheme('PMAY') -> reward=-5.0, done=True
+ ERROR: Applicant does not qualify for 'PMAY'.
+[END] success=false steps=3 score=0.000 rewards=0.00,0.00,-5.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 1/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '31', 'bank_name': 'Canara Bank', 'number_of_children': '1', 'marital_status': 'married'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('income') reward=0.00 done=false error=null
+ Step 02: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10770.
+[STEP] step=3 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 03: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=4 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 04: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10770 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 2/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '33', 'marital_status': 'divorced', 'state_of_residence': 'Maharashtra'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10185.
+[STEP] step=4 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 04: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10185 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 3/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '27', 'marital_status': 'widowed', 'number_of_children': '1'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('income') reward=0.00 done=false error=null
+ Step 02: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10951.
+[STEP] step=3 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 03: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=4 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 04: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10951 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 1/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '33', 'income': '13674', 'occupation': 'student', 'has_aadhaar': 'True', 'bank_name': 'PNB'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of State Bank of India for the past 6 yea
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 2/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '29', 'income': '11064', 'occupation': 'student', 'has_aadhaar': 'True', 'number_of_children': '0'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of GAIL India for the past 6 years with a
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 3/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '41', 'income': '10119', 'occupation': 'student', 'has_aadhaar': 'True', 'state_of_residence': 'Uttar Pradesh'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BHEL for the past 6 years with active
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 1/3]
+[START] task=document_conflict env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '8632', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'bank_name': 'Bank of Baroda', 'state_of_residence': 'Gujarat'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 38 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=2 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 02: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=38, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 2/3]
+[START] task=document_conflict env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '7933', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'bank_name': 'SBI', 'state_of_residence': 'Uttar Pradesh', 'number_of_children': '4'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 01: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: You must verify the Aadhaar card before approving any scheme when age is a critical eligibility fact
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=3 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 03: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=3 score=1.000 rewards=-1.50,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 3/3]
+[START] task=document_conflict env=scheme_env model=nvidia/nemotron-3-nano-30b-a3b
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '7429', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'state_of_residence': 'Bihar', 'bank_name': 'Bank of Baroda', 'number_of_children': '0'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=2 action=reject_applicant('NO_ELIGIBLE_SCHEME') reward=5.00 done=true error=null
+ Step 02: reject_applicant('NO_ELIGIBLE_SCHEME') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+============================================================
+ FINAL GRADER SCORES (mean ± std over 3 repeats)
+============================================================
+ Task 1 (Scheme Discovery ): 1.000 ± 0.000 / 1.0
+ Task 2 (Missing Data ): 0.000 ± 0.000 / 1.0
+ Task 3 (Boundary Fraud ): 1.000 ± 0.000 / 1.0
+ Task 4 (Escalation Dilemma ): 1.000 ± 0.000 / 1.0
+ Task 5 (Document Conflict ): 1.000 ± 0.000 / 1.0
+ Average : 0.800 / 1.0
+============================================================
+SCORE_JSON {"task": 1, "score": 1.0}
+STD_JSON {"task": 1, "std": 0.0}
+SCORE_JSON {"task": 2, "score": 0.0}
+STD_JSON {"task": 2, "std": 0.0}
+SCORE_JSON {"task": 3, "score": 1.0}
+STD_JSON {"task": 3, "std": 0.0}
+SCORE_JSON {"task": 4, "score": 1.0}
+STD_JSON {"task": 4, "std": 0.0}
+SCORE_JSON {"task": 5, "score": 1.0}
+STD_JSON {"task": 5, "std": 0.0}
diff --git a/reports/inference_logs/inference_nemotron51b.txt b/reports/inference_logs/inference_nemotron51b.txt
new file mode 100644
index 0000000..03f64cd
--- /dev/null
+++ b/reports/inference_logs/inference_nemotron51b.txt
@@ -0,0 +1,370 @@
+
+============================================================
+ SCHEME ENV — OPTION A EVALUATION
+ Model : nvidia/llama-3.3-nemotron-super-49b-v1
+ Env : http://localhost:7860
+ Repeats : 3 per task
+============================================================
+
+ [Task 1 — repeat 1/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '19', 'income': '1437', 'number_of_children': '1', 'state_of_residence': 'Rajasthan', 'bank_name': 'PNB'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 04: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 1 — repeat 2/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '19', 'income': '7676', 'state_of_residence': 'Rajasthan'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 05: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=false steps=5 score=0.950 rewards=0.00,0.00,0.00,-0.10,10.00
+
+ GRADER SCORE: 0.950 / 1.0
+
+ [Task 1 — repeat 3/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '26', 'income': '1391', 'state_of_residence': 'Uttar Pradesh'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=approve_scheme('PMKVY') reward=3.00 done=true error=null
+ Step 05: approve_scheme('PMKVY') -> reward=3.0, done=True
+ SUBOPTIMAL: PMKVY is eligible but PMAY provides greater benefit.
+[END] success=false steps=5 score=0.450 rewards=0.00,0.00,0.00,-0.10,3.00
+
+ GRADER SCORE: 0.450 / 1.0
+
+ [Task 2 — repeat 1/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '32', 'income': '3251', 'state_of_residence': 'Rajasthan'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 04: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=false steps=4 score=0.960 rewards=0.00,0.00,0.00,10.00
+
+ GRADER SCORE: 0.960 / 1.0
+
+ [Task 2 — repeat 2/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '22', 'income': '2235', 'state_of_residence': 'Bihar'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 01: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=3 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 03: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 2 — repeat 3/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '28', 'income': '4274', 'state_of_residence': 'Rajasthan', 'number_of_children': '4', 'bank_name': 'SBI'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 04: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=false steps=4 score=0.910 rewards=0.00,0.00,-0.10,10.00
+
+ GRADER SCORE: 0.910 / 1.0
+
+ [Task 3 — repeat 1/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '26', 'marital_status': 'married'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11975.
+[STEP] step=4 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 04: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 11975 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 2/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '26', 'bank_name': 'Bank of Baroda', 'number_of_children': '3'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11092.
+[STEP] step=4 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 04: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 11092 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 3/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '27', 'marital_status': 'divorced', 'number_of_children': '3'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10488.
+[STEP] step=4 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 04: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10488 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 1/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '38', 'income': '11505', 'occupation': 'student', 'has_aadhaar': 'True', 'state_of_residence': 'Maharashtra', 'marital_status': 'married'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of Indian Railways for the past 6 years w
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 2/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '35', 'income': '11663', 'occupation': 'student', 'has_aadhaar': 'True', 'state_of_residence': 'Maharashtra'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of GAIL India for the past 6 years with a
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 3/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '24', 'income': '18630', 'occupation': 'student', 'has_aadhaar': 'True', 'number_of_children': '1', 'marital_status': 'divorced', 'state_of_residence': 'Uttar Pradesh'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BSNL for the past 6 years with active
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 1/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '34', 'income': '8492', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '2', 'bank_name': 'SBI', 'state_of_residence': 'Bihar'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 34 in the prof
+[STEP] step=2 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 02: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 2/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '34', 'income': '7281', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'state_of_residence': 'Maharashtra', 'number_of_children': '1', 'bank_name': 'PNB'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 37 years. Note: this differs from the self-reported age of 34 in the prof
+[STEP] step=2 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 02: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=37, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 3/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '8158', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'bank_name': 'Bank of Baroda'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 37 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=2 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 02: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=37, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+============================================================
+ FINAL GRADER SCORES (mean ± std over 3 repeats)
+============================================================
+ Task 1 (Scheme Discovery ): 0.800 ± 0.304 / 1.0
+ Task 2 (Missing Data ): 0.957 ± 0.045 / 1.0
+ Task 3 (Boundary Fraud ): 1.000 ± 0.000 / 1.0
+ Task 4 (Escalation Dilemma ): 1.000 ± 0.000 / 1.0
+ Task 5 (Document Conflict ): 1.000 ± 0.000 / 1.0
+ Average : 0.951 / 1.0
+============================================================
+SCORE_JSON {"task": 1, "score": 0.8}
+STD_JSON {"task": 1, "std": 0.3041}
+SCORE_JSON {"task": 2, "score": 0.9567}
+STD_JSON {"task": 2, "std": 0.0451}
+SCORE_JSON {"task": 3, "score": 1.0}
+STD_JSON {"task": 3, "std": 0.0}
+SCORE_JSON {"task": 4, "score": 1.0}
+STD_JSON {"task": 4, "std": 0.0}
+SCORE_JSON {"task": 5, "score": 1.0}
+STD_JSON {"task": 5, "std": 0.0}
diff --git a/reports/inference_logs/inference_nemotron_mini4b.txt b/reports/inference_logs/inference_nemotron_mini4b.txt
new file mode 100644
index 0000000..95d3d14
--- /dev/null
+++ b/reports/inference_logs/inference_nemotron_mini4b.txt
@@ -0,0 +1,396 @@
+
+============================================================
+ SCHEME ENV — OPTION A EVALUATION
+ Model : nvidia/nemotron-mini-4b-instruct
+ Env : http://localhost:7860
+ Repeats : 3 per task
+============================================================
+
+ [Task 1 — repeat 1/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '25', 'income': '3626', 'state_of_residence': 'Bihar', 'marital_status': 'divorced', 'number_of_children': '3'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=approve_scheme('PMKVY') reward=3.00 done=true error=null
+ Step 03: approve_scheme('PMKVY') -> reward=3.0, done=True
+ SUBOPTIMAL: PMKVY is eligible but PMAY provides greater benefit.
+[END] success=false steps=3 score=0.500 rewards=0.00,0.00,3.00
+
+ GRADER SCORE: 0.500 / 1.0
+
+ [Task 1 — repeat 2/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '34', 'income': '3917', 'number_of_children': '4', 'state_of_residence': 'Bihar', 'marital_status': 'unmarried'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 03: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=approve_scheme('PMKVY') reward=3.00 done=true error=null
+ Step 04: approve_scheme('PMKVY') -> reward=3.0, done=True
+ SUBOPTIMAL: PMKVY is eligible but PMAY provides greater benefit.
+[END] success=false steps=4 score=0.450 rewards=0.00,0.00,-0.10,3.00
+
+ GRADER SCORE: 0.450 / 1.0
+
+ [Task 1 — repeat 3/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '33', 'income': '2858', 'bank_name': 'SBI'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=approve_scheme('PMKVY') reward=3.00 done=true error=null
+ Step 03: approve_scheme('PMKVY') -> reward=3.0, done=True
+ SUBOPTIMAL: PMKVY is eligible but PMAY provides greater benefit.
+[END] success=false steps=3 score=0.500 rewards=0.00,0.00,3.00
+
+ GRADER SCORE: 0.500 / 1.0
+
+ [Task 2 — repeat 1/3]
+[START] task=missing_data env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '48', 'income': '1982', 'number_of_children': '3', 'marital_status': 'married'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 03: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 2 — repeat 2/3]
+[START] task=missing_data env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '36', 'income': '4164', 'number_of_children': '1', 'marital_status': 'unmarried', 'bank_name': 'Canara Bank'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 01: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=2 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 02: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=3 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 03: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=4 action=escalate('MANUAL_REVIEW_REQUIRED') reward=-2.00 done=true error=null
+ Step 04: escalate('MANUAL_REVIEW_REQUIRED') -> reward=-2.0, done=True
+ INCORRECT ESCALATION: Escalation is only appropriate when data integrity is genuinely compromised. This case has suffici
+[END] success=false steps=4 score=0.000 rewards=0.00,0.00,0.00,-2.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 2 — repeat 3/3]
+[START] task=missing_data env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '43', 'income': '4120', 'bank_name': 'PNB'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 03: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 1/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '29', 'marital_status': 'married'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10437.
+[STEP] step=4 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 04: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 10437 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 2/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '24', 'bank_name': 'Canara Bank', 'state_of_residence': 'Uttar Pradesh', 'marital_status': 'divorced'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11651.
+[STEP] step=4 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 04: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 11651 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 3/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '28', 'bank_name': 'UCO Bank', 'state_of_residence': 'Maharashtra', 'marital_status': 'divorced'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10915.
+[STEP] step=4 action=escalate('DATA_MISMATCH') reward=-2.00 done=true error=null
+ Step 04: escalate('DATA_MISMATCH') -> reward=-2.0, done=True
+ INCORRECT ESCALATION: Escalation is only appropriate when data integrity is genuinely compromised. This case has suffici
+[END] success=false steps=4 score=0.000 rewards=0.00,0.00,0.00,-2.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 4 — repeat 1/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '30', 'income': '19888', 'occupation': 'student', 'has_aadhaar': 'True', 'bank_name': 'Canara Bank', 'state_of_residence': 'Maharashtra', 'marital_status': 'widowed'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 01: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 02: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=3 action=escalate('DATA_MISMATCH') reward=-1.00 done=false error=null
+ Step 03: escalate('DATA_MISMATCH') -> reward=-1.0, done=False
+ INSUFFICIENT BASIS FOR ESCALATION: First request the PAN card to verify the suspected employment contradiction. Escalate
+[STEP] step=4 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 04: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BSNL for the past 6 years with active
+[STEP] step=5 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 05: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=false steps=5 score=0.950 rewards=-0.10,-0.10,-1.00,0.00,10.00
+
+ GRADER SCORE: 0.950 / 1.0
+
+ [Task 4 — repeat 2/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '30', 'income': '16335', 'occupation': 'student', 'has_aadhaar': 'True', 'state_of_residence': 'Rajasthan'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 01: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 02: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=3 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 03: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BHEL for the past 6 years with active
+[STEP] step=4 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 04: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=false steps=4 score=0.950 rewards=-0.10,-0.10,0.00,10.00
+
+ GRADER SCORE: 0.950 / 1.0
+
+ [Task 4 — repeat 3/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '41', 'income': '19645', 'occupation': 'student', 'has_aadhaar': 'True', 'bank_name': 'SBI', 'number_of_children': '2'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 01: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=escalate('DATA_MISMATCH') reward=-1.00 done=false error=null
+ Step 02: escalate('DATA_MISMATCH') -> reward=-1.0, done=False
+ INSUFFICIENT BASIS FOR ESCALATION: First request the PAN card to verify the suspected employment contradiction. Escalate
+[STEP] step=3 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 03: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BHEL for the past 6 years with active
+[STEP] step=4 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 04: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=4 score=1.000 rewards=-0.10,-1.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 1/3]
+[START] task=document_conflict env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '7344', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'marital_status': 'married'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 01: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 02: request_document('pan_card') -> reward=0.0, done=False
+ PAN card verified. No anomalies found in tax records. For age verification, the Aadhaar card is the authoritative docume
+[STEP] step=3 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 03: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: You must verify the Aadhaar card before approving any scheme when age is a critical eligibility fact
+[STEP] step=4 action=escalate('MANUAL_REVIEW_REQUIRED') reward=-2.00 done=true error=null
+ Step 04: escalate('MANUAL_REVIEW_REQUIRED') -> reward=-2.0, done=True
+ INCORRECT ESCALATION: Escalation is only appropriate when data integrity is genuinely compromised. This case has suffici
+[END] success=false steps=4 score=0.000 rewards=-0.10,0.00,-1.50,-2.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 5 — repeat 2/3]
+[START] task=document_conflict env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '34', 'income': '8839', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'bank_name': 'SBI', 'state_of_residence': 'Maharashtra', 'marital_status': 'widowed'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 01: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 02: request_document('pan_card') -> reward=0.0, done=False
+ PAN card verified. No anomalies found in tax records. For age verification, the Aadhaar card is the authoritative docume
+[STEP] step=3 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 03: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: You must verify the Aadhaar card before approving any scheme when age is a critical eligibility fact
+[STEP] step=4 action=escalate('MANUAL_REVIEW_REQUIRED') reward=-2.00 done=true error=null
+ Step 04: escalate('MANUAL_REVIEW_REQUIRED') -> reward=-2.0, done=True
+ INCORRECT ESCALATION: Escalation is only appropriate when data integrity is genuinely compromised. This case has suffici
+[END] success=false steps=4 score=0.000 rewards=-0.10,0.00,-1.50,-2.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 5 — repeat 3/3]
+[START] task=document_conflict env=scheme_env model=nvidia/nemotron-mini-4b-instruct
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '8666', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '1', 'bank_name': 'PNB'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 01: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 02: request_document('pan_card') -> reward=0.0, done=False
+ PAN card verified. No anomalies found in tax records. For age verification, the Aadhaar card is the authoritative docume
+[STEP] step=3 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 03: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: You must verify the Aadhaar card before approving any scheme when age is a critical eligibility fact
+[STEP] step=4 action=ask_question('self_reported_age') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=5 action=ask_question('aadhaar_age') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=6 action=escalate('MANUAL_REVIEW_REQUIRED') reward=-2.00 done=true error=null
+ Step 06: escalate('MANUAL_REVIEW_REQUIRED') -> reward=-2.0, done=True
+ INCORRECT ESCALATION: Escalation is only appropriate when data integrity is genuinely compromised. This case has suffici
+[END] success=false steps=6 score=0.000 rewards=-0.10,0.00,-1.50,-2.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+============================================================
+ FINAL GRADER SCORES (mean ± std over 3 repeats)
+============================================================
+ Task 1 (Scheme Discovery ): 0.483 ± 0.029 / 1.0
+ Task 2 (Missing Data ): 0.667 ± 0.577 / 1.0
+ Task 3 (Boundary Fraud ): 0.667 ± 0.577 / 1.0
+ Task 4 (Escalation Dilemma ): 0.967 ± 0.029 / 1.0
+ Task 5 (Document Conflict ): 0.000 ± 0.000 / 1.0
+ Average : 0.557 / 1.0
+============================================================
+SCORE_JSON {"task": 1, "score": 0.4833}
+STD_JSON {"task": 1, "std": 0.0289}
+SCORE_JSON {"task": 2, "score": 0.6667}
+STD_JSON {"task": 2, "std": 0.5774}
+SCORE_JSON {"task": 3, "score": 0.6667}
+STD_JSON {"task": 3, "std": 0.5774}
+SCORE_JSON {"task": 4, "score": 0.9667}
+STD_JSON {"task": 4, "std": 0.0289}
+SCORE_JSON {"task": 5, "score": 0.0}
+STD_JSON {"task": 5, "std": 0.0}
diff --git a/reports/inference_logs/inference_nemotron_nano.txt b/reports/inference_logs/inference_nemotron_nano.txt
new file mode 100644
index 0000000..e19b822
--- /dev/null
+++ b/reports/inference_logs/inference_nemotron_nano.txt
@@ -0,0 +1,562 @@
+
+============================================================
+ SCHEME ENV — OPTION A EVALUATION
+ Model : nvidia/llama-3.1-nemotron-nano-8b-v1
+ Env : http://localhost:7860
+ Repeats : 3 per task
+============================================================
+
+ [Task 1 — repeat 1/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '27', 'income': '2415', 'bank_name': 'PNB', 'marital_status': 'divorced'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=3 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 03: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 04: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=approve_scheme('PMKVY') reward=3.00 done=true error=null
+ Step 05: approve_scheme('PMKVY') -> reward=3.0, done=True
+ SUBOPTIMAL: PMKVY is eligible but PMAY provides greater benefit.
+[END] success=false steps=5 score=0.400 rewards=0.00,0.00,-0.10,-0.10,3.00
+
+ GRADER SCORE: 0.400 / 1.0
+
+ [Task 1 — repeat 2/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '22', 'income': '5621', 'marital_status': 'divorced'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=approve_scheme('PMKVY') reward=3.00 done=true error=null
+ Step 05: approve_scheme('PMKVY') -> reward=3.0, done=True
+ SUBOPTIMAL: PMKVY is eligible but PMAY provides greater benefit.
+[END] success=false steps=5 score=0.400 rewards=0.00,0.00,-0.10,-0.10,3.00
+
+ GRADER SCORE: 0.400 / 1.0
+
+ [Task 1 — repeat 3/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '19', 'income': '8028', 'marital_status': 'divorced', 'state_of_residence': 'Maharashtra', 'number_of_children': '0'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 01: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 05: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=false steps=5 score=0.900 rewards=0.00,0.00,-0.10,-0.10,10.00
+
+ GRADER SCORE: 0.900 / 1.0
+
+ [Task 2 — repeat 1/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '40', 'income': '4258', 'marital_status': 'widowed', 'bank_name': 'PNB', 'state_of_residence': 'Bihar'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 05: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=6 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=7 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=8 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 08: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=false steps=8 score=0.730 rewards=0.00,0.00,-0.10,-0.10,-0.10,10.00
+
+ GRADER SCORE: 0.730 / 1.0
+
+ [Task 2 — repeat 2/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '57', 'income': '1596', 'bank_name': 'UCO Bank'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=3 action=check_field('age') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=check_field('income') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=check_field('age') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=check_field('age') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=check_field('has_aadhaar') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=check_field('age') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=check_field('age') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=check_field('age') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 11: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=true steps=11 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 2 — repeat 3/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '55', 'income': '1840', 'number_of_children': '0', 'state_of_residence': 'Uttar Pradesh', 'bank_name': 'PNB'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=6 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 06: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=7 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 07: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=false steps=7 score=0.730 rewards=0.00,0.00,-0.10,-0.10,-0.10,10.00
+
+ GRADER SCORE: 0.730 / 1.0
+
+ [Task 3 — repeat 1/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '29', 'bank_name': 'Bank of Baroda', 'marital_status': 'widowed'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11193.
+[STEP] step=4 action=approve_scheme('PMKVY') reward=-4.00 done=true error=null
+ Step 04: approve_scheme('PMKVY') -> reward=-4.0, done=True
+ CLEAR MISS: Income 11193 exceeds all scheme thresholds (overage: Rs 1194 above PMKVY limit). Use reject_applicant when n
+[END] success=false steps=4 score=0.000 rewards=0.00,0.00,0.00,-4.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 2/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '28', 'state_of_residence': 'Maharashtra'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10812.
+[STEP] step=4 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=14 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 3/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '26', 'number_of_children': '1', 'state_of_residence': 'Bihar'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10424.
+[STEP] step=4 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=10 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=14 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=19 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=apply_eligibility_rules('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 4 — repeat 1/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '25', 'income': '12263', 'occupation': 'student', 'has_aadhaar': 'True', 'bank_name': 'Bank of Baroda', 'state_of_residence': 'Bihar', 'marital_status': 'unmarried'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of Indian Railways for the past 6 years w
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 2/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '24', 'income': '18756', 'occupation': 'student', 'has_aadhaar': 'True', 'marital_status': 'unmarried', 'state_of_residence': 'Rajasthan'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 01: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 02: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=3 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 03: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 04: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=ask_question('marital_status') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=6 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=7 action=ask_question('education_level') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=8 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=9 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=10 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=11 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=12 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=13 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=14 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=15 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=16 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=17 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=18 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=19 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=20 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[END] success=false steps=20 score=0.000 rewards=-0.10,-0.10,-0.10,-0.10
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 4 — repeat 3/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '37', 'income': '17797', 'occupation': 'student', 'has_aadhaar': 'True', 'bank_name': 'Canara Bank', 'marital_status': 'widowed'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 01: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 02: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=3 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 03: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 04: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 05: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=6 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 06: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=7 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 07: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=8 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 08: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=9 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 09: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=10 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 10: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=11 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 11: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=12 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 12: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=13 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 13: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=14 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 14: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=15 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 15: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=16 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 16: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=17 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 17: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=18 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 18: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=19 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 19: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=20 action=approve_scheme('PMKVY') reward=-2.00 done=true error=null
+ Step 20: approve_scheme('PMKVY') -> reward=-2.0, done=True
+ TIMEOUT: 20 steps reached without a decision.
+[END] success=false steps=20 score=0.000 rewards=-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-0.10,-2.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 5 — repeat 1/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '8289', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'state_of_residence': 'Maharashtra', 'bank_name': 'Canara Bank'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=3 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 13: request_document('pan_card') -> reward=0.0, done=False
+ PAN card verified. No anomalies found in tax records. For age verification, the Aadhaar card is the authoritative docume
+[STEP] step=14 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 14: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=15 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 5 — repeat 2/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '8530', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '3', 'bank_name': 'PNB'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=3 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 13: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=14 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=update_profile({'age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 5 — repeat 3/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=3 action=update_profile("{'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'aadhaar_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=update_profile({'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'aadhaar_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=update_profile({'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'aadhaar_age': '36', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=update_profile({'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=update_profile({'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=update_profile({'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=update_profile({'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=update_profile({'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=update_profile({'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=update_profile({'age': '36', 'income': '8469', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 13: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=14 action=update_profile({'age': 36, 'income': 8469, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 33, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=update_profile({'age': 36, 'income': 8469, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 33, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=update_profile({'age': 36, 'income': 8469, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 33, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=update_profile({'age': 36, 'income': 8469, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 33, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=update_profile({'age': 36, 'income': 8469, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 33, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=update_profile({'age': 36, 'income': 8469, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 33, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=update_profile({'age': 36, 'income': 8469, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 33, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+============================================================
+ FINAL GRADER SCORES (mean ± std over 3 repeats)
+============================================================
+ Task 1 (Scheme Discovery ): 0.567 ± 0.289 / 1.0
+ Task 2 (Missing Data ): 0.820 ± 0.156 / 1.0
+ Task 3 (Boundary Fraud ): 0.000 ± 0.000 / 1.0
+ Task 4 (Escalation Dilemma ): 0.333 ± 0.577 / 1.0
+ Task 5 (Document Conflict ): 0.000 ± 0.000 / 1.0
+ Average : 0.344 / 1.0
+============================================================
+SCORE_JSON {"task": 1, "score": 0.5667}
+STD_JSON {"task": 1, "std": 0.2887}
+SCORE_JSON {"task": 2, "score": 0.82}
+STD_JSON {"task": 2, "std": 0.1559}
+SCORE_JSON {"task": 3, "score": 0.0}
+STD_JSON {"task": 3, "std": 0.0}
+SCORE_JSON {"task": 4, "score": 0.3333}
+STD_JSON {"task": 4, "std": 0.5774}
+SCORE_JSON {"task": 5, "score": 0.0}
+STD_JSON {"task": 5, "std": 0.0}
diff --git a/reports/inference_logs/inference_nemotron_nano8b.txt b/reports/inference_logs/inference_nemotron_nano8b.txt
new file mode 100644
index 0000000..95bdca3
--- /dev/null
+++ b/reports/inference_logs/inference_nemotron_nano8b.txt
@@ -0,0 +1,544 @@
+
+============================================================
+ SCHEME ENV — OPTION A EVALUATION
+ Model : nvidia/llama-3.1-nemotron-nano-8b-v1
+ Env : http://localhost:7860
+ Repeats : 3 per task
+============================================================
+
+ [Task 1 — repeat 1/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '20', 'income': '4708', 'number_of_children': '2', 'state_of_residence': 'Uttar Pradesh', 'marital_status': 'unmarried'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=3 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=14 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 1 — repeat 2/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '19', 'income': '7903', 'number_of_children': '2'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=3 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=14 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 1 — repeat 3/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '19', 'income': '1411', 'state_of_residence': 'Gujarat', 'bank_name': 'Canara Bank'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=3 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 03: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 04: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=6 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=7 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 07: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=8 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 08: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=false steps=8 score=0.850 rewards=0.00,0.00,-0.10,-0.10,-0.10,10.00
+
+ GRADER SCORE: 0.850 / 1.0
+
+ [Task 2 — repeat 1/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '37', 'income': '3824', 'state_of_residence': 'Bihar'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=3 action=check_field('age') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=check_field('has_aadhaar') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=check_field('occupation') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=check_field('age') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=check_field('state_of_residence') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 09: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=false steps=9 score=0.910 rewards=0.00,0.00,-0.10,10.00
+
+ GRADER SCORE: 0.910 / 1.0
+
+ [Task 2 — repeat 2/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '19', 'income': '2401', 'state_of_residence': 'Uttar Pradesh', 'number_of_children': '2'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=3 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=14 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=check_eligibility('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 2 — repeat 3/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '34', 'income': '4943', 'state_of_residence': 'Uttar Pradesh', 'marital_status': 'unmarried'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 11: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=12 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=14 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=apply_eligibility_rules('MGNREGS') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,-0.10,-0.10
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 1/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '24', 'number_of_children': '0', 'state_of_residence': 'Gujarat'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10555.
+[STEP] step=4 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=approve_scheme('PMKVY') reward=-4.00 done=true error=null
+ Step 06: approve_scheme('PMKVY') -> reward=-4.0, done=True
+ CLEAR MISS: Income 10555 exceeds all scheme thresholds (overage: Rs 556 above PMKVY limit). Use reject_applicant when no
+[END] success=false steps=6 score=0.000 rewards=0.00,0.00,0.00,-4.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 2/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '29', 'state_of_residence': 'Uttar Pradesh'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11035.
+[STEP] step=4 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=check_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=approve_scheme('PMKVY') reward=-4.00 done=true error=null
+ Step 07: approve_scheme('PMKVY') -> reward=-4.0, done=True
+ CLEAR MISS: Income 11035 exceeds all scheme thresholds (overage: Rs 1036 above PMKVY limit). Use reject_applicant when n
+[END] success=false steps=7 score=0.000 rewards=0.00,0.00,0.00,-4.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 3/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '28', 'number_of_children': '1', 'marital_status': 'unmarried', 'bank_name': 'Bank of Baroda'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card received and verified. has_aadhaar confirmed as True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=3 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 03: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=4 action=ask_question('income') reward=0.00 done=false error=null
+ Step 04: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11082.
+[STEP] step=5 action=approve_scheme('PMKVY') reward=-4.00 done=true error=null
+ Step 05: approve_scheme('PMKVY') -> reward=-4.0, done=True
+ CLEAR MISS: Income 11082 exceeds all scheme thresholds (overage: Rs 1083 above PMKVY limit). Use reject_applicant when n
+[END] success=false steps=5 score=0.000 rewards=0.00,0.00,0.00,0.00,-4.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 4 — repeat 1/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '35', 'income': '10690', 'occupation': 'student', 'has_aadhaar': 'True', 'state_of_residence': 'Rajasthan'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of State Bank of India for the past 6 yea
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 2/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '44', 'income': '17519', 'occupation': 'student', 'has_aadhaar': 'True', 'bank_name': 'SBI'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 01: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 02: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=3 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 03: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 04: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=5 action=check_scheme_eligibility('PMKVY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=14 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=16 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=17 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=check_scheme_eligibility('PMAY') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=-0.10,-0.10,-0.10,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 4 — repeat 3/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '26', 'income': '15496', 'occupation': 'student', 'has_aadhaar': 'True', 'state_of_residence': 'Uttar Pradesh', 'bank_name': 'Bank of Baroda'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 01: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 02: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=3 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 03: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 04: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=5 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=6 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=7 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 07: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=8 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 08: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=9 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 09: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=10 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 10: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=11 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=12 action=approve_scheme('PMKVY') reward=-1.50 done=false error=null
+ Step 12: approve_scheme('PMKVY') -> reward=-1.5, done=False
+ PROTOCOL VIOLATION: Do not approve this case before verifying employment records. Request the PAN card first to check fo
+[STEP] step=13 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 13: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=14 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=15 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=16 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 16: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=17 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 17: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=18 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 18: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=19 action=ask_question('state_of_residence') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=20 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 20: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[END] success=false steps=20 score=0.000 rewards=-0.10,-0.10,-0.10,0.00,-0.10,-0.10,-0.10,-0.10,-1.50,0.00,-0.10,-0.10,-0.10,-0.10
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 5 — repeat 1/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '34', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 34 in the prof
+[STEP] step=2 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=3 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36', 'aadhaar_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36', 'aadhaar_age': '36', 'updated_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36', 'aadhaar_age': '36', 'updated_age': '36', 'updated_age_source': 'AADHAIR'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36', 'aadhaar_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36', 'aadhaar_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36', 'aadhaar_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36', 'aadhaar_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36', 'aadhaar_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'official_age': '36', 'aadhaar_age': '36', 'updated_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=request_document('aadhaar_age') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=14 action=update_profile("{'age': '36', 'income': '7623', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'aadhaar_age': '36'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=request_document('aadhaar_age') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=16 action=request_document('aadhaar_age') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=17 action=request_document('self_reported_age') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=18 action=request_document('aadhaar_age') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=19 action=request_document('aadhaar_age') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=20 action=request_document('self_reported_age') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[END] success=false steps=20 score=0.000 rewards=0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 5 — repeat 2/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=3 action=compare_age('official_age') reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=4 action=update_profile("{'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=5 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=6 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=7 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=8 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=9 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=10 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=11 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=12 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=13 action=request_document('age_verification_report') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=14 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=15 action=request_document('age_verification_report') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=16 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 16: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=17 action=update_profile({'age': '36', 'income': '6215', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'number_of_children': '4', 'official_age': '36'}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=18 action=update_profile({'age': 36, 'income': 6215, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 35, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=19 action=update_profile({'age': 36, 'income': 6215, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 35, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[STEP] step=20 action=update_profile({'age': 36, 'income': 6215, 'occupation': 'mason', 'has_aadhaar': True, 'self_reported_age': 35, 'number_of_children': 4, 'official_age': 36}) reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 5 — repeat 3/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.1-nemotron-nano-8b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '6090', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'state_of_residence': 'Gujarat', 'marital_status': 'married', 'bank_name': 'Canara Bank'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=2 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 02: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=3 action=update_profile("{'age': '36', 'income': '6090', 'occupation': 'mason', 'has_aadhaar': 'True', 'state_of_residence': 'Gujarat', 'marital_status': 'married', 'bank_name': 'Canara Bank', 'official_age': '36', 'self_reported_age': '35'}") reward=0.00 done=false error=HTTP Error 422: Unprocessable Content
+ [ERROR] agent decision failed: JSON_PARSE_ERROR
+ raw={"action_type": "update_profile", "value": {'age': 36, 'income': 6090, 'occupation': 'mason', 'has_aadhaar': true, 'self_reported_age': 35, 'state_of_residence': 'Gujarat', 'marital_status': 'married'
+[STEP] step=4 action=agent_error reward=0.00 done=true error=JSON_PARSE_ERROR
+[END] success=false steps=4 score=0.000 rewards=0.00,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+============================================================
+ FINAL GRADER SCORES (mean ± std over 3 repeats)
+============================================================
+ Task 1 (Scheme Discovery ): 0.283 ± 0.491 / 1.0
+ Task 2 (Missing Data ): 0.303 ± 0.525 / 1.0
+ Task 3 (Boundary Fraud ): 0.000 ± 0.000 / 1.0
+ Task 4 (Escalation Dilemma ): 0.333 ± 0.577 / 1.0
+ Task 5 (Document Conflict ): 0.000 ± 0.000 / 1.0
+ Average : 0.184 / 1.0
+============================================================
+SCORE_JSON {"task": 1, "score": 0.2833}
+STD_JSON {"task": 1, "std": 0.4907}
+SCORE_JSON {"task": 2, "score": 0.3033}
+STD_JSON {"task": 2, "std": 0.5254}
+SCORE_JSON {"task": 3, "score": 0.0}
+STD_JSON {"task": 3, "std": 0.0}
+SCORE_JSON {"task": 4, "score": 0.3333}
+STD_JSON {"task": 4, "std": 0.5774}
+SCORE_JSON {"task": 5, "score": 0.0}
+STD_JSON {"task": 5, "std": 0.0}
diff --git a/reports/inference_logs/inference_nemotron_super49b.txt b/reports/inference_logs/inference_nemotron_super49b.txt
new file mode 100644
index 0000000..cb2eee1
--- /dev/null
+++ b/reports/inference_logs/inference_nemotron_super49b.txt
@@ -0,0 +1,376 @@
+
+============================================================
+ SCHEME ENV — OPTION A EVALUATION
+ Model : nvidia/llama-3.3-nemotron-super-49b-v1
+ Env : http://localhost:7860
+ Repeats : 3 per task
+============================================================
+
+ [Task 1 — repeat 1/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '27', 'income': '6336', 'bank_name': 'UCO Bank'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 05: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=6 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 06: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=false steps=6 score=0.950 rewards=0.00,0.00,0.00,-0.10,0.00,10.00
+
+ GRADER SCORE: 0.950 / 1.0
+
+ [Task 1 — repeat 2/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '20', 'income': '1288', 'number_of_children': '1', 'bank_name': 'UCO Bank', 'state_of_residence': 'Gujarat'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 04: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 1 — repeat 3/3]
+[START] task=scheme_discovery env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '27', 'income': '4908', 'state_of_residence': 'Uttar Pradesh', 'bank_name': 'UCO Bank'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 05: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=6 action=approve_scheme('PMKVY') reward=3.00 done=true error=null
+ Step 06: approve_scheme('PMKVY') -> reward=3.0, done=True
+ SUBOPTIMAL: PMKVY is eligible but PMAY provides greater benefit.
+[END] success=false steps=6 score=0.450 rewards=0.00,0.00,0.00,-0.10,0.00,3.00
+
+ GRADER SCORE: 0.450 / 1.0
+
+ [Task 2 — repeat 1/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '47', 'income': '1594', 'bank_name': 'Bank of Baroda', 'state_of_residence': 'Uttar Pradesh'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 04: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=false steps=4 score=0.960 rewards=0.00,0.00,0.00,10.00
+
+ GRADER SCORE: 0.960 / 1.0
+
+ [Task 2 — repeat 2/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '57', 'income': '4061', 'bank_name': 'UCO Bank', 'marital_status': 'unmarried', 'state_of_residence': 'Gujarat'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 01: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=3 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 03: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=true steps=3 score=1.000 rewards=0.00,0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 2 — repeat 3/3]
+[START] task=missing_data env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '22', 'income': '4771', 'bank_name': 'UCO Bank', 'marital_status': 'divorced'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 04: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=false steps=4 score=0.960 rewards=0.00,0.00,0.00,10.00
+
+ GRADER SCORE: 0.960 / 1.0
+
+ [Task 3 — repeat 1/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '31', 'bank_name': 'UCO Bank'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11217.
+[STEP] step=4 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 04: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 11217 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 2/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '25', 'bank_name': 'PNB', 'number_of_children': '0', 'state_of_residence': 'Rajasthan'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11092.
+[STEP] step=4 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 04: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 11092 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 3 — repeat 3/3]
+[START] task=boundary_fraud env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '25', 'state_of_residence': 'Uttar Pradesh'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11517.
+[STEP] step=4 action=reject_applicant('INCOME_TOO_HIGH') reward=5.00 done=true error=null
+ Step 04: reject_applicant('INCOME_TOO_HIGH') -> reward=5.0, done=True
+ CORRECT REJECTION: Income 11517 exceeds all scheme thresholds. No eligible scheme found.
+[END] success=true steps=4 score=1.000 rewards=0.00,0.00,0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 1/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '45', 'income': '19939', 'occupation': 'student', 'has_aadhaar': 'True', 'number_of_children': '2', 'state_of_residence': 'Rajasthan', 'bank_name': 'Bank of Baroda'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BHEL for the past 6 years with active
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 2/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '32', 'income': '19339', 'occupation': 'student', 'has_aadhaar': 'True', 'marital_status': 'unmarried'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BSNL for the past 6 years with active
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 4 — repeat 3/3]
+[START] task=escalation_dilemma env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '23', 'income': '15394', 'occupation': 'student', 'has_aadhaar': 'True', 'bank_name': 'SBI', 'state_of_residence': 'Rajasthan', 'number_of_children': '3'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 01: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BSNL for the past 6 years with active
+[STEP] step=2 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 02: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=true steps=2 score=1.000 rewards=0.00,10.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 1/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '6016', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'bank_name': 'PNB', 'number_of_children': '2', 'state_of_residence': 'Rajasthan'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 37 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=2 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 02: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=37, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 2/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '34', 'income': '6257', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '34', 'bank_name': 'SBI', 'number_of_children': '0', 'state_of_residence': 'Maharashtra'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 34 in the prof
+[STEP] step=2 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 02: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 3/3]
+[START] task=document_conflict env=scheme_env model=nvidia/llama-3.3-nemotron-super-49b-v1
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '8148', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'state_of_residence': 'Maharashtra', 'marital_status': 'widowed'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 37 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=2 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 02: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=37, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+============================================================
+ FINAL GRADER SCORES (mean ± std over 3 repeats)
+============================================================
+ Task 1 (Scheme Discovery ): 0.800 ± 0.304 / 1.0
+ Task 2 (Missing Data ): 0.973 ± 0.023 / 1.0
+ Task 3 (Boundary Fraud ): 1.000 ± 0.000 / 1.0
+ Task 4 (Escalation Dilemma ): 1.000 ± 0.000 / 1.0
+ Task 5 (Document Conflict ): 1.000 ± 0.000 / 1.0
+ Average : 0.955 / 1.0
+============================================================
+SCORE_JSON {"task": 1, "score": 0.8}
+STD_JSON {"task": 1, "std": 0.3041}
+SCORE_JSON {"task": 2, "score": 0.9733}
+STD_JSON {"task": 2, "std": 0.0231}
+SCORE_JSON {"task": 3, "score": 1.0}
+STD_JSON {"task": 3, "std": 0.0}
+SCORE_JSON {"task": 4, "score": 1.0}
+STD_JSON {"task": 4, "std": 0.0}
+SCORE_JSON {"task": 5, "score": 1.0}
+STD_JSON {"task": 5, "std": 0.0}
diff --git a/reports/inference_logs/inference_nvidia_8b.txt b/reports/inference_logs/inference_nvidia_8b.txt
new file mode 100644
index 0000000..1e817b0
--- /dev/null
+++ b/reports/inference_logs/inference_nvidia_8b.txt
@@ -0,0 +1,672 @@
+
+============================================================
+ SCHEME ENV — OPTION A EVALUATION
+ Model : meta/llama-3.1-8b-instruct
+ Env : http://localhost:7860
+ Repeats : 3 per task
+============================================================
+
+ [Task 1 — repeat 1/3]
+[START] task=scheme_discovery env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '32', 'income': '8687', 'bank_name': 'Bank of Baroda', 'marital_status': 'unmarried'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 05: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=6 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 06: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=false steps=6 score=0.900 rewards=0.00,0.00,-0.10,-0.10,0.00,10.00
+
+ GRADER SCORE: 0.900 / 1.0
+
+ [Task 1 — repeat 2/3]
+[START] task=scheme_discovery env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '18', 'income': '1931', 'bank_name': 'PNB', 'state_of_residence': 'Maharashtra'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 05: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=6 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 06: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=7 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 07: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=8 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 08: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=9 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 09: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=10 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 10: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=11 action=ask_question('is_occupation_eligible_for_any_scheme') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=12 action=ask_question('is_occupation_eligible_for_any_scheme') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=13 action=ask_question('is_income_eligible_for_any_scheme') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=14 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 14: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=15 action=ask_question('is_occupation_eligible_for_any_scheme') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=16 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 16: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=false steps=16 score=0.700 rewards=0.00,0.00,-0.10,-0.10,0.00,0.00,-0.10,-0.10,-0.10,-0.10,0.00,10.00
+
+ GRADER SCORE: 0.700 / 1.0
+
+ [Task 1 — repeat 3/3]
+[START] task=scheme_discovery env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 1/5 — SCHEME_DISCOVERY
+============================================================
+ Profile : {'age': '19', 'income': '7160', 'marital_status': 'unmarried', 'bank_name': 'Canara Bank'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 1/5 - SCHEME DISCOVERY - Easy] Profile is INCOMPLETE. Collect all required eligibility fields first, then identify and approve the OPT
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = mason.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 05: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=6 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 06: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=7 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 07: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=8 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 08: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=9 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 09: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=10 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 10: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=11 action=ask_question('marital_status') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=12 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 12: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=13 action=approve_scheme('PMKVY') reward=10.00 done=true error=null
+ Step 13: approve_scheme('PMKVY') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in PMKVY — optimal scheme.
+[END] success=false steps=13 score=0.700 rewards=0.00,0.00,-0.10,-0.10,0.00,0.00,-0.10,-0.10,-0.10,-0.10,0.00,10.00
+
+ GRADER SCORE: 0.700 / 1.0
+
+ [Task 2 — repeat 1/3]
+[START] task=missing_data env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '28', 'income': '4776', 'number_of_children': '2'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 05: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=6 action=ask_question('number_of_children') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=7 action=ask_question('document_type') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=8 action=ask_question('document_type') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=9 action=ask_question('has_pan_card') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=10 action=ask_question('has_pan_card') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=11 action=ask_question('has_pan_card') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=12 action=ask_question('has_pan_card') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=13 action=ask_question('has_pan_card') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=14 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 14: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=15 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 15: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=16 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 16: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=17 action=ask_question('number_of_days_worked_last_year') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=18 action=ask_question('has_pan_card') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=19 action=ask_question('number_of_days_worked_last_year') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=20 action=ask_question('has_pan_card') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,-0.10,-0.10,0.00,0.00,-0.10,-0.10
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 2 — repeat 2/3]
+[START] task=missing_data env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '22', 'income': '3416', 'number_of_children': '1', 'bank_name': 'Canara Bank'}
+ Missing : ['occupation', 'has_aadhaar']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 05: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=6 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 06: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=7 action=ask_question('number_of_children') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=8 action=ask_question('has_family_member_with_disability') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=9 action=ask_question('has_family_member_with_disability') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=10 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 10: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=11 action=approve_scheme('MGNREGS') reward=10.00 done=true error=null
+ Step 11: approve_scheme('MGNREGS') -> reward=10.0, done=True
+ CORRECT: Applicant enrolled in MGNREGS — optimal scheme.
+[END] success=false steps=11 score=0.650 rewards=0.00,0.00,-0.10,-0.10,0.00,0.00,-0.10,10.00
+
+ GRADER SCORE: 0.650 / 1.0
+
+ [Task 2 — repeat 3/3]
+[START] task=missing_data env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 2/5 — MISSING_DATA
+============================================================
+ Profile : {'age': '29', 'income': '1180', 'number_of_children': '4'}
+ Missing : ['has_aadhaar', 'occupation']
+ Notif : [TASK 2/5 - MISSING DATA - Medium] Profile is INCOMPLETE. You MUST collect ALL required eligibility fields before making any terminal decisi
+[STEP] step=1 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 01: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = farm_labourer.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 04: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=ask_question('number_of_children') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=6 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 06: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=7 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=8 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 08: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=9 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=10 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 10: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=11 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=12 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=13 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 13: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=14 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=15 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=16 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=17 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=18 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=19 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 19: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=20 action=ask_question('has_family_member_in_military') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,-0.10,-0.10,0.00,-0.10,0.00,-0.10,0.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 1/3]
+[START] task=boundary_fraud env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '34', 'marital_status': 'divorced'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 10572.
+[STEP] step=4 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 04: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 05: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=6 action=ask_question('marital_status') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=7 action=ask_question('is_student') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=8 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 08: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=9 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 09: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=10 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 10: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=11 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 11: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=12 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 12: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=13 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 13: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=14 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 14: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=15 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 15: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=16 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 16: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=17 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 17: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=18 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 18: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=19 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 19: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=20 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 20: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,0.00,-0.10,0.00,-0.10,-0.10,-0.10,0.00,-0.10,-0.10,-0.10,-0.10,-0.10,0.00,0.00,-0.10,-0.10
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 2/3]
+[START] task=boundary_fraud env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '23', 'number_of_children': '0'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('income') reward=0.00 done=false error=null
+ Step 01: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11179.
+[STEP] step=2 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 02: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=3 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 03: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=4 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 04: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=5 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 05: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=6 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 06: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=7 action=ask_question('number_of_children') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=8 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 08: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=9 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 09: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=10 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 10: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=11 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 11: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=12 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 12: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=13 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 13: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=14 action=ask_question('number_of_children') reward=0.00 done=false error=HTTP Error 500: Internal Server Error
+[STEP] step=15 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 15: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=16 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 16: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=17 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 17: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=18 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 18: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=19 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 19: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=20 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 20: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,0.00,0.00,0.00,-0.10,-0.10,-0.10,-0.10,0.00,0.00,-0.10,-0.10,-0.10,-0.10,0.00,0.00,-0.10
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 3 — repeat 3/3]
+[START] task=boundary_fraud env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 3/5 — BOUNDARY_FRAUD
+============================================================
+ Profile : {'age': '23', 'marital_status': 'divorced', 'state_of_residence': 'Maharashtra'}
+ Missing : ['occupation', 'has_aadhaar', 'income']
+ Notif : [TASK 3/5 - BOUNDARY FRAUD DETECTION - Hard] Profile is INCOMPLETE. Collect ALL required eligibility fields including income, then apply sch
+[STEP] step=1 action=ask_question('occupation') reward=0.00 done=false error=null
+ Step 01: ask_question('occupation') -> reward=0.0, done=False
+ Applicant confirmed: occupation = carpenter.
+[STEP] step=2 action=ask_question('has_aadhaar') reward=0.00 done=false error=null
+ Step 02: ask_question('has_aadhaar') -> reward=0.0, done=False
+ Applicant confirmed: has_aadhaar = True.
+[STEP] step=3 action=ask_question('income') reward=0.00 done=false error=null
+ Step 03: ask_question('income') -> reward=0.0, done=False
+ Applicant confirmed: income = 11989.
+[STEP] step=4 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 04: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 05: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=6 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 06: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=7 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 07: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=8 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 08: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=9 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 09: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=10 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 10: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=11 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 11: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=12 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 12: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=13 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 13: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=14 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 14: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=15 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 15: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=16 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 16: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=17 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 17: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=18 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 18: request_document('pan_card') -> reward=0.0, done=False
+ Document 'pan_card' received and verified.
+[STEP] step=19 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 19: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=20 action=ask_question('has_aadhaar') reward=-2.00 done=true error=null
+ Step 20: ask_question('has_aadhaar') -> reward=-2.0, done=True
+ TIMEOUT: 20 steps reached without a decision.
+[END] success=false steps=20 score=0.000 rewards=0.00,0.00,0.00,-0.10,0.00,0.00,-0.10,-0.10,-0.10,-0.10,0.00,0.00,-0.10,-0.10,-0.10,-0.10,0.00,0.00,-0.10,-2.00
+
+ GRADER SCORE: 0.000 / 1.0
+
+ [Task 4 — repeat 1/3]
+[START] task=escalation_dilemma env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '41', 'income': '18035', 'occupation': 'student', 'has_aadhaar': 'True', 'number_of_children': '3', 'bank_name': 'Bank of Baroda', 'state_of_residence': 'Bihar'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 01: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 02: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=3 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 03: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 04: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=5 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 05: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=6 action=reject_applicant('AGE_EXCEEDED') reward=-1.00 done=false error=null
+ Step 06: reject_applicant('AGE_EXCEEDED') -> reward=-1.0, done=False
+ PREMATURE ADJUDICATION: Review documentary evidence before rejecting. Request the PAN card to verify employment records,
+[STEP] step=7 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 07: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of Indian Railways for the past 6 years w
+[STEP] step=8 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 08: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=false steps=8 score=0.850 rewards=-0.10,-0.10,-0.10,0.00,-0.10,-1.00,0.00,10.00
+
+ GRADER SCORE: 0.850 / 1.0
+
+ [Task 4 — repeat 2/3]
+[START] task=escalation_dilemma env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '30', 'income': '13238', 'occupation': 'student', 'has_aadhaar': 'True', 'state_of_residence': 'Uttar Pradesh', 'number_of_children': '2'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 01: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 02: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=3 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 03: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=4 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 04: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=5 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 05: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of BHEL for the past 6 years with active
+[STEP] step=6 action=escalate('DATA_MISMATCH') reward=10.00 done=true error=null
+ Step 06: escalate('DATA_MISMATCH') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=false steps=6 score=0.900 rewards=-0.10,-0.10,-0.10,0.00,0.00,10.00
+
+ GRADER SCORE: 0.900 / 1.0
+
+ [Task 4 — repeat 3/3]
+[START] task=escalation_dilemma env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 4/5 — ESCALATION_DILEMMA
+============================================================
+ Profile : {'age': '43', 'income': '17180', 'occupation': 'student', 'has_aadhaar': 'True', 'number_of_children': '1', 'marital_status': 'unmarried'}
+ Missing : []
+ Notif : [TASK 4/5 - ESCALATION DILEMMA - Expert] Profile is COMPLETE. Review all applicant data carefully before making any terminal decision. If oc
+[STEP] step=1 action=ask_question('income') reward=-0.10 done=false error=null
+ Step 01: ask_question('income') -> reward=-0.1, done=False
+ 'income' is already in the profile. Do not repeat questions.
+[STEP] step=2 action=ask_question('occupation') reward=-0.10 done=false error=null
+ Step 02: ask_question('occupation') -> reward=-0.1, done=False
+ 'occupation' is already in the profile. Do not repeat questions.
+[STEP] step=3 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 03: request_document('aadhaar_card') -> reward=0.0, done=False
+ Document 'aadhaar_card' received and verified.
+[STEP] step=4 action=ask_question('age') reward=-0.10 done=false error=null
+ Step 04: ask_question('age') -> reward=-0.1, done=False
+ 'age' is already in the profile. Do not repeat questions.
+[STEP] step=5 action=ask_question('has_aadhaar') reward=-0.10 done=false error=null
+ Step 05: ask_question('has_aadhaar') -> reward=-0.1, done=False
+ 'has_aadhaar' is already in the profile. Do not repeat questions.
+[STEP] step=6 action=request_document('pan_card') reward=0.00 done=false error=null
+ Step 06: request_document('pan_card') -> reward=0.0, done=False
+ PAN card retrieved. Records show this applicant has been a registered employee of Indian Railways for the past 6 years w
+[STEP] step=7 action=escalate('MANUAL_REVIEW_REQUIRED') reward=10.00 done=true error=null
+ Step 07: escalate('MANUAL_REVIEW_REQUIRED') -> reward=10.0, done=True
+ CORRECT ESCALATION: Contradictory data detected and properly handed off to a senior officer for manual verification. Thi
+[END] success=false steps=7 score=0.850 rewards=-0.10,-0.10,0.00,-0.10,-0.10,0.00,10.00
+
+ GRADER SCORE: 0.850 / 1.0
+
+ [Task 5 — repeat 1/3]
+[START] task=document_conflict env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '6288', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'marital_status': 'unmarried', 'state_of_residence': 'Uttar Pradesh', 'number_of_children': '1'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 37 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=2 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 02: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=37, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 2/3]
+[START] task=document_conflict env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '33', 'income': '7472', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '33', 'state_of_residence': 'Uttar Pradesh', 'bank_name': 'SBI'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 36 years. Note: this differs from the self-reported age of 33 in the prof
+[STEP] step=2 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 02: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=36, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+ [Task 5 — repeat 3/3]
+[START] task=document_conflict env=scheme_env model=meta/llama-3.1-8b-instruct
+
+============================================================
+ TASK 5/5 — DOCUMENT_CONFLICT
+============================================================
+ Profile : {'age': '35', 'income': '7914', 'occupation': 'mason', 'has_aadhaar': 'True', 'self_reported_age': '35', 'marital_status': 'unmarried'}
+ Missing : []
+ Notif : [TASK 5/5 - DOCUMENT CONFLICT - Expert+] Profile is COMPLETE. Apply all eligibility rules and verify official documents as required before m
+[STEP] step=1 action=request_document('aadhaar_card') reward=0.00 done=false error=null
+ Step 01: request_document('aadhaar_card') -> reward=0.0, done=False
+ Aadhaar card verified. Official age on record: 38 years. Note: this differs from the self-reported age of 35 in the prof
+[STEP] step=2 action=reject_applicant('AGE_EXCEEDED') reward=5.00 done=true error=null
+ Step 02: reject_applicant('AGE_EXCEEDED') -> reward=5.0, done=True
+ CORRECT REJECTION: Aadhaar confirms age=38, which exceeds the PMKVY maximum of 35. No other scheme criteria are satisfie
+[END] success=true steps=2 score=1.000 rewards=0.00,5.00
+
+ GRADER SCORE: 1.000 / 1.0
+
+============================================================
+ FINAL GRADER SCORES (mean ± std over 3 repeats)
+============================================================
+ Task 1 (Scheme Discovery ): 0.767 ± 0.116 / 1.0
+ Task 2 (Missing Data ): 0.217 ± 0.375 / 1.0
+ Task 3 (Boundary Fraud ): 0.000 ± 0.000 / 1.0
+ Task 4 (Escalation Dilemma ): 0.867 ± 0.029 / 1.0
+ Task 5 (Document Conflict ): 1.000 ± 0.000 / 1.0
+ Average : 0.570 / 1.0
+============================================================
+SCORE_JSON {"task": 1, "score": 0.7667}
+STD_JSON {"task": 1, "std": 0.1155}
+SCORE_JSON {"task": 2, "score": 0.2167}
+STD_JSON {"task": 2, "std": 0.3753}
+SCORE_JSON {"task": 3, "score": 0.0}
+STD_JSON {"task": 3, "std": 0.0}
+SCORE_JSON {"task": 4, "score": 0.8667}
+STD_JSON {"task": 4, "std": 0.0289}
+SCORE_JSON {"task": 5, "score": 1.0}
+STD_JSON {"task": 5, "std": 0.0}
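The FINAL GRADER SCORES block above reports each task as mean ± std over 3 repeats. The sketch below reproduces that aggregation from the per-repeat `GRADER SCORE` lines for Task 4 (0.850, 0.900, 0.850) and Task 5 (1.000 × 3) in this log. The `std` values appear to be the sample standard deviation (n - 1 denominator): that choice reproduces `STD_JSON {"task": 4, "std": 0.0289}` exactly. This is inferred from the numbers, not confirmed by the evaluation script, which is not part of this diff. `summarize` is a hypothetical helper.

```python
import statistics

# Per-repeat GRADER SCORE values copied from the Task 4 and Task 5 runs in the log above.
repeat_scores = {
    4: [0.850, 0.900, 0.850],
    5: [1.000, 1.000, 1.000],
}


def summarize(scores):
    """Return (mean, std) rounded to 4 places, matching the SCORE_JSON / STD_JSON lines.

    Assumption: std is the sample standard deviation (n - 1), inferred from the log values.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample std; statistics.pstdev would use n instead
    return round(mean, 4), round(std, 4)


for task, scores in repeat_scores.items():
    mean, std = summarize(scores)
    print(f"Task {task}: {mean:.3f} ± {std:.3f} / 1.0")
```

With only three repeats per task, a single failed run moves the mean by about a third and inflates the std. This is why Task 2 shows 0.217 ± 0.375 here.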
diff --git a/reports/leaderboard.csv b/reports/leaderboard.csv
new file mode 100644
index 0000000..1fffcc0
--- /dev/null
+++ b/reports/leaderboard.csv
@@ -0,0 +1,9 @@
+Model,Size,Task1,Task2,Task3,Task4,Task5,Average
+mistralai/mistral-nemotron,~56B,0.833,1.0,1.0,1.0,1.0,0.967
+nvidia/llama-3.3-nemotron-super-49b-v1,49B,0.8,0.973,1.0,1.0,1.0,0.955
+nvidia/llama-3.1-nemotron-51b-instruct,51B,0.8,0.957,1.0,1.0,1.0,0.951
+nvidia/nemotron-3-nano-30b-a3b,30B,1.0,0.0,1.0,1.0,1.0,0.8
+nvidia/nemotron-3-super-120b-a12b,120B,1.0,0.0,1.0,1.0,1.0,0.8
+nvidia/nemotron-mini-4b-instruct,4B,0.483,0.667,0.667,0.967,0.0,0.557
+meta/llama-3.1-8b-instruct,8B,0.4,0.0,0.317,0.867,1.0,0.517
+nvidia/llama-3.1-nemotron-nano-8b-v1,8B,0.283,0.303,0.0,0.333,0.0,0.184
diff --git a/reports/results.json b/reports/results.json
new file mode 100644
index 0000000..fee5321
--- /dev/null
+++ b/reports/results.json
@@ -0,0 +1,218 @@
+[
+ {
+ "model": "nvidia/llama-3.1-nemotron-nano-8b-v1",
+ "size": "8B",
+ "average": 0.184,
+ "tasks": {
+ "task1": {
+ "score": 0.283,
+ "std": 0.491
+ },
+ "task2": {
+ "score": 0.303,
+ "std": 0.525
+ },
+ "task3": {
+ "score": 0.0,
+ "std": 0.0
+ },
+ "task4": {
+ "score": 0.333,
+ "std": 0.577
+ },
+ "task5": {
+ "score": 0.0,
+ "std": 0.0
+ }
+ }
+ },
+ {
+ "model": "meta/llama-3.1-8b-instruct",
+ "size": "8B",
+ "average": 0.517,
+ "tasks": {
+ "task1": {
+ "score": 0.4,
+ "std": 0.458
+ },
+ "task2": {
+ "score": 0.0,
+ "std": 0.0
+ },
+ "task3": {
+ "score": 0.317,
+ "std": 0.548
+ },
+ "task4": {
+ "score": 0.867,
+ "std": 0.058
+ },
+ "task5": {
+ "score": 1.0,
+ "std": 0.0
+ }
+ }
+ },
+ {
+ "model": "nvidia/nemotron-mini-4b-instruct",
+ "size": "4B",
+ "average": 0.557,
+ "tasks": {
+ "task1": {
+ "score": 0.483,
+ "std": 0.029
+ },
+ "task2": {
+ "score": 0.667,
+ "std": 0.577
+ },
+ "task3": {
+ "score": 0.667,
+ "std": 0.577
+ },
+ "task4": {
+ "score": 0.967,
+ "std": 0.029
+ },
+ "task5": {
+ "score": 0.0,
+ "std": 0.0
+ }
+ }
+ },
+ {
+ "model": "nvidia/nemotron-3-nano-30b-a3b",
+ "size": "30B",
+ "average": 0.8,
+ "tasks": {
+ "task1": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task2": {
+ "score": 0.0,
+ "std": 0.0
+ },
+ "task3": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task4": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task5": {
+ "score": 1.0,
+ "std": 0.0
+ }
+ }
+ },
+ {
+ "model": "nvidia/nemotron-3-super-120b-a12b",
+ "size": "120B",
+ "average": 0.8,
+ "tasks": {
+ "task1": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task2": {
+ "score": 0.0,
+ "std": 0.0
+ },
+ "task3": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task4": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task5": {
+ "score": 1.0,
+ "std": 0.0
+ }
+ }
+ },
+ {
+ "model": "nvidia/llama-3.1-nemotron-51b-instruct",
+ "size": "51B",
+ "average": 0.951,
+ "tasks": {
+ "task1": {
+ "score": 0.8,
+ "std": 0.304
+ },
+ "task2": {
+ "score": 0.957,
+ "std": 0.045
+ },
+ "task3": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task4": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task5": {
+ "score": 1.0,
+ "std": 0.0
+ }
+ }
+ },
+ {
+ "model": "nvidia/llama-3.3-nemotron-super-49b-v1",
+ "size": "49B",
+ "average": 0.955,
+ "tasks": {
+ "task1": {
+ "score": 0.8,
+ "std": 0.304
+ },
+ "task2": {
+ "score": 0.973,
+ "std": 0.023
+ },
+ "task3": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task4": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task5": {
+ "score": 1.0,
+ "std": 0.0
+ }
+ }
+ },
+ {
+ "model": "mistralai/mistral-nemotron",
+ "size": "~56B",
+ "average": 0.967,
+ "tasks": {
+ "task1": {
+ "score": 0.833,
+ "std": 0.289
+ },
+ "task2": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task3": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task4": {
+ "score": 1.0,
+ "std": 0.0
+ },
+ "task5": {
+ "score": 1.0,
+ "std": 0.0
+ }
+ }
+ }
+]
\ No newline at end of file
diff --git a/reports/summary.txt b/reports/summary.txt
new file mode 100644
index 0000000..92de904
--- /dev/null
+++ b/reports/summary.txt
@@ -0,0 +1,12 @@
+OpenEnv scheme_env Benchmark — Baseline Report Summary
+========================================================
+Date generated : 2026-04-08
+Models evaluated : 8
+
+Best model : mistral-nemotron (avg=0.967)
+Worst model : nemotron-nano-8b (avg=0.184)
+
+Hardest task : Task 2 (mean=0.487)
+Easiest task : Task 4 (mean=0.896)
+
+Perfect score (1.0 on all tasks): none
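The "Hardest task" and "Easiest task" lines in `summary.txt` are consistent with an unweighted mean of each task column across the eight models in `reports/leaderboard.csv`. The script that generated the summary is not included in this diff, so the sketch below is a reconstruction under that assumption. The CSV rows are copied verbatim and embedded so the example is self-contained. `task_means` is a hypothetical helper.

```python
import csv
import io
import statistics

# Rows copied from reports/leaderboard.csv in this diff.
LEADERBOARD_CSV = """Model,Size,Task1,Task2,Task3,Task4,Task5,Average
mistralai/mistral-nemotron,~56B,0.833,1.0,1.0,1.0,1.0,0.967
nvidia/llama-3.3-nemotron-super-49b-v1,49B,0.8,0.973,1.0,1.0,1.0,0.955
nvidia/llama-3.1-nemotron-51b-instruct,51B,0.8,0.957,1.0,1.0,1.0,0.951
nvidia/nemotron-3-nano-30b-a3b,30B,1.0,0.0,1.0,1.0,1.0,0.8
nvidia/nemotron-3-super-120b-a12b,120B,1.0,0.0,1.0,1.0,1.0,0.8
nvidia/nemotron-mini-4b-instruct,4B,0.483,0.667,0.667,0.967,0.0,0.557
meta/llama-3.1-8b-instruct,8B,0.4,0.0,0.317,0.867,1.0,0.517
nvidia/llama-3.1-nemotron-nano-8b-v1,8B,0.283,0.303,0.0,0.333,0.0,0.184
"""


def task_means(csv_text):
    """Unweighted mean of each TaskN column across all models (assumed summary method)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    tasks = [f"Task{i}" for i in range(1, 6)]
    return {t: statistics.mean(float(row[t]) for row in rows) for t in tasks}


means = task_means(LEADERBOARD_CSV)
hardest = min(means, key=means.get)  # lowest mean score across models
easiest = max(means, key=means.get)  # highest mean score across models
print(f"Hardest task : {hardest} (mean={means[hardest]:.3f})")
print(f"Easiest task : {easiest} (mean={means[easiest]:.3f})")
```

The exact Task 2 mean is 0.4875, so the third decimal printed depends on how it is rounded. The summary's `0.487` truncates rather than rounds half up.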
diff --git a/reports/task_heatmap.png b/reports/task_heatmap.png
new file mode 100644
index 0000000..fc05e91
Binary files /dev/null and b/reports/task_heatmap.png differ
diff --git a/reports/test_logs/pytest_results.txt b/reports/test_logs/pytest_results.txt
new file mode 100644
index 0000000..1b2adaa
--- /dev/null
+++ b/reports/test_logs/pytest_results.txt
@@ -0,0 +1,30 @@
+============================= test session starts ==============================
+platform darwin -- Python 3.14.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/advikdivekar/Desktop/OpenEnv/venv/bin/python3.14
+cachedir: .pytest_cache
+rootdir: /Users/advikdivekar/Desktop/OpenEnv
+configfile: pyproject.toml
+plugins: anyio-4.13.0
+collecting ... collected 20 items
+
+tests/test_scheme_eligibility.py::test_pmkvy_qualifies_age_lower_bound PASSED [ 5%]
+tests/test_scheme_eligibility.py::test_pmkvy_qualifies_age_upper_bound PASSED [ 10%]
+tests/test_scheme_eligibility.py::test_pmkvy_disqualifies_age_exceeded PASSED [ 15%]
+tests/test_scheme_eligibility.py::test_pmkvy_disqualifies_income_exceeded PASSED [ 20%]
+tests/test_scheme_eligibility.py::test_pmkvy_disqualifies_wrong_occupation PASSED [ 25%]
+tests/test_scheme_eligibility.py::test_mgnregs_qualifies_age_lower_bound PASSED [ 30%]
+tests/test_scheme_eligibility.py::test_mgnregs_qualifies_age_upper_bound PASSED [ 35%]
+tests/test_scheme_eligibility.py::test_mgnregs_disqualifies_age_exceeded PASSED [ 40%]
+tests/test_scheme_eligibility.py::test_mgnregs_disqualifies_no_aadhaar PASSED [ 45%]
+tests/test_scheme_eligibility.py::test_pmay_qualifies_age_lower_bound PASSED [ 50%]
+tests/test_scheme_eligibility.py::test_pmay_disqualifies_income_at_threshold PASSED [ 55%]
+tests/test_scheme_eligibility.py::test_pmay_qualifies_age_upper_bound PASSED [ 60%]
+tests/test_scheme_eligibility.py::test_pmay_disqualifies_age_exceeded PASSED [ 65%]
+tests/test_scheme_eligibility.py::test_optimal_prefers_pmay_over_pmkvy PASSED [ 70%]
+tests/test_scheme_eligibility.py::test_optimal_mgnregs_only PASSED [ 75%]
+tests/test_scheme_eligibility.py::test_optimal_none_when_no_scheme PASSED [ 80%]
+tests/test_scheme_eligibility.py::test_grader_score_perfect PASSED [ 85%]
+tests/test_scheme_eligibility.py::test_grader_score_noise_penalty PASSED [ 90%]
+tests/test_scheme_eligibility.py::test_grader_score_zero_base PASSED [ 95%]
+tests/test_scheme_eligibility.py::test_grader_score_floor_at_030 PASSED [100%]
+
+============================== 20 passed in 2.24s ==============================
diff --git a/reports/test_logs/smoke_test_results.txt b/reports/test_logs/smoke_test_results.txt
new file mode 100644
index 0000000..15da4e3
--- /dev/null
+++ b/reports/test_logs/smoke_test_results.txt
@@ -0,0 +1,61 @@
+
+============================================================
+SMOKE TEST — scheme_env
+============================================================
+
+Task 1 — Scheme Discovery
+ Profile at reset: {'age': '28', 'income': '4665', 'marital_status': 'widowed', 'number_of_children': '1'}
+ Missing: ['occupation', 'has_aadhaar']
+ [PASS] Task 1 terminates on approve — done=True
+ [PASS] Task 1 reward positive or suboptimal — reward=3.0
+ [PASS] Task 1 grader_score set — score=0.5
+ Notification: SUBOPTIMAL: PMKVY is eligible but PMAY provides greater benefit.
+
+Task 2 — Missing Data (premature approval)
+ Missing at reset: ['occupation', 'has_aadhaar']
+ [PASS] Task 2 premature approve terminates — done=True
+ [PASS] Task 2 premature approve scores 0.0 — score=0.0
+ [PASS] Task 2 premature approve reward negative — reward=-5.0
+
+Task 2 — Missing Data (correct flow)
+ [PASS] Task 2 correct flow terminates — done=True
+ [PASS] Task 2 correct flow reward positive — reward=10.0
+ [PASS] Task 2 correct flow score > 0 — score=1.0
+
+Task 3 — Boundary Fraud (reject before asking income)
+ Profile at reset: {'age': '24', 'number_of_children': '3', 'marital_status': 'widowed'}
+ Missing: ['occupation', 'has_aadhaar', 'income']
+ [PASS] Task 3 income hidden at reset — income in profile=['age', 'number_of_children', 'marital_status']
+ [PASS] Task 3 premature reject scores 0.0 — score=0.0
+ [PASS] Task 3 premature reject reward negative — reward=-2.0
+
+Task 3 — Boundary Fraud (correct flow)
+ [PASS] Task 3 correct reject terminates — done=True
+ [PASS] Task 3 correct reject reward positive — reward=5.0
+ [PASS] Task 3 correct reject score > 0.30 — score=1.0
+
+Task 4 — Escalation (escalate before PAN — should soft-block)
+ [PASS] Task 4 escalate before PAN is non-terminal — done=False
+ [PASS] Task 4 escalate before PAN reward negative — reward=-1.0
+
+Task 4 — Escalation (correct flow)
+ [PASS] Task 4 PAN reveals contradiction — notif=PAN card retrieved. Records show this applicant has been a registered employee o
+ [PASS] Task 4 correct escalate terminates — done=True
+ [PASS] Task 4 correct escalate reward=10 — reward=10.0
+ [PASS] Task 4 correct escalate score > 0.80 — score=1.0
+
+Task 5 — Document Conflict (correct flow)
+ Self-reported age in profile: 33
+ [PASS] Task 5 Aadhaar reveals age conflict — notif=Aadhaar card verified. Official age on record: 36 years. Note: this differs from
+ [PASS] Task 5 correct reject terminates — done=True
+ [PASS] Task 5 correct reject reward=5 — reward=5.0
+ [PASS] Task 5 correct reject score > 0.80 — score=1.0
+
+Wrong escalation on Task 1 (should now be terminal)
+ [PASS] Wrong escalation Task 1 is terminal — done=True
+ [PASS] Wrong escalation reward=-2.0 — reward=-2.0
+
+============================================================
+ALL TESTS PASSED — environment logic is correct
+============================================================
+
diff --git a/reports/test_logs/test_results.txt b/reports/test_logs/test_results.txt
new file mode 100644
index 0000000..b662b86
--- /dev/null
+++ b/reports/test_logs/test_results.txt
@@ -0,0 +1,30 @@
+============================= test session starts ==============================
+platform darwin -- Python 3.14.3, pytest-9.0.2, pluggy-1.6.0 -- /Users/advikdivekar/Desktop/OpenEnv/venv/bin/python3.14
+cachedir: .pytest_cache
+rootdir: /Users/advikdivekar/Desktop/OpenEnv
+configfile: pyproject.toml
+plugins: anyio-4.13.0
+collecting ... collected 20 items
+
+tests/test_scheme_eligibility.py::test_pmkvy_qualifies_age_lower_bound PASSED [ 5%]
+tests/test_scheme_eligibility.py::test_pmkvy_qualifies_age_upper_bound PASSED [ 10%]
+tests/test_scheme_eligibility.py::test_pmkvy_disqualifies_age_exceeded PASSED [ 15%]
+tests/test_scheme_eligibility.py::test_pmkvy_disqualifies_income_exceeded PASSED [ 20%]
+tests/test_scheme_eligibility.py::test_pmkvy_disqualifies_wrong_occupation PASSED [ 25%]
+tests/test_scheme_eligibility.py::test_mgnregs_qualifies_age_lower_bound PASSED [ 30%]
+tests/test_scheme_eligibility.py::test_mgnregs_qualifies_age_upper_bound PASSED [ 35%]
+tests/test_scheme_eligibility.py::test_mgnregs_disqualifies_age_exceeded PASSED [ 40%]
+tests/test_scheme_eligibility.py::test_mgnregs_disqualifies_no_aadhaar PASSED [ 45%]
+tests/test_scheme_eligibility.py::test_pmay_qualifies_age_lower_bound PASSED [ 50%]
+tests/test_scheme_eligibility.py::test_pmay_disqualifies_income_at_threshold PASSED [ 55%]
+tests/test_scheme_eligibility.py::test_pmay_qualifies_age_upper_bound PASSED [ 60%]
+tests/test_scheme_eligibility.py::test_pmay_disqualifies_age_exceeded PASSED [ 65%]
+tests/test_scheme_eligibility.py::test_optimal_prefers_pmay_over_pmkvy PASSED [ 70%]
+tests/test_scheme_eligibility.py::test_optimal_mgnregs_only PASSED [ 75%]
+tests/test_scheme_eligibility.py::test_optimal_none_when_no_scheme PASSED [ 80%]
+tests/test_scheme_eligibility.py::test_grader_score_perfect PASSED [ 85%]
+tests/test_scheme_eligibility.py::test_grader_score_noise_penalty PASSED [ 90%]
+tests/test_scheme_eligibility.py::test_grader_score_zero_base PASSED [ 95%]
+tests/test_scheme_eligibility.py::test_grader_score_floor_at_030 PASSED [100%]
+
+============================== 20 passed in 2.38s ==============================