# Single-Turn Evaluations (Netra SDK)

Use this reference when the user asks to set up or automate single-turn evaluations (input -> output) with Netra.

## Outcome

Set up a repeatable evaluation loop where:
- Test cases live in a Netra dataset.
- A task function runs the user's app logic for each dataset item.
- Netra executes a test run and scores outputs with evaluators.

## Prerequisites

1. Netra SDK installed (the `netra-sdk` package for both Python and TypeScript).
2. Netra initialized with your API key, passed via the `x-api-key` header (see the sketch after this list).
3. At least one dataset configured in the dashboard, or created programmatically.
4. Evaluators attached in the dashboard (recommended), or passed in code.
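
For step 2, a minimal sketch that avoids hardcoding the key by reading it from an environment variable (the `NETRA_API_KEY` variable name is an assumption; match whatever the deployment actually uses):

```python
import os

from netra import Netra

# Assumption: the key is exported as NETRA_API_KEY; adjust to your environment.
Netra.init(
    app_name="my-app",
    headers=f"x-api-key={os.environ['NETRA_API_KEY']}",
)
```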

## Recommended workflow for the agent

1. Confirm language/runtime (Python or TypeScript).
2. Ensure `Netra.init(...)` / `await Netra.init(...)` is called once at startup.
3. Fetch the dataset via the SDK.
4. Define a `task(input)` function that returns the system's output string/value.
5. Run the test suite with a clear run name and safe concurrency.
6. Return the run id and a quick status summary to the user.
7. Direct the user to Evaluation -> Test Runs for detailed scores.

## Python template

```python
from netra import Netra

Netra.init(
    app_name="my-app",
    headers="x-api-key=YOUR_NETRA_API_KEY",
)

def my_task(input_data):
    # Call the user's app/agent here and return the generated output
    return f"response for: {input_data}"

dataset = Netra.evaluation.get_dataset(dataset_id="your-dataset-id")

result = Netra.evaluation.run_test_suite(
    name="My Single-Turn Eval",
    data=dataset,
    task=my_task,
    evaluators=["correctness", "relevance"],  # optional if attached in the dashboard
    max_concurrency=5,
)

print(result["runId"])
```
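
To make `my_task` concrete, here is an illustrative sketch that assumes the app under test is a plain OpenAI chat call; the `openai` client, model name, and prompt format are assumptions, not part of the Netra SDK:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def my_task(input_data):
    # Illustrative app call only; replace with your real app/agent logic.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": str(input_data)}],
    )
    return response.choices[0].message.content
```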

## TypeScript template

```typescript
import { Netra } from "netra-sdk";

await Netra.init({
  appName: "my-app",
  headers: `x-api-key=${process.env.NETRA_API_KEY}`,
});

async function myTask(inputData: unknown): Promise<string> {
  // Call the user's app/agent here and return the generated output
  return `response for: ${String(inputData)}`;
}

const dataset = await Netra.evaluation.getDataset("your-dataset-id");

const result = await Netra.evaluation.runTestSuite(
  "My Single-Turn Eval",
  dataset,
  myTask,
  ["correctness", "relevance"], // optional if attached in the dashboard
  5 // max concurrency
);

console.log(result?.runId);
```

## Programmatic dataset management (optional)

Use these SDK APIs when the user wants the whole setup in code:
- Python: `create_dataset`, `add_dataset_item`, `get_dataset`, `run_test_suite`
- TypeScript: `createDataset`, `addDatasetItem`, `getDataset`, `runTestSuite`

Minimal pattern:
1. Create a dataset.
2. Add dataset items with `input` and `expected_output`/`expectedOutput`.
3. Fetch the dataset and execute the test suite.

### Python example (fully programmatic)

```python
from netra import Netra

Netra.init(
    app_name="my-app",
    headers="x-api-key=YOUR_NETRA_API_KEY",
)

created = Netra.evaluation.create_dataset(name="Support QA Dataset")
dataset_id = created["datasetId"]

Netra.evaluation.add_dataset_item(
    dataset_id=dataset_id,
    item={
        "input": "What is your refund window?",
        "expected_output": "You can request a refund within 30 days of purchase.",
    },
)

Netra.evaluation.add_dataset_item(
    dataset_id=dataset_id,
    item={
        "input": "Do you support overnight shipping?",
        "expected_output": "Yes, overnight shipping is available in select regions.",
    },
)

def task(input_data):
    # Replace with your real app/agent call.
    return f"response for: {input_data}"

dataset = Netra.evaluation.get_dataset(dataset_id=dataset_id)

result = Netra.evaluation.run_test_suite(
    name="Support QA Programmatic Eval",
    data=dataset,
    task=task,
    max_concurrency=3,
)

print(result["runId"])
```
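
With more than a handful of cases, a small loop over (input, expected output) pairs keeps the item creation terse; this reuses only the `add_dataset_item` call shown above:

```python
cases = [
    ("What is your refund window?", "You can request a refund within 30 days of purchase."),
    ("Do you support overnight shipping?", "Yes, overnight shipping is available in select regions."),
]

for question, expected in cases:
    Netra.evaluation.add_dataset_item(
        dataset_id=dataset_id,
        item={"input": question, "expected_output": expected},
    )
```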

### TypeScript example (fully programmatic)

```typescript
import { Netra } from "netra-sdk";

await Netra.init({
  appName: "my-app",
  headers: `x-api-key=${process.env.NETRA_API_KEY}`,
});

const created = await Netra.evaluation.createDataset("Support QA Dataset");
const datasetId = created?.datasetId;

if (!datasetId) {
  throw new Error("Dataset creation failed: missing datasetId");
}

await Netra.evaluation.addDatasetItem(datasetId, {
  input: "What is your refund window?",
  expectedOutput: "You can request a refund within 30 days of purchase.",
});

await Netra.evaluation.addDatasetItem(datasetId, {
  input: "Do you support overnight shipping?",
  expectedOutput: "Yes, overnight shipping is available in select regions.",
});

const dataset = await Netra.evaluation.getDataset(datasetId);

const result = await Netra.evaluation.runTestSuite(
  "Support QA Programmatic Eval",
  dataset,
  async (inputData: unknown) => {
    // Replace with your real app/agent call.
    return `response for: ${String(inputData)}`;
  },
  undefined, // no inline evaluators; use the ones attached in the dashboard
  3 // max concurrency
);

console.log(result?.runId);
```

## What to check after setup

1. `runId` is returned.
2. Items are mostly `completed` (not `failed`); see the sketch after this list.
3. Traces are linked for each test item.
4. Evaluator scores appear in Test Runs.
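
If the run result exposes per-item statuses, checks 1 and 2 can be automated; the `items` and `status` field names below are hypothetical, so confirm the actual response shape in the SDK reference:

```python
from collections import Counter

# Hypothetical result shape: adjust "items"/"status" to the real SDK response.
statuses = Counter(item.get("status", "unknown") for item in result.get("items", []))
print(result.get("runId"), dict(statuses))
```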

## Troubleshooting guidance

- No test runs: confirm the dataset has evaluators attached and that the correct dataset id is used.
- Empty/invalid outputs: ensure the `task` function returns an output for every item (see the wrapper sketch below).
- Too many failures/timeouts: lower concurrency first (`max_concurrency` / `maxConcurrency`).
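
For the empty-output case, a thin wrapper that never raises and never returns `None` keeps every item scoreable; this is a generic Python pattern, not a Netra API:

```python
def safe_task(input_data):
    try:
        output = my_task(input_data)
    except Exception as exc:
        # Surface the failure as an output so the item still completes.
        return f"TASK_ERROR: {exc}"
    return output if output is not None else ""
```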

## References

- https://docs.getnetra.ai/quick-start/QuickStart_Evals
- https://docs.getnetra.ai/Evaluation/Datasets
- https://docs.getnetra.ai/sdk-reference/evaluation/python
- https://docs.getnetra.ai/sdk-reference/evaluation/typescript