Commit 4b7d2f8

fix: linting
1 parent 092ea7c commit 4b7d2f8

24 files changed

Lines changed: 1285 additions & 700 deletions

CLAUDE.md

Lines changed: 45 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -42,6 +42,51 @@ agentspec/
4242

4343
## Design Principles
4444

45+
### 0. Thin orchestrator + named helpers (preferred style for all new code)
46+
47+
**This is the default way to write functions in this codebase.** Prefer many small, named functions over one large function — even before the code gets long.
48+
49+
When writing new code, decompose by intent first:
50+
- If a block of logic has a name (even just in a comment), make it a function.
51+
- Orchestrators read like a pipeline of named steps; they contain no implementation details.
52+
- Helpers are pure or near-pure: explicit inputs, explicit output, no side effects on shared state.
53+
54+
**Rule**: if you can label a code block with a comment like `// Phase 3: score results`, that label is the function name — extract it.
55+
56+
**Template** (applied throughout this codebase):
57+
```typescript
58+
// ── Internal interfaces (module-private) ─────────────────────────────────────
59+
interface PhaseAResult { ... }
60+
interface PhaseBResult { ... }
61+
62+
// ── Private helpers ────────────────────────────────────────────────────────────
63+
function phaseA(input: Input): PhaseAResult { ... }
64+
function phaseB(intermediate: PhaseAResult): PhaseBResult { ... }
65+
function phaseC(a: PhaseAResult, b: PhaseBResult): FinalResult { ... }
66+
67+
// ── Public orchestrator ────────────────────────────────────────────────────────
68+
export function doThing(input: Input): FinalResult {
69+
const a = phaseA(input)
70+
const b = phaseB(a)
71+
return phaseC(a, b)
72+
}
73+
```
74+
75+
**Applied examples in this repo:**
76+
| File | Orchestrator | Extracted helpers |
77+
|------|-------------|-------------------|
78+
| `sdk/src/audit/index.ts` | `runAudit()` | `resolveActiveRules` · `collectSuppressions` · `executeRuleChecks` · `computeScoring` · `computeProvedScore` |
79+
| `sdk/src/health/index.ts` | `runHealthCheck()` | `runSubagentChecks` · `runEvalChecks` · `computeHealthStatus` |
80+
| `cli/src/commands/audit.ts` | action closure | `fetchProofRecords` · `printScoreSummary` · `formatEvidenceBreakdown` |
81+
| `cli/src/commands/generate.ts` | action closure | `validateFramework` · `handleK8sGeneration` · `handleLLMGeneration` · `writePushModeEnv` |
82+
| `cli/src/commands/evaluate.ts` | action closure | `resolveChatEndpoint` · `runInference` · `determineCiGateExit` |
83+
| `cli/src/commands/scan.ts` | action closure | `collectAndValidateSourceFiles` · `validateScanResponse` |
84+
| `sdk/src/agent/reporter.ts` | `startPushMode()` | `_pushHeartbeat` (private method) |
85+
86+
**Helpers are always module-private** (not exported) unless reuse across files is proven necessary. Internal `interface` types for inter-helper data shapes are also module-private.
87+
88+
---
89+
4590
### 1. Zod as single source of truth
4691
The `packages/sdk/src/schema/manifest.schema.ts` is the canonical definition.
4792
- Types are inferred from Zod with `z.infer<>`
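
The template added under design principle 0 above is abstract, so a small concrete instance may help. The following is a minimal sketch only: `RuleResult`, `ScoreSummary`, `selectActive`, `sumWeights`, `summarise`, and `scoreRules` are hypothetical names invented for illustration and are not taken from this repository.

```typescript
// Hypothetical example of "thin orchestrator + named helpers".
// None of these names exist in the repo; they only illustrate the shape.

// ── Internal interfaces (module-private) ──
interface RuleResult {
  id: string
  passed: boolean
  weight: number
}

interface ScoreSummary {
  total: number
  earned: number
  percent: number
}

// ── Private helpers: explicit inputs, explicit output, no shared state ──

// Drop every rule whose id appears in the suppression set.
function selectActive(results: RuleResult[], suppressed: Set<string>): RuleResult[] {
  return results.filter((r) => !suppressed.has(r.id))
}

// Add up the weights of the given rules.
function sumWeights(results: RuleResult[]): number {
  return results.reduce((acc, r) => acc + r.weight, 0)
}

// Turn the active rules into a score; an empty rule set counts as 100%.
function summarise(active: RuleResult[]): ScoreSummary {
  const total = sumWeights(active)
  const earned = sumWeights(active.filter((r) => r.passed))
  const percent = total === 0 ? 100 : Math.round((earned / total) * 100)
  return { total, earned, percent }
}

// ── Orchestrator: a pipeline of named steps, no implementation details ──
// (In a real module this would be the only exported function.)
function scoreRules(results: RuleResult[], suppressed: Set<string>): ScoreSummary {
  const active = selectActive(results, suppressed)
  return summarise(active)
}
```

Each helper can be unit-tested on its own, and the orchestrator reads as the list of phases a comment would otherwise have described.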

docs/.vitepress/config.mts

Lines changed: 4 additions & 3 deletions
@@ -76,9 +76,10 @@ export default defineConfig({
7676
text: 'Verification & CI',
7777
collapsed: false,
7878
items: [
79-
{ text: 'Integrate Proof Tools', link: '/guides/proof-integration' },
80-
{ text: 'CI Integration', link: '/guides/ci-integration' },
81-
{ text: 'E2E Testing', link: '/guides/e2e-testing' },
79+
{ text: 'Structure Evaluation Datasets', link: '/guides/evaluation-datasets' },
80+
{ text: 'Integrate Proof Tools', link: '/guides/proof-integration' },
81+
{ text: 'CI Integration', link: '/guides/ci-integration' },
82+
{ text: 'E2E Testing', link: '/guides/e2e-testing' },
8283
],
8384
},
8485
{

docs/guides/evaluation-datasets.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
1+
# Structure Evaluation Datasets
2+
3+
Organise evaluation data so each dataset file tests one concern and each metric is declared exactly once — in `agent.yaml`, not in the JSONL.
4+
5+
## The rule: one dataset, one concern
6+
7+
Metrics are declared at the dataset level in the manifest, not per sample in the JSONL file.
8+
A dataset file contains only data — inputs, expected outputs, and optional context.
9+
10+
This means: **if you need different metrics, use different dataset files.**
11+
12+
```yaml
13+
spec:
14+
evaluation:
15+
framework: ragas
16+
datasets:
17+
- name: rag-quality
18+
path: $file:evals/rag.jsonl
19+
metrics: [faithfulness, context_recall, answer_relevancy]
20+
21+
- name: safety
22+
path: $file:evals/safety.jsonl
23+
metrics: [toxicity, bias]
24+
25+
- name: accuracy
26+
path: $file:evals/accuracy.jsonl
27+
metrics: [answer_similarity, hallucination]
28+
29+
thresholds:
30+
faithfulness: 0.80
31+
context_recall: 0.75
32+
answer_relevancy: 0.75
33+
toxicity: 0.90
34+
bias: 0.85
35+
answer_similarity: 0.80
36+
hallucination: 0.05
37+
ciGate: true
38+
```
39+
40+
## JSONL sample format
41+
42+
Each line in a dataset file is a JSON object. Only `input` and `expected` are always required; `context` and `reference_contexts` become required when the dataset declares a metric that needs them.
43+
44+
```jsonl
45+
{"input": "What is RAG?", "expected": "Retrieval Augmented Generation", "context": ["RAG combines a retrieval step..."], "tags": ["basics"]}
46+
{"input": "How does vector search work?", "expected": "By comparing embedding distances", "context": ["Vectors are high-dimensional..."], "reference_contexts": ["Embeddings encode semantic meaning..."], "tags": ["rag", "advanced"], "metadata": {"difficulty": "medium"}}
47+
```
48+
49+
| Field | Required | Description |
50+
|---|---|---|
51+
| `input` | yes | User query sent to the agent |
52+
| `expected` | yes | Expected output — used for `answer_similarity` and `string_match` scoring |
53+
| `context` | for RAG metrics | Retrieved chunks the agent used. Required for `faithfulness`, `context_precision`, `context_recall`, `hallucination` |
54+
| `reference_contexts` | for `context_recall` | Ground-truth relevant chunks. Required for `context_recall` |
55+
| `tags` | no | Labels for filtering with `--tag` |
56+
| `metadata` | no | Arbitrary key/value pairs reported in output (e.g. `{"difficulty": "hard", "source": "prod-logs"}`) |
57+
58+
## Which metrics need which fields
59+
60+
| Metric | `context` | `reference_contexts` |
61+
|---|---|---|
62+
| `answer_similarity` | no | no |
63+
| `answer_relevancy` | no | no |
64+
| `hallucination` | yes | no |
65+
| `faithfulness` | yes | no |
66+
| `context_precision` | yes | no |
67+
| `context_recall` | yes | yes |
68+
| `toxicity` | no | no |
69+
| `bias` | no | no |
70+
71+
If a dataset declares a RAG metric but its samples have no `context` field, the evaluation framework will error or return meaningless scores. Splitting by concern prevents this.
72+
73+
## Running a dataset
74+
75+
```bash
76+
# Run all samples
77+
agentspec evaluate agent.yaml --url http://localhost:4000 --dataset rag-quality
78+
79+
# Run 20 random samples
80+
agentspec evaluate agent.yaml --url http://localhost:4000 --dataset rag-quality --sample-size 20
81+
82+
# Run only samples tagged "advanced"
83+
agentspec evaluate agent.yaml --url http://localhost:4000 --dataset rag-quality --tag advanced
84+
85+
# Machine-readable output
86+
agentspec evaluate agent.yaml --url http://localhost:4000 --dataset safety --json
87+
```
88+
89+
The command exits with code `1` when `ciGate: true` and any metric falls below its threshold.
90+
91+
## Recommended file layout
92+
93+
```
94+
evals/
95+
rag.jsonl # faithfulness, context_recall, answer_relevancy
96+
safety.jsonl # toxicity, bias
97+
accuracy.jsonl # answer_similarity, hallucination
98+
regression.jsonl # string_match on known Q&A pairs (no context needed)
99+
```
100+
101+
One JSONL per concern keeps datasets independently runnable, independently versionable, and easy to extend without touching other test suites.
102+
103+
## See also
104+
105+
- [`agentspec evaluate` CLI reference](../reference/cli.md#agentspec-evaluate)
106+
- [Probe coverage & evidence tiers](../concepts/probe-coverage.md)
107+
- [CI integration](./ci-integration.md)
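
The "Which metrics need which fields" table in the guide above can be turned into a pre-flight check that runs before a dataset is handed to the evaluation framework. The following is a minimal sketch under stated assumptions: `REQUIRED_FIELDS` mirrors that table, while `findMissingFields` and the `Sample` interface are hypothetical helpers written for illustration and are not part of the `agentspec` CLI or SDK.

```typescript
// Hypothetical pre-flight check; not part of agentspec.
// Field names follow the JSONL sample format described in the guide.
interface Sample {
  input: string
  expected: string
  context?: string[]
  reference_contexts?: string[]
  tags?: string[]
  metadata?: Record<string, unknown>
}

type ContextField = 'context' | 'reference_contexts'

// Mirrors the "Which metrics need which fields" table.
const REQUIRED_FIELDS: Record<string, ContextField[]> = {
  answer_similarity: [],
  answer_relevancy: [],
  hallucination: ['context'],
  faithfulness: ['context'],
  context_precision: ['context'],
  context_recall: ['context', 'reference_contexts'],
  toxicity: [],
  bias: [],
}

// Returns one message per (sample, metric, field) combination that is missing.
// Metrics not listed in REQUIRED_FIELDS are treated as needing no extra fields.
function findMissingFields(jsonl: string, metrics: string[]): string[] {
  const problems: string[] = []
  const lines = jsonl.split('\n').filter((line) => line.trim() !== '')

  lines.forEach((line, index) => {
    const sample = JSON.parse(line) as Sample
    for (const metric of metrics) {
      for (const field of REQUIRED_FIELDS[metric] ?? []) {
        const value = sample[field]
        if (!Array.isArray(value) || value.length === 0) {
          problems.push(`sample ${index + 1}: ${metric} needs ${field}`)
        }
      }
    }
  })

  return problems
}
```

Calling `findMissingFields(fileContents, ['faithfulness', 'context_recall'])` on a dataset returns an empty array when every sample carries the fields those metrics need, which is the situation the one-dataset-one-concern rule is meant to guarantee.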

packages/adapter-claude/src/__tests__/claude-adapter.test.ts

Lines changed: 5 additions & 4 deletions
@@ -22,6 +22,7 @@ const baseManifest: AgentSpecManifest = {
2222
},
2323
prompts: {
2424
system: '$file:prompts/system.md',
25+
hotReload: false,
2526
},
2627
},
2728
}
@@ -69,7 +70,7 @@ function makeClaudeResponse(jsonContent: object | string): object {
6970
// ── context-builder tests ─────────────────────────────────────────────────────
7071

7172
describe('buildContext()', () => {
72-
let buildContext: (opts: { manifest: AgentSpecManifest; contextFiles?: string[] }) => string
73+
let buildContext: (opts: { manifest: AgentSpecManifest; contextFiles?: string[]; manifestDir?: string }) => string
7374

7475
beforeEach(async () => {
7576
const mod = await import('../context-builder.js')
@@ -119,7 +120,7 @@ describe('buildContext()', () => {
119120
name: 'log-workout',
120121
description: 'Log a workout',
121122
module: '$file:tool_implementations.py',
122-
} as unknown as AgentSpecManifest['spec']['tools'][0],
123+
} as unknown as NonNullable<AgentSpecManifest['spec']['tools']>[number],
123124
],
124125
},
125126
}
@@ -143,7 +144,7 @@ describe('buildContext()', () => {
143144
name: 'log-workout',
144145
description: 'Log a workout',
145146
module: '$file:tool_implementations.py',
146-
} as unknown as AgentSpecManifest['spec']['tools'][0],
147+
} as unknown as NonNullable<AgentSpecManifest['spec']['tools']>[number],
147148
],
148149
},
149150
}
@@ -233,7 +234,7 @@ describe('loadSkill() guidelines prepend', () => {
233234
describe('generateWithClaude()', () => {
234235
let generateWithClaude: (
235236
manifest: AgentSpecManifest,
236-
opts: { framework: string; model?: string; contextFiles?: string[] },
237+
opts: import('../index.js').ClaudeAdapterOptions,
237238
) => Promise<import('@agentspec/sdk').GeneratedAgent>
238239

239240
const savedKey = process.env['ANTHROPIC_API_KEY']
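
The hunks above replace `AgentSpecManifest['spec']['tools'][0]` with `NonNullable<AgentSpecManifest['spec']['tools']>[number]`. A likely reason, assuming `tools` is optional in the manifest type, is that an indexed access on `T[] | undefined` does not type-check under `strictNullChecks`. The sketch below reproduces this with a cut-down, hypothetical `Manifest` type; it is not the real `AgentSpecManifest`.

```typescript
// Hypothetical stand-in for AgentSpecManifest, reduced to the relevant shape.
interface Tool {
  name: string
  description: string
}

interface Manifest {
  spec: {
    tools?: Tool[] // optional, so the property type is Tool[] | undefined
  }
}

// With strictNullChecks enabled, the next line would not compile, because
// property '0' does not exist on type 'Tool[] | undefined':
// type Broken = Manifest['spec']['tools'][0]

// NonNullable strips `undefined` first; `[number]` then yields the element type.
type ToolEntry = NonNullable<Manifest['spec']['tools']>[number]

const tool: ToolEntry = { name: 'log-workout', description: 'Log a workout' }
const manifest: Manifest = { spec: { tools: [tool] } }
```

Using `[number]` rather than `[0]` also states the intent more directly: the type of any element, not the element at one position.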

packages/cli/src/__tests__/commands.test.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -130,7 +130,7 @@ beforeEach(() => {
130130
if ((code ?? 0) !== 0) {
131131
throw new ExitError(code ?? 0)
132132
}
133-
}) as unknown as (code?: number) => never)
133+
}) as unknown as typeof process.exit)
134134
logSpy = vi.spyOn(console, 'log').mockImplementation(() => {})
135135
errorSpy = vi.spyOn(console, 'error').mockImplementation(() => {})
136136
})

packages/cli/src/__tests__/deploy-k8s.test.ts

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ const minimalManifest: AgentSpecManifest = {
2828
},
2929
prompts: {
3030
system: 'You are a helpful assistant.',
31+
hotReload: false,
3132
},
3233
},
3334
}
@@ -53,10 +54,12 @@ const fullManifest: AgentSpecManifest = {
5354
},
5455
prompts: {
5556
system: '$file:prompts/system.md',
57+
hotReload: false,
5658
},
5759
api: {
5860
type: 'rest',
5961
port: 3000,
62+
streaming: false,
6063
},
6164
requires: {
6265
envVars: ['OPENAI_API_KEY', 'DATABASE_URL', 'REDIS_URL'],

packages/cli/src/__tests__/diff.test.ts

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -292,9 +292,9 @@ describe('diff — CLI integration', () => {
292292
' guardrails: {}',
293293
].join('\n'))
294294

295-
const exitSpy = vi.spyOn(process, 'exit').mockImplementation((_code?: number): never => {
295+
const exitSpy = vi.spyOn(process, 'exit').mockImplementation(((_code?: number): never => {
296296
throw new Error(`process.exit(${_code})`)
297-
})
297+
}) as unknown as typeof process.exit)
298298

299299
try {
300300
await expect(runDiff(from, to, ['--exit-code'])).rejects.toThrow('process.exit(1)')
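
Several test files in this commit cast their `process.exit` stub with `as unknown as typeof process.exit` instead of a hand-written `(code?: number) => never`. Casting to `typeof process.exit` keeps the stub aligned with whatever signature the installed Node typings declare. The sketch below shows the same idea without a test framework; `withExitStub` and `fakeCommand` are hypothetical helpers for illustration, and `ExitError` here is a local stand-in for the class the tests use.

```typescript
// Hypothetical, framework-free version of the stubbing pattern used in the tests.
class ExitError extends Error {
  readonly code: number

  constructor(code: number) {
    super(`process.exit(${code})`)
    this.code = code
  }
}

// Runs `fn` with process.exit replaced by a stub that throws on non-zero codes,
// then restores the real implementation even if `fn` throws.
function withExitStub<T>(fn: () => T): T {
  const realExit = process.exit
  // The stub returns normally for code 0, so it is not a true `never` function;
  // the double cast tells the compiler to accept it as the real signature anyway.
  const stub = ((code?: number) => {
    if ((code ?? 0) !== 0) throw new ExitError(code ?? 0)
  }) as unknown as typeof process.exit

  process.exit = stub
  try {
    return fn()
  } finally {
    process.exit = realExit
  }
}

// Hypothetical command under test: exits with code 2 on failure.
function fakeCommand(ok: boolean): string {
  if (!ok) process.exit(2)
  return 'done'
}
```

A test can then call `withExitStub(() => fakeCommand(false))` and assert on the thrown `ExitError` instead of letting the process terminate.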

packages/cli/src/__tests__/evaluate.test.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -98,7 +98,7 @@ beforeEach(() => {
9898
vi.clearAllMocks()
9999
vi.spyOn(process, 'exit').mockImplementation(((code?: number) => {
100100
if ((code ?? 0) !== 0) throw new ExitError(code ?? 0)
101-
}) as unknown as (code?: number) => never)
101+
}) as unknown as typeof process.exit)
102102
logSpy = vi.spyOn(console, 'log').mockImplementation(() => {})
103103
errorSpy = vi.spyOn(console, 'error').mockImplementation(() => {})
104104
})

packages/cli/src/__tests__/generate-policy.test.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -364,7 +364,7 @@ describe('registerGeneratePolicyCommand CLI', () => {
364364
vi.clearAllMocks()
365365
vi.spyOn(process, 'exit').mockImplementation(((code?: number) => {
366366
if ((code ?? 0) !== 0) throw new ExitError(code ?? 0)
367-
}) as unknown as (code?: number) => never)
367+
}) as unknown as typeof process.exit)
368368
logSpy = vi.spyOn(console, 'log').mockImplementation(() => {})
369369
errorSpy = vi.spyOn(console, 'error').mockImplementation(() => {})
370370
})

packages/cli/src/__tests__/generate.test.ts

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -378,17 +378,18 @@ describe('generate — listFrameworks error handling', () => {
378378
let outDir: string
379379
let consoleLogSpy: ReturnType<typeof vi.spyOn>
380380
let consoleErrorSpy: ReturnType<typeof vi.spyOn>
381-
let exitSpy: ReturnType<typeof vi.spyOn>
381+
// eslint-disable-next-line @typescript-eslint/no-explicit-any
382+
let exitSpy: any
382383

383384
beforeEach(async () => {
384385
outDir = mkdtempSync(join(tmpdir(), 'agentspec-lfe-test-'))
385386
process.env['ANTHROPIC_API_KEY'] = 'test-key'
386387
consoleLogSpy = vi.spyOn(console, 'log').mockImplementation(() => {})
387388
consoleErrorSpy = vi.spyOn(console, 'error').mockImplementation(() => {})
388389
// Prevent process.exit from actually terminating the test runner
389-
exitSpy = vi.spyOn(process, 'exit').mockImplementation((_code?: number): never => {
390+
exitSpy = vi.spyOn(process, 'exit').mockImplementation(((_code?: number): never => {
390391
throw new Error(`process.exit(${_code})`)
391-
})
392+
}) as unknown as typeof process.exit)
392393
})
393394

394395
afterEach(() => {
@@ -537,16 +538,17 @@ describe('generate — writeGeneratedFiles error catch', () => {
537538
let outDir: string
538539
let consoleLogSpy: ReturnType<typeof vi.spyOn>
539540
let consoleErrorSpy: ReturnType<typeof vi.spyOn>
540-
let exitSpy: ReturnType<typeof vi.spyOn>
541+
// eslint-disable-next-line @typescript-eslint/no-explicit-any
542+
let exitSpy: any
541543

542544
beforeEach(() => {
543545
outDir = mkdtempSync(join(tmpdir(), 'agentspec-wgf-err-'))
544546
process.env['ANTHROPIC_API_KEY'] = 'test-key'
545547
consoleLogSpy = vi.spyOn(console, 'log').mockImplementation(() => {})
546548
consoleErrorSpy = vi.spyOn(console, 'error').mockImplementation(() => {})
547-
exitSpy = vi.spyOn(process, 'exit').mockImplementation((_code?: number): never => {
549+
exitSpy = vi.spyOn(process, 'exit').mockImplementation(((_code?: number): never => {
548550
throw new Error(`process.exit(${_code})`)
549-
})
551+
}) as unknown as typeof process.exit)
550552
vi.clearAllMocks()
551553
})
552554

@@ -562,9 +564,11 @@ describe('generate — writeGeneratedFiles error catch', () => {
562564
// Return a path traversal filename that writeGeneratedFiles will reject
563565
const { generateWithClaude } = await import('@agentspec/adapter-claude')
564566
vi.mocked(generateWithClaude).mockResolvedValueOnce({
567+
framework: 'langgraph',
565568
files: { '../../evil.txt': 'malicious content' },
566569
installCommands: [],
567570
envVars: [],
571+
readme: '',
568572
})
569573

570574
const { registerGenerateCommand } = await import('../commands/generate.js')
@@ -677,9 +681,11 @@ describe('generate --deploy helm', () => {
677681
it('calls generateWithClaude twice when --deploy helm is set', async () => {
678682
const { generateWithClaude } = await import('@agentspec/adapter-claude')
679683
vi.mocked(generateWithClaude).mockResolvedValue({
684+
framework: 'langgraph',
680685
files: { 'agent.py': '# agent', 'agent.yaml': '# manifest' },
681686
installCommands: [],
682687
envVars: [],
688+
readme: '',
683689
})
684690

685691
await runGenerateWithDeploy(outDir, 'helm')
