feat(schema): provider data-policy metadata fields by OriginalGary · Pull Request #5 · OriginalGary/graze

OriginalGary · 2026-05-05T19:18:30Z

Summary

Adds five data-policy fields (trains_on_data, data_residency, retention_days, local, e2ee) to ProviderSchema and audits all 152 providers against them. Two commits:

1. Initial schema + conservative audit: Zod invariants, 13 providers classified, 139 TODO
2. Priority-1 audit resolution: 10 more providers verified, tier-dependent pattern documented

Schema touchpoints

Single schema layer extended: src/shared/validation/providerSchema.ts → src/shared/constants/providers.ts. The main RegistryEntry in providerRegistry.ts (operational call config) is unchanged — sensitivity routing will join on provider ID.

Invariants

#	Rule	Error message shape
1	`local=true` → `trains_on_data=false`	`Provider "id": invariant violated — local=true requires trains_on_data=false`
2	`e2ee=true` → `trains_on_data=false`	`Provider "id": invariant violated — e2ee=true requires trains_on_data=false`
3	`data_residency="multi"` → `retention_days` not null	`Provider "id": invariant violated — data_residency="multi" requires retention_days to be specified`
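The three rules in the table can be sketched without any dependency. This is not the PR's Zod code, which is not shown on this page: the `ProviderPolicy` shape, the field types, and the helper names `checkInvariants` and `assertAllValid` are assumptions for illustration. Only the field names and the message shapes come from the table above.

```typescript
// Hypothetical shape: field names come from the PR, the types are assumed.
interface ProviderPolicy {
  id: string;
  trains_on_data: boolean;
  data_residency: string; // values seen in the PR: "US", "SG", "CN", "multi", "local", "unknown"
  retention_days: number | null;
  local: boolean;
  e2ee: boolean;
}

// Returns one message per violated invariant, in the order of the table.
function checkInvariants(p: ProviderPolicy): string[] {
  // \u2014 is the dash character used in the PR's error message shape.
  const prefix = `Provider "${p.id}": invariant violated \u2014 `;
  const errors: string[] = [];
  if (p.local && p.trains_on_data) {
    errors.push(prefix + "local=true requires trains_on_data=false");
  }
  if (p.e2ee && p.trains_on_data) {
    errors.push(prefix + "e2ee=true requires trains_on_data=false");
  }
  if (p.data_residency === "multi" && p.retention_days === null) {
    errors.push(prefix + 'data_residency="multi" requires retention_days to be specified');
  }
  return errors;
}

// Hypothetical load-time guard: throws at the first provider with a violation.
function assertAllValid(providers: ProviderPolicy[]): void {
  for (const p of providers) {
    const errors = checkInvariants(p);
    if (errors.length > 0) {
      throw new Error(errors.join("\n"));
    }
  }
}
```

Note that `retention_days=0` (the bedrock value) satisfies rule 3, because the check is against `null`, not against falsy values.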

Audit results

Confidently classified: 23 providers (up from 13 in commit 1)

Provider	trains_on_data	data_residency	retention_days	local
anthropic	false	US	30	false
openai	false	US	30	false
glm	false	SG	null	false
glm-cn	false	CN	null	false
glmt	false	SG	null	false
azure-openai	false	multi	30	false
azure-ai	false	multi	30	false
bedrock	false	multi	0	false
vertex	false	multi	30	false
vertex-partner	false	multi	30	false
lm-studio … comfyui (10 local)	false	local	0	true
searxng-search	false	local	0	true

glmt classification: glmt is NOT a separate aggregator — it is a preset variant (thinking mode + higher token budget + longer timeout) on the same Z.ai API endpoint (https://api.z.ai/api/anthropic/v1/messages) as glm. Registry entry confirms baseUrl is identical. Same Z.ai Additional Terms §3.b apply → same SG residency and trains_on_data=false as glm.

Tier-dependent providers (3): gemini, codex, github — inline comments upgraded from TODO to Verified, but conservative default (trains_on_data=true) is preserved. Override mechanism deferred to workstream 4.
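One way to picture the tier-dependent pattern is a set of conservative defaults that verified values overlay field by field. In this sketch, `CONSERVATIVE_DEFAULTS` and `withVerified` are hypothetical names. The first three default values are the ones stated in the first commit message, while `local=false` and `e2ee=false` are assumptions. The github values (retention_days=28, data_residency=US) are taken from the second commit message.

```typescript
interface PolicyFields {
  trains_on_data: boolean;
  data_residency: string;
  retention_days: number | null;
  local: boolean;
  e2ee: boolean;
}

// Defaults for unaudited or tier-dependent providers.
// trains_on_data, data_residency and retention_days follow the PR text;
// local and e2ee are assumed to default to false.
const CONSERVATIVE_DEFAULTS: PolicyFields = {
  trains_on_data: true,
  data_residency: "unknown",
  retention_days: null,
  local: false,
  e2ee: false,
};

// Overlays only the fields that were actually verified.
function withVerified(verified: Partial<PolicyFields>): PolicyFields {
  return { ...CONSERVATIVE_DEFAULTS, ...verified };
}

// github: residency and retention are verified, but the subscription tier
// cannot be inferred from a token, so trains_on_data stays at the default.
const github = withVerified({ data_residency: "US", retention_days: 28 });
```

Because `trains_on_data` is never passed for a tier-dependent provider, it can only become `false` through a later, explicit override, which the PR defers to workstream 4.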

Anthropic ZDR (zero data retention): confirmed e2ee=false. ZDR is a contractual guarantee, not an architectural one (contractual ≠ architectural).

gemini.trains_on_data confirmed true (assertion in test suite prevents regression).

TODO(sam): ~129 providers remain.

Sam-verify priority — remaining

Aggregators (openrouter, laozhang, etc.) — need contractual guarantee from aggregator, not upstream
Remaining OAuth providers (cursor, gitlab-duo, kimi-coding, claude consumer OAuth, etc.)
Enterprise cloud (watsonx, oci, sap, databricks, etc.)
Niche/specialized — all remaining APIKEY, WEB_COOKIE, AUDIO, SEARCH

Tier eligibility summary

Tier	Count	Providers
tier-1 (all)	152	all
tier-2 (trains_on_data=false)	23	anthropic, openai, glm, glm-cn, glmt, azure-openai, azure-ai, bedrock, vertex, vertex-partner, 11 local + searxng-search
tier-3 (local=true)	12	11 LOCAL_PROVIDERS + searxng-search
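The tier conditions in the table reduce to two field checks. The sketch below assumes a hypothetical `eligibleTiers` helper and a cut-down `PolicyFlags` type. Only the tier names and their conditions come from the table.

```typescript
type Tier = "tier-1" | "tier-2" | "tier-3";

// Only the two fields the tier conditions depend on.
interface PolicyFlags {
  trains_on_data: boolean;
  local: boolean;
}

function eligibleTiers(p: PolicyFlags): Tier[] {
  const tiers: Tier[] = ["tier-1"]; // every provider qualifies for tier-1
  if (!p.trains_on_data) tiers.push("tier-2");
  if (p.local) tiers.push("tier-3");
  return tiers;
}

// Values from the audit table and regression guards in this PR.
const gemini = eligibleTiers({ trains_on_data: true, local: false });
const anthropic = eligibleTiers({ trains_on_data: false, local: false });
const searxng = eligibleTiers({ trains_on_data: false, local: true });
```

Since invariant 1 forces `trains_on_data=false` whenever `local=true`, any provider that passes schema validation and reaches tier-3 is also in tier-2.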

Surprises for sensitivity-routing prompt

AGGREGATOR_PROVIDER_IDS Set (17 entries) and SELF_HOSTED_CHAT_PROVIDER_IDS Set (8 entries) already exist in providers.ts — sensitivity routing can use these for fast-path decisions
Two schema layers (providers.ts = policy, providerRegistry.ts = call config) — routing joins on provider ID
sdwebui and comfyui are local=true (tier-3) but are NOT in SELF_HOSTED_CHAT_PROVIDER_IDS (image-only)
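A minimal sketch of the join described in these notes, under several assumptions: the `PolicyEntry` and `CallConfig` shapes are cut down and hypothetical, the set below is a two-entry stand-in for the real 17-entry `AGGREGATOR_PROVIDER_IDS` (and that these two IDs are members is itself assumed), and the rule "aggregators are rejected before looking at policy" is only one possible reading of "fast-path decisions".

```typescript
// Policy layer (providers.ts in the PR); only the fields used here.
interface PolicyEntry {
  id: string;
  trains_on_data: boolean;
  local: boolean;
}

// Call-config layer (providerRegistry.ts in the PR); shape assumed.
interface CallConfig {
  id: string;
  baseUrl: string;
}

interface JoinedProvider {
  policy: PolicyEntry;
  config: CallConfig;
}

// Stand-in for the set in providers.ts; membership is assumed.
const AGGREGATOR_PROVIDER_IDS = new Set<string>(["openrouter", "laozhang"]);

// Inner join of the two layers on provider ID.
function joinOnProviderId(policies: PolicyEntry[], configs: CallConfig[]): Map<string, JoinedProvider> {
  const configById = new Map<string, CallConfig>();
  for (const c of configs) configById.set(c.id, c);

  const joined = new Map<string, JoinedProvider>();
  for (const policy of policies) {
    const config = configById.get(policy.id);
    if (config !== undefined) joined.set(policy.id, { policy, config });
  }
  return joined;
}

// Assumed routing rule: set membership is checked first, policy second.
function allowsSensitive(id: string, joined: Map<string, JoinedProvider>): boolean {
  if (AGGREGATOR_PROVIDER_IDS.has(id)) return false;
  const entry = joined.get(id);
  return entry !== undefined && !entry.policy.trains_on_data;
}
```

For tier-3 decisions the sketch would read `policy.local` rather than test membership in `SELF_HOSTED_CHAT_PROVIDER_IDS`, since sdwebui and comfyui are local but absent from that set.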

Tests

32 assertions in tests/unit/provider-metadata-schema.test.ts:

Schema field acceptance, data_residency format validation
All 3 invariant violations with named-provider error messages
Regression guards: gemini.trains_on_data=true, bedrock.retention_days=0, azure-openai invariant-3 path
glm/glm-cn/glmt, vertex/vertex-partner, github/codex tier-dependent fields
Full registry load guard + all-fields-present sweep
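The all-fields-present sweep could look roughly like the following. This is a sketch, not the contents of provider-metadata-schema.test.ts: `missingFields` and `sweep` are hypothetical helpers, and the sample entry's `e2ee=false` is an assumption, since the audit table does not list that column.

```typescript
const REQUIRED_FIELDS = ["trains_on_data", "data_residency", "retention_days", "local", "e2ee"] as const;

// A field counts as present even when its value is null or 0.
function missingFields(entry: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter((field) => !(field in entry));
}

// Maps each provider ID with gaps to the list of fields it lacks.
function sweep(registry: Record<string, Record<string, unknown>>): Record<string, string[]> {
  const problems: Record<string, string[]> = {};
  for (const id of Object.keys(registry)) {
    const entry = registry[id];
    if (entry === undefined) continue;
    const missing = missingFields(entry);
    if (missing.length > 0) problems[id] = missing;
  }
  return problems;
}

// bedrock values from the audit table; e2ee is assumed.
const sampleRegistry: Record<string, Record<string, unknown>> = {
  bedrock: { trains_on_data: false, data_residency: "multi", retention_days: 0, local: false, e2ee: false },
  incomplete: { trains_on_data: true, data_residency: "unknown", local: false },
};
const sweepResult = sweep(sampleRegistry);
```

Testing presence with `in` rather than truthiness matters here, because `retention_days=0` and `retention_days=null` are both legitimate values.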

🤖 Generated with Claude Code

Adds trains_on_data, data_residency, retention_days, local, and e2ee to ProviderSchema (src/shared/validation/providerSchema.ts). All five fields are required and Zod-validated at module load time with three cross-field invariants enforced with named-provider error messages:

1. local=true → trains_on_data must be false
2. e2ee=true → trains_on_data must be false
3. data_residency="multi" → retention_days must not be null

Provider audit: 13 providers confidently classified (anthropic, openai, and all 11 self-hosted local providers including searxng-search). The remaining 139 providers are assigned conservative defaults (trains_on_data=true, data_residency="unknown", retention_days=null) with per-provider TODO(sam) comments pointing to their policy URL. Anthropic ZDR is explicitly marked e2ee=false (contractual ≠ architectural).

22 new tests in provider-metadata-schema.test.ts cover: field acceptance, data_residency format validation, all three invariant violations with named-provider error messages, local provider bulk assertions, the Anthropic ZDR e2ee=false case, full registry load regression guard, and validateProviders throw-on-violation paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-05-05T19:24:33Z

CI Coverage Report

Coverage job: success
PR test policy: failure

Coverage artifact was not available for this run.

PR Test Policy

This PR changes production code in src/, open-sse/, electron/, or bin/ without accompanying automated tests.

… bedrock, vertex, tier-dependent)

Applies verified data-policy values to 10 providers previously at conservative TODO defaults:

glm, glmt: Z.ai international, trains_on_data=false, data_residency=SG
glm-cn: Z.ai China endpoint, trains_on_data=false, data_residency=CN
azure-openai: trains_on_data=false, data_residency=multi, retention_days=30
azure-ai: same Azure AOAI data-privacy policy as azure-openai
bedrock: trains_on_data=false, data_residency=multi, retention_days=0 (zero-persistence architecture, not just contractual)
vertex: trains_on_data=false, data_residency=multi, retention_days=30
vertex-partner: same Vertex AI DPA covers partner models in Model Garden

Tier-dependent providers (gemini, codex, github) keep trains_on_data=true but have their inline comments upgraded from TODO to Verified, with policy source URLs and an explanation that the conservative default applies because Graze cannot determine subscription tier from an API key or OAuth token. github gets retention_days=28 and data_residency=US from its published policy.

Adds §5a Tier-dependent providers to GRAZE.md documenting the pattern, Graze's stance, and the deferred override mechanism.

10 new test assertions: gemini.trains_on_data=true regression guard, bedrock.retention_days=0, azure-openai invariant-3 path, glm/glm-cn/glmt residency, vertex/vertex-partner, github tier-dependent fields, codex.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

OriginalGary merged commit c77b999 into main May 5, 2026
3 of 4 checks passed

OriginalGary deleted the feat/provider-metadata branch May 5, 2026 19:30

OriginalGary merged 2 commits into main from feat/provider-metadata

2 participants
