From be5d8c88f8b0708b5468507c8d410b489ad0b687 Mon Sep 17 00:00:00 2001 From: Alexey Kazakov Date: Tue, 12 May 2026 16:45:44 -0700 Subject: [PATCH 1/2] docs: add web search and web fetch design proposal MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Design for operator-managed web search via spec.webSearch (Brave, Tavily, DuckDuckGo, Gemini) and web fetch via spec.webFetch - Search API keys use proxy credential injection — secrets never reach gateway - Three provider categories: standalone API, key-free, LLM-as-search - Companion questions doc with all design decisions and rationale Signed-off-by: Alexey Kazakov Co-authored-by: Cursor --- docs/proposals/web-search-design.md | 374 +++++++++++++++++++++++++ docs/proposals/web-search-questions.md | 201 +++++++++++++ 2 files changed, 575 insertions(+) create mode 100644 docs/proposals/web-search-design.md create mode 100644 docs/proposals/web-search-questions.md diff --git a/docs/proposals/web-search-design.md b/docs/proposals/web-search-design.md new file mode 100644 index 0000000..65a4580 --- /dev/null +++ b/docs/proposals/web-search-design.md @@ -0,0 +1,374 @@ +# Web Search and Web Fetch Support + +**Status:** Final +**Date:** 2026-05-12 +**Decisions:** [web-search-questions.md](web-search-questions.md) + +## Overview + +Add operator-managed web search and web fetch configuration to the Claw CRD. OpenClaw has a built-in `web_search` agent tool backed by 12 bundled provider plugins and a `web_fetch` tool for arbitrary URL fetching. Today the operator has no way to configure either — users would have to manually edit `openclaw.json` inside the pod, which is fragile, not declarative, and doesn't integrate with the operator's security model. + +This feature enables users to declare a web search provider and/or enable web fetch in the Claw CR. 
The operator injects the necessary configuration into `operator.json`, sets up proxy credential injection and domain allowlisting, and manages secrets — all through the same reconciliation pipeline used for credentials, channels, and MCP servers. + +## Design Principles + +1. **Security first:** Search API keys flow through the MITM proxy's credential injection — secrets never reach the gateway container. The proxy domain allowlist is the L7 gate. No new egress surface is opened by default. + +2. **Thin operator, smart app:** The operator sets `tools.web.search.provider`, maps secrets to proxy routes, and injects a placeholder API key into the config. It does *not* replicate OpenClaw's plugin config schemas. Provider-specific tuning uses an opaque `config` field, keeping the operator decoupled from upstream changes. + +3. **Consistent patterns:** Follow the same reconciliation, ConfigMap injection, proxy route, and condition patterns as credentials, channels, and MCP servers. + +## Architecture + +### How OpenClaw Consumes Web Search Config + +OpenClaw reads web search configuration from `openclaw.json`: + +```json5 +{ + tools: { + web: { + search: { + enabled: true, + provider: "brave" + }, + fetch: { + enabled: true + } + } + }, + plugins: { + entries: { + brave: { + config: { + webSearch: { + apiKey: "ah-ah-ah-you-didnt-say-the-magic-word" + } + } + } + } + } +} +``` + +Each search provider also checks specific environment variables as credential fallbacks (`BRAVE_API_KEY`, `TAVILY_API_KEY`, etc.), but the operator uses the config path with a placeholder API key — the real key is injected by the proxy. 
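The config shape above can be sketched in Go. This is a minimal illustration rather than the operator's implementation: `buildWebSearchFragment` and its parameters are hypothetical names, and only the key paths (`tools.web.search`, `tools.web.fetch`, `plugins.entries.brave.config.webSearch.apiKey`) and the placeholder string are taken from this proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// placeholderAPIKey is the static placeholder named in this proposal.
// The real key is expected to be injected by the proxy, not stored here.
const placeholderAPIKey = "ah-ah-ah-you-didnt-say-the-magic-word"

// buildWebSearchFragment is a hypothetical helper that assembles the
// web search / web fetch portion of the config as nested maps.
// needsKey should be true only for API-keyed providers (brave, tavily).
func buildWebSearchFragment(provider string, needsKey, fetchEnabled bool) map[string]any {
	web := map[string]any{
		"search": map[string]any{"enabled": true, "provider": provider},
	}
	if fetchEnabled {
		web["fetch"] = map[string]any{"enabled": true}
	}
	fragment := map[string]any{"tools": map[string]any{"web": web}}
	if needsKey {
		// Assumption: the plugin entry id equals the provider name,
		// which matches the brave example shown above.
		fragment["plugins"] = map[string]any{
			"entries": map[string]any{
				provider: map[string]any{
					"config": map[string]any{
						"webSearch": map[string]any{"apiKey": placeholderAPIKey},
					},
				},
			},
		}
	}
	return fragment
}

func main() {
	out, err := json.MarshalIndent(buildWebSearchFragment("brave", true, true), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Building the fragment as plain maps keeps the sketch independent of OpenClaw's plugin schemas, in the spirit of the "thin operator" principle; a real implementation may well use typed structs instead.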
+ +### Provider Categories + +Phase 1 supports four providers across three categories: + +| Provider | Category | Proxy Route | Auth Injection | Secret Required | +|----------|----------|-------------|----------------|:---:| +| **Brave** | Standalone API | `api.search.brave.com` | `api_key` (header: `X-Subscription-Token`) | Yes | +| **Tavily** | Standalone API | `api.tavily.com` | `bearer` | Yes | +| **DuckDuckGo** | Key-free | `html.duckduckgo.com` | `none` (passthrough) | No | +| **Gemini** | LLM-as-search | *(none — reuses google credential)* | *(existing google route)* | No | + +**Category 1: Standalone search APIs** (Brave, Tavily) — dedicated API key, dedicated domain. The operator adds a proxy route with credential injection. A placeholder API key is set in the config; the proxy strips it and injects the real key. + +**Category 2: Key-free** (DuckDuckGo) — no API key. The operator adds a passthrough proxy route and sets the provider in config. Zero secrets. + +**Category 3: LLM-as-search** (Gemini) — reuses existing LLM provider credentials. Gemini search grounding sends a regular Gemini API call with `tools: [{ google_search }]`. Traffic flows through the existing `google` provider credential route. The operator only sets `tools.web.search.provider: "gemini"` — no new proxy route or secret. + +### Security Model + +**Search API keys** are handled via proxy credential injection (same as LLM providers): + +1. The user's Secret is mounted as a `secretKeyRef` env var on the **proxy** container (not the gateway), using `credEnvVarName("websearch")` → `CRED_WEBSEARCH` +2. The operator adds a proxy route for the search domain with the appropriate injector (`api_key` or `bearer`) referencing that env var +3. The operator sets a static placeholder API key (`"ah-ah-ah-you-didnt-say-the-magic-word"`) in the gateway config via `plugins.entries.<provider>.config.webSearch.apiKey` +4. OpenClaw sees a non-empty key and makes the HTTP call through the proxy +5. 
The MITM proxy intercepts the request, strips the placeholder auth header, and injects the real credential from `CRED_WEBSEARCH` +6. The request reaches the upstream with the real key + +The gateway container **never sees the real API key**. + +The search secret's `ResourceVersion` is stamped on the proxy pod template (alongside existing credential stamps) to trigger rollouts when the search API key Secret changes. + +**Web fetch** is gated by the proxy allowlist. When `spec.webFetch.enabled: true`, the operator sets `tools.web.fetch.enabled: true` in the config. The agent can only fetch URLs on domains already permitted by the proxy (LLM providers, search APIs, builtin passthroughs). Users can open additional domains via `spec.credentials` entries with `type: none`. + +### Reconciliation Flow + +``` +Claw CR spec.webSearch / spec.webFetch + │ + ▼ + validateWebSearchConfig() ◄── validate provider name + │ check secret exists (API-keyed) + │ check google credential exists (gemini) + │ set WebSearchConfigured condition + │ + ▼ + applyProxyResources() + ├─► generateProxyConfig() ◄── add search domain as credential + │ injection route (brave, tavily) + │ or passthrough (duckduckgo) + │ webSearch is passed alongside + │ credentials and mcpServers + │ + ▼ + buildKustomizedObjects() + │ + ▼ + configureDeployments() + ├─► configureProxyForWebSearch() ◄── mount search secret as + │ CRED_WEBSEARCH env var on + │ proxy container (secretKeyRef) + │ + ▼ + stampSecretVersionAnnotation() ◄── includes web search secret + │ in proxy pod template stamps + │ + ▼ + enrichConfigAndNetworkPolicy() + ├─► injectWebSearchIntoConfigMap() ◄── operator.json: + │ tools.web.search.provider + │ tools.web.fetch.enabled + │ plugins.entries.<provider>.config.webSearch + │ + ▼ + merge.js (init-config) ◄── merged into PVC openclaw.json +``` + +## CRD Schema + +### New Fields on ClawSpec + +```go +type ClawSpec struct { + ConfigMode ConfigMode `json:"configMode,omitempty"` + Credentials []CredentialSpec 
`json:"credentials,omitempty"` + McpServers map[string]McpServerSpec `json:"mcpServers,omitempty"` + + // WebSearch configures the web search provider for the OpenClaw agent. + // +optional + WebSearch *WebSearchSpec `json:"webSearch,omitempty"` + + // WebFetch enables the web_fetch tool for arbitrary URL fetching. + // Fetched URLs are gated by the proxy allowlist — only domains + // permitted by credentials, search providers, or builtins are reachable. + // +optional + WebFetch *WebFetchSpec `json:"webFetch,omitempty"` +} +``` + +### WebSearchSpec + +```go +// WebSearchSpec configures the operator-managed web search provider. +type WebSearchSpec struct { + // Provider selects the web search provider. + // Known values: brave, tavily, duckduckgo, gemini. + // +kubebuilder:validation:MinLength=1 + Provider string `json:"provider"` + + // SecretRef references a Secret key holding the search API key. + // Required for API-keyed providers (brave, tavily). + // Not needed for key-free (duckduckgo) or LLM-as-search (gemini). + // +optional + SecretRef *SecretRefEntry `json:"secretRef,omitempty"` + + // Config is provider-specific configuration merged into + // plugins.entries.<provider>.config.webSearch in operator.json. + // Use for provider-specific tuning (mode, maxResults, etc.). + // +kubebuilder:pruning:PreserveUnknownFields + // +optional + Config *runtime.RawExtension `json:"config,omitempty"` +} +``` + +### WebFetchSpec + +```go +// WebFetchSpec configures the web_fetch tool. +type WebFetchSpec struct { + // Enabled activates the web_fetch tool. Fetched URLs are gated by + // the proxy allowlist. 
+ // +kubebuilder:default=true + Enabled bool `json:"enabled"` +} +``` + +### CEL Validation + +On `WebSearchSpec`: +- `secretRef` is required when `provider` is `brave` or `tavily` + +```go +// +kubebuilder:validation:XValidation:rule="self.provider in ['duckduckgo','gemini'] || has(self.secretRef)",message="secretRef is required for API-keyed search providers" +``` + +Note: we intentionally do *not* reject `secretRef` on key-free/LLM-as-search providers. The operator ignores it for those providers, but refusing it at admission time would mean hard-coding the provider categorization into the CEL rule, making it fragile when adding new providers. The reconciler validation (below) already handles the semantics per category. + +### Reconciler Validation + +- **Brave, Tavily:** Secret referenced by `secretRef` must exist and contain the specified key +- **Gemini:** A credential with `provider: "google"` must exist in `spec.credentials` +- **DuckDuckGo:** No validation needed +- Failures set `WebSearchConfigured=False` with a descriptive message and `Ready=False` + +## Proxy Configuration + +For API-keyed providers, the operator adds the search domain as a credential injection route in the proxy config. This is handled alongside existing credential routes in `generateProxyConfig`. + +```go +var knownSearchProviders = map[string]searchProviderInfo{ + "brave": { + Domain: "api.search.brave.com", + Injector: "api_key", + Header: "X-Subscription-Token", + }, + "tavily": { + Domain: "api.tavily.com", + Injector: "bearer", + }, + "duckduckgo": { + Domain: "html.duckduckgo.com", + Injector: "none", + }, + // gemini: no entry — reuses existing google provider credential route +} +``` + +**Proxy deployment changes:** The search secret is mounted on the proxy container (not the gateway) as a `secretKeyRef` env var named `CRED_WEBSEARCH` (using the existing `credEnvVarName` helper with `"websearch"` as the credential name). 
The proxy reads this env var at runtime to inject the real credential. This is done by a new `configureProxyForWebSearch` function, following the same pattern as `configureProxyForCredentials`. + +**Secret version stamping:** `stampSecretVersionAnnotation` must also stamp the web search secret's `ResourceVersion` on the proxy pod template, so that Secret changes trigger a proxy rollout. This requires extending the function to also check `spec.webSearch.secretRef` alongside `spec.credentials`. + +**`generateProxyConfig` signature:** The function currently takes `(credentials []resolvedCredential, mcpServers map[string]McpServerSpec)`. It needs to also accept `webSearch *WebSearchSpec` (or the full `ClawSpec`) to emit the search domain routes. The search provider route is appended to the routes list alongside credential routes, using the `knownSearchProviders` mapping table above. + +## ConfigMap Injection + +`injectWebSearchIntoConfigMap` sets up to three blocks in `operator.json`: + +1. **`tools.web.search`** — provider selection: + ```json + { "enabled": true, "provider": "brave" } + ``` + +2. **`plugins.entries.<provider>.config.webSearch`** — placeholder API key + user config (API-keyed providers only): + ```json + { "apiKey": "ah-ah-ah-you-didnt-say-the-magic-word" } + ``` + Only injected for Brave and Tavily. User-provided `spec.webSearch.config` is deep-merged into this block. For **DuckDuckGo**, no plugin entry is needed (key-free). For **Gemini**, no `apiKey` is injected — the Gemini search provider falls back to `models.providers.google.apiKey` which the operator already sets for the google LLM provider. If the user provides `spec.webSearch.config` for Gemini, it is merged into `plugins.entries.google.config.webSearch` (OpenClaw's google extension id) for provider-specific tuning (model, baseUrl, etc.). + +3. 
**`tools.web.fetch`** — when `spec.webFetch` is set: + ```json + { "enabled": true } + ``` + +This follows the same pattern as `injectChannelsIntoConfigMap`, which sets both `channels` and `plugins.entries`. + +## Status Condition + +New `WebSearchConfigured` condition type: + +- `True` — search provider validated and config injected +- `False` — validation failed (missing secret, unknown provider, missing google credential for gemini) +- Not set when `spec.webSearch` is nil + +Failures also set `Ready=False`. + +## Examples + +### Brave Search (API key, proxy-injected) + +```yaml +apiVersion: claw.sandbox.redhat.com/v1alpha1 +kind: Claw +metadata: + name: my-instance +spec: + credentials: + - name: anthropic + type: apiKey + secretRef: + - name: anthropic-api-key + key: api-key + provider: anthropic + webSearch: + provider: brave + secretRef: + name: brave-search-key + key: api-key + webFetch: + enabled: true +``` + +### DuckDuckGo (key-free) + +```yaml +spec: + webSearch: + provider: duckduckgo +``` + +### Gemini Search Grounding (reuses LLM credential) + +```yaml +spec: + credentials: + - name: google + type: apiKey + secretRef: + - name: google-api-key + key: api-key + provider: google + webSearch: + provider: gemini +``` + +### Tavily with custom config + +```yaml +spec: + webSearch: + provider: tavily + secretRef: + name: tavily-key + key: api-key + config: + maxResults: 10 +``` + +### Web fetch only (no search) + +```yaml +spec: + credentials: + - name: docs-site + type: none + domain: docs.python.org + webFetch: + enabled: true +``` + +## Implementation Plan + +### Files to Change + +| File | Change | +|------|--------| +| `api/v1alpha1/claw_types.go` | Add `WebSearchSpec`, `WebFetchSpec`, new fields on `ClawSpec`, `WebSearchConfigured` condition constant, CEL validation | +| `api/v1alpha1/zz_generated.deepcopy.go` | Regenerated (`make generate`) | +| `config/crd/bases/` | Regenerated CRD YAML (`make manifests`) | +| New: 
`internal/controller/claw_web_search.go` | `validateWebSearchConfig`, `injectWebSearchIntoConfigMap`, `configureProxyForWebSearch` (mount search secret on proxy), known provider mapping table (`knownSearchProviders`) | +| `internal/controller/claw_proxy.go` | Extend `generateProxyConfig` to accept `*WebSearchSpec` and emit search domain routes; extend `stampSecretVersionAnnotation` to also stamp `spec.webSearch.secretRef` | +| `internal/controller/claw_resource_controller.go` | Wire validation into reconcile loop, call `configureProxyForWebSearch` in `configureDeployments`, call `injectWebSearchIntoConfigMap` in `enrichConfigAndNetworkPolicy`, pass `webSearch` to `generateProxyConfig` | +| New: `internal/controller/claw_web_search_test.go` | Tests for validation, ConfigMap injection, proxy route generation, proxy deployment mounting | +| `docs/provider-setup.md` | User-facing documentation for web search and web fetch setup | + +### Steps + +1. Add CRD types (`WebSearchSpec`, `WebFetchSpec`, condition constant, CEL rule) and run `make manifests generate` +2. Create `claw_web_search.go` with `knownSearchProviders` mapping, `validateWebSearchConfig`, `configureProxyForWebSearch`, and `injectWebSearchIntoConfigMap` +3. Extend `generateProxyConfig` to accept `*WebSearchSpec` and emit search domain routes +4. Extend `stampSecretVersionAnnotation` to stamp the web search secret's `ResourceVersion` +5. Wire into reconciler: validation → proxy config (with web search) → deployment mounting → secret stamping → ConfigMap injection → condition +6. Tests (validation, ConfigMap injection, proxy routes, proxy deployment mounting, secret version stamping) +7. 
Documentation + +## Future Considerations + +- Additional search providers (Exa, Firecrawl, Perplexity, Grok) — one table entry each +- Firecrawl as a `webFetch` provider for JS-heavy pages +- SearXNG support (self-hosted, requires base URL and potentially a sidecar) diff --git a/docs/proposals/web-search-questions.md b/docs/proposals/web-search-questions.md new file mode 100644 index 0000000..193d9a4 --- /dev/null +++ b/docs/proposals/web-search-questions.md @@ -0,0 +1,201 @@ +# Web Search Support — Design Questions + +**Status:** All decisions made +**Related:** [Design document](web-search-design.md) + +Each question has options with trade-offs and a recommendation. Go through them one by one to form the design, then update the design document. + +--- + +## Q1: How should search API keys be delivered to the gateway? + +OpenClaw's `web_search` tool makes HTTP calls directly from the gateway process (Node.js), reading the API key from environment variables (e.g., `BRAVE_API_KEY`) or from `plugins.entries.<provider>.config.webSearch.apiKey` in config. This is different from LLM provider traffic, which flows through the MITM proxy for credential injection. + +The key tension: our security model keeps secrets off the gateway container whenever possible (the proxy injects them), but OpenClaw's search implementation reads the key from the gateway's own env/config. + +### Option B: Proxy credential injection (secret stays on proxy) + +Add the search provider's domain as a credential route in the proxy config. The proxy injects the API key header on outbound requests. Set a placeholder value in `plugins.entries.<provider>.config.webSearch.apiKey` so OpenClaw thinks it has a key and makes the call. + +- **Pro:** Consistent with the operator's core security model. Secret never touches the gateway. +- **Pro:** The proxy's L7 allowlist already controls which domains are reachable — adding credential injection for the search domain is natural. 
+- **Pro:** Source code analysis confirms all major providers use header-based auth that the proxy can inject: + - Brave: `X-Subscription-Token` header → `api_key` injector + - Tavily: `Authorization: Bearer` header (via `postTrustedWebToolsJson`) → `bearer` injector + - Exa, Firecrawl, Perplexity: `Authorization: Bearer` header → `bearer` injector +- **Pro:** LLM-as-search providers (Gemini, Grok) reuse existing LLM credentials already proxy-injected — no new handling needed. + +**Decision:** Option B — proxy credential injection for API-keyed providers. Source analysis confirmed all target providers use header-based auth compatible with the proxy's existing `api_key` and `bearer` injectors. Secrets stay off the gateway, consistent with the operator's core security model. LLM-as-search providers (Gemini) need no new secret handling since their LLM credential is already proxy-injected. + +_Considered and rejected: Option A — gateway env var (breaks "no secrets on gateway" principle, unnecessary given header-based auth works), Option C — hybrid (two code paths for no benefit since all providers are header-injectable)_ + +--- + +## Q2: Which providers should the operator support in phase 1? + +The operator needs a known-provider mapping table (provider name → API domain, env var name, injector type). Supporting all 12 OpenClaw providers upfront is a large surface. We need to pick a useful initial set. 
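A minimal Go sketch of such a mapping table and the lookup it enables. The `searchProviderInfo` fields and the `CRED_WEBSEARCH` name mirror the design document; `proxyRoute`, `routeForSearchProvider`, and the "no route needed" flag for Gemini are assumptions made for illustration, since the proxy's actual config schema is not shown in this proposal.

```go
package main

import "fmt"

// searchProviderInfo mirrors the per-provider fields used in the design document.
type searchProviderInfo struct {
	Domain   string
	Injector string // "api_key", "bearer", or "none"
	Header   string // only used by the api_key injector
}

var knownSearchProviders = map[string]searchProviderInfo{
	"brave":      {Domain: "api.search.brave.com", Injector: "api_key", Header: "X-Subscription-Token"},
	"tavily":     {Domain: "api.tavily.com", Injector: "bearer"},
	"duckduckgo": {Domain: "html.duckduckgo.com", Injector: "none"},
	// gemini: no entry, it reuses the existing google credential route.
}

// proxyRoute is a hypothetical stand-in for a proxy config route entry.
type proxyRoute struct {
	Domain           string
	Injector         string
	Header           string
	CredentialEnvVar string // empty for passthrough routes
}

// routeForSearchProvider returns the route to add for a provider.
// needed is false when the provider requires no new route (gemini).
func routeForSearchProvider(provider string) (route proxyRoute, needed bool, err error) {
	if provider == "gemini" {
		return proxyRoute{}, false, nil
	}
	info, ok := knownSearchProviders[provider]
	if !ok {
		return proxyRoute{}, false, fmt.Errorf("unknown web search provider %q", provider)
	}
	route = proxyRoute{Domain: info.Domain, Injector: info.Injector, Header: info.Header}
	if info.Injector != "none" {
		route.CredentialEnvVar = "CRED_WEBSEARCH"
	}
	return route, true, nil
}

func main() {
	for _, p := range []string{"brave", "tavily", "duckduckgo", "gemini"} {
		route, needed, err := routeForSearchProvider(p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: needed=%t route=%+v\n", p, needed, route)
	}
}
```

With this shape, supporting another standalone API provider would amount to one more map entry, which is the property the questions below lean on when scoping phase 1.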
+ +### Option: Brave + Tavily + DuckDuckGo + Gemini (four providers) + +Covers three categories with minimal effort: + +| Provider | Category | New proxy route | New secret | Effort | +|----------|----------|:-:|:-:|--------| +| Brave | Standalone API | Yes (`api.search.brave.com`) | Yes (proxy `api_key`) | Medium | +| Tavily | Standalone API | Yes (`api.tavily.com`) | Yes (proxy `bearer`) | Medium | +| DuckDuckGo | Key-free | Yes (`html.duckduckgo.com`) | No | Low | +| Gemini | LLM-as-search | No (reuses google credential) | No | Low | + +- **Pro:** Covers the most common use cases: Brave (#1 popularity), Tavily (#2, popular in AI agent frameworks), DuckDuckGo (free fallback), Gemini (free for existing google credential users). +- **Pro:** Gemini is nearly zero implementation cost — pure config injection, no new proxy route or secret. Only needs cross-field validation that a `google` provider credential exists. +- **Pro:** Demonstrates all three provider categories, making it easy to add more providers later. + +**Decision:** Brave + Tavily + DuckDuckGo + Gemini. Covers standalone APIs, key-free, and LLM-as-search categories. Additional providers (Exa, Firecrawl, Perplexity, Grok) are trivial to add later — one table entry each for standalone APIs, or the same config-only pattern as Gemini for LLM-as-search. + +_Considered and rejected: Option A — Brave + DuckDuckGo only (misses Tavily and Gemini), Option D — all 12 providers (unnecessary scope, many are niche)_ + +--- + +## Q3: Should search domains get proxy passthrough or credential injection? + +The MITM proxy controls which domains the gateway can reach. When adding a search provider's domain, we choose between a simple passthrough (no credential manipulation) and a credential injection route. + +This follows directly from Q1. Since we chose proxy credential injection for API keys, the proxy must do credential injection (not just passthrough) for API-keyed providers. 
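To make the difference between passthrough and credential injection concrete, here is a small Go sketch of what an injecting route does to outbound request headers. `injectCredential` is a hypothetical function, not the proxy's real code; the injector names (`api_key`, `bearer`, `none`) and the `X-Subscription-Token` header come from the design document, and the secret is passed as a parameter where a real proxy would read it from its environment.

```go
package main

import (
	"fmt"
	"net/http"
)

// injectCredential overwrites the auth header the gateway sent (which
// carries only a placeholder) with the real secret held by the proxy.
// http.Header.Set replaces any existing values for the key, so the
// placeholder is dropped and the real credential takes its place.
func injectCredential(h http.Header, injector, header, secret string) error {
	switch injector {
	case "api_key":
		if header == "" {
			return fmt.Errorf("api_key injector requires a header name")
		}
		h.Set(header, secret)
	case "bearer":
		h.Set("Authorization", "Bearer "+secret)
	case "none":
		// Passthrough: the request is forwarded with its headers untouched.
	default:
		return fmt.Errorf("unknown injector %q", injector)
	}
	return nil
}

func main() {
	// Headers as the gateway would send them: placeholder key only.
	h := http.Header{}
	h.Set("X-Subscription-Token", "ah-ah-ah-you-didnt-say-the-magic-word")

	if err := injectCredential(h, "api_key", "X-Subscription-Token", "real-key-held-by-proxy"); err != nil {
		panic(err)
	}
	fmt.Println(h.Get("X-Subscription-Token"))
}
```

A passthrough-only route would skip this step entirely, which is why it cannot be used for API-keyed providers unless the secret lives on the gateway.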
+ +### Option B: Credential injection on proxy + +The proxy injects the search API key into outbound requests to the search domain. Per-provider behavior: + +- **Brave:** `api_key` injector with `header: "X-Subscription-Token"` on `api.search.brave.com` +- **Tavily:** `bearer` injector on `api.tavily.com` +- **DuckDuckGo:** `none` injector (passthrough) on `html.duckduckgo.com` — key-free +- **Gemini:** No new route — traffic already covered by existing `google` provider credential on `.googleapis.com` + +- **Pro:** Secret stays off gateway. Consistent with Q1 decision. +- **Pro:** Source analysis confirms all target providers use header-based auth the proxy already supports. +- **Pro:** Proxy acts as both L7 domain gate and credential injector — single enforcement point. + +**Decision:** Option B — credential injection for API-keyed providers (Brave, Tavily), passthrough for key-free (DuckDuckGo), no new route for LLM-as-search (Gemini). Direct consequence of Q1 decision. + +_Considered and rejected: Option A — passthrough only (would require putting the secret on the gateway as an env var, contradicts Q1 decision)_ + +--- + +## Q4: Single web search provider or allow a list? + +OpenClaw supports configuring one `tools.web.search.provider` at a time (with auto-detection fallback when no provider is explicitly set). Should the operator mirror this or allow multiple? + +### Option A: Single provider (`spec.webSearch` is a struct) + +```yaml +spec: + webSearch: + provider: brave + secretRef: ... +``` + +- **Pro:** Matches OpenClaw's runtime behavior — only one provider is active at a time. +- **Pro:** Simpler CRD, simpler validation, simpler implementation. +- **Pro:** Clear intent — the user knows exactly which provider will be used. +- **Con:** Can't configure a fallback provider (e.g., Brave primary, DuckDuckGo fallback). OpenClaw's auto-detect fallback chain only works when `provider` is not explicitly set. + +**Decision:** Option A — single provider struct. 
Matches OpenClaw's one-active-provider model. Deterministic: the user declares a provider and that's what gets used. Simple CRD and validation. + +_Considered and rejected: Option B — ordered list (OpenClaw only accepts one provider value; operator-side fallback adds complexity for no runtime benefit), Option C — auto-detect fallback flag (over-engineering; auto-detect is OpenClaw's default when no provider is set, not something the operator should toggle)_ + +--- + +## Q5: Dedicated status condition or reuse existing? + +The operator uses status conditions to signal feature-specific health. Should web search get its own condition? + +### Option A: New `WebSearchConfigured` condition + +- **Pro:** Consistent with `CredentialsResolved`, `ProxyConfigured`, `McpServersConfigured`. Clear signal for web search issues. +- **Pro:** Condition is only present when `spec.webSearch` is set, reducing noise. + +**Decision:** Option A — new `WebSearchConfigured` condition. Follows the established pattern. Only present when the feature is configured, so no noise for users who don't use web search. Consistent with `McpServersConfigured`. + +_Considered and rejected: Option B — fold into CredentialsResolved (conflates LLM credential and search secret failures), Option C — no dedicated condition (no way to distinguish search issues from other problems)_ + +--- + +## Q6: Should the operator validate that LLM-as-search providers have a matching credential? + +When a user sets `provider: gemini`, the operator could check that a `google` provider credential exists in `spec.credentials`. Without it, Gemini search grounding won't work at runtime (no API key available). 
Validation requirements vary by provider category: + +- **API-keyed** (Brave, Tavily): validate `secretRef` is set and the referenced Secret exists +- **LLM-as-search** (Gemini): validate that the corresponding LLM provider credential exists in `spec.credentials` +- **Key-free** (DuckDuckGo): no validation needed — no secret, no dependency + +### Option A: Validate and fail + +Check requirements per category. Set `WebSearchConfigured=False` if missing. + +- **Pro:** Fails fast with a clear message. User doesn't have to debug why search silently doesn't work. +- **Pro:** The operator already validates credentials — this is a natural extension. +- **Pro:** Cross-reference is trivial: a static map (`gemini → google`, future `grok → xai`). + +**Decision:** Option A — validate and fail. Per-category validation: `secretRef` existence for API-keyed providers, LLM credential cross-reference for LLM-as-search providers, no validation for key-free providers. Fail-fast with `WebSearchConfigured=False` is consistent with how `resolveCredentials` works today. + +_Considered and rejected: Option B — validate and warn (soft failures are easy to miss), Option C — don't validate (silent runtime failure when Gemini has no google credential)_ + +--- + +## Q7: How should `web.fetch` be handled? + +OpenClaw has a separate `web_fetch` tool (lightweight URL fetching, distinct from `web_search`). It can optionally use Firecrawl as a provider. NemoClaw enables `web.fetch` alongside search. Should the operator configure `tools.web.fetch` as well? + +### Option B: Separate `spec.webFetch` field (designed and implemented now) + +Add a dedicated `spec.webFetch` field alongside `spec.webSearch`, keeping the two concerns cleanly separated in the CRD while implementing both in the same change. + +- **Pro:** Clean separation of concerns. Each has distinct security characteristics (known API endpoints vs. arbitrary URLs). +- **Pro:** Users can enable fetch without search and vice versa. 
+- **Pro:** Implementing alongside search reuses the same ConfigMap injection pattern and proxy infrastructure. + +**Decision:** Option B — separate `spec.webFetch` field, designed and implemented alongside web search. Clean CRD separation, but delivered in the same change. + +_Considered and rejected: Option A — enable fetch implicitly alongside search (conflates security profiles; users may want search without arbitrary URL access), Option C — boolean toggle on webSearch spec (same conflation problem)_ + +--- + +## Q8: What should `spec.webFetch` look like? + +`web_fetch` in OpenClaw allows agents to fetch arbitrary URLs. Unlike `web_search` (which calls a known API endpoint), `web_fetch` can target any URL the agent provides. This has significant security implications since all outbound traffic goes through the MITM proxy, which acts as an L7 domain allowlist. + +### Option A: Simple boolean toggle + +```yaml +spec: + webFetch: + enabled: true +``` + +The operator sets `tools.web.fetch.enabled: true` in `operator.json`. No new proxy routes — `web_fetch` uses OpenClaw's built-in HTTP client, which goes through the proxy. Requests to domains not in the proxy allowlist will be blocked (403). + +- **Pro:** Simplest possible design. One field. +- **Pro:** The proxy allowlist is the security gate — only domains already permitted (LLM providers, search APIs, builtin passthroughs) are fetchable. +- **Con:** Very limited usefulness — the agent can only fetch URLs on domains already allowed for other reasons. Can't fetch arbitrary documentation sites, GitHub issues, etc. + +**Decision:** Option A — simple boolean toggle. The proxy allowlist already controls reachable domains, and users can add passthrough domains via existing `spec.credentials` entries with `type: none`. Firecrawl support can be added later as a provider option. 
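The gating behavior behind this decision can be sketched as a host check against the allowlist. This is an illustration only: `fetchAllowed` is a hypothetical function, and the matching rules (exact host match, plus suffix match for entries that start with a dot such as `.googleapis.com`) are assumptions rather than the proxy's documented semantics.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// fetchAllowed reports whether rawURL targets a host on the allowlist.
// Entries beginning with "." match any subdomain; other entries must
// match the host exactly. Unparseable or host-less URLs are rejected.
func fetchAllowed(rawURL string, allowlist []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil || u.Hostname() == "" {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, entry := range allowlist {
		entry = strings.ToLower(entry)
		if strings.HasPrefix(entry, ".") {
			if strings.HasSuffix(host, entry) {
				return true
			}
			continue
		}
		if host == entry {
			return true
		}
	}
	return false
}

func main() {
	// docs.python.org stands in for a domain opened via a
	// spec.credentials entry with type: none.
	allowlist := []string{"api.search.brave.com", ".googleapis.com", "docs.python.org"}
	for _, target := range []string{
		"https://docs.python.org/3/library/",
		"https://example.com/",
	} {
		fmt.Printf("%s allowed=%t\n", target, fetchAllowed(target, allowlist))
	}
}
```

Under these assumptions, enabling `web_fetch` adds no reachable domains by itself; what the agent can fetch is determined entirely by routes that already exist.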
+ +_Considered and rejected: Option B — Firecrawl provider (scope creep, can be added later), Option C — allowedDomains list (overlaps with existing `credentials` type: none pattern)_ + +--- + +## Q9: How should the placeholder API key work for proxy credential injection? + +With Q1 decided as proxy credential injection, the gateway needs *something* in its config that looks like a valid API key — otherwise OpenClaw won't attempt the search call at all. But the real key lives on the proxy, not the gateway. + +### Option A: Static placeholder string + +Set `plugins.entries.<provider>.config.webSearch.apiKey` to a fixed placeholder like `"proxy-injected"`. The gateway sees a non-empty key, makes the HTTP call, and the proxy strips the placeholder and injects the real key. + +- **Pro:** Simple. Same pattern used for LLM provider `apiKey: "ah-ah-ah-you-didnt-say-the-magic-word"` in `injectProvidersIntoConfigMap`. +- **Pro:** Proxy already handles this — it replaces whatever auth header it finds with the real credential. +- **Con:** If OpenClaw ever validates key format (e.g., checks for a `BSA` prefix on Brave keys), the placeholder might fail client-side validation. + +**Decision:** Option A — static placeholder string `"ah-ah-ah-you-didnt-say-the-magic-word"`, matching the existing LLM provider pattern. The proxy strips and replaces regardless of placeholder format. 
+ +_Considered and rejected: Option B — provider-specific placeholder format (unnecessary; the proxy strips and replaces regardless of what the gateway sends)_ From f5fbf167ce23931cb9b61f653d10e9409f502ff9 Mon Sep 17 00:00:00 2001 From: Alexey Kazakov Date: Tue, 12 May 2026 16:53:03 -0700 Subject: [PATCH 2/2] docs: include skill and provider-setup in implementation plan - Add PLATFORM.md skill update (configmap.yaml) to files-to-change - Expand provider-setup.md entry with per-provider and web fetch details - Add explicit documentation steps to implementation sequence Signed-off-by: Alexey Kazakov Co-authored-by: Cursor --- docs/proposals/web-search-design.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/proposals/web-search-design.md b/docs/proposals/web-search-design.md index 65a4580..e24c63c 100644 --- a/docs/proposals/web-search-design.md +++ b/docs/proposals/web-search-design.md @@ -355,7 +355,8 @@ spec: | `internal/controller/claw_proxy.go` | Extend `generateProxyConfig` to accept `*WebSearchSpec` and emit search domain routes; extend `stampSecretVersionAnnotation` to also stamp `spec.webSearch.secretRef` | | `internal/controller/claw_resource_controller.go` | Wire validation into reconcile loop, call `configureProxyForWebSearch` in `configureDeployments`, call `injectWebSearchIntoConfigMap` in `enrichConfigAndNetworkPolicy`, pass `webSearch` to `generateProxyConfig` | | New: `internal/controller/claw_web_search_test.go` | Tests for validation, ConfigMap injection, proxy route generation, proxy deployment mounting | -| `docs/provider-setup.md` | User-facing documentation for web search and web fetch setup | +| `internal/assets/manifests/claw/configmap.yaml` | Add "Web Search & Web Fetch" section to `PLATFORM.md` skill (how it works, operator-managed config, what NOT to do); update skill `description` frontmatter to mention web search | +| `docs/provider-setup.md` | New "Web Search" section with per-provider setup (Brave, Tavily, 
DuckDuckGo, Gemini) and "Web Fetch" section | ### Steps @@ -365,7 +366,8 @@ spec: 4. Extend `stampSecretVersionAnnotation` to stamp the web search secret's `ResourceVersion` 5. Wire into reconciler: validation → proxy config (with web search) → deployment mounting → secret stamping → ConfigMap injection → condition 6. Tests (validation, ConfigMap injection, proxy routes, proxy deployment mounting, secret version stamping) -7. Documentation +7. Update `PLATFORM.md` skill in `configmap.yaml` with web search/fetch section (operator-managed config, provider categories, what NOT to do) +8. Update `docs/provider-setup.md` with per-provider setup guides and web fetch section ## Future Considerations