
Add search tool backed by the browser-use search API #67

Open
reformedot wants to merge 7 commits into main from add-duckduckgo-search-tool

Conversation

@reformedot
Contributor

@reformedot reformedot commented Jun 5, 2026

What

Adds a client-executed search tool to the async agent engine, backed by the browser-use search API (search.browser-use.com — a thin proxy in front of Parallel's Search API with browser-use auth + billing).

History: the tool was first ported from the Python DuckDuckGo Lite action, then swapped to the browser-use search API in this same PR (contract verified against the search service source).

How it differs from the existing web_search

| | web_search (existing) | search (this PR) |
|---|---|---|
| Execution | Hosted: the model provider runs it server-side | Client: the agent POSTs to the browser-use search API |
| Needs a capable model provider | Yes | No (works against any provider) |

API integration

  • POST {base}/search with {"query": …} and the X-Browser-Use-API-Key header — key read from the workspace's existing BROWSER_USE_API_KEY (fails fast with an actionable message when unset).
  • Base URL overridable via BROWSER_USE_SEARCH_URL (e.g. a local dev instance, which runs as an open proxy — keyless requests are allowed there).
  • 200 → {"results":[{title?, url, published_date?, content}]}: multi-line markdown content is whitespace-normalized; untitled results fall back to their URL; url-less results dropped; publication date appended to the title line.
  • Errors mapped per the service's table (401 invalid key, 402 insufficient balance, 400/422/502/503 with a body snippet) and surfaced to the model as soft errors (Search failed: …).
  • Output stays token-efficient: titles capped at 30 chars, descriptions at 125 (ellipsis within the cap); URLs kept intact.
  • Serial scheduling (SEARCH_PARALLEL_SAFE = false) — preserves the scheduling tuning from 962a2bf/af4111c as a conservative default for a billed API call.
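The status mapping described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SearchError`, `classify`, and `soft_error` are hypothetical names, and the exact message wording is assumed. Only the 401/402 cases, the 200-character body snippet, and the `Search failed:` prefix come from the description.

```rust
/// Hypothetical error type for illustration; the PR's real type names are not shown.
#[derive(Debug, PartialEq)]
enum SearchError {
    InvalidApiKey,
    InsufficientBalance,
    Http { status: u16, snippet: String },
}

/// Maps an HTTP status to success or a classified error.
/// Anything below 400 is treated as success, so 399 passes and 400 fails.
fn classify(status: u16, body: &str) -> Result<(), SearchError> {
    match status {
        401 => Err(SearchError::InvalidApiKey),
        402 => Err(SearchError::InsufficientBalance),
        s if s >= 400 => Err(SearchError::Http {
            status: s,
            // Keep at most 200 characters of the body, cut on a char boundary.
            snippet: body.chars().take(200).collect(),
        }),
        _ => Ok(()),
    }
}

/// Renders the error as text for the model instead of failing the tool call.
/// The detail strings are assumptions; only the prefix is from the PR text.
fn soft_error(err: &SearchError) -> String {
    let detail = match err {
        SearchError::InvalidApiKey => "invalid API key".to_string(),
        SearchError::InsufficientBalance => "insufficient balance".to_string(),
        SearchError::Http { status, snippet } => format!("HTTP {status}: {snippet}"),
    };
    format!("Search failed: {detail}")
}
```

Returning the failure as ordinary tool output lets the model read the reason and decide whether to retry or fall back, which is presumably why the errors are surfaced as soft errors.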

Architecture

  • Handler follows the same trait stack (Approvable + Sandboxable + ToolRuntime) and conventions as the sibling tools; HTTP behind a SearchBackend seam (real reqwest impl + fakes in tests). No new dependencies.
  • Registered in both default_registry and the production dispatcher (build_tool_dispatcher_with_cwd_and_goal_store), with a dispatcher membership test guarding the production tool set.
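As a rough picture of what a backend seam like `SearchBackend` makes possible, the sketch below swaps the HTTP call for a fake in tests. It is deliberately synchronous and std-only to stay self-contained; the real handler is async and uses reqwest. The method name `post_search`, the `HttpReply` struct, and `run_search` are hypothetical.

```rust
/// Raw HTTP outcome: status code plus response body (hypothetical shape).
struct HttpReply {
    status: u16,
    body: String,
}

/// The seam: the handler only talks to this trait, never to the network directly.
trait SearchBackend {
    fn post_search(&self, query: &str) -> Result<HttpReply, String>;
}

/// Test double that returns a canned reply without touching the network.
struct FakeBackend {
    status: u16,
    body: &'static str,
}

impl SearchBackend for FakeBackend {
    fn post_search(&self, _query: &str) -> Result<HttpReply, String> {
        Ok(HttpReply {
            status: self.status,
            body: self.body.to_string(),
        })
    }
}

/// Handler logic written against the trait, so it runs the same with a real or fake backend.
fn run_search(backend: &dyn SearchBackend, query: &str) -> String {
    match backend.post_search(query) {
        Ok(reply) if reply.status < 400 => reply.body,
        Ok(reply) => format!("Search failed: HTTP {}", reply.status),
        Err(transport) => format!("Search failed: {transport}"),
    }
}
```

With this shape, a deterministic test constructs `FakeBackend { status: 503, body: "" }` and asserts on the returned text, with no network involved.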

Tests

Deterministic suite (fixture JSON + fake backends, no network): response parsing (optional fields, url-less filtering, markdown normalization), status classification (401/402/4xx/5xx + 399/400 boundary), formatting (dates, URL fallback, truncation caps), soft-error surfacing, orchestrator/registry/dispatcher wiring, and the model-guidance description assertions. Plus an #[ignore]d live smoke (search_live_smoke) — verified end-to-end against a running search service instance with real Parallel results.

Verification

  • cargo fmt --check ✓ · clippy: no new warnings ✓ · cargo test: 985 passed ✓ (2 pre-existing PTY shell_tests failures, identical on clean main) · uv run pytest
  • Live: BROWSER_USE_SEARCH_URL=http://localhost:8080 cargo test -p browser-use-agent --lib -- --ignored --nocapture search_live_smoke → 10 real results, correctly formatted.

Note: search.browser-use.com DNS does not resolve yet — until it's live, point the tool at an instance via BROWSER_USE_SEARCH_URL.
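A minimal sketch of how the endpoint and key could be resolved from the environment, assuming an `https://` default base and a hypothetical error message. The function names are illustrative, and the lookup is passed in as a closure so the logic can be exercised without mutating process environment variables.

```rust
/// Assumed default base; the PR names the host but not the scheme.
const DEFAULT_BASE: &str = "https://search.browser-use.com";

/// Builds the POST endpoint, preferring BROWSER_USE_SEARCH_URL when it is set and non-empty.
fn resolve_endpoint(lookup: impl Fn(&str) -> Option<String>) -> String {
    let base = lookup("BROWSER_USE_SEARCH_URL")
        .filter(|value| !value.trim().is_empty())
        .unwrap_or_else(|| DEFAULT_BASE.to_string());
    // Avoid a double slash when the override ends with '/'.
    format!("{}/search", base.trim_end_matches('/'))
}

/// Reads the API key, failing with a message that says what to set (wording assumed).
fn resolve_api_key(lookup: impl Fn(&str) -> Option<String>) -> Result<String, String> {
    lookup("BROWSER_USE_API_KEY")
        .filter(|value| !value.trim().is_empty())
        .ok_or_else(|| "BROWSER_USE_API_KEY is not set; export it to use the search tool".to_string())
}
```

In real use the lookup would be `|name| std::env::var(name).ok()`, so `BROWSER_USE_SEARCH_URL=http://localhost:8080` yields `http://localhost:8080/search`.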

🤖 Generated with Claude Code

Port the Python `search` action (DuckDuckGo Lite HTTP search) into the
async agent engine as a new locally-dispatched `search` tool. Only the
search logic is carried over — the `request_human_control` action and the
Controller/DB/session scaffolding are dropped per "keep the logic only".

Unlike the existing hosted `web_search` (provider-executed, no local I/O),
this tool performs a real HTTP GET against `lite.duckduckgo.com/lite/` and
parses the result HTML itself, so it works against any provider.

Implementation notes:
- New handler `tools/handlers/search.rs` follows the same trait stack
  (Approvable + Sandboxable + ToolRuntime) as the sibling tools, with the
  HTTP fetch behind a `SearchBackend` seam (real reqwest impl + fake for
  tests), mirroring the browser/python/mcp backend-injection pattern.
- No new dependencies: the repo deliberately avoids HTML-parser deps
  (browser DOM comes from CDP), so parsing uses targeted `regex` over the
  fixed DuckDuckGo Lite markup plus a small hand-rolled percent-decoder and
  entity decoder. Faithful to the original BeautifulSoup logic.
- Registered as `search` in both `default_registry` and the production
  dispatcher (`build_tool_dispatcher_with_cwd_and_goal_store`) so the live
  model can actually call it; parallel-safe (read-only).
- Tests are fully deterministic (fixture HTML + fake backend, no network):
  parsing, URL unwrapping, entity/whitespace handling, response
  classification, formatting, and orchestrator/registry/dispatcher wiring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/browser-use-agent/src/tools/registry.rs">

<violation number="1" location="crates/browser-use-agent/src/tools/registry.rs:1159">
P3: Broken intra-doc link: `[`web_search`](definitions::web_search)` in `search()`'s doc comment references `definitions::web_search` from within the same `definitions` module, resolving to a non-existent path. Should be `[`web_search`]` (same module) or use the full crate path.</violation>
</file>

Comment thread crates/browser-use-agent/src/tools/handlers/search.rs Outdated
}

/// `search`: a LOCALLY-executed DuckDuckGo (Lite) web search. Unlike the
/// hosted [`web_search`](definitions::web_search), the client performs the

@cubic-dev-ai cubic-dev-ai Bot Jun 5, 2026

P3: Broken intra-doc link: [web_search](definitions::web_search) in search()'s doc comment references definitions::web_search from within the same definitions module, resolving to a non-existent path. Should be [web_search] (same module) or use the full crate path.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/browser-use-agent/src/tools/registry.rs, line 1159:

<comment>Broken intra-doc link: `[`web_search`](definitions::web_search)` in `search()`'s doc comment references `definitions::web_search` from within the same `definitions` module, resolving to a non-existent path. Should be `[`web_search`]` (same module) or use the full crate path.</comment>

<file context>
@@ -1155,6 +1155,34 @@ to the single frame that proves the task succeeded."
     }
 
+    /// `search`: a LOCALLY-executed DuckDuckGo (Lite) web search. Unlike the
+    /// hosted [`web_search`](definitions::web_search), the client performs the
+    /// HTTP request itself and returns the parsed results as text. Ported from
+    /// the Python `search` action's description.
</file context>
Suggested change
- /// hosted [`web_search`](definitions::web_search), the client performs the
+ /// hosted [`web_search`], the client performs the
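To illustrate the link-resolution point, here is a small stand-alone module with stub functions (the bodies are placeholders, not the real tool definitions). Intra-doc link paths are resolved relative to the module containing the documented item, so a sibling can be linked by its bare name or through `self::`.

```rust
pub mod definitions {
    /// Hosted search, executed by the model provider (stub for illustration).
    pub fn web_search() -> &'static str {
        "web_search"
    }

    /// Client-executed search. Unlike the hosted [`web_search`], the client
    /// performs the HTTP request itself.
    ///
    /// A sibling item resolves by bare name, or explicitly via
    /// [`self::web_search`]. Writing `definitions::web_search` here would be
    /// looked up relative to this module, where no `definitions` name exists.
    pub fn search() -> &'static str {
        "search"
    }
}
```

Running `cargo doc` with the `rustdoc::broken_intra_doc_links` lint enabled is one way to catch unresolved links of this kind.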

reformedot and others added 5 commits June 4, 2026 18:15
A network-dependent end-to-end check against the real DuckDuckGo Lite
endpoint via the default HttpSearchBackend. Ignored by default (so CI and
`cargo test` stay deterministic and offline); run manually with:

  cargo test -p browser-use-agent --lib -- --ignored --nocapture search_live_smoke

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iciency

The formatted model-facing output now trims each result's title to 15 chars
and description to 100 chars (ellipsis counted within the cap, on a Unicode
char boundary); destination URLs are kept intact so they stay usable.
Truncation is applied at the display layer (`format_results`), so
`SearchResult` still carries full data for any other consumer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tune the formatted-output truncation limits: titles 15 -> 30 chars,
descriptions 100 -> 125 chars (ellipsis still counted within the cap).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
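The truncation rule from these two commits (caps of 30 and 125 characters, ellipsis counted inside the cap, cut on a Unicode character boundary) might look roughly like this. The helper names are hypothetical, and a single `…` character is assumed for the ellipsis; the PR does not say whether it uses that or three dots.

```rust
const TITLE_CAP: usize = 30;
const DESCRIPTION_CAP: usize = 125;

/// Truncates `text` to at most `cap` characters, ellipsis included.
/// Counts `char`s rather than bytes so multi-byte text is never split mid-character.
fn truncate_chars(text: &str, cap: usize) -> String {
    if text.chars().count() <= cap {
        return text.to_string();
    }
    // Reserve one character of the budget for the ellipsis itself.
    let mut out: String = text.chars().take(cap.saturating_sub(1)).collect();
    if cap > 0 {
        out.push('…');
    }
    out
}

/// Display-layer helpers; the underlying result keeps its full text.
fn format_title(title: &str) -> String {
    truncate_chars(title, TITLE_CAP)
}

fn format_description(description: &str) -> String {
    truncate_chars(description, DESCRIPTION_CAP)
}
```

Applying the cap only when formatting matches the commit's note that `SearchResult` still carries the full data for other consumers.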
@gregpr07 gregpr07 changed the title Add locally-executed DuckDuckGo search tool DO NOT MERGE: Add locally-executed DuckDuckGo search tool Jun 5, 2026
@gregpr07
Member

gregpr07 commented Jun 5, 2026

DO NOT MERGE. This does not work well enough in practice and should not be merged in its current form.

@gregpr07 gregpr07 changed the title DO NOT MERGE: Add locally-executed DuckDuckGo search tool Add locally-executed DuckDuckGo search tool Jun 5, 2026
The `search` tool now POSTs the query to search.browser-use.com — a thin
proxy in front of Parallel's Search API with browser-use auth + billing —
instead of scraping DuckDuckGo Lite HTML. Contract verified against the
search service source (documents/browser-use/search):

- POST {base}/search with {"query"} and the `X-Browser-Use-API-Key` header
  (key read from BROWSER_USE_API_KEY, the workspace's existing browser-use
  cloud auth variable; fails fast with an actionable message when unset).
- Base URL overridable via BROWSER_USE_SEARCH_URL (e.g. a local dev
  instance, which runs as an open proxy without auth — keyless requests
  are allowed through there).
- 200 -> {"results":[{title?, url, published_date?, content}]}; the
  multi-line markdown content is whitespace-normalized; untitled results
  fall back to their URL; url-less results are dropped; the publication
  date is appended to the title line when known.
- Errors mapped per the service's table: 401 invalid key, 402 insufficient
  balance, other >=400 carried with a 200-char body snippet — all surfaced
  to the model as soft errors ("Search failed: ...").

All the DuckDuckGo HTML-parsing machinery (regex extraction, entity
decoding, redirect unwrapping, percent decoding) is gone; the title/
description truncation (30/125) and output layout are unchanged. Tests
rewritten against fixture JSON; live smoke now targets the real service
(verified end-to-end against a local instance).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
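The result clean-up rules in this commit message can be sketched as a small pure function. This assumes the JSON has already been deserialized into a `RawResult` struct (a hypothetical name, as is `SearchResult`'s field layout here), and the `Title (date)` layout for the appended publication date is an assumption; the PR only says the date is appended to the title line.

```rust
/// One entry of the `results` array after deserialization (hypothetical shape).
struct RawResult {
    title: Option<String>,
    url: Option<String>,
    published_date: Option<String>,
    content: String,
}

#[derive(Debug, PartialEq)]
struct SearchResult {
    title_line: String,
    url: String,
    description: String,
}

/// Collapses newlines and runs of whitespace into single spaces.
fn normalize_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn normalize(raw: Vec<RawResult>) -> Vec<SearchResult> {
    raw.into_iter()
        .filter_map(|r| {
            // Results without a usable URL are dropped entirely.
            let url = r.url.filter(|u| !u.trim().is_empty())?;
            // Untitled results fall back to their URL.
            let title = r
                .title
                .map(|t| normalize_whitespace(&t))
                .filter(|t| !t.is_empty())
                .unwrap_or_else(|| url.clone());
            // Assumed layout: the date follows the title in parentheses.
            let title_line = match r.published_date {
                Some(date) if !date.is_empty() => format!("{title} ({date})"),
                _ => title,
            };
            Some(SearchResult {
                title_line,
                url,
                description: normalize_whitespace(&r.content),
            })
        })
        .collect()
}
```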
@reformedot reformedot changed the title Add locally-executed DuckDuckGo search tool Add search tool backed by the browser-use search API Jun 6, 2026