Web Search Pro

web-search-pro is an OpenClaw search skill and local Node retrieval runtime for agents. It can search the live web, fetch and extract pages, crawl sites, map docs, and assemble research-ready evidence packs with explainable routing.

ClawHub: web-search-pro
GitHub: Zjianru/web-search-pro
OpenClaw archive: openclaw/skills/tree/main/skills/zjianru/web-search-pro

What It Is

This project sits between a lightweight web-search skill and a full hosted scraping product.

The simplest mental model is:

a search skill for agents
a local retrieval runtime
a bridge from search to extract, crawl, map, and research

Use it when search is only the beginning of the task and the agent may need to keep collecting, structuring, and handing off evidence.

Choose This If

Choose web-search-pro if you need:

live web search and current-events lookup
news search with explainable routing and visible fallback behavior
official docs, API docs, and code lookup
company, product, and competitor research
site crawl, site map, and docs discovery
a no-key baseline first, then premium providers only when needed
structured outputs that an upstream model can keep using

In short: this is for developers who want one skill to cover search, retrieval, and evidence prep.

Do Not Choose This If

Do not choose web-search-pro if you primarily want:

the lightest possible single-purpose web search wrapper
a hosted remote scraping SaaS
a browser-first crawler by default
a narrative report writer that hides retrieval details
an unlimited no-key search guarantee

If all you need is lightweight one-shot search, a smaller skill will usually be a better fit.

Why Developers Pick It

Compared with a plain search skill, the differentiators are:

Explainable routing routingSummary exposes why a provider was selected, how confident the planner is, and what the top signals were.
Visible federated gains federated.value shows what multi-provider fanout actually recovered, corroborated, or deduped.
Search-to-research chain One surface can move from search into extract, crawl, map, and research.
No-key baseline You can evaluate the skill without buying into a provider stack first.
Agent-readable diagnostics doctor.mjs, bootstrap.mjs, capabilities.mjs, and review.mjs expose runtime state and boundaries instead of leaving the model to guess.

Quick Start

There are two honest ways to approach this project:

Install and use it as a skill Start from the ClawHub page or the OpenClaw archive entry if you want to use it inside OpenClaw.
Run it from source Use the commands below if you want to evaluate the local runtime directly, inspect outputs, or contribute to the repo.

The shortest successful path is:

start with the no-key baseline
add one premium provider only when you need stronger recall or fresher results
then try docs, news, and research flows

Option A: No-key baseline

No API key is required for the first successful run.

The baseline is:

ddg for best-effort web search
fetch for extract, crawl, and map fallback

node scripts/doctor.mjs --json
node scripts/bootstrap.mjs --json
node scripts/search.mjs "OpenAI Responses API docs" --json

What these commands tell you:

doctor.mjs tells you whether the runtime is usable right now
bootstrap.mjs gives an agent-readable runtime snapshot
search.mjs proves the baseline path works before you add premium providers

Option B: Add one premium provider

If you only add one premium provider, start with TAVILY_API_KEY.

That is the shortest upgrade path because one credential improves:

general web search
news search
extract quality

export TAVILY_API_KEY=tvly-xxxxx
node scripts/doctor.mjs --json
node scripts/search.mjs "latest OpenAI news" --type news --json

First successful searches

node scripts/search.mjs "OpenClaw web search" --json
node scripts/search.mjs "OpenAI Responses API docs" --preset docs --plan --json
node scripts/extract.mjs "https://platform.openai.com/docs" --json

Then try docs, news, and research

node scripts/search.mjs "OpenAI Responses API docs" --preset docs --json
node scripts/search.mjs "latest OpenAI news" --type news --json
node scripts/research.mjs "OpenClaw search skill landscape" --plan --json

Core Commands

Command	Purpose
`search.mjs`	Multi-provider search with explainable routing
`extract.mjs`	Single-page readable extraction
`render.mjs`	Forced browser-backed extraction through the local render lane
`crawl.mjs`	Safe BFS crawl
`map.mjs`	Site-structure discovery
`research.mjs`	Structured `plan + evidence pack` generation
`doctor.mjs`	Runtime diagnostics
`bootstrap.mjs`	Agent-readable runtime bootstrap contract
`capabilities.mjs`	Provider capability snapshot
`review.mjs`	Review and moderation summary
`cache.mjs`	Cache inspection
`health.mjs`	Provider health inspection
`eval.mjs`	Retrieval and benchmark harness

Why Federated Search Matters

Federation is not just "more providers". It makes multi-provider value visible with compact, machine-readable gain metrics.

Key fields:

federated.providersUsed Providers that actually returned results.
federated.value.additionalProvidersUsed How many non-primary providers really contributed.
federated.value.resultsRecoveredByFanout Final results that would disappear in primary-only mode.
federated.value.resultsCorroboratedByFanout Final results supported by both the primary and at least one fanout provider.
federated.value.duplicateSavings Exact or near-duplicate results removed by merge.
routingSummary.federation.value The compact federation gain summary exposed alongside route explanation.

Example:

{
  "federated": {
    "providersUsed": ["serper", "tavily"],
    "value": {
      "additionalProvidersUsed": 1,
      "resultsWithFanoutSupport": 2,
      "resultsRecoveredByFanout": 1,
      "resultsCorroboratedByFanout": 1,
      "duplicateSavings": 1,
      "answerProvider": "tavily",
      "primarySucceeded": true
    }
  }
}

Interpretation:

resultsRecoveredByFanout=1 means federation recovered one final result that primary-only search would have missed
resultsCorroboratedByFanout=1 means another final result got multi-provider support
duplicateSavings=1 means the merge removed one duplicate instead of wasting result slots

Routing And Output Contract

The router combines five layers of truth:

provider capability facts
structured query signals
runtime policy from config.json
local health state
optional federation

Important fields:

selectedProvider The primary route. It is not the same thing as "the only provider used".
routingSummary Compact route explanation with selectionMode, confidence, topSignals, alternatives, and federation context.
routing.diagnostics Full route diagnostics exposed by --explain-routing or --plan.
federated.providersUsed The provider set that actually returned results when fanout is active.
federated.value Compact federation gain summary: added providers, recovered results, corroborated results, and duplicate savings.
cached / cache Cache hit plus age / TTL telemetry for agents.
renderLane Runtime availability and policy summary for the browser-backed render lane.
meta.searchType User-facing result surface selector. Current shipped values are web | news.
meta.intentPreset User-facing intent preset. Current shipped values are general | code | company | docs | research.

These are product-layer inputs, not provider ids.

Providers And Upgrade Paths

No API key is required for the baseline. Optional provider credentials or endpoints unlock stronger coverage.

TAVILY_API_KEY=tvly-xxxxx
EXA_API_KEY=exa-xxxxx
QUERIT_API_KEY=xxxxx
SERPER_API_KEY=xxxxx
BRAVE_API_KEY=xxxxx
SERPAPI_API_KEY=xxxxx
YOU_API_KEY=xxxxx
SEARXNG_INSTANCE_URL=https://searx.example.com

# Perplexity / Sonar: choose one transport path
PERPLEXITY_API_KEY=xxxxx
OPENROUTER_API_KEY=xxxxx
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
KILOCODE_API_KEY=xxxxx

# Or use a custom OpenAI-compatible gateway
PERPLEXITY_GATEWAY_API_KEY=xxxxx
PERPLEXITY_BASE_URL=https://gateway.example.com/v1
PERPLEXITY_MODEL=perplexity/sonar-pro

Provider roles:

Tavily: strongest premium default for general search, news, and extract
Exa: semantic retrieval and extract fallback
Querit: multilingual AI search with native geo and language filters
Serper: Google-like search with strong news and locale coverage
Brave: structured general web search
SerpAPI: multi-engine routing including Baidu and Yandex
You.com: LLM-ready web search with freshness and mixed web/news coverage
SearXNG: self-hosted privacy-first metasearch fallback
Perplexity / Sonar: answer-first grounded search via native or gateway transport
DDG: best-effort no-key baseline search
Fetch: no-key extract / crawl / map baseline
Render: optional local browser lane

Distribution Surfaces

The project has two honest distribution surfaces:

GitHub / local source tree The full surface, including render.mjs, eval.mjs, tests, and the deeper research toolchain.
ClawHub publish package A generated core profile built by scripts/build-clawhub-package.mjs.

Why this split exists:

local developers need the full runtime and benchmark surface
ClawHub moderation benefits from a narrower, more honest publish boundary
the registry package is still a code-backed Node runtime, not an instruction-only bundle

Detailed notes:

Boundaries

web-search-pro is strong at:

capability-aware retrieval
explainable routing
safe extract / crawl / map
structured research packs
local diagnostics and review surfaces

It is intentionally not:

a hosted remote scraping service
a final report writer
a browser-first crawler by default
an unlimited no-key search guarantee

Positioning vs Other Skills

vs lightweight web-search skills web-search-pro is heavier, but it gives you explainable routing, federation visibility, and a path into extract, crawl, map, and research.
vs search-router-first skills such as web-search-plus web-search-pro is broader as a retrieval stack. The differentiator is not only where to search, but what happens after search.
vs hosted scrape-first products web-search-pro stays local-first, more inspectable, and more explicit about safety boundaries and runtime behavior.

Safety

Safe fetch:

allows only http / https
blocks credential-bearing URLs
blocks localhost, private, and metadata targets
revalidates redirects
keeps JavaScript disabled

Browser render:

is off by default
uses a local headless browser only when enabled
revalidates navigations
can enforce same-origin-only navigation

Challenge and anti-bot interstitial pages are reported as failures, not silent successes.

Discovery Keywords

web search, news search, latest updates, current events, docs search, API docs, code search, company research, competitor analysis, site crawl, site map, multilingual search, Baidu search, Google-like search, answer-first search, cited answers, explainable routing, no-key baseline

Versioning

Product / docs version: 2.1
JSON schema version: 1.0

2.x is the retrieval-stack product line. Machine-readable payloads remain additive and compatible, so schema stays 1.0.

Docs

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
docs		docs
eval/cases		eval/cases
scripts		scripts
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
SKILL.md		SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Web Search Pro

What It Is

Choose This If

Do Not Choose This If

Why Developers Pick It

Quick Start

Option A: No-key baseline

Option B: Add one premium provider

First successful searches

Then try docs, news, and research

Core Commands

Why Federated Search Matters

Routing And Output Contract

Providers And Upgrade Paths

Distribution Surfaces

Boundaries

Positioning vs Other Skills

Safety

Discovery Keywords

Versioning

Docs

License

About

Uh oh!

Releases 7

Packages

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

Web Search Pro

What It Is

Choose This If

Do Not Choose This If

Why Developers Pick It

Quick Start

Option A: No-key baseline

Option B: Add one premium provider

First successful searches

Then try docs, news, and research

Core Commands

Why Federated Search Matters

Routing And Output Contract

Providers And Upgrade Paths

Distribution Surfaces

Boundaries

Positioning vs Other Skills

Safety

Discovery Keywords

Versioning

Docs

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Contributors 1

Languages

Packages