web-search-pro is an OpenClaw search skill and local Node retrieval runtime for agents. It can
search the live web, fetch and extract pages, crawl sites, map docs, and assemble research-ready
evidence packs with explainable routing.
- ClawHub: web-search-pro
- GitHub: Zjianru/web-search-pro
- OpenClaw archive: openclaw/skills/tree/main/skills/zjianru/web-search-pro
This project sits between a lightweight web-search skill and a full hosted scraping product.
The simplest mental model is:
- a search skill for agents
- a local retrieval runtime
- a bridge from `search` to `extract`, `crawl`, `map`, and `research`
Use it when search is only the beginning of the task and the agent may need to keep collecting, structuring, and handing off evidence.
Choose web-search-pro if you need:
- live web search and current-events lookup
- news search with explainable routing and visible fallback behavior
- official docs, API docs, and code lookup
- company, product, and competitor research
- site crawl, site map, and docs discovery
- a no-key baseline first, then premium providers only when needed
- structured outputs that an upstream model can keep using
In short: this is for developers who want one skill to cover search, retrieval, and evidence prep.
Do not choose web-search-pro if you primarily want:
- the lightest possible single-purpose web search wrapper
- a hosted remote scraping SaaS
- a browser-first crawler by default
- a narrative report writer that hides retrieval details
- an unlimited no-key search guarantee
If all you need is lightweight one-shot search, a smaller skill will usually be a better fit.
Compared with a plain search skill, the differentiators are:
- Explainable routing: `routingSummary` exposes why a provider was selected, how confident the planner is, and what the top signals were.
- Visible federated gains: `federated.value` shows what multi-provider fanout actually recovered, corroborated, or deduped.
- Search-to-research chain: one surface can move from `search` into `extract`, `crawl`, `map`, and `research`.
- No-key baseline: you can evaluate the skill without buying into a provider stack first.
- Agent-readable diagnostics: `doctor.mjs`, `bootstrap.mjs`, `capabilities.mjs`, and `review.mjs` expose runtime state and boundaries instead of leaving the model to guess.
There are two honest ways to approach this project:
- Install and use it as a skill: start from the ClawHub page or the OpenClaw archive entry if you want to use it inside OpenClaw.
- Run it from source: use the commands below if you want to evaluate the local runtime directly, inspect outputs, or contribute to the repo.
The shortest successful path is:
- start with the no-key baseline
- add one premium provider only when you need stronger recall or fresher results
- then try docs, news, and research flows
No API key is required for the first successful run.
The baseline is:

- `ddg` for best-effort web search
- `fetch` for extract, crawl, and map fallback

```bash
node scripts/doctor.mjs --json
node scripts/bootstrap.mjs --json
node scripts/search.mjs "OpenAI Responses API docs" --json
```

What these commands tell you:

- `doctor.mjs` tells you whether the runtime is usable right now
- `bootstrap.mjs` gives an agent-readable runtime snapshot
- `search.mjs` proves the baseline path works before you add premium providers
If you only add one premium provider, start with TAVILY_API_KEY.
That is the shortest upgrade path because one credential improves:
- general web search
- news search
- extract quality
```bash
export TAVILY_API_KEY=tvly-xxxxx
node scripts/doctor.mjs --json
node scripts/search.mjs "latest OpenAI news" --type news --json
node scripts/search.mjs "OpenClaw web search" --json
```

Then try the docs, news, and research flows:

```bash
node scripts/search.mjs "OpenAI Responses API docs" --preset docs --plan --json
node scripts/extract.mjs "https://platform.openai.com/docs" --json
node scripts/search.mjs "OpenAI Responses API docs" --preset docs --json
node scripts/search.mjs "latest OpenAI news" --type news --json
node scripts/research.mjs "OpenClaw search skill landscape" --plan --json
```

| Command | Purpose |
|---|---|
| `search.mjs` | Multi-provider search with explainable routing |
| `extract.mjs` | Single-page readable extraction |
| `render.mjs` | Forced browser-backed extraction through the local render lane |
| `crawl.mjs` | Safe BFS crawl |
| `map.mjs` | Site-structure discovery |
| `research.mjs` | Structured plan + evidence pack generation |
| `doctor.mjs` | Runtime diagnostics |
| `bootstrap.mjs` | Agent-readable runtime bootstrap contract |
| `capabilities.mjs` | Provider capability snapshot |
| `review.mjs` | Review and moderation summary |
| `cache.mjs` | Cache inspection |
| `health.mjs` | Provider health inspection |
| `eval.mjs` | Retrieval and benchmark harness |
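The commands above can be chained: the JSON output of one script feeds the next. As a minimal sketch, the snippet below builds an `extract.mjs` invocation from a search payload. The payload shape (a `results` array with `url` fields) and the `buildExtractCommand` helper are illustrative assumptions, not part of the shipped skill.

```javascript
// Sketch: chaining search.mjs output into extract.mjs.
// ASSUMPTION: the search payload exposes results[].url; buildExtractCommand
// is a hypothetical helper written for this README, not skill code.
function buildExtractCommand(searchPayload) {
  const top = (searchPayload.results || [])[0];
  if (!top || !top.url) return null; // nothing worth extracting
  return `node scripts/extract.mjs "${top.url}" --json`;
}

const payload = {
  results: [{ url: "https://platform.openai.com/docs", title: "API docs" }],
};

console.log(buildExtractCommand(payload));
// node scripts/extract.mjs "https://platform.openai.com/docs" --json
```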
Federation is not just "more providers". It makes multi-provider value visible with compact, machine-readable gain metrics.
Key fields:
- `federated.providersUsed`: providers that actually returned results.
- `federated.value.additionalProvidersUsed`: how many non-primary providers really contributed.
- `federated.value.resultsRecoveredByFanout`: final results that would disappear in primary-only mode.
- `federated.value.resultsCorroboratedByFanout`: final results supported by both the primary and at least one fanout provider.
- `federated.value.duplicateSavings`: exact or near-duplicate results removed by merge.
- `routingSummary.federation.value`: the compact federation gain summary exposed alongside route explanation.
Example:
```json
{
  "federated": {
    "providersUsed": ["serper", "tavily"],
    "value": {
      "additionalProvidersUsed": 1,
      "resultsWithFanoutSupport": 2,
      "resultsRecoveredByFanout": 1,
      "resultsCorroboratedByFanout": 1,
      "duplicateSavings": 1,
      "answerProvider": "tavily",
      "primarySucceeded": true
    }
  }
}
```

Interpretation:

- `resultsRecoveredByFanout = 1` means federation recovered one final result that primary-only search would have missed
- `resultsCorroboratedByFanout = 1` means another final result got multi-provider support
- `duplicateSavings = 1` means the merge removed one duplicate instead of wasting result slots
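An agent consuming this payload can collapse it into a one-line gain summary for its logs. A minimal sketch, using the field names from the example above; the `summarizeFederation` helper is hypothetical, not part of the skill:

```javascript
// Sketch: turning federated.value into an agent-loggable summary line.
// Field names follow the example payload above; summarizeFederation itself
// is an illustrative helper, not shipped skill code.
function summarizeFederation(federated) {
  const v = federated.value;
  return [
    `primary ${federated.providersUsed[0]} ${v.primarySucceeded ? "succeeded" : "failed"}`,
    `${v.additionalProvidersUsed} extra provider(s) contributed`,
    `${v.resultsRecoveredByFanout} result(s) recovered by fanout`,
    `${v.resultsCorroboratedByFanout} result(s) corroborated`,
    `${v.duplicateSavings} duplicate(s) removed`,
  ].join(", ");
}

const sample = {
  providersUsed: ["serper", "tavily"],
  value: {
    additionalProvidersUsed: 1,
    resultsRecoveredByFanout: 1,
    resultsCorroboratedByFanout: 1,
    duplicateSavings: 1,
    answerProvider: "tavily",
    primarySucceeded: true,
  },
};

console.log(summarizeFederation(sample));
```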
The router combines five layers of truth:
- provider capability facts
- structured query signals
- runtime policy from `config.json`
- local health state
- optional federation
Important fields:
- `selectedProvider`: the primary route. It is not the same thing as "the only provider used".
- `routingSummary`: compact route explanation with `selectionMode`, `confidence`, `topSignals`, alternatives, and federation context.
- `routing.diagnostics`: full route diagnostics exposed by `--explain-routing` or `--plan`.
- `federated.providersUsed`: the provider set that actually returned results when fanout is active.
- `federated.value`: compact federation gain summary: added providers, recovered results, corroborated results, and duplicate savings.
- `cached` / `cache`: cache hit plus age / TTL telemetry for agents.
- `renderLane`: runtime availability and policy summary for the browser-backed render lane.
- `meta.searchType`: user-facing result surface selector. Current shipped values are `web | news`.
- `meta.intentPreset`: user-facing intent preset. Current shipped values are `general | code | company | docs | research`.
These are product-layer inputs, not provider ids.
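One practical use of these fields: gate on planner confidence before trusting a route. A minimal sketch, assuming the `routingSummary` field names above; the 0.6 threshold and the `shouldRequestDiagnostics` helper are illustrative assumptions, not part of the shipped contract:

```javascript
// Sketch: acting on routingSummary from an agent.
// ASSUMPTION: the 0.6 confidence threshold and shouldRequestDiagnostics are
// illustrative; only the field names come from the docs above.
function shouldRequestDiagnostics(routingSummary, minConfidence = 0.6) {
  // Low planner confidence is a hint to rerun with --explain-routing or
  // --plan so the full routing.diagnostics payload can be inspected.
  return routingSummary.confidence < minConfidence;
}

const confidentRoute = { selectionMode: "auto", confidence: 0.92, topSignals: ["docs-intent"] };
const shakyRoute = { selectionMode: "auto", confidence: 0.41, topSignals: [] };

console.log(shouldRequestDiagnostics(confidentRoute)); // false
console.log(shouldRequestDiagnostics(shakyRoute));     // true
```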
No API key is required for the baseline. Optional provider credentials or endpoints unlock stronger coverage.
```bash
TAVILY_API_KEY=tvly-xxxxx
EXA_API_KEY=exa-xxxxx
QUERIT_API_KEY=xxxxx
SERPER_API_KEY=xxxxx
BRAVE_API_KEY=xxxxx
SERPAPI_API_KEY=xxxxx
YOU_API_KEY=xxxxx
SEARXNG_INSTANCE_URL=https://searx.example.com

# Perplexity / Sonar: choose one transport path
PERPLEXITY_API_KEY=xxxxx
OPENROUTER_API_KEY=xxxxx
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
KILOCODE_API_KEY=xxxxx

# Or use a custom OpenAI-compatible gateway
PERPLEXITY_GATEWAY_API_KEY=xxxxx
PERPLEXITY_BASE_URL=https://gateway.example.com/v1
PERPLEXITY_MODEL=perplexity/sonar-pro
```

Provider roles:
- Tavily: strongest premium default for general search, news, and extract
- Exa: semantic retrieval and extract fallback
- Querit: multilingual AI search with native geo and language filters
- Serper: Google-like search with strong news and locale coverage
- Brave: structured general web search
- SerpAPI: multi-engine routing including Baidu and Yandex
- You.com: LLM-ready web search with freshness and mixed web/news coverage
- SearXNG: self-hosted privacy-first metasearch fallback
- Perplexity / Sonar: answer-first grounded search via native or gateway transport
- DDG: best-effort no-key baseline search
- Fetch: no-key extract / crawl / map baseline
- Render: optional local browser lane
The project has two honest distribution surfaces:
- GitHub / local source tree: the full surface, including `render.mjs`, `eval.mjs`, tests, and the deeper research toolchain.
- ClawHub publish package: a generated core profile built by `scripts/build-clawhub-package.mjs`.
Why this split exists:
- local developers need the full runtime and benchmark surface
- ClawHub moderation benefits from a narrower, more honest publish boundary
- the registry package is still a code-backed Node runtime, not an instruction-only bundle
Detailed notes:
web-search-pro is strong at:
- capability-aware retrieval
- explainable routing
- safe extract / crawl / map
- structured research packs
- local diagnostics and review surfaces
It is intentionally not:
- a hosted remote scraping service
- a final report writer
- a browser-first crawler by default
- an unlimited no-key search guarantee
- vs lightweight `web-search` skills: `web-search-pro` is heavier, but it gives you explainable routing, federation visibility, and a path into `extract`, `crawl`, `map`, and `research`.
- vs search-router-first skills such as `web-search-plus`: `web-search-pro` is broader as a retrieval stack. The differentiator is not only where to search, but what happens after search.
- vs hosted scrape-first products: `web-search-pro` stays local-first, more inspectable, and more explicit about safety boundaries and runtime behavior.
Safe fetch:
- allows only
http/https - blocks credential-bearing URLs
- blocks localhost, private, and metadata targets
- revalidates redirects
- keeps JavaScript disabled
Browser render:
- is off by default
- uses a local headless browser only when enabled
- revalidates navigations
- can enforce same-origin-only navigation
Challenge and anti-bot interstitial pages are reported as failures, not silent successes.
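The safe-fetch rules above can be sketched as a URL guard. This is an illustrative reimplementation, not the skill's actual code; the hostname checks are a minimal assumption, and a real guard would also resolve DNS and revalidate every redirect hop against the same checks:

```javascript
// Sketch of the safe-fetch rules as a URL guard.
// ASSUMPTION: illustrative only; the real skill also revalidates redirects
// and keeps JavaScript disabled, which a static check cannot capture.
function isSafeFetchUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input is rejected outright
  }
  // Allow only http/https
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // Block credential-bearing URLs
  if (url.username || url.password) return false;
  // Block localhost, private ranges, and cloud metadata targets
  const host = url.hostname;
  if (host === "localhost" || host === "169.254.169.254") return false;
  if (host === "::1" || host === "[::1]") return false;
  if (/^(127\.|10\.|192\.168\.|169\.254\.)/.test(host)) return false;
  if (/^172\.(1[6-9]|2\d|3[01])\./.test(host)) return false;
  return true;
}

console.log(isSafeFetchUrl("https://example.com/docs"));      // true
console.log(isSafeFetchUrl("ftp://example.com"));             // false
console.log(isSafeFetchUrl("https://user:pw@example.com"));   // false
console.log(isSafeFetchUrl("http://169.254.169.254/latest")); // false
```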
web search, news search, latest updates, current events, docs search, API docs,
code search, company research, competitor analysis, site crawl, site map,
multilingual search, Baidu search, Google-like search, answer-first search,
cited answers, explainable routing, no-key baseline
- Product / docs version: 2.1
- JSON schema version: 1.0
2.x is the retrieval-stack product line. Machine-readable payloads remain additive and
compatible, so schema stays 1.0.
- README_zh.md
- CHANGELOG.md
- docs/search-routing-model.md
- docs/search-ux-model.md
- docs/research-layer.md
- docs/head-to-head-eval.md
- docs/clawhub-package.md
- docs/clawhub-compliance.md
- docs/marketing-launch-kit.md
- docs/agent-contract-p0.md
MIT