A CLI Web Search utility designed to be easily used by Large Language Models (LLMs) like Claude Code, as well as human users. It supports structured outputs (JSON) for agents and readable outputs (text) for humans.
- Brave Search (via Brave Search API): Best for getting a list of fast, accurate links and snippets. Supports pagination (`--offset`), safesearch, and time-based filtering.
- Perplexity (via OpenRouter): Best for getting an intelligent, synthesized answer using online sources. Supports model selection, customizable temperature, and citation formatting.
- LLM Context (via Brave LLM Context API): Returns pre-extracted, relevance-scored web content (smart chunks) optimized for LLM consumption. Extracts text, tables, code blocks, and structured data from multiple sources in a single API call, with no scraping needed. Ideal for RAG pipelines and AI agent grounding.
- Both (Concurrency): Runs both Brave and Perplexity searches in parallel, returning a merged outcome (a synthesized answer alongside raw source links).
- Fetch: A built-in web scraper that downloads a given URL, parses it, and returns the cleaned text without HTML tags. Perfect for reading full articles when a snippet isn't enough. Uses `curl_cffi` for Chrome TLS fingerprint impersonation to access strict anti-bot sites (Facebook, LinkedIn, Medium, etc.), with full Chrome 146 headers and a Google Referer. Includes automatic FlareSolverr fallback for Cloudflare-protected pages and SPA shell detection that identifies JS-heavy pages (empty mount points, script-heavy HTML with little text) and auto-falls back to headless rendering. HTML extraction now prefers `main` / `article` / `role="main"` content when present to reduce layout noise. Non-HTML text responses are decoded directly, and supported binary documents (PDF, DOCX, PPTX, XLSX, etc.) can be converted to Markdown via optional MarkItDown integration. Twitter/X URLs are automatically intercepted and routed through the fxtwitter API to retrieve tweet content, author info, and engagement metrics without login.
Search-style engines also normalize their output for downstream agents:
- Brave results include `hostname`, strip inline HTML tags, decode HTML entities, and deduplicate repeated URLs.
- LLM Context results include `hostname` and `age` from Brave's `sources` payload when available, plus cleaned snippet text.
- Perplexity responses preserve normalized `citations` when the upstream model returns them.
- `both` preserves partial-failure visibility through `brave_error` or `perplexity_error` fields when one backend fails, and forwards `perplexity_citations` when available.
- Search results also carry stable positional metadata such as `rank`, `result_count`, and `brave_result_count` / `source_count` where relevant.
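The normalization steps listed for Brave results can be pictured with a minimal sketch. This is not the code in `ccsearch.py`: the function name `normalize_results` and the input keys `url`, `title`, and `description` are assumptions made for illustration.

```python
import html
import re
from urllib.parse import urlsplit

# Matches anything that looks like an inline HTML tag, e.g. <strong> or </b>.
TAG_RE = re.compile(r"<[^>]+>")


def _clean(text):
    """Strip inline tags first, then decode entities such as &amp;."""
    return html.unescape(TAG_RE.sub("", text)).strip()


def normalize_results(raw_results):
    """Hypothetical helper: clean text, attach hostnames, dedupe URLs, add ranks."""
    seen = set()
    cleaned = []
    for item in raw_results:
        url = item.get("url", "")
        if not url or url in seen:
            continue  # skip empty and repeated URLs
        seen.add(url)
        cleaned.append({
            "rank": len(cleaned) + 1,  # 1-based position after deduplication
            "url": url,
            "hostname": (urlsplit(url).hostname or "").lower(),
            "title": _clean(item.get("title", "")),
            "description": _clean(item.get("description", "")),
        })
    return {"results": cleaned, "result_count": len(cleaned)}
```

Stripping tags before decoding entities means an escaped `&lt;fast&gt;` in a snippet survives as literal text instead of being removed as if it were markup.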
- Clone the repository:

      git clone https://github.com/jamie950315/ccsearch.git
      cd ccsearch

- Install Python dependencies:

      pip install -r requirements.txt

- Copy the example configuration:

      cp config.ini.example config.ini

  Modify `config.ini` to adjust rate limits, models, filtering, or retry logic.
- Add it to your CLI `$PATH` for global use:

      mkdir -p ~/.local/bin
      ln -sf $(pwd)/ccsearch.py ~/.local/bin/ccsearch

  (Ensure `~/.local/bin` is in your environment's `PATH` so you can just run `ccsearch` from anywhere.)
- Set your environment variables:
  - For Brave Web Search: `export BRAVE_API_KEY="your_brave_api_key"`
  - For LLM Context: `export BRAVE_SEARCH_API_KEY="your_brave_search_plan_key"` (falls back to `BRAVE_API_KEY` if not set; note that the LLM Context API requires a key from Brave's Search plan, which is separate from the Pro plan)
  - For Perplexity: `export OPENROUTER_API_KEY="your_openrouter_api_key"`
- For richer binary document conversion in `fetch`, install MarkItDown with the formats you care about: `pip install 'markitdown[pdf,docx,pptx,xlsx]'`
- Without MarkItDown installed, `fetch` still works for HTML and plain-text responses, but supported binary documents return a clear error payload instead of low-quality extracted text.
# Brave Search (Text Output)
ccsearch "latest React documentation" -e brave --format text
# Brave Search (2nd page of results using offset)
ccsearch "latest React documentation" -e brave --format text --offset 1
# Perplexity Synthesis (Text Output)
ccsearch "What is the difference between Vue 3 and React 18?" -e perplexity --format text
# LLM Context (Pre-extracted smart chunks for grounding)
ccsearch "React hooks best practices" -e llm-context --format text
# Both Engines Concurrently (Merged Text Output)
ccsearch "What is the new React compiler?" -e both --format text
# Fetch a webpage's clean text
ccsearch "https://react.dev/blog/2025/10/07/react-compiler-1" -e fetch --format text
# Fetch a PDF (requires optional MarkItDown install for Markdown conversion)
ccsearch "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf" -e fetch --format json
# Fetch a tweet (auto-routed via fxtwitter API)
ccsearch "https://x.com/jack/status/20" -e fetch --format text
# Fetch a Twitter/X user profile
ccsearch "https://x.com/NASA" -e fetch --format text
# Run a mixed batch from JSON/JSONL with bounded concurrency
ccsearch --batch-file requests.json --batch-workers 4 --format json
# Keep only the top 3 results after host filtering/post-processing
ccsearch "OpenAI Responses API" -e brave --include-host developers.openai.com --limit 3 --format json
# Force FlareSolverr for a Cloudflare-protected page
ccsearch "https://some-cloudflare-site.com" -e fetch --format text --flaresolverr
# Inspect engine availability and current setup
ccsearch --list-engines --format json
ccsearch --doctor --format text

Caches results by an exact hash of the query string. Subsequent identical queries return instantly without hitting the API.
# Cache the result for the default 10 minutes
ccsearch "React 19 release date" -e perplexity --cache
# Cache the result for a custom duration (e.g., 60 minutes)
ccsearch "React 19 release date" -e perplexity --cache --cache-ttl 60

Cache files are stored in `~/.cache/ccsearch/` as JSON files keyed by MD5 hash of `(query, engine, offset)`.
For the fetch engine, URLs are normalized before hashing so cache hits survive:
- tracking parameters such as `utm_*`, `fbclid`, `gclid`, etc.
- query parameter reordering
- fragment-only differences
- host casing and default port differences
For search-style engines, exact cache keys also normalize repeated whitespace, so a query with extra spaces such as `React   hooks` and the plain `React hooks` reuse the same cache entry.
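A minimal sketch of how such a cache key could be derived is shown below. It is an assumption-laden illustration, not the tool's actual code: the helper names `normalize_url` and `cache_key`, the JSON serialization of the `(query, engine, offset)` tuple, and the exact list of tracking parameters are all hypothetical.

```python
import hashlib
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)           # assumed prefix list
TRACKING_PARAMS = {"fbclid", "gclid"}   # assumed exact names
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Lowercase the host, drop default ports, tracking params, and fragments."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"  # keep only non-default ports
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_PARAMS
    )
    # The empty last element discards the fragment.
    return urlunsplit((scheme, host, parts.path or "/", urlencode(query), ""))


def cache_key(query, engine, offset=None):
    """MD5 over (query, engine, offset) after engine-specific normalization."""
    if engine == "fetch":
        query = normalize_url(query)
    else:
        query = " ".join(query.split())  # collapse repeated whitespace
    payload = json.dumps([query, engine, offset])
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

With these assumptions, `cache_key("https://EXAMPLE.com/a?x=1", "fetch")` and `cache_key("https://example.com/a?x=1&fbclid=z#top", "fetch")` produce the same key, while a different `offset` produces a different one.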
Extends exact caching with embedding-based similarity matching. If a semantically equivalent query was previously cached, the result is returned without a new API call, even if the wording differs.
Requires fastembed (pip install fastembed). Uses the BAAI/bge-small-en-v1.5 model (384-dim, ~40MB, runs entirely locally via ONNX).
# First search: result is cached and embedding is stored
ccsearch "Python asyncio event loop tutorial" -e brave --semantic-cache --cache-ttl 60
# Semantically similar query: returns the cached result (no API call)
ccsearch "Python asyncio event loop guide" -e brave --semantic-cache --cache-ttl 60
# Output includes: "_from_cache": true, "_semantic_similarity": 0.9434

Adjusting the similarity threshold (default 0.9, range 0.0 to 1.0):
# Stricter: only very close paraphrases hit the cache
ccsearch "Python asyncio tutorial" -e brave --semantic-cache --semantic-threshold 0.95
# Looser: broader topic matching (useful for exploratory queries)
ccsearch "Python asyncio tutorial" -e brave --semantic-cache --semantic-threshold 0.85

How it works:
- On a cache miss, the query is embedded and stored alongside the result in `~/.cache/ccsearch/semantic_index.json`.
- On a subsequent query, the new embedding is compared against all stored embeddings using cosine similarity.
- If the best match exceeds the threshold, the cached result is returned with `_semantic_similarity` set.
- Lookup order: exact-match cache first (faster), then semantic search, then a live API call.
- `--semantic-cache` implies `--cache`, so there is no need to pass both flags.
Notes:
- Applies to `brave`, `perplexity`, `both`, and `llm-context` engines. The `fetch` engine always uses exact URL matching.
- If `fastembed` is not installed, a warning is printed and the tool continues without semantic matching.
- The same `--cache-ttl` applies to both caches.
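The semantic lookup described above boils down to a best-match search under cosine similarity. The sketch below uses plain Python lists in place of real `fastembed` vectors; the function names and the index entry layout (`embedding`, `result`) are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_lookup(query_vec, index, threshold=0.9):
    """Return (entry, similarity) for the best match above threshold, else None.

    `index` is a hypothetical list of dicts with an "embedding" key.
    """
    best, best_sim = None, -1.0
    for entry in index:
        sim = cosine_similarity(query_vec, entry["embedding"])
        if sim > best_sim:
            best, best_sim = entry, sim
    if best is not None and best_sim > threshold:
        return best, best_sim
    return None
```

A linear scan like this is adequate for a small local index; a larger index would usually call for a vector store, which is outside what this README describes.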
Benchmark results (Brave engine, 6 query pairs):
| Condition | Avg. latency |
|---|---|
| Cold API call | ~1,350ms |
| Semantic cache hit | ~360ms |
| Exact cache hit | ~95ms |
Semantic cache hits cut average latency by ~73% compared with cold API calls for similar queries.
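The percentage follows directly from the table. A quick check, treating the approximate averages above as exact inputs (the function name `latency_reduction` is only illustrative):

```python
def latency_reduction(baseline_ms, measured_ms):
    """Percentage of the baseline latency that was saved."""
    return (baseline_ms - measured_ms) / baseline_ms * 100


cold_ms, semantic_ms, exact_ms = 1350, 360, 95

print(f"semantic hit: {latency_reduction(cold_ms, semantic_ms):.1f}% lower")  # 73.3% lower
print(f"exact hit:    {latency_reduction(cold_ms, exact_ms):.1f}% lower")     # 93.0% lower
```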
ccsearch can also be accessed remotely via the built-in HTTP API server (api_server.py), allowing other LLMs and services to use ccsearch over the network.
# Start the server (default port 8888)
python3 api_server.py
# Or via systemd (production)
sudo systemctl start ccsearch-api

All endpoints except `/health` require an `X-API-Key` header. The API key is resolved in this order:
- `CCSEARCH_API_KEY` environment variable
- `api_key` file in the project directory (auto-generated on first run with `0600` permissions)
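The resolution order can be sketched as follows. This is an illustration under assumptions, not the code in `api_server.py`: the function name `resolve_api_key` and the use of `secrets.token_urlsafe` to generate the key are hypothetical.

```python
import os
import secrets
from pathlib import Path


def resolve_api_key(project_dir):
    """Environment variable first, then the api_key file, creating it if absent."""
    env_key = os.environ.get("CCSEARCH_API_KEY")
    if env_key:
        return env_key

    key_file = Path(project_dir) / "api_key"
    if key_file.exists():
        return key_file.read_text().strip()

    # First run: generate a key and create the file with 0600 permissions.
    # O_EXCL makes the creation fail rather than overwrite an existing file.
    key = secrets.token_urlsafe(32)  # assumed generation scheme
    fd = os.open(key_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key)
    return key
```

Passing the mode to `os.open` means the file is never briefly readable by other users, which a separate `chmod` after writing would not guarantee.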
Health check (no auth required).
curl https://ccsearch.0ruka.dev/health
# {"status": "ok", "service": "ccsearch-api"}

Main search endpoint. Accepts a JSON body with the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | Search query or URL (for `fetch` engine) |
| `engine` | string | Yes | `brave`, `perplexity`, `both`, `fetch`, or `llm-context` |
| `cache` | bool | No | Enable result caching (default: `false`) |
| `cache_ttl` | int | No | Cache TTL in minutes (default: `10`) |
| `semantic_cache` | bool | No | Enable semantic similarity cache (default: `false`) |
| `semantic_threshold` | float | No | Cosine similarity threshold (default: `0.9`) |
| `offset` | int | No | Pagination offset (Brave only) |
| `result_limit` | int | No | Trim returned results for `brave`, `both`, and `llm-context` |
| `flaresolverr` | bool | No | Force FlareSolverr for `fetch` engine (default: `false`) |
| `include_hosts` | list/string | No | Host allow-list for `brave`, `both`, and `llm-context` |
| `exclude_hosts` | list/string | No | Host deny-list for `brave`, `both`, and `llm-context` |
All single-query responses now include:
- `cache_status`: one of `disabled`, `exact`, `semantic`, or `miss`
- `duration_ms`: end-to-end execution time for the request
Search-style engines also expose lightweight source-host summaries:
- Brave / LLM Context: `result_hosts`, `result_host_count`
- Perplexity: `citation_hosts`, `citation_host_count` when citations are available
- Both: `brave_result_hosts`, `brave_result_host_count`, `perplexity_citation_hosts`, `perplexity_citation_host_count`
For brave, both, and llm-context, you can also apply host filters at request time:
- `include_hosts`: only keep results from these hosts
- `exclude_hosts`: drop results from these hosts
- `host_filtering`: response metadata showing the normalized filters that were applied and how many results were removed
- `result_limit`: trim the remaining result list to a stable top-N after filtering, with `result_limiting` metadata describing the applied limit and removed count
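The filter-then-limit order can be illustrated with a small sketch. Everything beyond the field names `include_hosts`, `exclude_hosts`, `result_limit`, `host_filtering`, and `result_limiting` is an assumption: the function name `shape_results`, exact-hostname matching, comma-separated string input, and the inner metadata keys are hypothetical.

```python
from urllib.parse import urlsplit


def _normalize_hosts(hosts):
    """Accept a list or a comma-separated string; return lowercase hostnames."""
    if hosts is None:
        return set()
    if isinstance(hosts, str):
        hosts = hosts.split(",")
    return {host.strip().lower() for host in hosts if host.strip()}


def shape_results(results, include_hosts=None, exclude_hosts=None, result_limit=None):
    """Apply host filters first, then trim to the top-N remaining results."""
    include = _normalize_hosts(include_hosts)
    exclude = _normalize_hosts(exclude_hosts)

    kept = []
    for item in results:
        host = (urlsplit(item["url"]).hostname or "").lower()
        if include and host not in include:
            continue  # allow-list is active and this host is not on it
        if host in exclude:
            continue  # deny-list match
        kept.append(item)

    limited = kept if result_limit is None else kept[:result_limit]
    return {
        "results": limited,
        "host_filtering": {
            "include_hosts": sorted(include),
            "exclude_hosts": sorted(exclude),
            "removed_count": len(results) - len(kept),
        },
        "result_limiting": {
            "limit": result_limit,
            "removed_count": len(kept) - len(limited),
        },
    }
```

Limiting after filtering is what keeps the top-N stable: the same filters and limit always select the same leading results, regardless of how many off-host results were interleaved.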
For fetch responses, the JSON payload now includes transport metadata such as:
- `final_url`: final URL after redirects
- `status_code`: HTTP status code when available
- `content_type`: normalized MIME type without the charset suffix
- `content_length`: response payload size in bytes when available
- `etag`: HTTP `ETag` response header when available
- `last_modified`: HTTP `Last-Modified` response header when available
- `filename`: inferred filename from `Content-Disposition` or URL path when available
- `converted_via`: present when a binary document was converted (for example, `markitdown`)
- `content_sha256`: stable hash of the extracted text body for downstream deduplication
- `content_word_count`: total extracted word count
- `chunks`: structured content blocks extracted from the response body, useful for downstream summarization or reranking
  - Each chunk keeps `index`, `type`, and `text`, and also includes lightweight metadata such as `section_title`, `section_path`, `section_path_text`, `section_depth`, `char_count`, `word_count`, `relative_position`, `char_start`, `char_end`, `text_sha256`, and `chunk_id`
  - Link-bearing chunks also expose `link_count`, `internal_link_count`, and `external_link_count`
  - Some chunk types also expose structure-specific metadata:
    - lists: `list_item_count`, `list_ordered`
    - tables: `table_row_count`, `table_column_count`, `table_headers`
    - code: `code_language`, `code_line_count`
- `chunk_count`: total number of structured chunks
- `outbound_links`: deduplicated page-level HTTP/HTTPS links with anchor text, source chunk index, hostname, and same-host classification
- `outbound_link_count`: total unique outbound link count across all chunks
- `internal_outbound_link_count`: same-host outbound links
- `external_outbound_link_count`: off-site outbound links
- `outbound_hosts`: unique hostnames referenced by the extracted outbound links
For HTML pages, fetch also extracts page metadata when available:
- `canonical_url`: canonical URL from the page's `<link rel="canonical">`
- `lang`: page language from the root HTML tag
- `description`: page summary from standard or Open Graph meta tags
- `author`: author metadata from common article meta tags
- `published_at`: publish timestamp from common article meta tags
When those HTML meta tags are missing, ccsearch also falls back to JSON-LD article schemas and prunes common non-content UI blocks such as cookie banners and newsletter popups before extracting the main text.
It also sniffs mislabeled HTML payloads (for example, pages served as application/octet-stream) so SPA fallback and metadata extraction still work on poorly configured sites.
For HTML pages, lists and tables are preserved in a Markdown-like form inside both content and chunks.
Code examples are preserved as fenced Markdown code blocks, and code chunks expose code_language when the page declares a recognizable language class such as language-python.
# Brave search
curl -X POST https://ccsearch.0ruka.dev/search \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{"query": "React 19 new features", "engine": "brave"}'
# Perplexity synthesized answer
curl -X POST https://ccsearch.0ruka.dev/search \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{"query": "What is the difference between Vue 3 and React 18?", "engine": "perplexity"}'
# Fetch a URL
curl -X POST https://ccsearch.0ruka.dev/search \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{"query": "https://react.dev/blog", "engine": "fetch"}'
# With caching
curl -X POST https://ccsearch.0ruka.dev/search \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{"query": "Python asyncio tutorial", "engine": "brave", "cache": true, "cache_ttl": 60}'

Execute multiple search and fetch requests in a single round-trip.
| Field | Type | Required | Description |
|---|---|---|---|
| `requests` | array | Yes | List of request objects. Each entry may provide `query` or `url`, plus any per-request engine options |
| `defaults` | object | No | Default options merged into each entry (for example `engine`, `cache`, `cache_ttl`, `result_limit`, `include_hosts`, `exclude_hosts`) |
| `max_workers` | int | No | Maximum concurrent worker threads (defaults to `[Batch].max_workers`) |
The response includes:
- `results`: per-request results in original order
- `count`, `success_count`, `error_count`, `has_errors`
- `duration_ms`: total batch runtime
- `max_workers`: effective concurrency used
- `deduped_count`: how many repeated requests were reused instead of executed again
- `engine_counts`: request count by engine
- Repeated identical requests inside the same batch are deduplicated automatically and reused in-place, with duplicate entries marked by `_batch_deduped` and `_batch_deduped_from`
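Order preservation, bounded concurrency, and in-batch deduplication fit together roughly as in the sketch below. It is not the server's implementation: `run_batch`, the `execute` callback, keying duplicates by canonical JSON, and treating `_batch_deduped_from` as the index of the first identical request are all assumptions.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def run_batch(requests, execute, max_workers=4):
    """Run unique requests concurrently and reuse results for exact duplicates.

    `execute` is a hypothetical callable that takes one request dict and
    returns a result dict.
    """
    keys = [json.dumps(req, sort_keys=True) for req in requests]

    first_index = {}  # canonical key -> index of the first occurrence
    unique = []       # indexes that actually need to run
    for i, key in enumerate(keys):
        if key not in first_index:
            first_index[key] = i
            unique.append(i)

    # pool.map yields results in input order, so zip pairs them correctly.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = dict(zip(unique, pool.map(lambda i: execute(requests[i]), unique)))

    results = []
    for i, key in enumerate(keys):
        source = first_index[key]
        result = dict(outputs[source])  # copy so duplicates do not share state
        if source != i:
            result["_batch_deduped"] = True
            result["_batch_deduped_from"] = source
        results.append(result)

    return {
        "results": results,
        "count": len(results),
        "deduped_count": len(requests) - len(unique),
    }
```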
curl -X POST https://ccsearch.0ruka.dev/batch \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{
"max_workers": 4,
"defaults": {"cache": true, "cache_ttl": 30},
"requests": [
{"query": "React compiler release", "engine": "brave"},
{"query": "https://react.dev/blog", "engine": "fetch"}
]
}'

List available engines and their server-side capabilities.
curl https://ccsearch.0ruka.dev/engines \
-H "X-API-Key: YOUR_API_KEY"

Each engine entry includes:
- `name`, `description`, `requires`
- `category` (`search`, `answer`, `context`, `hybrid`, or `fetch`)
- `supports_offset`
- `supports_semantic_cache`
- `supports_flaresolverr`
- `supports_host_filter`
- `supports_result_limit`
- `required_env_vars`
- `configured`
- `configured_via`
Invalid option combinations are rejected consistently across CLI, HTTP API, and MCP.
Examples: offset is only valid for brave / both, and flaresolverr is only valid for fetch.
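A shared validator is one way to get the same rejections on every entry point. The sketch below encodes only the two rules stated above; the function name `validate_options` and the error wording are assumptions, and the real tool may enforce more combinations.

```python
KNOWN_ENGINES = {"brave", "perplexity", "both", "fetch", "llm-context"}
OFFSET_ENGINES = {"brave", "both"}   # offset is only valid here
FLARESOLVERR_ENGINES = {"fetch"}     # flaresolverr is only valid here


def validate_options(engine, offset=None, flaresolverr=False):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    if engine not in KNOWN_ENGINES:
        errors.append(f"unknown engine: {engine}")
    if offset is not None and engine not in OFFSET_ENGINES:
        errors.append(f"offset is not supported by engine '{engine}'")
    if flaresolverr and engine not in FLARESOLVERR_ENGINES:
        errors.append(f"flaresolverr is not supported by engine '{engine}'")
    return errors
```

Returning a list instead of raising on the first problem lets a caller report every invalid option in one response.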
Return runtime diagnostics without exposing secret values.
curl https://ccsearch.0ruka.dev/diagnostics \
-H "X-API-Key: YOUR_API_KEY"

The response includes:
- dependency availability (`curl_cffi`, `fastembed`, `markitdown`, `mcp`)
- environment-key presence as booleans
- fetch runtime state such as `flaresolverr_configured` and `flaresolverr_mode`
- batch runtime defaults such as `max_workers`
- the current engine list
The API server runs as a systemd service (ccsearch-api.service) with automatic restart on failure. Environment variables (API keys, port) are loaded from .env.
sudo systemctl enable ccsearch-api # Enable on boot
sudo systemctl start ccsearch-api # Start
sudo systemctl status ccsearch-api # Check status
journalctl -u ccsearch-api -f # View logs

The service is exposed publicly via Cloudflare Tunnel at `ccsearch.0ruka.dev`.
mcp_server.py exposes ccsearch as an MCP (Model Context Protocol) server over both SSE and Streamable HTTP transport. It runs as an independent process alongside the Flask HTTP API, sharing the same ccsearch.py core and .env configuration.
ccsearch.py (core search logic, shared)
├── api_server.py (Flask HTTP API, port 8888)
└── mcp_server.py (MCP server, port 8890, SSE + Streamable HTTP)
| Tool | Description | Parameters |
|---|---|---|
| `search` | Web search via brave/perplexity/both/llm-context engines | `query`, `engine`, `offset`, `result_limit`, `cache`, `cache_ttl`, `semantic_cache`, `semantic_threshold`, `include_hosts`, `exclude_hosts` |
| `fetch` | Fetch and extract text from a URL | `url`, `flaresolverr`, `cache`, `cache_ttl` |
| `batch` | Execute multiple search/fetch requests in one call | `requests`, optional shared `defaults`, `max_workers` |
| `engines` | List available engines and their capabilities | none |
| `diagnostics` | Return dependency and runtime diagnostics | none |
fetch returns the same metadata fields as the HTTP API (final_url, status_code, content_type, content_length, optional filename, optional converted_via, chunks, and HTML metadata such as canonical_url, lang, description, author, and published_at when present).
Chunk metadata also includes section hierarchy fields (section_path, section_path_text, section_depth) for more precise citation and reranking workflows.
Authentication is path-based, with the API key embedded in the URL path:
SSE: https://ccsearch-mcp.0ruka.dev/<CCSEARCH_API_KEY>/sse
Streamable HTTP: https://ccsearch-mcp.0ruka.dev/<CCSEARCH_API_KEY>/mcp
Requests to any other path (missing or incorrect key) receive a 401 Unauthorized response.
Claude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"ccsearch": {
"url": "https://ccsearch-mcp.0ruka.dev/<CCSEARCH_API_KEY>/sse"
}
}
}

Python MCP SDK (SSE):
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    async with sse_client("https://ccsearch-mcp.0ruka.dev/<KEY>/sse") as (r, w):
        async with ClientSession(r, w) as session:
            await session.initialize()
            await session.call_tool("search", {"query": "hello", "engine": "brave"})

asyncio.run(main())

Python MCP SDK (Streamable HTTP):
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("https://ccsearch-mcp.0ruka.dev/<KEY>/mcp") as (r, w, _):
        async with ClientSession(r, w) as session:
            await session.initialize()
            await session.call_tool("search", {"query": "hello", "engine": "brave"})

asyncio.run(main())

- Runtime: Python 3.13 (`/usr/bin/python3`) with `mcp>=1.26.0`
- Port: 8890 (configurable via `CCSEARCH_MCP_PORT` env var)
- Systemd service: `ccsearch-mcp.service`
- Cloudflare Tunnel: `ccsearch-mcp.0ruka.dev` → `localhost:8890`
sudo systemctl enable --now ccsearch-mcp.service
sudo systemctl status ccsearch-mcp

The `fetch` engine uses a multi-layered approach to access protected websites:
- curl_cffi (recommended): Impersonates Chrome's TLS fingerprint (JA3/JA4), which bypasses most anti-bot detection (Facebook, LinkedIn, Medium, Instagram, etc.). Install with `pip install curl_cffi`. Falls back to `requests` if not installed.
- FlareSolverr: For Cloudflare challenge pages and JS-rendered SPAs that require a real browser. FlareSolverr is a self-hosted proxy that uses a real Chromium browser to solve browser challenges.
  - Run FlareSolverr via Docker: `docker run -d --name flaresolverr -p 8191:8191 ghcr.io/flaresolverr/flaresolverr:latest`
  - Add the URL to your `config.ini`:

        [Fetch]
        flaresolverr_url = http://localhost:8191/v1
        flaresolverr_mode = fallback

The `flaresolverr_mode` setting accepts three values:
- `fallback` (default): Tries a normal HTTP request first. If it fails or detects a Cloudflare challenge, automatically retries through FlareSolverr.
- `always`: Skips the normal request and always uses FlareSolverr. Useful for sites that are known to be protected.
- `never`: Never uses FlareSolverr, even if configured.
You can also force FlareSolverr for a single invocation with the --flaresolverr CLI flag:
ccsearch "https://cloudflare-site.com" -e fetch --format json --flaresolverr

The tool automatically detects Cloudflare challenges by checking for:
- `"Just a moment..."` in the page title
- `"Checking your browser"`, `"cf-browser-verification"`, or `"challenge-platform"` in the response body
- Suspiciously short responses (< 1KB) with a `cf-ray` header
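The three checks translate into a short heuristic. The sketch below is an assumption about how they might be combined (any single check is treated as sufficient), and the function name `looks_like_cloudflare_challenge` is hypothetical.

```python
import re

BODY_MARKERS = (
    "Checking your browser",
    "cf-browser-verification",
    "challenge-platform",
)
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def looks_like_cloudflare_challenge(body, headers):
    """Return True when the response resembles a Cloudflare challenge page."""
    # 1. Challenge pages carry a recognizable title.
    title = TITLE_RE.search(body)
    if title and "Just a moment..." in title.group(1):
        return True

    # 2. Known challenge markers anywhere in the body.
    if any(marker in body for marker in BODY_MARKERS):
        return True

    # 3. Very short body (< 1 KB) served with a cf-ray header.
    header_names = {name.lower() for name in headers}
    return len(body.encode("utf-8")) < 1024 and "cf-ray" in header_names
```

The third check can flag a legitimately tiny page served through Cloudflare, which is an acceptable cost in `fallback` mode because the only consequence is a retry through FlareSolverr.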
You can deeply customize tool behavior by adjusting config.ini:
Brave settings:
- `requests_per_second`: Rate limit to avoid being banned (Default: `1`).
- `count`: Number of results to fetch per request (Default: `10`).
- `safesearch`: Content filtering level: `off`, `moderate`, or `strict`.
- `freshness`: Filter by time: `pd` (Past 24h), `pw` (Past week), `pm` (Past month), `py` (Past year). Leave blank for no limit.
- `max_retries`: Auto-retry count for network timeouts or 429 Too Many Requests.

Perplexity settings:
- `model`: OpenRouter model string (e.g., `perplexity/sonar`, `perplexity/sonar-pro`).
- `citations`: Set to `true` to require markdown citations `[1]` in the synthesized output.
- `temperature`: Creativity control (`0.0` - `1.0`). Keep low (e.g., `0.1`) for factual answering.
- `max_tokens`: Hard limit on generation length to save costs.
- `max_retries`: Auto-retry count for network anomalies.

LLM Context settings:
- `count`: Number of search results to consider for context extraction, 1-50 (Default: `20`).
- `maximum_number_of_tokens`: Approximate max tokens in the context response, 1024-32768 (Default: `8192`). Lower for simple factual queries (~2048), higher for deep research (~16384).
- `maximum_number_of_urls`: Maximum URLs in the response, 1-50 (Default: `20`).
- `context_threshold_mode`: Relevance filtering: `strict` (fewer, more precise), `balanced` (default), `lenient` (more results), or `disabled` (no filtering).
- `freshness`: Same time-based filtering as Brave (`pd`, `pw`, `pm`, `py`).
- `max_retries`: Auto-retry count for network anomalies.

Fetch settings (`[Fetch]`):
- `flaresolverr_url`: FlareSolverr endpoint URL (e.g., `http://localhost:8191/v1`). Leave blank to disable.
- `flaresolverr_timeout`: Max time in milliseconds for FlareSolverr to solve a challenge (Default: `60000`).
- `flaresolverr_mode`: Strategy: `fallback`, `always`, or `never` (Default: `fallback`).

Batch settings (`[Batch]`):
- `max_workers`: Maximum worker threads for batch execution (Default: `4`).
- Batch results preserve input order even when executed concurrently.
- Use a modest value if you are close to Brave or OpenRouter rate limits.
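For scripts that need the same settings, `config.ini` can be read with the standard-library `configparser`. The sketch below covers only the `[Fetch]` and `[Batch]` sections, whose names appear in this README; the function name `load_runtime_settings` and the returned dict layout are assumptions, and the defaults mirror the values documented above.

```python
import configparser

VALID_MODES = {"fallback", "always", "never"}


def load_runtime_settings(path="config.ini"):
    """Read [Fetch] and [Batch] options, falling back to documented defaults."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored by configparser

    mode = parser.get("Fetch", "flaresolverr_mode", fallback="fallback").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"invalid flaresolverr_mode: {mode!r}")

    url = parser.get("Fetch", "flaresolverr_url", fallback="").strip()
    return {
        "flaresolverr_url": url or None,  # blank means FlareSolverr is disabled
        "flaresolverr_timeout": parser.getint("Fetch", "flaresolverr_timeout", fallback=60000),
        "flaresolverr_mode": mode,
        "max_workers": parser.getint("Batch", "max_workers", fallback=4),
    }
```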
If you are an AI assistant (like Claude, Cursor, or ChatGPT) reading this, you can use this tool to browse the internet to answer user queries!
- JSON First: Use `--format json` to get machine-readable structures.
- Resilience: The script has built-in timeouts and exponential backoff retry logic. If the network hiccups, `ccsearch` handles it safely, avoiding hangs.
- Semantic Cache: Use `--semantic-cache` to skip redundant API calls when you're researching the same topic across multiple queries with slightly different wording. The `_from_cache` and `_semantic_similarity` fields in the JSON response tell you when a cached result was returned and how similar it was.
- Result Shaping: Use `--include-host`, `--exclude-host`, and `--limit` on `brave`, `both`, and `llm-context` to keep only the sources and top-N items you actually want.
When the user asks you a question that requires up-to-date knowledge, run the python script directly using your bash/terminal tool.
Brave Search Example:
ccsearch "anthropic claude 3.5 sonnet release date" -e brave --format json

Use this when you need to research specific websites, gather URLs, or need diverse sources.
(Agent Tip: If you didn't find what you need in the first 10 results, you can fetch the next page by adding --offset 1)
LLM Context Example:
ccsearch "React hooks best practices" -e llm-context --format json

Use this when you need pre-extracted web content optimized for LLM grounding. Returns smart chunks (text, tables, code blocks, structured data) from multiple sources in a single call, which is far more token-efficient than fetching pages individually. Requires `BRAVE_SEARCH_API_KEY` (or falls back to `BRAVE_API_KEY`).
Both Engines Example:
ccsearch "what are the architectural differences between Next.js app router and pages router" -e both --format json

Use this when you need a deeply synthesized answer but ALSO need immediate access to primary source URLs to read further context in the same query.
Fetch Webpage Example:
ccsearch "https://eslint.org/docs/latest/rules/no-unused-vars" -e fetch --format json

Use this when a prior search returned a promising URL, but the snippet wasn't detailed enough and you need to read the full page content. The JSON response includes transport metadata such as `final_url`, `status_code`, `content_type`, and `content_length`, plus HTML metadata like `canonical_url`, `description`, `author`, and `published_at` when the page exposes them.
Fetch Binary Document Example:
ccsearch "https://example.com/report.pdf" -e fetch --format json

Use this for PDFs or Office files. If MarkItDown is installed, supported binary documents are converted into Markdown and the JSON response includes `"converted_via": "markitdown"`. If it is not installed, `ccsearch` returns a clear error telling you what is missing.
Fetch with FlareSolverr (Cloudflare bypass):
ccsearch "https://cloudflare-protected-site.com" -e fetch --format json --flaresolverr

Use this when a normal fetch fails due to Cloudflare protection. Requires FlareSolverr configured in `config.ini`. The JSON output includes a `"fetched_via"` field (`"direct"` or `"flaresolverr"`) so you know which method was used. In `fallback` mode (default), Cloudflare is auto-detected and FlareSolverr is used automatically, so no flag is needed.
Semantic Cache Example:
ccsearch "Python asyncio event loop tutorial" -e brave --format json --semantic-cache --cache-ttl 60

Use `--semantic-cache` when researching a topic across multiple queries with slightly different wording. Semantically similar queries return the cached result instantly without a new API call. Check `_from_cache` and `_semantic_similarity` in the JSON output to know when a cache hit occurred. Requires `pip install fastembed`.
- If the command returns an error about missing `BRAVE_API_KEY`, `BRAVE_SEARCH_API_KEY`, or `OPENROUTER_API_KEY`, immediately inform the user that they need to set the environment variable and provide them the exact `export` command they need to run in their terminal.
- Don't try to guess URLs; use this tool instead!
If you deploy ccsearch as a self-hosted HTTP server, you can install it as a Claude Code skill so that Claude automatically uses your server for all web searches, with no CLI needed on the client machine.
- Copy the skill file into your Claude Code skills directory:

      mkdir -p ~/.claude/skills/ccsearch
      cp skills/SKILL.md ~/.claude/skills/ccsearch/SKILL.md

- Edit `~/.claude/skills/ccsearch/SKILL.md` and replace all `YOUR_CCSEARCH_BASE_URL` with your actual server URL (e.g., `https://ccsearch.example.com`).
- Set the API key: `export CCSEARCH_API_KEY="your_api_key"`
Once installed, Claude Code will automatically invoke `/ccsearch` whenever it needs to search the web, fetch URLs, or get LLM-optimized context, routing all requests through your server via `curl`.
The skill template is located at skills/SKILL.md.
To ensure your AI agents (like Claude Code) automatically use this tool instead of hallucinating or using default, less-capable search tools, we highly recommend adding the following guidelines to your project's or home directory's CLAUDE.md (or equivalent system prompt instructions file).
Copy and paste the snippet below into your CLAUDE.md:
### AI Assistant Guidelines & Tools
**Web Search Instructions:**
- **DO NOT** use any default `Web Search Tool` provided by your environment.
- **INSTEAD**, you MUST use the custom `ccsearch` utility CLI tool located at `~/ccsearch/ccsearch.py` (which is also globally accessible as `ccsearch`).
- **PROACTIVE SEARCHING**: You must proactively search the web using `ccsearch` whenever the user asks about something you are unsure of, or whenever the topic involves relatively new technologies, news, or frameworks (released within the last year). Do not guess or hallucinate!
- **ITERATIVE RESEARCH**: You are encouraged to perform multiple rounds of searches with different keywords or different engines (Brave vs Perplexity) to compile the most accurate and reliable answer.
- **Why?** It utilizes Brave Search API (Web Search + LLM Context endpoints) and OpenRouter Perplexity, providing faster, more robust results with automatic error-handling and retries.
- **How to Use Examples (always use `--format json` for agents):**
1. For finding specific links, documentation, or diverse web sources:
`ccsearch "Next.js 14 hydration docs" -e brave --format json`
2. For broad questions requiring a synthesized answer from the web (Use `--cache` to save time on repeated inquiries):
`ccsearch "What are the latest breaking changes in React 19?" -e perplexity --format json --cache`
3. For pre-extracted web content optimized for LLM grounding (smart chunks with structured data, code blocks, tables; no scraping needed):
`ccsearch "React hooks best practices" -e llm-context --format json --cache`
*(Preferred for RAG/grounding: returns query-relevant content from multiple sources in a single call, far more token-efficient than fetch.)*
4. For complex research requiring BOTH an intelligent summary and raw URLs to read further:
`ccsearch "Next.js app router architecture" -e both --format json --cache`
5. **Use `--semantic-cache` when researching a topic across multiple related queries** to avoid redundant API calls, since semantically similar queries reuse cached results:
`ccsearch "React Server Components explained" -e perplexity --format json --semantic-cache --cache-ttl 60`
*(Requires `pip install fastembed`. Check `_from_cache` and `_semantic_similarity` in the JSON output to know if a cached result was returned.)*
6. If you didn't find what you need via Brave, you can fetch the next page of results:
`ccsearch "Next.js 14 hydration docs" -e brave --format json --offset 1`
7. **To read the FULL text of a specific URL (like a documentation page or article) when the search snippet isn't enough:**
`ccsearch "https://react.dev/reference/react" -e fetch --format json`
8. **If a fetch fails due to Cloudflare protection or JS-rendered content**, force FlareSolverr:
`ccsearch "https://cloudflare-protected-site.com" -e fetch --format json --flaresolverr`
*(Requires `flaresolverr_url` in `config.ini`. In `fallback` mode, Cloudflare is auto-detected, so no flag is needed. Check the `"fetched_via"` field in the JSON output to see which method was used.)*
- For the full tutorial and advanced parameters (like how to configure limits or handle missing APIs), please read the README located at `~/ccsearch/README.md` FIRST before making assumptions.