ccsearch

A CLI web search utility designed for easy use by LLM-based agents such as Claude Code, as well as by human users. It supports structured output (JSON) for agents and readable output (text) for humans.

Supported Engines

Brave Search (via Brave Search API): Best for getting a list of fast, accurate links and snippets. Supports pagination (--offset), safesearch, and time-based filtering.
Perplexity (via OpenRouter): Best for getting an intelligent, synthesized answer using online sources. Supports model selection, customizable temperature, and citation formatting.
LLM Context (via Brave LLM Context API): Returns pre-extracted, relevance-scored web content (smart chunks) optimized for LLM consumption. Extracts text, tables, code blocks, and structured data from multiple sources in a single API call — no scraping needed. Ideal for RAG pipelines and AI agent grounding.
Both (Concurrency): Runs the Brave and Perplexity searches in parallel and returns a merged result (a synthesized answer alongside raw source links).
Fetch: A built-in web scraper that downloads a given URL, parses it, and returns the cleaned text without HTML tags. Perfect for reading full articles when a snippet isn't enough. Uses curl_cffi for Chrome TLS fingerprint impersonation to access strict anti-bot sites (Facebook, LinkedIn, Medium, etc.), with full Chrome 146 headers and a Google Referer. Includes automatic FlareSolverr fallback for Cloudflare-protected pages and SPA shell detection that identifies JS-heavy pages (empty mount points, script-heavy HTML with little text) and auto-falls back to headless rendering. HTML extraction now prefers main / article / role="main" content when present to reduce layout noise. Non-HTML text responses are decoded directly, and supported binary documents (PDF, DOCX, PPTX, XLSX, etc.) can be converted to Markdown via optional MarkItDown integration. Twitter/X URLs are automatically intercepted and routed through the fxtwitter API to retrieve tweet content, author info, and engagement metrics without login.

Search-style engines also normalize their output for downstream agents:

Brave results include hostname, strip inline HTML tags, decode HTML entities, and deduplicate repeated URLs.
LLM Context results include hostname and age from Brave's sources payload when available, plus cleaned snippet text.
Perplexity responses preserve normalized citations when the upstream model returns them.
both preserves partial-failure visibility through brave_error or perplexity_error fields when one backend fails, and forwards perplexity_citations when available.
Search results also carry stable positional metadata such as rank, result_count, and brave_result_count / source_count where relevant.
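
The Brave normalization steps listed above can be sketched roughly as follows. This is an illustration only, not the code in ccsearch.py: the function name `normalize_results` and the input key `description` are assumptions, and the real tool may clean results differently.

```python
import html
import re
from urllib.parse import urlsplit

TAG_RE = re.compile(r"<[^>]+>")

def normalize_results(raw_results):
    """Strip inline tags, decode entities, add hostname and rank, dedupe URLs."""
    seen = set()
    cleaned = []
    for item in raw_results:
        url = item["url"]
        if url in seen:  # drop repeated URLs, keeping the first occurrence
            continue
        seen.add(url)
        # Remove tags first, then decode entities such as &amp;
        snippet = html.unescape(TAG_RE.sub("", item.get("description", "")))
        cleaned.append({
            "rank": len(cleaned) + 1,
            "url": url,
            "hostname": urlsplit(url).hostname,
            "description": snippet,
        })
    return {"results": cleaned, "result_count": len(cleaned)}

sample = [
    {"url": "https://react.dev/learn", "description": "Learn <strong>React</strong> &amp; hooks"},
    {"url": "https://react.dev/learn", "description": "duplicate"},
]
normalized = normalize_results(sample)
```

Ranks are assigned after deduplication, so they stay contiguous in the returned list.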

Requirements & Setup

Clone the repository:

git clone https://github.com/jamie950315/ccsearch.git
cd ccsearch

Install Python dependencies:
```
pip install -r requirements.txt
```
Copy the example configuration:
```
cp config.ini.example config.ini
```
Modify config.ini to adjust rate limits, models, filtering, or retry logic.
Add it to your CLI $PATH for global use:
```
mkdir -p ~/.local/bin
ln -sf $(pwd)/ccsearch.py ~/.local/bin/ccsearch
```
(Ensure ~/.local/bin is in your PATH so that you can run ccsearch from anywhere.)
Set your Environment Variables:
- For Brave Web Search: export BRAVE_API_KEY="your_brave_api_key"
- For LLM Context: export BRAVE_SEARCH_API_KEY="your_brave_search_plan_key" (falls back to BRAVE_API_KEY if not set; note that the LLM Context API requires a key from Brave's Search plan, which is separate from the Pro plan)
- For Perplexity: export OPENROUTER_API_KEY="your_openrouter_api_key"

Optional Fetch Extras

For richer binary document conversion in fetch, install MarkItDown with the formats you care about:
```
pip install 'markitdown[pdf,docx,pptx,xlsx]'
```
Without MarkItDown installed, fetch still works for HTML and plain-text responses, but supported binary documents return a clear error payload instead of low-quality extracted text.

Usage for Humans

# Brave Search (Text Output)
ccsearch "latest React documentation" -e brave --format text

# Brave Search (2nd page of results using offset)
ccsearch "latest React documentation" -e brave --format text --offset 1

# Perplexity Synthesis (Text Output)
ccsearch "What is the difference between Vue 3 and React 18?" -e perplexity --format text

# LLM Context (Pre-extracted smart chunks for grounding)
ccsearch "React hooks best practices" -e llm-context --format text

# Both Engines Concurrently (Merged Text Output)
ccsearch "What is the new React compiler?" -e both --format text

# Fetch a webpage's clean text
ccsearch "https://react.dev/blog/2025/10/07/react-compiler-1" -e fetch --format text

# Fetch a PDF (requires optional MarkItDown install for Markdown conversion)
ccsearch "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf" -e fetch --format json

# Fetch a tweet (auto-routed via fxtwitter API)
ccsearch "https://x.com/jack/status/20" -e fetch --format text

# Fetch a Twitter/X user profile
ccsearch "https://x.com/NASA" -e fetch --format text

# Run a mixed batch from JSON/JSONL with bounded concurrency
ccsearch --batch-file requests.json --batch-workers 4 --format json

# Keep only the top 3 results after host filtering/post-processing
ccsearch "OpenAI Responses API" -e brave --include-host developers.openai.com --limit 3 --format json

# Force FlareSolverr for a Cloudflare-protected page
ccsearch "https://some-cloudflare-site.com" -e fetch --format text --flaresolverr

# Inspect engine availability and current setup
ccsearch --list-engines --format json
ccsearch --doctor --format text

Advanced Usage

Caching Results

Exact Cache (`--cache`)

Caches results by an exact hash of the query string. Subsequent identical queries return instantly without hitting the API.

# Cache the result for the default 10 minutes
ccsearch "React 19 release date" -e perplexity --cache

# Cache the result for a custom duration (e.g., 60 minutes)
ccsearch "React 19 release date" -e perplexity --cache --cache-ttl 60

Cache files are stored in ~/.cache/ccsearch/ as JSON files keyed by MD5 hash of (query, engine, offset).

For the fetch engine, URLs are normalized before hashing so cache hits survive:

tracking parameters such as utm_*, fbclid, gclid, etc.
query parameter reordering
fragment-only differences
host casing and default port differences
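
A minimal sketch of this kind of URL normalization, using only the standard library, might look like the block below. The tracking-parameter list and the helper name `normalize_url` are assumptions for illustration; the set of parameters ccsearch actually strips may be larger.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking-parameter list; the real tool may strip a different set.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    # Drop tracking parameters and sort the rest so ordering does not matter.
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_NAMES and not k.startswith(TRACKING_PREFIXES)
    )
    # The fragment is dropped by passing an empty string.
    return urlunsplit((scheme, host, parts.path or "/", urlencode(query), ""))

a = normalize_url("https://Example.com:443/post?b=2&a=1&utm_source=x#intro")
b = normalize_url("https://example.com/post?a=1&b=2")
```

Both calls produce the same string, so both URLs would map to the same cache entry under this scheme.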

For search-style engines, exact cache keys also collapse repeated whitespace, so queries that differ only in spacing (for example, "React hooks" typed with one space or with several) reuse the same cache entry.
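
As a rough illustration of a whitespace-insensitive key over (query, engine, offset), consider the sketch below. The serialization via `json.dumps` and the function name `exact_cache_key` are assumptions; only the use of an MD5 hash over those three values comes from the description above.

```python
import hashlib
import json

def exact_cache_key(query, engine, offset=None):
    # Collapse runs of whitespace so spacing differences do not change the key.
    normalized_query = " ".join(query.split())
    payload = json.dumps([normalized_query, engine, offset])
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

key_a = exact_cache_key("React   hooks", "brave")
key_b = exact_cache_key("React hooks", "brave")
```

Here `key_a` and `key_b` are identical, while changing the engine or the offset yields a different key.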

Semantic Cache (`--semantic-cache`)

Extends exact caching with embedding-based similarity matching. If a semantically equivalent query was previously cached, the result is returned without a new API call — even if the wording differs.

Requires fastembed (pip install fastembed). Uses the BAAI/bge-small-en-v1.5 model (384-dim, ~40MB, runs entirely locally via ONNX).

# First search — result is cached and embedding is stored
ccsearch "Python asyncio event loop tutorial" -e brave --semantic-cache --cache-ttl 60

# Semantically similar query — returns the cached result (no API call)
ccsearch "Python asyncio event loop guide" -e brave --semantic-cache --cache-ttl 60
# Output includes: "_from_cache": true, "_semantic_similarity": 0.9434

Adjusting the similarity threshold (default 0.9, range 0.0–1.0):

# Stricter: only very close paraphrases hit the cache
ccsearch "Python asyncio tutorial" -e brave --semantic-cache --semantic-threshold 0.95

# Looser: broader topic matching (useful for exploratory queries)
ccsearch "Python asyncio tutorial" -e brave --semantic-cache --semantic-threshold 0.85

How it works:

On a cache miss, the query is embedded and stored alongside the result in ~/.cache/ccsearch/semantic_index.json
On a subsequent query, the new embedding is compared against all stored embeddings using cosine similarity
If the best match exceeds the threshold, the cached result is returned with _semantic_similarity set
Falls back to exact-match cache first (faster), then semantic search, then live API call
--semantic-cache implies --cache — no need to pass both flags
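
The similarity lookup can be sketched in plain Python as shown below. The three-dimensional vectors are toy stand-ins for the 384-dimensional fastembed embeddings, and the index layout and function names are assumptions rather than the actual format of semantic_index.json.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_lookup(query_vec, index, threshold=0.9):
    """Return (entry, similarity) for the best match above threshold, else (None, best)."""
    best_entry, best_score = None, 0.0
    for entry in index:
        score = cosine_similarity(query_vec, entry["embedding"])
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score > threshold:
        return best_entry, best_score
    return None, best_score

# Toy index: each entry pairs a cached query with its (fake) embedding.
index = [
    {"query": "Python asyncio event loop tutorial", "embedding": [0.9, 0.1, 0.0]},
    {"query": "Rust borrow checker", "embedding": [0.0, 0.2, 0.9]},
]
hit, score = semantic_lookup([0.88, 0.15, 0.02], index)
miss, _ = semantic_lookup([0.5, 0.5, 0.5], index)
```

The first lookup is close enough to the asyncio entry to count as a hit; the second is not close to either entry, so it would fall through to a live API call.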

Notes:

Applies to brave, perplexity, both, and llm-context engines. The fetch engine always uses exact URL matching.
If fastembed is not installed, a warning is printed and the tool continues without semantic matching.
The same --cache-ttl applies to both caches.

Benchmark results (Brave engine, 6 query pairs):

| Condition | Avg. latency |
| --- | --- |
| Cold API call | ~1,350 ms |
| Semantic cache hit | ~360 ms |
| Exact cache hit | ~95 ms |

Semantic cache delivers ~73% faster responses vs. cold API calls for similar queries.

HTTP API Server

ccsearch can also be accessed remotely via the built-in HTTP API server (api_server.py), allowing other LLMs and services to use ccsearch over the network.

Quick Start

# Start the server (default port 8888)
python3 api_server.py

# Or via systemd (production)
sudo systemctl start ccsearch-api

Authentication

All endpoints except /health require an X-API-Key header. The API key is resolved in this order:

CCSEARCH_API_KEY environment variable
.api_key file in the project directory (auto-generated on first run with 0600 permissions)
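
The resolution order above could be implemented along these lines. This is a hedged sketch, not the code in api_server.py: the helper name `resolve_api_key` and the generated key format (`secrets.token_urlsafe`) are assumptions.

```python
import os
import secrets
import tempfile
from pathlib import Path

def resolve_api_key(project_dir, environ=os.environ):
    # 1. The environment variable wins when set.
    env_key = environ.get("CCSEARCH_API_KEY")
    if env_key:
        return env_key
    # 2. Otherwise read .api_key, generating it on first run.
    key_file = Path(project_dir) / ".api_key"
    if key_file.exists():
        return key_file.read_text().strip()
    key = secrets.token_urlsafe(32)  # key format is an assumption
    # Create the file with 0600 permissions from the start.
    fd = os.open(key_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key)
    return key

with tempfile.TemporaryDirectory() as tmp:
    from_env = resolve_api_key(tmp, {"CCSEARCH_API_KEY": "abc"})
    generated = resolve_api_key(tmp, {})
    reread = resolve_api_key(tmp, {})
```

Opening the file with an explicit mode avoids a window in which the key is readable by other users.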

Endpoints

`GET /health`

Health check (no auth required).

curl https://ccsearch.0ruka.dev/health
# {"status": "ok", "service": "ccsearch-api"}

`POST /search`

Main search endpoint. Accepts a JSON body with the following fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `query` | string | Yes | Search query or URL (for fetch engine) |
| `engine` | string | Yes | `brave`, `perplexity`, `both`, `fetch`, or `llm-context` |
| `cache` | bool | No | Enable result caching (default: `false`) |
| `cache_ttl` | int | No | Cache TTL in minutes (default: `10`) |
| `semantic_cache` | bool | No | Enable semantic similarity cache (default: `false`) |
| `semantic_threshold` | float | No | Cosine similarity threshold (default: `0.9`) |
| `offset` | int | No | Pagination offset (`brave` and `both` only) |
| `result_limit` | int | No | Trim returned results for `brave`, `both`, and `llm-context` |
| `flaresolverr` | bool | No | Force FlareSolverr for fetch engine (default: `false`) |
| `include_hosts` | list/string | No | Host allow-list for `brave`, `both`, and `llm-context` |
| `exclude_hosts` | list/string | No | Host deny-list for `brave`, `both`, and `llm-context` |

All single-query responses now include:

cache_status: one of disabled, exact, semantic, or miss
duration_ms: end-to-end execution time for the request

Search-style engines also expose lightweight source-host summaries:

Brave / LLM Context: result_hosts, result_host_count
Perplexity: citation_hosts, citation_host_count when citations are available
Both: brave_result_hosts, brave_result_host_count, perplexity_citation_hosts, perplexity_citation_host_count

For brave, both, and llm-context, you can also apply host filters at request time:

include_hosts: only keep results from these hosts
exclude_hosts: drop results from these hosts
host_filtering: response metadata showing the normalized filters that were applied and how many results were removed
result_limit: trim the remaining result list to a stable top-N after filtering, with result_limiting metadata describing the applied limit and removed count
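
The filter-then-limit order described above can be sketched as follows. The exact-hostname matching and the inner keys of the `host_filtering` and `result_limiting` objects are assumptions for illustration; the server's actual metadata layout may differ.

```python
from urllib.parse import urlsplit

def shape_results(results, include_hosts=(), exclude_hosts=(), result_limit=None):
    include = {h.lower() for h in include_hosts}
    exclude = {h.lower() for h in exclude_hosts}
    kept = []
    for item in results:
        host = (urlsplit(item["url"]).hostname or "").lower()
        if include and host not in include:
            continue  # allow-list is active and this host is not on it
        if host in exclude:
            continue  # host is on the deny-list
        kept.append(item)
    # The limit is applied only after host filtering.
    limited = kept if result_limit is None else kept[:result_limit]
    return {
        "results": limited,
        # Metadata key layout below is hypothetical.
        "host_filtering": {"removed_count": len(results) - len(kept)},
        "result_limiting": {"limit": result_limit, "removed_count": len(kept) - len(limited)},
    }

sample = [
    {"url": "https://developers.openai.com/a"},
    {"url": "https://example.com/b"},
    {"url": "https://developers.openai.com/c"},
    {"url": "https://developers.openai.com/d"},
]
shaped = shape_results(sample, include_hosts=["developers.openai.com"], result_limit=2)
```

Because the limit runs last, a request for the top 2 results still returns 2 items when enough results survive the host filter.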

For fetch responses, the JSON payload now includes transport metadata such as:

final_url: final URL after redirects
status_code: HTTP status code when available
content_type: normalized MIME type without the charset suffix
content_length: response payload size in bytes when available
etag: HTTP ETag response header when available
last_modified: HTTP Last-Modified response header when available
filename: inferred filename from Content-Disposition or URL path when available
converted_via: present when a binary document was converted (for example, markitdown)
content_sha256: stable hash of the extracted text body for downstream deduplication
content_word_count: total extracted word count
chunks: structured content blocks extracted from the response body, useful for downstream summarization or reranking
- Each chunk keeps index, type, and text, and also includes lightweight metadata such as section_title, section_path, section_path_text, section_depth, char_count, word_count, relative_position, char_start, char_end, text_sha256, and chunk_id
- Link-bearing chunks also expose link_count, internal_link_count, and external_link_count
- Some chunk types also expose structure-specific metadata:
  - lists: list_item_count, list_ordered
  - tables: table_row_count, table_column_count, table_headers
  - code: code_language, code_line_count
chunk_count: total number of structured chunks
outbound_links: deduplicated page-level HTTP/HTTPS links with anchor text, source chunk index, hostname, and same-host classification
outbound_link_count: total unique outbound link count across all chunks
internal_outbound_link_count: same-host outbound links
external_outbound_link_count: off-site outbound links
outbound_hosts: unique hostnames referenced by the extracted outbound links

For HTML pages, fetch also extracts page metadata when available:

canonical_url: canonical URL from the page's <link rel="canonical">
lang: page language from the root HTML tag
description: page summary from standard or Open Graph meta tags
author: author metadata from common article meta tags
published_at: publish timestamp from common article meta tags

When those HTML meta tags are missing, ccsearch also falls back to JSON-LD article schemas and prunes common non-content UI blocks such as cookie banners and newsletter popups before extracting the main text. It also sniffs mislabeled HTML payloads (for example, pages served as application/octet-stream) so SPA fallback and metadata extraction still work on poorly configured sites. For HTML pages, lists and tables are preserved in a Markdown-like form inside both content and chunks. Code examples are preserved as fenced Markdown code blocks, and code chunks expose code_language when the page declares a recognizable language class such as language-python.
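
On the consumer side, the chunk metadata makes it straightforward to pull out specific content. The sketch below selects code chunks from a fetch payload; the sample payload is hand-written, and the `"code"` and `"paragraph"` type labels are assumptions about the values the `type` field takes.

```python
def code_chunks(payload, language=None):
    """Return code chunks from a fetch payload, optionally filtered by code_language."""
    selected = []
    for chunk in payload.get("chunks", []):
        if chunk.get("type") != "code":  # assumed type label
            continue
        if language and chunk.get("code_language") != language:
            continue
        selected.append(chunk)
    return selected

# Hand-written payload using a few of the documented chunk fields.
payload = {
    "chunks": [
        {"index": 0, "type": "paragraph", "text": "Install the package."},
        {"index": 1, "type": "code", "text": "pip install example", "code_language": "bash"},
        {"index": 2, "type": "code", "text": "print('hi')", "code_language": "python"},
    ],
    "chunk_count": 3,
}
python_blocks = code_chunks(payload, language="python")
```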

# Brave search
curl -X POST https://ccsearch.0ruka.dev/search \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"query": "React 19 new features", "engine": "brave"}'

# Perplexity synthesized answer
curl -X POST https://ccsearch.0ruka.dev/search \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"query": "What is the difference between Vue 3 and React 18?", "engine": "perplexity"}'

# Fetch a URL
curl -X POST https://ccsearch.0ruka.dev/search \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"query": "https://react.dev/blog", "engine": "fetch"}'

# With caching
curl -X POST https://ccsearch.0ruka.dev/search \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"query": "Python asyncio tutorial", "engine": "brave", "cache": true, "cache_ttl": 60}'
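
The same requests can be issued from Python with the standard library. The sketch below mirrors the curl calls above; the function names and the base URL `https://ccsearch.example.com` are placeholders, and only `build_search_request` is exercised here because `search` needs a running server.

```python
import json
import urllib.request

def build_search_request(base_url, api_key, query, engine, **options):
    # Extra keyword arguments become optional body fields such as cache or offset.
    body = {"query": query, "engine": engine, **options}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

def search(base_url, api_key, query, engine, timeout=30, **options):
    request = build_search_request(base_url, api_key, query, engine, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_search_request(
    "https://ccsearch.example.com", "YOUR_API_KEY",
    "Python asyncio tutorial", "brave", cache=True, cache_ttl=60,
)
```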

`POST /batch`

Execute multiple search and fetch requests in a single round-trip.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `requests` | array | Yes | List of request objects. Each entry may provide `query` or `url`, plus any per-request engine options |
| `defaults` | object | No | Default options merged into each entry (for example `engine`, `cache`, `cache_ttl`, `result_limit`, `include_hosts`, `exclude_hosts`) |
| `max_workers` | int | No | Maximum concurrent worker threads (defaults to `[Batch].max_workers`) |

The response includes:

results: per-request results in original order
count, success_count, error_count, has_errors
duration_ms: total batch runtime
max_workers: effective concurrency used
deduped_count: how many repeated requests were reused instead of executed again
engine_counts: request count by engine
Repeated identical requests inside the same batch are deduplicated automatically and reused in-place, with duplicate entries marked by _batch_deduped and _batch_deduped_from
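
One way to get order-preserving, deduplicated batch execution is sketched below. It is not the server's implementation: `run_batch` and `fake_execute` are hypothetical names, and storing the index of the first occurrence in `_batch_deduped_from` is an assumption about that field's value.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def run_batch(requests, execute, max_workers=4):
    first_seen = {}  # canonical request -> index of its first occurrence
    unique = []      # indices that actually get executed
    for index, request in enumerate(requests):
        key = json.dumps(request, sort_keys=True)
        if key not in first_seen:
            first_seen[key] = index
            unique.append(index)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order.
        outputs = dict(zip(unique, pool.map(lambda i: execute(requests[i]), unique)))
    results = []
    for index, request in enumerate(requests):
        source = first_seen[json.dumps(request, sort_keys=True)]
        result = dict(outputs[source])
        if source != index:
            # Marker value shape is an assumption.
            result["_batch_deduped"] = True
            result["_batch_deduped_from"] = source
        results.append(result)
    return {"results": results, "count": len(results),
            "deduped_count": len(results) - len(unique)}

calls = []

def fake_execute(request):
    calls.append(request["query"])
    return {"query": request["query"]}

batch = run_batch(
    [{"query": "a", "engine": "brave"},
     {"query": "b", "engine": "brave"},
     {"query": "a", "engine": "brave"}],
    fake_execute, max_workers=2,
)
```

Only two of the three requests are executed; the third reuses the first result and is flagged as a duplicate.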

curl -X POST https://ccsearch.0ruka.dev/batch \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
        "max_workers": 4,
        "defaults": {"cache": true, "cache_ttl": 30},
        "requests": [
          {"query": "React compiler release", "engine": "brave"},
          {"query": "https://react.dev/blog", "engine": "fetch"}
        ]
      }'

`GET /engines`

List available engines and their server-side capabilities.

curl https://ccsearch.0ruka.dev/engines \
  -H "X-API-Key: YOUR_API_KEY"

Each engine entry includes:

name, description, requires
category (search, answer, context, hybrid, or fetch)
supports_offset
supports_semantic_cache
supports_flaresolverr
supports_host_filter
supports_result_limit
required_env_vars
configured
configured_via

Invalid option combinations are rejected consistently across CLI, HTTP API, and MCP. Examples: offset is only valid for brave / both, and flaresolverr is only valid for fetch.
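
A shared validator along the following lines would give all three front ends the same rejection behavior. The function name and the error message wording are assumptions; only the two rules come from the examples above.

```python
OFFSET_ENGINES = {"brave", "both"}
FLARESOLVERR_ENGINES = {"fetch"}

def validate_options(engine, offset=None, flaresolverr=False):
    """Return a list of human-readable errors; an empty list means the options are valid."""
    errors = []
    if offset is not None and engine not in OFFSET_ENGINES:
        errors.append(f"offset is not supported by engine '{engine}'")
    if flaresolverr and engine not in FLARESOLVERR_ENGINES:
        errors.append(f"flaresolverr is not supported by engine '{engine}'")
    return errors

ok = validate_options("brave", offset=1)
bad = validate_options("perplexity", offset=1, flaresolverr=True)
```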

`GET /diagnostics`

Return runtime diagnostics without exposing secret values.

curl https://ccsearch.0ruka.dev/diagnostics \
  -H "X-API-Key: YOUR_API_KEY"

The response includes:

dependency availability (curl_cffi, fastembed, markitdown, mcp)
environment-key presence as booleans
fetch runtime state such as flaresolverr_configured and flaresolverr_mode
batch runtime defaults such as max_workers
the current engine list

Deployment

The API server runs as a systemd service (ccsearch-api.service) with automatic restart on failure. Environment variables (API keys, port) are loaded from .env.

sudo systemctl enable ccsearch-api   # Enable on boot
sudo systemctl start ccsearch-api    # Start
sudo systemctl status ccsearch-api   # Check status
journalctl -u ccsearch-api -f        # View logs

The service is exposed publicly via Cloudflare Tunnel at ccsearch.0ruka.dev.

MCP Server

mcp_server.py exposes ccsearch as an MCP (Model Context Protocol) server over both SSE and Streamable HTTP transport. It runs as an independent process alongside the Flask HTTP API, sharing the same ccsearch.py core and .env configuration.

Architecture

ccsearch.py (core search logic, shared)
    ├── api_server.py   (Flask HTTP API, port 8888)
    └── mcp_server.py   (MCP server, port 8890, SSE + Streamable HTTP)

Tools

| Tool | Description | Parameters |
| --- | --- | --- |
| `search` | Web search via brave/perplexity/both/llm-context engines | `query`, `engine`, `offset`, `result_limit`, `cache`, `cache_ttl`, `semantic_cache`, `semantic_threshold`, `include_hosts`, `exclude_hosts` |
| `fetch` | Fetch and extract text from a URL | `url`, `flaresolverr`, `cache`, `cache_ttl` |
| `batch` | Execute multiple search/fetch requests in one call | `requests`, optional shared defaults, `max_workers` |
| `engines` | List available engines and their capabilities | none |
| `diagnostics` | Return dependency and runtime diagnostics | none |

fetch returns the same metadata fields as the HTTP API (final_url, status_code, content_type, content_length, optional filename, optional converted_via, chunks, and HTML metadata such as canonical_url, lang, description, author, and published_at when present). Chunk metadata also includes section hierarchy fields (section_path, section_path_text, section_depth) for more precise citation and reranking workflows.

Authentication

Path-based authentication — the API key is embedded in the URL path:

SSE:             https://ccsearch-mcp.0ruka.dev/<CCSEARCH_API_KEY>/sse
Streamable HTTP: https://ccsearch-mcp.0ruka.dev/<CCSEARCH_API_KEY>/mcp

Requests to any other path (missing or incorrect key) receive a 401 Unauthorized response.
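
Conceptually, the path check works like the sketch below. This is an assumption-laden illustration, not the code in mcp_server.py: the helper name, the return shape, and the use of `hmac.compare_digest` are all choices made for the example.

```python
import hmac

def authorize_path(path, api_key):
    """Return (status, transport) for a path of the form /<key>/sse or /<key>/mcp."""
    segments = path.strip("/").split("/")
    if len(segments) != 2 or segments[1] not in ("sse", "mcp"):
        return 401, None
    # Constant-time comparison avoids leaking key prefixes through timing.
    if not hmac.compare_digest(segments[0].encode(), api_key.encode()):
        return 401, None
    return 200, segments[1]

accepted = authorize_path("/secret/sse", "secret")
rejected = authorize_path("/wrong/mcp", "secret")
```

Because the key is part of the URL, it can end up in access logs and client configuration files, so treat those as sensitive.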

Client Configuration

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "ccsearch": {
      "url": "https://ccsearch-mcp.0ruka.dev/<CCSEARCH_API_KEY>/sse"
    }
  }
}

Python MCP SDK (SSE):

import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    # Replace <KEY> with your CCSEARCH_API_KEY.
    async with sse_client("https://ccsearch-mcp.0ruka.dev/<KEY>/sse") as (r, w):
        async with ClientSession(r, w) as session:
            await session.initialize()
            result = await session.call_tool("search", {"query": "hello", "engine": "brave"})
            print(result)

asyncio.run(main())

Python MCP SDK (Streamable HTTP):

import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    # Replace <KEY> with your CCSEARCH_API_KEY.
    async with streamablehttp_client("https://ccsearch-mcp.0ruka.dev/<KEY>/mcp") as (r, w, _):
        async with ClientSession(r, w) as session:
            await session.initialize()
            result = await session.call_tool("search", {"query": "hello", "engine": "brave"})
            print(result)

asyncio.run(main())

Deployment

Runtime: Python 3.13 (/usr/bin/python3) with mcp>=1.26.0
Port: 8890 (configurable via CCSEARCH_MCP_PORT env var)
Systemd service: ccsearch-mcp.service
Cloudflare Tunnel: ccsearch-mcp.0ruka.dev → localhost:8890

sudo systemctl enable --now ccsearch-mcp.service
sudo systemctl status ccsearch-mcp

FlareSolverr Integration (Optional)

The fetch engine uses a multi-layered approach to access protected websites:

curl_cffi (recommended): Impersonates Chrome's TLS fingerprint (JA3/JA4), which bypasses most anti-bot detection (Facebook, LinkedIn, Medium, Instagram, etc.). Install with pip install curl_cffi. Falls back to requests if not installed.
FlareSolverr: For Cloudflare challenge pages and JS-rendered SPAs that require a real browser. FlareSolverr is a self-hosted proxy that uses a real Chromium browser to solve browser challenges.

Setup

Run FlareSolverr via Docker:

docker run -d --name flaresolverr -p 8191:8191 ghcr.io/flaresolverr/flaresolverr:latest

Add the URL to your config.ini:

[Fetch]
flaresolverr_url = http://localhost:8191/v1
flaresolverr_mode = fallback

Modes

fallback (default): Tries a normal HTTP request first. If it fails or detects a Cloudflare challenge, automatically retries through FlareSolverr.
always: Skips the normal request and always uses FlareSolverr. Useful for sites that are known to be protected.
never: Never uses FlareSolverr, even if configured.

You can also force FlareSolverr for a single invocation with the --flaresolverr CLI flag:

ccsearch "https://cloudflare-site.com" -e fetch --format json --flaresolverr

Detection

The tool automatically detects Cloudflare challenges by checking for:

"Just a moment..." in the page title
"Checking your browser", "cf-browser-verification", or "challenge-platform" in the response body
Suspiciously short responses (< 1KB) with a cf-ray header
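
The three checks above can be expressed as a small predicate. The sketch below is an approximation: the function name is hypothetical, 1KB is taken as 1024 bytes, and the real detector may inspect the response differently.

```python
import re

BODY_MARKERS = ("Checking your browser", "cf-browser-verification", "challenge-platform")
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def looks_like_cloudflare_challenge(body, headers):
    # Check 1: challenge text in the page title.
    title = TITLE_RE.search(body)
    if title and "Just a moment..." in title.group(1):
        return True
    # Check 2: known challenge markers anywhere in the body.
    if any(marker in body for marker in BODY_MARKERS):
        return True
    # Check 3: very short response that carries a cf-ray header.
    header_names = {name.lower() for name in headers}
    return len(body.encode("utf-8")) < 1024 and "cf-ray" in header_names

challenge = looks_like_cloudflare_challenge("<html><title>Just a moment...</title></html>", {})
normal = looks_like_cloudflare_challenge("<html>ok</html>", {})
```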

Advanced Configuration (`config.ini`)

You can deeply customize tool behavior by adjusting config.ini:

`[Brave]`

requests_per_second: Maximum requests per second, used to avoid being rate-limited or banned (Default: 1).
count: Number of results to fetch per request (Default: 10).
safesearch: Content filtering level: off, moderate, or strict.
freshness: Filter by time: pd (Past 24h), pw (Past week), pm (Past month), py (Past year). Leave blank for no limit.
max_retries: Auto-retry count for network timeouts or 429 Too Many Requests.

`[Perplexity]`

model: OpenRouter model string (e.g., perplexity/sonar, perplexity/sonar-pro).
citations: Set to true to require markdown citations [1] in the synthesized output.
temperature: Creativity control (0.0 - 1.0). Keep low (e.g., 0.1) for factual answering.
max_tokens: Hard limit on generation length to save costs.
max_retries: Auto-retry count for network anomalies.

`[LLMContext]`

count: Number of search results to consider for context extraction, 1-50 (Default: 20).
maximum_number_of_tokens: Approximate max tokens in the context response, 1024-32768 (Default: 8192). Lower for simple factual queries (~2048), higher for deep research (~16384).
maximum_number_of_urls: Maximum URLs in the response, 1-50 (Default: 20).
context_threshold_mode: Relevance filtering: strict (fewer, more precise), balanced (default), lenient (more results), or disabled (no filtering).
freshness: Same time-based filtering as Brave (pd, pw, pm, py).
max_retries: Auto-retry count for network anomalies.

`[Fetch]`

flaresolverr_url: FlareSolverr endpoint URL (e.g., http://localhost:8191/v1). Leave blank to disable.
flaresolverr_timeout: Max time in milliseconds for FlareSolverr to solve a challenge (Default: 60000).
flaresolverr_mode: Strategy: fallback, always, or never (Default: fallback).

`[Batch]`

max_workers: Maximum worker threads for batch execution (Default: 4).
Batch results preserve input order even when executed concurrently.
Use a modest value if you are close to Brave or OpenRouter rate limits.
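
For reference, reading a few of these options with the documented defaults could look like the sketch below, which uses the standard `configparser` module. The `load_settings` helper and the flat dictionary it returns are assumptions, not the loader ccsearch actually uses.

```python
import configparser

EXAMPLE = """
[Fetch]
flaresolverr_url = http://localhost:8191/v1
flaresolverr_mode = fallback

[Batch]
max_workers = 2
"""

def load_settings(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # fallback= supplies the documented default when a section or option is missing.
    return {
        "flaresolverr_url": parser.get("Fetch", "flaresolverr_url", fallback=""),
        "flaresolverr_timeout": parser.getint("Fetch", "flaresolverr_timeout", fallback=60000),
        "flaresolverr_mode": parser.get("Fetch", "flaresolverr_mode", fallback="fallback"),
        "max_workers": parser.getint("Batch", "max_workers", fallback=4),
        "brave_count": parser.getint("Brave", "count", fallback=10),
    }

settings = load_settings(EXAMPLE)
```

Section names are case-sensitive in `configparser`, so `[Fetch]` and `[fetch]` would be treated as different sections.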

🤖 Guide for LLMs & AI Agents

If you are an AI assistant (like Claude, Cursor, or ChatGPT) reading this, you can use this tool to browse the internet to answer user queries!

Features Built for Agents

JSON First: Use --format json to get machine-readable structures.
Resilience: The script has built-in timeouts and exponential backoff retry logic. If the network hiccups, ccsearch handles it safely, avoiding hangs.
Semantic Cache: Use --semantic-cache to skip redundant API calls when you're researching the same topic across multiple queries with slightly different wording. The _from_cache and _semantic_similarity fields in the JSON response tell you when a cached result was returned and how similar it was.
Result Shaping: Use --include-host, --exclude-host, and --limit on brave, both, and llm-context to keep only the sources and top-N items you actually want.
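
The retry behavior mentioned under Resilience follows the usual exponential backoff pattern, sketched below. The delay schedule, the exception types caught, and the function name are assumptions; the tool's own retry parameters come from max_retries in config.ini.

```python
import time

def call_with_retries(operation, max_retries=3, base_delay=1.0, sleep=time.sleep):
    attempt = 0
    while True:
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
            attempt += 1

# Simulated operation that fails twice before succeeding.
delays = []
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

outcome = call_with_retries(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example fast to run, since the recorded delays replace real waiting.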

How to use `ccsearch`

When the user asks you a question that requires up-to-date knowledge, run the python script directly using your bash/terminal tool.

Brave Search Example:

ccsearch "anthropic claude 3.5 sonnet release date" -e brave --format json

Use this when you need to research specific websites, gather URLs, or need diverse sources.

(Agent Tip: If you didn't find what you need in the first 10 results, you can fetch the next page by adding --offset 1)
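
If you drive ccsearch from Python rather than a shell, a thin wrapper like the one below keeps the flags consistent. It is a sketch: `build_argv` and `run_ccsearch` are hypothetical helpers, they assume ccsearch is on PATH, and they cover only some of the flags shown in this README.

```python
import json
import subprocess

def build_argv(query, engine, offset=None, cache=False, cache_ttl=None, limit=None):
    # Always request JSON so the output is machine-readable.
    argv = ["ccsearch", query, "-e", engine, "--format", "json"]
    if offset is not None:
        argv += ["--offset", str(offset)]
    if cache:
        argv.append("--cache")
    if cache_ttl is not None:
        argv += ["--cache-ttl", str(cache_ttl)]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv

def run_ccsearch(query, engine, timeout=120, **options):
    completed = subprocess.run(
        build_argv(query, engine, **options),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return json.loads(completed.stdout)

second_page = build_argv("latest React documentation", "brave", offset=1)
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or quotes.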

LLM Context Example:

ccsearch "React hooks best practices" -e llm-context --format json

Use this when you need pre-extracted web content optimized for LLM grounding. Returns smart chunks (text, tables, code blocks, structured data) from multiple sources in a single call — far more token-efficient than fetching pages individually. Requires BRAVE_SEARCH_API_KEY (or falls back to BRAVE_API_KEY).

Both Engines Example:

ccsearch "what are the architectural differences between Next.js app router and pages router" -e both --format json

Use this when you need a deeply synthesized answer but ALSO need immediate access to primary source URLs to read further context in the same query.

Fetch Webpage Example:

ccsearch "https://eslint.org/docs/latest/rules/no-unused-vars" -e fetch --format json

Use this when a prior search returned a promising URL, but the snippet wasn't detailed enough and you need to read the full page content. The JSON response includes transport metadata such as final_url, status_code, content_type, and content_length, plus HTML metadata like canonical_url, description, author, and published_at when the page exposes them.

Fetch Binary Document Example:

ccsearch "https://example.com/report.pdf" -e fetch --format json

Use this for PDFs or Office files. If MarkItDown is installed, supported binary documents are converted into Markdown and the JSON response includes "converted_via": "markitdown". If it is not installed, ccsearch returns a clear error telling you what is missing.

Fetch with FlareSolverr (Cloudflare bypass):

ccsearch "https://cloudflare-protected-site.com" -e fetch --format json --flaresolverr

Use this when a normal fetch fails due to Cloudflare protection. Requires FlareSolverr configured in config.ini. The JSON output includes a "fetched_via" field ("direct" or "flaresolverr") so you know which method was used. In fallback mode (default), Cloudflare is auto-detected and FlareSolverr is used automatically — no flag needed.

Semantic Cache Example:

ccsearch "Python asyncio event loop tutorial" -e brave --format json --semantic-cache --cache-ttl 60

Use --semantic-cache when researching a topic across multiple queries with slightly different wording. Semantically similar queries return the cached result instantly without a new API call. Check _from_cache and _semantic_similarity in the JSON output to know when a cache hit occurred. Requires pip install fastembed.

Error Handling

If the command returns an error about missing BRAVE_API_KEY, BRAVE_SEARCH_API_KEY, or OPENROUTER_API_KEY, immediately inform the user that they need to set the environment variable and provide them the exact export command they need to run in their terminal.
Don't try to guess URLs; use this tool instead!

Claude Code Skill (HTTP API Mode)

If you deploy ccsearch as a self-hosted HTTP server, you can install it as a Claude Code skill so that Claude automatically uses your server for all web searches — no CLI needed on the client machine.

Setup

Copy the skill file into your Claude Code skills directory:

mkdir -p ~/.claude/skills/ccsearch
cp skills/SKILL.md ~/.claude/skills/ccsearch/SKILL.md

Edit ~/.claude/skills/ccsearch/SKILL.md and replace every occurrence of YOUR_CCSEARCH_BASE_URL with your actual server URL (e.g., https://ccsearch.example.com).
Set the API key:
```
export CCSEARCH_API_KEY="your_api_key"
```

Once installed, Claude Code will automatically invoke /ccsearch whenever it needs to search the web, fetch URLs, or get LLM-optimized context — routing all requests through your server via curl.

The skill template is located at skills/SKILL.md.

Instructing Your AI to Use `ccsearch` (CLAUDE.md)

To ensure your AI agents (like Claude Code) automatically use this tool instead of hallucinating or using default, less-capable search tools, we highly recommend adding the following guidelines to your project's or home directory's CLAUDE.md (or equivalent system prompt instructions file).

Copy and paste the snippet below into your CLAUDE.md:

### AI Assistant Guidelines & Tools

**Web Search Instructions:**
- **DO NOT** use any default `Web Search Tool` provided by your environment.
- **INSTEAD**, you MUST use the custom `ccsearch` utility CLI tool located at `~/ccsearch/ccsearch.py` (which is also globally accessible as `ccsearch`).
- **PROACTIVE SEARCHING**: You must proactively search the web using `ccsearch` whenever the user asks about something you are unsure of, or whenever the topic involves relatively new technologies, news, or frameworks (released within the last year). Do not guess or hallucinate!
- **ITERATIVE RESEARCH**: You are encouraged to perform multiple rounds of searches with different keywords or different engines (Brave vs Perplexity) to compile the most accurate and reliable answer.
- **Why?** It utilizes Brave Search API (Web Search + LLM Context endpoints) and OpenRouter Perplexity, providing faster, more robust results with automatic error-handling and retries.
- **How to Use Examples (always use `--format json` for agents):**
  1. For finding specific links, documentation, or diverse web sources:
     `ccsearch "Next.js 14 hydration docs" -e brave --format json`
  2. For broad questions requiring a synthesized answer from the web (Use `--cache` to save time on repeated inquiries):
     `ccsearch "What are the latest breaking changes in React 19?" -e perplexity --format json --cache`
  3. For pre-extracted web content optimized for LLM grounding (smart chunks with structured data, code blocks, tables — no scraping needed):
     `ccsearch "React hooks best practices" -e llm-context --format json --cache`
     *(Preferred for RAG/grounding — returns query-relevant content from multiple sources in a single call, far more token-efficient than fetch.)*
  4. For complex research requiring BOTH an intelligent summary and raw URLs to read further:
     `ccsearch "Next.js app router architecture" -e both --format json --cache`
  5. **Use `--semantic-cache` when researching a topic across multiple related queries** to avoid redundant API calls — semantically similar queries reuse cached results:
     `ccsearch "React Server Components explained" -e perplexity --format json --semantic-cache --cache-ttl 60`
     *(Requires `pip install fastembed`. Check `_from_cache` and `_semantic_similarity` in the JSON output to know if a cached result was returned.)*
  6. If you didn't find what you need via Brave, you can fetch the next page of results:
     `ccsearch "Next.js 14 hydration docs" -e brave --format json --offset 1`
  7. **To read the FULL text of a specific URL (like a documentation page or article) when the search snippet isn't enough:**
     `ccsearch "https://react.dev/reference/react" -e fetch --format json`
  8. **If a fetch fails due to Cloudflare protection or JS-rendered content**, force FlareSolverr:
     `ccsearch "https://cloudflare-protected-site.com" -e fetch --format json --flaresolverr`
     *(Requires `flaresolverr_url` in `config.ini`. In `fallback` mode, Cloudflare is auto-detected — no flag needed. Check the `"fetched_via"` field in the JSON output to see which method was used.)*
- For the full tutorial and advanced parameters (like how to configure limits or handle missing APIs), please read the README located at `~/ccsearch/README.md` FIRST before making assumptions.
