feat: add nimble_web_search data source #261
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a0088290a
Greptile Summary
This PR adds
Confidence Score: 5/5
Safe to merge: the new plugin follows the established pattern, previous review feedback has been addressed, and the test suite is thorough. The implementation is structurally sound: typed config with validated enums, non-transient error short-circuit (previously addressed), small-limit truncation guard (previously addressed), HTML-escaped output, and 32 credential-free tests. The only findings are a stale test count in the README verification checklist and an unreachable safety-net return at the end of the retry loop; neither affects runtime behavior. No files require special attention.
Sequence Diagram
sequenceDiagram
participant Agent
participant nimble_web_search
participant NimbleSearchRetriever
participant NimbleAPI
Agent->>nimble_web_search: question (str)
Note over nimble_web_search: Truncate query to 400 chars
loop attempt in range(max_retries)
nimble_web_search->>NimbleSearchRetriever: ainvoke(question)
NimbleSearchRetriever->>NimbleAPI: HTTP search request
alt Success
NimbleAPI-->>NimbleSearchRetriever: docs[]
NimbleSearchRetriever-->>nimble_web_search: docs[]
Note over nimble_web_search: Render XML Document blocks
nimble_web_search-->>Agent: formatted result string
else Empty results (ValueError)
nimble_web_search-->>Agent: no results message (no retry)
else 401 Unauthorized
nimble_web_search-->>Agent: friendly key error message (no retry)
else 403 Forbidden
nimble_web_search-->>Agent: friendly entitlement message (no retry)
else Transient error
Note over nimble_web_search: sleep(2^attempt), retry
end
end
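The retry flow in the diagram can be sketched in plain Python. This is a hypothetical sketch, not the PR's implementation: `search_with_retries`, `HTTPStatusError`, and the message strings are invented here. It assumes the retriever exposes an async `ainvoke(query)` returning a list of documents and that auth failures surface as an exception carrying a `status_code`; the real `langchain-nimble` errors may look different. The `sleep` callable is injectable so tests can skip real waits.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Protocol

MAX_QUERY_CHARS = 400  # the diagram truncates the question to 400 chars


class HTTPStatusError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class Retriever(Protocol):
    async def ainvoke(self, query: str) -> List[object]: ...


async def search_with_retries(
    retriever: Retriever,
    question: str,
    render: Callable[[List[object]], str],
    max_retries: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    query = question[:MAX_QUERY_CHARS]
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            docs = await retriever.ainvoke(query)
        except ValueError:
            # Empty results: a retry would not change the answer.
            return f"No web results found for: {query}"
        except HTTPStatusError as exc:
            # 401/403 are non-transient, so short-circuit without retrying.
            if exc.status_code == 401:
                return "Nimble rejected the API key (HTTP 401); check NIMBLE_API_KEY."
            if exc.status_code == 403:
                return "This Nimble key is not entitled to that search depth (HTTP 403)."
            last_error = exc
        except Exception as exc:  # anything else is treated as transient
            last_error = exc
        else:
            return render(docs)
        if attempt < max_retries - 1:
            await sleep(2 ** attempt)  # exponential backoff: 1, 2, 4, ... seconds
    return f"Web search failed after {max_retries} attempts: {last_error}"
```

Sleeping only between attempts lets the loop fall through to a single, reachable failure message after the last attempt, rather than ending in a safety-net return that can never execute.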
Reviews (5): Last reviewed commit: "fix(nimble_web_search): add typed focus ..."
Adds a Nimble web search integration mirroring exa_web_search and tavily_web_search. The new sources/nimble_web_search package wraps langchain-nimble's NimbleSearchRetriever, supports NIMBLE_API_KEY via env or config, and exposes lite/fast/deep search depths (fast is an enterprise-tier feature that surfaces a clear error on non-enterprise keys; lite is the default). It is wired into the workspace, deploy/Dockerfile (with --no-deps to preserve the frozen lockfile), and scripts/setup.sh. Includes unit tests and documentation updates across the configuration reference, extending guides, installation, deployment, faq, and troubleshooting. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…and clarify tool description

Expose `focus` as a typed Literal config (general, news, location, shopping, geo, social) defaulting to "general", and pass it explicitly to NimbleSearchRetriever. The upstream SDK field is an unvalidated str defaulting to "general"; the Literal adds parse-time validation and makes the general default explicit. `focus` is a workflow-config setting, not an agent-chosen parameter, so general research queries cannot silently switch to news. Tighten the tool description the agent sees to state that it is a general-purpose web/research search. include_answer remains unexposed in this initial integration.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
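The parse-time validation this commit describes comes from typing the fields as `Literal`. The PR's config is a `FunctionBaseConfig`; the sketch below mimics the same check with a stdlib dataclass so it runs without any framework. `NimbleSearchSettings` and `retriever_kwargs` are hypothetical names, and the kwargs dict only illustrates passing `focus` explicitly. It does not reproduce the SDK's exact signature.

```python
from dataclasses import dataclass
from typing import Literal, get_args

SearchDepth = Literal["lite", "fast", "deep"]
Focus = Literal["general", "news", "location", "shopping", "geo", "social"]


@dataclass(frozen=True)
class NimbleSearchSettings:
    """Hypothetical stdlib stand-in for the PR's typed config."""

    search_depth: SearchDepth = "lite"
    focus: Focus = "general"

    def __post_init__(self) -> None:
        # Mimic parse-time validation: reject any value outside the Literal.
        for name, allowed in (("search_depth", get_args(SearchDepth)), ("focus", get_args(Focus))):
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")

    def retriever_kwargs(self) -> dict:
        # focus is always passed explicitly, so the "general" default never depends on the SDK.
        return {"search_depth": self.search_depth, "focus": self.focus}
```

For example, `NimbleSearchSettings(focus="sports")` fails immediately with a `ValueError`, instead of reaching the API as an unvalidated string.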
Summary
- `sources/nimble_web_search`: a NAT data source that wraps `langchain-nimble`'s `NimbleSearchRetriever`, mirroring the existing `exa_web_search` and `tavily_web_search` packages (typed config, stub-on-missing-key, retries, content truncation, XML-tagged output).
- `lite`/`fast`/`deep` `search_depth`, typed as a `Literal` so invalid values fail at config-parse time. `lite` is the default: metadata-only, token-cheap, and available on any account. `fast` is enterprise-tier and surfaces a clear 403 entitlement message on non-enterprise keys.
- A `focus` mode (default `general`, validated `Literal`), `country`/`locale` regional controls, and an optional `max_content_length` per-result cap.
- Wired into the workspace, `deploy/Dockerfile`, and `scripts/setup.sh` so it installs in dev, Docker, and container builds. The Docker layer also installs Nimble's runtime deps (`langchain-nimble`, `nimble-python`, lockfile-pinned) so `_type: nimble_web_search` resolves in built images, where the `--no-dev` sync would otherwise omit them.

Motivation
AI-Q ships Tavily- and Exa-backed web search today. Nimble provides web search and content extraction for AI agents; this adds it as a first-class alternative with the same ergonomics and config surface — handy for users who already have a Nimble subscription, prefer its regional coverage, or want to test across multiple search backends. The default provider is unchanged (Tavily stays the documented default).
It wraps the official `langchain-nimble` package (maintained by Nimble) rather than calling the HTTP API directly, so retry, auth, and response normalization come from the upstream integration, the same rationale as the Exa source (#181).

Configuration
NIMBLE_API_KEY=... # or set api_key: in the YAML

How it works
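A minimal sketch of the rendering step: each result becomes an XML `<Document>` block, falling back to `description` when `page_content` is empty (as in `lite` mode), cutting to `max_content_length`, and HTML-escaping untrusted fields. `Doc`, `render_documents`, `truncate`, and the exact tag layout are assumptions for illustration; the PR's real markup is not reproduced here.

```python
import html
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document: page_content plus metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def truncate(text: str, limit: Optional[int], marker: str = "...") -> str:
    if limit is None or len(text) <= limit:
        return text  # None disables truncation
    if limit <= len(marker):
        return text[:limit]  # hard cut: no room left for a marker
    return text[: limit - len(marker)] + marker


def render_documents(docs: Sequence[Doc], max_content_length: Optional[int] = None) -> str:
    blocks = []
    for doc in docs:
        # lite mode returns metadata only, so fall back to the description.
        body = doc.page_content or doc.metadata.get("description", "")
        body = truncate(body, max_content_length)
        url = html.escape(str(doc.metadata.get("url", "")), quote=True)
        title = html.escape(str(doc.metadata.get("title", "")), quote=True)
        blocks.append(
            f'<Document href="{url}">\n<title>{title}</title>\n{html.escape(body)}\n</Document>'
        )
    return "\n\n".join(blocks)
```

Because truncation runs before escaping, the cut can never split an entity such as `&amp;` in half. The hard cut for limits no larger than the marker mirrors the small-limit guard mentioned in the review.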
A real `lite` query with `NIMBLE_API_KEY` set, trimmed:

Each result renders as an XML `<Document>` block (the same shape the Tavily and Exa sources produce), so existing AI-Q agents consume it with no changes. To try it, point any existing web-search config at `_type: nimble_web_search` and run `nat run` (if the config sets `advanced_search: true`, use `search_depth: deep` instead).

How this was tested
- `uv run pytest sources/nimble_web_search`: 32 passed, credential-free (the SDK is mocked; no live network in CI).
- `uv run pytest sources/exa_web_search sources/nimble_web_search`: 46 passed, confirming the new package co-runs cleanly with a sibling source. The test module has a unique name and there is no `tests/__init__.py`, so there's no pytest collection collision when sources are collected together.
- `ruff check` and `ruff format --check`: clean (whole repo).
- `uv lock --check`: no drift.
- `detect-secrets`, `markdown-link-check` (all README/docs links resolve), `end-of-file-fixer`, `trailing-whitespace`, `check-added-large-files`, and `uv-lock`, matching the `AIQ CI` lint job.
- `nat info components --types function` lists `nimble_web_search` (1.0.0) next to `exa_web_search` and `tavily_web_search`, so `_type: nimble_web_search` resolves in a workflow.
- `langchain-nimble==3.0.0` + `nimble-python==0.18.0` install and import cleanly in a fresh environment the same way `deploy/Dockerfile` installs them, so `_type: nimble_web_search` resolves in built images, not only in editable dev installs.
- Live queries with a real `NIMBLE_API_KEY` across `lite` and `deep`, plus the non-enterprise `fast` path (returns the friendly 403 entitlement message). Output is redacted; no key is logged by construction.

Coverage: config defaults / all fields / invalid-enum rejection (incl. `focus`) / out-of-range numeric fields rejected / `focus` defaults to `general` and reaches the SDK / non-default `focus` passthrough / `include_answer` absent from config and kwargs / `FunctionBaseConfig` inheritance, the missing-key stub + warn-once, key-from-config env hydration, result rendering + description fallback, markup escaping of untrusted fields, `search_depth` and `country`/`locale` passthrough, query and content truncation (incl. small-limit hard-cut), empty-result handling, retry-then-succeed, non-transient (401/403) errors short-circuiting without retry, final-retry failure, and the 401 / 403 branches.

How this was reviewed
Deviations from the Exa source (all deliberate)
- `search_depth` (3-value enum), a typed `focus` mode (default `general`), plus `country`/`locale`, mirroring `langchain-nimble`'s surface, where Exa exposes `search_type`/`full_text`/`highlights`.
- `focus` is a workflow-config setting, not an agent parameter, so general research queries cannot drift to `news`.
- Falls back to `description` when `page_content` is empty, because Nimble's `lite` mode returns metadata only.
- `include_answer` (answer generation) is intentionally not exposed in this initial integration; it can be added in a follow-up.
- Untrusted result fields (`url`, `title`, body) are HTML-escaped before rendering into the `<Document>` markup, so a result can't break the block or inject into downstream parsers.
- Numeric bounds: `max_results` 1-100 (matching `langchain-nimble`'s own `ge=1, le=100`), `max_retries` `ge=1`, `max_content_length` `ge=1` (use `None` to disable truncation). Invalid values fail at config-parse time, and content truncation hard-cuts safely for very small limits.

Known limitations
- `max_results` is a soft cap: Nimble may return up to N+2 documents for N. The provider returns them all; downstream consumers can slice.
- `lite` mode returns empty `page_content`; the provider renders the `description` (~150 chars, organic-result quality).
- The `fast` path is characterized via its 403 message; the enterprise `fast` behavior itself isn't exercised here.

Scope
In: the `nimble_web_search` provider, config/docs/deploy wiring, 32 unit tests, README, troubleshooting rows.

Not in (easy follow-ups): Nimble Extract / Map / Crawl / Agents; `include_answer`; framework integrations beyond AI-Q's data-source mechanism; any change to the default provider.

Security
- `deploy/.env.example` carries only a commented `NIMBLE_API_KEY=` placeholder.
- The key is a `SecretStr` config field and is never logged.
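A `SecretStr` field keeps the key out of logs because the type masks its value in `str()` and `repr()` and only reveals it through `get_secret_value()`. The stdlib stand-in below shows the effect when a whole config object is printed or logged. `RedactedSecret` and `NimbleConfigSketch` are hypothetical names, not the PR's classes.

```python
from dataclasses import dataclass


class RedactedSecret:
    """Hypothetical stdlib stand-in for pydantic's SecretStr."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The only way to read the raw key, so any call site that exposes it is explicit.
        return self._value

    def __str__(self) -> str:
        return "**********" if self._value else ""

    def __repr__(self) -> str:
        return f"RedactedSecret('{self}')"


@dataclass
class NimbleConfigSketch:
    api_key: RedactedSecret
    search_depth: str = "lite"


# Prints: NimbleConfigSketch(api_key=RedactedSecret('**********'), search_depth='lite')
print(NimbleConfigSketch(RedactedSecret("nk-live-123")))
```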