v0.3.11
A Claude Code plugin that makes Claude better at n8n. When it detects you're working on
n8n, hooks automatically recall curated n8n knowledge (docs, GitHub issues with status,
community workarounds, node specs) and inject it as context — no web-search permissions, no
MCP server, no API keys. When Claude writes a workflow JSON file, an optional PostToolUse
hook validates it against the n8n-mcp validation engine, feeds the errors back, and lets
Claude fix them in the same turn.
The knowledge is served by a hosted Hindsight
memory instance (bank n8n, 245k+ memories). Validation is served either by a cloud
validator microservice or by a local n8n-mcp install you point it at. Both the knowledge
service and the validator are open source and self-hostable — see
n8n-hindsight.
Privacy: auto-recall sends your prompt text to the author's hosted recall endpoint when n8n context is detected, and the optional validator can send your workflow JSON to a cloud service. Read "What data leaves your machine" before installing. The validator can run fully local (point it at a local n8n-mcp install). The knowledge database is not local: recall is served from the author's hosted n8n Hindsight bank. The n8n-hindsight stack is open source and self-hostable, but reproducing a populated bank of this size is substantial work, so in practice recall talks to the hosted endpoint unless you stand up and fill your own instance.
In Claude Code, add this repo as a plugin marketplace, then install:
```
/plugin marketplace add https://github.com/dbenn8/n8n-knowledge
/plugin install n8n-knowledge@n8n-knowledge
/reload-plugins
```

No setup, API keys, or configuration required to start. The plugin ships with the node lookup dictionary checked in and points at the hosted knowledge service by default.
See it working. The plugin injects context straight into Claude's turn, where only the model sees it. To watch exactly what it pulls in, tail its debug log in a second terminal:
```bash
tail -f ~/.cache/n8n-knowledge/debug.log
```

For every n8n-related prompt, the log shows the docs, node specs, and known-bug warnings that were injected. The log is owner-only (mode 0600) under your cache dir. The debugRecall option controls verbosity: summary by default, full for everything, off to disable.
To work on the plugin from a clone, add it as a local marketplace instead:
```bash
git clone https://github.com/dbenn8/n8n-knowledge.git
# In Claude Code:
# /plugin marketplace add /path/to/n8n-knowledge
# /plugin install n8n-knowledge@n8n-knowledge-local
```

Three repos, one system. This plugin is the client; the other two are the backend.
```mermaid
flowchart TD
    subgraph local["Your machine — Claude Code"]
        UP["UserPromptSubmit hook<br/>detect-n8n + keyword gate"]
        NL["node_lookup.py<br/>3,591-entry dictionary"]
        PT["PostToolUse hook<br/>workflow JSON validation<br/>(optional, off by default)"]
        BS["PostToolUse backstop<br/>mid-turn refresh"]
    end
    subgraph svc["n8n-hindsight — knowledge service (hosted by author)"]
        REC["/public/recall<br/>(unauthenticated, rate-limited)"]
        BANK["Hindsight bank: n8n<br/>245k+ memories"]
    end
    subgraph val["n8n-validator — validation microservice"]
        VW["/public/validate-workflow"]
        VH["/public/validator-health<br/>versions + nodes_content_sha256"]
    end
    LOCALV["Local n8n-mcp install<br/>(EVAL_PLUGIN_VALIDATOR_MODE=local)"]
    UP -->|"prompt text + tag filters"| REC
    NL --> UP
    BS -->|"fresh-keyword query"| REC
    REC --> BANK
    PT -->|"workflow JSON (cloud mode)"| VW
    PT -->|"workflow JSON (local mode)"| LOCALV
    PT -.->|"preflight parity check"| VH
    VH -.->|"hash/version match → run<br/>mismatch → fail closed"| PT
    %% Layout only: force the three sections to stack vertically (top → bottom)
    %% instead of an L-shape, so nothing sits in the bottom-right corner where
    %% GitHub's diagram zoom/pan controls render and would otherwise cover it.
    local ~~~ svc
    svc ~~~ val
    val ~~~ LOCALV
```
- Recall path: prompt text (plus tag filters for detected node names) goes to /public/recall, which serves from the n8n Hindsight bank. The endpoint is unauthenticated and rate-limited; the backend API key is added server-side by nginx.
- Validation path: workflow JSON goes to /public/validate-workflow (cloud) or to a local n8n-mcp install (local mode). Before an eval run trusts a validator, it compares the validator's nodes_content_sha256 and engine versions against its own via /public/validator-health and fails closed on mismatch, so plugin-time validation and post-hoc scoring can never silently use different node data.
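The fail-closed parity check in the validation path can be sketched as follows. This is a minimal illustration, not the harness's code. It assumes the health endpoint's JSON carries the nodes_content_sha256 field named above plus a `versions` mapping (a hypothetical key). It also hashes a canonical JSON serialization of the node list, which may not match how the real hash is computed.

```python
import hashlib
import json


class ValidatorMismatch(RuntimeError):
    """Raised when the validator's node data or engine versions differ from ours."""


def nodes_content_sha256(nodes: list) -> str:
    # Hash a canonical serialization so dict key order cannot change the digest.
    # Assumption: the real nodes_content_sha256 may be computed over different bytes.
    canonical = json.dumps(nodes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_parity(local_nodes: list, local_versions: dict, health: dict) -> None:
    """Fail closed: a mismatched or missing field aborts before any validation runs."""
    expected = nodes_content_sha256(local_nodes)
    remote = health.get("nodes_content_sha256")
    if remote != expected:
        raise ValidatorMismatch(f"node data hash mismatch: {remote!r} != {expected!r}")
    # Hypothetical key, e.g. {"n8n-mcp": "x.y.z"}; the real payload layout may differ.
    remote_versions = health.get("versions")
    if remote_versions != local_versions:
        raise ValidatorMismatch(
            f"engine version mismatch: {remote_versions!r} != {local_versions!r}"
        )
```

The caller sends workflow JSON only after check_parity returns. An exception stops the run instead of scoring against different node data.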
- Auto-recall: detects n8n keywords in your messages and injects relevant docs, issues, and community solutions as context (~5 results, sub-second).
- Manual recall: /n8n-knowledge searches deeper when auto-recall didn't trigger (~20 results).
- Confidence scoring: each result is annotated HIGH/MEDIUM/LOW based on source type and engagement metrics (votes, likes, views, solved status), with user-configurable thresholds.
- GitHub issue state: every GitHub result is prefixed with its canonical state, e.g. [OPEN], [CLOSED·completed·2026-02-26], or [CLOSED·not_planned·…]. The model is warned that [CLOSED·completed] usually means the bug is already fixed and [CLOSED·not_planned] means n8n won't fix it, so it avoids building a workaround for a bug that's already resolved.
- Backstop recall: refreshes n8n context during an agentic turn (after Edit/Write/Task), not just on your prompt. It is gated, deduped, and capped. See Backstop recall.
- Source citations: every result links to the specific doc page, GitHub issue, or community post.
- Node-name detection: identifies n8n node names mentioned in prompts via a 3,591-entry lookup dictionary (hooks/lib/node_lookup_data.json) covering name variants for official and community nodes. Handles trigger-intent detection ("listen for Gmail events" → gmailTrigger), camelCase splitting ("httpRequest" → "http request"), and compound service names ("sentryIo" → "sentry").
- Structured node-spec recall: when a node name is detected, issues a parallel tag-filtered recall (`type:node-spec` + `node:<type>`) that returns the node's operations, fields, types, and defaults, rendered as compact kind="node-spec" blocks at HIGH confidence.
- 13,000+ node-spec units: n8n-mcp's nodes.db ships 1,851 nodes, which are split into 13,000+ per-resource, per-operation spec units in the knowledge bank. Large multi-resource nodes like Slack (44 ops), Salesforce (65 ops), and Gmail (26 ops) are split so each operation's fields are individually recallable.
- 28 official workflow examples: node-level wiring context, topology maps, and full importable JSON. Sticky notes and source JSON are kept out of auto-recall to avoid context bloat but remain available via manual recall.
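The three normalizations in the node-name detection bullet (camelCase splitting, compound-service trimming, trigger intent) can be sketched as below. This is a toy, not node_lookup.py. The four-entry LOOKUP table, the suffix words, the trigger phrases, and the rule that the first-mentioned service becomes the trigger are all hypothetical stand-ins for what the 3,591-entry dictionary and the real matcher do.

```python
import re

# Hypothetical miniature of the lookup dictionary: normalized name -> node type.
LOOKUP = {
    "http request": "httpRequest",
    "gmail": "gmail",
    "sentry": "sentryIo",
    "slack": "slack",
}
# Assumed suffix words that decorate a service name ("sentryIo" -> "sentry").
COMPOUND_SUFFIXES = {"io", "app", "hq"}
# Assumed phrases signalling that the user wants a trigger node.
TRIGGER_PHRASES = ("listen for", "watch for", "when a new")


def split_camel(token: str) -> str:
    """'httpRequest' -> 'http request'."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", token).lower()


def normalize(token: str) -> str:
    """Split camelCase, then drop a trailing compound suffix word."""
    words = split_camel(token).split()
    if len(words) > 1 and words[-1] in COMPOUND_SUFFIXES:
        words = words[:-1]
    return " ".join(words)


def detect_nodes(prompt: str) -> list:
    """Return node types in order of first mention."""
    text = " ".join(normalize(tok) for tok in prompt.split())
    hits = []
    for name, node_type in LOOKUP.items():
        match = re.search(rf"\b{re.escape(name)}\b", text)
        if match:
            hits.append((match.start(), node_type))
    types = [node_type for _, node_type in sorted(hits)]
    wants_trigger = any(re.search(rf"\b{re.escape(p)}\b", text) for p in TRIGGER_PHRASES)
    if types and wants_trigger:
        # Heuristic: with trigger intent, the first-mentioned service becomes the trigger.
        types[0] += "Trigger"
    return types
```

With this toy table, `detect_nodes("listen for Gmail events and post to Slack")` returns `["gmailTrigger", "slack"]`.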
When enabled (enableWorkflowValidation), a PostToolUse hook fires after Claude writes or edits a workflow JSON file. It:
- runs only on plugin-side Edit/Write events, and only on workflow JSON;
- validates via the routing settings below (local, cloud, or default);
- injects the validator's errors back into the turn as additional context, with targeted edit guidance (parameter paths, allowed enum values) and a completeness gate so Claude fixes the workflow before declaring it done;
- caps validator calls per session (workflowValidationMaxCalls, default 3).
This hook is plugin-side only. It does not affect the eval harness conditions or the local post-hoc validation scripts.
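The gating rules in that list can be sketched like this. The sketch assumes the hook sees the tool name and the written file's full content. ValidationGate and is_workflow_json are illustrative names. Routing, the actual validator call, and the error-to-guidance formatting are left out.

```python
import json

VALIDATED_TOOLS = {"Edit", "Write"}  # plugin-side Edit/Write events only


def is_workflow_json(text: str) -> bool:
    """True for an n8n-style workflow: a JSON object with a string 'name' and a list 'nodes'."""
    try:
        doc = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return False
    return (
        isinstance(doc, dict)
        and isinstance(doc.get("name"), str)
        and isinstance(doc.get("nodes"), list)
    )


class ValidationGate:
    """Per-session gate: workflow JSON written by Edit/Write, at most max_calls validator calls."""

    def __init__(self, max_calls: int = 3):  # 3 mirrors the documented default
        self.max_calls = max_calls
        self.calls = 0

    def should_validate(self, tool_name: str, content: str) -> bool:
        if tool_name not in VALIDATED_TOOLS or not is_workflow_json(content):
            return False
        if self.calls >= self.max_calls:
            return False  # session budget spent; skip further validator calls
        self.calls += 1
        return True
```

Only events that pass both checks count against the budget. A Read event or a non-workflow file never uses one of the three calls.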
- n8n codebase: package.json with an n8n dependency, a .n8n.json config, a README mentioning "n8n", or workflow JSON files ({"name":"...","nodes":[...]}).
- n8n consumer: docker-compose.yml referencing n8n.
- Keyword gating: broad keywords (workflow, node, trigger, webhook, …) fire in n8n codebases; only the explicit token n8n fires in consumer repos. Zero noise in non-n8n projects.
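The repo signals above can be approximated in Python like this. The plugin does this work in detect-n8n.sh, so this rendition is only an illustration. Several details are assumptions: which package names count as an "n8n dependency", README.md as the README filename, and a scan of top-level *.json files only.

```python
import json
from pathlib import Path


def _read_json(path: Path):
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, unreadable, or not JSON
        return None


def _mentions_n8n(path: Path) -> bool:
    try:
        return "n8n" in path.read_text(encoding="utf-8", errors="ignore").lower()
    except OSError:
        return False


def _has_n8n_dependency(pkg: Path) -> bool:
    data = _read_json(pkg)
    if not isinstance(data, dict):
        return False
    names = set()
    for key in ("dependencies", "devDependencies", "peerDependencies"):
        section = data.get(key)
        if isinstance(section, dict):
            names.update(section)
    # Assumption: "n8n", "n8n-*" and "@n8n/*" packages all count as an n8n dependency.
    return any(n == "n8n" or n.startswith(("n8n-", "@n8n/")) for n in names)


def _is_workflow_file(path: Path) -> bool:
    doc = _read_json(path)
    return isinstance(doc, dict) and "name" in doc and isinstance(doc.get("nodes"), list)


def classify_repo(root: str) -> str:
    """Return 'codebase', 'consumer', or 'none' for the repository at root."""
    base = Path(root)
    readme = base / "README.md"
    if (
        _has_n8n_dependency(base / "package.json")
        or (base / ".n8n.json").is_file()
        or (readme.is_file() and _mentions_n8n(readme))
        or any(_is_workflow_file(p) for p in base.glob("*.json"))
    ):
        return "codebase"
    compose = base / "docker-compose.yml"
    if compose.is_file() and _mentions_n8n(compose):
        return "consumer"
    return "none"
```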
This is the trust section. Plainly:
- Your prompt text is sent to the author's hosted recall endpoint (https://n8nhindsight.applikuapp.com/public/recall) whenever n8n context is detected. That covers auto-recall on your message, plus backstop recall after Edit/Write/Task during a turn, which sends a keyword query extracted from what Claude just wrote. That endpoint is unauthenticated and rate-limited. It is the author's personal hosted Hindsight instance, not an official n8n service. If you don't want your prompts leaving your machine, disable auto-recall and backstop recall, or self-host the service (see below).
- Your workflow JSON is sent to the cloud validator (https://n8nvalidator.applikuapp.com/public/validate-workflow) when the optional workflow validation hook runs in cloud mode, or in default mode when no local validator is found. In local mode (or default mode with a local n8n-mcp install present), validation runs entirely on your machine and no workflow JSON leaves it.
- Nothing else. No credentials, no file contents beyond the workflow JSON you asked it to validate (and the backstop keywords above), no telemetry.
- Debug log: injected context is written locally to ~/.cache/n8n-knowledge/debug.log when debugRecall is summary (default) or full. Set it to off to disable. Inspect exactly what's being injected with: tail -f ~/.cache/n8n-knowledge/debug.log
- Local-only validation: set EVAL_PLUGIN_VALIDATOR_MODE=local (or validator_mode: local in .claude/n8n-knowledge.local.md) to require a local n8n-mcp install and keep workflow JSON on your machine. The plugin auto-detects the default n8n-mcp root under ~/.npm/_npx/.../node_modules/n8n-mcp, or you can point it explicitly with validator_local_path.
- Disable network recall: turn off enableAutoRecall and enableBackstopRecall to stop prompt text from leaving your machine automatically. (You lose automatic recall, obviously. A manual /n8n-knowledge search still queries the hosted endpoint when you run it.)
- Self-host the whole backend: the knowledge service and the validator are open source. See n8n-hindsight, which includes the sync pipeline, the ops-proxy, the validator microservice, the nginx config, and the Appliku deploy. Stand up your own instance and point validator_cloud_url at it.
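The routing rules behind validatorMode (or EVAL_PLUGIN_VALIDATOR_MODE) fit in a small decision function. This is a sketch under assumptions, not hooks/lib/resolve_validator_target.py. Auto-detection is passed in as `detected_local` because the npx cache path is elided above. Treating a non-existent path as "no local install" is also an assumption.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_validator(
    mode: str,
    local_path: Optional[str],
    detected_local: Optional[str],
    cloud_url: str,
) -> Tuple[str, str]:
    """Return ("local", path) or ("cloud", url).

    local   -> require a local n8n-mcp install, error if none
    cloud   -> always use the cloud endpoint
    default -> prefer a local install, fall back to cloud
    """
    # An explicit override wins over auto-detection (blank means auto-detect).
    local = local_path or detected_local
    if local and not Path(local).is_dir():
        local = None  # assumption: a stale path counts as "no local install"
    if mode == "local":
        if not local:
            raise RuntimeError("validator mode is local but no local n8n-mcp install was found")
        return ("local", local)
    if mode == "cloud":
        return ("cloud", cloud_url)
    if mode == "default":
        return ("local", local) if local else ("cloud", cloud_url)
    raise ValueError(f"unknown validator mode: {mode!r}")
```

Failing loudly in local mode, instead of quietly falling back to cloud, keeps the "workflow JSON stays on your machine" guarantee intact.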
The plugin is benchmarked head-to-head against the community n8n-mcp server on a 128-prompt workflow-generation battery — same prompts, same model, same scoring; the only variable is the tool. Every generated workflow is validated by the n8n-mcp validation engine (an independent open-source project, not n8n itself), a blinded Claude Opus judge scores intent-fidelity, and 28 deterministic rules — one per known bug — check known-bug avoidance straight from the workflow JSON, so anyone can audit it. Judge where you must, deterministic where you can. Basis: newest run per prompt, integrity-cleaned. Snapshot: June 24, 2026.
Read it as a funnel — each stage a stricter bar than the last:
- valid% — passes the n8n-mcp validator (it would import)
- correct% — valid and does what the prompt asked (blinded Opus judge)
- works% — correct and designs around the relevant known n8n bug, so it won't silently fail in production — the headline metric
- pitfall% — of the 28 known-bug prompts, the share the workflow designed around (scored by 28 deterministic rules, not the judge)
Claude Sonnet 4.6 — a full 128-prompt run on the current shipped plugin:
| Condition | valid% | correct% | works% | pitfall% | $/run | turns | time (mean / median) |
|---|---|---|---|---|---|---|---|
| plugin (gate-ON, ship default) | 94% | 93% | 80% | 39% | $0.752 | 9.8 | 375s / 269s |
| n8n-mcp | 72% | 70% | 59% | 32% | $1.256 | 19.4 | 601s / 367s |
| raw model — no tools (78/128) | 26% | 26% | 26% | 29% | $0.58 | 3.7 | 359s / 156s |
DeepSeek v4 Flash — latest available per prompt:
| Condition | valid% | correct% | works% | pitfall% | $/run | turns | time (mean / median) |
|---|---|---|---|---|---|---|---|
| plugin (gate-ON, ship default) | 92% | 75% | 67% | 46% | $0.024 | 27.5 | 430s / 321s |
| n8n-mcp | 79% | 70% | 62% | 36% | $0.033 | 38.1 | 347s / 265s |
| raw model — no tools | 9% | — | — | 36% | $0.013 | 10.2 | 228s / 177s |
DeepSeek v4 Pro — clean v4 Pro run (gate-ON vs n8n-mcp):
| Condition | valid% | correct% | works% | pitfall% | $/run | turns | time (mean / median) |
|---|---|---|---|---|---|---|---|
| plugin (gate-ON, ship default) | 98% | 79% | 68% | 36% | $0.044 | 16.9 | 357s / 251s |
| n8n-mcp | 80% | 72% | 60% | 32% | $0.059 | 30.6 | 283s / 218s |
On the headline works%, the plugin's default beats n8n-mcp by +21pp on Claude (80 vs 59), +5pp on DeepSeek Flash (67 vs 62), and +8pp on DeepSeek Pro (68 vs 60) — while running ~40% cheaper and with ~50% fewer tool turns on Claude. The edge holds across all three cohorts, not just one.
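The gaps quoted above can be re-derived from the tables. The snippet copies the works%, $/run, and mean-turns figures verbatim. It then computes percentage-point gaps (plugin minus n8n-mcp) and relative reductions (1 - plugin / n8n-mcp). The helper names are illustrative.

```python
# Figures copied from the three tables above.
WORKS = {  # cohort -> (plugin works%, n8n-mcp works%)
    "Claude Sonnet 4.6": (80, 59),
    "DeepSeek v4 Flash": (67, 62),
    "DeepSeek v4 Pro": (68, 60),
}
CLAUDE_COST = (0.752, 1.256)  # $/run: plugin, n8n-mcp
CLAUDE_TURNS = (9.8, 19.4)    # mean turns: plugin, n8n-mcp


def pp_gap(plugin: int, baseline: int) -> int:
    """Percentage-point difference, not a relative change."""
    return plugin - baseline


def relative_reduction(plugin: float, baseline: float) -> float:
    """Fraction by which plugin undercuts baseline, e.g. 0.40 means 40% less."""
    return 1 - plugin / baseline


for cohort, (plugin, mcp) in WORKS.items():
    print(f"{cohort}: +{pp_gap(plugin, mcp)}pp works%")
print(f"Claude cost: {relative_reduction(*CLAUDE_COST):.0%} cheaper")
print(f"Claude turns: {relative_reduction(*CLAUDE_TURNS):.0%} fewer")
```

The cost reduction comes out at about 40%. The turn reduction comes out at about 49%, which the prose rounds to ~50%.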
The raw-model rows (no tools) are the status-quo baseline: the same model, same prompts, no plugin and no n8n-mcp. It's where most people start. On Claude it produces a valid, working workflow only about 1 time in 4 (26% works). On DeepSeek Flash only 9% of its workflows even pass validation. With the plugin, Claude reaches 80% works. The Claude baseline is measured over 78/128 prompts, with all three difficulty groups sampled (48% of group a, 64% of group b, 73% of group c). Weighting each group's rate up to its size in the full corpus also lands at ~26%, so the sample is representative.
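The representativeness check in that paragraph is post-stratification. Take each difficulty group's observed rate on the prompts that were run, then weight it by that group's share of the full 128-prompt corpus. The README does not publish per-group counts, so any numbers fed to this function below are made-up placeholders, not eval data.

```python
from typing import Dict, Tuple


def reweighted_rate(sample: Dict[str, Tuple[int, int]], corpus_sizes: Dict[str, int]) -> float:
    """Post-stratified rate.

    sample maps group -> (successes, prompts run in that group);
    corpus_sizes maps group -> number of prompts in the full corpus.
    """
    total = sum(corpus_sizes.values())
    rate = 0.0
    for group, size in corpus_sizes.items():
        successes, run = sample[group]
        rate += (successes / run) * (size / total)  # group rate x corpus share
    return rate
```

For placeholder inputs `{"a": (5, 10), "b": (2, 10)}` with corpus sizes `{"a": 30, "b": 10}`, the weighted rate is about 0.425, while naive pooling gives 0.35. When the two agree, as the paragraph reports for Claude, uneven sampling across groups isn't skewing the baseline.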
Honest caveats:
- Validator ≠ live import. "Valid" means it passes the independent n8n-mcp validator, not that it executed on a live n8n instance — a disclosed trade-off for reproducibility.
- Known-bug provenance. Some bug-prompts share a corpus with the catalog the plugin recalls from, so pitfall% flatters the plugin. Reported as a directional signal, not a clean win.
- The judge is an LLM. The Opus judge is blinded and cached, and scores intent only — pitfall avoidance is scored separately by the 28 deterministic rules, never the judge.
- DeepSeek Flash vs Pro are separate cohorts. Flash numbers are latest-per-prompt across the repriced v4 Flash history; Pro is a clean v4 Pro run. They're reported in separate tables, not pooled.
Reproduce it: the harness, prompt set, and scoring live in scripts/eval/. A
fuller write-up with methodology is in the eval case study.
| Setting | Default | Description |
|---|---|---|
| enableAutoRecall | true | Auto-recall on n8n-related messages. Disable for manual-only (saves tokens, stops prompt text from leaving your machine). |
| showRecallResults | true | When enabled, Claude cites the knowledge base. When disabled, Claude uses the context silently. |
| enableWorkflowValidation | false | Plugin-side validation after Claude writes/edits workflow JSON. |
| workflowValidationMaxCalls | 3 | Max plugin-side validator calls per session. |
| enableBackstopRecall | true | Refresh n8n context during agent reasoning (after Edit/Write/Task). |
| backstopRecallCap | 4 | Max backstop recalls per session. |
| backstopRecallMaxTokens | 8000 | Returned-context size cap per backstop recall. |
| backstopRecallBudget | high | Hindsight recall effort: low, mid, or high. |
| validatorMode | default | Validator routing: local, cloud, or default (prefer local n8n-mcp, fall back to cloud). |
| validatorCloudUrl | "" | Cloud validator endpoint URL. |
| validatorLocalPath | "" | Override the local n8n-mcp install root (blank = auto-detect). |
| debugRecall | summary | Local debug output to ~/.cache/n8n-knowledge/debug.log: off, summary, or full. |
enableSubagentInjection exists but is work-in-progress and unverified; leave it off.
Auto-recall only fires on your message (UserPromptSubmit). But a long agentic turn drifts:
by the time Claude has read files, edited code, and spun up subagents, the original recall
context may be stale. Backstop recall fills that gap:
- After Edit/Write/Task: a PostToolUse hook inspects what Claude just wrote, extracts a fresh-keyword-anchored query, and injects a new `<result>` block as additionalContext. Topics already covered this session are skipped, and recalls are capped per session.
It complements auto-recall rather than replacing it: auto-recall covers the user's question, backstop recall covers where the work actually goes.
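A minimal sketch of the gate, dedupe, and cap described above. It assumes "fresh keywords" means vocabulary terms (for example node names) that appear in what Claude just wrote and have not been recalled yet this session. The keyword vocabulary and the query format are hypothetical. The default cap of 4 mirrors backstopRecallCap.

```python
import re
from typing import Optional, Set


class BackstopGate:
    """Per-session gate for mid-turn recalls: tool filter, topic dedupe, and a hard cap."""

    TOOLS = frozenset({"Edit", "Write", "Task"})

    def __init__(self, keywords: Set[str], cap: int = 4):
        self.keywords = {k.lower() for k in keywords}  # hypothetical keyword vocabulary
        self.cap = cap                                 # mirrors backstopRecallCap's default
        self.covered: Set[str] = set()                 # topics already recalled this session
        self.recalls = 0

    def query_for(self, tool_name: str, written_text: str) -> Optional[str]:
        """Return a recall query anchored on fresh keywords, or None to skip this event."""
        if tool_name not in self.TOOLS or self.recalls >= self.cap:
            return None
        words = set(re.findall(r"[a-z][a-z0-9]+", written_text.lower()))
        fresh = sorted((words & self.keywords) - self.covered)
        if not fresh:
            return None  # nothing new since the last recall
        self.covered.update(fresh)
        self.recalls += 1
        return " ".join(fresh)
```

Deduping on covered topics means a long edit loop on the same Slack node triggers one recall, not one per edit.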
Inside an n8n codebase, recall fires on a set of broad keywords (in consumer repos, only the
explicit token n8n triggers it). The built-in default list is:
workflow, node, trigger, webhook, credential, expression, execution
triggerKeywords customizes this. The token DEFAULTS expands inline to the built-in list:
- Extend: DEFAULTS, mynode → the built-ins plus mynode.
- Replace: workflow, node, mything → exactly these three.
- Reset: leave it blank (or set it to DEFAULTS alone) to use the built-in list.
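The DEFAULTS expansion and the codebase/consumer gate can be sketched as follows. The sketch assumes comma-separated, single-word keywords matched case-insensitively on whole words. It also assumes the explicit token n8n fires inside codebases too and that nothing fires in non-n8n projects. The real detect-n8n.sh may tokenize differently.

```python
import re
from typing import List

BUILTIN_KEYWORDS = ["workflow", "node", "trigger", "webhook", "credential", "expression", "execution"]


def resolve_keywords(setting: str) -> List[str]:
    """Expand a triggerKeywords value: DEFAULTS splices in the built-ins; blank means built-ins."""
    tokens = [t.strip() for t in setting.split(",") if t.strip()]
    if not tokens:
        return list(BUILTIN_KEYWORDS)
    resolved: List[str] = []
    for tok in tokens:
        expansion = BUILTIN_KEYWORDS if tok == "DEFAULTS" else [tok.lower()]
        for kw in expansion:
            if kw not in resolved:  # keep first-seen order, drop duplicates
                resolved.append(kw)
    return resolved


def should_recall(prompt: str, repo_kind: str, keywords: List[str]) -> bool:
    """Broad keywords gate recall in n8n codebases; consumer repos need the literal token n8n."""
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    if repo_kind == "codebase":
        return "n8n" in words or any(k in words for k in keywords)
    if repo_kind == "consumer":
        return "n8n" in words
    return False  # non-n8n projects: no recall at all
```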
Each auto-recalled result gets a confidence score based on source type, engagement metrics, and
resolution signals. Tune it per project via .claude/n8n-knowledge.local.md. All fields are
optional — only override what you want to change.
```yaml
---
high_threshold: 70
medium_threshold: 50
docs_base: 80
github_base: 49
community_base: 40
clear_signal_bonus: 25
author_member_bonus: 5
solved_bonus: 25
high_engagement_threshold: 10
high_engagement_bonus: 20
medium_engagement_threshold: 3
medium_engagement_bonus: 10
high_views_threshold: 500
views_bonus: 5
max_results: 5
max_low_results: 1
max_text_length_high: -1
max_text_length_medium: 800
max_text_length_low: 300
---
```

Add .claude/*.local.md to your .gitignore.
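How these knobs might combine is sketched below as a simple additive score. This is an assumption, not format_results.py. The README lists the parameters but not the formula. Treating engagement as one combined count is an illustrative choice. The sketch also omits clear_signal_bonus and author_member_bonus, whose triggers aren't described. Only the default values are copied from the block above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Defaults copied from the config block above (the subset this sketch uses).
CFG = {
    "high_threshold": 70, "medium_threshold": 50,
    "docs_base": 80, "github_base": 49, "community_base": 40,
    "solved_bonus": 25,
    "high_engagement_threshold": 10, "high_engagement_bonus": 20,
    "medium_engagement_threshold": 3, "medium_engagement_bonus": 10,
    "high_views_threshold": 500, "views_bonus": 5,
    "max_results": 5, "max_low_results": 1,
}


@dataclass
class Result:
    source: str           # "docs", "github", or "community"
    engagement: int = 0   # assumption: votes and likes folded into one count
    views: int = 0
    solved: bool = False


def score(r: Result, cfg: dict = CFG) -> int:
    s = cfg[f"{r.source}_base"]
    if r.solved:
        s += cfg["solved_bonus"]
    if r.engagement >= cfg["high_engagement_threshold"]:
        s += cfg["high_engagement_bonus"]
    elif r.engagement >= cfg["medium_engagement_threshold"]:
        s += cfg["medium_engagement_bonus"]
    if r.views >= cfg["high_views_threshold"]:
        s += cfg["views_bonus"]
    return s


def label(s: int, cfg: dict = CFG) -> str:
    if s >= cfg["high_threshold"]:
        return "HIGH"
    return "MEDIUM" if s >= cfg["medium_threshold"] else "LOW"


def select(results: List[Result], cfg: dict = CFG) -> List[Tuple[str, Result]]:
    """Best-first, at most max_results overall and max_low_results LOW entries."""
    picked: List[Tuple[str, Result]] = []
    lows = 0
    for r in sorted(results, key=lambda item: score(item, cfg), reverse=True):
        lab = label(score(r, cfg), cfg)
        if lab == "LOW":
            if lows >= cfg["max_low_results"]:
                continue
            lows += 1
        picked.append((lab, r))
        if len(picked) >= cfg["max_results"]:
            break
    return picked
```

Under these assumptions, a bare GitHub result scores 49, just under medium_threshold. It lands as LOW unless it is solved or has engagement.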
You can inspect the resolved validator choice with:
```bash
python3 hooks/lib/resolve_validator_target.py "$PWD"
```

On each message, the pipeline runs as follows:
- The UserPromptSubmit hook fires on every message.
- detect-n8n.sh checks whether the message is n8n-related (multi-signal repo detection plus keyword matching).
- node_lookup.py identifies node names in the prompt for structured recall.
- recall.sh curls /public/recall (semantic); structured_recall.sh curls the same endpoint with tag filters (node specs).
- Results are merged (node specs prepended), scored by format_results.py, and injected as additionalContext.
- Debug output is written to ~/.cache/n8n-knowledge/debug.log unless debugRecall is off.
No MCP server. No daemon. No dependencies beyond bash, curl, and the Python stdlib.
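The last pipeline steps (merge with node specs first, then inject) can be sketched as below. The output shape follows the hookSpecificOutput / additionalContext JSON that Claude Code documents for UserPromptSubmit hooks; check the current hooks reference. The id-based dedupe, the result dict fields, and the `<result>` attribute layout are hypothetical, and scoring is omitted.

```python
import json
from typing import Dict, List


def merge_results(node_specs: List[Dict], semantic: List[Dict]) -> List[Dict]:
    """Node specs first, then semantic hits; drop repeats by a hypothetical 'id' field."""
    seen = set()
    merged = []
    for item in node_specs + semantic:
        key = item.get("id")
        if key is not None and key in seen:
            continue
        if key is not None:
            seen.add(key)
        merged.append(item)
    return merged


def render(results: List[Dict]) -> str:
    """Wrap each result in a <result> block (attribute layout is illustrative)."""
    return "\n".join(
        f'<result kind="{r.get("kind", "doc")}">{r["text"]}</result>' for r in results
    )


def hook_output(context: str) -> str:
    """JSON a UserPromptSubmit hook can print on stdout to add context to the turn."""
    return json.dumps(
        {
            "hookSpecificOutput": {
                "hookEventName": "UserPromptSubmit",
                "additionalContext": context,
            }
        }
    )
```

Prepending node specs means that if a downstream cap trims the list, the HIGH-confidence spec blocks survive first.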
When a new n8n version ships with updated nodes:
```bash
bash scripts/refresh-node-lookup.sh
```

This fetches the latest n8n-mcp package, regenerates the node dictionary, and runs validation tests. The dictionary is checked into the repo so users don't need to run this themselves.

To run the test suite:

```bash
bash tests/run-all.sh
```

221 assertions across 15 test files (including a 75-test pytest suite for the Python helper libraries), all passing: auto-recall, detection, recall formatting, node lookup, structured recall, lookup integrity, GitHub state, observation scoring, backstop recall, workflow validation, bridge resolution, cross-repo hash parity, hook JSON helpers, and recall endpoint resolution.
- Workflow scoring — workflow example units currently score LOW in auto-recall; need their own scoring path.
- Richer workflow tags — trigger type, complexity, use-case, integration tags for better matching.
- More workflow sources — expand beyond the 28 official docs examples to the template library.
- Public retain with trust tiers — community contributions weighted by Discourse trust level.
- Prompt injection filtering — pre-filter + LLM classifier on community content before ingestion.
PRs welcome. The knowledge base is public and auto-syncs nightly. To improve the plugin:
- Fork the repo
- Make changes
- Run bash tests/run-all.sh to verify
- Open a PR
MIT — see LICENSE.