diff --git a/CLAUDE.md b/CLAUDE.md index 5c893b3..ece1767 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -3,6 +3,17 @@ Conventions for authoring this skill. This governs how skill content is **written** and **validated**. +# General rules + +Never open responses with filler phrases like "Great question!", "Of course!", "Certainly!", or similar warmups. Start every response with the actual answer. No preamble, no acknowledgment of the question. + +Match response length to task complexity. Simple questions get direct, short answers. Complex tasks get full, detailed responses. Never pad responses with restatements of the question or closing sentences that repeat what you just said. + +Before any significant task, show me 2-3 ways you could approach this work. Wait for me to choose before proceeding. + +If you are uncertain about any fact, statistic, date, or piece of technical information: say so explicitly before including it. Never fill gaps in your knowledge with plausible-sounding information. When in doubt, say so. + + ## Writing style - **Be concise.** Technical documentation, not an essay. Favor tables, command recipes, and short @@ -20,6 +31,36 @@ Conventions for authoring this skill. This governs how skill content is **writte - **Anchor to canonical docs.** Each reference doc cites the upstream CrowdSec docs URL it derives from. Claims trace to canonical documentation, not to memory. +## Content structure + +`SKILL.md` is the router — a symptom/intent-indexed table that points into `references/`. +All depth lives in `references//`, organized by the axis that fits the area: + +| Dir | Organized by | Notes | +|---|---|---| +| `install/` | **platform** (one file each) | `bare-metal.md` (apt/dnf + systemd), `docker.md`, `kubernetes.md`, `console.md` (enrollment) — install mechanics genuinely diverge per platform. | +| `configure/` | **config domain** | `acquisition`, `hub`, `profiles`, `notifications`, `allowlists`; platforms merged inline. 
`configure/bouncers/` nests one level by **service type** (`firewall`, `web-servers`). | +| `operate/` | **task** | `health-check`, `upgrades`, `multi-server`. | +| `appsec/` | **lifecycle** | `overview` → `deploy` → `configure` → `troubleshoot` (the WAF/AppSec feature silo). | +| `debug/` | **kind** | `common/` (`triage`, `errors`, `platform-gotchas`) + `symptoms/` (`parsing`, `no-alerts`, `not-blocked`). Feature troubleshooting is *routed to* the feature's own dir (e.g. AppSec → `appsec/troubleshoot.md`), not duplicated under debug/. | +| `migrate/` | **source product** | `from-fail2ban`. | +| `scripts/` | — | helper scripts (`diagnose.sh`, `check-verification.py`); stdlib/bash only, runnable in static checks. | + +**Split files vs inline the prefix.** When deciding whether a platform variant gets its own file: + +- **Split into separate files** only when the *content itself* diverges — package managers, file + paths, install/upgrade mechanics. `install/` is the canonical case. +- **Keep one file with inline command-prefix notes** when the task is identical and only the + invocation differs (`sudo cscli …` → `docker exec …` → `kubectl exec -n -- …`). + This is the default across `configure/`, `operate/`, `appsec/`, and `debug/`. +- **Genuinely platform-specific *failure modes*** (not just prefixes — e.g. container mounts, + SELinux/AppArmor, k8s RBAC) collect in one place (`debug/common/platform-gotchas.md`) rather than + fragmenting a single symptom across per-platform files. + +**Keep this current.** When you add, move, or remove a `references/` directory — or change an +area's organizing axis — update the table above in the *same* change. This section is the +authoritative map of the layout; let it drift and it stops being trustworthy. 
+ ## Testing - **Nothing ships unverified.** Every command and every expected outcome must have been diff --git a/skills/crowdsec/SKILL.md b/skills/crowdsec/SKILL.md index 7a3b532..87e4898 100644 --- a/skills/crowdsec/SKILL.md +++ b/skills/crowdsec/SKILL.md @@ -75,11 +75,13 @@ Docker/k8s commands run inside the container/pod and do not need this. | "upgrade", "back up", "roll back", "new version", "tainted items after upgrade" | [references/operate/upgrades.md](./references/operate/upgrades.md) | | "multiple agents", "remote LAPI", "mTLS", "postgres backend" | [references/operate/multi-server.md](./references/operate/multi-server.md) *(TODO — stub)* | | "is it working?", "smoke test", "validate install", "verify setup", "did detection / WAF / blocking actually wire up?" | [references/operate/health-check.md](./references/operate/health-check.md) | -| "it's broken" / "not working" / general diagnosis | [references/debug/triage.md](./references/debug/triage.md) → run `bash ${CLAUDE_SKILL_DIR}/scripts/diagnose.sh` | -| "logs not parsed", "0 parsed" | [references/debug/parsing.md](./references/debug/parsing.md) | -| "no alerts firing" | [references/debug/no-alerts.md](./references/debug/no-alerts.md) | -| "decision exists but not blocked" | [references/debug/bouncer-not-blocking.md](./references/debug/bouncer-not-blocking.md) | -| Specific error message | [references/debug/common-errors.md](./references/debug/common-errors.md) | +| **Debug — common** · "it's broken" / "not working" / general diagnosis | [references/debug/common/triage.md](./references/debug/common/triage.md) → run `bash ${CLAUDE_SKILL_DIR}/scripts/diagnose.sh` | +| **Debug — common** · specific error string | [references/debug/common/errors.md](./references/debug/common/errors.md) | +| **Debug — common** · "container can't see logs", "mount", "SELinux/AppArmor denied", "k8s RBAC / DaemonSet" | [references/debug/common/platform-gotchas.md](./references/debug/common/platform-gotchas.md) | +| **Debug — by 
symptom** · "logs not parsed", "0 parsed" | [references/debug/symptoms/parsing.md](./references/debug/symptoms/parsing.md) | +| **Debug — by symptom** · "no alerts firing" | [references/debug/symptoms/no-alerts.md](./references/debug/symptoms/no-alerts.md) | +| **Debug — by symptom** · "decision exists but not blocked" | [references/debug/symptoms/not-blocked.md](./references/debug/symptoms/not-blocked.md) | +| **Debug — by feature** · AppSec/WAF not blocking, false positives, captcha | [references/appsec/troubleshoot.md](./references/appsec/troubleshoot.md) | | "switch from fail2ban" | [references/migrate/from-fail2ban.md](./references/migrate/from-fail2ban.md) *(TODO — stub)* | For anything debug-shaped, the first move is almost always: @@ -134,7 +136,7 @@ Where things live on a default bare-metal install: Confirm with the user before any of these: - `cscli decisions delete --all` — wipes every active ban including CAPI-pulled blocklists. Use targeted `delete -i`, `delete -r`, `delete --id`, `delete --origin lists --scenario `. -- Editing hub-managed files under `/etc/crowdsec/{parsers,scenarios,collections,postoverflows,contexts}/` instead of the sibling `_custom/` directory — see [references/debug/triage.md](./references/debug/triage.md) § Hard don'ts. +- Editing hub-managed files under `/etc/crowdsec/{parsers,scenarios,collections,postoverflows,contexts}/` instead of the sibling `_custom/` directory — see [references/debug/common/triage.md](./references/debug/common/triage.md) § Hard don'ts. - Disabling a signature collection wholesale to silence a false positive — pick the right suppression layer (allowlist / whitelist parser / postoverflow) per [references/configure/allowlists.md](./references/configure/allowlists.md) § Suppression mechanisms. - Mutating host firewall state (firewall bouncer install, `ipset` flush, iptables↔nftables switch) without confirming — the firewall bouncer can wipe rule chains other tools depend on. 
- Skipping `--reset-then-reuse-values` on `helm upgrade crowdsec` — silently drops values. diff --git a/skills/crowdsec/references/configure/acquisition.md b/skills/crowdsec/references/configure/acquisition.md index 668303f..840441a 100644 --- a/skills/crowdsec/references/configure/acquisition.md +++ b/skills/crowdsec/references/configure/acquisition.md @@ -14,7 +14,7 @@ Acquisition tells the engine **what logs to read and how to label them**. Each s declares a `source:` (the datasource type) and a `labels.type:` (the parser hint). If the engine reads lines but they show up as **`Lines unparsed`**, acquisition is usually fine and the problem is the `type:` or the parser — debug that with -[../debug/parsing.md](../debug/parsing.md). If a source shows **0 `Lines read`**, the +[../debug/symptoms/parsing.md](../debug/symptoms/parsing.md). If a source shows **0 `Lines read`**, the problem is here. ## Where acquisition lives diff --git a/skills/crowdsec/references/configure/bouncers/firewall.md b/skills/crowdsec/references/configure/bouncers/firewall.md index 00f5bef..33ef479 100644 --- a/skills/crowdsec/references/configure/bouncers/firewall.md +++ b/skills/crowdsec/references/configure/bouncers/firewall.md @@ -64,7 +64,7 @@ Only register manually when the bouncer runs on a **different host** than LAPI > `/var/log/crowdsec-firewall-bouncer.log` (and the dpkg `--configure` step errors). > Re-register: `cscli bouncers delete `, `KEY=$(cscli bouncers add fw-local -o raw)`, > write it into the yaml's `api_key:`, `systemctl restart crowdsec-firewall-bouncer`. -> See [../../debug/bouncer-not-blocking.md](../../debug/bouncer-not-blocking.md) § 3. +> See [../../debug/symptoms/not-blocked.md](../../debug/symptoms/not-blocked.md) § 3. ## 3 — What it creates in nftables @@ -140,7 +140,7 @@ sudo cscli decisions delete -i 192.0.2.66 container-to-container blocking matters. 
- **"Banned but still reachable"** → almost always `update_frequency` not elapsed, `disable_ipv6` masking a v6 client, or the bouncer service stopped. - Full decision tree: [../../debug/bouncer-not-blocking.md](../../debug/bouncer-not-blocking.md). + Full decision tree: [../../debug/symptoms/not-blocked.md](../../debug/symptoms/not-blocked.md). ## Teardown diff --git a/skills/crowdsec/references/configure/bouncers/web-servers.md b/skills/crowdsec/references/configure/bouncers/web-servers.md index a0e578c..2b922d1 100644 --- a/skills/crowdsec/references/configure/bouncers/web-servers.md +++ b/skills/crowdsec/references/configure/bouncers/web-servers.md @@ -340,7 +340,7 @@ docker exec crowdsec cscli metrics show appsec # Processed/Blocked increment - **WAF off silently:** `crowdsecAppsecEnabled` defaults to `false`, and AppSec must listen on `0.0.0.0:7422` (not loopback) for a containerized Traefik to reach it. - **`stream` lag:** a fresh ban lands within `updateIntervalSeconds`; immediate ban-then-curl - looks like a failure. (See [../../debug/bouncer-not-blocking.md](../../debug/bouncer-not-blocking.md).) + looks like a failure. (See [../../debug/symptoms/not-blocked.md](../../debug/symptoms/not-blocked.md).) ### Kubernetes (Helm) — extra gotchas diff --git a/skills/crowdsec/references/configure/hub.md b/skills/crowdsec/references/configure/hub.md index 374eba4..00b8d41 100644 --- a/skills/crowdsec/references/configure/hub.md +++ b/skills/crowdsec/references/configure/hub.md @@ -101,7 +101,7 @@ editing them taints the item and your change is lost on the next `--force` upgra Instead, drop an override file in the sibling `_custom/` directory for that type (`scenarios/.../_custom/`, `parsers/.../_custom/`, etc.). Overrides are merged on top of the hub item by `name`, survive upgrades, and keep the hub item pristine. See -[../debug/triage.md](../debug/triage.md) § Hard don'ts and the SKILL.md Hard don'ts list. 
+[../debug/common/triage.md](../debug/common/triage.md) § Hard don'ts and the SKILL.md Hard don'ts list. To remove a collection and its pulled items: diff --git a/skills/crowdsec/references/debug/common-errors.md b/skills/crowdsec/references/debug/common/errors.md similarity index 84% rename from skills/crowdsec/references/debug/common-errors.md rename to skills/crowdsec/references/debug/common/errors.md index fc590b6..52d2029 100644 --- a/skills/crowdsec/references/debug/common-errors.md +++ b/skills/crowdsec/references/debug/common/errors.md @@ -30,17 +30,17 @@ Match the error string the engine/bouncer printed to the row below. | Error string | Cause | Fix | |---|---|---| -| `datasource of type appsec: … cannot parse appsec configuration: [2:3] cannot unmarshal []interface {} into Go struct field Configuration.AppsecConfig of type string` | `appsec_config:` (singular) given a **list** | Use the **plural** key `appsec_configs:` for a list; singular takes one string. See [../appsec/configure.md](../appsec/configure.md). | -| `unable to initialize inband engine : invalid WAF config from string: failed to compile the directive "secrule": duplicated rule id 100` | Two appsec-configs on one listener pull the **same** underlying rule (e.g. both include `base-config`/`vpatch-*`) | Use non-overlapping configs, or just `crowdsecurity/appsec-default` alone. See [../appsec/configure.md](../appsec/configure.md). | -| `no appsec-rules found for pattern ` | A bare appsec-config was installed without its rules; engine expands globs at load, `cscli` does not | Install via the **collection** (`cscli collections install crowdsecurity/appsec-virtual-patching`), which pulls the rule graph. See [../appsec/deploy.md](../appsec/deploy.md). 
| +| `datasource of type appsec: … cannot parse appsec configuration: [2:3] cannot unmarshal []interface {} into Go struct field Configuration.AppsecConfig of type string` | `appsec_config:` (singular) given a **list** | Use the **plural** key `appsec_configs:` for a list; singular takes one string. See [../../appsec/configure.md](../../appsec/configure.md). | +| `unable to initialize inband engine : invalid WAF config from string: failed to compile the directive "secrule": duplicated rule id 100` | Two appsec-configs on one listener pull the **same** underlying rule (e.g. both include `base-config`/`vpatch-*`) | Use non-overlapping configs, or just `crowdsecurity/appsec-default` alone. See [../../appsec/configure.md](../../appsec/configure.md). | +| `no appsec-rules found for pattern ` | A bare appsec-config was installed without its rules; engine expands globs at load, `cscli` does not | Install via the **collection** (`cscli collections install crowdsecurity/appsec-virtual-patching`), which pulls the rule graph. See [../../appsec/deploy.md](../../appsec/deploy.md). | | `no such datasource` / source type unknown | `source:`/`labels.type:` typo or a datasource the build doesn't support | Fix the key in the `acquis.d/*.yaml`; `crowdsec -t` points at the file:line. | -| Source reads lines but **0 parsed** | `type:` label doesn't match any installed parser | [parsing.md](./parsing.md). | +| Source reads lines but **0 parsed** | `type:` label doesn't match any installed parser | [parsing.md](../symptoms/parsing.md). | ## Permissions / OS | Symptom | Cause | Fix | |---|---|---| -| `permission denied` opening a log file; or source present but 0 lines read | `crowdsec` user can't read the file | `sudo -u crowdsec head `; fix ownership/ACL. If that user *can* read it but the engine still can't, it's **SELinux/AppArmor** — `ausearch -m avc -ts recent` / `dmesg | grep DENIED`, then relabel/add policy (don't disable enforcement). 
| +| `permission denied` opening a log file; or source present but 0 lines read | `crowdsec` user can't read the file | `sudo -u crowdsec head `; fix ownership/ACL. If that user *can* read it but the engine still can't, it's **SELinux/AppArmor** → [platform-gotchas.md](./platform-gotchas.md). | | apt install of a bouncer hangs: `Failed to open terminal … debconf: whiptail output the above errors, giving up!` | A debconf dialog (e.g. pending-kernel notice) on a non-interactive shell | Re-run with `sudo DEBIAN_FRONTEND=noninteractive apt install -y …`. | ## LAPI / CAPI / auth @@ -48,14 +48,14 @@ Match the error string the engine/bouncer printed to the row below. | Error | Cause | Fix | |---|---|---| | Agent: `unable to authenticate … machine not validated` | Agent machine not registered/validated with LAPI | `cscli machines list`; validate with `cscli machines validate ` (or re-`cscli machines add` on the agent). | -| Bouncer log: **HTTP 401** on decision pull | Bouncer key ≠ LAPI key (rotated, stale config, re-added) | `cscli bouncers list`; re-issue and paste the key into the bouncer config. [bouncer-not-blocking.md](./bouncer-not-blocking.md) §3. | +| Bouncer log: **HTTP 401** on decision pull | Bouncer key ≠ LAPI key (rotated, stale config, re-added) | `cscli bouncers list`; re-issue and paste the key into the bouncer config. [not-blocked.md](../symptoms/not-blocked.md) §3. | | `cscli capi status` fails / CAPI register errors | Missing `online_api_credentials.yaml`, **clock skew**, or egress blocked to `api.crowdsec.net` | `cscli capi register` then reload; check `timedatectl` (TLS fails on skew); allow egress / set proxy. | ## Database | Error | Cause | Fix | |---|---|---| -| `database is locked` (sqlite) | Concurrent writers / slow disk; sqlite single-writer | Reduce write pressure; move `crowdsec.db` to faster storage; for multi-agent or high volume switch the backend to PostgreSQL — see [../operate/multi-server.md](../operate/multi-server.md). 
| +| `database is locked` (sqlite) | Concurrent writers / slow disk; sqlite single-writer | Reduce write pressure; move `crowdsec.db` to faster storage; for multi-agent or high volume switch the backend to PostgreSQL — see [../../operate/multi-server.md](../../operate/multi-server.md). | | sqlite errors + `df` shows full `/var/lib/crowdsec` | Disk full → silent alert-write failure | Free space / rotate; alerts resume. | ## Hub @@ -69,10 +69,10 @@ Match the error string the engine/bouncer printed to the row below. | Symptom | Likely cause | Confirm | |---|---|---| -| Expected ban "not happening" for an IP | The IP matches an **allowlist** | `cscli allowlists check ` → [../configure/allowlists.md](../configure/allowlists.md). | -| Decision exists, traffic still passes | Bouncer latency / scope / key / IP family | Full ladder: [bouncer-not-blocking.md](./bouncer-not-blocking.md). | +| Expected ban "not happening" for an IP | The IP matches an **allowlist** | `cscli allowlists check ` → [../../configure/allowlists.md](../../configure/allowlists.md). | +| Decision exists, traffic still passes | Bouncer latency / scope / key / IP family | Full ladder: [not-blocked.md](../symptoms/not-blocked.md). | When the string isn't here, capture the full forensic bundle with -[`scripts/diagnose.sh`](../../scripts/diagnose.sh) and read the agent log around +[`scripts/diagnose.sh`](../../../scripts/diagnose.sh) and read the agent log around the first `level=error`/`FATAL` — the *first* error is usually the root cause; later ones are fallout. 
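
The "read the agent log around the first `level=error`/`FATAL`" advice above can be sketched as a small Python helper. This is an illustration only and not part of the skill's `scripts/`: the function name `first_error_context`, the context window sizes, and the assumption that log lines carry a logrus-style `level=<name>` field are all hypothetical choices.

```python
import re
from typing import Optional

# Assumption: text logs with a `level=<name>` field, or a bare FATAL marker.
# Adjust the pattern if the engine is configured for JSON logging.
ERROR_PATTERN = re.compile(r"level=(error|fatal)\b|\bFATAL\b")


def first_error_context(
    lines: list[str], before: int = 3, after: int = 5
) -> Optional[list[str]]:
    """Return the lines around the FIRST error/fatal entry, or None if absent.

    Only the first match is reported on purpose: later errors are usually
    fallout from the first one.
    """
    for index, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, index - before)
            return lines[start : index + after + 1]
    return None
```

To use it, read the agent log (for example `/var/log/crowdsec.log` on bare-metal) into a list with `splitlines()` and pass that list in. Stopping at the first match is deliberate, since it mirrors the "first error is usually the root cause" rule rather than listing every error.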
diff --git a/skills/crowdsec/references/debug/common/platform-gotchas.md b/skills/crowdsec/references/debug/common/platform-gotchas.md new file mode 100644 index 0000000..c682600 --- /dev/null +++ b/skills/crowdsec/references/debug/common/platform-gotchas.md @@ -0,0 +1,75 @@ +# Debug — Platform-specific gotchas + +Canonical docs: + +Most CrowdSec troubleshooting is platform-agnostic — the symptom docs +([parsing](../symptoms/parsing.md), [no-alerts](../symptoms/no-alerts.md), +[not-blocked](../symptoms/not-blocked.md)) apply everywhere and only the command +prefix changes (`sudo cscli …` → `docker exec …` → `kubectl exec -n + -- …`). This page collects the failures that are genuinely **specific to +how the engine is deployed** — the ones that don't reduce to a prefix. + +Reach here from the symptom docs when a check says "0 lines read" or "permission +denied" and the cause turns out to be the platform, not the config. + +## Docker / docker-compose — host log path not mounted in + +The single most common containerised failure: acquisition points at a path that +exists **on the host** but was never bind-mounted into the container, so the +engine reads zero lines. + +Symptom: `cscli metrics show acquisition` shows the source with **0 lines read** +(or the row absent), even though `filenames:`/`type:` look correct. + +Confirm from *inside* the container — the host view lies: + +```bash +docker exec ls -l /var/log/nginx/access.log # No such file ⇒ not mounted +``` + +Fix: add the host log directory to the crowdsec service's `volumes:` (read-only +is fine), e.g. `- /var/log/nginx:/var/log/nginx:ro`, then recreate the +container. The acquisition path inside the container must match the mount target. +See [../../configure/acquisition.md](../../configure/acquisition.md). + +## Kubernetes — mount + container runtime + +Two distinct k8s-only causes: + +- **Path not mounted into the pod** (same class as Docker above). 
Verify inside: + ```bash + kubectl exec -n -- ls -l + ``` + Pod/container logs live under the node's `/var/log/pods` or + `/var/log/containers`; the agent DaemonSet must hostPath-mount that directory. + +- **Wrong `container_runtime`** → lines read, **0 parsed**. Managed clusters + (recent EKS/GKE/AKS) run **containerd**, not Docker; with the wrong value the + agent reads pod logs in the wrong format and no parser claims them. Set + `container_runtime: containerd` unless nodes genuinely use the Docker runtime; + confirm with `kubectl get nodes -o wide` (CONTAINER-RUNTIME column). See + [../../install/kubernetes.md](../../install/kubernetes.md). + +## systemd / bare-metal — SELinux / AppArmor denials + +When the `crowdsec` user *can* read a log file by hand but the engine still gets +**0 lines read** or `permission denied`, mandatory-access-control is blocking the +service even though POSIX permissions allow it. + +```bash +sudo -u crowdsec head # succeeds ⇒ not a POSIX-perms problem +sudo ausearch -m avc -ts recent # SELinux denials (RHEL/Fedora/Alma/Rocky) +sudo dmesg | grep -i denied # AppArmor denials (Debian/Ubuntu) +``` + +Fix by relabelling / adding policy for the path the engine reads — **do not +disable enforcement** to "make it work". For a non-standard log location, apply +the log file context (e.g. `var_log_t` on SELinux) or extend the crowdsec +AppArmor profile, then retry. + +## journald — group access + +A file-source → journald migration silently reads nothing if the `crowdsec` user +isn't in a group permitted to read the journal (`systemd-journal`, or the unit's +`SupplementaryGroups`). This is a systemd-specific variant of the perms check in +[../symptoms/parsing.md](../symptoms/parsing.md) § Reachability. 
diff --git a/skills/crowdsec/references/debug/triage.md b/skills/crowdsec/references/debug/common/triage.md similarity index 87% rename from skills/crowdsec/references/debug/triage.md rename to skills/crowdsec/references/debug/common/triage.md index 86c59f5..d957c1e 100644 --- a/skills/crowdsec/references/debug/triage.md +++ b/skills/crowdsec/references/debug/common/triage.md @@ -49,7 +49,7 @@ Read these one at a time and stop at the first anomaly. Each level matches a dee | docker | `docker ps --filter name=crowdsec --format '{{.Names}} {{.Status}}'` | | k8s | `kubectl get pods -A -l app.kubernetes.io/name=crowdsec` | -If not running → `journalctl -u crowdsec -n 200` / `docker logs ` / `kubectl logs `. Then [common-errors.md](./common-errors.md). +If not running → `journalctl -u crowdsec -n 200` / `docker logs ` / `kubectl logs `. Then [errors.md](./errors.md). ### 2. Is acquisition reading anything? @@ -58,9 +58,9 @@ cscli metrics ``` Look at the **Acquisition Metrics** table. For each source you expect: -- **Source row entirely absent** for a service you run (e.g. nginx active, no `file:/var/log/nginx/...` row) → no acquisition feeds it. Cross-check enabled collections vs declared types: `cscli collections list` vs `grep -r 'type:' /etc/crowdsec/acquis.d/`. Classic when the service was installed *after* `cscli setup` ran. See [parsing.md](./parsing.md) § "Collection installed but no source feeds it". +- **Source row entirely absent** for a service you run (e.g. nginx active, no `file:/var/log/nginx/...` row) → no acquisition feeds it. Cross-check enabled collections vs declared types: `cscli collections list` vs `grep -r 'type:' /etc/crowdsec/acquis.d/`. Classic when the service was installed *after* `cscli setup` ran. See [parsing.md](../symptoms/parsing.md) § "Collection installed but no source feeds it". - **Lines read = 0** → log file not reachable or rotated under it. Check perms, mount, and that the file path in `/etc/crowdsec/acquis.d/*.yaml` is correct. 
On 1.7.x the default file is split into per-service files in `acquis.d/`; there is no `/etc/crowdsec/acquis.yaml` after `cscli setup`. -- **Lines read > 0, parsed = 0** → wrong `type:` label or no parser installed for it. See [parsing.md](./parsing.md). +- **Lines read > 0, parsed = 0** → wrong `type:` label or no parser installed for it. See [parsing.md](../symptoms/parsing.md). - **Mostly unparsed but some parsed** → mixed-format file (e.g. `/var/log/syslog` includes lines that aren't sshd/postfix). Often benign. ### 3. Are scenarios firing? @@ -68,7 +68,7 @@ Look at the **Acquisition Metrics** table. For each source you expect: Same `cscli metrics` output, **Scenario Metrics** table: - **Instantiated = 0** for every scenario → events aren't matching any bucket filter (acquisition labels wrong, or events whitelisted before reaching the bucket). -- **Instantiated > 0, Poured > 0, Overflows = 0** → buckets receive events but never tip. Either threshold not reached, or LEAKY decay too fast for the traffic. See [no-alerts.md](./no-alerts.md). +- **Instantiated > 0, Poured > 0, Overflows = 0** → buckets receive events but never tip. Either threshold not reached, or LEAKY decay too fast for the traffic. See [no-alerts.md](../symptoms/no-alerts.md). - **Overflows > 0** → alerts should exist. Continue to step 4. Also check **Whitelist Metrics** in the same output — a high `Whitelisted` count can hide expected alerts. And confirm simulation isn't masking: @@ -87,7 +87,7 @@ cscli decisions list ``` - **No active alerts** → step 3 lied about overflows, or LAPI write failed. Check `tail -n 200 /var/log/crowdsec_api.log` for `database is locked` / disk-full / migration errors. -- **Alerts exist, no decisions** → inspect `/etc/crowdsec/profiles.yaml` (there is no `cscli profiles` command) — the profile filter may not match, or the duration is `0s`. See [../configure/profiles.md](../configure/profiles.md). 
+- **Alerts exist, no decisions** → inspect `/etc/crowdsec/profiles.yaml` (there is no `cscli profiles` command) — the profile filter may not match, or the duration is `0s`. See [../../configure/profiles.md](../../configure/profiles.md). - **Decisions exist** → continue to step 5. ### 4½. Is the IP allowlisted? @@ -99,7 +99,7 @@ cscli allowlists check # which allowlist (if any) covers it cscli allowlists list # local + Console-managed ``` -Allowlists suppress *new* decisions (local and CAPI/Console) for matching IPs but leave alerts visible — so the symptom is "alerts exist, no decision". See [../configure/allowlists.md](../configure/allowlists.md). +Allowlists suppress *new* decisions (local and CAPI/Console) for matching IPs but leave alerts visible — so the symptom is "alerts exist, no decision". See [../../configure/allowlists.md](../../configure/allowlists.md). ### 5. Is the bouncer pulling and enforcing? @@ -109,9 +109,9 @@ cscli lapi status cscli capi status ``` -- **`cscli bouncers list` empty** → no bouncer registered. Install one (firewall, web bouncer, AppSec — see `../configure/bouncers/`). -- **Bouncer present but `Last API pull` is old** → bouncer can't reach LAPI (auth or network). See [bouncer-not-blocking.md](./bouncer-not-blocking.md). -- **Bouncer pulling decisions but traffic still passes** → backend state problem (iptables/nftables rules not materialised, web bouncer mis-wired, captcha mode instead of ban). Also see [bouncer-not-blocking.md](./bouncer-not-blocking.md). +- **`cscli bouncers list` empty** → no bouncer registered. Install one (firewall, web bouncer, AppSec — see `../../configure/bouncers/`). +- **Bouncer present but `Last API pull` is old** → bouncer can't reach LAPI (auth or network). See [not-blocked.md](../symptoms/not-blocked.md). +- **Bouncer pulling decisions but traffic still passes** → backend state problem (iptables/nftables rules not materialised, web bouncer mis-wired, captcha mode instead of ban). 
Also see [not-blocked.md](../symptoms/not-blocked.md). ### 6. Is the hub healthy? @@ -125,7 +125,7 @@ A line with `status: tainted` is hub-managed content modified locally — fix wi `cscli lapi status` should print `You can successfully interact with Local API (LAPI)`. Anything else means the agent can't talk to its own LAPI — wrong URL in `/etc/crowdsec/config.yaml` `api.client.credentials_path`, expired creds, or LAPI not listening. -`cscli capi status` shows whether the engine is registered with the Central API (community blocklist, signal sharing, console). Failure here is non-fatal but kills CAPI-pulled blocklists. See [../install/console.md](../install/console.md) for enrollment. +`cscli capi status` shows whether the engine is registered with the Central API (community blocklist, signal sharing, console). Failure here is non-fatal but kills CAPI-pulled blocklists. See [../../install/console.md](../../install/console.md) for enrollment. ## Live log streaming for an active incident @@ -137,10 +137,10 @@ sudo tail -F /var/log/crowdsec.log # bare-metal; docker/k8s: `logs -f` cscli explain --file /var/log/auth.log --type syslog --only-successful-parsers ``` -See [parsing.md](./parsing.md) for the flag combinations. +See [parsing.md](../symptoms/parsing.md) for the flag combinations. ## Hard don'ts during triage - **Do not** run `cscli decisions delete --all` to "reset" — it removes every active ban, including CAPI-pulled blocklists. If you need to clear one test IP, use `cscli decisions delete -i `. - **Do not** edit `/etc/crowdsec/hub/`, `/etc/crowdsec/parsers/`, `/etc/crowdsec/scenarios/`, or `/etc/crowdsec/collections/` in place to "fix" a hub item. The next `cscli hub upgrade` will overwrite it and the file will show as *tainted* until then. Use `cscli hub upgrade --force` to restore, and put local overrides in `*/parsers/*/_custom/` etc. -- **Do not** disable a collection wholesale to silence a false positive. 
Pick the right suppression layer — see [../configure/allowlists.md](../configure/allowlists.md) § Suppression mechanisms. +- **Do not** disable a collection wholesale to silence a false positive. Pick the right suppression layer — see [../../configure/allowlists.md](../../configure/allowlists.md) § Suppression mechanisms. diff --git a/skills/crowdsec/references/debug/no-alerts.md b/skills/crowdsec/references/debug/symptoms/no-alerts.md similarity index 93% rename from skills/crowdsec/references/debug/no-alerts.md rename to skills/crowdsec/references/debug/symptoms/no-alerts.md index 8ee0407..a096129 100644 --- a/skills/crowdsec/references/debug/no-alerts.md +++ b/skills/crowdsec/references/debug/symptoms/no-alerts.md @@ -41,7 +41,7 @@ sudo cscli metrics show scenarios collection if partial. - **Events in, "overflow" 0**: traffic didn't cross the scenario threshold (e.g. `ssh-bf` needs N failures in the window). Generate enough events, or - test with a purpose-built probe — see [../operate/health-check.md](../operate/health-check.md). + test with a purpose-built probe — see [../../operate/health-check.md](../../operate/health-check.md). ## 2 — Source IP whitelisted (the most common "silent" cause) @@ -60,7 +60,7 @@ mobile data, or temporarily `cscli parsers remove crowdsecurity/whitelists` If the *alert* exists but no ban does, it's an **allowlist**, not a whitelist parser — these are different layers. See -[../configure/allowlists.md](../configure/allowlists.md) § Suppression +[../../configure/allowlists.md](../../configure/allowlists.md) § Suppression mechanisms for the full comparison. ## 3 — Simulation mode masking the alert @@ -81,12 +81,12 @@ The agent detects but can't persist to LAPI. Check the agent log (`/var/log/crowdsec.log`) for write errors: - **`database is locked`** (sqlite): concurrent writers / slow disk — see - [common-errors.md](./common-errors.md). + [../common/errors.md](../common/errors.md). 
 - **Disk full**: `df -h /var/lib/crowdsec` — sqlite write fails silently from the user's view.
 - **Remote LAPI unreachable**: agent-only node can't reach the LAPI host. `cscli lapi status` from the agent. See
-  [../operate/multi-server.md](../operate/multi-server.md).
+  [../../operate/multi-server.md](../../operate/multi-server.md).

 ## 5 — Right scenario at all?

diff --git a/skills/crowdsec/references/debug/bouncer-not-blocking.md b/skills/crowdsec/references/debug/symptoms/not-blocked.md
similarity index 93%
rename from skills/crowdsec/references/debug/bouncer-not-blocking.md
rename to skills/crowdsec/references/debug/symptoms/not-blocked.md
index a8c4e93..d05f104 100644
--- a/skills/crowdsec/references/debug/bouncer-not-blocking.md
+++ b/skills/crowdsec/references/debug/symptoms/not-blocked.md
@@ -26,7 +26,7 @@ If it matches, the bouncer is *correctly* not blocking.
 Allowlists are **not retroactive** and they don't delete existing decisions — if you added an allowlist but the IP is still banned, delete the decision too (`cscli decisions delete -i `). See
-[../configure/allowlists.md](../configure/allowlists.md).
+[../../configure/allowlists.md](../../configure/allowlists.md).

 ## 1 — Decision actually active and the right scope/family

@@ -97,18 +97,18 @@ sudo nft list set ip crowdsec crowdsec-blacklists-cscli | grep
   the bouncer service isn't running: `systemctl status crowdsec-firewall-bouncer`.
 - Counter not incrementing on a known-banned source you curl from → traffic isn't traversing the hooked chain (e.g. it's container-internal on
-  Docker's own table). See [../configure/bouncers/firewall.md](../configure/bouncers/firewall.md).
+  Docker's own table). See [../../configure/bouncers/firewall.md](../../configure/bouncers/firewall.md).

 **Web-server bouncer** (nginx/traefik/caddy): the bouncer trusts the *client* IP. Behind a proxy/CDN without correct `X-Forwarded-For` trust config, it bans the proxy or sees the wrong IP and never matches.
 Also check **mode**: `captcha` mode returns a challenge page, not a 403 — "not blocking" may actually
-be "serving the captcha". See [../configure/bouncers/web-servers.md](../configure/bouncers/web-servers.md).
+be "serving the captcha". See [../../configure/bouncers/web-servers.md](../../configure/bouncers/web-servers.md).

 **AppSec**: distinct from a decision bouncer — AppSec blocks by request *shape* inband (403 from AppSec), not by IP decision. If an inband rule should 403 but doesn't, the bouncer isn't forwarding to the AppSec endpoint, or the config is
-out-of-band only. See [../appsec/troubleshoot.md](../appsec/troubleshoot.md).
+out-of-band only. See [../../appsec/troubleshoot.md](../../appsec/troubleshoot.md).

 ## 6 — CAPI/blocklist not imported yet

diff --git a/skills/crowdsec/references/debug/parsing.md b/skills/crowdsec/references/debug/symptoms/parsing.md
similarity index 94%
rename from skills/crowdsec/references/debug/parsing.md
rename to skills/crowdsec/references/debug/symptoms/parsing.md
index 081714d..b8dee48 100644
--- a/skills/crowdsec/references/debug/parsing.md
+++ b/skills/crowdsec/references/debug/symptoms/parsing.md
@@ -105,14 +105,11 @@ acquisition file. Fix either way:
 ## Reachability (when 0 lines read)

 - **File perms**: the `crowdsec` user must read the file. `sudo -u crowdsec head
-  ` — if that fails, it's perms (or SELinux/AppArmor — see
-  [common-errors.md](./common-errors.md)).
+  ` — if that fails, it's perms (or SELinux/AppArmor, or — in a
+  container/k8s — the path isn't mounted in). Platform-specific causes are
+  collected in [../common/platform-gotchas.md](../common/platform-gotchas.md).
 - **journald**: source must be in a group that can read the journal; a file-source migration to journald silently reads nothing otherwise.
-- **Containers/k8s**: the log path must be *mounted into* the crowdsec
-  container/pod. A path that exists on the host but not in the container reads
-  zero. Verify inside: `docker exec crowdsec ls -l ` /
-  `kubectl exec -n -- ls -l `.

 ## Multi-stage chains

diff --git a/skills/crowdsec/references/install/bare-metal.md b/skills/crowdsec/references/install/bare-metal.md
index a871584..4946a1c 100644
--- a/skills/crowdsec/references/install/bare-metal.md
+++ b/skills/crowdsec/references/install/bare-metal.md
@@ -77,10 +77,10 @@ sudo cscli hub list # collections installed by cscli setup
 If `cscli metrics` shows acquisition sources but **0 parsed**, the source is matched but the parser collection for it isn't installed — see
-[../debug/parsing.md](../debug/parsing.md). If the service won't start, the
+[../debug/symptoms/parsing.md](../debug/symptoms/parsing.md). If the service won't start, the
 single most common cause is a malformed file in `acquis.d/` (the `-t` pre-check prints the offending file + line to the journal) — see
-[../debug/common-errors.md](../debug/common-errors.md).
+[../debug/common/errors.md](../debug/common/errors.md).

 ## 4 — Common post-install pitfalls

diff --git a/skills/crowdsec/references/install/kubernetes.md b/skills/crowdsec/references/install/kubernetes.md
index 79c9793..0d2e61b 100644
--- a/skills/crowdsec/references/install/kubernetes.md
+++ b/skills/crowdsec/references/install/kubernetes.md
@@ -64,7 +64,7 @@ appsec:
 The chart ships `container_runtime: docker`. kind, k3d, and most managed clusters (EKS/GKE/AKS recent) run **containerd**. With the wrong value the agent reads pod logs in the wrong format → lines read, **0 parsed**, no
-alerts (the [parsing.md](../debug/parsing.md) symptom). Set
+alerts (the [parsing.md](../debug/symptoms/parsing.md) symptom). Set
 `container_runtime: containerd` unless your nodes genuinely use the Docker runtime. Confirm with `kubectl get nodes -o wide` → CONTAINER-RUNTIME column.
diff --git a/skills/crowdsec/references/operate/health-check.md b/skills/crowdsec/references/operate/health-check.md
index 1745096..a8395c4 100644
--- a/skills/crowdsec/references/operate/health-check.md
+++ b/skills/crowdsec/references/operate/health-check.md
@@ -49,7 +49,7 @@ Expected: one row with `kind: crowdsec`, scope `Ip:`. The test s
 **Common failure paths** (in order to check):

 1. *No row, no parser hit* → the web server's logs aren't being read. `cscli metrics show acquisition` — does your access log show non-zero "Lines read"? If not, see [../configure/acquisition.md](../configure/acquisition.md).
-2. *Logs read, 0 parsed* → wrong `type:` label vs installed parser. See [../debug/parsing.md](../debug/parsing.md).
+2. *Logs read, 0 parsed* → wrong `type:` label vs installed parser. See [../debug/symptoms/parsing.md](../debug/symptoms/parsing.md).
 3. *Source IP is private* → see the warning at top of page.

 ## 2. Engine detection: SSH
@@ -136,7 +136,7 @@ curl -I https:///
 sudo cscli decisions list
 ```

-If the request still goes through after a successful add → the bouncer isn't polling fast enough, isn't enforcing, or doesn't have the right scope. See [../debug/bouncer-not-blocking.md](../debug/bouncer-not-blocking.md).
+If the request still goes through after a successful add → the bouncer isn't polling fast enough, isn't enforcing, or doesn't have the right scope. See [../debug/symptoms/not-blocked.md](../debug/symptoms/not-blocked.md).

 ## Automating the health-check