Revise spam detection: multi-signal scoring pipeline with graduated responses #268

@vcarl

Problem

Current spam detection (app/helpers/isSpam.ts) is purely content-based — a fixed keyword list with a static scoring threshold. It has several blind spots:

  • No behavioral signals: Doesn't consider account age, server tenure, or role status. Brand-new accounts posting links are treated the same as established members.
  • No velocity detection: Can't detect channel-hopping (posting the same message across many channels rapidly) or message flooding.
  • Binary response: Either does nothing or deletes + eventually kicks after 3 detections. No proportional middle ground.
  • Separate honeypot system: honeypotTracker.ts runs as a separate MessageCreate listener doing its own softban, duplicating some of the same checks.

Proposed Solution

Replace isSpam() with a multi-signal scoring pipeline that collects signals across 4 categories, produces an explained score, and routes to graduated responses.

Signal Categories

1. Content (evolved from current system): keyword matches, link-to-text ratio, mention density, unicode/zalgo abuse, bare invite links, @everyone/@here pings.
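A minimal sketch of what a pure content analyzer could look like, assuming a hypothetical `Signal` shape and an `analyzeContent` function. The keyword list, thresholds, and weights are illustrative placeholders, not values from the current `isSpam.ts`; only the `spam_keyword` (+1) and `has_link` (+2) weights mirror the example breakdown in this issue.

```typescript
// Hypothetical shape for one scored signal.
interface Signal {
  name: string;
  score: number;
}

// Illustrative patterns; the real ones would presumably live in spamPatterns.ts.
const URL_PATTERN = /https?:\/\/\S+/g;
const MENTION_PATTERN = /<@[!&]?\d+>/g;
const INVITE_PATTERN = /discord(?:\.gg|(?:app)?\.com\/invite)\/[\w-]+/i;
const MASS_PING_PATTERN = /@(?:everyone|here)\b/;
const ZALGO_PATTERN = /[\u0300-\u036f]{3,}/; // 3+ stacked combining marks

// Assumed keyword list, for illustration only.
const SPAM_KEYWORDS = ["nitro", "airdrop", "giveaway"];

function analyzeContent(content: string): Signal[] {
  const signals: Signal[] = [];
  const lower = content.toLowerCase();

  for (const keyword of SPAM_KEYWORDS) {
    if (lower.indexOf(keyword) >= 0) {
      signals.push({ name: `spam_keyword:${keyword}`, score: 1 });
    }
  }

  const links = content.match(URL_PATTERN) ?? [];
  if (links.length > 0) {
    signals.push({ name: "has_link", score: 2 });
    // Link-to-text ratio: share of the message's characters that are URLs.
    const linkChars = links.reduce((sum, link) => sum + link.length, 0);
    if (linkChars / content.length > 0.5) {
      signals.push({ name: "high_link_ratio", score: 2 });
    }
  }

  const mentions = content.match(MENTION_PATTERN) ?? [];
  if (mentions.length >= 5) signals.push({ name: "mention_density", score: 2 });

  if (ZALGO_PATTERN.test(content)) signals.push({ name: "zalgo_text", score: 2 });
  if (INVITE_PATTERN.test(content)) signals.push({ name: "invite_link", score: 2 });
  if (MASS_PING_PATTERN.test(content)) signals.push({ name: "mass_ping", score: 3 });

  return signals;
}
```

Because the function takes a string and returns plain data, it can be unit tested without a Discord client.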

2. Behavioral (new): account age, server tenure, first-message-has-link, no-roles-assigned. Uses user.createdAt, member.joinedAt, and message_stats table for history.
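The behavioral checks could be sketched as a pure function over a small context object. `BehaviorContext`, `analyzeBehavior`, and the `no_roles` weight are assumptions made for illustration; the three +3 signal names follow the example breakdown in this issue. In the bot, the fields would presumably be filled from `user.createdAt`, `member.joinedAt` (nullable in discord.js), the member's roles, and a `message_stats` lookup.

```typescript
// Hypothetical shape for one scored signal.
interface Signal {
  name: string;
  score: number;
}

// Hypothetical input, decoupled from discord.js so the function stays pure.
interface BehaviorContext {
  accountCreatedAt: Date; // from user.createdAt
  joinedAt: Date | null; // from member.joinedAt
  roleCount: number; // roles other than @everyone
  priorMessageCount: number; // assumed to come from message_stats
  messageHasLink: boolean;
  now: Date;
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

function analyzeBehavior(ctx: BehaviorContext): Signal[] {
  const signals: Signal[] = [];

  const accountAgeMs = ctx.now.getTime() - ctx.accountCreatedAt.getTime();
  if (accountAgeMs < DAY_MS) {
    signals.push({ name: "account_age_lt_1d", score: 3 });
  }

  // Skip the tenure check when the join date is unknown.
  if (ctx.joinedAt !== null) {
    const tenureMs = ctx.now.getTime() - ctx.joinedAt.getTime();
    if (tenureMs < HOUR_MS) {
      signals.push({ name: "server_tenure_lt_1h", score: 3 });
    }
  }

  if (ctx.priorMessageCount === 0 && ctx.messageHasLink) {
    signals.push({ name: "first_message_has_link", score: 3 });
  }

  // Assumed weight; the issue does not specify one for this signal.
  if (ctx.roleCount === 0) {
    signals.push({ name: "no_roles", score: 1 });
  }

  return signals;
}
```

Passing `now` in explicitly keeps the age and tenure checks deterministic under test.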

3. Velocity (new): channel-hop detection, duplicate message detection, message rate. Uses an in-memory per-user tracker (Map with TTL cleanup, bounded at 20 messages per user).
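One way the in-memory tracker and velocity checks might fit together is sketched below. The class and method names, the 60-second TTL, and the detection thresholds are assumptions; only the `Map` storage, TTL cleanup, and the cap of 20 messages per user come from the description above.

```typescript
// Hypothetical shapes, for illustration only.
interface Signal {
  name: string;
  score: number;
}

interface TrackedMessage {
  channelId: string;
  content: string;
  timestamp: number; // epoch milliseconds
}

class RecentActivityTracker {
  private readonly byUser = new Map<string, TrackedMessage[]>();
  private readonly ttlMs: number;
  private readonly maxPerUser: number;

  constructor(ttlMs: number = 60 * 1000, maxPerUser: number = 20) {
    this.ttlMs = ttlMs;
    this.maxPerUser = maxPerUser;
  }

  // Stores the message and returns the user's live, bounded history.
  record(userId: string, message: TrackedMessage): TrackedMessage[] {
    const live = this.recentFor(userId, message.timestamp);
    live.push(message);
    const bounded = live.slice(-this.maxPerUser); // keep the newest entries
    this.byUser.set(userId, bounded);
    return bounded;
  }

  recentFor(userId: string, now: number): TrackedMessage[] {
    const cutoff = now - this.ttlMs;
    return (this.byUser.get(userId) ?? []).filter((m) => m.timestamp >= cutoff);
  }

  // Drops users whose messages have all expired; intended to run on a timer.
  sweep(now: number): void {
    const cutoff = now - this.ttlMs;
    this.byUser.forEach((messages, userId) => {
      if (messages.every((m) => m.timestamp < cutoff)) this.byUser.delete(userId);
    });
  }

  get trackedUsers(): number {
    return this.byUser.size;
  }
}

// Assumed thresholds: 3+ channels, 3+ identical messages, 10+ messages in the window.
function analyzeVelocity(recent: TrackedMessage[]): Signal[] {
  const signals: Signal[] = [];

  const channels = new Set(recent.map((m) => m.channelId));
  if (channels.size >= 3) signals.push({ name: "channel_hop", score: 3 });

  const copies = new Map<string, number>();
  let maxCopies = 0;
  for (const m of recent) {
    const key = m.content.trim().toLowerCase();
    const count = (copies.get(key) ?? 0) + 1;
    copies.set(key, count);
    maxCopies = Math.max(maxCopies, count);
  }
  if (maxCopies >= 3) signals.push({ name: "duplicate_message", score: 3 });

  if (recent.length >= 10) signals.push({ name: "message_flood", score: 2 });
  return signals;
}
```

Keeping `analyzeVelocity` separate from the tracker leaves the analysis pure, with the tracker as the only stateful piece. Since the state lives in process memory, it resets whenever the bot restarts.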

4. Honeypot (unified): Messages in honeypot channels score +100, triggering maximum response (softban).
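To illustrate how the honeypot check could become one more signal source instead of a separate listener, here is a rough sketch of a scoring step that runs a list of analyzers and sums their signals. `Analyzer`, `IncomingMessage`, `makeHoneypotAnalyzer`, and `scoreMessage` are hypothetical names, and the sketch is plain TypeScript that deliberately leaves out the Effect service wiring.

```typescript
// Hypothetical shapes, for illustration only.
interface Signal {
  name: string;
  score: number;
}

interface IncomingMessage {
  channelId: string;
  content: string;
}

type Analyzer = (message: IncomingMessage) => Signal[];

// A message in any honeypot channel contributes +100 on its own.
function makeHoneypotAnalyzer(honeypotChannelIds: Set<string>): Analyzer {
  return (message) =>
    honeypotChannelIds.has(message.channelId)
      ? [{ name: "honeypot_channel", score: 100 }]
      : [];
}

// Runs every analyzer and sums the collected signals into a single score.
function scoreMessage(
  message: IncomingMessage,
  analyzers: Analyzer[],
): { score: number; signals: Signal[] } {
  const signals: Signal[] = [];
  for (const analyze of analyzers) {
    signals.push(...analyze(message));
  }
  const score = signals.reduce((sum, s) => sum + s.score, 0);
  return { score, signals };
}
```

With this shape, a honeypot hit reaches the 100+ tier regardless of what the other analyzers report.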

Graduated Responses

| Tier | Score | Action |
| --- | --- | --- |
| none | 0–5 | No action |
| low | 6–9 | Log to mod thread for review (no delete) |
| medium | 10–14 | Delete + apply restricted role |
| high | 15+ | Delete + timeout; kick after 3 cumulative detections |
| honeypot | 100+ | Softban (ban + unban, clears 7 days of messages) |
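The tier boundaries above could be routed with something like the following sketch. The `Tier` and `Action` names and the `planResponse` function are hypothetical, and it assumes the current detection counts toward the cumulative total of 3 that triggers a kick. The honeypot threshold is checked first because it overlaps the open-ended high tier.

```typescript
type Tier = "none" | "low" | "medium" | "high" | "honeypot";

// Hypothetical action identifiers; the real handler would call Discord APIs.
type Action =
  | "log_for_review"
  | "delete"
  | "apply_restricted_role"
  | "timeout"
  | "kick"
  | "softban";

function tierForScore(score: number): Tier {
  if (score >= 100) return "honeypot"; // checked before "high" since 15+ also matches
  if (score >= 15) return "high";
  if (score >= 10) return "medium";
  if (score >= 6) return "low";
  return "none";
}

function planResponse(
  score: number,
  priorDetections: number,
): { tier: Tier; actions: Action[] } {
  const tier = tierForScore(score);
  switch (tier) {
    case "none":
      return { tier, actions: [] };
    case "low":
      return { tier, actions: ["log_for_review"] };
    case "medium":
      return { tier, actions: ["delete", "apply_restricted_role"] };
    case "high": {
      const actions: Action[] = ["delete", "timeout"];
      // Assumption: this detection is included in the cumulative count.
      if (priorDetections + 1 >= 3) actions.push("kick");
      return { tier, actions };
    }
    case "honeypot":
      return { tier, actions: ["softban"] };
  }
}
```

Returning a plan instead of performing the actions directly keeps the routing logic testable apart from the response handler.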

Explainability

Each verdict includes a signal breakdown in the mod log:

Score 12 (medium): spam_keyword:nitro (+1), has_link (+2), account_age_lt_1d (+3), server_tenure_lt_1h (+3), first_message_has_link (+3)
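A breakdown line in that format could be produced by a small formatter like this one. `Signal` and `explainVerdict` are hypothetical names, and the tier lookup is inlined so the sketch stands alone.

```typescript
// Hypothetical shape for one scored signal.
interface Signal {
  name: string;
  score: number;
}

// Tier boundaries copied from the Graduated Responses table.
function tierLabel(score: number): string {
  if (score >= 100) return "honeypot";
  if (score >= 15) return "high";
  if (score >= 10) return "medium";
  if (score >= 6) return "low";
  return "none";
}

function explainVerdict(signals: Signal[]): string {
  const total = signals.reduce((sum, s) => sum + s.score, 0);
  const parts = signals.map((s) => {
    // Show an explicit sign so any negative weight would stay readable.
    const delta = s.score >= 0 ? `+${s.score}` : `${s.score}`;
    return `${s.name} (${delta})`;
  });
  return `Score ${total} (${tierLabel(total)}): ${parts.join(", ")}`;
}

// Reproduces the example line above from its five signals.
const example = explainVerdict([
  { name: "spam_keyword:nitro", score: 1 },
  { name: "has_link", score: 2 },
  { name: "account_age_lt_1d", score: 3 },
  { name: "server_tenure_lt_1h", score: 3 },
  { name: "first_message_has_link", score: 3 },
]);
```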

Architecture

  • New modules under app/features/spam/
  • Pure analysis functions (content, behavior, velocity) — unit testable
  • In-memory RecentActivityTracker for velocity state
  • SpamDetectionService (Effect service with Layer) wiring analyzers together
  • Unified MessageCreate handler replacing the separate listeners in automod.ts and honeypotTracker.ts
  • No database migration needed (uses existing extra field and message_stats table)

Files

Create (all under app/features/spam/): spamPatterns.ts, contentAnalyzer.ts, behaviorAnalyzer.ts, recentActivityTracker.ts, velocityAnalyzer.ts, spamScorer.ts, spamResponseHandler.ts, service.ts

Modify: app/discord/automod.ts, app/discord/gateway.ts, app/helpers/metrics.ts

Remove: app/helpers/isSpam.ts, app/discord/honeypotTracker.ts
