Add PhishDestroy/destroylist as a default threat-intel feed #22

@schmug


Summary

workers/intel/defaults.ts currently ships URLhaus and OpenPhish as default threat-intel feeds. PhishDestroy publishes a large, MIT-licensed phishing domain list that's a strong fit for the bloom-filter pipeline but isn't wired in. The file already mentions it in a comment about a licensing caveat, and that caveat has since been resolved.

Motivation

As a mailbox operator, I want broader phishing-domain coverage out of the box so that threat-intel bloom lookups catch more credential-harvesting and crypto-scam domains without requiring custom feed configuration.

PhishDestroy is a good default because:

  • License is unambiguous: MIT, "Free for any use with attribution." Compatible with our redistribution model (bloom filter ships aggregated data). No non-commercial (NC) restriction, no share-alike, no API key.
  • Scale: ~140k curated phishing/scam domains; ~78k DNS-validated active subset. Updated continuously via GitHub Actions.
  • Format: plain newline-delimited domains — drop-in for kind: "domain".
  • Active: pushed daily; 991 ⭐; clear governance, appeals process, and abuse-reporting pipeline.

Full audit at https://github.com/phishdestroy/destroylist (LICENSE, SECURITY.md, README).

Scope

In:

  • Add a phishdestroy entry to DEFAULT_FEEDS in workers/intel/defaults.ts using the DNS-validated list (dns/active_domains.txt, 24h refresh, ~78k domains).
  • Update the licensing comment block at the top of defaults.ts: replace the "check their current terms before redistributing" warning with a note reflecting the resolved MIT-with-attribution status.
  • Add a one-line PhishDestroy attribution to hub/README.md alongside the existing URLhaus/OpenPhish mentions, so the MIT notice travels with redistributed bloom-filter outputs.
  • Add/extend tests in workers/intel/ (or wherever feed-loading is tested) to assert the new feed parses and contributes to the bloom.
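
For illustration, the new entry might look roughly like the sketch below. Only `kind: "domain"`, `refreshHours: 24`, and the feed URL come from this issue. The `FeedConfig` type, the `id`, `url`, and `attribution` field names, the `"url"` kind, and `DEFAULT_FEEDS_SKETCH` are assumptions made for the example and may not match the real shape in defaults.ts.

```typescript
// Hypothetical feed shape. Only `kind` and `refreshHours` are named in this issue;
// `id`, `url`, `attribution`, and the "url" kind are assumed for illustration.
type FeedKind = "domain" | "url";

interface FeedConfig {
  id: string;
  url: string;
  kind: FeedKind;
  refreshHours: number;
  attribution?: string;
}

// PhishDestroy DNS-validated list, refreshed every 24 hours.
const phishDestroyFeed: FeedConfig = {
  id: "phishdestroy",
  url: "https://raw.githubusercontent.com/phishdestroy/destroylist/main/dns/active_domains.txt",
  kind: "domain",
  refreshHours: 24,
  attribution: "PhishDestroy destroylist (MIT), https://github.com/phishdestroy/destroylist",
};

// Stand-in for the real DEFAULT_FEEDS array; the existing entries are omitted here.
const DEFAULT_FEEDS_SKETCH: FeedConfig[] = [phishDestroyFeed];
```

Carrying the attribution string on the entry itself is one option for keeping the MIT notice close to the data; the README line listed above is still needed either way.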

Out (defer to follow-ups):

  • Wiring the curated list.txt (140k, "real-time") as a separate opt-in feed for max-coverage operators — file as a follow-up if there's demand.
  • Wiring the community/blocklist.txt aggregated list — riskier (no manual false-positive removal per their SECURITY.md); should remain opt-in only.
  • Integrating the live api.destroy.tools single-domain check API — useful but a different abstraction than bulk feeds.

Acceptance criteria

  • DEFAULT_FEEDS in workers/intel/defaults.ts includes a phishdestroy entry pointing at https://raw.githubusercontent.com/phishdestroy/destroylist/main/dns/active_domains.txt with kind: "domain" and refreshHours: 24.
  • Licensing comment in defaults.ts reflects the MIT-with-attribution status; no stale "check terms" warning for PhishDestroy.
  • hub/README.md (or another visible place that ships with redistributed data) carries a PhishDestroy attribution line satisfying the MIT notice requirement.
  • Existing feed-refresh / bloom tests still pass; a new test asserts the PhishDestroy feed loads and contributes domains to the bloom.
  • No regression in feed-refresh worker behavior on cold start (manual verification via wrangler dev or the existing harness).
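
The "loads and contributes domains to the bloom" criterion could be checked along the lines of the sketch below. It is self-contained and does not use the project's actual parser or bloom implementation: `parseDomainList` and `TinyBloom` are hypothetical stand-ins, the sample domains are made up, and the handling of `#` comment lines is an assumption about what the feed body may contain.

```typescript
// Parse a newline-delimited domain list: trim, lowercase, skip blanks and "#" comments.
function parseDomainList(body: string): string[] {
  const out: string[] = [];
  for (const raw of body.split(/\r?\n/)) {
    const line = raw.trim().toLowerCase();
    if (line === "" || line.startsWith("#")) continue;
    out.push(line);
  }
  return out;
}

// Minimal Bloom filter using double hashing over two seeded FNV-1a hashes.
// Illustrative only; the real pipeline presumably has its own implementation.
class TinyBloom {
  private readonly bits: Uint8Array;
  private readonly sizeBits: number;
  private readonly hashes: number;

  constructor(sizeBits: number, hashes: number) {
    this.sizeBits = sizeBits;
    this.hashes = hashes;
    this.bits = new Uint8Array(Math.ceil(sizeBits / 8));
  }

  private fnv1a(s: string, seed: number): number {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < s.length; i++) {
      h ^= s.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h;
  }

  private positions(s: string): number[] {
    const h1 = this.fnv1a(s, 0);
    const h2 = (this.fnv1a(s, 0x9e3779b9) | 1) >>> 0; // force an odd, unsigned step
    const out: number[] = [];
    for (let i = 0; i < this.hashes; i++) {
      out.push((h1 + i * h2) % this.sizeBits);
    }
    return out;
  }

  add(s: string): void {
    for (const p of this.positions(s)) this.bits[p >> 3] |= 1 << (p & 7);
  }

  // True means "possibly present"; false means "definitely absent".
  has(s: string): boolean {
    return this.positions(s).every((p) => (this.bits[p >> 3] & (1 << (p & 7))) !== 0);
  }
}

// Made-up sample body standing in for dns/active_domains.txt.
const sampleBody = "# comment\nLogin-Example.test\n\n wallet-example.test \r\n";
const domains = parseDomainList(sampleBody);

const bloom = new TinyBloom(1024, 3);
for (const d of domains) bloom.add(d);
```

A Bloom filter never produces false negatives, so a test can safely assert that every parsed domain is reported as present. Asserting that some other domain is absent is less robust, because false positives are possible by design.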

Cross-references

  • No prior issues or PRs found (gh issue/pr list --search "phishdestroy OR destroylist OR feed" returned only unrelated results).
  • Existing PhishDestroy references already in the codebase as documentation comments only:
