
feat(intelligence): capture User-Agent for traffic classification #128

Merged: Satelink-Protocol merged 1 commit into develop from worktree-traffic-intel-ua-capture on Jun 14, 2026

@Satelink-Protocol (Owner)

Why

The free-tier gate currently stores only a per-IP daily counter (ft:<ip> → int).
Method, timing, and User-Agent are not captured anywhere, so of the 354k calls/day across
2,204 IPs, ~92% of traffic is unclassifiable. We can spot crawlers by volume, but we cannot
find the developers who could actually pay.

What

  • Capture User-Agent in free_tier_gate.js, stored as ftua:<ip> with the same daily
    TTL as the counter. It is written once per IP per day (first call only), so the hot
    path pays one extra Redis write per IP per day rather than one per request.
  • Add classifyUserAgent(): curl/python/wget=script, ethers/viem/web3=developer,
    Mozilla=browser, *bot/spider=crawler.
  • conversion_targets export now includes user_agent + classification.
  • User-Agent is a software identifier, not PII — stored alongside the salted-SHA256
    hashed client_id, never with the raw IP.
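A minimal sketch of the once-per-day capture, assuming the gate bumps ft:<ip> with an INCR-style counter and uses a 24-hour TTL (the real TTL and Redis client are not shown in this PR). `FakeRedis`, `recordFreeTierCall`, and `DAY_SECONDS` are hypothetical names; `FakeRedis` only mimics the INCR/EXPIRE/SET/GET semantics the sketch needs, with ioredis-style `set(key, value, 'EX', seconds)` arguments, and does not actually expire keys.

```javascript
// Hypothetical in-memory stand-in for the Redis commands used below.
// It records TTLs but never expires keys; it exists only so the sketch runs.
class FakeRedis {
  constructor() {
    this.data = new Map();
    this.ttls = new Map();
    this.writes = 0; // number of write commands issued, to show hot-path cost
  }
  async incr(key) {
    const next = Number(this.data.get(key) ?? 0) + 1;
    this.data.set(key, String(next));
    this.writes++;
    return next;
  }
  async expire(key, seconds) {
    this.ttls.set(key, seconds);
    this.writes++;
    return 1;
  }
  async set(key, value, mode, seconds) {
    // Mirrors `SET key value EX seconds`
    this.data.set(key, value);
    if (mode === 'EX') this.ttls.set(key, seconds);
    this.writes++;
    return 'OK';
  }
  async get(key) {
    return this.data.get(key) ?? null;
  }
}

const DAY_SECONDS = 24 * 60 * 60; // assumed daily TTL, shared by both keys

// Hypothetical gate step: bump the per-IP daily counter and, only on the
// first call of the day, store the User-Agent under ftua:<ip>.
async function recordFreeTierCall(redis, ip, userAgent) {
  const count = await redis.incr(`ft:${ip}`);
  if (count === 1) {
    await redis.expire(`ft:${ip}`, DAY_SECONDS);
    await redis.set(`ftua:${ip}`, userAgent || '', 'EX', DAY_SECONDS);
  }
  return count;
}
```

Keying the UA write off INCR's return value means exactly one request per IP per day sees `1` (INCR is atomic in Redis), so the capture needs no extra read or lock; every later call that day issues only the INCR.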
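The classification rules can be sketched as an ordered list of case-insensitive patterns. The rule order is an assumption (crawler markers are checked first because many crawlers send Mozilla-compatible strings, e.g. Googlebot), as are the `unknown` fallback and the export row shape. `toConversionTarget` is a hypothetical helper that assumes the export replaces the raw IP with a salted SHA-256 client_id before attaching user_agent and classification.

```javascript
const crypto = require('crypto');

// Patterns from the PR description; the evaluation order is an assumption.
const UA_RULES = [
  ['crawler', /bot|spider/i],
  ['developer', /ethers|viem|web3/i],
  ['script', /curl|python|wget/i],
  ['browser', /mozilla/i],
];

// Returns the first matching label, or 'unknown' for empty/unmatched UAs.
function classifyUserAgent(userAgent) {
  if (!userAgent) return 'unknown';
  for (const [label, pattern] of UA_RULES) {
    if (pattern.test(userAgent)) return label;
  }
  return 'unknown';
}

// Hypothetical export row: the IP only feeds the salted hash, so the
// resulting object never contains the raw IP.
function toConversionTarget(ip, userAgent, calls, salt) {
  const client_id = crypto.createHash('sha256').update(salt + ip).digest('hex');
  return {
    client_id,
    calls,
    user_agent: userAgent || null,
    classification: classifyUserAgent(userAgent),
  };
}
```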

Report

Adds docs/TRAFFIC_INTELLIGENCE_REPORT.md — volume-based classification of current live
traffic. Key findings:

  • 78% of all traffic comes from 6 crawler IPs (now answered with HTTP 429 by the crawler block from PR #127, fix(abuse+dashboard): crawler block + real-time metrics).
  • 91.7% of calls come from 55 IPs; the remaining ~2,149 IPs average 13.7 calls/day.
  • Developers cannot be distinguished from bots until this UA field exists. Adding it is
    the single highest-ROI action, and it ships here.
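The volume figures above are mutually consistent. A quick check using the rounded 354k calls/day total, so the results are approximate:

```math
\begin{aligned}
\text{calls per crawler IP} &\approx \frac{0.78 \times 354{,}000}{6} \approx 46{,}000 \ \text{calls/day} \\
\text{long-tail calls} &\approx (1 - 0.917) \times 354{,}000 \approx 29{,}400 \ \text{calls/day} \\
\text{long-tail IPs} &= 2{,}204 - 55 = 2{,}149 \\
\text{average per long-tail IP} &\approx \frac{29{,}400}{2{,}149} \approx 13.7 \ \text{calls/day}
\end{aligned}
```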

Note: steps 1 and 2 of the brief (method and timing analysis from revenue_events_v2) could
not be run. The prod Postgres is reachable only via Railway's internal hostname
(paperclip-db.railway.internal), and that table holds only billed events (<2.5% of
traffic). Details in the report.

Verify after deploy

Tomorrow's GET /system/free-tier → conversion_targets[] entries will carry
user_agent + classification. Re-run the report against the classified data.
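A hedged post-deploy check, assuming the endpoint returns JSON with a top-level conversion_targets array. `summarizeConversionTargets`, `verifyFreeTier`, and the per-entry `calls` field are hypothetical; `verifyFreeTier` relies on the global `fetch` available in Node 18+ and is defined but not invoked here.

```javascript
// Counts entries missing the new fields and tallies calls per classification.
// The `calls` field name is an assumption about the export shape.
function summarizeConversionTargets(body) {
  const targets = Array.isArray(body?.conversion_targets) ? body.conversion_targets : [];
  const missing = targets.filter(
    (t) => !('user_agent' in t) || !('classification' in t),
  ).length;
  const callsByClassification = {};
  for (const t of targets) {
    const label = t.classification ?? 'missing';
    callsByClassification[label] = (callsByClassification[label] ?? 0) + (Number(t.calls) || 0);
  }
  return { total: targets.length, missing, callsByClassification };
}

// Usage sketch: baseUrl is a placeholder for the deployed API origin.
async function verifyFreeTier(baseUrl) {
  const res = await fetch(new URL('/system/free-tier', baseUrl));
  if (!res.ok) throw new Error(`GET /system/free-tier failed: ${res.status}`);
  return summarizeConversionTargets(await res.json());
}
```

A non-zero `missing` count after deploy would mean some entries predate the UA capture (e.g. IPs whose first call of the day happened before the rollout).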

🤖 Generated with Claude Code

…classification

Without this we cannot distinguish developers from crawlers.
User-Agent is not PII — it is a software identifier, not a person.
Stored alongside hashed client_id (ftua:<ip>, same daily TTL), never with raw IP.

Enables automated classification: curl/wget/python=script, ethers/viem/web3=developer,
Mozilla=browser, *bot/spider=crawler. conversion_targets export now includes
user_agent + classification. Captured once per IP per day (first call), so the hot
path takes a single extra Redis write.

Also adds docs/TRAFFIC_INTELLIGENCE_REPORT.md: volume-based classification of the
current 354k calls/day across 2,204 IPs. Key finding: 78% of traffic is 6 crawlers
(now 429 abuse-blocked); developers are unidentifiable until this UA field exists.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 14, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| web | Ready | Preview, Comment | Jun 14, 2026 6:17am |

@Satelink-Protocol Satelink-Protocol merged commit d2796b1 into develop Jun 14, 2026
6 of 7 checks passed
Satelink-Protocol added a commit that referenced this pull request Jun 14, 2026
…r job (#129)

* fix(security): npm audit fix — clear semver-safe vulns (32→29)

Non-breaking npm audit fix. Cleared 2 high + 1 moderate (semver-compatible).
Remaining 29 all require breaking major bumps, deliberately NOT applied:
- ws via @ethersproject (needs ethers@5.8.0 major) — risky on Polygon mainnet
- shell-quote/concurrently 2 criticals (needs concurrently@10 major) — dev-tooling only

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(intelligence): capture User-Agent for traffic classification (#128)

feat(intelligence): capture User-Agent in free-tier gate for traffic classification

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* fix(settlement): correct epoch_ledger table and column names in anchor job

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@Satelink-Protocol Satelink-Protocol deleted the worktree-traffic-intel-ua-capture branch June 15, 2026 11:44

Labels: None yet
Projects: None yet
Development: no linked issues
1 participant