
patatapython/link-cleaner-pro


Link Cleaner Pro


Chrome extension that removes tracking from URLs, expands shortened links,
and catches phishing domains — offline, local, no data leaves your browser.


Read in Spanish


Why

Every link you share carries invisible baggage: utm_source, fbclid, gclid, affiliate tags buried inside Amazon paths, session tokens encoded in Base64. URL shorteners hide where the link actually goes. Phishing sites register domains with Cyrillic characters that look identical to google.com or paypal.com in a browser bar.

I built this to understand how deep that problem goes — and to build a tool that actually fixes it without sending your data anywhere.


Screenshots

[Screenshot: cleaning a shortened AliExpress URL in aggressive mode (expanded, cleaned, 84% shorter)]
[Screenshot: statistics dashboard in settings (URLs cleaned, trackers removed, history)]

Left: expanding and cleaning a tinyurl → AliExpress link in aggressive mode. Right: tracking stats in the settings page.


How It Works

The extension runs every URL through a 4-layer pipeline:

  URL in
    |
    v
 [Platform Rules]  ──  Custom logic per site (Amazon, YouTube, LinkedIn, Twitter, Instagram, AliExpress)
    |
    v
 [ClearURLs DB]    ──  730+ rules across 205 tracking providers, auto-updated weekly
    |
    v
 [Heuristics]      ──  Shannon entropy analysis + UUID/Base64/hex pattern detection
    |
    v
 [Preservation]    ──  Keeps essential params (product IDs, search queries, pagination)
    |
    v
  Clean URL

Three modes control how aggressive the cleaning is:

| Mode | What it does | Trade-off |
|------|--------------|-----------|
| Minimal | Strips known trackers (utm_*, fbclid, gclid) | Never breaks links |
| Smart | Adds entropy heuristics + platform rules | Recommended default |
| Aggressive | Whitelist-only: keeps id, page, q, v, t and drops everything else | Maximum privacy, may break edge cases |
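
The difference between the three modes can be sketched with the standard URL API. This is a minimal illustration, not the extension's actual code: `cleanUrl`, `KNOWN_TRACKERS`, `AGGRESSIVE_KEEP`, and the `looksLikeTracker` callback are hypothetical names, and the tracker list is limited to the three names mentioned in the table.

```javascript
// Hypothetical sketch: names are illustrative, not the extension's API.
const KNOWN_TRACKERS = [/^utm_/i, /^fbclid$/i, /^gclid$/i];
const AGGRESSIVE_KEEP = new Set(["id", "page", "q", "v", "t"]);

function isKnownTracker(name) {
  return KNOWN_TRACKERS.some((re) => re.test(name));
}

// looksLikeTracker stands in for the heuristic layer (used in smart mode only).
function cleanUrl(rawUrl, mode = "smart", looksLikeTracker = () => false) {
  const url = new URL(rawUrl);
  // Copy the keys first: deleting while iterating searchParams can skip entries.
  for (const name of [...new Set(url.searchParams.keys())]) {
    const value = url.searchParams.get(name);
    const drop =
      mode === "aggressive"
        ? !AGGRESSIVE_KEEP.has(name) // whitelist-only
        : isKnownTracker(name) ||
          (mode === "smart" && looksLikeTracker(name, value));
    if (drop) url.searchParams.delete(name);
  }
  return url.toString();
}
```

Calling `cleanUrl(link, "minimal")` only touches the known names, while `"aggressive"` inverts the logic and keeps nothing outside the whitelist, which is why it can break links that depend on other parameters.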

Beyond cleaning, shortened URLs are expanded through a 4-strategy cascade (embedded URL extraction → native redirect following → iframe capture → external service fallback) covering 1,400+ shortener domains. The process loops up to 5 times to resolve chains like bit.ly → t.co → redirect.com → real-destination.com.
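
The bounded expansion loop could look roughly like this. It is a sketch under assumptions: `expandUrl`, `isShortener`, and `resolveOnce` are hypothetical names, and `resolveOnce` stands in for the whole 4-strategy cascade, which is not reproduced here.

```javascript
// Hypothetical sketch: resolveOnce stands in for the 4-strategy cascade.
const MAX_HOPS = 5; // the loop limit stated in the text

async function expandUrl(startUrl, isShortener, resolveOnce) {
  const chain = [startUrl];
  let current = startUrl;
  for (let hop = 0; hop < MAX_HOPS; hop++) {
    const host = new URL(current).hostname;
    if (!isShortener(host)) break; // reached a non-shortener destination
    const next = await resolveOnce(current);
    if (!next || chain.includes(next)) break; // unresolved, or a redirect loop
    chain.push(next);
    current = next;
  }
  return { finalUrl: current, chain };
}
```

Returning the whole `chain` alongside `finalUrl` lets the UI show every intermediate hop, and the `chain.includes` check stops two shorteners that point at each other from consuming all five iterations.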


The Interesting Parts

Entropy-based tracker detection

Static blocklists can't keep up — new tracking params appear daily. So instead of only matching known names, the extension measures the Shannon entropy of each parameter value:

ref=homepage        → 2.75 bits/char  →  functional, keep it
_ga=2.18943.10873   → 3.09 bits/char  →  tracking token, remove
sid=a1b2c3d4e5f6    → 3.58 bits/char  →  unique ID, remove

High entropy means high randomness, which usually indicates a value generated to identify you rather than to serve a page. The threshold is about 3.0 bits/char. On top of that, the heuristic layer catches UUIDs, Base64 padding patterns, and hex hashes by format.
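
As a rough sketch of this check: the function below computes per-character Shannon entropy from character frequencies and combines it with format tests. The 3.0 threshold comes from the text; the function names and the exact regular expressions (including the 32-character minimum for hex hashes) are assumptions, not the extension's actual patterns.

```javascript
// Shannon entropy in bits per character, from character frequencies.
function shannonEntropy(value) {
  const chars = Array.from(value);
  if (chars.length === 0) return 0;
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) || 0) + 1);
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

const ENTROPY_THRESHOLD = 3.0; // bits/char, as stated in the text

// Assumed format patterns; the real rules may differ.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const HEX_HASH_RE = /^[0-9a-f]{32,}$/i;
const BASE64_PADDED_RE = /^[A-Za-z0-9+\/]{8,}={1,2}$/;

function looksLikeTrackingValue(value) {
  return (
    shannonEntropy(value) > ENTROPY_THRESHOLD ||
    UUID_RE.test(value) ||
    HEX_HASH_RE.test(value) ||
    BASE64_PADDED_RE.test(value)
  );
}
```

One property worth keeping in mind: a string of n characters can never exceed log2(n) bits/char, so values of eight characters or fewer always stay at or below 3.0 and are only caught by the format patterns or by name-based rules.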

Homograph attack detection

The extension maps Cyrillic, Greek, and fullwidth Unicode characters back to ASCII and checks against 15+ high-value phishing targets:

аpple.com   →  Cyrillic 'а' →  impersonates apple.com
gооgle.com  →  Cyrillic 'о' →  impersonates google.com

This runs on every URL before it reaches the user.
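
A minimal sketch of the mapping step, assuming a small hand-picked subset of confusable characters and three of the targets named above. `CONFUSABLES`, `TARGETS`, `toAsciiSkeleton`, and `detectHomograph` are hypothetical names; the real tables are larger.

```javascript
// Partial, illustrative confusables map (escaped so the source stays ASCII).
const CONFUSABLES = new Map([
  ["\u0430", "a"], // Cyrillic a
  ["\u0435", "e"], // Cyrillic ie
  ["\u043E", "o"], // Cyrillic o
  ["\u0440", "p"], // Cyrillic er
  ["\u0441", "c"], // Cyrillic es
  ["\u03BF", "o"], // Greek omicron
]);

// Illustrative subset of the protected domains.
const TARGETS = new Set(["apple.com", "google.com", "paypal.com"]);

function toAsciiSkeleton(hostname) {
  // NFKC folds fullwidth Latin letters to their ASCII equivalents.
  return Array.from(hostname.normalize("NFKC").toLowerCase())
    .map((ch) => CONFUSABLES.get(ch) ?? ch)
    .join("");
}

// Returns the impersonated domain, or null when nothing suspicious is found.
// Expects the Unicode form of the hostname: new URL(...).hostname yields
// Punycode (xn--...), which would need decoding first.
function detectHomograph(hostname) {
  const skeleton = toAsciiSkeleton(hostname);
  const changed = skeleton !== hostname.toLowerCase();
  return changed && TARGETS.has(skeleton) ? skeleton : null;
}
```

Requiring that the skeleton differs from the input keeps the genuine `apple.com` from being flagged as an impersonation of itself.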

Offline-first security

An 11,000+ domain blacklist is loaded into a Set at startup — lookups take sub-millisecond time, no network needed. For deeper analysis, optional VirusTotal integration scans against 70+ engines, but only when the user explicitly clicks the button. API keys stay in chrome.storage.local (device-only, never synced).
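
The Set-based lookup could be sketched as follows. `buildBlacklist` and `isBlacklisted` are hypothetical names, and matching parent domains (so that `login.evil.example` is caught by an `evil.example` entry) is an assumption about the desired behavior, not something the text confirms.

```javascript
// Build the lookup structure once at startup.
function buildBlacklist(domains) {
  return new Set(domains.map((d) => d.trim().toLowerCase()).filter(Boolean));
}

// Checks the hostname and each parent domain (assumed behavior).
// Bare top-level labels such as "com" are never checked.
function isBlacklisted(hostname, blacklist) {
  const labels = hostname.toLowerCase().split(".");
  for (let i = 0; i < labels.length - 1; i++) {
    if (blacklist.has(labels.slice(i).join("."))) return true;
  }
  return false;
}
```

Each check performs at most a handful of constant-time `Set.has` calls, one per label in the hostname, so the cost does not grow with the size of the list.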


Architecture

| Layer | What | Pattern |
|-------|------|---------|
| Background | Service Worker: context menu, scheduled rule updates, message routing | Event-driven |
| Popup | StateManager (single source of truth) + UIController (state-driven DOM) | Observer |
| Rule Engine | 7 providers chained: each checks domain match, applies transforms, passes to next | Chain of Responsibility |
| Cleaning | 4-layer pipeline with 3 swappable strategies (minimal/smart/aggressive) | Strategy |
| Networking | Fetch with exponential backoff + jitter, per-request timeouts via AbortSignal | Retry with backoff |
| Storage | Quota-aware persistence, auto-trims at 90% of 10MB limit | SafeStorage |
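
The Networking row can be illustrated with a short retry helper. This is a sketch under assumptions: `backoffDelay` and `fetchWithRetry` are hypothetical names, the base delay, cap, retry count, and timeout are made-up defaults, and "full jitter" (a random delay between zero and the exponential ceiling) is one common variant that may not be the one used here.

```javascript
// Exponential backoff with full jitter: random delay in [0, min(cap, base * 2^attempt)).
function backoffDelay(attempt, baseMs = 500, capMs = 8000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(
  url,
  { retries = 3, timeoutMs = 5000, fetchImpl = fetch, sleep = defaultSleep } = {}
) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      // AbortSignal.timeout gives every attempt its own deadline.
      const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
      if (res.ok) return res;
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network failure or timeout
    }
    if (attempt < retries) await sleep(backoffDelay(attempt));
  }
  throw lastError;
}
```

Injecting `fetchImpl` and `sleep` keeps the helper testable under Jest without real network calls or real waiting.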

Zero frameworks. Vanilla JS + ES6 modules. No bundler, no transpiler — the code in the repo is the code in the browser. Adding a new platform means writing one provider class and registering it in the factory.
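
To make the "one provider class" claim concrete, here is a hypothetical sketch of what such a class and its registration might look like. The `matches`/`apply` interface, the `ProviderChain` registry, the `shop.example` domain, and the `aff_tag` parameter are all invented for illustration; the project's real factory may differ.

```javascript
// Hypothetical provider for an imaginary shop; not part of the real rule set.
class ExampleShopProvider {
  matches(url) {
    return url.hostname === "shop.example" || url.hostname.endsWith(".shop.example");
  }
  apply(url) {
    url.searchParams.delete("aff_tag"); // made-up affiliate parameter
    return url;
  }
}

// Minimal chain: each provider checks the domain, transforms, and passes on.
class ProviderChain {
  constructor() {
    this.providers = [];
  }
  register(provider) {
    this.providers.push(provider);
    return this; // allow chained registration
  }
  run(rawUrl) {
    let url = new URL(rawUrl);
    for (const provider of this.providers) {
      if (provider.matches(url)) url = provider.apply(url);
    }
    return url.toString();
  }
}

const chain = new ProviderChain().register(new ExampleShopProvider());
```

With this shape, supporting another site is limited to one new class and one `register` call; URLs from other domains pass through the chain untouched.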


Tech Stack


  • No dependencies in production — Chrome APIs + Fetch API + ES6
  • Dual storage: chrome.storage.sync for settings, .local for sensitive data
  • ClearURLs rule database (205 providers, 730+ rules, bundled + auto-updated)
  • 1,400+ shortener domains from PeterDaveHello/url-shorteners (CC BY-SA 4.0)

Quick Start

git clone https://github.com/patatapython/link-cleaner-pro.git
cd link-cleaner-pro
npm install    # dev dependencies only (Jest)
npm test       # run test suite

Then load as unpacked extension in chrome://extensions/ (Developer mode ON).

Right-click any link → "Clean this link". Done.


Privacy

No telemetry. No analytics. No content scripts. No background monitoring. Everything runs locally. VirusTotal scans are opt-in and manual. Full policy · How it works


Credits

  • ClearURLs — open-source tracking parameter database with 205 providers and 730+ rules
  • PeterDaveHello/url-shorteners — community-curated list of 1,400+ URL shortening services (CC BY-SA 4.0)
  • VirusTotal — threat intelligence API scanning against 70+ antivirus engines

MIT License

About

Privacy-first Chrome extension (Manifest V3) — strips tracking parameters, expands shortened URLs, and detects malicious links. All processing happens locally.
