fix: OpenSky auth resilience for Railway #329

Merged
koala73 merged 1 commit into main from fix/opensky-auth-resilience
Feb 24, 2026

koala73 (Owner) commented Feb 24, 2026

Summary

Fixes OpenSky data outage on Railway relay caused by PR #320.

Problem 1: Auth "socket hang up" (FIXED in previous commit)

Railway containers couldn't reach auth.opensky-network.org. Setting family: 4 forces IPv4.
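
A minimal sketch of what pinning a request to IPv4 can look like with Node's built-in `https` module. The helper names, the placeholder path, and the 10s timeout are assumptions for illustration, not the relay's actual code; only the `family: 4` option and the auth hostname come from this PR.

```javascript
const https = require('node:https');

// Hypothetical helper: build request options pinned to IPv4.
// `family: 4` makes Node resolve and connect over IPv4 only.
function buildOpenSkyRequestOptions(hostname, path, headers = {}) {
  return {
    hostname,
    path,
    method: 'GET',
    headers,
    family: 4,       // force IPv4 (from the PR)
    timeout: 10_000, // assumed value, not from the PR
  };
}

// Hypothetical promisified GET that resolves with status and raw body.
function getRaw(options) {
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, body }));
    });
    // The 'timeout' event does not abort on its own, so destroy explicitly.
    req.on('timeout', () => req.destroy(new Error('request timed out')));
    req.on('error', reject);
    req.end();
  });
}
```

Usage would be along the lines of `getRaw(buildOpenSkyRequestOptions('auth.opensky-network.org', '/placeholder-path'))`, with the real token path substituted.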

Problem 2: Negative cache poisoning (FIXED in previous commit)

Auth failures were negative-cached per bbox key, poisoning ALL cache entries for 30s.
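
A rough sketch of the caching rule described in the commit message (auth failures are not negative-cached, only upstream 429/5xx are). The `kind` field, the function names, and the `Map`-based cache are assumptions made for the example; the 30s TTL is the figure quoted above.

```javascript
const NEGATIVE_TTL_MS = 30_000; // 30s, as quoted in the PR

// Assumed failure shape: { kind: 'auth' | 'upstream', status: number }.
// Auth failures affect every bbox, so caching them per key only delays recovery.
function shouldNegativeCache(failure) {
  if (failure.kind === 'auth') return false;
  return failure.status === 429 || (failure.status >= 500 && failure.status <= 599);
}

const negativeCache = new Map(); // bboxKey -> expiry timestamp (ms)

function recordFailure(bboxKey, failure, now = Date.now()) {
  if (shouldNegativeCache(failure)) {
    negativeCache.set(bboxKey, now + NEGATIVE_TTL_MS);
  }
}

function isNegativelyCached(bboxKey, now = Date.now()) {
  const expiresAt = negativeCache.get(bboxKey);
  if (expiresAt === undefined) return false;
  if (now >= expiresAt) {
    negativeCache.delete(bboxKey); // expired entry, drop it
    return false;
  }
  return true;
}
```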

Problem 3: 429 rate-limit storm (NEW — this commit)

5 unique bbox queries fire simultaneously when the 30s negative cache expires. ALL get 429'd by OpenSky, ALL get re-cached as negative for 30s. Cycle repeats forever — zero data ever flows.

Fix — two mechanisms:

  1. Global 429 cooldown (90s, env-configurable via OPENSKY_429_COOLDOWN_MS): When ANY bbox request gets 429, ALL upstream requests are blocked for 90s. Reduces wasted requests from 5/30s → 1/90s.

  2. Request serializer queue with 2s spacing (env-configurable via OPENSKY_REQUEST_SPACING_MS): Upstream requests go one at a time instead of 5 concurrent. First request succeeds → cached → remaining 4 serve from cache.
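
The two mechanisms can be sketched together as a single gate in front of the upstream fetch. This is an illustrative sketch, not the merged code: `createUpstreamGate`, its option names, and the injectable `now`/`sleep` hooks are assumptions; the env variable names and the 90s/2s defaults are the ones listed above.

```javascript
const COOLDOWN_MS = Number(process.env.OPENSKY_429_COOLDOWN_MS) || 90_000;
const SPACING_MS = Number(process.env.OPENSKY_REQUEST_SPACING_MS) || 2_000;

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical factory: serializes upstream calls and applies a global 429 cooldown.
function createUpstreamGate({
  cooldownMs = COOLDOWN_MS,
  spacingMs = SPACING_MS,
  now = Date.now,
  sleep = defaultSleep,
} = {}) {
  let cooldownUntil = 0;       // timestamp until which all upstream calls are blocked
  let lastStart = -Infinity;   // start time of the most recent upstream call
  let tail = Promise.resolve(); // end of the queue; each job chains onto it

  const cooldownRemainingMs = () => Math.max(0, cooldownUntil - now());

  // doFetch is an async function resolving to an object with a numeric `status`.
  function run(doFetch) {
    const job = tail.then(async () => {
      // Checked when the job reaches the head of the queue, so a 429 from an
      // earlier job blocks everything queued behind it.
      if (cooldownRemainingMs() > 0) return { status: 429, skipped: true };
      const wait = lastStart + spacingMs - now();
      if (wait > 0) await sleep(wait);
      lastStart = now();
      const res = await doFetch();
      if (res.status === 429) cooldownUntil = now() + cooldownMs;
      return res;
    });
    tail = job.catch(() => {}); // a failed job must not break the chain
    return job;
  }

  const reset = () => { cooldownUntil = 0; };
  return { run, reset, cooldownRemainingMs };
}
```

For the "remaining 4 serve from cache" behaviour, `doFetch` would need to re-check the positive cache once it reaches the head of the queue; that check is left out here for brevity.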

Other changes

  • Promisified upstream fetch (replaces callback-based https.get) for cleaner queue integration
  • /opensky-reset clears 429 cooldown
  • /metrics and /opensky-diag expose cooldown state
  • 3-attempt auth retry with credential error short-circuit (401/400/403 → no retry)
  • Token mutex race condition fix (if (openskyTokenPromise === myPromise))
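
To illustrate why the `openskyTokenPromise === myPromise` guard matters, here is a hedged sketch of a promise-based token mutex. Only the `openskyTokenPromise` and `myPromise` names appear in this PR; `openskyToken`, `getOpenSkyToken`, `resetOpenSkyAuth`, and the injected `requestToken` function are assumptions for the example.

```javascript
let openskyToken = null;        // cached token (assumed name)
let openskyTokenPromise = null; // in-flight token fetch, shared by concurrent callers

// requestToken is a hypothetical async function that performs the auth call.
async function getOpenSkyToken(requestToken) {
  if (openskyToken) return openskyToken;
  if (openskyTokenPromise) return openskyTokenPromise; // join the in-flight fetch

  const myPromise = requestToken();
  openskyTokenPromise = myPromise;
  try {
    const token = await myPromise;
    // Store the token only if no reset or newer fetch replaced our promise.
    if (openskyTokenPromise === myPromise) openskyToken = token;
    return token;
  } finally {
    // Without this identity check, a stale fetch finishing late would clear a
    // newer in-flight promise and let another caller start a duplicate fetch.
    if (openskyTokenPromise === myPromise) openskyTokenPromise = null;
  }
}

// Hypothetical reset, in the spirit of /opensky-reset forcing a fresh token.
function resetOpenSkyAuth() {
  openskyToken = null;
  openskyTokenPromise = null;
}
```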

Test plan

  • Deploy to Railway
  • Check /metrics: global429CooldownRemainingMs should show cooldown state
  • Wait for cooldown to expire — first upstream fetch should return 200
  • Subsequent bbox queries should serve from positive cache (X-Cache: HIT)
  • opensky positive cache count should increase in periodic log
  • If stuck: hit /opensky-reset to clear all state

… fail

Three fixes for OpenSky auth failure on Railway:

1. Don't negative-cache auth failures (503) — was poisoning ALL bbox
   cache keys when auth failed, making recovery take 30s+ even after
   auth comes back. Only cache actual upstream 429/5xx now.

2. Add 3-attempt retry with backoff (0s, 2s, 5s) before entering 60s
   cooldown. Previously one socket hang up = 60s of zero data.

3. Force IPv4 (family: 4) on all OpenSky HTTPS requests — auth endpoint
   and API endpoint. Defense against Railway IPv6 routing issues.
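
The retry schedule from item 2, combined with the credential short-circuit listed in the PR description (401/400/403 → no retry), might look roughly like this. The function name, the `err.status` convention, and the injectable `sleep` are assumptions; the delays and status codes are taken from the text.

```javascript
const RETRY_DELAYS_MS = [0, 2_000, 5_000];              // backoff before attempts 1, 2, 3
const CREDENTIAL_STATUSES = new Set([400, 401, 403]);   // retrying will not fix these

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// attemptOnce is a hypothetical async function that performs one auth request
// and throws an Error (optionally carrying a numeric `status`) on failure.
async function fetchTokenWithRetry(
  attemptOnce,
  { delays = RETRY_DELAYS_MS, sleep = defaultSleep } = {},
) {
  let lastError;
  for (const delay of delays) {
    if (delay > 0) await sleep(delay);
    try {
      return await attemptOnce();
    } catch (err) {
      lastError = err;
      // Credential errors are permanent, so stop instead of burning attempts.
      if (CREDENTIAL_STATUSES.has(err.status)) break;
    }
  }
  // The caller would enter the 60s cooldown only after all attempts fail.
  throw lastError;
}
```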

Also adds /opensky-reset endpoint to manually clear cooldown + negative
cache + force fresh token fetch. Better error logging with error codes.

vercel bot commented Feb 24, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| worldmonitor | Ready | Preview, Comment | Feb 24, 2026 4:44pm |
| worldmonitor-finance | Building | Preview, Comment | Feb 24, 2026 4:44pm |
| worldmonitor-happy | Ready | Preview, Comment | Feb 24, 2026 4:44pm |
| worldmonitor-startup | Building | Preview, Comment | Feb 24, 2026 4:44pm |

@koala73 koala73 merged commit 85958a6 into main Feb 24, 2026
3 of 5 checks passed
koala73 added a commit that referenced this pull request Feb 24, 2026
## What's Changed

### Performance
- perf: defer YouTube/map init and stagger data loads to reduce blocking time (#287)

### Features
- feat: universal country detection — CII scoring for all countries (#344)
- feat: add Mexico as CII hotspot for cartel/security monitoring (#327)
- feat: add Mexico and LatAm security feeds for instability coverage (#325)
- feat: add category pills and search filter to Panels tab (#322)
- feat: consolidate settings into unified tabbed modal (#319)
- feat: add Island Times (Palau) RSS feed (#317)
- feat: add AI Flow settings popup for web-only AI provider control (#314)
- feat: optional channels with tab-based region browse UI (#295)
- feat: custom channel management (#282)
- feat: add Bild RSS feed scoped to German locale (#312)

### Bug Fixes
- fix: suppress notification sound when popup alerts are disabled
- fix: prevent entity conflation in pane summarization (#341)
- fix: add Mexico to COUNTRY_BOUNDS and COUNTRY_ALIASES (#338)
- fix: make OpenSky cache TTLs env-configurable (#333)
- fix: serialize OpenSky requests with global 429 cooldown (#332)
- fix: replace RSSHub feeds with native/Google News alternatives (#331)
- fix: OpenSky auth resilience — retry, IPv4, no negative cache on auth fail (#329)
- fix: add CARTO and OpenStreetMap attribution to map (#323)
- fix: add drag cleanup handlers and suppress click after drag-drop (#315)
- fix: replace HTML5 drag API with mouse events for WKWebView (#313)
- fix: open channel settings as inline modal instead of separate window (#311)
- fix: sync YouTube live panel mute state with native player controls (#285)
- fix: strip Ollama reasoning tokens from summaries (#299)
- fix: open external links in system browser on Tauri desktop (#297)
- fix: add User-Agent and Cloudflare 403 detection to secret validation (#296)
- fix: infra cost optimizations round 2 (#275)
- fix: enforce military bbox filtering (#284)
- fix: infrastructure cost optimizations across caching, polling, batching (#283)
- fix: circuit breaker persistent cache with safety fixes (#281)
- fix: immediately refresh stale services when tab regains focus (#277)

### Security
- Security hardening: SSRF protection, auth gating, and token generation (#343)
- Harden Railway relay auth, caching, and proxy routing (#320)
- Build/runtime hardening and dependency security updates (#286)
- fix: harden embed postMessage origin check (#302)
koala73 added a commit that referenced this pull request Feb 25, 2026
## What's Changed

### Performance
- perf: defer YouTube/map init and stagger data loads (#287)

### Features
- feat: universal country detection — CII scoring for all countries (#344)
- feat: add Mexico as CII hotspot (#327)
- feat: add Mexico and LatAm security feeds (#325)
- feat: add category pills and search filter to Panels tab (#322)
- feat: consolidate settings into unified tabbed modal (#319)
- feat: optional channels with tab-based region browse UI (#295)
- feat: custom channel management (#282)

### Bug Fixes
- fix: suppress notification sound when popup alerts are disabled
- fix: prevent entity conflation in pane summarization (#341)
- fix: add Mexico to COUNTRY_BOUNDS and COUNTRY_ALIASES (#338)
- fix: OpenSky cache TTLs, serialization, and auth resilience (#329-#333)
- fix: replace RSSHub feeds with native/Google News alternatives (#331)
- fix: replace HTML5 drag API with mouse events for WKWebView (#313)
- fix: sync YouTube mute state with native player controls (#285)
- fix: strip Ollama reasoning tokens from summaries (#299)
- fix: infra cost optimizations (#275, #283)
- fix: circuit breaker persistent cache (#281)
- fix: immediately refresh stale services on tab focus (#277)

### Security
- Security hardening: SSRF protection, auth gating, token generation (#343)
- Harden Railway relay auth, caching, and proxy routing (#320)
- Build/runtime hardening and dependency security updates (#286)
koala73 added a commit that referenced this pull request Feb 25, 2026
…346)

* fix: suppress notification sound when popup alerts are disabled

Badge playSound() was firing on new findings regardless of the
"Pop up new alerts" toggle. Gate sound on popupEnabled so both
the modal and audio respect the user preference.

* chore: bump version to 2.5.7 with changelog

andreteow pushed a commit to andreteow/worldmonitor-a47 that referenced this pull request Feb 25, 2026
… fail (koala73#329)

andreteow pushed a commit to andreteow/worldmonitor-a47 that referenced this pull request Feb 25, 2026
…oala73#346)
