From 5db1d97168d6b9ae2d90d5636eeeee9ffcc0e114 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 16 Feb 2026 03:34:57 +0000
Subject: [PATCH 1/3] Initial plan

From d084585a2603106ccda7a402939fd06b8f66f480 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 16 Feb 2026 03:39:34 +0000
Subject: [PATCH 2/3] Add comprehensive data collection documentation

Co-authored-by: backgroundcheck <18512725+backgroundcheck@users.noreply.github.com>
---
 docs/DATA_COLLECTION.md | 1079 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 1079 insertions(+)
 create mode 100644 docs/DATA_COLLECTION.md

diff --git a/docs/DATA_COLLECTION.md b/docs/DATA_COLLECTION.md
new file mode 100644
index 000000000..6aabf6633
--- /dev/null
+++ b/docs/DATA_COLLECTION.md
@@ -0,0 +1,1079 @@
+# World Monitor Data Collection Documentation
+
+This document provides comprehensive details about how the World Monitor application collects, processes, and presents data from over 150 external sources.
+
+## Table of Contents
+
+1. [Architecture Overview](#architecture-overview)
+2. [Data Sources](#data-sources)
+3. [Data Collection Methods](#data-collection-methods)
+4. [Refresh Intervals & Caching](#refresh-intervals--caching)
+5. [Authentication & API Keys](#authentication--api-keys)
+6. [Data Processing Pipeline](#data-processing-pipeline)
+7. [Reliability & Resilience](#reliability--resilience)
+
+---
+
+## Architecture Overview
+
+World Monitor employs a **distributed multi-source aggregation architecture** with the following components:
+
+### Infrastructure Layers
+
+1. **Vercel Edge Functions** (Primary)
+   - Server-side API proxies for secure key management
+   - CORS handling and rate limiting
+   - Cross-user caching via Upstash Redis
+   - Located in `/api` directory
+
+2. **Railway Relay Server**
+   - Real-time WebSocket streaming for AIS (vessel) and OpenSky (aircraft) data
+   - RSS feed proxy for sources blocked by Vercel IPs (UN News, CISA, etc.)
+   - Deployed separately from main application
+
+3. **Browser-Based Services**
+   - Client-side data aggregation and display
+   - Local caching and state management
+   - AI-powered analysis and classification
+   - Located in `/src/services` directory
+
+### Data Flow
+
+```
+External APIs/RSS → Vercel/Railway → Browser Services → UI Components
+                           ↓
+           Upstash Redis Cache (Cross-user)
+                           ↓
+            Browser localStorage (Per-user)
+```
+
+---
+
+## Data Sources
+
+### 1. Conflict & Geopolitical Events
+
+#### ACLED (Armed Conflict Location & Event Data)
+- **API**: `/api/acled` and `/api/acled-conflict`
+- **Data**: Protest events, conflict incidents, violence against civilians
+- **Coverage**: Global, 200+ countries
+- **Update Frequency**: Daily
+- **Authentication**: API access token required
+- **Source**: https://acleddata.com/
+- **License**: Free for researchers/non-commercial use
+
+#### UCDP (Uppsala Conflict Data Program)
+- **API**: `/api/ucdp` and `/api/ucdp-events`
+- **Data**: Organized violence, one-sided violence, non-state conflicts
+- **Coverage**: Global since 1989
+- **Update Frequency**: Daily for events, monthly for aggregates
+- **Authentication**: None (public API)
+- **Source**: https://ucdp.uu.se/
+
+#### GDELT (Global Database of Events, Language, and Tone)
+- **API**: `/api/gdelt-geo` and `/api/gdelt-doc`
+- **Data**: Geo-tagged events, document processing, tone analysis
+- **Coverage**: Real-time global events from news media
+- **Update Frequency**: Every 15 minutes
+- **Authentication**: None (public API)
+- **Source**: https://www.gdeltproject.org/
+
+#### UNHCR (UN Refugee Agency)
+- **API**: `/api/unhcr-population`
+- **Data**: Refugee populations, internally displaced persons, asylum seekers
+- **Coverage**: Global displacement data
+- **Update Frequency**: Quarterly with monthly updates for major crises
+- **Authentication**: None (public API)
+- **Source**: https://www.unhcr.org/
+- **License**: CC BY 4.0
+
+### 2. Real-Time Military & Aviation Tracking
+
+#### AISStream (Vessel Tracking)
+- **Connection**: WebSocket via Railway relay
+- **Data**: Live vessel positions, course, speed, ship details
+- **Coverage**: Global maritime traffic (AIS-equipped vessels)
+- **Update Frequency**: Real-time (every 10-60 seconds per vessel)
+- **Authentication**: API key required
+- **Source**: https://aisstream.io/
+- **Notes**: Uses Railway relay server to maintain persistent WebSocket connection
+
+#### OpenSky Network (Aircraft Tracking)
+- **API**: `/api/opensky` via Railway relay
+- **Data**: Aircraft positions, altitude, velocity, callsign
+- **Coverage**: Global air traffic with ADS-B coverage
+- **Update Frequency**: Real-time (every 10 seconds)
+- **Authentication**: OAuth2 (optional, for higher rate limits)
+- **Source**: https://opensky-network.org/
+- **License**: Free for non-commercial use
+
+#### Wingbits (Aircraft Enrichment)
+- **API**: Aircraft metadata enrichment
+- **Data**: Aircraft owner, operator, type, registration details
+- **Coverage**: Global aircraft registry
+- **Authentication**: API key required
+- **Source**: https://wingbits.com/
+- **Notes**: Used to enrich OpenSky data with ownership information
+
+#### FAA Status
+- **API**: `/api/faa-status`
+- **Data**: Airport delays, ground stops, flight restrictions
+- **Coverage**: US airports and airspace
+- **Update Frequency**: Every 5 minutes
+- **Authentication**: None (public API)
+- **Source**: https://www.faa.gov/
+
+### 3. Natural Disasters & Environmental Events
+
+#### GDACS (Global Disaster Alert and Coordination System)
+- **API**: Direct fetch to `https://www.gdacs.org/gdacsapi/api/events`
+- **Data**: Major disasters (earthquakes, floods, cyclones, droughts)
+- **Coverage**: Global, severity-based filtering
+- **Update Frequency**: Real-time alerts
+- **Authentication**: None (public API)
+- **Source**: https://www.gdacs.org/
+
+#### NASA EONET (Earth Observatory Natural Events Tracker)
+- **API**: Direct fetch to `https://eonet.gsfc.nasa.gov/api/v3/events`
+- **Data**: Wildfires, severe storms, volcanoes, floods, landslides, sea/lake ice
+- **Coverage**: Global satellite observations
+- **Update Frequency**: Near real-time (15-30 minute lag)
+- **Authentication**: None (public API)
+- **Source**: https://eonet.gsfc.nasa.gov/
+
+#### NASA FIRMS (Fire Information for Resource Management System)
+- **API**: `/api/firms-fires`
+- **Data**: Satellite fire detections (VIIRS SNPP/NOAA-20)
+- **Coverage**: Global, 375m resolution
+- **Update Frequency**: Near real-time (3-hour lag)
+- **Authentication**: API key required
+- **Source**: https://firms.modaps.eosdis.nasa.gov/
+
+#### USGS Earthquakes
+- **API**: `/api/earthquakes`
+- **Data**: Earthquake events magnitude 4.5+
+- **Coverage**: Global seismic network
+- **Update Frequency**: Near real-time (5-10 minute lag for M4.5+)
+- **Authentication**: None (public API)
+- **Source**: https://earthquake.usgs.gov/
+
+#### NWS (National Weather Service)
+- **API**: Direct fetch to `https://api.weather.gov/alerts/active`
+- **Data**: Severe weather alerts, warnings, watches
+- **Coverage**: United States
+- **Update Frequency**: Real-time
+- **Authentication**: None (public API)
+- **Source**: https://www.weather.gov/
+
+#### Open-Meteo (Climate Data)
+- **API**: Used by `/api/climate-anomalies`
+- **Data**: Temperature, precipitation, climate anomalies (via ERA5 reanalysis)
+- **Coverage**: Global gridded data
+- **Update Frequency**: Daily for historical, 5-day lag for ERA5
+- **Authentication**: None (public API, optional key for higher limits)
+- **Source**: https://open-meteo.com/
+- **Notes**: Processes Copernicus ERA5 data
+
+### 4. Financial Markets & Economics
+
+#### Finnhub
+- **API**: `/api/finnhub`
+- **Data**: Stock quotes, real-time prices, company data
+- **Coverage**: US and international equities
+- **Update Frequency**: Every 2 minutes (free tier: 60 calls/minute)
+- **Authentication**: API key required
+- **Source**: https://finnhub.io/
+- **License**: Free tier available
+
+#### Yahoo Finance
+- **API**: `/api/yahoo-finance`
+- **Data**: Stock prices, indices, historical data
+- **Coverage**: Global markets
+- **Update Frequency**: Every 2 minutes
+- **Authentication**: None (public API)
+- **Source**: https://finance.yahoo.com/
+
+#### CoinGecko
+- **API**: `/api/coingecko`
+- **Data**: Cryptocurrency prices, market cap, 24h changes
+- **Coverage**: Bitcoin, Ethereum, Solana, and major cryptocurrencies
+- **Update Frequency**: Every 2 minutes
+- **Authentication**: None (public API, free tier)
+- **Source**: https://www.coingecko.com/
+
+#### Polymarket
+- **API**: `/api/polymarket`
+- **Data**: Prediction market odds on political/world events
+- **Coverage**: Global event predictions
+- **Update Frequency**: Every 5 minutes
+- **Authentication**: None (public Gamma API)
+- **Source**: https://gamma-api.polymarket.com/
+- **Notes**: Shows real-money betting odds on outcomes
+
+#### FRED (Federal Reserve Economic Data)
+- **API**: `/api/fred-data`
+- **Data**: Economic indicators, interest rates, inflation, unemployment
+- **Coverage**: US economic data, 800,000+ time series
+- **Update Frequency**: Varies by series (daily to quarterly)
+- **Authentication**: API key required
+- **Source**: https://fred.stlouisfed.org/
+- **License**: Free for non-commercial use
+
+#### EIA (Energy Information Administration)
+- **API**: `/api/eia/*` endpoints
+- **Data**: Oil prices, petroleum production, inventory, natural gas
+- **Coverage**: US energy data
+- **Update Frequency**: Weekly for most series
+- **Authentication**: API key required
+- **Source**: https://www.eia.gov/
+- **License**: Public domain (US government)
+
+#### USA Spending
+- **API**: Direct fetch to `https://api.usaspending.gov/api/v2`
+- **Data**: US federal spending, defense contracts, grants
+- **Coverage**: All US government spending
+- **Update Frequency**: Daily
+- **Authentication**: None (public API)
+- **Source**: https://usaspending.gov/
+
+### 5. Internet Infrastructure & Outages
+
+#### Cloudflare Radar
+- **API**: `/api/cloudflare-outages`
+- **Data**: Internet outages, BGP events, traffic anomalies
+- **Coverage**: Global internet infrastructure
+- **Update Frequency**: Near real-time
+- **Authentication**: API token required (free Cloudflare account)
+- **Source**: https://radar.cloudflare.com/
+
+#### NGA Warnings (Undersea Cables)
+- **API**: `/api/nga-warnings`
+- **Data**: Cable repair warnings, navigation hazards
+- **Coverage**: Global undersea infrastructure
+- **Update Frequency**: As issued
+- **Authentication**: None (public notices)
+- **Source**: National Geospatial-Intelligence Agency
+
+### 6. Technology & Research
+
+#### ArXiv
+- **API**: `/api/arxiv`
+- **Data**: Scientific papers, preprints (CS.AI category by default)
+- **Coverage**: Computer science, AI/ML research
+- **Update Frequency**: Every hour
+- **Authentication**: None (public API)
+- **Source**: https://arxiv.org/
+
+#### GitHub Trending
+- **API**: `/api/github-trending`
+- **Data**: Trending repositories by language and timeframe
+- **Coverage**: All public GitHub repositories
+- **Update Frequency**: Every 30 minutes
+- **Authentication**: None (web scraping)
+- **Source**: https://github.com/trending
+
+#### Hacker News
+- **API**: `/api/hackernews`
+- **Data**: Top stories, new stories, best stories
+- **Coverage**: Tech/startup news aggregation
+- **Update Frequency**: Every 5 minutes
+- **Authentication**: None (public Firebase API)
+- **Source**: https://news.ycombinator.com/
+
+### 7. Cybersecurity
+
+#### Cyber Threats API
+- **API**: `/api/cyber-threats`
+- **Data**: Command & control servers, malware hosts, phishing sites, malicious IPs
+- **Coverage**: Global threat intelligence
+- **Update Frequency**: Multiple times daily
+- **Authentication**: None (aggregates public threat feeds)
+- **Sources**:
+  - AlienVault OTX
+  - URLhaus (abuse.ch)
+  - ThreatFox (abuse.ch)
+  - Feodo Tracker
+  - PhishTank
+- **Notes**: Geo-locates threats for map display
+
+#### CISA Advisories
+- **Feed**: Via RSS proxy
+- **Data**: Cybersecurity alerts, advisories, vulnerability bulletins
+- **Coverage**: US government cybersecurity guidance
+- **Update Frequency**: As issued
+- **Authentication**: None
+- **Source**: https://www.cisa.gov/
+
+### 8. News Aggregation (100+ RSS Feeds)
+
+The application aggregates news from over 100 RSS feeds across multiple categories:
+
+#### Wire Services (Tier 1)
+- **Reuters** (World, Business)
+- **AP News**
+- **AFP**
+- **Bloomberg**
+
+#### Government Sources (Tier 1)
+- **White House** (press releases, statements)
+- **State Department** (briefings, travel advisories)
+- **Pentagon** (news releases, contracts)
+- **UN News**
+- **CISA** (cybersecurity alerts)
+- **Treasury, DOJ, DHS, CDC, FEMA**
+- **Federal Reserve, SEC**
+
+#### Major News Outlets (Tier 2)
+- **BBC World, BBC Middle East**
+- **Guardian World, Guardian Middle East**
+- **NPR News**
+- **CNN World**
+- **Al Jazeera**
+- **Financial Times**
+- **Politico**
+
+#### Defense & Intelligence (Tier 3)
+- **Defense One**
+- **Breaking Defense**
+- **The War Zone**
+- **Defense News**
+- **Janes**
+- **Foreign Policy**
+- **The Diplomat**
+- **Bellingcat**
+- **Krebs on Security**
+
+#### Think Tanks (Tier 3)
+- **CSIS** (Center for Strategic & International Studies)
+- **RAND Corporation**
+- **Brookings Institution**
+- **Carnegie Endowment**
+- **Atlantic Council**
+- **Foreign Affairs**
+- **Arms Control Association**
+- **Bulletin of the Atomic Scientists**
+- **RUSI, Wilson Center, GMF, CNAS**
+
+#### Regional News
+- **Middle East**: Al Jazeera, Al Arabiya, TRT World
+- **Africa**: All Africa, ISS Africa
+- **Latin America**: Latin America News
+- **Asia-Pacific**: Nikkei, The Diplomat
+
+#### Technology & Startups (Tech Variant)
+- **TechCrunch, VentureBeat, Ars Technica, The Verge**
+- **Y Combinator, a16z, Sequoia blogs**
+- **Crunchbase, CB Insights, PitchBook**
+- **Regional**: e27 (SEA), 36Kr (China), Inc42 (India), TechCabal (Africa)
+- **Think Tanks**: Stanford HAI, MIT Tech Review, OECD Digital
+
+#### Feed Processing
+- **Proxy**: Feeds routed through `/api/rss-proxy` (Vercel) or Railway relay
+- **Circuit Breakers**: Failed feeds enter 5-minute cooldown after 2 failures
+- **Caching**: 10-minute TTL in browser localStorage
+- **Classification**: AI-powered threat classification via Groq/OpenRouter
+- **Entity Extraction**: Automatic extraction of locations, organizations, CVEs
+
+---
+
+## Data Collection Methods
+
+### 1. REST API Calls
+
+Most data sources use standard HTTP/REST APIs:
+
+```javascript
+// Example from services/earthquakes.ts
+const response = await fetch('/api/earthquakes');
+const data = await response.json();
+```
+
+**Characteristics:**
+- Polling-based (request every X minutes)
+- Edge function proxies hide API keys
+- Rate limiting on server side
+- Response caching via Upstash Redis
+
+### 2. RSS Feed Parsing
+
+RSS/Atom feeds are fetched and parsed in the browser:
+
+```javascript
+// Example from services/rss.ts
+const response = await fetchWithProxy(feed.url);
+const text = await response.text();
+const parser = new DOMParser();
+const doc = parser.parseFromString(text, 'text/xml');
+```
+
+**Characteristics:**
+- Proxied through Vercel or Railway to handle CORS
+- DOM parser for XML processing
+- Deduplication by URL and title
+- Per-feed circuit breakers
+
+### 3. WebSocket Streaming
+
+Real-time data uses WebSocket connections:
+
+```javascript
+// Example from services/ais.ts
+const ws = new WebSocket(WS_RELAY_URL);
+ws.onmessage = (event) => {
+  const message = JSON.parse(event.data);
+  // Process real-time vessel position
+};
+```
+
+**Characteristics:**
+- Persistent connection via Railway relay
+- Real-time updates (sub-second latency)
+- Automatic reconnection on disconnect
+- Used for: AIS vessels, OpenSky aircraft
+
+### 4. GraphQL Queries
+
+Some sources use GraphQL (e.g., Polymarket):
+
+```javascript
+const response = await fetch('https://gamma-api.polymarket.com/query', {
+  method: 'POST',
+  body: JSON.stringify({ query: ... })
+});
+```
+
+### 5. Web Scraping
+
+Limited scraping for sources without APIs:
+
+```javascript
+// Example: GitHub trending (no official API)
+const html = await fetch('/api/github-trending').then(r => r.text());
+// Parse HTML to extract trending repos
+```
+
+**Note**: Only used when no official API exists
+
+---
+
+## Refresh Intervals & Caching
+
+### Update Frequencies (from `REFRESH_INTERVALS`)
+
+| Data Type | Refresh Interval | Rationale |
+|-----------|------------------|-----------|
+| **RSS Feeds** | 5 minutes | Balance freshness with rate limits |
+| **Markets/Crypto** | 2 minutes | Fast-moving market data |
+| **Prediction Markets** | 5 minutes | Slower-changing probabilities |
+| **AIS Vessels** | 10 minutes | Snapshot refresh (WebSocket is real-time) |
+| **ArXiv Papers** | 1 hour | Papers published once daily |
+| **GitHub Trending** | 30 minutes | Rankings update slowly |
+| **Hacker News** | 5 minutes | Active discussion board |
+
+### Caching Strategy
+
+#### 1. Cross-User Cache (Upstash Redis)
+- **Purpose**: Share expensive operations across all users
+- **Location**: Vercel edge functions
+- **TTL**: Varies by data type (5-60 minutes)
+- **Use Cases**:
+  - AI-generated summaries (World Brief)
+  - Risk score calculations
+  - ACLED conflict data (API has rate limits)
+
+```javascript
+// Example from api/_upstash-cache.js
+const cached = await redis.get(`cache:${key}`);
+if (cached) return JSON.parse(cached);
+
+const fresh = await fetchFreshData();
+await redis.set(`cache:${key}`, JSON.stringify(fresh), { ex: 300 }); // 5 min TTL
+```
+
+#### 2. Per-User Cache (Browser localStorage)
+- **Purpose**: Avoid re-fetching data on page reload
+- **Location**: Browser localStorage
+- **TTL**: 10 minutes default
+- **Use Cases**:
+  - RSS feed items
+  - Market data
+  - Map layer data
+
+```javascript
+// Example from services/persistent-cache.ts
+export async function getPersistentCache<T>(key: string): Promise<CacheEnvelope<T> | null> {
+  const item = localStorage.getItem(key);
+  if (!item) return null;
+
+  const entry = JSON.parse(item);
+  if (Date.now() - entry.timestamp > entry.ttl) {
+    localStorage.removeItem(key);
+    return null;
+  }
+
+  return entry;
+}
+```
+
+#### 3. In-Memory Cache (Runtime)
+- **Purpose**: Deduplicate requests within a session
+- **Location**: JavaScript Map/Set objects
+- **TTL**: Session lifetime
+- **Use Cases**:
+  - Feed circuit breaker state
+  - Recent news items (deduplication)
+  - Entity index
+
+---
+
+## Authentication & API Keys
+
+### Required API Keys (from `.env.example`)
+
+#### AI Services
+- **GROQ_API_KEY**: Primary AI summarization (14,400 req/day free tier)
+- **OPENROUTER_API_KEY**: Fallback AI (50 req/day free tier)
+
+#### Market Data
+- **FINNHUB_API_KEY**: Stock quotes (60 calls/min free tier)
+- **EIA_API_KEY**: Energy data (free registration)
+- **FRED_API_KEY**: Federal Reserve data (free)
+
+#### Conflict Data
+- **ACLED_ACCESS_TOKEN**: Protest/conflict events (free for researchers)
+
+#### Real-Time Tracking
+- **AISSTREAM_API_KEY**: Vessel tracking (free tier available)
+- **OPENSKY_CLIENT_ID/SECRET**: Aircraft tracking (optional for higher limits)
+- **WINGBITS_API_KEY**: Aircraft enrichment (commercial)
+
+#### Infrastructure
+- **CLOUDFLARE_API_TOKEN**: Internet outages (free Cloudflare account)
+- **NASA_FIRMS_API_KEY**: Satellite fires (free registration)
+
+#### Caching
+- **UPSTASH_REDIS_REST_URL/TOKEN**: Cross-user cache (free tier: 10k commands/day)
+
+### Optional Keys
+- **WORLDPOP_API_KEY**: Higher rate limits for population data (works without)
+
+### Key Management
+- **Storage**: Environment variables (`.env.local` for local dev)
+- **Deployment**: Vercel environment variables (encrypted)
+- **Security**: Keys never exposed to browser; all proxied through edge functions
+- **Fallbacks**: App degrades gracefully when keys are missing (the affected features are disabled)
+
+---
+
+## Data Processing Pipeline
+
+### 1. Ingestion
+
+```
+External Source → API Proxy → Circuit Breaker → Rate Limiter → Cache Check
+```
+
+**Steps:**
+1. Request initiated by browser service
+2. Routed through Vercel edge function (if authenticated)
+3. Circuit breaker checks if source is in cooldown
+4. Rate limiter enforces per-IP limits
+5. Cache checked for recent data
+6. Fresh fetch if cache miss
+
+### 2. Transformation
+
+```
+Raw Data → Parser → Normalizer → Enricher → Validator
+```
+
+**Steps:**
+1. **Parse**: XML (RSS), JSON (APIs), HTML (scraping)
+2. **Normalize**: Convert to common formats (NewsItem, MapMarker)
+3. **Enrich**: Add geolocation, extract entities, classify threats
+4. **Validate**: Filter invalid dates, missing fields, malformed data
+
+**Example:**
+```javascript
+// services/rss.ts
+const items = Array.from(doc.querySelectorAll('item')).map(item => ({
+  title: item.querySelector('title')?.textContent || '',
+  link: item.querySelector('link')?.textContent || '',
+  pubDate: new Date(item.querySelector('pubDate')?.textContent || Date.now()),
+  description: item.querySelector('description')?.textContent || '',
+  source: feed.name,
+}));
+```
+
+### 3. Classification
+
+AI-powered classification for threat assessment:
+
+```
+Keyword Classifier (instant) → LLM Classifier (async) → Confidence Score
+```
+
+**Hybrid Approach:**
+1. **Instant**: Keyword matching for common patterns
+2. **Async**: LLM call for ambiguous cases (queued, cached)
+3. **Confidence**: 0-100 score, shown as color in UI
+
+**Example:**
+```javascript
+// services/threat-classifier.ts
+const keywordClass = classifyByKeyword(title, description);
+if (keywordClass.confidence > 80) return keywordClass;
+
+// Low confidence - queue LLM classification
+const llmClass = await classifyWithAI(title, description);
+return llmClass;
+```
+
+### 4. Aggregation
+
+```
+Multiple Sources → Deduplication → Prioritization → Display
+```
+
+**Deduplication:**
+- **Haversine distance**: Protest events within 5km merged
+- **URL matching**: Same article from multiple feeds
+- **Title similarity**: Fuzzy matching (Levenshtein distance)
+
+**Prioritization:**
+- **Source tier**: Wire services > Government > Mainstream > Specialty
+- **Recency**: Newer items ranked higher
+- **Relevance**: Keyword matching for user monitors
+
+### 5. Presentation
+
+```
+Aggregated Data → UI Components → User Interaction → Storage
+```
+
+**Display Mechanisms:**
+- **Map Layers**: Deck.gl markers with clustering (Supercluster)
+- **News Panels**: Sorted by priority, filtered by category
+- **Market Widgets**: Real-time updates with color coding
+- **AI Briefs**: LLM-generated summaries with citations
+
+---
+
+## Reliability & Resilience
+
+### 1. Circuit Breakers
+
+**Per-Feed Protection:**
+```javascript
+// services/rss.ts
+const MAX_FAILURES = 2;
+const FEED_COOLDOWN_MS = 5 * 60 * 1000; // 5 minutes
+
+function recordFeedFailure(feedName: string): void {
+  const state = feedFailures.get(feedName) || { count: 0, cooldownUntil: 0 };
+  state.count++;
+  if (state.count >= MAX_FAILURES) {
+    state.cooldownUntil = Date.now() + FEED_COOLDOWN_MS;
+    console.warn(`[RSS] ${feedName} on cooldown for 5 minutes after ${state.count} failures`);
+  }
+  feedFailures.set(feedName, state);
+}
+```
+
+**Benefits:**
+- Prevents a thundering herd of requests against failing endpoints
+- Allows degraded operation (shows cached data)
+- Automatic recovery after cooldown
+
+### 2. Multi-Source Fallbacks
+
+**Conflict Data Example:**
+```javascript
+// Tries ACLED first, falls back to UCDP if unavailable
+const conflicts = await fetchACLED().catch(() => fetchUCDP());
+```
+
+**Protest Data:**
+- **Primary**: ACLED protests endpoint
+- **Secondary**: GDELT geo-tagged events (filtered for protests)
+- **Merge**: Deduplicated by location and time
+
+**Natural Disasters:**
+- **Primary**: GDACS alerts
+- **Secondary**: NASA EONET events
+- **Tertiary**: USGS earthquakes
+- **Merge**: All three displayed simultaneously
+
+### 3. Offline Capabilities
+
+**Progressive Web App (PWA):**
+- Service worker caches map tiles
+- localStorage preserves feed data
+- Offline indicator in UI
+- Graceful degradation (no real-time updates)
+
+**Desktop App (Tauri):**
+- Full offline map support
+- Local data persistence
+- Background sync when online
+
+### 4. Rate Limit Handling
+
+**IP-Based Limiting (Edge Functions):**
+```javascript
+// api/_ip-rate-limit.js
+const MAX_REQUESTS = 100;
+const WINDOW_MS = 60 * 1000; // 1 minute
+
+async function checkRateLimit(ip: string): Promise<boolean> {
+  const key = `ratelimit:${ip}`;
+  const count = await redis.incr(key);
+  if (count === 1) await redis.expire(key, WINDOW_MS / 1000);
+  return count <= MAX_REQUESTS;
+}
+```
+
+**Per-API Limits:**
+- Finnhub: 60 calls/min → poll every 2 minutes
+- ACLED: Rate limited → Redis cache for 10 minutes
+- GitHub: Unauthenticated scraping → 30 min intervals
+
+### 5. Error Handling
+
+**Graceful Degradation:**
+```javascript
+try {
+  const data = await fetchData();
+  updateUI(data);
+} catch (error) {
+  console.error(`Failed to fetch ${source}:`, error);
+  // Show cached data or hide panel
+  if (cached) updateUI(cached);
+  else hidePanel();
+}
+```
+
+**User Feedback:**
+- Loading spinners during fetch
+- Error badges on failed panels
+- Retry buttons for manual recovery
+- Console logging for debugging
+
+### 6. Data Freshness Monitoring
+
+**Telemetry:**
+```javascript
+// api/_cache-telemetry.js
+export function recordCacheMetrics(key: string, hit: boolean, age?: number) {
+  // Tracks cache hit rates and data age
+  // Used to tune TTLs and identify stale sources
+}
+```
+
+**Stale Data Detection:**
+- Timestamps on all cached data
+- Visual indicators for data age (e.g., "Updated 5m ago")
+- Automatic refresh on stale detection
+
+---
+
+## Architecture Diagrams
+
+### High-Level Data Flow
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                      External Sources                       │
+│ APIs • RSS Feeds • WebSockets • Public Databases • Scrapers │
+└───────────────────────┬─────────────────────────────────────┘
+                        │
+                        ▼
+┌─────────────────────────────────────────────────────────────┐
+│                     Vercel Edge Network                     │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
+│  │ API Proxies  │  │ Rate Limits  │  │ CORS Handler │       │
+│  └──────────────┘  └──────────────┘  └──────────────┘       │
+│         │                │                  │               │
+│         └────────────────┴──────────────────┘               │
+│                          │                                  │
+│                          ▼                                  │
+│                 ┌──────────────────┐                        │
+│                 │  Upstash Redis   │                        │
+│                 │   (Cross-user)   │                        │
+│                 └──────────────────┘                        │
+└───────────────────────┬─────────────────────────────────────┘
+                        │
+        ┌───────────────┴───────────────┐
+        │                               │
+        ▼                               ▼
+┌─────────────────┐           ┌─────────────────┐
+│  Railway Relay  │           │   Browser App   │
+│ • AIS Stream    │◄──────────┤ • Services      │
+│ • OpenSky       │  WebSocket│ • UI Components │
+│ • RSS Fallback  │           │ • localStorage  │
+└─────────────────┘           └─────────────────┘
+```
+
+### Service Architecture
+
+```
+/api (Vercel Edge Functions)
+├── acled.js              → ACLED API
+├── ais-snapshot.js       → AIS via Railway
+├── arxiv.js              → ArXiv API
+├── classify-event.js     → AI classification
+├── climate-anomalies.js  → Open-Meteo
+├── cloudflare-outages.js → Cloudflare Radar
+├── coingecko.js          → CoinGecko API
+├── cyber-threats.js      → Threat feeds aggregator
+├── earthquakes.js        → USGS API
+├── firms-fires.js        → NASA FIRMS
+├── finnhub.js            → Stock quotes
+├── fred-data.js          → Federal Reserve
+├── gdelt-geo.js          → GDELT GEO API
+├── github-trending.js    → GitHub scraper
+├── hackernews.js         → HN Firebase API
+├── opensky.js            → Aircraft tracking
+├── polymarket.js         → Gamma API
+├── rss-proxy.js          → RSS aggregation
+├── ucdp.js               → Uppsala Conflict
+├── unhcr-population.js   → Refugee data
+└── worldbank.js          → World Bank API
+
+/src/services (Browser Services)
+├── rss.ts                → Feed parser
+├── ais.ts                → Vessel tracking
+├── flights.ts            → Aircraft display
+├── conflicts.ts          → Conflict aggregation
+├── earthquakes.ts        → Disaster display
+├── markets.ts            → Market data
+├── polymarket.ts         → Predictions
+├── cyber-threats.ts      → Threat display
+├── threat-classifier.ts  → AI classification
+├── trending-keywords.ts  → Spike detection
+├── country-instability.ts→ CII scoring
+└── focal-point-detector.ts→ Convergence analysis
+```
+
+---
+
+## Performance Optimizations
+
+### 1. Lazy Loading
+- Map layers loaded on-demand
+- News panels fetched only when visible
+- AI analysis deferred until user request
+
+### 2. Progressive Disclosure
+- Low-zoom: Clustered markers only
+- High-zoom: Detailed markers and labels
+- Detail layers (bases, nuclear) appear at zoom ≥ 5
+
+### 3. Batch Processing
+- Multiple AI classification calls batched
+- RSS feeds fetched in parallel chunks
+- Map markers rendered in bulk
+
+### 4. WebWorker Offloading
+```javascript
+// services/ml-worker.ts
+const worker = new Worker('./analysis-worker.ts');
+worker.postMessage({ type: 'classify', items: headlines });
+worker.onmessage = (e) => updateUI(e.data);
+```
+
+**Offloaded Tasks:**
+- Heavy computations (clustering, correlation)
+- ML inference
+- Large data parsing
+
+### 5. Incremental Updates
+- Only changed data re-rendered
+- Diff-based map marker updates
+- Virtual scrolling for long lists
+
+---
+
+## Privacy & Security
+
+### 1. No User Tracking
+- **No cookies** (except session for auth if implemented)
+- **No analytics by default** (optional self-hosted)
+- **No telemetry** sent to third parties
+- All data processing happens client-side or on user's Vercel account
+
+### 2. API Key Protection
+- Keys stored in Vercel environment variables (encrypted at rest)
+- Never exposed to browser (proxied through edge functions)
+- Per-IP rate limiting prevents abuse
+
+### 3. Content Security
+- CSP headers on all pages
+- CORS restrictions on edge functions
+- Input sanitization for user-provided data (monitors)
+
+### 4. Data Retention
+- **Browser localStorage**: Cleared by user
+- **Upstash Redis**: TTL-based expiration (max 60 minutes)
+- **No permanent storage** of user data
+
+---
+
+## Future Data Sources (Roadmap)
+
+### Planned Integrations
+- **Sentinel Hub**: Satellite imagery for infrastructure monitoring
+- **MarineTraffic**: Enhanced vessel tracking (port arrivals/departures)
+- **FlightRadar24**: More detailed aircraft data
+- **Shodan**: IoT device exposure
+- **VirusTotal**: File/URL reputation
+- **Twitter/X API**: Social media signals (if API access is restored)
+- **Telegram Channels**: OSINT community feeds
+
+### Community Requests
+- Nuclear power plant status (IAEA)
+- Space launch tracking (Space-Track.org)
+- Commodities futures (CME, ICE)
+- Container shipping rates (Freightos)
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+#### 1. Missing Data in Panels
+**Cause**: API key not configured or expired
+**Solution**: Check `.env.local` for required keys, verify validity
+
+#### 2. "Circuit breaker activated" Messages
+**Cause**: Feed failing repeatedly
+**Solution**: Wait 5 minutes for auto-recovery, or check feed URL
+
+#### 3. Slow Map Loading
+**Cause**: Too many layers enabled simultaneously
+**Solution**: Disable unused layers, increase zoom level for detail
+
+#### 4. AIS/OpenSky Not Updating
+**Cause**: Railway relay server down or WebSocket disconnected
+**Solution**: Check `VITE_WS_RELAY_URL`, restart relay server
+
+#### 5. Outdated News
+**Cause**: Cached data not refreshing
+**Solution**: Clear localStorage, disable browser cache, check refresh intervals
+
+### Debug Mode
+
+Enable verbose logging:
+```javascript
+localStorage.setItem('debug', 'worldmonitor:*');
+location.reload();
+```
+
+View console for:
+- Feed fetch status
+- Cache hits/misses
+- API response times
+- Circuit breaker states
+
+---
+
+## Contributing New Data Sources
+
+### Guidelines
+
+1. **Check Licensing**: Ensure data source allows redistribution
+2. **Respect Rate Limits**: Implement appropriate caching
+3. **Add Circuit Breaker**: Use existing pattern from `rss.ts`
+4. **Document API**: Add section to this document
+5. **Test Fallbacks**: Verify graceful degradation
+
+### Adding a New Feed
+
+```javascript
+// 1. Add to src/config/feeds.ts
+const FEEDS = {
+  myCategory: [
+    { name: 'My Source', url: rss('https://example.com/feed.xml') },
+  ],
+};
+
+// 2. Add tier rating
+export const SOURCE_TIERS: Record<string, number> = {
+  'My Source': 3, // Tier 3 = specialty source
+};
+
+// 3. Add to panel config (optional)
+export const DEFAULT_PANELS: Record<string, PanelConfig> = {
+  myCategory: { name: 'My Category', enabled: true, priority: 2 },
+};
+```
+
+### Adding a New API
+
+```javascript
+// 1. Create edge function in /api/my-api.js
+export default async function handler(req) {
+  const response = await fetch('https://api.example.com/data', {
+    headers: { 'Authorization': `Bearer ${process.env.MY_API_KEY}` }
+  });
+  // Edge handlers must return a Response, so pass the upstream JSON body through
+  return new Response(await response.text(), {
+    status: response.status,
+    headers: { 'Content-Type': 'application/json' }
+  });
+}
+
+// 2. Create service in /src/services/my-service.ts
+export async function fetchMyData() {
+  const response = await fetch('/api/my-api');
+  if (!response.ok) throw new Error('Fetch failed');
+  return response.json();
+}
+
+// 3. Update .env.example with the new key:
+//    MY_API_KEY=
+
+// 4. Document in this file
+```
+
+---
+
+## License & Attribution
+
+### Data Sources
+Each data source has its own license. See individual source documentation for terms:
+- **Public Domain**: USGS, NASA, US Government sources
+- **Open Data**: UCDP, UNHCR (CC BY 4.0)
+- **Free Tier**: Finnhub, CoinGecko, OpenSky (non-commercial use)
+- **Researcher Access**: ACLED (free for researchers)
+
+### Attribution Requirements
+- **UNHCR**: Must credit "UNHCR, The UN Refugee Agency"
+- **GDELT**: Credit "The GDELT Project"
+- **UCDP**: Credit "Uppsala Conflict Data Program"
+- **OpenSky**: Credit "The OpenSky Network"
+
+### World Monitor License
+- **Code**: MIT License (see LICENSE file)
+- **Data**: As per individual source licenses
+- **Aggregation**: No copyright claimed on aggregated display
+
+---
+
+## Support & Contact
+
+### Issues
+- **GitHub**: https://github.com/koala73/worldmonitor/issues
+- **Discussions**: https://github.com/koala73/worldmonitor/discussions
+
+### Documentation
+- **Main README**: ../README.md
+- **Full Docs**: ./DOCUMENTATION.md
+- **Desktop Setup**: ./DESKTOP_CONFIGURATION.md
+- **Release Process**: ./RELEASE_PACKAGING.md
+
+### Community
+- **Discord**: (Link if exists)
+- **Twitter**: (Link if exists)
+
+---
+
+*Last Updated: 2026-02-16*
+*Version: 2.1.4*

From 7a21f56da99d7df2a3b4779f4f14f6793d961163 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 16 Feb 2026 03:40:24 +0000
Subject: [PATCH 3/3] Add references to DATA_COLLECTION.md in README and DOCUMENTATION

Co-authored-by: backgroundcheck <18512725+backgroundcheck@users.noreply.github.com>
---
 README.md             | 1 +
 docs/DOCUMENTATION.md | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/README.md b/README.md
index 79fb7b910..6e4bf2318 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,7 @@

 Full Documentation  ·
+ Data Sources  ·
 All Releases

diff --git a/docs/DOCUMENTATION.md b/docs/DOCUMENTATION.md
index 3aad822ff..d379b6e46 100644
--- a/docs/DOCUMENTATION.md
+++ b/docs/DOCUMENTATION.md
@@ -11,6 +11,10 @@ AI-powered real-time global intelligence dashboard aggregating news, markets, ge
 
 ![World Monitor Dashboard](../new-world-monitor.png)
 
+> **📊 Data Collection**: For detailed information about data sources, APIs, and collection methods, see **[DATA_COLLECTION.md](./DATA_COLLECTION.md)**
+
+---
+
 ## Platform Variants
 
 World Monitor runs two specialized variants from a single codebase, each optimized for different monitoring needs: