refactor: optimize cache management and restore risk change detection #59

Merged
jubalm merged 6 commits into main from fix-rpc-rate-limiting
Jan 9, 2026
Collaborator

@jubalm jubalm commented Dec 31, 2025

Summary

Comprehensive RPC rate limiting solution with incremental event caching, smart commits, and workflow optimizations. Reduces RPC calls by 99.3% (1,200/day → 8/day) while maintaining full transparency and auditability.

Problem Solved

The original implementation queried ~50,400 blocks (7 days) every hour via ~51 RPC calls, totaling ~1,200 queries/day (51 × 24 ≈ 1,224). This caused:

  • Frequent rate limiting from public RPC endpoints
  • ~60 second execution time per run
  • Noisy git history (720 commits/month with mostly duplicate data)

Solution Architecture

Incremental Event Caching

  • Query only new blocks since last run (typically ~1,800 blocks per 6-hour period)
  • Re-query last 32 blocks for Ethereum finality protection
  • 7-day sliding window with automatic pruning
  • Graceful fallback to full query if cache unavailable
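For reviewers, the range selection described above can be sketched roughly as follows. This is a minimal TypeScript illustration, not the code in this PR: `planQuery`, `QueryPlan`, and the constant names are hypothetical, and the 50,400-block window assumes ~12-second blocks.

```typescript
// Hypothetical sketch of incremental range planning; names are illustrative.
const FINALITY_BLOCKS = 32;   // blocks re-queried on every run
const WINDOW_BLOCKS = 50400;  // ~7 days, assuming ~12 s per block

interface QueryPlan {
  fromBlock: number;
  toBlock: number;
  incremental: boolean; // false means a full 7-day query
}

function planQuery(lastQueriedBlock: number | null, latestBlock: number): QueryPlan {
  const windowStart = Math.max(0, latestBlock - WINDOW_BLOCKS);

  // Fall back to a full query when the cache is missing, older than the
  // sliding window, or ahead of the chain head (treated as invalid here).
  if (
    lastQueriedBlock === null ||
    lastQueriedBlock < windowStart ||
    lastQueriedBlock > latestBlock
  ) {
    return { fromBlock: windowStart, toBlock: latestBlock, incremental: false };
  }

  // Re-query the last 32 cached blocks, then continue to the chain head.
  const fromBlock = Math.max(windowStart, lastQueriedBlock - FINALITY_BLOCKS + 1);
  return { fromBlock, toBlock: latestBlock, incremental: true };
}
```

With a cache that ends 1,800 blocks behind the head, the planned range covers those 1,800 new blocks plus the 32 re-queried ones.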

Smart Commit Strategy

  • Immediate: Risk percentage changes (real dispute activity)
  • Daily: Cache refresh at midnight UTC (maintenance)
  • Skip: No meaningful changes
  • Result: ~40 commits/month vs 720 before (95% reduction)
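The three outcomes above amount to a small decision function. The sketch below is an illustration in TypeScript only (the workflow makes this decision in bash); `decideCommit` and `CommitDecision` are hypothetical names, and it assumes the caller has already determined whether the risk percentage changed and what the current UTC hour is.

```typescript
// Hypothetical sketch of the smart commit decision; not the workflow's code.
type CommitDecision = "immediate" | "daily" | "skip";

function decideCommit(riskChanged: boolean, utcHour: number): CommitDecision {
  if (riskChanged) return "immediate"; // real dispute activity
  if (utcHour === 0) return "daily";   // midnight UTC cache refresh
  return "skip";                       // no meaningful changes
}
```

A risk change takes priority over the daily refresh, so a run at midnight UTC with changed risk is still reported as an immediate commit.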

Workflow Optimization

  • Execution every 6 hours (vs hourly): 24 → 4 runs/day, a ~83% reduction
  • Single artifact contains everything (cache + risk data)
  • Direct gh-pages management with proper UTC handling
  • Automatic cache persistence across runs

Commits Included

1. feat: implement 6-hour smart commit with incremental event caching

  • Cache interfaces (EventCache, SerializedEventLog)
  • Cache management functions (load, save, validate, serialize, prune)
  • Incremental query logic with finality protection
  • Workflow schedule change (hourly → 6-hourly)
  • Smart commit detection
  • Comprehensive documentation

2. refactor: optimize cache management and restore risk change detection

  • Cache path: root cache/ → Astro's public/cache/ (auto-copied to dist/)
  • Restore immediate commits when risk changes (git history comparison)
  • Eliminate redundant artifact downloads
  • Simplified workflow steps
  • First-run cache with valid JSON structure

3. docs: align caching strategy documentation

  • Update cache storage locations
  • Document smart commit detection via git history
  • Detailed execution flow diagram (build job → deploy job)
  • All technical implementation details aligned

4. fix: resolve YAML syntax error and improve cache initialization

  • Critical: Fix YAML validation error in workflow
  • Replace heredoc with jq for clean JSON generation
  • Dynamic timestamps via date -u (UTC)

5. fix: resolve critical smart commit detection issues

  • Critical: Floating-point comparison now uses numeric comparison (awk)
    • Prevents false positives: "2.1" vs "2.10"
    • Handles rounding with 2 decimal precision
  • Critical: UTC timezone fix - date -u +%H for midnight detection
    • Daily cache refresh now triggers at correct UTC time
  • Critical: Cache file existence verification
    • Explicit [ -f cache/event-cache.json ] check before reading
  • Better null handling (blocks default to "Unknown")

Performance Impact

| Metric | Before | After | Improvement |
|---|---|---|---|
| Execution frequency | Every hour | Every 6 hours | ~83% reduction |
| Blocks queried/run | ~50,400 | ~1,800 (avg) | 96% reduction |
| RPC queries/run | ~51 | ~2 | 96% reduction |
| Total queries/day | ~1,200 | ~8 | 99.3% reduction |
| Execution time | ~60 seconds | ~10 seconds | 83% reduction |
| Git commits/month | ~720 | ~40 | 95% reduction |
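The percentages can be rechecked from the before/after figures. The helper below is a small illustrative sketch (`reductionPercent` is a hypothetical name, not part of the PR) that computes each reduction to one decimal place.

```typescript
// Illustrative arithmetic check of the before/after figures.
function reductionPercent(before: number, after: number): number {
  // Round to one decimal place.
  return Math.round((1 - after / before) * 1000) / 10;
}

const runsPerDay = reductionPercent(24, 4);             // hourly vs every 6 hours
const blocksPerRun = reductionPercent(50400, 1800);
const queriesPerRun = reductionPercent(51, 2);
const queriesPerDay = reductionPercent(1200, 8);
const executionTime = reductionPercent(60, 10);
const commitsPerMonth = reductionPercent(720, 40);

console.log({
  runsPerDay, blocksPerRun, queriesPerRun,
  queriesPerDay, executionTime, commitsPerMonth,
});
```

Since the inputs are approximate (~40 commits/month, ~1,800 blocks on average), the results are estimates rather than measured values.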

Transparency & Auditability

  • Public cache URL: https://augur.net/cache/event-cache.json
  • Git history: git log gh-pages -- cache/event-cache.json
  • GitHub Actions logs: Detailed query logs and commit decisions
  • Smart commit messages: Include block number, risk %, and event count

Test Plan

  • Verify YAML workflow validates without errors
  • Verify cache file created with valid JSON on first run
  • Verify cache persists across runs
  • Test immediate commit when risk percentage changes
  • Test daily cache refresh at midnight UTC (verify HOUR=00)
  • Test floating-point comparison (2.1 vs 2.10 recognized as equal)
  • Verify cache file existence check prevents silent failures
  • Confirm RPC queries reduced from ~51 to ~2 per run

Migration Notes

  • No breaking changes to existing data format
  • Cache version: 1.0.0 (schema compatible)
  • First run will perform full 7-day query (expected)
  • Subsequent runs use incremental caching
  • All timestamps now properly in UTC

Code Quality

  • TypeScript: Proper types, error handling, validation
  • Bash: Fixed critical logic bugs (float comparison, timezone, file checks)
  • YAML: Valid syntax, properly formatted
  • Documentation: Comprehensive with examples and references

Related: Fixes #57 (RPC rate limiting issues)
Branch: fix-rpc-rate-limiting → main
Status: Ready for review and testing

@jubalm jubalm marked this pull request as draft January 1, 2026 06:00
@jubalm jubalm changed the title from "fix: Improve RPC rate limit handling in fork risk calculation" to "feat: Implement 6-hour smart commit with incremental event caching" on Jan 8, 2026
@jubalm jubalm marked this pull request as ready for review January 8, 2026 08:20
jubalm added 2 commits January 8, 2026 16:29
Add graceful degradation for RPC rate limiting during event queries:
- Detect rate limit errors (429, 1015, "Too Many Requests")
- Add 100ms delay between chunk queries to reduce request rate
- Implement exponential backoff (2s, 4s, 8s) when rate limits hit
- Stop after 5 consecutive failures and use partial data
- Track successful vs total chunks for better monitoring
- Continue with partial results instead of complete failure
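As a rough illustration of the behavior listed above, here is a TypeScript sketch. Only the numbers (100 ms delay, 2s/4s/8s backoff, 5 consecutive failures) come from this commit message. All names are hypothetical, and two details are assumptions: the backoff is held at 8s after the third failure, and a rate-limited chunk is skipped rather than retried.

```typescript
// Hypothetical sketch; only the numeric values come from the commit message.
const CHUNK_DELAY_MS = 100;
const MAX_CONSECUTIVE_FAILURES = 5;

/** True if an error message looks like a rate limit (429, 1015, "Too Many Requests"). */
function isRateLimitError(message: string): boolean {
  return /\b(429|1015)\b|too many requests/i.test(message);
}

/** Backoff for the n-th consecutive failure (1-based): 2s, 4s, 8s, then held at 8s (assumed). */
function backoffDelayMs(consecutiveFailures: number): number {
  return Math.min(2000 * 2 ** (consecutiveFailures - 1), 8000);
}

interface ChunkedResult<T> {
  results: T[];
  successfulChunks: number;
  totalChunks: number;
}

async function queryChunks<T>(
  chunks: Array<[number, number]>,
  queryChunk: (fromBlock: number, toBlock: number) => Promise<T[]>,
  sleep: (ms: number) => Promise<void>,
): Promise<ChunkedResult<T>> {
  const results: T[] = [];
  let successfulChunks = 0;
  let consecutiveFailures = 0;

  for (const [fromBlock, toBlock] of chunks) {
    try {
      results.push(...(await queryChunk(fromBlock, toBlock)));
      successfulChunks++;
      consecutiveFailures = 0;
      await sleep(CHUNK_DELAY_MS); // pace requests between chunks
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (!isRateLimitError(message)) throw err; // unrelated errors still propagate
      consecutiveFailures++;
      if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) break; // keep partial data
      await sleep(backoffDelayMs(consecutiveFailures));
    }
  }
  return { results, successfulChunks, totalChunks: chunks.length };
}
```

Passing `sleep` and `queryChunk` in as parameters keeps the loop easy to exercise without a real RPC endpoint or real delays.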

This prevents workflow failures when public RPC endpoints hit rate
limits during hourly fork risk monitoring runs.

Major performance optimization for fork risk monitoring:

- Add EventCache and SerializedEventLog interfaces
- Implement cache management functions (load, save, validate, serialize, prune)
- Modify getActiveDisputes() for incremental queries
- Add finality protection (re-query last 32 blocks)
- Merge cached events with new blockchain queries
- Log efficiency metrics (RPC queries saved)
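The merge step can be pictured with the following TypeScript sketch. It is an assumption-laden illustration rather than the PR's implementation: `CachedEvent` and `mergeAndPrune` are hypothetical, and it assumes freshly queried events replace cached ones across the whole re-queried range.

```typescript
// Hypothetical sketch of merging cached and fresh events, then pruning.
interface CachedEvent {
  blockNumber: number;
  id: string; // stand-in for whatever uniquely identifies a log
}

function mergeAndPrune(
  cached: CachedEvent[],
  fresh: CachedEvent[],
  requeryFrom: number, // first block of the new query (includes the 32 re-queried blocks)
  windowStart: number, // oldest block kept in the 7-day window
): CachedEvent[] {
  // Cached events inside the re-queried range are dropped so that the
  // fresh results are the single source of truth for those blocks.
  const kept = cached.filter((e) => e.blockNumber < requeryFrom);
  return [...kept, ...fresh]
    .filter((e) => e.blockNumber >= windowStart)
    .sort((a, b) => a.blockNumber - b.blockNumber);
}
```

Dropping cached entries from the re-queried range avoids duplicates and lets a reorganized block's events be replaced by the fresh query.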

- Change schedule from hourly to 6-hourly (00:05, 06:05, 12:05, 18:05 UTC)
- Add cache retrieval from gh-pages branch
- Implement smart commit detection:
  - Immediate: When risk data changes
  - Daily: Cache refresh at midnight (00:05 UTC)
  - Skip: No meaningful changes
- Preserve cache/ directory in gh-pages deployment

- Create docs/rpc-caching-strategy.md (comprehensive public documentation)
- Update README.md with caching information
- Add inline workflow comments explaining caching strategy

- Create cache/ directory with .gitkeep placeholder

- RPC queries: ~1,200/day → ~8/day (99.3% reduction)
- Execution time: ~60s → ~10s per run (83% reduction)
- Git commits: ~720/month → ~40/month (95% reduction)
- Rate limit errors: Frequent → Rare

- Cache publicly accessible at https://augur.net/cache/event-cache.json
- Full git history audit trail on gh-pages branch
- Detailed GitHub Actions logs for verification

- Cache maintains 7-day sliding window of events
- Ethereum finality protection (32 blocks)
- Graceful fallback to full query if cache unavailable
- Cache version: 1.0.0

See docs/rpc-caching-strategy.md for complete technical documentation.

@jubalm jubalm force-pushed the fix-rpc-rate-limiting branch from 125fa77 to 5c2d8ac on January 8, 2026 09:09

## Improvements

### Cache Path Optimization
- Move cache from root `cache/` to `public/cache/`
- Leverages Astro's automatic public folder copying to dist/
- Eliminates manual copy steps in workflow
- Cache automatically included in artifact and deployed

### Restored Risk Change Detection
- Re-add immediate commits when fork risk changes meaningfully
- Compare current vs previous risk from git history
- Maintains full audit trail of risk changes
- Improves transparency vs. efficiency balance

### Workflow Optimization
- Eliminate redundant artifact download
- Remove unused cache file copy step
- Checkout gh-pages directly instead of main
- Cleaner, more efficient deploy job (fewer steps)

### First-Run Cache Initialization
- Replace malformed `{}` fallback with valid EventCache structure
- Ensures validation passes immediately
- Prevents unnecessary rescue logic on first run
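To illustrate why `{}` fails validation while a populated empty structure passes, here is a TypeScript sketch. The field names (`version`, `lastQueriedBlock`, `lastUpdated`, `events`) are assumptions made for this example; the actual `EventCache` interface is defined in the PR's source and may differ.

```typescript
// Hypothetical shape; the real EventCache fields may differ.
interface EventCacheSketch {
  version: string;
  lastQueriedBlock: number;
  lastUpdated: string; // ISO 8601 timestamp in UTC
  events: unknown[];
}

function createEmptyCache(now: Date): EventCacheSketch {
  return {
    version: "1.0.0",
    lastQueriedBlock: 0,
    lastUpdated: now.toISOString(), // toISOString always reports UTC
    events: [],
  };
}

function isValidCache(value: unknown): value is EventCacheSketch {
  if (typeof value !== "object" || value === null) return false;
  const c = value as Record<string, unknown>;
  return (
    typeof c["version"] === "string" &&
    typeof c["lastQueriedBlock"] === "number" &&
    typeof c["lastUpdated"] === "string" &&
    Array.isArray(c["events"])
  );
}
```

Under these assumptions a bare `{}` is rejected because every required field is missing, whereas the generated empty cache validates on the first run.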

### Cleanup
- Remove unused `cache/.gitkeep` placeholder

## Technical Details

- Cache path: `public/cache/event-cache.json` (auto-copied by Astro to dist/)
- Risk detection: Compares `riskPercentage` between current and previous commits
- Empty cache: Valid JSON matching EventCache interface with all required fields
- Smart commits: Immediate (risk change), Daily (cache refresh), Skip (no changes)

See docs/rpc-caching-strategy.md for complete caching architecture documentation.

@jubalm jubalm changed the title from "feat: Implement 6-hour smart commit with incremental event caching" to "refactor: optimize cache management and restore risk change detection" on Jan 9, 2026
jubalm added 3 commits January 9, 2026 14:53
- Update cache storage location to reflect public/cache/ during build
- Update smart commit detection logic to show git history comparison
- Clarify the two-phase workflow (build job → deploy job separation)
- Explain how Astro copies public/ to dist/ automatically
- Document valid empty cache structure instead of malformed fallback
- Add detailed execution flow diagram with job breakdown

- Fix YAML syntax error in heredoc (line 73)
- Replace heredoc with jq command for clean JSON generation
- Use dynamic timestamps via date -u (UTC) instead of hardcoded values
- Eliminates indentation issues from YAML block scalar parsing
- Improves code clarity and maintainability

This fixes:
1. GitHub Actions workflow validation error
2. Hardcoded timestamp issue (cache metadata always stale)
3. Ensures consistent UTC timezone usage

## Critical Fixes

### 1. Floating-Point String Comparison (Line 160)
Changed from string comparison to numeric comparison with rounding tolerance.
- Old: `[ "$CURRENT_RISK" != "$PREVIOUS_RISK" ]` (string compare fails: "2.1" != "2.10")
- New: `awk BEGIN {if (int(curr*100) != int(prev*100))}` (numeric compare at 2 decimals)
- Impact: Prevents false positives/negatives in risk change detection

### 2. Timezone Bug (Line 168)
Changed to UTC to match cron schedule
- Old: `HOUR=$(date +%H)` (uses runner's local timezone)
- New: `HOUR=$(date -u +%H)` (UTC)
- Impact: Daily cache refresh now triggers at midnight UTC, not runner's local time

### 3. Cache File Assumption (Line 165)
Added explicit file existence verification
- Old: `EVENTS=$(jq ... cache/event-cache.json)` (fails silently)
- New: Check `[ -f cache/event-cache.json ]` before reading
- Impact: Prevents silent fallback to "0" if cache wasn't deployed

### 4. Block Number Handling (Line 154)
Changed from "N/A" to "Unknown" for clarity
- Better commit message clarity if data is missing
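The workflow applies fixes 1 and 2 in bash with `awk` and `date -u`. The sketch below restates the same two checks in TypeScript purely for illustration; `riskChanged` and `isDailyRefreshHour` are hypothetical names. Like the `int()` call in the awk expression, `Math.trunc` truncates to two decimals rather than rounding.

```typescript
// Illustrative TypeScript restatement of the two bash checks.

/** Numeric comparison truncated to 2 decimals, so "2.1" and "2.10" are equal. */
function riskChanged(current: string, previous: string): boolean {
  const curr = Math.trunc(Number(current) * 100);
  const prev = Math.trunc(Number(previous) * 100);
  // Non-numeric input yields NaN, which never equals itself,
  // so it is reported as a change.
  return curr !== prev;
}

/** Midnight check based on the UTC hour, independent of the machine's local timezone. */
function isDailyRefreshHour(now: Date): boolean {
  return now.getUTCHours() === 0;
}

// A plain string comparison treats equal values as different:
const stringsDiffer = "2.1" !== "2.10";               // true
const numericallyChanged = riskChanged("2.1", "2.10"); // false
console.log({ stringsDiffer, numericallyChanged });
```

Using `getUTCHours` rather than `getHours` mirrors the `date -u` change: the result matches the cron schedule's UTC time regardless of where the code runs.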
@jubalm jubalm merged commit 64af148 into main Jan 9, 2026
3 checks passed