157 changes: 157 additions & 0 deletions .github/copilot/instructions/performance-monitoring.md
@@ -0,0 +1,157 @@
# Performance Monitoring Guide

## Overview
This guide covers performance measurement strategies for ctrld-sync, including how to efficiently measure impact, common bottlenecks, and testing approaches.

## Key Performance Metrics

### User-Facing Metrics
- **Cold start sync time**: First run without cache (baseline: ~10-30s for 23 folders)
- **Warm cache sync time**: Subsequent runs with valid cache (target: <5s)
- **API calls per sync**: Total API requests made (fewer is better, measure before/after)
- **Rules processed per second**: Throughput for large rule sets

### System Metrics
- **Cache hit rate**: (hits + validations) / total requests (target: >80% on warm runs)
- **Memory peak usage**: Maximum RSS during sync (baseline: <100MB for typical workloads)
- **API rate limit headroom**: Distance from 429 errors
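The cache-hit-rate formula above can be computed directly from the `_cache_stats` counters. A minimal sketch (the function name is illustrative, not part of ctrld-sync):

```python
def cache_hit_rate(hits: int, validations: int, misses: int) -> float:
    """Effective hit rate: in-memory hits plus 304 validations over all lookups."""
    total = hits + validations + misses
    return (hits + validations) / total if total else 0.0

# Example: 70 in-memory hits, 15 conditional-request validations (304s),
# and 15 misses gives 85 / 100 = 0.85, above the >80% warm-run target.
```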

## Critical Constraints

### API Rate Limits (Non-Negotiable)
The Control D API has strict rate limits. Existing safeguards:
- Serial processing (max_workers=1) to prevent 429 errors
- 60-second wait after deletions to prevent "zombie state"
- Conservative DELETE_WORKERS=3 for parallel deletes
- Batch size of 500 items per request (empirically chosen)

**Never increase parallelism without first implementing rate limit header parsing.**

### Batch Size Rationale
The 500-item batch size is fixed, not dynamic; it was chosen through production testing to stay under API limits. Rather than pursuing smarter batching, invest in better retry logic (exponential backoff with jitter).
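The suggested retry logic can be sketched as full-jitter exponential backoff. Helper names and defaults here are illustrative; production code would catch specific httpx status errors (such as 429) rather than every exception:

```python
import random
import time


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0):
    """Yield full-jitter delays: uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))


def retry_with_backoff(request_fn, max_retries: int = 5,
                       base: float = 1.0, cap: float = 60.0):
    """Call request_fn, retrying on failure with a jittered delay between attempts."""
    last_exc = None
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return request_fn()
        except Exception as exc:  # real code: catch 429/5xx httpx errors only
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```

Full jitter (random delay between zero and the exponential cap) spreads retries out, which matters when several workers hit a rate limit at the same moment.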

## Measurement Strategies

### Quick Synthetic Benchmarks
```bash
# Measure cache effectiveness (after implementation)
python main.py --dry-run # Should show cache stats

# Measure rule validation performance
pytest tests/test_security.py -v # Look for slow tests

# Profile memory usage
python -m memory_profiler main.py --dry-run
```

### Realistic User Journey Tests
```bash
# Cold start (clear cache first)
rm -rf ~/.cache/ctrld-sync/blocklists.json
time python main.py --dry-run

# Warm cache (run twice)
time python main.py --dry-run
time python main.py --dry-run # Second run should be faster
```

### CI Performance Tracking
- Monitor GitHub Actions workflow duration trends
- Track test suite execution time with `pytest --durations=10`
- Use CI artifacts to store timing data across runs
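One lightweight way to store timing data across runs is to append structured records to a JSON file that the workflow uploads as an artifact. A sketch, assuming a simple label/seconds record shape (not an existing ctrld-sync helper):

```python
import json
import time
from pathlib import Path


def record_timing(label: str, fn, out_path: str = "timings.json") -> float:
    """Time fn() and append a {label, seconds} record to a JSON file."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    path = Path(out_path)
    records = json.loads(path.read_text()) if path.exists() else []
    records.append({"label": label, "seconds": round(elapsed, 4)})
    path.write_text(json.dumps(records, indent=2))
    return elapsed
```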

## Common Bottlenecks

### 1. Cold Start Downloads
**Symptom**: Slow first run
**Causes**:
- Downloading all blocklists from scratch
- No persistent cache or stale cache
**Solutions**:
- ✅ Persistent disk cache with ETag/Last-Modified (implemented)
- ✅ HTTP conditional requests (304 Not Modified) (implemented)
- Future: Parallel DNS validation (deferred due to complexity)

### 2. API Request Overhead
**Symptom**: High latency even with small rule sets
**Causes**:
- Too many small API calls
- No request batching
**Solutions**:
- Batch rule updates (500 items per request)
- Track API call counts to identify inefficiencies
- Connection pooling (httpx already configured)
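Batching keeps the API call count at ceil(n / 500) for n rules. The chunking logic can be sketched as follows (the helper name is illustrative):

```python
from typing import Iterator, List

BATCH_SIZE = 500  # matches the empirically chosen per-request limit


def batched(rules: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Split a rule list into fixed-size chunks, one API request per chunk."""
    for i in range(0, len(rules), size):
        yield rules[i:i + size]
```

For example, 1,201 rules become three requests of 500, 500, and 201 items instead of 1,201 individual calls.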

### 3. Test Suite Duration
**Symptom**: Slow CI runs
**Causes**:
- Sequential test execution
- No dependency caching
**Solutions**:
- ✅ pytest-xdist for parallel execution (implemented)
- ✅ CI pip dependency caching (implemented)

## Performance Engineering Workflow

### 1. Identify Target
Review the performance plan discussion and choose a specific bottleneck to target.

### 2. Establish Baseline
Before making changes:
```bash
# Run tests and record timing
pytest --durations=10 > baseline-tests.txt

# Run actual sync and record metrics
python main.py --dry-run > baseline-sync.txt 2>&1
```

### 3. Implement Change
Make focused, minimal changes. Avoid premature optimization.

### 4. Measure Impact
After changes:
```bash
# Compare test timing
pytest --durations=10 > optimized-tests.txt
diff baseline-tests.txt optimized-tests.txt

# Compare sync performance
python main.py --dry-run > optimized-sync.txt 2>&1
diff baseline-sync.txt optimized-sync.txt
```

### 5. Validate Correctness
Ensure tests still pass:
```bash
pytest -n auto # Parallel test execution
```

## Success Criteria

### For This Optimization (API Call Tracking)
- ✅ API call counter added to global stats
- ✅ Summary output includes "API calls made"
- ✅ No regression in existing tests
- ✅ Non-intrusive implementation (minimal code changes)

### General Performance PR Requirements
- Clear before/after measurements
- No test regressions
- Documentation of trade-offs
- Reproducible test instructions

## Graceful Degradation

Performance optimizations must fail safely:
- Corrupted cache should trigger clean cold start, not crash
- Failed API calls should retry with exponential backoff
- Performance tracking failures should not block sync
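The first point can be sketched as a cache loader that treats any read or parse failure as a cold start. The function name and cache layout are assumptions, not ctrld-sync's actual implementation:

```python
import json
from pathlib import Path


def load_disk_cache(path: Path) -> dict:
    """Load the blocklist disk cache; any failure degrades to a clean cold start."""
    try:
        data = json.loads(path.read_text())
        if not isinstance(data, dict):
            raise ValueError("cache root must be a JSON object")
        return data
    except (OSError, ValueError):
        # Missing or corrupted cache: drop it and start cold instead of crashing.
        # json.JSONDecodeError is a ValueError subclass, so it is covered too.
        path.unlink(missing_ok=True)
        return {}
```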

## Future Optimization Targets

Based on maintainer priorities:
1. **Performance regression tests**: Automated benchmarks in CI
2. **Improved retry logic**: Exponential backoff with jitter
3. **Memory efficiency**: Streaming for 100k+ rule sets (low priority)
4. **Rate limit header parsing**: Required before any parallelization increase
41 changes: 36 additions & 5 deletions main.py
@@ -495,6 +495,7 @@ def _api_client() -> httpx.Client:
# to enable fast cold-start syncs via conditional HTTP requests (304 Not Modified)
_disk_cache: Dict[str, Dict[str, Any]] = {} # Loaded from disk on startup
_cache_stats = {"hits": 0, "misses": 0, "validations": 0, "errors": 0}
_api_stats = {"control_d_api_calls": 0, "blocklist_fetches": 0}

Copilot AI Feb 17, 2026

The _api_stats dictionary is accessed concurrently from multiple threads without synchronization. Both _gh_get (called from warm_up_cache with ThreadPoolExecutor) and _api_get/_api_post/_api_delete (called from get_all_existing_rules with ThreadPoolExecutor max_workers=5, and from delete operations with DELETE_WORKERS=3) can increment these counters simultaneously. This creates a race condition where increments can be lost.

Dictionary operations like += are not atomic in Python - they involve read-modify-write cycles that can interleave, causing lost updates. For example, if two threads read the same value 5, both increment to 6, and both write 6, one increment is lost.

Consider using the existing _cache_lock (or creating a dedicated lock) to protect these counter updates, similar to how _cache_stats updates within _cache_lock protection (line 901, 931). Alternatively, use threading.Lock() or atomic counter classes from the threading module.


def get_cache_dir() -> Path:
@@ -820,19 +821,36 @@ def validate_folder_data(data: Dict[str, Any], url: str) -> bool:
return True


# Lock to protect updates to _api_stats in multi-threaded contexts.
# Without this, concurrent increments can lose updates because `+=` is not atomic.
_api_stats_lock = threading.Lock()


def _api_get(client: httpx.Client, url: str) -> httpx.Response:
global _api_stats
with _api_stats_lock:
_api_stats["control_d_api_calls"] += 1
return _retry_request(lambda: client.get(url))


def _api_delete(client: httpx.Client, url: str) -> httpx.Response:
global _api_stats
with _api_stats_lock:
_api_stats["control_d_api_calls"] += 1
return _retry_request(lambda: client.delete(url))


def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response:
global _api_stats
with _api_stats_lock:
_api_stats["control_d_api_calls"] += 1
return _retry_request(lambda: client.post(url, data=data))


def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response:
global _api_stats
with _api_stats_lock:
_api_stats["control_d_api_calls"] += 1
return _retry_request(
lambda: client.post(
url,
@@ -884,14 +902,18 @@ def _gh_get(url: str) -> Dict:

SECURITY: Validates data structure regardless of cache source
"""
global _cache_stats
global _cache_stats, _api_stats

# First check: Quick check without holding lock for long
with _cache_lock:
if url in _cache:
_cache_stats["hits"] += 1
return _cache[url]

# Track that we're about to make a blocklist fetch
with _cache_lock:
_api_stats["blocklist_fetches"] += 1

# Check disk cache for conditional request headers
headers = {}
cached_entry = _disk_cache.get(url)
@@ -2120,14 +2142,23 @@ def validate_profile_input(value: str) -> bool:
)
print("=" * table_width + "\n")

# Display API statistics
total_api_calls = _api_stats["control_d_api_calls"] + _api_stats["blocklist_fetches"]
if total_api_calls > 0:
print(f"{Colors.BOLD}API Statistics:{Colors.ENDC}")
print(f" • Control D API calls: {_api_stats['control_d_api_calls']:>7,}")
print(f" • Blocklist fetches: {_api_stats['blocklist_fetches']:>7,}")
print(f" • Total API requests: {total_api_calls:>7,}")
print()

# Display cache statistics if any cache activity occurred
if _cache_stats["hits"] + _cache_stats["misses"] + _cache_stats["validations"] > 0:
print(f"{Colors.BOLD}Cache Statistics:{Colors.ENDC}")
print(f" • Hits (in-memory): {_cache_stats['hits']:>6,}")
print(f" • Misses (downloaded): {_cache_stats['misses']:>6,}")
print(f" • Validations (304): {_cache_stats['validations']:>6,}")
print(f" • Hits (in-memory): {_cache_stats['hits']:>7,}")
print(f" • Misses (downloaded): {_cache_stats['misses']:>7,}")
print(f" • Validations (304): {_cache_stats['validations']:>7,}")
if _cache_stats["errors"] > 0:
print(f" • Errors (non-fatal): {_cache_stats['errors']:>6,}")
print(f" • Errors (non-fatal): {_cache_stats['errors']:>7,}")

# Calculate cache effectiveness
total_requests = _cache_stats["hits"] + _cache_stats["misses"] + _cache_stats["validations"]
125 changes: 125 additions & 0 deletions tests/test_api_tracking.py
@@ -0,0 +1,125 @@
"""Test API call tracking functionality"""
import unittest
from unittest.mock import patch, MagicMock
import sys
sys.path.insert(0, '.')


class TestAPITracking(unittest.TestCase):
"""Tests for API call tracking and statistics"""

def setUp(self):
"""Save original counter values before each test."""
import main
self.original_control_d_calls = main._api_stats["control_d_api_calls"]
self.original_blocklist_fetches = main._api_stats["blocklist_fetches"]

def tearDown(self):
"""Restore original counter values after each test."""
import main
main._api_stats["control_d_api_calls"] = self.original_control_d_calls
main._api_stats["blocklist_fetches"] = self.original_blocklist_fetches

def test_api_stats_initialized(self):
"""Test that _api_stats is properly initialized"""
import main
# Should have both counters
self.assertIn("control_d_api_calls", main._api_stats)
self.assertIn("blocklist_fetches", main._api_stats)
# Should start at 0
self.assertIsInstance(main._api_stats["control_d_api_calls"], int)
self.assertIsInstance(main._api_stats["blocklist_fetches"], int)

@patch('main.httpx.Client')
def test_api_get_increments_counter(self, mock_client_class):
"""Test that _api_get increments the API call counter"""
import main

# Record initial value
initial_count = main._api_stats["control_d_api_calls"]

# Mock the client and response
mock_client = MagicMock()
mock_response = MagicMock()
mock_response.raise_for_status = MagicMock()
mock_client.get.return_value = mock_response

# Call _api_get
main._api_get(mock_client, "http://test.url")

# Verify counter was incremented by 1
self.assertEqual(main._api_stats["control_d_api_calls"], initial_count + 1)

@patch('main.httpx.Client')
def test_api_post_increments_counter(self, mock_client_class):
"""Test that _api_post increments the API call counter"""
import main

# Record initial value
initial_count = main._api_stats["control_d_api_calls"]

# Mock the client and response
mock_client = MagicMock()
mock_response = MagicMock()
mock_response.raise_for_status = MagicMock()
mock_client.post.return_value = mock_response

# Call _api_post
main._api_post(mock_client, "http://test.url", {"key": "value"})

# Verify counter was incremented by 1
self.assertEqual(main._api_stats["control_d_api_calls"], initial_count + 1)

@patch('main.httpx.Client')
def test_api_delete_increments_counter(self, mock_client_class):
"""Test that _api_delete increments the API call counter"""
import main

# Record initial value
initial_count = main._api_stats["control_d_api_calls"]

# Mock the client and response
mock_client = MagicMock()
mock_response = MagicMock()
mock_response.raise_for_status = MagicMock()
mock_client.delete.return_value = mock_response

# Call _api_delete
main._api_delete(mock_client, "http://test.url")

# Verify counter was incremented by 1
self.assertEqual(main._api_stats["control_d_api_calls"], initial_count + 1)

@patch('main._gh')
def test_gh_get_increments_blocklist_counter(self, mock_gh_client):
"""Test that _gh_get increments the blocklist fetch counter"""
import main

# Record initial value
initial_count = main._api_stats["blocklist_fetches"]

# Clear cache to ensure we make a fresh request
main._cache.clear()

        # Mock the streaming response
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.headers = {}
mock_response.iter_bytes.return_value = [b'{"test": "data"}']
mock_response.__enter__ = MagicMock(return_value=mock_response)
mock_response.__exit__ = MagicMock(return_value=False)

mock_gh_client.stream.return_value = mock_response

# Call _gh_get
try:
result = main._gh_get("http://test.blocklist.url")
except Exception:
pass # May fail on validation, we just care about the counter
# Verify blocklist counter was incremented by at least 1
self.assertGreaterEqual(main._api_stats["blocklist_fetches"], initial_count + 1)


if __name__ == '__main__':
unittest.main()