Principles from "Designing Data-Intensive Applications" and the System Design Primer, applied to daily engineering decisions — especially for frontend engineers who want to think beyond the UI layer.
- Identify the source of truth for every piece of data. If you can't point to it, the system has a design flaw.
- Optimistic updates are eventual consistency on the client — the UI shows state the server hasn't confirmed yet, so always implement rollback for mutations that can fail.
- Prefer idempotent operations. Trade submissions, payment requests, and state mutations should be safe to retry.
- When the frontend makes 3+ API calls to render a single view, advocate for a BFF (Backend for Frontend) endpoint that aggregates the data server-side.
- WebSocket feeds are change data capture (CDC) — the backend streams state deltas. Understand whether you're receiving snapshots or incremental updates.
- Back pressure matters: if the server pushes events faster than the UI can render, throttle/debounce updates to the next animation frame.
- Stale data is a feature, not a bug, in eventually consistent systems. Know the staleness budget for each data type (order book: milliseconds; account balance: seconds; historical trades: minutes).
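An optimistic update with rollback can be sketched without any framework. Here `api.cancelOrder` and the in-memory `Map` store are illustrative stand-ins; the pattern is the point: snapshot the previous state, apply the optimistic state, and restore the snapshot if the mutation fails.

```typescript
type Order = { id: string; status: "open" | "cancelled" };

async function cancelOrderOptimistically(
  store: Map<string, Order>,
  api: { cancelOrder: (id: string) => Promise<void> },
  id: string,
): Promise<boolean> {
  const previous = store.get(id);
  if (!previous) return false;

  // Apply the optimistic state immediately so the UI updates.
  store.set(id, { ...previous, status: "cancelled" });
  try {
    await api.cancelOrder(id);
    return true;
  } catch {
    // The mutation failed: roll back to the snapshot we captured.
    store.set(id, previous);
    return false;
  }
}
```

Libraries like React Query offer the same snapshot/rollback hooks (`onMutate`/`onError`), but the underlying mechanics are exactly this.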
- Every cache entry needs an explicit invalidation strategy. "It'll just refetch" is not a strategy.
- Stale-while-revalidate: serve stale data immediately, fetch fresh in background — React Query's default. Good for read-heavy, staleness-tolerant data.
- Cache-aside (lazy loading): check cache → miss → fetch from source → populate cache. Most common pattern.
- Write-through: update cache AND source simultaneously. Use when consistency matters more than write latency.
- When debugging stale data, trace through ALL cache layers: browser, CDN, API gateway, application cache, database query cache.
- REST for CRUD resources, RPC-style for complex actions (executeTrade, settleMarket), WebSocket for real-time streams.
- Batch endpoints eliminate N+1 request patterns — fetch related resources in one call instead of N sequential calls.
- Pagination is not optional for list endpoints. Cursor-based > offset-based for large, changing datasets.
- API versioning strategy should be decided upfront, not bolted on.
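A sketch of the client side of cursor-based pagination. The `nextCursor` field is an assumed API shape; the point is that an opaque cursor, not a numeric offset, drives the next request, so rows inserted mid-pagination don't shift the window.

```typescript
type Page<T> = { items: T[]; nextCursor: string | null };

async function fetchAll<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    // Each page returns the cursor for the next one; null means done.
    const page = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor !== null);
  return all;
}
```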
- CAP theorem: in a network partition, you choose consistency (reject requests) or availability (serve potentially stale data). Know which your system chooses.
- Read-after-write consistency: after a user submits a trade, they should immediately see it in their order history. If the API is eventually consistent, the frontend must fake this with local state.
- Monotonic reads: a user should never see data go "backwards." If they saw their balance as $100, the next read shouldn't show $95 from a stale replica.
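One way the frontend can fake read-after-write consistency over an eventually consistent list endpoint: keep locally submitted rows in a pending set and merge them into each server response until the server echoes them back. The names here are illustrative.

```typescript
type OrderRow = { id: string };

function mergePending(
  serverRows: OrderRow[],
  pending: Map<string, OrderRow>,
): OrderRow[] {
  // Drop pending entries the server now knows about...
  for (const row of serverRows) pending.delete(row.id);
  // ...and prepend the ones it hasn't caught up to yet.
  return [...pending.values(), ...serverRows];
}
```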
- Ask: "What happens when this service goes down?" (failure modes)
- Ask: "What's the consistency model of this endpoint?" (strong vs eventual)
- Ask: "Where is this data partitioned and how?" (scaling strategy)
- Ask: "What's the write path vs the read path?" (CQRS, read replicas)
- Ask: "Is this operation idempotent?" (retry safety)
- Trace the full request lifecycle: frontend → load balancer → API gateway → service → database → response. Know every hop.
- Normalization reduces redundancy but requires joins (slower reads). Denormalization duplicates data but enables fast reads. Trading systems often denormalize for read performance.
- Indexes make reads fast but slow down writes. If a read-heavy endpoint is slow, suggest adding an index.
- Transactions ensure atomicity — multiple related writes either all succeed or all fail. When your frontend needs multiple related API calls, ask if there's a single transactional endpoint.
- Vertical scaling (bigger machine) has a ceiling. Horizontal scaling (more machines) requires stateless services.
- If your API behaves inconsistently between requests, check if a load balancer is routing to different servers with different state.
- Read-heavy systems scale with read replicas. Write-heavy systems scale with partitioning/sharding. Know which your system is.
Infrastructure code operates below the feature layer — every request, every user flows through it. Think adversarially:
- What if this runs twice? Interceptors, retries, event handlers can fire multiple times. A retry that replays a trade submission creates a duplicate order.
- What if two of these race? Two refresh calls, two reconnects, two state resets. Use deduplication (singleton promises, coordinators, cooldowns).
- What if the first call succeeded but the response was lost? A 401 after a successful mutation means the mutation happened — retrying creates duplicates. Only auto-retry idempotent operations (GET).
- What HTTP methods flow through this? An interceptor that treats GETs and POSTs the same is a bug waiting to happen.
- One coordinator per operation — if multiple triggers (timer, visibility, error) all do the same thing, route through a single module with shared deduplication. Never have two independent code paths (React Query mutation + raw fetch) hitting the same endpoint.
- Simple scheduling over clever scheduling — a fixed interval beats computed-expiry-with-timeout-fallback chains. Fewer moving parts = fewer edge cases.
- Everything fails. Network requests, servers, databases, third-party APIs. Design flows that degrade gracefully.
- Circuit breaker pattern: after N failures, stop trying for a cool-down period. Don't hammer a failing service.
- Retry with exponential backoff + jitter. Never retry immediately, never retry forever.
- Timeouts are mandatory for every external call. No timeout = potential infinite hang.
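The retry, jitter, and timeout rules combine into one helper. This is a sketch with illustrative constants (cap, base delay, attempt count); the full-jitter strategy draws a random delay from `[0, min(cap, base * 2^attempt)]`, and the per-attempt `AbortSignal` enforces the mandatory timeout. Only wrap idempotent operations in this.

```typescript
async function retryWithBackoff<T>(
  op: (signal: AbortSignal) => Promise<T>,
  { attempts = 4, baseMs = 200, capMs = 5_000, timeoutMs = 10_000 } = {},
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs); // mandatory timeout
    try {
      return await op(controller.signal);
    } catch (err) {
      if (i === attempts - 1) throw err; // never retry forever
      // Full jitter: random delay in [0, min(cap, base * 2^i)].
      const delay = Math.random() * Math.min(capMs, baseMs * 2 ** i);
      await new Promise((resolve) => setTimeout(resolve, delay));
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("unreachable");
}
```

A circuit breaker would sit one layer above this: count consecutive `retryWithBackoff` failures and short-circuit callers during the cool-down instead of attempting at all.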
- L1 cache reference: 0.5 ns
- Main memory reference: 100 ns
- SSD random read: 150 µs
- Round trip within same datacenter: 500 µs
- Disk seek: 10 ms
- Read 1 MB sequentially from network: 10 ms
- Round trip CA to Netherlands: 150 ms
- A single server can handle ~10k-100k concurrent WebSocket connections