
Systems Architecture Principles

Principles from "Designing Data-Intensive Applications" and the System Design Primer, applied to daily engineering decisions — especially for frontend engineers who want to think beyond the UI layer.

When designing or reviewing data flows

  • Identify the source of truth for every piece of data. If you can't point to it, the system has a design flaw.
  • Optimistic updates are eventual consistency — always implement rollback for mutations that can fail.
  • Prefer idempotent operations. Trade submissions, payment requests, and state mutations should be safe to retry.
  • When the frontend makes 3+ API calls to render a single view, advocate for a BFF (Backend for Frontend) endpoint that aggregates the data server-side.

When working with real-time data (WebSockets, SSE, order books)

  • WebSocket feeds are change data capture (CDC) — the backend streams state deltas. Understand whether you're receiving snapshots or incremental updates.
  • Backpressure matters: if the server pushes events faster than the UI can render, coalesce updates and render at most once per animation frame.
  • Stale data is a feature, not a bug, in eventually consistent systems. Know the staleness budget for each data type (order book: milliseconds; account balance: seconds; historical trades: minutes).
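The backpressure point above can be sketched as a coalescer that keeps only the latest update and flushes at most once per scheduled tick. The scheduler is injected so it can be `requestAnimationFrame` in the browser; all names here are illustrative.

```typescript
// Coalesce a fast feed down to one render per frame (sketch).
type Scheduler = (flush: () => void) => void;

function makeFrameCoalescer<T>(
  render: (latest: T) => void,
  schedule: Scheduler,
): (update: T) => void {
  let pending: T | undefined;
  let scheduled = false;
  return (update: T) => {
    pending = update; // later updates overwrite earlier ones
    if (!scheduled) {
      scheduled = true; // only one flush in flight per tick
      schedule(() => {
        scheduled = false;
        render(pending as T);
      });
    }
  };
}
```

In the browser this would be wired up as `makeFrameCoalescer(draw, cb => requestAnimationFrame(() => cb()))`, dropping intermediate order-book frames the user would never see anyway.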

When implementing caching (React Query, SWR, or otherwise)

  • Every cache entry needs an explicit invalidation strategy. "It'll just refetch" is not a strategy.
  • Stale-while-revalidate: serve stale data immediately, fetch fresh in background — React Query's default. Good for read-heavy, staleness-tolerant data.
  • Cache-aside (lazy loading): check cache → miss → fetch from source → populate cache. Most common pattern.
  • Write-through: update cache AND source simultaneously. Use when consistency matters more than write latency.
  • When debugging stale data, trace through ALL cache layers: browser, CDN, API gateway, application cache, database query cache.
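The cache-aside pattern above fits in a few lines. A sketch, using a plain `Map` as the cache and illustrative names:

```typescript
// Cache-aside (lazy loading): check cache → miss → fetch → populate (sketch).
async function cacheAside<K, V>(
  cache: Map<K, V>,
  key: K,
  fetchFromSource: (key: K) => Promise<V>,
): Promise<V> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;        // hit: skip the source entirely
  const value = await fetchFromSource(key); // miss: go to the source of truth
  cache.set(key, value);                    // populate for the next reader
  return value;
}
```

Note the pattern's blind spot: nothing here invalidates the entry, which is exactly why every cache entry needs an explicit invalidation strategy.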

When making API design decisions

  • REST for CRUD resources, RPC-style for complex actions (executeTrade, settleMarket), WebSocket for real-time streams.
  • Batch endpoints eliminate N+1 request patterns — fetch related resources in one call instead of N sequential calls.
  • Pagination is not optional for list endpoints. Cursor-based > offset-based for large, changing datasets.
  • API versioning strategy should be decided upfront, not bolted on.
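Consuming a cursor-paginated endpoint from the client can be sketched as following an opaque `nextCursor` until the server stops returning one. The field names (`items`, `nextCursor`) are assumptions, not a standard.

```typescript
// Drain a cursor-paginated endpoint (sketch).
interface Page<T> { items: T[]; nextCursor?: string }

async function fetchAll<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.nextCursor; // opaque cursor: stable under inserts/deletes,
  } while (cursor !== undefined); // unlike an offset, which can skip or repeat rows
  return all;
}
```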

When reasoning about consistency and availability

  • CAP theorem: in a network partition, you choose consistency (reject requests) or availability (serve potentially stale data). Know which your system chooses.
  • Read-after-write consistency: after a user submits a trade, they should immediately see it in their order history. If the API is eventually consistent, the frontend must fake this with local state.
  • Monotonic reads: a user should never see data go "backwards." If they saw their balance as $100, the next read shouldn't show $95 from a stale replica.
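Faking read-after-write consistency on the client can be sketched as merging locally pending trades into the (possibly stale) server list until the server echoes them back. Matching on a shared `id` is an illustrative assumption.

```typescript
// Merge locally submitted trades into an eventually consistent server list (sketch).
interface Trade { id: string; symbol: string }

function mergeLocalTrades(serverTrades: Trade[], localPending: Trade[]): Trade[] {
  const seen = new Set(serverTrades.map(t => t.id));
  // Keep only the pending trades the server list hasn't caught up to yet.
  const stillPending = localPending.filter(t => !seen.has(t.id));
  // Pending trades first: the user just submitted them and expects to see them.
  return [...stillPending, ...serverTrades];
}
```

Once a trade appears in the server response, the local copy drops out automatically, so the view converges without ever going "backwards".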

When discussing system design with backend engineers

  • Ask: "What happens when this service goes down?" (failure modes)
  • Ask: "What's the consistency model of this endpoint?" (strong vs eventual)
  • Ask: "Where is this data partitioned and how?" (scaling strategy)
  • Ask: "What's the write path vs the read path?" (CQRS, read replicas)
  • Ask: "Is this operation idempotent?" (retry safety)
  • Trace the full request lifecycle: frontend → load balancer → API gateway → service → database → response. Know every hop.

When reasoning about database design (even from the frontend)

  • Normalization reduces redundancy but requires joins (slower reads). Denormalization duplicates data but enables fast reads. Trading systems often denormalize for read performance.
  • Indexes make reads fast but slow down writes. If a read-heavy endpoint is slow, suggest adding an index.
  • Transactions ensure atomicity — multiple related writes either all succeed or all fail. When your frontend needs multiple related API calls, ask if there's a single transactional endpoint.

When thinking about scalability

  • Vertical scaling (bigger machine) has a ceiling. Horizontal scaling (more machines) requires stateless services.
  • If your API behaves inconsistently between requests, check if a load balancer is routing to different servers with different state.
  • Read-heavy systems scale with read replicas. Write-heavy systems scale with partitioning/sharding. Know which your system is.

When writing infrastructure code (interceptors, auth, middleware, providers)

Infrastructure code operates below the feature layer — every request, every user flows through it. Think adversarially:

  • What if this runs twice? Interceptors, retries, event handlers can fire multiple times. A retry that replays a trade submission creates a duplicate order.
  • What if two of these race? Two refresh calls, two reconnects, two state resets. Use deduplication (singleton promises, coordinators, cooldowns).
  • What if the first call succeeded but the response was lost? A 401 after a successful mutation means the mutation happened — retrying creates duplicates. Only auto-retry idempotent operations (GET).
  • What HTTP methods flow through this? An interceptor that treats GETs and POSTs the same is a bug waiting to happen.
  • One coordinator per operation — if multiple triggers (timer, visibility, error) all do the same thing, route through a single module with shared deduplication. Never have two independent code paths (React Query mutation + raw fetch) hitting the same endpoint.
  • Simple scheduling over clever scheduling — a fixed interval beats computed-expiry-with-timeout-fallback chains. Fewer moving parts = fewer edge cases.
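The singleton-promise deduplication above can be sketched as a single-flight wrapper: every concurrent caller shares the one in-flight promise instead of firing its own request. Names are illustrative.

```typescript
// Single-flight: two concurrent refresh calls become one request (sketch).
function makeSingleFlight<T>(run: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (!inFlight) {
      inFlight = run().finally(() => {
        inFlight = null; // allow a fresh call once this one settles
      });
    }
    return inFlight; // concurrent callers get the same promise
  };
}
```

Wrapping a token refresh this way means a timer, a visibility handler, and a 401 interceptor can all "trigger a refresh" yet only one refresh ever runs.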

When handling failures

  • Everything fails. Network requests, servers, databases, third-party APIs. Design flows that degrade gracefully.
  • Circuit breaker pattern: after N failures, stop trying for a cool-down period. Don't hammer a failing service.
  • Retry with exponential backoff + jitter. Never retry immediately, never retry forever.
  • Timeouts are mandatory for every external call. No timeout = potential infinite hang.
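Retry with exponential backoff plus full jitter can be sketched as follows; the base delay, cap, and attempt count are illustrative assumptions to tune per service.

```typescript
// Exponential backoff with full jitter, bounded attempts (sketch).
const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

function backoffDelay(attempt: number, baseMs = 100, capMs = 5_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt); // exponential, capped
  return Math.random() * exp; // full jitter: spread retries so clients don't stampede
}

async function retryWithBackoff<T>(
  op: () => Promise<T>,
  maxAttempts = 4,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError; // never retry forever
}
```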

Key numbers to internalize

  • L1 cache reference: 0.5 ns
  • Main memory reference: 100 ns
  • SSD random read: 150 µs
  • Round trip within same datacenter: 500 µs
  • Disk seek: 10 ms
  • Read 1 MB sequentially from network: 10 ms
  • Round trip CA to Netherlands: 150 ms
  • A single server can handle ~10k-100k concurrent WebSocket connections