Skip to content

Latest commit

 

History

History
292 lines (271 loc) · 8.37 KB

File metadata and controls

292 lines (271 loc) · 8.37 KB

Systems Design

The design making process for a complex application

Sample Interview Questions

  • how would you build/design X?
  • choose an application and walk through its components
  • why do you think X framework was chosen over Y framework in this application?
  • suppose we build system S, how would we handle X, Y, Z
  • we want to design a service to perform X

Things to consider

  • what are the use cases and who will use it?
    • casual users or big corporate clients?
  • data storage
    • db style, data types, how long to store
  • frontend
  • collecting metrics
  • exposing logs

Cool Terms

  • job queue
  • cache data layer
  • load balancer
  • microservices

White papers, Design Docs

  • read them

Computer Networking

TCP/IP Model

  • standard internet communication protocol

Layers

Link (a link)

  • move data packets between Internet layer of 2 different hosts on same link
  • usually physical connection

Internet (the internet)

  • exchange data through routing/IP addressing
  • IP, ICMP, IGMP

Transport (delivery)

  • TCP, UDP

Application

  • HTTP, FTP, SMTP, …

TCP (Transport)

Reliable

Session-based

  • establish session through 3-way handshake
    • 3 messages
  • all packets guaranteed to reach destination in correct order without corruption
  • packet control to control data transfer rate

Workflow

  1. A initiates with B a. they exchange packets in a pattern like a cool handshake b. they begin TCP
  2. A sends packets to B with a sequence number and checksum for correct order + data integrity
  3. B acknowledges receipt of packets a. If B does not acknowledge a packet, A resends

UDP (Transport)

Speed

  • no session
  • sends data from A to B, no verification
    • hope for the best
  • best for streaming

HTTP (Application)

  • send HTML
  • client-response model

API Design

RPC

  • call a function on a remote server
    • just like a regular function call
  • action based APIs/execute processes remotely
  • tight coupling
    • needs detailed documentation for usage

REST

  • resource based API
  • stateless
  • native caching
  • idempotent
    • multiple identical requests is same as single request
  • loose coupling

Metrics

Latency

Distance

CDN (Content Delivery Network)

  • have many distributed CDN servers which store content closer to end users

Transport medium

  • physical wires
    • optic fibers

Storage

  • caching
  • compress files

Error Rates

RUM (Real User Monitoring)

  • observe how users use app

CPU Usage

  • server speed
    • high traffic will slow down website
  • optimize and clean extraneous data

Memory Usage

  • allocate more memory to VMs that need it

Monolith vs. Microservices

Low Coupling, High Cohesion

Coupling

  • dependencies between different modules of an application
    • how much do they rely on one another

Cohesion

  • how closely related are elements of a single module

Monolith

  • where the entire system has to be deployed at once
    • usually 1 server, 1 db, 1 file system

Pros

  • easy to scale
  • simple to develop/test
  • easy to deploy

Cons

  • low maintainability
    • high coupling
    • changes in one place will affect another place
  • one error means entire system is down
  • low design flexibility
    • stuck in 1 language

Microservices

  • distributed, decentralized
  • each function is an independent application with own server/db
    • communicate through API calls
  • API gateway redirects traffic to respective API
    • end user does not notice that product is based on microservices

Pros

  • independent deployment
  • flexibility
    • each service is developed separately
  • loose coupling
    • scale up individual services which require it
  • fault-tolerant
    • error is isolated to its service
  • parallel programming

Cons

  • more time and money to develop
  • latency introduced through API calls

Containerization

  • OS is shared by different containers

Docker

  • encapsulates microservices with dependencies as image
  • deploy images

Caching Strategies

Layers

  • logical separation (organization) of code
  • ex. UI layer, logic layer, DAL

Tiers

  • physical separation (deployment)
  • n-tier architecture
    • each tier hosts a layer
  • can host multiple layers in a single tier ex. combine logic layer and DAL as server tier

3-layer architectures

Presentation

  • UI

Application

  • logic, controllers

Data

  • data storage and handling

Caching

store data in kvp in fast volatile memory

HIT

  • data in cache, unexpired
    • return cached data

MISS

  • data not in cache
    • receive data from db, update cache, return data

Pros

  • fast data retrieval
  • reduce query load/times to db
  • can temporarily substitute for db if db unavailable

Cons

  • increased maintenance required
    • data security
  • must keep data fresh

Strategies

Aside

  • default
  • if cache has data, return to data else call db and update cache
  • no guarantee of data consistency between db and cache
  • first request is always a miss, must spend time to write

Read Through

  • similar to above, but the application always requests from cache
    • cache is DAL and handles all data operations
  • cache grabs data from DB and updates itself
  • the application only interacts with the cache and not the db

Write Through

  • application writes data to cache instead of db
  • cache then immediately writes to db
  • guaranteed data consistency
  • increased write latency

Write Back

  • application writes data to cache instead of db
  • cache periodically writes to db
  • bear db downtimes
  • lesser load on db
  • increased write latency
  • cache failure results in complete data loss

Load Balancing

  • manages and distributes network load across multiple servers
  • optimizes speed by ensuring that no server is overloaded
  • redirects requests to other servers
  • adds and removes servers on demand

User sessions

  • usually stored on client side and shared with server
  • under a distributed system with multiple servers and respective DBs, we want to reduce cache misses on user data
  • load balancers will redirect requests from a user session to a single server

Multiple Balancers

Active-Passive

  • use 1 active balancer until issue
    • swap over to the passive balancer

Active-Active

  • traffic is distributed over many servers using load balancing algorithm

Security

  • if you get ddosed your load balancer can offset it to your favourite cloud provider

Types

Hardware

  • Cisco, TP-Link, etc. machines

Software

  • Nginx, AWS Elastic load balancing

Algorithms

  • random
  • sequential round robin
  • IP hashing
  • other hashing (ex. request URL)
  • least connections
  • fastest response time/fewest active connections

MapReduce

  • perform distributed/parallel processing on large data sets
  • split big data into chunks and process them in parallel on multiple servers
  • it then merges and returns the processed data
  • input is in form of kvp

Map tasks

  • distribute data throughout the cluster of nodes, and each node will perform its computation

Reduce tasks

  • takes result of map phase and reduces it down into single value

Ex. Word Count

Count number of occurences of each word in text

  • distribute words into nodes
  • Map s.t. key-value is (word, 1)
  • sorting/grouping occurs where data of the same key (same words) are processed in the same node
  • reduce returns a single value for each of the keys
  • in this case, reduce will sum the key values (words) and return the word count

Event Driven Architecture (EDA)

  • connect distributed systems in real-time using events

Publisher/Subscriber Pattern

  • similar to observer, but the subject has no idea of the observers
  • observer -> subscriber
  • subject -> publisher
  • broadcast
    • publishers push to subscribers
  • polling
    • subscribers pull from publishers

Cloud Computing

As A Service

IaaS (Infrastructure)

  • rent machines

PaaS (Platform)

  • providees a service to develop/test/maintain software

SaaS (Software)

  • on-demand access of popular software

Proxy Servers

  • redirect user browsing activity

Forward Proxy Server

  • client sends request to proxy
  • proxy forwards request to target server
  • proxy returns obtained content
  • hide IP address

Reverse Proxy Server

  • sits in front of servers and routes incoming requests to appropriate web servers
  • protects servers on backend
    • extra layer of security between network requests and server
  • performs grunt work such as auth, caching, encryption/decryption for backend
    • performance improvement
  • can also act as load balancer (nginx)
  • gateway to backend web servers