292 lines (271 loc) · 8.37 KB

Systems Design

The design making process for a complex application

Sample Interview Questions

how would you build/design X?
choose an application and walk through its components
why do you think X framework was chosen over Y framework in this application?
suppose we build system S, how would we handle X, Y, Z
we want to design a service to perform X

Things to consider

what are the use cases and who will use it?
- casual users or big corporate clients?
data storage
- db style, data types, how long to store
frontend
collecting metrics
exposing logs

Cool Terms

job queue
cache data layer
load balancer
microservices

White papers, Design Docs

read them

Computer Networking

TCP/IP Model

standard internet communication protocol

Layers

Link (a link)

move data packets between Internet layer of 2 different hosts on same link
usually physical connection

Internet (the internet)

exchange data through routing/IP addressing
IP, ICMP, IGMP

Transport (delivery)

TCP, UDP

Application

HTTP, FTP, SMTP, …

TCP (Transport)

Reliable

Session-based

establish session through 3-way handshake
- 3 messages
all packets guaranteed to reach destination in correct order without corruption
packet control to control data transfer rate

Workflow

A initiates with B a. they exchange packets in a pattern like a cool handshake b. they begin TCP
A sends packets to B with a sequence number and checksum for correct order + data integrity
B acknowledges receipt of packets a. If B does not acknowledge a packet, A resends

UDP (Transport)

Speed

no session
sends data from A to B, no verification
- hope for the best
best for streaming

HTTP (Application)

send HTML
client-response model

API Design

RPC

call a function on a remote server
- just like a regular function call
action based APIs/execute processes remotely
tight coupling
- needs detailed documentation for usage

REST

resource based API
stateless
native caching
idempotent
- multiple identical requests is same as single request
loose coupling

Metrics

Latency

Distance

CDN (Content Delivery Network)

have many distributed CDN servers which store content closer to end users

Transport medium

physical wires
- optic fibers

Storage

caching
compress files

Error Rates

RUM (Real User Monitoring)

observe how users use app

CPU Usage

server speed
- high traffic will slow down website
optimize and clean extraneous data

Memory Usage

allocate more memory to VMs that need it

Monolith vs. Microservices

Low Coupling, High Cohesion

Coupling

dependencies between different modules of an application
- how much do they rely on one another

Cohesion

how closely related are elements of a single module

Monolith

where the entire system has to be deployed at once
- usually 1 server, 1 db, 1 file system

Pros

easy to scale
simple to develop/test
easy to deploy

Cons

low maintainability
- high coupling
- changes in one place will affect another place
one error means entire system is down
low design flexibility
- stuck in 1 language

Microservices

distributed, decentralized
each function is an independent application with own server/db
- communicate through API calls
API gateway redirects traffic to respective API
- end user does not notice that product is based on microservices

Pros

independent deployment
flexibility
- each service is developed separately
loose coupling
- scale up individual services which require it
fault-tolerant
- error is isolated to its service
parallel programming

Cons

more time and money to develop
latency introduced through API calls

Containerization

OS is shared by different containers

Docker

encapsulates microservices with dependencies as image
deploy images

Caching Strategies

Layers

logical separation (organization) of code
ex. UI layer, logic layer, DAL

Tiers

physical separation (deployment)
n-tier architecture
- each tier hosts a layer
can host multiple layers in a single tier ex. combine logic layer and DAL as server tier

3-layer architectures

Presentation

UI

Application

logic, controllers

Data

data storage and handling

Caching

store data in kvp in fast volatile memory

HIT

data in cache, unexpired
- return cached data

MISS

data not in cache
- receive data from db, update cache, return data

Pros

fast data retrieval
reduce query load/times to db
can temporarily substitute for db if db unavailable

Cons

increased maintenance required
- data security
must keep data fresh

Strategies

Aside

default
if cache has data, return to data else call db and update cache
no guarantee of data consistency between db and cache
first request is always a miss, must spend time to write

Read Through

similar to above, but the application always requests from cache
- cache is DAL and handles all data operations
cache grabs data from DB and updates itself
the application only interacts with the cache and not the db

Write Through

application writes data to cache instead of db
cache then immediately writes to db
guaranteed data consistency
increased write latency

Write Back

application writes data to cache instead of db
cache periodically writes to db
bear db downtimes
lesser load on db
increased write latency
cache failure results in complete data loss

Load Balancing

manages and distributes network load across multiple servers
optimizes speed by ensuring that no server is overloaded
redirects requests to other servers
adds and removes servers on demand

User sessions

usually stored on client side and shared with server
under a distributed system with multiple servers and respective DBs, we want to reduce cache misses on user data
load balancers will redirect requests from a user session to a single server

Multiple Balancers

Active-Passive

use 1 active balancer until issue
- swap over to the passive balancer

Active-Active

traffic is distributed over many servers using load balancing algorithm

Security

if you get ddosed your load balancer can offset it to your favourite cloud provider

Types

Hardware

Cisco, TP-Link, etc. machines

Software

Nginx, AWS Elastic load balancing

Algorithms

random
sequential round robin
IP hashing
other hashing (ex. request URL)
least connections
fastest response time/fewest active connections

MapReduce

perform distributed/parallel processing on large data sets
split big data into chunks and process them in parallel on multiple servers
it then merges and returns the processed data
input is in form of kvp

Map tasks

distribute data throughout the cluster of nodes, and each node will perform its computation

Reduce tasks

takes result of map phase and reduces it down into single value

Ex. Word Count

Count number of occurences of each word in text

distribute words into nodes
Map s.t. key-value is (word, 1)
sorting/grouping occurs where data of the same key (same words) are processed in the same node
reduce returns a single value for each of the keys
in this case, reduce will sum the key values (words) and return the word count

Event Driven Architecture (EDA)

connect distributed systems in real-time using events

Publisher/Subscriber Pattern

similar to observer, but the subject has no idea of the observers
observer -> subscriber
subject -> publisher
broadcast
- publishers push to subscribers
polling
- subscribers pull from publishers

Cloud Computing

As A Service

IaaS (Infrastructure)

rent machines

PaaS (Platform)

providees a service to develop/test/maintain software

SaaS (Software)

on-demand access of popular software

Proxy Servers

redirect user browsing activity

Forward Proxy Server

client sends request to proxy
proxy forwards request to target server
proxy returns obtained content
hide IP address

Reverse Proxy Server

sits in front of servers and routes incoming requests to appropriate web servers
protects servers on backend
- extra layer of security between network requests and server
performs grunt work such as auth, caching, encryption/decryption for backend
- performance improvement
can also act as load balancer (nginx)
gateway to backend web servers