Skip to content

Latest commit

 

History

History
960 lines (720 loc) · 40.7 KB

File metadata and controls

960 lines (720 loc) · 40.7 KB

Issue#12 - Implementing WebSocket Server See on GitHub repository

Overview

Real time communication protocols referes to continous exchange of data with minimal latency. Unlike traditional request-response model where data exchange happens on demand, live updates happens w/o requirement of refreshing UI.

The WebSocket Protocol is a real-time communication protocol that provides bi-directional data exchange, mostly this behavior called full-dublex communication, over persistent single TCP connection between client and the server.

Connection protocol consists of three phases, opening an upgrade request, server responds with 101 Switching Protocols then _establishing the connection. This preparation also called the handshake process.

    sequenceDiagram
        participant C as Client
        participant S as Server
        C->>S: Upgrade Request w/Upgrade Header
        Note over C,S: GET /chat HTTP/1.1<br/>HOST: example.com<br/>Upgrade: websocket<br/>Connection: Upgrade<br/>Sec-WebSocket-Key: xxx<br/>Sec-WebSocket-Versiion: 13
        S->>C: Handshake
        Note over S,C: 101 Switching Protocols<br/>Upgrade: Websockets<br/>Connection:Upgrade<br/>Sec-WebSocket-Accept: token
        S<<-->>C: Connection Established
Loading

Similar to HTTP and HTTPs, WebSockets have a unique set of prefixes:

  • ws: indicates an unencrypted connection without TLS
  • wss: indicates an encrypted connection secured by TLS.

While introducing less latency on data exchange and minimal overhead, it is mostly used in live-data exchange systems like chatting apps, online gaming and dashboards.

Research and Planning

Exploring Handshake

The handshake process is the bridge from HTTP to WebSockets. In the handshake, details of the connection are negotiated, and either party can back out before completion if the terms are unfavorable.

The server must be careful to understand everything the client asks for in order to maintain secure connection and data exchange.

Warning

PORTS: The server may listen on any port it chooses. However if it choosen any port rather than 80 or 443, it may raise problems with firewalls and proxies.

1. Initiating the handshake process, Client

Every case, a client must start the websocket handshake process by contacting the server and requesting a websocket connection.

The client will send a pretty standard HTTP request with required socket HEADERS, - the HTTP version must be 1.1 or greater - HTTP method must be GET

GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

In addition, the client can define extension and/or subprotocol headers.

Note

Origin Header: All browsers send an Origin header. It can be used for security manners. This is effective against Cross site WebSocket Hijacking(CSWH). Most applications reject requests without this header.

Note

Status Codes: Regular HTTP status codes can be used only before the handshake. Use different set of codes after handshake process succession. See RFC Documentation

Server handshake response

When a server recieves a handshake request, it should respond with indicating that the protocol will be changing from HTTP to WebSocket.

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The Sec-WebSocket-Accept header is important that the server must derive it from the client's request's Sec-WebSocket-Key header. Generation procedure: - Concatenate the client's Sec-WebSocket-Key and the magic string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11. - Hash the concatenated string with SHA-1 - return hashed string in base64Encoded form in Sec-WebSocket-Accept header.

Warning

Magic string is a thing, look for it.

Note

The server can send other headers like Set-Cookie, or ask for authentication or redirects via other status codes, before sending the reply handshake.

Additionally the server can decide on extension/subprotocol requets here. See MDN WebSocket Documentation Miscellaneous part.

Exchanging Data Frames

Using websockets either the client or the server can send a message at any time. This message/data is transmitted using a sequence of frames. A client must mask all frames that it sends to the server whether or not the communication is over TLS. The server MUST NOT mask any frames that it sends to the client. Consequently, in order to communicate securely: - A client must close the connection upon receiving a MASKED frame with a status code 1002 (protocol error)

  • Similarly a server must close the connection upon receiving a NOT masked framed with a status code 1002 (protocol error)

Note

These rules might be relaxed in a future specification, says RFC WS Spec

A dataframe MAY be transmitted by either the client or the server at any time after opening handshake completion and before that endpoint has sent a CLOSE frame. See RFC WS SPEC - Section 5.5.1

The base framing protocol defines a frame type with an opcode, a payload length, and designated locations for "Extension data" and "Application data", which together define the "Payload data". Certain bits and opcodes are reserved for future expansion of the protocol.

Base Framing Protocol

Having to say that, all frames follow the same specific format. Data going from the client to the server is masked using XOR encryption(with a 32-bit key). See RFC WS SPEC - Section 5 for full description.

Data frame from the client to server (message length 0–125):

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |          Masking-key          |
|I|S|S|S|  (4)  |A|     (7)     |             (32)              |
|N|V|V|V|       |S|             |                               |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|    Masking-key (continued)    |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

Data frame from the client to server (16-bit message length):

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16)              |
|N|V|V|V|       |S|   (== 126)  |                               |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|                          Masking-key                          |
+---------------------------------------------------------------+
:                          Payload Data                         :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

Data frame from the server to client (64-bit payload length):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (64)              |
|N|V|V|V|       |S|   (== 127)  |                               |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|               Extended payload length continued               |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |          Masking-key          |
+-------------------------------+-------------------------------+
|    Masking-key (continued)    |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
  • First Byte:

    • FIN: indicates whether that is the final fragment in a message. The first fragment may also be the final fragment. If it is 0 then the server keeps listening for more parts of the message. On the other hand if it is not, server should consider the message delivered to interpret what to do.
    • Bits 1-3, RSV1 & RSV2 & RSV3: Indicating the extensions. They MUST be 0 unless an extension is negoriated that defines meanings for non-zero values. Otherwise, the receiving endpoint MUST fail the websocket connection.
    • Bits 4-7, OPCODE: Defines how to interpret the payload data. If an unknown opcode is received, the receiving endpoint MUST fail the websocket connection.
      • 0x0: denotes frame continuation
      • 0x1: denotes text, which is always encoded in UTF-8
      • 0x2: denotes a binary frame
      • 0x8: denotes a connection close
      • 0x3 to 0x7: are reserved for further non-control frames
      • 0x9: denotes a ping
      • 0xA: denotes a pong
      • 0xB-F are reserved for further control frames. See RFC WS Spec - Control Frames
  • Bit 8, MASK: indicates whether the Payload data is masked or not. Besides client messages must be always masked, if it is set to 1, a masking key should be present in message in order to decode the message on server side.

  • Bits 9-15, Payload Length: Denotes the payload length in bytes 7 bits, 7+16 or 7+64 bits. See [Decoding Payload Length Section][### Decoding Payload Length] for further information

  • Masking-Key: 0 to 4 bytes: All frames sent from the client to the server are masked by a 32-bit value that is contained within the frame. This field is present if the mask bit is set to 1 and is absent of the mask bit iis set to 0. See RFC WS SPEC - Section 5.3 for further details on client-to-server masking.

  • All subsequent bytes are payload .

Decoding Payload Length and Reading Data

The Payload Data is defined as the concatenated data of Extension data and Application Data. The base framing protocol is formally defined by the following ABNF RFC5234.

Important

It is important to note that the representation of this data is binary, not ASCII characters.

Decoding Payload Length

Follow this guideline:

  • Read bits 9-15 inclusively, and interpret that as an unsigned integer . If it is 125 or less, then that's the payload length. If it s 126 go to the step 2. It if it is 127 go to the step 3.
  • If payload length is 126, read the next 16 bits and interpret those as an unsigned integer.
  • If payload length is 127 read the next 64 bits and interpret those as an unsigned integer. The most significant bit must be 0.

Reading and unmasking the data

Considering as the data is masked, read the next 4 octets / 32 bytes (octect is always 8 bits); this is the masking key. Once the payload length and masking key is decoded, read the corresponding payload length of bytes from the socket.

Let's call the data ENCODED, and the mask key as MASK. In order to get DECODED, loop through the octets of ENCODED and XOR the octet with the 4th octet of MASK. See the javascript code as an example

// The function receives the frame as a Uint8Array.
// firstIndexAfterPayloadLength is the index of the first byte
// after the payload length, so it can be 2, 4, or 10.
function getPayloadDecoded(frame, firstIndexAfterPayloadLength) {
  const mask = frame.slice(
    firstIndexAfterPayloadLength,
    firstIndexAfterPayloadLength + 4,
  );
  const encodedPayload = frame.slice(firstIndexAfterPayloadLength + 4);
  // XOR each 4-byte sequence in the payload with the bitmask
  const decodedPayload = encodedPayload.map((byte, i) => byte ^ mask[i % 4]);
  return decodedPayload;
}

const frame = Uint8Array.from([
  // FIN=1, RSV1-3=0, opcode=0x1 (text)
  0b10000001,
  // MASK=1, payload length=5
  0b10000101,
  // 4-byte mask
  1, 2, 3, 4,
  // 5-byte payload
  105, 103, 111, 104, 110,
]);

// Assume you got the number 2 from properly decoding the payload length
const decoded = getPayloadDecoded(frame, 2);

console.log(new TextDecoder().decode(decoded)); // "hello"

Note

Masking is a security measure to avoid malicious parties from predicting the data that is sent to the server. The client will generate a cryptographically random masking key for each message

Message Fragmentation

The primary purpose of fragmentation is to allow sending a message that is of unknown size when the message is started without having to buffer that message.

If messages could not be fragmented, then an server endpoint would have to buffer the entire message so its length could be counted before the first byte is sent. With fragmentation, a server or intermediary may choose a reasonable size buffer and, when the buffer is full, write a fragment to the network. See RFC WS Spec - Section 5.4 Fragmentation for further usecases and rules.

Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!

The FIN and OPCODE fields work together to send a message split up into seperate frames. Fragmentation is only available for opcodes 0x0 to 0x2.

Important

The OPCODE denotes what a frame is meant to do. If it is 0x0 the frame i a continuation frame; this means the server should concatenate the frame's payload to the last frame it received from the client.

Notice the first frame contains an entire message (has FIN=1 and opcode!=0x0), so the server can process or respond as it sees fit. The second frame sent by the client has a text payload (opcode=0x1), but the entire message has not arrived yet (FIN=0). All remaining parts of that message are sent with continuation frames (opcode=0x0), and the final frame of the message is marked by FIN=1.

Control Frames

Control frames are used to communicate state about the websocket. They can be injected in the middle of a fragmented message.

Note

All control frames MUST have a payload length of 125 bytes or less and MUST NOT be fragmented.

Control frames can be declared using OPCODE part. Currently defined OPCODES for control frames includes 0x8 for CLOSE, 0x9 for PING and 0xA for PONG.

CLOSE, Closing the connection

To close a connection either party client or the server can send a control frame with data containing a specifid control sequence to begin the closing handshake. Upon receiving such a frame, the other party sends a CLOSE frame in response. The first party then closes the connection. Any further data recieved after closing is then discarded.

  • Close frames sent from client to server must be masked again.
  • The CLOSE frame MAY contain a body (the Application Data portion of the frame) that indicates a reason for closing.

Ping and Pongs

At any point after the handshake, either the client or the server can choose to send a ping to the other party. When the ping is received, the recipient must send back a pong as soon as possible. This mechanism generally used for health checks.

When server get a PING, it send back a PONG with the exact same PAYLOAD DATA as the ping. Either party might also get a PONG without ever sending a PING, it should be ignored.

Important

For both PING and PONG, the max payload length is 125 bytes.

Implementation Notes

Initiating WebSocket Connection

As mentioned, websocket connection starts with a couple of HTTP requests that constructing handshake process.

First client party initiates the handshake process with a HTTP requests similar to following:

GET /ws-endpoint HTTP/1.1
Host: example.com:PORT
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: some-key
Sec-WebSocket-Version: 13

On server party, handler should examine this request and upgrade the connection to websocket. While this process, server party should compute the Sec-WebSocket-Accept header value through Sec-WebSocket-Key header value which client sent. If server supports websocket connection, server should respond with 101 Switching Protocol:

HTTP/1.1 101 Switching Protocol
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: <computed-accept-header>

How to Generate Accept Key on Server Side ,Sec-WebSocket-Accept

Accept Key generation process involves,

  • Retrieving hashed Sec-WebSocket-Key value from the request headers
  • Concatenating raw Sec-WebSocket-Key with the magic-string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 as Sec-WebSocket-Key+magic-string
  • Hashing the resulting string with SHA-1 chipher.
  • Base64 encoding the hashed string. Resultant base64Encoded string is the Sec-WebSocket-Accept value.

This header ensures and checks whether the issuer client requested websocket connection.

Using TCP Connection

After the handshake procedure is done with 101 Switching Protocol respond, the http.ResponseWriter cannot be used no more. In order to exchange data in real time using websockets, application is required to use raw TCP connection via net.Conn.

Thus in websocket connection handler, http.Hijacker interferface is used to construct TCP connection.

hijacker, k := w.(http.Hijacker)
conn, buffread, err := hijacker.Hijack()

Hijack() gains the control of the HTTP server and returns net.Conn interface instance. We can read data-frames over this conn from now on.

Instrumenting WebSocket Upgrade Handler

Similar to SSE part of this project, a Hub instance will orchestrate Clients and their connections.

Following code demonstrate the accept key generation and response construction private functions.

package ws

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"net/http"
	"strings"
)

type websocket struct {
	host string
	port string
	h    *hub
}

const magicString string = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

func generateAcceptKey(clientKey string) string {
	// Concatenate with magic string
	clientKey = clientKey + magicString
	// Hash the result using SHA-1
	hasher := sha1.New()
	hasher.Write([]byte(clientKey))
	// Encoding and creating Sec-WebSocket-Accept Header
	hashed := base64.StdEncoding.EncodeToString(hasher.Sum(nil))
	return hashed
}

func handshakeResponse(acceptHeader string) []byte {
	lines := []string{
		fmt.Sprintf("HTTP/1.1 %d %s", http.StatusSwitchingProtocols,http.StatusText(http.StatusSwitchingProtocols)),
		fmt.Sprintf("Sec-WebSocket-Accept: %s", acceptHeader),
		"Upgrade: websocket",
		"Connection: Upgrade",
		"",
	}

	return []byte(strings.Join(lines, "\r\n"))
}

After hijacking the connection in order to use raw TCP connection, http.ResponseWriter is able to be used. Instead application should read/write bytes of data over connection buffer. Also RFC Documentation states every line of the respond should include\r\n and an extra \r\n after the respond headers.

func (ws *websocket) Upgrade(w http.ResponseWriter, r *http.Request) {

	hijacker, k := w.(http.Hijacker)
	if !k {
		http.Error(w, "websocket connection is not supported", http.StatusInternalServerError)
		return
	}

	conn, buffRW, err := hijacker.Hijack()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	client := NewClient(conn)

	// Read Sec-WebSocket-Key from request headers
	key := r.Header.Get("Sec-WebSocket-Key")

	// Generate Sec-WebSocket-Accept header value
	acceptHeader := generateAcceptKey(key)
	data := handshakeResponse(acceptHeader)
	buffRW.Write(data)

	buffRW.Flush()

	ws.h.register <- client

}

Upgrade handler, first Hijack() the HTTP connection to retrieve net.Conn interface instance and a bufio.ReadWriter instance. Returns a 500 Internal Server Error whether the http.Hijacker interface is not supported.

After that, in order to manage client connection on Client instance, it creates a new client and completes the handshake process with generating accept-key and using bufio.ReadWriter.Flush(). Finally sending the new created client instance to hub's register channel.

Reading and Decoding Data Frames

Since websocket messages involves essential binary components each containing meaningful data to process. Thus in order to read, decode, encode and write to connection bufio.ReadWriter interface was used. Considering buffRW is a bufio.ReadWriter instance, buffRW.ReadByte() allowes to read buffered message byte by bytes. Similarly to this, using bufio.WriteByte(c byte) allowes to write a byte of message c to buffer. Finally using buff.Flush() will flush buffer in order to forward message to connection. Otherwise connection will can not see the written message.

Reading incoming data frame

Reading incoming data frame consists of several parts. Thus in order to introduce clear instructions, following readDataFramesBytes function will be presented in parts.

func readDataFrameBytes(buff *bufio.ReadWriter) (*dataFrameBytes, error) {
	// reading first byte
	byte0, err := buff.ReadByte()
	if err != nil {
		return nil, err
	}
	opcode := byte0 & 0x0F
	fin := byte0 & 0x80
	rsv1, rsv2, rsv3 := byte0&0x40, byte0&0x20, byte0&0x10
    //... function continues

As mentioned before, using buff.ReadByte() allowes to read the next occuring byte of data from the buffer. First occuring byte of data will provide fin(1 bits), rsv#(1 bits each), opcode(4bits). Since using buff.ReadByte() gathers whole byte of data, in order to extract each component bitwise operations were used. See w3school guide for bitwise operations guide and, rapidtables.com for binary conversions tool and table.

Next byte carrying masked and payload-length information.

byte1, err := buff.ReadByte()
	if err != nil {
		return nil, err
	}
	isMasked := byte1 & 0x80

    //.. payload length calculations

Extracting masked information is very straightforward. However, as mentioned before we might required to read additional bytes of data if payload-length is greater than 125. Thus, following code part initially reads the remaning 7 bits of the second byte then conditionally reads next 2 bytes or 8 bytes. Lastly, payload-length is transformed and stored as uint## using binary.BigEndian.

// payload length calculations
	var payloadLengthUint uint64

	payloadLengthByte := byte1 & 0x7F

	if l := uint64(payloadLengthByte); l <= 125 {

		payloadLengthUint = l
	} else if l == 126 {
		// read 2 bytes more
		var lengthBytes []byte

		// reading next 2 bytes
		for range 2 {
			lengthByte, err := buff.ReadByte()
			if err != nil {
				return nil, err
			}
			lengthBytes = append(lengthBytes, lengthByte)
		}
		payloadLengthUint = uint64(binary.BigEndian.Uint16(lengthBytes))

	} else if l == 127 {
		var lengthBytes []byte
		for range 8 {
			lengthByte, err := buff.ReadByte()
			if err != nil {
				return nil, err
			}
			lengthBytes = append(lengthBytes, lengthByte)
		}
		payloadLengthUint = uint64(binary.BigEndian.Uint64(lengthBytes))
	}

Lastly, again reading masking-key and payload-data is straightforward. Whether the payload is masked, next 4 octets are presenting masking-key. All subsequent data should be the payload. One single note, in case of data is not masked, masking-key initiated with 4 bytes of 0. 0x0 has no effect on bitwise ^ XOR operation.

	// masking-key
	maskingKey := make([]byte, 4)
	if isMasked == 0x80 {
		for i := range 4 {
			maskingByte, err := buff.ReadByte()
			if err != nil {
				return nil, err
			}
			maskingKey[i] = maskingByte
		}
	}

	// reading payload
	var payload []byte
	for range payloadLengthUint {
		payloadFragment, err := buff.ReadByte()
		if err != nil {
			return nil, err
		}
		payload = append(payload, payloadFragment)
	}

complete code

func readDataFrameBytes(buff *bufio.ReadWriter) (*dataFrameBytes, error) {
	// reading first byte
	byte0, err := buff.ReadByte()
	if err != nil {
		return nil, err
	}
	opcode := byte0 & 0x0F
	fin := byte0 & 0x80
	rsv1, rsv2, rsv3 := byte0&0x40, byte0&0x20, byte0&0x10

	byte1, err := buff.ReadByte()
	if err != nil {
		return nil, err
	}
	isMasked := byte1 & 0x80

	// payload length calculations
	var payloadLengthUint uint64

	payloadLengthByte := byte1 & 0x7F

	if l := uint64(payloadLengthByte); l <= 125 {

		payloadLengthUint = l
	} else if l == 126 {
		// read 2 bytes more
		var lengthBytes []byte

		// reading next 2 bytes
		for range 2 {
			lengthByte, err := buff.ReadByte()
			if err != nil {
				return nil, err
			}
			lengthBytes = append(lengthBytes, lengthByte)
		}
		payloadLengthUint = uint64(binary.BigEndian.Uint16(lengthBytes))

	} else if l == 127 {
		var lengthBytes []byte
		for range 8 {
			lengthByte, err := buff.ReadByte()
			if err != nil {
				return nil, err
			}
			lengthBytes = append(lengthBytes, lengthByte)
		}
		payloadLengthUint = uint64(binary.BigEndian.Uint64(lengthBytes))
	}

	// masking-key
	maskingKey := make([]byte, 4)
	if isMasked == 0x80 {
		for i := range 4 {
			maskingByte, err := buff.ReadByte()
			if err != nil {
				return nil, err
			}
			maskingKey[i] = maskingByte
		}
	}

	// reading payload
	var payload []byte
	for range payloadLengthUint {
		payloadFragment, err := buff.ReadByte()
		if err != nil {
			return nil, err
		}
		payload = append(payload, payloadFragment)
	}

	return &dataFrameBytes{
		fin:           fin,
		rsv1:          rsv1,
		rsv2:          rsv2,
		rsv3:          rsv3,
		opcode:        opcode,
		isMasked:      isMasked,
		payloadLength: payloadLengthUint,
		maskingKey:    maskingKey,
		payload:       payload,
	}, nil
}
func ReadFrame(buffRW bufio.ReadWriter) (opcode byte, payload []byte, err error) {
	dataFrame, err := readDataFrameBytes(&buffRW)
	if err != nil {
		return 0, nil, fmt.Errorf("failed to read dataframe: %v", err)
	}

	maskedPayload, mask := dataFrame.payload, dataFrame.maskingKey
	var decodedPayload []byte
	for i, payloadByte := range maskedPayload {
		decodedPayload = append(decodedPayload, payloadByte^mask[i%4])
	}

	return dataFrame.opcode, decodedPayload, nil
}

Writing Message to Buffer

Writing payload to buffer follows the similar methodology. buff.WriteByte(b byte) allows to write bitwise messages in buffer in order. Since components occuring in bits, bitwise operations are used to construct components in bytes. For an example first byte carrying fin+rsv(3 bits)+opcode(4 bits) is constructed using bitwise | operator.

As RFC Spec declarations, messages from server to clients MUST NOT be masked. So written message carries 0 bytes of mask-key and 0 for masked bit. Lastly using buff.Write(b []byte) all subsequent buffer memory was filled with the payload. For now this operation only supports payload-length of 125.

func WriteFrame(buff bufio.ReadWriter,  payload []byte) error {
	// writing first byte : fin + opcode
	buff.WriteByte(0x80 | opcodeUTF8Text) // 1000|opcode
	// writing isMasked and payloadLengthBytes
	buff.WriteByte(byte(len(payload)))
	// no mask - so subsequent bytes are payload
	buff.Write(payload)

	if err := buff.Flush(); err != nil {
		return err
	}

	return nil

}

Similarly, despite those are carrying similar workflows, additional helper functions were defined corresponds to Close, Ping and Pong frames:

func WritePongFrame(buff *bufio.ReadWriter) error {
	buff.WriteByte(0x80 | opcodePong)

	if err := buff.Flush(); err != nil {
		return err
	}

	return nil
}

func WriteCloseFrame(buff *bufio.ReadWriter) error {
	buff.WriteByte(0x80 | opcodeClose)
	if err := buff.Flush(); err != nil {
		return err
	}

	return nil
}
Don't we need to use conn instance to send messages to client? No bufio is suffient

Using buff.Flush() instead net/conn is sufficient to read/write messages.

→ buff.WriteByte(0x80 | opcodeClose) → buff.Flush() → bufio.Writer flushes the buffer → writes on net.Conn → message is gone to client over TCP connection

bufio.ReadWriter wraps around net.Conn, Hijack() hijacks the connection, while bufio Flush() messages first goes to wrapped conn then goes to the client over TCP.

Managing Clients and Connections

Managing Clients

Despite either party can close the connection, application should be able to manage the data transaction process and connection status for each specific requesting party on the server side. A Client refers a single instance of connection to application's websocket endpoint. Through Clients, websocket server will be able to manage internet traffic and data transactions for each specific connection.

In order to represent each unique connection while managing the data transaction process for each connection, our Client modeled as follows:

type client struct {
	ID         string
	Connection net.Conn
	BuffRW     *bufio.ReadWriter
	Channel    chan []byte
	closeOnce  sync.Once
}
  • ID string: Unique identifier for clients
  • Connection net.Conn: holds a reference to the underlying TCP connection
  • BuffRW bufio.ReadWriter: provides buffered I/O
  • Channel chan []byte: provides message traffic buffer
  • closeOnce sync.Once: Guarantees single execution of clean-up logic

As mentioned, Client definition is encapsulating the data transaction process for each specific client's connection. Following client methods, readPump and writePump manages the process in concurrent threads:

func (c *client) readPump(h *hub) {

	for {
		opcode, _, err := ReadFrame(c.BuffRW)
		if err != nil {
			c.cleanUp(h)

			return
		}

		if opcode == opcodeClose {
			WriteCloseFrame(c.BuffRW)
			c.cleanUp(h)
			return
		}

		if opcode == opcodePing {
			if err := WritePongFrame(c.BuffRW); err != nil {
				c.cleanUp(h)
				return
			}
		}
	}
}

While using the helper functions from frame.go, readPump encapsulates the incoming message processing mechanism within the Client struct. Since data transaction is a continuous process until either party close the connection, method includes a continuous for loop. To summarize the mechanism, method consists of conditional statements and corresponding write operations.

func (c *client) writePump(h *hub) {
	for msg := range c.Channel {
		WriteFrame(c.BuffRW, msg)
	}
	c.cleanUp(h)

}

On the other hand, writePump method listens the client's channel for new messages and forwards each message to the client using WriteFrame helper function. Finally, cleanup function runs whether the channel is closed.

func (c *client) cleanUp(h *hub) {
	c.closeOnce.Do(func() {
		c.Connection.Close()
		close(c.Channel)
		h.disconnect <- c
	})
}

Lastly, cleanup makes sure the client connection is closed gracefully and message channel is closed. Also signals the Hub to disconnect the client by ssending it to the disconnect channel. This is ensures ending the connection operation is performed safely rather than mutatiing shared state and preventing deadlocks.

Managing Connections via Hub

Since each time a client connects to the websocket endpoint, a new isolated connection will be running for this client. Consequently, forwarding state changes through multiple connections will become a complex task if there is no centralized mechanism for managing connections and message channels.

Similar to SSE Hub system that we implemented on issue #5, a centralized connection orhestrator Hub will be aware of all connection actions while be able to broadcast messagesss through all or filtered client channels.

Hub orchestrator architecture will be similar to the previous one on the SSE part, with some little differences.

  • broadcast parameter type changed into chan []byte, while in sse it was chan string. Server sent event servers only supports the MIME type text/event-stream while websockets carrying data in bytes. Thus this conversion was required.
type hub struct {
	mu          sync.RWMutex
	register    chan *client
	disconnect  chan *client
	broadcast   chan []byte
	connections map[string]*client
}

On the other hand, method signatures and logics nearly stayed same. Having to said that, disconnectClient business logic is changed due to seperate client connection logic and hub orchestrator logic. In hub.disconnectClient method, orhecstrator only deletes the given client connection from the connection map it holding. Although, client instances can manage their connection and message channels without any dependency required.

func (h *hub) disconnectClient(c *client) error {
	h.mu.Lock()
	defer h.mu.Unlock()
	if _, exists := h.connections[c.ID]; !exists {
		return fmt.Errorf("client connection does not exists")
	}

	delete(h.connections, c.ID)
	return nil
}

Implementing Event Handling and Typed Events

The WebSocket implementation uses a typed event protocol to structure all messages exchanged between client and server. Every message must conform to the following structure:

type event struct {
    Type    string          `json:"type"`
    Payload json.RawMessage `json:"payload"`
}

Type identifies the event category — such as board.card.created or user.joined. Payload is stored as json.RawMessage, meaning it is kept as raw JSON bytes without being deserialized into a concrete type. This allows the Hub or handler to inspect the event type first, then deserialize the payload into the appropriate struct based on that type.

Marshalling converts an event struct into a byte slice for transmission over the WebSocket connection:

func (e event) Marshal() ([]byte, error) {
    return json.Marshal(&e)
}

Unmarshalling parses an incoming byte slice — read from a WebSocket text frame — back into an event struct:

func UnmarshallEvent(payload []byte) (*event, error) {
    var message event
    if err := json.Unmarshal(payload, &message); err != nil {
        return nil, fmt.Errorf("failed event unmarshalling: %v", err)
    }
    return &message, nil
}

All messages are transmitted as UTF-8 text frames (opcode 0x1). Binary frames are not used. On the frontend, messages must be sent via JSON.stringify to ensure correct encoding.

Registering Read/Write Channels to websocket Handler

As previously mentioned, Upgrade handler method is the entry point for all websocket connections. It enables the real-time data transmission by means of processing handshake procedure and transforming initial HTTP request into persistent Websocket connection.

Standard HTTP in Go manages the connection lifecyle using request-response model which not sufficient to construct a persistence connection between client and server. Thus Upgrade handler Hijacks the connection. Hijacker interface transfers the ownership of the TCP connection away from the HTTP server.

From this point all subsequent reads and writes are became the responsibilty of the websocket implementation.

To more detailed information visit Initiating WebSocket Connection again.

Starting the Connection Lifecycle

After the handshake is completed, a client is constructed from the hijacked net/connection conn and buffio.ReadWriter instances, and registered with the Hub.

go client.readPump(ws.h)
go client.writePump(ws.h)

Upgrade returns immediately after launching these goroutines. The connection lifecycle — reading frames, writing frames, and cleanup — is entirely delegated to readPump and writePump.


Implementation Decisions

Testing with a go client instead of websocket

Initially a websocket testing tool websocat was considered for verifying the implementation. However, running websocat inside the development container somehow failed to transmit messages over the connection despite successfully establishing the handshake and initiating the connection.

In order to work around this, a minimal Go test client was constructed. The client manually performs the websocket handshake over a raw TCP connection, constructs masked data frames for text, ping and close data frames with relevant opcodes. Finally reads the server response in order to test whether the prefered data transaction is committed. Server's responds are writtin into console in order to serve full visibility.

package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"net"
)

func makeFrame(opcode byte, payload []byte) []byte {
	maskKey := []byte{0x37, 0xfa, 0x21, 0x3d}
	frame := []byte{0x80 | opcode, byte(0x80 | len(payload))}
	frame = append(frame, maskKey...)
	for i, b := range payload {
		frame = append(frame, b^maskKey[i%4])
	}
	return frame
}

func handshake(conn net.Conn) {
	key := "dGhlIHNhbXBsZSBub25jZQ=="
	h := sha1.New()
	h.Write([]byte(key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"))
	accept := base64.StdEncoding.EncodeToString(h.Sum(nil))
	fmt.Println("expect accept:", accept)

	req := "GET /ws HTTP/1.1\r\nHost: development:8080\r\nUpgrade: websocket\r\nConnection: Upgrade\r\nSec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\nSec-WebSocket-Version: 13\r\n\r\n"
	conn.Write([]byte(req))

	buf := make([]byte, 1024)
	n, _ := conn.Read(buf)
	fmt.Println("handshake response:", string(buf[:n]))
}

func readResponse(conn net.Conn, label string) {
	buf := make([]byte, 1024)
	n, _ := conn.Read(buf)
	fmt.Printf("[%s] got back: %v\n", label, buf[:n])
}

func main() {
	conn, _ := net.Dial("tcp", "development:8080")
	handshake(conn)

	// Test 1: text frame
	fmt.Println("\n--- Test 1: Text Frame ---")
	msg := []byte(`{"type":"board.card.created","payload":{}}`)
	conn.Write(makeFrame(0x1, msg))
	readResponse(conn, "text")

	// Test 2: ping frame
	fmt.Println("\n--- Test 2: Ping ---")
	conn.Write(makeFrame(0x9, []byte{}))
	readResponse(conn, "ping")

	// Test 3: close frame
	fmt.Println("\n--- Test 3: Close ---")
	conn.Write(makeFrame(0x8, []byte{}))
	readResponse(conn, "close")
}

Following code-block represents the output of previous test block and confirms all the three frames were handled correctly by the server:

expect accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
handshake response: HTTP/1.1 101 Switching Protocols
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Upgrade: websocket
Connection: Upgrade

--- Test 1: Text Frame ---
[text] got back: [129 42 123 34 116 121 112 101 34 ...]

--- Test 2: Ping ---
[ping] got back: [138]   // 0x8A = pong frame

--- Test 3: Close ---
[close] got back: [136]  // 0x88 = close frame

Related ADRs

References