Vexa WebSocket Usage

Overview

WebSocket connections provide efficient, low-latency transcript updates compared to polling REST endpoints. Since REST transcript retrieval is not suitable for frequent polling due to server API efficiency concerns, WebSocket subscriptions offer real-time updates without the overhead of repeated HTTP requests.

This document describes how to connect to Vexa's WebSocket API for real-time meeting transcription. The protocol supports subscribing to active meetings and receiving live transcript updates with proper deduplication and speaker grouping.

Implementation Reference: The testing/ws_realtime_transcription.py script serves as a complete Python implementation of real-time transcript rendering using this WebSocket protocol. It demonstrates the full algorithm from REST bootstrap through WebSocket updates with proper deduplication, speaker grouping, and live terminal rendering.

Prerequisites: The meeting bot must already be running and active for the target meeting.

Starting a Bot (if not already running)

To start a transcription bot for a meeting:

POST /bots
Headers: X-API-Key: YOUR_API_KEY
Body: {
  "platform": "google_meet",
  "native_meeting_id": "your-meeting-id"
}

# For Microsoft Teams (requires passcode):
Body: {
  "platform": "teams",
  "native_meeting_id": "9387167464734",
  "passcode": "qxJanYOcdjN4d6UlGa"
}

Connection Details

WebSocket URL

Derive the WebSocket URL from your API base URL:

https://api.example.com → wss://api.example.com/ws
http://localhost:18056 → ws://localhost:18056/ws

Authentication

Authentication is performed using the X-API-Key header:

Headers: X-API-Key: YOUR_API_KEY

Meeting Identity

Meetings are identified by platform and native meeting ID:

{
  "platform": "google_meet",
  "native_id": "kzj-grsa-cqf"
}

Supported platforms: google_meet, teams

REST API Bootstrap

Before connecting to WebSocket, fetch the last full transcript via REST API:

GET /transcripts/{platform}/{native_id}[?meeting_id=...]
Headers: X-API-Key: YOUR_API_KEY

Response Format:

{
  "segments": [
    {
      "text": "Hello everyone",
      "speaker": "John",
      "absolute_start_time": "2025-01-15T10:30:00Z",
      "absolute_end_time": "2025-01-15T10:30:03Z"
    }
  ]
}

WebSocket Protocol

Subscription

Send subscription message after connecting:

{
  "action": "subscribe",
  "meetings": [
    {
      "platform": "google_meet",
      "native_id": "kzj-grsa-cqf"
    }
  ]
}

Fields:

action: Always "subscribe"
meetings: Array of meeting objects with platform and native_id

Message Types

`transcript.mutable`

Live transcript segments that may be updated.

{
  "type": "transcript.mutable",
  "meeting": {"platform": "google_meet", "native_id": "kzj-grsa-cqf"},
  "payload": {
    "segments": [
      {
        "text": "This text may change",
        "speaker": "John",
        "language": "en",
        "session_uid": "abc123-456-def",
        "speaker_mapping_status": "NO_SPEAKER_EVENTS",
        "start": 1234.567,
        "end_time": 1237.890,
        "absolute_start_time": "2025-01-15T10:30:05Z",
        "absolute_end_time": "2025-01-15T10:30:08Z",
        "updated_at": "2025-01-15T10:30:08Z"
      }
    ]
  },
  "ts": "2025-01-15T10:30:08Z"
}

Note: Additional fields like session_uid, speaker_mapping_status, and relative timing (start, end_time) may be present but are not required for basic transcript processing.

`transcript.finalized`

Finalized transcript segments that won't change.


**Note**: `transcript.finalized` messages are currently not used by the algorithm and are ignored. Only `transcript.mutable` messages are processed for live transcript updates.

#### `meeting.status`
Meeting status updates.

```json
{
  "type": "meeting.status",
  "meeting": {"platform": "google_meet", "native_id": "kzj-grsa-cqf"},
  "payload": {
    "status": "active"
  },
  "ts": "2025-01-15T10:30:00Z"
}

Status Values: requested, joining, awaiting_admission, connecting, active, stopping, completed, failed

`subscribed`

Confirmation of successful subscription.

{
  "type": "subscribed",
  "meetings": [1, 2, 3]
}

`pong`

Response to ping messages.

{
  "type": "pong"
}

`error`

Error messages.

{
  "type": "error",
  "error": "Invalid meeting ID"
}

Segment Schema

Minimum fields to consume:

Field	Type	Required	Description
`text`	string	Yes	Transcript text content
`speaker`	string	No	Speaker identifier
`language`	string	No	Language code (e.g., "en", "es")
`absolute_start_time`	string	Yes	UTC timestamp (ISO 8601)
`absolute_end_time`	string	Yes	UTC timestamp (ISO 8601)
`updated_at`	string	No	Last update timestamp

Algorithm

Implemented in testing/ws_realtime_transcription.py

1. Bootstrap

Fetch initial transcript via REST API
Seed in-memory map keyed by absolute_start_time
Ignore segments missing absolute_start_time for ordering

transcript_by_abs_start = {}
for segment in rest_segments:
    if segment.get('absolute_start_time'):
        transcript_by_abs_start[segment['absolute_start_time']] = segment

2. WebSocket Updates

For each transcript.mutable message:

For every segment with absolute_start_time:
- Upsert into map by key
- If updated_at exists on both existing and incoming, keep the newer (updated_at max)
- Discard segments with empty/whitespace-only text

for segment in ws_segments:
    abs_start = segment.get('absolute_start_time')
    if not abs_start or not segment.get('text', '').strip():
        continue
    
    existing = transcript_by_abs_start.get(abs_start)
    if existing and existing.get('updated_at') and segment.get('updated_at'):
        if segment['updated_at'] < existing['updated_at']:
            continue  # Keep existing (newer)
    
    transcript_by_abs_start[abs_start] = segment

3. Rendering Order

Sort by absolute_start_time ascending:

sorted_segments = sorted(
    transcript_by_abs_start.values(),
    key=lambda s: s['absolute_start_time']
)

4. Speaker Merging

Group consecutive segments by same speaker:

def group_by_speaker(segments):
    groups = []
    current_group = None
    
    for segment in segments:
        speaker = segment.get('speaker', 'Unknown')
        if current_group and current_group['speaker'] == speaker:
            current_group['text'] += ' ' + segment['text']
            current_group['end_time'] = segment['absolute_end_time']
        else:
            if current_group:
                groups.append(current_group)
            current_group = {
                'speaker': speaker,
                'text': segment['text'],
                'start_time': segment['absolute_start_time'],
                'end_time': segment['absolute_end_time']
            }
    
    if current_group:
        groups.append(current_group)
    return groups

5. Rendering Strategy

For maximum readability, re-render the entire transcript on every update:

def render_full_transcript():
    # Clear screen and move cursor to top
    print('\033[H\033[J', end='')
    
    # Render header
    print("=" * 60)
    print("📝 LIVE TRANSCRIPT")
    print("=" * 60)
    
    # Get sorted segments and group by speaker
    sorted_segments = sorted(transcript_by_abs_start.values(), key=lambda s: s['absolute_start_time'])
    groups = group_by_speaker(sorted_segments)
    
    # Render all groups
    for group in groups:
        start_time = format_time(group['start_time'])
        end_time = format_time(group['end_time'])
        speaker = group['speaker']
        text = clean_text(group['text'])
        print(f"[{start_time} - {end_time}] {speaker}: {text}")

ANSI Control Sequences:

\033[H: Move cursor to home position (top-left)
\033[J: Clear screen from cursor to end
end='': Suppress newline for immediate effect

This ensures the terminal always shows a clean, complete transcript without duplicate or stale lines.

Keepalive

Client may send ping messages:

{
  "action": "ping"
}

Server responds with pong. Recommended ping interval: 25 seconds.

Error Handling

Log error messages but continue processing
Handle connection drops gracefully
Reconnect and resubscribe as needed
Idempotent merging preserves order on reconnection

Environment Variables

export API_BASE="http://localhost:18056"
export WS_URL="ws://localhost:18056/ws"
export API_KEY="your_api_key_here"

Example Usage

See the real-time transcription script for a complete implementation:

# Basic usage
python testing/ws_realtime_transcription.py \
  --api-base http://localhost:18056 \
  --ws-url ws://localhost:18056/ws \
  --api-key $API_KEY \
  --platform google_meet \
  --native-id kzj-grsa-cqf

# Debug mode (show raw frames)
python testing/ws_realtime_transcription.py \
  --api-base http://localhost:18056 \
  --ws-url ws://localhost:18056/ws \
  --api-key $API_KEY \
  --platform google_meet \
  --native-id kzj-grsa-cqf \
  --raw

The real-time transcription script implements the exact algorithm described above and renders a live, grouped transcript in the terminal. It demonstrates the complete flow from REST bootstrap through WebSocket updates with proper deduplication and speaker merging.

Complete Implementation

The real-time transcription script (testing/ws_realtime_transcription.py) serves as a complete reference implementation of this WebSocket protocol. It demonstrates:

REST API Bootstrap: Fetching initial transcript data
WebSocket Connection: Proper authentication and subscription
Message Processing: Handling all WebSocket event types
Data Deduplication: Merging segments by absolute_start_time with updated_at precedence
Speaker Grouping: Combining consecutive segments by speaker
Live Rendering: Full re-render strategy with ANSI escape codes
Error Handling: Graceful handling of connection issues

The script includes comprehensive comments explaining each step of the algorithm, making it a valuable reference for implementing real-time WebSocket transcription clients in other languages.

Raw Debug Mode

Use the --raw flag to debug WebSocket message flow:

Display raw JSON frames in terminal with RAW: prefix
Log all messages to testing/logs/ws_raw.log

Log file location: testing/logs/ws_raw.log (single file, appends all runs)

Example log file line:

2025-10-04T14:50:35.101823 - {"type": "transcript.mutable", "meeting": {"platform": "google_meet", "native_id": "tys-tztv-nrj"}, "payload": {"segments": [...]}, "ts": "2025-10-04T11:50:35.100142+00:00"}

Use these logs to verify message structure, timing, and payload formats for your implementation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vexa WebSocket Usage

Overview

Starting a Bot (if not already running)

Connection Details

WebSocket URL

Authentication

Meeting Identity

REST API Bootstrap

WebSocket Protocol

Subscription

Message Types

`transcript.mutable`

`transcript.finalized`

`subscribed`

`pong`

`error`

Segment Schema

Algorithm

1. Bootstrap

2. WebSocket Updates

3. Rendering Order

4. Speaker Merging

5. Rendering Strategy

Keepalive

Error Handling

Environment Variables

Example Usage

Complete Implementation

Raw Debug Mode

FilesExpand file tree

websocket.md

Latest commit

History

websocket.md

File metadata and controls

Vexa WebSocket Usage

Overview

Starting a Bot (if not already running)

Connection Details

WebSocket URL

Authentication

Meeting Identity

REST API Bootstrap

WebSocket Protocol

Subscription

Message Types

transcript.mutable

transcript.finalized

subscribed

pong

error

Segment Schema

Algorithm

1. Bootstrap

2. WebSocket Updates

3. Rendering Order

4. Speaker Merging

5. Rendering Strategy

Keepalive

Error Handling

Environment Variables

Example Usage

Complete Implementation

Raw Debug Mode

`transcript.mutable`

`transcript.finalized`

`subscribed`

`pong`

`error`