adecarolis/AI-DX

AI-DX: Autonomous Amateur Radio Operator

An autonomous AI amateur radio operator that conducts real QSOs (radio conversations) over HF using the GPT-4o Realtime API and wfweb for radio audio and PTT control.

⚠️ Platform: Tested on macOS Apple Silicon (M4). May need adaptation for other systems.


⚠️ IMPORTANT LEGAL NOTICES

BEFORE USING THIS SOFTWARE, YOU MUST:

  1. READ THE DISCLAIMER — Contains critical safety and legal information
  2. Hold a valid amateur radio license — Required by law to operate with transmit capability
  3. Understand you are the control operator — You are legally responsible for all transmissions
  4. Review the LICENSE and NOTICE files

🔴 This software uses AI to control radio transmissions. You must actively monitor and supervise all operations at all times.

See DISCLAIMER.md for complete legal notices, liability disclaimers, and regulatory requirements.


How It Works

AI-DX uses a fully integrated audio pipeline built on GPT-4o Realtime:

wfweb WebSocket (RX audio, 48 kHz)
        │
        ▼
GPT-4o Realtime API ──── server-side VAD + STT + LLM + TTS
        │                 function calling for contact tracking
        ▼
TxBuffer (resample 24 kHz → 48 kHz, real-time pacing)
        │
        ▼
wfweb WebSocket (TX audio + PTT)

  • No local STT — transcription is handled server-side by GPT-4o Realtime
  • No local TTS — synthesis is handled server-side by GPT-4o Realtime
  • No local VAD — voice activity detection is server-side (server_vad mode)
  • No Hamlib — PTT is controlled via wfweb's {"cmd":"setPTT","value":true/false}
  • No PortAudio/sounddevice — all audio I/O goes through wfweb's WebSocket binary frames (production mode)
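
The TxBuffer step in the diagram can be illustrated with a minimal sketch. It assumes mono, little-endian PCM16 input at 24 kHz and doubles the rate by linear interpolation; the project's actual resampler is not described here, and the names upsample_2x_pcm16 and chunk_duration_sec are hypothetical.

```python
import array
import sys


def upsample_2x_pcm16(pcm: bytes) -> bytes:
    """Double the sample rate of mono PCM16 audio by linear interpolation."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # the input is assumed to be little-endian
    out = array.array("h")
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + nxt) // 2)  # midpoint between neighbours
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()


def chunk_duration_sec(pcm: bytes, sample_rate: int = 48_000) -> float:
    """Playback time of a mono PCM16 chunk at the given sample rate."""
    return (len(pcm) // 2) / sample_rate
```

A pacing loop could send one chunk and then wait chunk_duration_sec(chunk) before sending the next, so that audio reaches the radio at playback speed instead of all at once.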

Contact Tracking via Function Calling

The model calls update_contact() in real time as it learns information during a QSO:

| Event | Tool call |
|---|---|
| Hears callsign | update_contact(callsign="VK2TDX") |
| Learns name | update_contact(name="John") |
| Learns QTH | update_contact(qth="New South Wales") |
| QSO ends | update_contact(closing=true) → logs to ADIF, returns to CQ |

No text parsing. No regex. No heuristics.
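
As a rough sketch of how such a tool could be declared and handled: the definition below follows the flat function-tool shape used by the Realtime API, but the exact schema AI-DX registers is an assumption, as are the apply_update helper and the notes field name (taken from the QSO flow description).

```python
import json

# Hypothetical tool definition; field names mirror the calls listed above.
UPDATE_CONTACT_TOOL = {
    "type": "function",
    "name": "update_contact",
    "description": "Record details about the station currently being worked.",
    "parameters": {
        "type": "object",
        "properties": {
            "callsign": {"type": "string"},
            "name": {"type": "string"},
            "qth": {"type": "string"},
            "notes": {"type": "string"},
            "closing": {"type": "boolean"},
        },
    },
}


def apply_update(contact: dict, arguments_json: str) -> bool:
    """Merge one tool call into the contact; return True when the QSO is closing."""
    args = json.loads(arguments_json)
    closing = bool(args.pop("closing", False))
    for key, value in args.items():
        if key in ("callsign", "name", "qth", "notes") and value:
            contact[key] = value.upper() if key == "callsign" else value
    return closing
```

Because every field is optional, the model can report each fact as soon as it hears it, and the handler only ever merges structured arguments.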


Requirements

  • Python ≥ 3.10, < 3.14
  • OpenAI API key with access to gpt-4o-realtime-preview
  • wfweb running and accessible — production mode only (provides radio audio + PTT via WebSocket); not required for demo mode

Installation

# From the root of the cloned repository, install dependencies
uv sync

Configuration

All settings are via environment variables or a .env file in the project root.

Required

OPENAI_API_KEY=sk-...              # OpenAI API key (GPT-4o Realtime access required)
CALLSIGN=W1AW                      # Your amateur radio callsign
WFWEB_URL=wss://192.168.x.x:8080  # wfweb WebSocket URL — production only (self-signed SSL accepted)

Station Info

YOUR_NAME=Hiram                    # Your name (spoken during QSOs)
LOCATION="Newington, CT"           # Your QTH
ANTENNA="Dipole"                   # Antenna description
POWER="100W"                       # Power output
TRANSCEIVER="IC-7300"              # Rig name

GPT-4o Realtime Settings

REALTIME_MODEL=gpt-4o-realtime-preview   # Model (default: gpt-4o-realtime-preview)
REALTIME_VOICE=ash                        # Voice: alloy, ash, ballad, coral, echo, sage, shimmer, verse

Model choice: gpt-4o-realtime-preview is the recommended default. It has significantly better audio comprehension than gpt-realtime-1.5; in particular, it handles weak HF signals, the phonetic alphabet, and partial callsigns more accurately. gpt-realtime-1.5 is available as a lower-cost fallback but noticeably underperforms in noisy radio conditions.

Operator Style

OPERATOR_STYLE=CALLING_CQ   # CALLING_CQ | CONTESTING | MONITORING | SWL

| Style | Behaviour |
|---|---|
| CALLING_CQ | Calls CQ periodically, engages in casual QSOs |
| CONTESTING | Rapid serial-number exchanges, optimised for contest protocol |
| MONITORING | Listens for direct calls, never initiates. IDs every 5 minutes |
| SWL | Receive-only. No transmit under any circumstances |
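
One way to enforce the receive-only rule for SWL is a hard gate checked before any PTT command is sent. The sketch below is illustrative only: OperatorStyle, parse_style, and may_transmit are hypothetical names, not the project's API.

```python
from enum import Enum


class OperatorStyle(Enum):
    CALLING_CQ = "CALLING_CQ"
    CONTESTING = "CONTESTING"
    MONITORING = "MONITORING"
    SWL = "SWL"


def parse_style(value: str) -> OperatorStyle:
    """Parse OPERATOR_STYLE; raises ValueError for anything not listed above."""
    return OperatorStyle(value.strip().upper())


def may_transmit(style: OperatorStyle) -> bool:
    """SWL is receive-only; every other style may key the transmitter."""
    return style is not OperatorStyle.SWL
```

Checking may_transmit in code, and not only in the system prompt, keeps the no-transmit guarantee independent of model behaviour.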

Tuning

VAD_THRESHOLD=0.5            # Server VAD speech probability threshold (0.0–1.0)
VAD_SILENCE_DURATION=0.6     # Seconds of silence to end a turn
CQ_INTERVAL_SEC=30           # Seconds between CQ calls
CQ_RESTART_DELAY_SEC=5       # Delay before restarting CQ after a QSO
WFWEB_CONNECT_TIMEOUT=15     # wfweb connection timeout in seconds
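
For orientation, the two VAD settings map naturally onto the Realtime API's server_vad turn detection, which takes a threshold and a silence duration in milliseconds. The sketch below shows one plausible conversion; build_session_update is a hypothetical helper, and how AI-DX actually builds its session is not shown in this README.

```python
import json


def build_session_update(vad_threshold: float, vad_silence_sec: float) -> str:
    """Build a Realtime session.update event enabling server-side VAD."""
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("VAD_THRESHOLD must be between 0.0 and 1.0")
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": vad_threshold,
                # The API takes milliseconds; the .env value is in seconds.
                "silence_duration_ms": round(vad_silence_sec * 1000),
            }
        },
    }
    return json.dumps(event)
```

On a noisy HF band, a higher threshold makes the VAD less likely to treat static as speech, at the cost of missing weak stations.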

General

LOG_LEVEL=INFO               # DEBUG | INFO | WARNING | ERROR

Example .env

OPENAI_API_KEY=sk-...
WFWEB_URL=wss://192.168.1.10:8080

CALLSIGN=W1AW
YOUR_NAME=Hiram
LOCATION="Newington, CT"
ANTENNA="Dipole, 40m"
POWER=100W
TRANSCEIVER="IC-7300"

OPERATOR_STYLE=CALLING_CQ
REALTIME_VOICE=ash

LOG_LEVEL=INFO
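
A minimal sketch of reading these variables from the environment follows. Settings and load_settings are hypothetical stand-ins for the classes in core/config.py; only the REALTIME_MODEL default is documented above, so the other defaults are assumptions taken from the example values. The sketch does not parse a .env file itself.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    callsign: str
    operator_style: str = "CALLING_CQ"
    realtime_model: str = "gpt-4o-realtime-preview"
    cq_interval_sec: float = 30.0


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default) and validate them."""
    env = os.environ if env is None else env
    missing = [k for k in ("OPENAI_API_KEY", "CALLSIGN") if not env.get(k)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        callsign=env["CALLSIGN"].upper(),
        operator_style=env.get("OPERATOR_STYLE", "CALLING_CQ"),
        realtime_model=env.get("REALTIME_MODEL", "gpt-4o-realtime-preview"),
        cq_interval_sec=float(env.get("CQ_INTERVAL_SEC", "30")),
    )
```

Failing fast on a missing key or callsign avoids starting a session that cannot legally identify itself on the air.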

Usage

# Normal operation (wfweb radio connection required)
uv run python radio_operator.py

# Demo mode — no radio hardware needed; uses your mic and speakers
uv run python radio_operator.py --demo
uv run python radio_operator.py -d

# Play RX and TX audio locally through your computer's speakers (production mode)
uv run python radio_operator.py --monitor-audio

# Suppress the terminal UI (log to console instead)
uv run python radio_operator.py --no-ui
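
The flags above could be declared with argparse roughly as follows. This is a sketch of the documented command-line surface only; the help strings and the build_parser name are assumptions, and radio_operator.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line flags listed in the Usage section."""
    parser = argparse.ArgumentParser(prog="radio_operator.py")
    parser.add_argument("-d", "--demo", action="store_true",
                        help="run without radio hardware, using mic and speakers")
    parser.add_argument("--monitor-audio", action="store_true",
                        help="also play RX and TX audio on the local speakers")
    parser.add_argument("--no-ui", action="store_true",
                        help="disable the terminal UI and log to the console")
    return parser
```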

Demo mode

--demo runs the full operator without any radio hardware:

  • Mic input → GPT-4o Realtime (server VAD detects when you speak)
  • Speaker output → GPT-4o Realtime audio played locally
  • Fake frequency — 14.225 MHz (20m USB) shown in the UI
  • Fake TX meters — 50 W / SWR 1.3 shown while the model is transmitting
  • S-meter driven by actual mic RMS — rises when you speak
  • Isolated logs — timestamped logs/demo_YYYYMMDD_HHMMSS.log and .adi files so production logs are never touched
  • UI — full Rich terminal UI with a ⬡ DEMO badge in the header
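
The isolated demo log names can be derived from the session start time. The helper below is a hypothetical illustration of the demo_YYYYMMDD_HHMMSS pattern, not the project's own code.

```python
from datetime import datetime
from pathlib import Path


def demo_log_paths(started: datetime, log_dir: Path = Path("logs")) -> tuple[Path, Path]:
    """Return the (.log, .adi) paths for one demo session."""
    stamp = started.strftime("%Y%m%d_%H%M%S")
    return log_dir / f"demo_{stamp}.log", log_dir / f"demo_{stamp}.adi"
```

Since both files share one timestamp, a demo session's transcript and its ADIF log can be matched up later.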

Terminal UI

AI-DX includes an HF radio-themed terminal UI that updates at 10 FPS:

● W1AW   Hiram · Newington, CT        14.225.000 MHz   00:42:15   RX:5  TX:12
┌──────────┬──────────────────────────────────────────────────────┬────────────┐
│          │  S  1    3    5    7    9   +20  +40                 │  USB       │
│   RX     │  ████████████████████████░░░░░░░░░░░░  S7           │            │
│          │  ▶ SIGNAL DETECTED                                   │ 14.225.000 │
│          │                                                      │    MHz     │
└──────────┴──────────────────────────────────────────────────────┴────────────┘
│ [14:32:01] ► TX  CQ CQ CQ de W1AW W1AW, QRZ?
│ [14:32:05] ◄ RX  W1AW this is VK2TDX, good afternoon from New South Wales...
│ [14:32:09] ► TX  Good afternoon VK2TDX, you are 59 here in Newington, Connecticut...
└─ ◆  IN QSO  ─  VK2TDX  ─  John  ─  New South Wales ─────────────────────────

  • PTT indicator — RX / VOICE↑ / ON AIR
  • S-meter — live signal strength (S0–S9+60 dB) from wfweb; in demo mode driven by mic RMS
  • TX meters — when transmitting: power (watts) + SWR bars; in demo mode shows 50 W / 1.3 SWR
  • Mode — USB / LSB / CW / AM / FM from wfweb
  • Frequency — VFO frequency from wfweb (14.225 MHz fixed in demo mode)
  • Communications log — full RX/TX transcripts, newest first
  • QSO bar — current contact: state, callsign, name, QTH
  • Demo badge — ⬡ DEMO shown in the header when running with --demo

In production, meter data (S-meter, power, SWR, mode) is streamed from wfweb status messages in near real time.


wfweb Protocol Notes

wfweb communicates over a browser-style WebSocket:

| Direction | Format | Purpose |
|---|---|---|
| Client → Server | {"cmd":"getStatus"} | Request rig info |
| Client → Server | {"cmd":"enableAudio","value":true} | Start RX audio stream |
| Client → Server | {"cmd":"setPTT","value":true/false} | PTT on/off |
| Server → Client | Binary frame 0x02 … | RX PCM16 audio (48 kHz) |
| Client → Server | Binary frame 0x03 … | TX PCM16 audio (48 kHz) |
| Server → Client | {"type":"status", "frequency":…, "mode":…, "sMeter":…, …} | Radio state |
| Server → Client | {"type":"meters", "sMeter":…, "powerMeter":…, "swrMeter":…} | Meter updates |
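
A small sketch of building a PTT command and sorting incoming messages by kind is given below. Only the leading type byte of binary frames is documented here, so the sketch deliberately does not strip or decode any header; ptt_command and classify_message are hypothetical names.

```python
import json


def ptt_command(on: bool) -> str:
    """Serialise the setPTT command shown in the table."""
    return json.dumps({"cmd": "setPTT", "value": on}, separators=(",", ":"))


def classify_message(message) -> str:
    """Label an incoming WebSocket message without decoding its payload."""
    if isinstance(message, (bytes, bytearray)):
        # The rest of the frame layout is not documented above, so the
        # payload is left untouched here.
        return "rx_audio" if message[:1] == b"\x02" else "unknown_binary"
    data = json.loads(message)
    return data.get("type", "unknown_json")
```

A receive loop could dispatch on the returned label, for example routing rx_audio frames to the audio path and status or meters messages to the UI.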

Meter value formats (as sent by wfweb):

  • sMeter — dB relative to S9 (−54 = S0, 0 = S9, +60 = S9+60 dB)
  • powerMeter — watts (0–100+)
  • swrMeter — actual SWR ratio (1.0 = perfect, 2.0 = 2:1, etc.)
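
The sMeter scale above implies 6 dB per S-unit (nine units between −54 dB and 0 dB). A sketch of turning the raw value into a display label, assuming that spacing and a hypothetical function name, might look like this:

```python
import math


def s_meter_label(db_rel_s9: float) -> str:
    """Convert dB relative to S9 into a label such as S7 or S9+20."""
    over = round(db_rel_s9)
    if over > 0:
        return f"S9+{over}"
    units = math.floor(9 + db_rel_s9 / 6)  # one S-unit per 6 dB
    return f"S{min(9, max(0, units))}"
```

For example, a reading of −12 dB sits two S-units below S9 and is labelled S7.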

Project Structure

ai-dx/
├── radio_operator.py          # Main application — QSO state machine, wfweb callbacks,
│                              #   GPT-4o Realtime session, function call handler, TxBuffer
├── ai/
│   └── realtime_client.py     # GPT-4o Realtime WebSocket session (asyncio in background thread)
├── audio/
│   └── wfweb_client.py        # wfweb browser WebSocket client (RX audio, TX audio, PTT)
├── core/
│   ├── config.py              # AppConfig, AudioConfig, RadioConfig, WfwebConfig, RealtimeConfig
│   ├── adif_logger.py         # ADIF QSO log writer
│   ├── band_utils.py          # Frequency → band name mapping
│   ├── operator_profiles.py   # System prompts per operator style
│   └── operator_profiles_base.py  # Shared prompt building blocks
├── ui/
│   └── radio_ui.py            # HF radio-themed Rich terminal UI (10 FPS)
├── logs/                      # Runtime logs and contacts.adi ADIF log
└── test_tools/                # Manual test utilities

QSO Flow

Start
  │
  ▼
CALLING_CQ ──── send CQ every N seconds (ephemeral context, no memory)
  │
  │  model calls update_contact(callsign="VK2TDX")
  ▼
IN_QSO ──── full QSO exchange; update_contact() fills in name, QTH, notes
  │
  │  model calls update_contact(closing=true) on final goodbye
  ▼
QSO_ENDED ──── contact logged to ADIF, brief pause
  │
  └──► CALLING_CQ (loop)

Station skip detection uses string similarity to prevent looping conversations with the same station and to detect response hijacking.
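
The README does not say which similarity metric is used or exactly what is compared. As one possible sketch, the standard library's difflib can score how close a newly heard callsign is to recently worked ones; the 0.8 threshold and both function names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def should_skip(callsign: str, recent: list[str], threshold: float = 0.8) -> bool:
    """True when the callsign closely matches a recently worked station."""
    return any(similarity(callsign, prev) >= threshold for prev in recent)
```

A fuzzy match, unlike a strict equality check, also catches a callsign that was transcribed slightly differently the second time around.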


ADIF Logging

Contacts are logged in standard ADIF format. In production, the log file is logs/contacts.adi. In demo mode (--demo), a separate timestamped file logs/demo_YYYYMMDD_HHMMSS.adi is created per session so production logs are never affected.

Each record captures:

  • QSO_DATE, TIME_ON — UTC date and time
  • CALL — remote callsign
  • FREQ — frequency in MHz (live from wfweb at time of logging)
  • MODE — operating mode
  • RST_SENT, RST_RCVD — signal reports
  • NAME, QTH, NOTES — filled by the model via function calls during the QSO
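
In the ADIF text format, each field is written as a tag carrying the field name and the value length, and each record ends with an EOR tag. A minimal sketch of serialising one contact follows; the helper names are hypothetical and the project's adif_logger.py may format records differently.

```python
def adif_field(name: str, value: str) -> str:
    """Format one ADIF field as <NAME:length>value."""
    return f"<{name}:{len(value)}>{value}"


def adif_record(fields: dict[str, str]) -> str:
    """Join the non-empty fields into one record terminated by <EOR>."""
    parts = [adif_field(k, v) for k, v in fields.items() if v]
    return " ".join(parts) + " <EOR>\n"
```

Empty values are skipped, so a QSO in which the model never learned a name simply has no NAME field.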

Troubleshooting

wfweb connection fails

  • Verify WFWEB_URL points to the correct host/port
  • wfweb uses a self-signed TLS certificate — this is handled automatically
  • Check that wfweb is running and the radio is connected
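
For reference, accepting a self-signed certificate in Python usually means passing a TLS context with verification turned off to the WebSocket client. The sketch below shows such a context; whether AI-DX configures TLS exactly this way is an assumption, and the function name is hypothetical.

```python
import ssl


def insecure_tls_context() -> ssl.SSLContext:
    """TLS context that accepts any certificate; suitable only for a trusted LAN."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # accept the self-signed certificate
    return ctx
```

Disabling verification removes protection against impersonation, so this only makes sense when wfweb runs on a network you control.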

Operator goes silent after hearing callsign

  • Ensure your OpenAI key has GPT-4o Realtime access
  • Check logs for WebSocket errors

Wrong frequency in ADIF log

  • Frequency is fetched live from wfweb at QSO close time; verify wfweb reports the correct VFO frequency

S-meter / power / SWR not moving

  • Confirm wfweb is sending status or meters messages (check DEBUG logs)
  • Meter data is only shown when wfweb provides it; no local fallback

License

MIT License — see LICENSE.

Note: Previous versions of this project used Hamlib (LGPL v2.1). The current version does not link to Hamlib. See NOTICE for historical third-party license information.


73!

Good DX and happy QSOs! 📻
