4 changes: 0 additions & 4 deletions .env.example
@@ -1,10 +1,6 @@
# Copy this file to `.env` and fill in real values.
# `.env` is gitignored and never committed.

# Google AI Studio key (https://aistudio.google.com/app/apikey).
# Used by the orchestrator when perception.vlm.provider == "gemini".
GEMINI_API_KEY=

# IP of the Raspberry Pi running the relay + sensor publisher. Compose
# substitutes this into extra_hosts so containers resolve `drone.local`
# to the Pi without depending on the host's DNS / mDNS.
6 changes: 3 additions & 3 deletions CMakeLists.txt
@@ -8,9 +8,9 @@ find_package(MAVSDK REQUIRED)
find_package(Boost REQUIRED COMPONENTS system)
find_package(Threads REQUIRED)
# Wide-char ncurses (libncursesw) is required so the TUI renders any UTF-8
# byte sequence — including Gemini-produced text in the orchestrator status
# row — as proper glyphs instead of raw multi-byte garbage. setlocale() in
# main() activates the byte-to-wchar pipeline.
# byte sequence — including VLA-emitted thought text in the orchestrator
# status row — as proper glyphs instead of raw multi-byte garbage.
# setlocale() in main() activates the byte-to-wchar pipeline.
set(CURSES_NEED_WIDE TRUE)
find_package(Curses REQUIRED)
find_package(yaml-cpp REQUIRED)
42 changes: 21 additions & 21 deletions README.md
@@ -39,7 +39,7 @@ Per-machine values (secrets and the Pi's IP) live in `.env`. Copy the example an

```bash
cp .env.example .env
# Edit .env: set GEMINI_API_KEY and DRONE_PI_IP
# Edit .env: set DRONE_PI_IP
```

### Pi setup
@@ -144,7 +144,7 @@ Different mechanism (the publisher's source isn't on the GCS, so the build conte
| Service | Source | Role |
|---|---|---|
| `perception` | [`python/services/perception/`](python/services/perception/) | Subscribes to the sensor publisher's RGB+depth streams, runs YOLO11 on each frame, publishes a scene graph (label + bbox + camera-frame xyz) at `perception.tick_hz`. |
| `orchestrator` | [`python/services/orchestrator/`](python/services/orchestrator/) | Pulls voice commands from the STT service, fuses them with the latest scene + telemetry + RGB frame, calls the Gemini 2.5 Flash VLM for a tool-call decision, dispatches the calls to the flight bridge over REQ/REP. Owns the mission state machine and re-prompt loop. |
| `orchestrator` | [`python/services/orchestrator/`](python/services/orchestrator/) | Pulls voice commands from the STT service, holds the operator's current instruction, and runs a continuous control loop: each tick it hands the latest RGB frame + telemetry + instruction to a VLA backend, gets back at most one tool call, and dispatches it to the flight bridge over REQ/REP. The VLA backend is pluggable (see [`python/services/orchestrator/vla.py`](python/services/orchestrator/vla.py)); default is a no-op until a model is wired in. |
| `flight-bridge` (C++) | [`src/flight_bridge/`](src/flight_bridge/) | The system's only MAVSDK consumer. Enforces the non-overridable safety envelope ([`include/lexaire/safety.hpp`](include/lexaire/safety.hpp)) below the tool-call layer; runs the heartbeat-loss watchdog (auto RTL/HOLD); publishes telemetry + QGC liveness on the PUB stream the TUI reads. |
| `stt` | [`python/services/stt/`](python/services/stt/) | Voice command source. Modes: text-input via stdin / `--once` / `--from-file`, or `--audio-file` for pre-recorded WAV (uses `faster-whisper`). Mic capture is a follow-up. Profile-gated: `docker compose --profile tools run --rm stt --once "land"`. |
| `replay` | [`python/services/replay/`](python/services/replay/) | Field-debug tool: SUBs the live sensor channels and writes a JSONL recording (`record`), or replays one back as PUBs (`play`). Profile-gated. |
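
The `orchestrator` row describes a pluggable backend that returns at most one tool call per tick. A minimal sketch of that shape follows, assuming a Python `Protocol` with a single `step` method. The names `ToolCall`, `VLABackend`, `NoOpBackend`, and `tick` are hypothetical and are not taken from `vla.py`; `dispatch` stands in for the REQ/REP call to the flight bridge.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Protocol


@dataclass
class ToolCall:
    # Hypothetical shape of one action sent to the flight bridge.
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


class VLABackend(Protocol):
    # Hypothetical interface; the real one lives in vla.py and may differ.
    def step(self, rgb: Optional[bytes], telemetry: Dict[str, Any],
             instruction: str) -> Optional[ToolCall]:
        ...


class NoOpBackend:
    # Stand-in for the default backend: the loop runs, no action is emitted.
    def step(self, rgb, telemetry, instruction):
        return None


def tick(backend: VLABackend, rgb: Optional[bytes], telemetry: Dict[str, Any],
         instruction: str, dispatch: Callable[[ToolCall], None]) -> bool:
    # One control tick: ask the backend, forward at most one tool call.
    call = backend.step(rgb, telemetry, instruction)
    if call is None:
        return False
    dispatch(call)
    return True
```

Because `tick` depends only on the `step` signature, swapping the no-op for a real model would not, under these assumptions, require touching the dispatch path.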
@@ -157,19 +157,19 @@ The sensor publisher (default: [`RS-L515-Docker`](https://github.com/9LogM/RS-L5
```
┌──────────────────────────────────────────────────────────────────────┐
│ │
│ STT ──── PUSH ────► orchestrator ──── REQ/REP ────► flight-bridge │
│ (voice) ▲ │ │
│ │ MAVSDK │
│ scene telemetry ▼ │
│ autopilot │
│ PUB PUB │
│ │
│ │
Gemini │
│ (RGB)│ │ │
│ L515 ──► perception ──────┘ │
│ flight-bridge ──────────────────
│ (voice) │ │
│ MAVSDK │
telemetry ▼ │
autopilot │
PUB
VLA │
│ (RGB+lang)│ │
│ L515 ──► perception ──────────┘
│ │
│ flight-bridge ──────────────────────────────────────────────────────
│ │
└──────────────────────────────────────────────────────────────────────┘
```
@@ -180,23 +180,23 @@ Project-shared defaults live in [`common/config.yaml`](common/config.yaml); secr

- `sensor.publisher_repo` — URL of the docker-based ZMQ sensor publisher repo. The TUI auto-clones it on the Pi and keeps it in sync with origin (default L515).
- `sensor.channels.{rgb,depth,imu,infrared,confidence}` — ZMQ endpoints the publisher exposes. Each can be left blank to disable that stream; perception/orchestrator require `rgb` and `depth` and fail at startup if either is blank, while replay subscribes to whichever are non-empty.
- `perception.vlm.{provider,model,api_key_env,temperature}` — currently `gemini` with `gemini-2.5-flash`. Requires `GEMINI_API_KEY` in `.env`.
- `perception.detector.{model,weights,score_threshold,...}` — YOLO11; `yolo11n.pt` is auto-downloaded on first run.
- `perception.vla.{model_path,device}` — VLA backend to run inside the orchestrator. `model_path` left blank gives a no-op backend (control loop runs, no actions emitted) so the stack boots without a model. Wire your chosen VLA in [`python/services/orchestrator/vla.py`](python/services/orchestrator/vla.py).
- `perception.detector.{model,weights,score_threshold,...}` — YOLO11; `yolo11n.pt` is auto-downloaded on first run. Detector output is observability-only under VLA mode (the orchestrator does not subscribe to scene messages).
- `safety.{max_altitude_m,geofence_radius_m,max_velocity_mps,require_spoken_arm,heartbeat_loss_action,heartbeat_loss_threshold_s}` — non-overridable bridge-side gate plus heartbeat-loss recovery thresholds.
- `orchestrator.{mission_max_steps,telemetry_history_seconds}` — mission re-prompt loop cap and telemetry ring-buffer depth fed to the VLM.
- `stt.{abort_keyword,whisper_model,whisper_device,...}` — voice command settings; the abort keyword (default `"abort"`) short-circuits the VLM and goes straight to the abort tool.
- `orchestrator.{control_hz,frame_max_age_s}` — VLA tick rate and the staleness gate that turns a missing/old RGB frame into an idle tick.
- `stt.{abort_keyword,whisper_model,whisper_device,...}` — voice command settings; the abort keyword (default `"abort"`) short-circuits the VLA and goes straight to the abort tool.
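
The abort short-circuit in the last bullet can be sketched as a routing step that runs before anything reaches the VLA. This is an illustration only: `route_transcript` is a hypothetical name, and the whole-word, case-insensitive match on a single-word keyword is an assumed policy, not one confirmed by the STT service.

```python
import re
from typing import Tuple


def route_transcript(text: str, abort_keyword: str = "abort") -> Tuple[str, str]:
    # Hypothetical helper. Returns ("abort", "") when the keyword is spoken,
    # otherwise ("instruction", text) so the caller can hold it for the VLA.
    # Assumed policy: single-word keyword, whole-word, case-insensitive match.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if abort_keyword.lower() in words:
        return ("abort", "")
    return ("instruction", text.strip())
```

Matching on whole words keeps an utterance such as "abortive" from triggering the abort tool. A multi-word keyword would need a different match.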

`.env` (copy from `.env.example`):

- `GEMINI_API_KEY` — required when `perception.vlm.provider == "gemini"`.
- `DRONE_PI_IP` — the Pi's IP address. Compose substitutes it into `extra_hosts` so every container resolves `drone.local`.
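
To illustrate how the `safety.{max_altitude_m,geofence_radius_m,max_velocity_mps}` keys listed above could gate a velocity command, here is a Python sketch. The real gate is C++ in `include/lexaire/safety.hpp` and may apply different rules. `Envelope`, `check_velocity_command`, the local NED position inputs, and the specific rejection rules are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Envelope:
    # Field names mirror the config keys; the class itself is hypothetical.
    max_altitude_m: float
    geofence_radius_m: float
    max_velocity_mps: float


def check_velocity_command(env: Envelope, north_m: float, east_m: float,
                           altitude_m: float, vn: float, ve: float,
                           vd: float) -> Tuple[bool, str]:
    # Assumed rules, for illustration only.
    # 1. Reject any command whose speed exceeds the cap.
    if math.sqrt(vn * vn + ve * ve + vd * vd) > env.max_velocity_mps:
        return (False, "velocity")
    # 2. At the ceiling, reject climbing (NED: negative down-velocity is up).
    if altitude_m >= env.max_altitude_m and vd < 0:
        return (False, "altitude")
    # 3. At the fence, reject motion with an outward radial component.
    at_fence = math.hypot(north_m, east_m) >= env.geofence_radius_m
    if at_fence and (north_m * vn + east_m * ve) > 0:
        return (False, "geofence")
    return (True, "ok")
```

In this sketch a vehicle at the fence can still be commanded back inward, since only the outward radial component is rejected.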

### Roadmap

Lexaire ships in phases:

- **Phase 1** — Perception + orchestrator + flight bridge end-to-end against a desk autopilot. ✅
- **Phase 2** — First flight: multi-step missions, telemetry-aware reasoning, recovery on connection loss. See [`docs/phase-2.md`](docs/phase-2.md).
- **Phase 2** — Safety + recovery shakedown: heartbeat watchdog, frame-staleness gate, voice-arm gate, abort drains the queue, telemetry-aware safety envelope. ✅
- **VLA control loop** — Continuous-control orchestrator (RGB + held instruction → tool call at `orchestrator.control_hz`). Drop a model in [`python/services/orchestrator/vla.py`](python/services/orchestrator/vla.py).
- **Phase 3** — Live microphone capture for STT.
- **Phase 4** — RTAB-Map SLAM for persistent spatial memory ("go back to the table you saw earlier").

27 changes: 18 additions & 9 deletions common/config.yaml
@@ -40,11 +40,16 @@ sensor:

perception:
tick_hz: 2
vlm:
provider: gemini # gemini
model: gemini-2.5-flash
api_key_env: GEMINI_API_KEY
temperature: 0.2
vla:
# Path / repo / model identifier. The default backend is a no-op
# until you set this — orchestrator boots, control loop runs,
# but VLA emits no actions. See vla.py for backend wiring.
model_path: null # e.g. "openvla/openvla-7b" or a local checkpoint
device: auto # auto | cpu | cuda:0
# YOLO detector is independent of the orchestrator under VLA mode.
# The orchestrator no longer subscribes to scene messages; perception
# is kept as an opt-in observability service. Set tick_hz: 0 above
# to disable.
detector:
model: yolo # yolo
weights: yolo11n.pt # n|s|m|l|x size vs. speed
@@ -70,10 +75,14 @@ safety:
heartbeat_loss_threshold_s: 2.0

orchestrator:
# Telemetry ring-buffer depth (seconds of history retained for the VLM).
telemetry_history_seconds: 5.0
# Hard cap on a mission's re-prompt loop. Bounds Gemini calls per mission.
mission_max_steps: 10
# Continuous control loop tick rate. The VLA is invoked at this rate;
# each invocation may emit at most one tool call (set_velocity_ned in
# the common case). Typical VLA models: 5-30 Hz.
control_hz: 10.0
# Frame staleness gate. The orchestrator serves rgb=None to the VLA
# if no fresh frame has arrived within this window — closed-loop
# control on a frozen frame is unsafe.
frame_max_age_s: 0.2

services:
# Endpoints bound by the named service; everyone else connects.
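
The `frame_max_age_s` comment in the `orchestrator` block above describes a staleness gate. A minimal Python sketch of one follows, assuming arrival time is measured on a monotonic clock. `FrameSlot` and its methods are hypothetical names, and the injectable `clock` exists only to make the sketch testable.

```python
import time
from typing import Callable, Optional


class FrameSlot:
    # Hypothetical holder for the latest RGB frame and its arrival time.
    def __init__(self, max_age_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age_s = max_age_s
        self._clock = clock
        self._frame: Optional[bytes] = None
        self._stamp: Optional[float] = None

    def put(self, frame: bytes) -> None:
        # Record the frame together with the time it arrived.
        self._frame = frame
        self._stamp = self._clock()

    def fresh(self) -> Optional[bytes]:
        # Return the frame only if it arrived within max_age_s; else None,
        # which the caller would pass to the VLA as rgb=None (an idle tick).
        if self._stamp is None:
            return None
        if self._clock() - self._stamp > self._max_age_s:
            return None
        return self._frame
```

With the values shown in this hunk, `control_hz: 10.0` gives a 0.1 s tick period, so a `frame_max_age_s` of 0.2 lets a frame serve roughly two ticks before the gate closes.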
98 changes: 0 additions & 98 deletions docs/phase-2.md

This file was deleted.

2 changes: 1 addition & 1 deletion include/lexaire/safety.hpp
@@ -1,7 +1,7 @@
#pragma once

// Safety envelope enforced by the flight bridge below the tool-call layer.
// Regardless of what the VLM decides, a tool call that violates the envelope
// Regardless of what the VLA decides, a tool call that violates the envelope
// is rejected.

#include <atomic>