26 changes: 25 additions & 1 deletion .github/workflows/ci.yml
@@ -28,4 +28,28 @@ jobs:
        run: ruff check src tests

      - name: Tests
        run: pytest tests -v
        run: pytest tests -v -m "not functional"

  functional:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -e ".[dev]"

      - name: Set up QEMU (aarch64 emulation)
        uses: docker/setup-qemu-action@v3

      - name: Install Podman
        uses: redhat-actions/podman-install@main
        with:
          ubuntu-version: "24.04"

      - name: Functional tests
        run: pytest tests -v -m functional
56 changes: 48 additions & 8 deletions README.md
@@ -16,16 +16,17 @@ When working on embedded boards (like Yocto or Buildroot builds), you often face

## How it Works

1. **The Host Server:** When you run `sshq`, it spins up a lightweight local web server in the background on your laptop. This server holds your API key and talks to Groq if `GROQ_API_KEY` is set, otherwise to Gemini.
1. **The Host Server:** When you run `sshq`, it spins up a lightweight local web server in the background on your laptop. This server talks to your chosen AI backend: local (RamaLama/Ollama) if `SSHQ_USE_LOCAL=1` is set; otherwise Groq if `GROQ_API_KEY` is set; otherwise Gemini.
2. **The Reverse Tunnel:** `sshq` wraps your standard `ssh` command and adds a reverse port forward to a random local port, creating a secure tunnel from the board back to your laptop.
3. **Transparent Injection:** During login, `sshq` sends the `q` client script to the board as a Python one-liner, installs it at `~/.local/bin/q`, and immediately hands you an interactive shell.
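In outline, steps 1 and 2 can be sketched as follows (a simplified illustration, not the exact `cli.py` logic; the target address is hypothetical and the real flag set and script delivery differ):

```python
import socket

# Step 1: pick a free local port for the host server.
with socket.socket() as s:
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]

# Step 2: -R <port>:127.0.0.1:<port> makes the board's localhost port
# reach the laptop's server through the SSH connection.
ssh_cmd = ["ssh", "-R", f"{port}:127.0.0.1:{port}", "root@192.168.1.100"]
print(ssh_cmd)
```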

## Prerequisites

* Python 3.9 or higher (on your host machine).
* An API key for at least one supported AI provider:
* **Groq** (free tier): get a key from [Groq Console](https://console.groq.com/). If set, `GROQ_API_KEY` is used first.
* **Gemini** (default otherwise): get a key from [Google AI Studio](https://aistudio.google.com/) or Google Cloud console.
* An AI backend (choose one):
* **Local model** (RamaLama, Ollama, llama.cpp): set `SSHQ_USE_LOCAL=1` and run a local OpenAI-compatible server (see [Local / RamaLama](#local--ramalama) below).
* **Groq** (free tier): get a key from [Groq Console](https://console.groq.com/). Used when `GROQ_API_KEY` is set and local mode is not enabled.
* **Gemini**: get a key from [Google AI Studio](https://aistudio.google.com/) or Google Cloud console (used when neither local nor Groq is set).
* Python 3 installed on the target embedded board (standard library only; no external packages required).

## Installation
@@ -39,15 +39,23 @@ pip install git+https://github.com/pridolfi/sshq.git
(Note: You can also clone the repo and use `pip install -e .` if you plan to modify the code).

## Usage
1. Export your API key in your terminal (or add it to your `~/.bashrc` / `~/.zshrc`). If `GROQ_API_KEY` is set it is used; otherwise `GEMINI_API_KEY` is required.
1. Configure your AI backend (see [Environment variables](#environment-variables)). For cloud backends, export the API key; for local, set `SSHQ_USE_LOCAL=1` and ensure a local server is running.

**Local (RamaLama / Ollama / llama.cpp):**

```bash
export SSHQ_USE_LOCAL=1
# Optional: export SSHQ_LOCAL_BASE_URL="http://127.0.0.1:8080/v1"
# Optional: export SSHQ_LOCAL_MODEL="tinyllama"
```

**Groq** (free tier):

```bash
export GROQ_API_KEY="your_groq_api_key_here"
```

**Gemini** (used when `GROQ_API_KEY` is not set):
**Gemini** (used when local and Groq are not set):

```bash
export GEMINI_API_KEY="your_gemini_api_key_here"
@@ -123,11 +123,42 @@ CPU Features:
- asimddp: Advanced SIMD Dot Product - SIMD instructions for dot product operations, useful for machine learning workloads.
```

## Local / RamaLama

You can run inference entirely on your machine using [RamaLama](https://ramalama.ai/) (or any OpenAI-compatible server like Ollama or llama.cpp). No API keys are required.

**Managed mode (no extra commands):** If you set `SSHQ_USE_LOCAL=1` and do **not** set `SSHQ_LOCAL_BASE_URL`, sshq will start RamaLama automatically when you connect (on port 8080, or `SSHQ_RAMALAMA_PORT`) and stop it when you disconnect. Install RamaLama once, then just run sshq:

```bash
curl -fsSL https://ramalama.ai/install.sh | bash
export SSHQ_USE_LOCAL=1
sshq root@192.168.1.100
```

The first connection may take a minute while the model loads. If port 8080 is already in use (e.g. you started `ramalama serve` yourself), sshq uses that server and does not stop it on exit.
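If you are unsure whether a server is already listening on the port, you can probe it the same way sshq's readiness check does (a standalone sketch mirroring the check in `cli.py`):

```python
import socket

def is_port_in_use(host: str, port: int) -> bool:
    # A TCP connect that succeeds means something is already serving there.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        try:
            s.connect((host, port))
            return True
        except OSError:
            return False

print(is_port_in_use("127.0.0.1", 8080))
```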

**Manual mode:** To run RamaLama yourself and have sshq only connect to it, set both:

```bash
export SSHQ_USE_LOCAL=1
export SSHQ_LOCAL_BASE_URL="http://127.0.0.1:8080/v1"
# Optional: SSHQ_LOCAL_MODEL defaults to llama3.2:1b.
sshq root@192.168.1.100
```

**Behavior with small local models:** Command responses are post-processed: markdown code blocks (e.g. `\`\`\`bash ... \`\`\``) are stripped and only the single shell command is returned. Analysis replies are limited in length to reduce repetitive run-on output. For best results, use a 1B+ instruction-tuned model (e.g. Llama 3.2 1B Instruct, SmolLM2, Phi-2) when your hardware allows.
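As an illustration, the stripping step behaves roughly like this (a standalone sketch of the fenced-block handling; the server's actual `_extract_command` also filters out prose lines):

```python
import re

def extract_command(raw: str) -> str:
    # Take the first line of the first ```bash/```sh fenced block, if any;
    # otherwise fall back to the first line of the reply.
    text = raw.strip()
    match = re.search(r"```(?:bash|sh)?\s*\n(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if match:
        first = match.group(1).strip().split("\n")[0].strip()
        if first:
            return first
    return text.split("\n")[0].strip() if text else ""

print(extract_command("```bash\nuname -r\n```"))  # → uname -r
```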

The local backend uses the same OpenAI `chat/completions` API that RamaLama’s default llama.cpp server exposes, so no extra dependencies are needed beyond the existing `openai` package.
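For reference, the request body the local backend sends is the standard chat-completions shape; the endpoint and defaults below are the documented ones, while the prompt text is purely illustrative:

```python
import json

# Defaults match SSHQ_LOCAL_BASE_URL and SSHQ_LOCAL_MODEL.
endpoint = "http://127.0.0.1:8080/v1/chat/completions"
payload = {
    "model": "llama3.2:1b",
    "messages": [
        {"role": "system", "content": "Reply with exactly one shell command."},
        {"role": "user", "content": "show kernel version"},
    ],
    "temperature": 1e-8,
}
print(json.dumps(payload, indent=2))
```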

## Environment variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GROQ_API_KEY` | No (tried first) | — | Your Groq API key (free at [console.groq.com](https://console.groq.com)). If set, Groq is used. |
| `GEMINI_API_KEY` | Yes (if Groq not set) | — | Your Gemini API key. |
| `SSHQ_USE_LOCAL` | No | — | Set to `1`, `true`, or `yes` to use a local OpenAI-compatible server (e.g. RamaLama). |
| `SSHQ_LOCAL_BASE_URL` | No | — | If unset with `SSHQ_USE_LOCAL=1`, sshq starts and stops RamaLama for you. If set, sshq uses this URL and does not manage the server. |
| `SSHQ_RAMALAMA_PORT` | No | `8080` | Port for sshq-managed RamaLama (only when `SSHQ_LOCAL_BASE_URL` is unset). |
| `SSHQ_LOCAL_MODEL` | No | `llama3.2:1b` | Model to serve (managed mode) or model name sent to the API (manual mode). Note the colon tag format, e.g. `llama3.2:1b`. |
| `GROQ_API_KEY` | No (after local) | — | Your Groq API key (free at [console.groq.com](https://console.groq.com)). If set and local not used, Groq is used. |
| `GEMINI_API_KEY` | Yes (if neither local nor Groq) | — | Your Gemini API key. |
| `SSHQ_GEMINI_MODEL` | No | `gemini-2.5-flash` | Gemini model (e.g. `gemini-2.5-flash-lite` for higher quota). |
| `SSHQ_GROQ_MODEL` | No | `llama-3.3-70b-versatile` | Groq model (e.g. `llama-3.1-8b-instant` for faster replies). |
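The precedence encoded in the table above can be exercised with a small standalone sketch (it reimplements the selection order rather than importing sshq):

```python
def pick_backend(env: dict) -> str:
    # Priority: local > Groq > Gemini, as in sshq's get_backend().
    if env.get("SSHQ_USE_LOCAL", "").lower() in ("1", "true", "yes"):
        return "local"
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("GEMINI_API_KEY"):
        return "gemini"
    raise ValueError("Set SSHQ_USE_LOCAL=1, GROQ_API_KEY, or GEMINI_API_KEY.")

print(pick_backend({"SSHQ_USE_LOCAL": "1", "GROQ_API_KEY": "k"}))  # → local
print(pick_backend({"GROQ_API_KEY": "k", "GEMINI_API_KEY": "k"}))  # → groq
```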
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -27,6 +27,9 @@ sshq = "sshq.cli:main"

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "functional: marks tests as functional (require Podman and an aarch64 container)",
]
filterwarnings = [
    "ignore::DeprecationWarning:google.genai.*",
]
63 changes: 49 additions & 14 deletions src/sshq/backends.py
@@ -1,25 +1,28 @@
"""AI provider backends for sshq. Each backend implements generate(prompt, system_instruction, temperature)."""
"""AI provider backends for sshq. Each backend implements generate(prompt, system_instruction, temperature, max_tokens)."""
import os


def _gemini_generate(prompt, system_instruction, temperature=0.0):
def _gemini_generate(prompt, system_instruction, temperature=0.0, max_tokens=None):
from google import genai
from google.genai import types

client = genai.Client()
model = os.environ.get("SSHQ_GEMINI_MODEL", "gemini-2.5-flash")
config = types.GenerateContentConfig(
system_instruction=system_instruction,
temperature=temperature,
)
if max_tokens is not None:
config.max_output_tokens = max_tokens
response = client.models.generate_content(
model=model,
contents=prompt,
config=types.GenerateContentConfig(
system_instruction=system_instruction,
temperature=temperature,
),
config=config,
)
return response.text.strip()


def _groq_generate(prompt, system_instruction, temperature=0.0):
def _groq_generate(prompt, system_instruction, temperature=0.0, max_tokens=None):
from openai import OpenAI

client = OpenAI(
Expand All @@ -29,23 +32,55 @@ def _groq_generate(prompt, system_instruction, temperature=0.0):
model = os.environ.get("SSHQ_GROQ_MODEL", "llama-3.3-70b-versatile")
# Groq converts temperature=0 to 1e-8; use a tiny value for deterministic output
t = max(1e-8, temperature)
response = client.chat.completions.create(
model=model,
messages=[
kwargs = {
"model": model,
"messages": [
{"role": "system", "content": system_instruction},
{"role": "user", "content": prompt},
],
temperature=t,
)
"temperature": t,
}
if max_tokens is not None:
kwargs["max_tokens"] = max_tokens
response = client.chat.completions.create(**kwargs)
return (response.choices[0].message.content or "").strip()


def _local_generate(prompt, system_instruction, temperature=0.0, max_tokens=None):
"""OpenAI-compatible local server (e.g. RamaLama, Ollama, llama.cpp)."""
from openai import OpenAI

base_url = os.environ.get("SSHQ_LOCAL_BASE_URL", "http://127.0.0.1:8080/v1")
if not base_url.endswith("/v1"):
base_url = base_url.rstrip("/") + "/v1"
model = os.environ.get("SSHQ_LOCAL_MODEL", "llama3.2:1b")
# Many local servers treat 0 as deterministic; use small value if needed
t = max(1e-8, temperature)
client = OpenAI(base_url=base_url, api_key=os.environ.get("SSHQ_LOCAL_API_KEY") or "not-used")
kwargs = {
"model": model,
"messages": [
{"role": "system", "content": system_instruction},
{"role": "user", "content": prompt},
],
"temperature": t,
}
if max_tokens is not None:
kwargs["max_tokens"] = max_tokens
response = client.chat.completions.create(**kwargs)
return (response.choices[0].message.content or "").strip()


def get_backend():
"""Return the active backend function (prompt, system_instruction, temperature=0.0) -> str.
Uses Groq if GROQ_API_KEY is set, otherwise Gemini (requires GEMINI_API_KEY).
Priority: SSHQ_USE_LOCAL (RamaLama/Ollama/etc) > GROQ_API_KEY > GEMINI_API_KEY.
"""
if os.environ.get("SSHQ_USE_LOCAL", "").lower() in ("1", "true", "yes"):
return _local_generate
if os.environ.get("GROQ_API_KEY"):
return _groq_generate
if os.environ.get("GEMINI_API_KEY"):
return _gemini_generate
raise ValueError("Set GROQ_API_KEY or GEMINI_API_KEY.")
raise ValueError(
"Set SSHQ_USE_LOCAL=1 for local (RamaLama/Ollama), or GROQ_API_KEY, or GEMINI_API_KEY."
)
69 changes: 67 additions & 2 deletions src/sshq/cli.py
@@ -4,9 +4,61 @@
import subprocess
import threading
import base64
import time
from importlib.metadata import version
from .server import start_server

# Container name for sshq-managed RamaLama (so we can stop it on exit)
RAMALAMA_CONTAINER_NAME = "sshq-ramalama"


def _is_port_in_use(host: str, port: int) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.settimeout(0.5)
            s.connect((host, port))
            return True
        except OSError:
            return False


def _wait_for_port(host: str, port: int, timeout_sec: float = 120, interval_sec: float = 1.0) -> bool:
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if _is_port_in_use(host, port):
            return True
        time.sleep(interval_sec)
    return False


def _start_ramalama(port: int, model: str) -> bool:
    """Start RamaLama serve in the background. Return True if we started it, False if port was in use."""
    if _is_port_in_use("127.0.0.1", port):
        return False
    try:
        subprocess.run(
            ["ramalama", "serve", "-d", "-p", str(port), "--name", RAMALAMA_CONTAINER_NAME, model],
            check=True,
            capture_output=True,
            timeout=300,
        )
    except FileNotFoundError:
        print("Error: 'ramalama' not found. Install it from https://ramalama.ai/ or set SSHQ_LOCAL_BASE_URL to an existing server.", file=sys.stderr)
        print("To install RamaLama, run: curl -fsSL https://ramalama.ai/install.sh | bash", file=sys.stderr)
        sys.exit(1)
    except subprocess.CalledProcessError as e:
        print(f"Error: ramalama serve failed: {e.stderr.decode() if e.stderr else e}", file=sys.stderr)
        sys.exit(1)
    if not _wait_for_port("127.0.0.1", port):
        print(f"Error: RamaLama server did not become ready in time. Try running 'ramalama serve -d -p {port} {model}' manually.", file=sys.stderr)
        subprocess.run(["ramalama", "stop", RAMALAMA_CONTAINER_NAME], capture_output=True)
        sys.exit(1)
    return True


def _stop_ramalama() -> None:
    subprocess.run(["ramalama", "stop", RAMALAMA_CONTAINER_NAME], capture_output=True, timeout=10)

Q_SCRIPT = """#!/usr/bin/env python3
import sys
import json
@@ -113,8 +113,165 @@ def main():
"""

def main():
    if not os.environ.get("GROQ_API_KEY") and not os.environ.get("GEMINI_API_KEY"):
        print("Error: Set GROQ_API_KEY or GEMINI_API_KEY.", file=sys.stderr)
    use_local = os.environ.get("SSHQ_USE_LOCAL", "").lower() in ("1", "true", "yes")
    if not use_local and not os.environ.get("GROQ_API_KEY") and not os.environ.get("GEMINI_API_KEY"):
        print("Error: Set SSHQ_USE_LOCAL=1 for local (RamaLama), or GROQ_API_KEY, or GEMINI_API_KEY.", file=sys.stderr)
        sys.exit(1)

    prog = os.path.basename(sys.argv[0])
@@ -130,6 +130,15 @@ def main():
        s.bind(("127.0.0.1", 0))
        port = s.getsockname()[1]

    # When using local (RamaLama), start the model server if not already running
    we_started_ramalama = False
    if use_local and not os.environ.get("SSHQ_LOCAL_BASE_URL"):
        ramalama_port = int(os.environ.get("SSHQ_RAMALAMA_PORT", "8080"))
        model = os.environ.get("SSHQ_LOCAL_MODEL", "llama3.2:1b")
        if _start_ramalama(ramalama_port, model):
            we_started_ramalama = True
        os.environ["SSHQ_LOCAL_BASE_URL"] = f"http://127.0.0.1:{ramalama_port}/v1"

    # Build the q script with the configured port
    q_script = Q_SCRIPT.format(port=port)

@@ -164,3 +164,226 @@
    except FileNotFoundError:
        print("Error: 'ssh' command not found.", file=sys.stderr)
        sys.exit(1)
    finally:
        if we_started_ramalama:
            _stop_ramalama()
31 changes: 27 additions & 4 deletions src/sshq/server.py
@@ -1,9 +1,30 @@
import logging
import re
import flask.cli
from flask import Flask, request, jsonify

from .backends import get_backend


def _extract_command(raw: str) -> str:
    """Extract a single shell command from model output that may include markdown or explanations."""
    text = raw.strip()
    # Prefer content inside first markdown code block (e.g. ```bash\n...\n```)
    match = re.search(r"```(?:bash|sh)?\s*\n(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if match:
        block = match.group(1).strip()
        first_line = block.split("\n")[0].strip()
        if first_line:
            return first_line
    # Otherwise take first non-empty line that looks like a command (not prose)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("To ", "The ", "Note:", "Explanation")):
            continue
        if len(line) > 2:
            return line
    return text.split("\n")[0].strip() if text else ""


# Suppress standard Werkzeug request logging
log = logging.getLogger('werkzeug')
log.setLevel(logging.ERROR)
@@ -24,12 +24,45 @@ def ask():
    system_instruction = (
        "You are an expert embedded Linux engineer. "
        "Provide ONLY the exact shell command to achieve the user's request. "
        "Do NOT use markdown formatting (like ```bash). Do NOT provide explanations."
        "Do NOT use markdown formatting (like ```bash). Do NOT provide explanations. "
        "Do NOT use sudo unless the task clearly requires root (e.g. installing system packages); prefer commands that work with the current user's permissions."
    )

    try:
        text = backend(data['prompt'], system_instruction, temperature=0.0)
        return jsonify({"command": text})
        text = backend(data['prompt'], system_instruction, temperature=0.0, max_tokens=256)
        command = _extract_command(text)
        return jsonify({"command": command or text})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

@@ -50,7 +50,7 @@ def analyze():
    contents = f"Content to analyze:\n\n{data['content']}\n\nUser question: {data['prompt']}"

    try:
        text = backend(contents, system_instruction, temperature=0.0)
        text = backend(contents, system_instruction, temperature=0.0, max_tokens=1024)
        return jsonify({"analysis": text})
    except Exception as e:
        return jsonify({"error": str(e)}), 500
12 changes: 12 additions & 0 deletions tests/docker/Dockerfile
@@ -0,0 +1,12 @@
# aarch64 Ubuntu with SSH for sshq functional tests
FROM ubuntu:24.04
RUN apt-get update -qq && DEBIAN_FRONTEND=noninteractive apt-get install -y -qq \
    openssh-server \
    python3 \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /run/sshd /root/.ssh && chmod 700 /root/.ssh
# Allow root login with key only (we mount authorized_keys at run time)
RUN sed -i 's/#PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config \
    && sed -i 's/#PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]