26 changes: 25 additions & 1 deletion .github/workflows/ci.yml
@@ -28,4 +28,28 @@ jobs:
        run: ruff check src tests

      - name: Tests
        run: pytest tests -v
        run: pytest tests -v -m "not functional"

  functional:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -e ".[dev]"

      - name: Set up QEMU (aarch64 emulation)
        uses: docker/setup-qemu-action@v3

      - name: Install Podman
        uses: redhat-actions/podman-install@main
        with:
          ubuntu-version: "24.04"

      - name: Functional tests
        run: pytest tests -v -m functional
56 changes: 48 additions & 8 deletions README.md
@@ -16,16 +16,17 @@ When working on embedded boards (like Yocto or Buildroot builds), you often face

## How it Works

1. **The Host Server:** When you run `sshq`, it spins up a lightweight local web server in the background on your laptop. This server holds your API key and talks to Groq if `GROQ_API_KEY` is set, otherwise to Gemini.
1. **The Host Server:** When you run `sshq`, it spins up a lightweight local web server in the background on your laptop. This server talks to your chosen AI backend: local (RamaLama/Ollama) if `SSHQ_USE_LOCAL=1` is set; otherwise Groq if `GROQ_API_KEY` is set; otherwise Gemini.
2. **The Reverse Tunnel:** `sshq` wraps your standard `ssh` command and adds a reverse port forward to a random local port, creating a secure tunnel from the board back to your laptop.
3. **Transparent Injection:** During login, `sshq` sends the `q` client script to the board as a Python one-liner, installs it at `~/.local/bin/q`, and immediately hands you an interactive shell.
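In outline, steps 1 and 2 can be sketched as follows (a simplified illustration, not the exact `cli.py` logic; the target address is hypothetical and the real flag set and script delivery differ):

```python
import socket

# Step 1: pick a free local port for the host server.
with socket.socket() as s:
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]

# Step 2: -R <port>:127.0.0.1:<port> makes the board's localhost port
# reach the laptop's server through the SSH connection.
ssh_cmd = ["ssh", "-R", f"{port}:127.0.0.1:{port}", "root@192.168.1.100"]
print(ssh_cmd)
```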

## Prerequisites

* Python 3.9 or higher (on your host machine).
* An API key for at least one supported AI provider:
* **Groq** (free tier): get a key from [Groq Console](https://console.groq.com/). If set, `GROQ_API_KEY` is used first.
* **Gemini** (default otherwise): get a key from [Google AI Studio](https://aistudio.google.com/) or Google Cloud console.
* An AI backend (choose one):
* **Local model** (RamaLama, Ollama, llama.cpp): set `SSHQ_USE_LOCAL=1` and run a local OpenAI-compatible server (see [Local / RamaLama](#local--ramalama) below).
* **Groq** (free tier): get a key from [Groq Console](https://console.groq.com/). Used when `GROQ_API_KEY` is set and local mode is not enabled.
* **Gemini**: get a key from [Google AI Studio](https://aistudio.google.com/) or Google Cloud console (used when neither local nor Groq is set).
* Python 3 installed on the target embedded board (standard library only; no external packages required).

## Installation
@@ -39,15 +39,23 @@ pip install git+https://github.com/pridolfi/sshq.git
(Note: You can also clone the repo and use `pip install -e .` if you plan to modify the code).

## Usage
1. Export your API key in your terminal (or add it to your `~/.bashrc` / `~/.zshrc`). If `GROQ_API_KEY` is set it is used; otherwise `GEMINI_API_KEY` is required.
1. Configure your AI backend (see [Environment variables](#environment-variables)). For cloud backends, export the API key; for local, set `SSHQ_USE_LOCAL=1` and ensure a local server is running.

**Local (RamaLama / Ollama / llama.cpp):**

```bash
export SSHQ_USE_LOCAL=1
# Optional: export SSHQ_LOCAL_BASE_URL="http://127.0.0.1:8080/v1"
# Optional: export SSHQ_LOCAL_MODEL="tinyllama"
```

**Groq** (free tier):

```bash
export GROQ_API_KEY="your_groq_api_key_here"
```

**Gemini** (used when `GROQ_API_KEY` is not set):
**Gemini** (used when local and Groq are not set):

```bash
export GEMINI_API_KEY="your_gemini_api_key_here"
@@ -123,11 +123,42 @@ CPU Features:
- asimddp: Advanced SIMD Dot Product - SIMD instructions for dot product operations, useful for machine learning workloads.
```

## Local / RamaLama

You can run inference entirely on your machine using [RamaLama](https://ramalama.ai/) (or any OpenAI-compatible server like Ollama or llama.cpp). No API keys are required.

**Managed mode (no extra commands):** If you set `SSHQ_USE_LOCAL=1` and do **not** set `SSHQ_LOCAL_BASE_URL`, sshq will start RamaLama automatically when you connect (on port 8080, or `SSHQ_RAMALAMA_PORT`) and stop it when you disconnect. Install RamaLama once, then just run sshq:

```bash
curl -fsSL https://ramalama.ai/install.sh | bash
export SSHQ_USE_LOCAL=1
sshq root@192.168.1.100
```

The first connection may take a minute while the model loads. If port 8080 is already in use (e.g. you started `ramalama serve` yourself), sshq uses that server and does not stop it on exit.
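If you are unsure whether a server is already listening on the port, you can probe it the same way sshq's readiness check does (a standalone sketch mirroring the check in `cli.py`):

```python
import socket

def is_port_in_use(host: str, port: int) -> bool:
    # A TCP connect that succeeds means something is already serving there.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        try:
            s.connect((host, port))
            return True
        except OSError:
            return False

print(is_port_in_use("127.0.0.1", 8080))
```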

**Manual mode:** To run RamaLama yourself and have sshq only connect to it, set both:

```bash
export SSHQ_USE_LOCAL=1
export SSHQ_LOCAL_BASE_URL="http://127.0.0.1:8080/v1"
# Optional: SSHQ_LOCAL_MODEL defaults to llama3.2:1b.
sshq root@192.168.1.100
```

**Behavior with small local models:** Command responses are post-processed: markdown code blocks (e.g. `\`\`\`bash ... \`\`\``) are stripped and only the single shell command is returned. Analysis replies are limited in length to reduce repetitive run-on output. For best results, use a 1B+ instruction-tuned model (e.g. Llama 3.2 1B Instruct, SmolLM2, Phi-2) when your hardware allows.
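As an illustration, the stripping step behaves roughly like this (a standalone sketch of the fenced-block handling; the server's actual `_extract_command` also filters out prose lines):

```python
import re

def extract_command(raw: str) -> str:
    # Take the first line of the first ```bash/```sh fenced block, if any;
    # otherwise fall back to the first line of the reply.
    text = raw.strip()
    match = re.search(r"```(?:bash|sh)?\s*\n(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if match:
        first = match.group(1).strip().split("\n")[0].strip()
        if first:
            return first
    return text.split("\n")[0].strip() if text else ""

print(extract_command("```bash\nuname -r\n```"))  # → uname -r
```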

The local backend uses the same OpenAI `chat/completions` API that RamaLama’s default llama.cpp server exposes, so no extra dependencies are needed beyond the existing `openai` package.
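For reference, the request body the local backend sends is the standard chat-completions shape; the endpoint and defaults below are the documented ones, while the prompt text is purely illustrative:

```python
import json

# Defaults match SSHQ_LOCAL_BASE_URL and SSHQ_LOCAL_MODEL.
endpoint = "http://127.0.0.1:8080/v1/chat/completions"
payload = {
    "model": "llama3.2:1b",
    "messages": [
        {"role": "system", "content": "Reply with exactly one shell command."},
        {"role": "user", "content": "show kernel version"},
    ],
    "temperature": 1e-8,
}
print(json.dumps(payload, indent=2))
```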

## Environment variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GROQ_API_KEY` | No (tried first) | — | Your Groq API key (free at [console.groq.com](https://console.groq.com)). If set, Groq is used. |
| `GEMINI_API_KEY` | Yes (if Groq not set) | — | Your Gemini API key. |
| `SSHQ_USE_LOCAL` | No | — | Set to `1`, `true`, or `yes` to use a local OpenAI-compatible server (e.g. RamaLama). |
| `SSHQ_LOCAL_BASE_URL` | No | — | If unset with `SSHQ_USE_LOCAL=1`, sshq starts and stops RamaLama for you. If set, sshq uses this URL and does not manage the server. |
| `SSHQ_RAMALAMA_PORT` | No | `8080` | Port for sshq-managed RamaLama (only when `SSHQ_LOCAL_BASE_URL` is unset). |
| `SSHQ_LOCAL_MODEL` | No | `llama3.2:1b` | Model to serve (managed mode) or model name sent to the API (manual mode). Note the colon tag format, e.g. `llama3.2:1b`. |
| `GROQ_API_KEY` | No (after local) | — | Your Groq API key (free at [console.groq.com](https://console.groq.com)). If set and local not used, Groq is used. |
| `GEMINI_API_KEY` | Yes (if neither local nor Groq) | — | Your Gemini API key. |
| `SSHQ_GEMINI_MODEL` | No | `gemini-2.5-flash` | Gemini model (e.g. `gemini-2.5-flash-lite` for higher quota). |
| `SSHQ_GROQ_MODEL` | No | `llama-3.3-70b-versatile` | Groq model (e.g. `llama-3.1-8b-instant` for faster replies). |
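The precedence encoded in the table above can be exercised with a small standalone sketch (it reimplements the selection order rather than importing sshq):

```python
def pick_backend(env: dict) -> str:
    # Priority: local > Groq > Gemini, as in sshq's get_backend().
    if env.get("SSHQ_USE_LOCAL", "").lower() in ("1", "true", "yes"):
        return "local"
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("GEMINI_API_KEY"):
        return "gemini"
    raise ValueError("Set SSHQ_USE_LOCAL=1, GROQ_API_KEY, or GEMINI_API_KEY.")

print(pick_backend({"SSHQ_USE_LOCAL": "1", "GROQ_API_KEY": "k"}))  # → local
print(pick_backend({"GROQ_API_KEY": "k", "GEMINI_API_KEY": "k"}))  # → groq
```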
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -27,6 +27,9 @@ sshq = "sshq.cli:main"

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = [
    "functional: marks tests as functional (require Podman and an aarch64 container)",
]
filterwarnings = [
    "ignore::DeprecationWarning:google.genai.*",
]
63 changes: 49 additions & 14 deletions src/sshq/backends.py
@@ -1,25 +1,28 @@
"""AI provider backends for sshq. Each backend implements generate(prompt, system_instruction, temperature)."""
"""AI provider backends for sshq. Each backend implements generate(prompt, system_instruction, temperature, max_tokens)."""
import os


def _gemini_generate(prompt, system_instruction, temperature=0.0):
def _gemini_generate(prompt, system_instruction, temperature=0.0, max_tokens=None):
from google import genai
from google.genai import types

client = genai.Client()
model = os.environ.get("SSHQ_GEMINI_MODEL", "gemini-2.5-flash")
config = types.GenerateContentConfig(
system_instruction=system_instruction,
temperature=temperature,
)
if max_tokens is not None:
config.max_output_tokens = max_tokens
response = client.models.generate_content(
model=model,
contents=prompt,
config=types.GenerateContentConfig(
system_instruction=system_instruction,
temperature=temperature,
),
config=config,
)
return response.text.strip()


def _groq_generate(prompt, system_instruction, temperature=0.0):
def _groq_generate(prompt, system_instruction, temperature=0.0, max_tokens=None):
from openai import OpenAI

client = OpenAI(
Expand All @@ -29,23 +32,55 @@ def _groq_generate(prompt, system_instruction, temperature=0.0):
model = os.environ.get("SSHQ_GROQ_MODEL", "llama-3.3-70b-versatile")
# Groq converts temperature=0 to 1e-8; use a tiny value for deterministic output
t = max(1e-8, temperature)
response = client.chat.completions.create(
model=model,
messages=[
kwargs = {
"model": model,
"messages": [
{"role": "system", "content": system_instruction},
{"role": "user", "content": prompt},
],
temperature=t,
)
"temperature": t,
}
if max_tokens is not None:
kwargs["max_tokens"] = max_tokens
response = client.chat.completions.create(**kwargs)
return (response.choices[0].message.content or "").strip()


def _local_generate(prompt, system_instruction, temperature=0.0, max_tokens=None):
"""OpenAI-compatible local server (e.g. RamaLama, Ollama, llama.cpp)."""
from openai import OpenAI

base_url = os.environ.get("SSHQ_LOCAL_BASE_URL", "http://127.0.0.1:8080/v1")
if not base_url.endswith("/v1"):
base_url = base_url.rstrip("/") + "/v1"
model = os.environ.get("SSHQ_LOCAL_MODEL", "llama3.2:1b")
# Many local servers treat 0 as deterministic; use small value if needed
t = max(1e-8, temperature)
client = OpenAI(base_url=base_url, api_key=os.environ.get("SSHQ_LOCAL_API_KEY") or "not-used")
kwargs = {
"model": model,
"messages": [
{"role": "system", "content": system_instruction},
{"role": "user", "content": prompt},
],
"temperature": t,
}
if max_tokens is not None:
kwargs["max_tokens"] = max_tokens
response = client.chat.completions.create(**kwargs)
return (response.choices[0].message.content or "").strip()


def get_backend():
"""Return the active backend function (prompt, system_instruction, temperature=0.0) -> str.
Uses Groq if GROQ_API_KEY is set, otherwise Gemini (requires GEMINI_API_KEY).
Priority: SSHQ_USE_LOCAL (RamaLama/Ollama/etc) > GROQ_API_KEY > GEMINI_API_KEY.
"""
if os.environ.get("SSHQ_USE_LOCAL", "").lower() in ("1", "true", "yes"):
return _local_generate
if os.environ.get("GROQ_API_KEY"):
return _groq_generate
if os.environ.get("GEMINI_API_KEY"):
return _gemini_generate
raise ValueError("Set GROQ_API_KEY or GEMINI_API_KEY.")
raise ValueError(
"Set SSHQ_USE_LOCAL=1 for local (RamaLama/Ollama), or GROQ_API_KEY, or GEMINI_API_KEY."
)
69 changes: 67 additions & 2 deletions src/sshq/cli.py
@@ -4,9 +4,61 @@
import subprocess
import threading
import base64
import time
from importlib.metadata import version
from .server import start_server

# Container name for sshq-managed RamaLama (so we can stop it on exit)
RAMALAMA_CONTAINER_NAME = "sshq-ramalama"


def _is_port_in_use(host: str, port: int) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.settimeout(0.5)
            s.connect((host, port))
            return True
        except OSError:
            return False


def _wait_for_port(host: str, port: int, timeout_sec: float = 120, interval_sec: float = 1.0) -> bool:
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if _is_port_in_use(host, port):
            return True
        time.sleep(interval_sec)
    return False


def _start_ramalama(port: int, model: str) -> bool:
    """Start RamaLama serve in the background. Return True if we started it, False if port was in use."""
    if _is_port_in_use("127.0.0.1", port):
        return False
    try:
        subprocess.run(
            ["ramalama", "serve", "-d", "-p", str(port), "--name", RAMALAMA_CONTAINER_NAME, model],
            check=True,
            capture_output=True,
            timeout=300,
        )
    except FileNotFoundError:
        print("Error: 'ramalama' not found. Install it from https://ramalama.ai/ or set SSHQ_LOCAL_BASE_URL to an existing server.", file=sys.stderr)
        print("To install RamaLama, run: curl -fsSL https://ramalama.ai/install.sh | bash", file=sys.stderr)
        sys.exit(1)
    except subprocess.CalledProcessError as e:
        print(f"Error: ramalama serve failed: {e.stderr.decode() if e.stderr else e}", file=sys.stderr)
        sys.exit(1)
    if not _wait_for_port("127.0.0.1", port):
        print(f"Error: RamaLama server did not become ready in time. Try running 'ramalama serve -d -p {port} {model}' manually.", file=sys.stderr)
        subprocess.run(["ramalama", "stop", RAMALAMA_CONTAINER_NAME], capture_output=True)
        sys.exit(1)
    return True


def _stop_ramalama() -> None:
    subprocess.run(["ramalama", "stop", RAMALAMA_CONTAINER_NAME], capture_output=True, timeout=10)

Q_SCRIPT = """#!/usr/bin/env python3
import sys
import json
@@ -113,8 +113,165 @@ def main():
"""

def main():
    if not os.environ.get("GROQ_API_KEY") and not os.environ.get("GEMINI_API_KEY"):
        print("Error: Set GROQ_API_KEY or GEMINI_API_KEY.", file=sys.stderr)
    use_local = os.environ.get("SSHQ_USE_LOCAL", "").lower() in ("1", "true", "yes")
    if not use_local and not os.environ.get("GROQ_API_KEY") and not os.environ.get("GEMINI_API_KEY"):
        print("Error: Set SSHQ_USE_LOCAL=1 for local (RamaLama), or GROQ_API_KEY, or GEMINI_API_KEY.", file=sys.stderr)
        sys.exit(1)

    prog = os.path.basename(sys.argv[0])
@@ -130,6 +130,15 @@ def main():
        s.bind(("127.0.0.1", 0))
        port = s.getsockname()[1]

    # When using local (RamaLama), start the model server if not already running
    we_started_ramalama = False
    if use_local and not os.environ.get("SSHQ_LOCAL_BASE_URL"):
        ramalama_port = int(os.environ.get("SSHQ_RAMALAMA_PORT", "8080"))
        model = os.environ.get("SSHQ_LOCAL_MODEL", "llama3.2:1b")
        if _start_ramalama(ramalama_port, model):
            we_started_ramalama = True
        os.environ["SSHQ_LOCAL_BASE_URL"] = f"http://127.0.0.1:{ramalama_port}/v1"

    # Build the q script with the configured port
    q_script = Q_SCRIPT.format(port=port)

@@ -164,3 +164,226 @@
    except FileNotFoundError:
        print("Error: 'ssh' command not found.", file=sys.stderr)
        sys.exit(1)
    finally:
        if we_started_ramalama:
            _stop_ramalama()
31 changes: 27 additions & 4 deletions src/sshq/server.py
@@ -1,9 +1,30 @@
import logging
import re
import flask.cli
from flask import Flask, request, jsonify

from .backends import get_backend


def _extract_command(raw: str) -> str:
    """Extract a single shell command from model output that may include markdown or explanations."""
    text = raw.strip()
    # Prefer content inside first markdown code block (e.g. ```bash\n...\n```)
    match = re.search(r"```(?:bash|sh)?\s*\n(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if match:
        block = match.group(1).strip()
        first_line = block.split("\n")[0].strip()
        if first_line:
            return first_line
    # Otherwise take first non-empty line that looks like a command (not prose)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("To ", "The ", "Note:", "Explanation")):
            continue
        if len(line) > 2:
            return line
    return text.split("\n")[0].strip() if text else ""


# Suppress standard Werkzeug request logging
log = logging.getLogger('werkzeug')
log.setLevel(logging.ERROR)
@@ -24,12 +24,45 @@ def ask():
    system_instruction = (
        "You are an expert embedded Linux engineer. "
        "Provide ONLY the exact shell command to achieve the user's request. "
        "Do NOT use markdown formatting (like ```bash). Do NOT provide explanations."
        "Do NOT use markdown formatting (like ```bash). Do NOT provide explanations. "
        "Do NOT use sudo unless the task clearly requires root (e.g. installing system packages); prefer commands that work with the current user's permissions."
    )

    try:
        text = backend(data['prompt'], system_instruction, temperature=0.0)
        return jsonify({"command": text})
        text = backend(data['prompt'], system_instruction, temperature=0.0, max_tokens=256)
        command = _extract_command(text)
        return jsonify({"command": command or text})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

@@ -50,7 +50,7 @@ def analyze():
    contents = f"Content to analyze:\n\n{data['content']}\n\nUser question: {data['prompt']}"

    try:
        text = backend(contents, system_instruction, temperature=0.0)
        text = backend(contents, system_instruction, temperature=0.0, max_tokens=1024)
        return jsonify({"analysis": text})
    except Exception as e:
        return jsonify({"error": str(e)}), 500
12 changes: 12 additions & 0 deletions tests/docker/Dockerfile
@@ -0,0 +1,12 @@
# aarch64 Ubuntu with SSH for sshq functional tests
FROM ubuntu:24.04
RUN apt-get update -qq && DEBIAN_FRONTEND=noninteractive apt-get install -y -qq \
    openssh-server \
    python3 \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /run/sshd /root/.ssh && chmod 700 /root/.ssh
# Allow root login with key only (we mount authorized_keys at run time)
RUN sed -i 's/#PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config \
    && sed -i 's/#PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]