This guide covers the manual steps for a fresh Fedora installation on an AMD Strix Halo system, followed by an automated setup script that handles everything from kernel configuration through service deployment.
Two-phase approach:
- Manual (this document): Fedora install, disk setup, clone the repo, bootstrap Python
- Automated (`scripts/setup_strixhalo.py`): Everything else — GTT memory, toolboxes, llama.cpp servers, Python venv, PostgreSQL, OpenSearch, MCP server, firewall, VSCode
- Hardware: AMD Ryzen AI MAX+ 395 "Strix Halo" (gfx1151)
- RAM: 128 GB LPDDR5x (unified CPU+GPU memory)
- System Disk: 1 TB (Fedora OS, `/`)
- Data Disk: 4 TB (models, Wikipedia pipeline, project repo — mounted at `/mnt/data` by default, configurable via `DEEPRED_ROOT`)
- OS: Fedora 43
| Component | Version | Notes |
|---|---|---|
| OS | Fedora 43 | |
| Linux Kernel | 6.18.6-200+ | Kernels < 6.18.4 have gfx1151 bugs — avoid them (AMD reference) |
| Linux Firmware | 20260110+ | Do NOT use linux-firmware-20251125 — breaks ROCm on Strix Halo |
| ROCm (toolbox) | 7.2 (AMD repo) | Latest stable; kernel 6.18.4+ compatibility. ROCm 6.4.4 available as fallback. |
⚠️ Critical: The kernel, firmware, and ROCm versions must be compatible. ROCm 7.1.1 is incompatible with kernels ≥ 6.18.4 and has been deprecated. Always use ROCm 7.2+ with modern kernels. ROCm 6.4.4 is available as a fallback if you encounter regressions — change the image tag in the setup script.

Kernel patches: Two AMD KFD driver commits are required for correct queue creation and memory availability checks on gfx1151. These are merged upstream in Linux 6.18.4+ (`7f26af7`, `7445db6`). Fedora 43 includes them natively.
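Before proceeding, it can help to check the installed versions against this matrix. A minimal sketch, assuming GNU coreutils `sort -V` for version comparison:

```bash
# Hedged sanity check against the compatibility matrix above
req="6.18.4"
cur="$(uname -r | cut -d- -f1)"   # e.g., 6.18.6
if [ "$(printf '%s\n' "$req" "$cur" | sort -V | head -n1)" = "$req" ]; then
  echo "kernel $cur OK (>= $req)"
else
  echo "kernel $cur too old; upgrade before continuing"
fi
rpm -q linux-firmware | grep -q 20251125 && echo "WARNING: broken firmware build installed"
```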
| Factor | Ubuntu 25.10 | Fedora 43 |
|---|---|---|
| Kernel | 6.14+ | 6.18+ (critical for Strix Halo stability) |
| AMD GPU support | Requires manual ROCm repo setup | Strong out-of-box AMD support |
| Toolbox/Podman | Available but not default | First-class citizen (pre-installed) |
| ROCm | Manual repo + pinning | Available via native Fedora packages or AMD repos |
| Issue | LM Studio | llama.cpp server |
|---|---|---|
| Server management | Requires Xvfb + VNC + AppImage | Native CLI daemon, simple systemd unit |
| Updates | Manual AppImage download | git pull && cmake --build |
| Resource usage | Electron app + GUI in memory | Minimal — just the inference engine |
| OpenAI compatibility | ✅ /v1/chat/completions | ✅ Same endpoints, same API |
Tip: Before editing system config files (`/etc/fstab`, `/etc/default/grub`, etc.), back them up:

sudo cp /etc/fstab /etc/fstab.bak
- Download: Fedora Workstation or Fedora Server
- Create USB: Fedora Media Writer, Rufus (Windows), or `dd`
- Install to the 1 TB system disk using the Fedora installer
After the initial install (via KVM or local console), enable SSH so all remaining work can be done remotely:
# Install and enable SSH server
sudo dnf install -y openssh-server
sudo systemctl enable --now sshd
# Open SSH in firewall (Fedora Workstation has firewalld active by default)
sudo firewall-cmd --permanent --add-service=ssh
sudo firewall-cmd --reload
# Verify SSH is listening
ss -tlnp | grep :22

From this point on, you can disconnect KVM and work entirely via SSH:
ssh your-user@fedora
Fedora defaults the hostname to fedora. Rename it to MiniAI:
sudo hostnamectl set-hostname MiniAI

Verify the change:

hostnamectl

The new hostname takes effect immediately for hostnamectl and DNS, but your shell prompt will update after a new login. From now on you can SSH in with:

ssh your-user@MiniAI

# Update system (critical: ensures kernel 6.18.4+ and firmware 20260110+)
sudo dnf upgrade --refresh -y
# ⚠️ Reboot after kernel/firmware update
sudo reboot

After reboot (reconnect via SSH), verify:
# Must be 6.18.4+
uname -r
# Must NOT be 20251125
rpm -q linux-firmware
⚠️ Do not proceed if your kernel is older than 6.18.4 or your firmware is `linux-firmware-20251125`. Update first: `sudo dnf upgrade linux-firmware kernel --refresh`.
⚠️ Important: Strix Halo systems left unattended will enter sleep mode (pulsating power LED) and may not wake via SSH or keyboard. A hard power-cycle is the only recovery. Disable all sleep states immediately after the first reboot.
# Disable all sleep targets so systemd never suspends/hibernates
sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target suspend-then-hibernate.target
# Disable idle suspend via logind (covers both GUI and headless sessions)
sudo mkdir -p /etc/systemd/logind.conf.d
cat <<'EOF' | sudo tee /etc/systemd/logind.conf.d/no-sleep.conf
[Login]
HandleSuspendKey=ignore
HandleHibernateKey=ignore
HandleLidSwitch=ignore
HandleLidSwitchExternalPower=ignore
HandleLidSwitchDocked=ignore
IdleAction=ignore
IdleActionSec=0
EOF
sudo systemctl restart systemd-logind
# ⚠️ The restart above will terminate all active desktop sessions (GNOME/Wayland/X11).
# Expect to be logged out — your screen will reset and you'll need to re-login.
# This is normal: systemd-logind manages login sessions, and restarting it
# invalidates them. SSH sessions are also dropped — just reconnect.
# If GNOME/Wayland desktop is installed, disable its automatic suspend too
if command -v gsettings &>/dev/null; then
# AC power — disable auto-suspend
gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-type 'nothing'
gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-timeout 0
# Battery (unlikely on desktop, but defensive)
gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-battery-type 'nothing'
gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-battery-timeout 0
fi
# Disable auto-suspend in GDM greeter session
# ⚠️ GDM has its own dconf database — without this, the login screen will
# trigger "The system will suspend now!" even with everything above disabled.
sudo mkdir -p /etc/dconf/db/gdm.d
# Ensure GDM dconf profile exists
cat <<'EOF' | sudo tee /etc/dconf/profile/gdm
user-db:user
system-db:gdm
file-db:/usr/share/gdm/greeter-dconf-defaults
EOF
# Override power settings in the GDM greeter
cat <<'EOF' | sudo tee /etc/dconf/db/gdm.d/99-no-suspend
[org/gnome/settings-daemon/plugins/power]
sleep-inactive-ac-type='nothing'
sleep-inactive-ac-timeout=uint32 0
sleep-inactive-battery-type='nothing'
sleep-inactive-battery-timeout=uint32 0
EOF
sudo dconf update

Verify all sleep targets are masked:
systemctl status sleep.target suspend.target hibernate.target
# All should show "Loaded: masked"

This section only applies if your hardware is a Minisforum MS-S1 MAX. Skip this if you're using a different Strix Halo system (e.g., Framework Laptop). BIOS updates improve memory stability, NPU/GPU performance, and USB4 V2 reliability, and patch AMD PSP security vulnerabilities.
⚠️ Disclaimer: Flashing BIOS/UEFI firmware carries inherent risk, including rendering your device inoperable ("bricking"). Ensure you have a stable power supply during the flash process and verify you are using the correct firmware for your specific hardware model. You do this entirely at your own risk.
TL;DR: Install deps → download BIOS .7z + UEFI Shell → verify checksums → partition USB as EFI → copy files → boot UEFI Shell → run `EfiFlash.nsh`.
Minisforum only ships Windows-based BIOS update tools, but the BIOS package includes AfuEfix64.efi — AMI's EFI-native flash utility — which runs directly from the UEFI Shell before any OS loads. No Windows needed.
Requirements:
- A USB flash drive (512 MB or larger)
- `7z` (p7zip + p7zip-plugins), `sgdisk` (gptfdisk), and `dosfstools` packages:

sudo dnf install -y gdisk dosfstools p7zip p7zip-plugins
An automated script handles downloading, partitioning, and file copying with safety checks:
1. Identify your USB device:

lsblk -d -o NAME,SIZE,MODEL,TRAN | grep usb

Confirm the device name (e.g., sda) matches your USB drive's size and model.
2. Wipe the USB drive (required if previously used as a Rufus ISO-mode boot disk):
⚠️ Why this is necessary: Rufus ISO-mode creates a hybrid MBR/GPT layout with ISO9660 and ISOHybrid signatures. The `prep-usb.sh` script uses `sgdisk --zap-all`, which only removes GPT/MBR partition structures — it does not clear ISO9660 filesystem signatures. The kernel continues to see the old Fedora boot layout, and the script silently creates a partition alongside the stale content.
# Replace /dev/sdX with your device from step 1 — TRIPLE-CHECK before running!
# Unmount all partitions on the device
sudo umount /dev/sdX* 2>/dev/null || true
# Remove ALL filesystem signatures (ISO9660, FAT, GPT, MBR, etc.)
sudo wipefs -a /dev/sdX
# Zero out the first 1 MB to destroy any residual boot sectors
# and ISO9660 primary volume descriptors
sudo dd if=/dev/zero of=/dev/sdX bs=1M count=1 status=none
# Force kernel to re-read the (now empty) partition table
sudo partprobe /dev/sdX

Verify the drive is clean:
lsblk /dev/sdX
# Should show the device with no partitions underneath
sudo wipefs /dev/sdX
# Should show no signatures

3. Run the script with the verified device path:
git clone https://github.com/capetron/minisforum-ms-s1-max-bios.git
cd minisforum-ms-s1-max-bios
sudo ./scripts/prep-usb.sh /dev/sdX  # Replace sdX with your device from step 1

4. Shut down and boot from USB to flash the BIOS:

sudo shutdown now

- Plug the USB into the MS-S1 Max
- Power on and press Del repeatedly to enter BIOS Setup
- Disable Secure Boot: Navigate to Security menu (you may need to set an Administrator password first), then disable Secure Boot. Save and exit.
- Re-enter BIOS (press Del again)
- Look for "UEFI Shell" or "Launch EFI Shell from filesystem device" in the boot menu. If not available, go to Boot menu → Add Boot Option → point to
shellx64.efion the USB. - Boot into the UEFI Shell
At the Shell> prompt, switch to the USB filesystem and confirm the flash files are present:

FS0:
dir

The listing should include AfuEfix64.efi, EfiFlash.nsh, shellx64.efi, and SHWSA.BIN. Then start the flash:

EfiFlash.nsh
If `FS0:` doesn't show your files, try `FS1:`, `FS2:`, etc. Use `map -c` to list all filesystem mappings.

Troubleshooting: If you see `EFI` and `Mach` folders instead of the flash files at root, the USB drive was not properly wiped before running `prep-usb.sh`. Go back to step 2 (Wipe the USB drive) and re-run the preparation.
The flash process will write the new BIOS image and automatically shut down or reboot the system.
Don't panic! The first boot after a BIOS update takes 5–10 minutes while the system performs memory training (recharacterizing all 128 GB of LPDDR5X at 8000 MT/s). You may see a black screen, the power LED cycling, or several reboots — this is completely normal.
After the first boot completes:
- All BIOS settings will be reset to defaults
- Re-enter BIOS (Del key) to verify the new version and adjust settings (UMA Frame Buffer Size, etc.)
- Re-enable Secure Boot if desired
- Check boot order — your Fedora installation should still be there
- If the system won't boot after 15 minutes, try a CMOS reset (unplug power, remove CMOS battery for 30 seconds)
References: GitHub: capetron/minisforum-ms-s1-max-bios · Full guide: Petronella Technology Group
Enter BIOS and look for:
- UMA Frame Buffer Size → Set to minimum (e.g., 1 GB on MS-S1 MAX)
- VRAM Size or iGPU Memory → Leave at minimum / default
Why minimum? The UMA Frame Buffer (GART) is a fixed memory reservation that is never available to the OS. On Linux, GPU memory is allocated dynamically via GTT (Graphics Translation Table) using kernel parameters — the setup script configures `amdgpu.gttsize` and `ttm.pages_limit` to allow the iGPU to access up to ~124 GB on demand while keeping the memory available to the CPU when idle. Setting UMA to maximum (e.g., 96 GB) would wastefully lock that memory away from the system. AMD's own Strix Halo system optimization guide recommends keeping VRAM reservation small (e.g., 0.5 GB) and increasing the shared TTM/GTT limit instead. The Strix Halo Toolboxes project tests with only 512 MB of BIOS allocation, and the strixhalo.wiki explicitly recommends: "set GART to the minimum (eg, 512MB) and then allocating automatically via GTT."
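For reference, a sketch of what such a configuration might look like on Fedora. The setup script does this for you; the 120 GiB figure and the `grubby` invocation here are illustrative assumptions, not the script's exact values:

```bash
# Illustrative only: the setup script configures this automatically.
# amdgpu.gttsize is in MiB; ttm.pages_limit is in 4 KiB pages.
GTT_GIB=120
sudo grubby --update-kernel=ALL \
  --args "amdgpu.gttsize=$((GTT_GIB * 1024)) ttm.pages_limit=$((GTT_GIB * 1024 * 1024 / 4))"
# Reboot, then confirm the parameters took effect:
cat /proc/cmdline
```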
Identify the 4 TB data disk first:
# List disks — find the 4 TB drive (e.g., /dev/nvme1n1 or /dev/sdb)
lsblk

Choose the appropriate option below based on your situation:
If the data disk already contains data from a previous installation (models, Wikipedia pipeline, repo, etc.), do not format it — just mount it:
sudo mkdir -p /mnt/data
# List partitions on the data disk to find the right one
lsblk -f /dev/nvme1n1
# Look for the partition with your data (typically /dev/nvme1n1p1)
# ⚠️ Don't run blkid on the raw disk (/dev/nvme1n1) — that only shows
# partition table info (PTUUID/PTTYPE), not the filesystem UUID/TYPE.
# Identify filesystem type and UUID from the PARTITION
sudo blkid /dev/nvme1n1p1
# Note the TYPE= (ext4/xfs/btrfs) and UUID= from the output
# Add to fstab using UUID and detected type (skip if already present)
# Replace <UUID> and <type> with your actual values from blkid
grep -q '<UUID>' /etc/fstab || \
echo 'UUID=<UUID> /mnt/data <type> defaults 0 2' | sudo tee -a /etc/fstab
sudo systemctl daemon-reload # reload fstab changes into systemd
sudo mount -a
ls /mnt/data
# Fix ownership so your user can write to the data disk
sudo chown -R $USER:$USER /mnt/data

If this is a new or empty disk, format it:
# ⚠️ This DESTROYS all data on the disk — adjust device path as needed
sudo mkfs.ext4 -L data /dev/nvme1n1
sudo mkdir -p /mnt/data
# Add to fstab (skip if already present)
grep -q 'LABEL=data' /etc/fstab || \
echo 'LABEL=data /mnt/data ext4 defaults 0 2' | sudo tee -a /etc/fstab
sudo systemctl daemon-reload # reload fstab changes into systemd
sudo mount -a
sudo chown -R $USER:$USER /mnt/data

GitHub no longer supports password authentication for git operations. Set up SSH key authentication:
# Generate an SSH key (press Enter to accept defaults, no passphrase needed for a server)
ssh-keygen -t ed25519 -C "your-email@example.com"
# Start the SSH agent and add the key
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
# Display the public key — copy this to GitHub
cat ~/.ssh/id_ed25519.pub

Add the key to your GitHub account:
- On GitHub, click your profile picture → Settings
- In the Access section of the sidebar, click SSH and GPG keys
- Click New SSH key, paste the public key, and save
For detailed steps with screenshots, see Adding a new SSH key to your GitHub account.
Verify the connection:
ssh -T git@github.com
# Should print: "Hi <username>! You've successfully authenticated..."

sudo dnf install -y git python3 python3-pip
# Clone via SSH (or update existing repo)
if [ -d /mnt/data/DeepRedAI/.git ]; then
git -C /mnt/data/DeepRedAI pull
else
git clone git@github.com:ferzkopp/DeepRedAI.git /mnt/data/DeepRedAI
fi
cd /mnt/data/DeepRedAI

Migrating an existing clone from HTTPS to SSH? If you already have a clone that used the HTTPS URL:
git -C /mnt/data/DeepRedAI remote set-url origin git@github.com:ferzkopp/DeepRedAI.git
The repository includes deepred-env.sh — a shell script that exports all path and service variables used by every DeepRedAI script. Source it once to enter development mode:
source /mnt/data/DeepRedAI/deepred-env.sh

To load it automatically on every login, add the following to ~/.bashrc:
# Install an editor if you don't have one (nano is pre-installed, joe is an alternative)
sudo dnf install -y joe
# Edit ~/.bashrc and append the lines below
joe ~/.bashrc

export DEEPRED_ROOT="/mnt/data"
[ -f "$DEEPRED_ROOT/DeepRedAI/deepred-env.sh" ] && source "$DEEPRED_ROOT/DeepRedAI/deepred-env.sh"
Verify the environment loads on login:
```bash
# Log out and back in (or reconnect SSH)
exit
# Then reconnect:
ssh your-user@MiniAI
# The env script prints all variables on load — confirm they appear
```

These path variables are printed on load:
| Variable | Default | Purpose |
|---|---|---|
| `DEEPRED_ROOT` | `/mnt/data` | Data-disk mount point. All other paths derive from this. |
| `DEEPRED_REPO` | `$DEEPRED_ROOT/DeepRedAI` | Location of this git clone |
| `WIKI_DATA` | `$DEEPRED_ROOT/wikipedia` | Wikipedia pipeline data |
| `GUTENBERG_DATA` | `$DEEPRED_ROOT/gutenberg` | Project Gutenberg data |
| `DEEPRED_MODELS` | `$DEEPRED_ROOT/models` | LLM and embedding model files |
| `DEEPRED_VENV` | `$DEEPRED_ROOT/venv` | Python virtual environment |
These service-endpoint variables are also exported (but not printed):
| Variable | Default | Purpose |
|---|---|---|
| `INFERENCE_HOST` | `localhost` | Inference server host (LLM + embedding) |
| `INFERENCE_PORT` | `1234` | LLM inference server port |
| `EMBEDDING_PORT` | `1235` | Embedding server port |
| `PG_HOST` / `PG_PORT` | `localhost` / `5432` | PostgreSQL connection |
| `OS_HOST` / `OS_PORT` | `localhost` / `9200` | OpenSearch connection |
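Scripts read these variables instead of hard-coding endpoints. For example, a quick connectivity check using the exported values (run after the services from the setup script are up):

```bash
# Probe the inference and search endpoints using the exported variables
curl -s "http://${INFERENCE_HOST}:${INFERENCE_PORT}/v1/models" | head -c 300; echo
curl -s "http://${OS_HOST}:${OS_PORT}/" | head -c 300; echo
```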
To change file locations, either:
- Override before sourcing: `export DEEPRED_ROOT="/alternate_data"` in `~/.bashrc` before the source line
- Override individual paths: `export WIKI_DATA="/other/path/wikipedia"` before sourcing
- Edit `deepred-env.sh` directly (not recommended — will conflict with git updates)
The env file also adds `scripts/` to `$PATH`.
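A quick way to confirm everything was exported and the scripts directory is on your PATH:

```bash
# Show the exported variables and the PATH entry added by the env file
printenv | grep -E '^(DEEPRED_|WIKI_DATA|GUTENBERG_DATA|INFERENCE_|EMBEDDING_|PG_|OS_)'
echo "$PATH" | tr ':' '\n' | grep DeepRedAI
```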
The setup script handles all remaining configuration. It reads DEEPRED_ROOT (and related variables) from the environment, falling back to /mnt/data when unset. Run as root:
source /mnt/data/DeepRedAI/deepred-env.sh # ensure env vars are loaded
sudo -E python3 $DEEPRED_REPO/scripts/setup_strixhalo.py

Some stages require a reboot (see table below). After rebooting, SSH back in and run the same command again — the script tracks progress in `.setup_state.json` and automatically resumes from where it left off.
The script runs through these stages in order:
| Stage | Name | Reboot? | Description |
|---|---|---|---|
| 1 | `system_packages` | No | Install build tools, development packages |
| 2 | `disable_sleep` | No | Mask sleep/suspend/hibernate targets for always-on operation |
| 3 | `gtt_memory` | Yes | Configure kernel parameters for GPU memory, regenerate GRUB (reconnect via SSH after reboot) |
| 4 | `gpu_groups` | Yes | Add user to render/video groups (reconnect via SSH after reboot) |
| 5 | `vscode` | No | Install VSCode + Python and Copilot extensions |
| 6 | `toolbox_setup` | No | Install Podman/toolbox, create ROCm toolbox |
| 7 | `model_directories` | No | Create `$DEEPRED_MODELS/{llm,embedding}`, download models |
| 8 | `llama_server` | No | Deploy Podman Quadlet services for LLM + embedding servers |
| 9 | `python_venv` | No | Create venv at `$DEEPRED_VENV`, install PyTorch ROCm + dependencies |
| 10 | `postgresql` | No | Install, initialize, configure PostgreSQL + wiki database |
| 11 | `wikipedia_schema` | No | Create Wikipedia database schema and extensions |
| 12 | `opensearch` | No | Download, configure, deploy OpenSearch as systemd service |
| 13 | `mcp_server` | No | Deploy MCP server systemd service |
| 14 | `web_gui` | No | Build and deploy Wikipedia web GUI (port 8080) |
| 15 | `firewall` | No | Configure firewalld rules for all service ports |
| 16 | `llm_swap_helper` | No | Install `/usr/local/bin/llm-swap` helper script |
| 17 | `training_tokenizers` | No | Download tokenizer files for CPT (TinyLlama-1.1B + SmolLM2-360M) |
| 18 | `training_models` | No | Download base model checkpoints for CPT (~3 GB total) |
| 19 | `training_toolbox` | No | Pull and create the gfx1151 fine-tuning container |
| 20 | `verify` | Yes | Run health checks on all components (reboot to confirm boot persistence) |
| 21 | `reverify` | No | Post-reboot health check — verify services survive a restart |
# Resume from where it left off (after reboot or interruption)
sudo -E python3 $DEEPRED_REPO/scripts/setup_strixhalo.py
# Run a specific stage only
sudo -E python3 $DEEPRED_REPO/scripts/setup_strixhalo.py --stage gtt_memory
# Re-run a specific stage (even if already completed)
sudo -E python3 $DEEPRED_REPO/scripts/setup_strixhalo.py --stage postgresql --force
# List all stages and their status
sudo -E python3 $DEEPRED_REPO/scripts/setup_strixhalo.py --list
# Start from a specific stage (skip earlier stages)
sudo -E python3 $DEEPRED_REPO/scripts/setup_strixhalo.py --from vscode
# Override the default non-root user (auto-detected from $DEEPRED_ROOT ownership)
sudo -E python3 $DEEPRED_REPO/scripts/setup_strixhalo.py --user myuser

Stage progress is tracked in `$DEEPRED_REPO/.setup_state.json`. After a reboot stage, SSH back in (`ssh your-user@MiniAI`), source the env (`source $DEEPRED_ROOT/DeepRedAI/deepred-env.sh`), and re-run the same command — the script reads the state file and resumes automatically.
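To illustrate the resume behavior, here is a minimal sketch of the skip-completed-stages pattern. The state file path comes from the docs above; how stage names are recorded inside the JSON is an assumption for illustration:

```bash
# Hedged sketch, not the script's actual implementation
STATE="$DEEPRED_REPO/.setup_state.json"
for stage in system_packages disable_sleep gtt_memory; do
  if [ -f "$STATE" ] && grep -q "\"$stage\"" "$STATE"; then
    echo "skip $stage (recorded as completed)"
    continue
  fi
  echo "would run $stage here, then record it in $STATE"
done
```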
The setup script installs VSCode and the Copilot extensions, but you still need to sign in:
- Open VSCode on the Strix Halo machine (via the desktop, or remotely with `code tunnel`)
- Sign in to GitHub Copilot: Click the Copilot icon in the sidebar → Sign in to GitHub → follow the device-code flow (opens a browser URL where you enter a one-time code)
- Git credentials in VSCode: If you set up SSH keys in Step 6, VSCode will use them automatically for any `git@github.com:` remote. No additional credential setup is needed.

Headless / SSH-only? Use VSCode Remote Tunnels: run `code tunnel` on the Strix Halo machine, then connect from VSCode on your local machine. Copilot authentication happens on the local side.
| Service | Port | Bind | Purpose |
|---|---|---|---|
| `llama-server-llm` | 1234 | 0.0.0.0 | LLM inference (chat completions) — Podman Quadlet |
| `llama-server-embed` | 1235 | 0.0.0.0 | Embedding generation — Podman Quadlet |
| `opensearch.service` | 9200 | 0.0.0.0 | Full-text and semantic search |
| `postgresql.service` | 5432 | localhost | Wikipedia metadata storage |
| `mcp.service` | 7000 | 0.0.0.0 | Wikipedia MCP server (REST API) |
| `wiki-gui.service` | 8080 | 0.0.0.0 | Wikipedia web GUI (React frontend) |
Network exposure: Ports 1234, 1235, 7000, 8080, and 9200 are opened in firewalld (LAN-accessible). PostgreSQL is localhost-only. To restrict other services, adjust firewalld rules or service bind addresses.
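For example, to take OpenSearch off the LAN afterwards. This assumes the firewall stage opened it as a raw port; check with `firewall-cmd --list-all` first:

```bash
# See what the firewall stage opened, then close what you don't want exposed
sudo firewall-cmd --list-ports
sudo firewall-cmd --permanent --remove-port=9200/tcp
sudo firewall-cmd --reload
```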
┌──────────────┐ ┌──────────────────┐ ┌──────────────────────────┐
│ Web GUI │────▶│ mcp_server.py │────▶│ llama-server-embed │
│ (:8080) │:7000│ (FastAPI :7000) │:1235│ (port 1235) │
└──────────────┘ └────────┬─────────┘ └──────────────────────────┘
│
┌────────────────┐
│ OpenSearch + │
│ PostgreSQL │
└────────────────┘
┌────────────────────────────────────┐ ┌──────────────────────────┐
│ generate_theme/temporal_datasets │────▶│ llama-server-llm │
│ .py [inside toolbox] │:1234│ (port 1234) │
└────────────────────────────────────┘ └──────────────────────────┘
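A quick way to confirm the chain above is wired up, using only the ports from the diagram. FastAPI apps typically serve interactive docs at `/docs` unless disabled, which is an assumption here:

```bash
# HTTP status codes only: 200 means the service answered
for url in http://localhost:8080/ http://localhost:7000/docs \
           http://localhost:1234/v1/models http://localhost:1235/v1/models; do
  printf '%-40s ' "$url"
  curl -s -o /dev/null -w '%{http_code}\n' "$url"
done
```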
# Swap to a different model
llm-swap $DEEPRED_MODELS/llm/deepred-1b-q4_k_m.gguf "deepred/deepred" 4096
# Swap to 7B with 8 parallel slots (lightweight — leaves plenty of headroom)
llm-swap $DEEPRED_MODELS/llm/qwen2.5-7b-instruct-q4_k_m-00001-of-00002.gguf \
"qwen2.5-7b-instruct" 8192 --slots 8
# Swap back to default 14B with 4 parallel slots
llm-swap $DEEPRED_MODELS/llm/qwen2.5-14b-instruct-q4_k_m-00001-of-00003.gguf \
--slots 4

The setup script creates a Python venv at `$DEEPRED_VENV` (default: `/mnt/data/venv`) with PyTorch ROCm, training libraries, and pipeline dependencies pre-installed. You must activate it before running any DeepRedAI Python script.
Activate the venv:
source $DEEPRED_VENV/bin/activate

Your shell prompt will change to show (venv) at the beginning — this confirms the venv is active. All python and pip commands now use the venv's interpreter and packages (including ROCm environment variables for Strix Halo).
Run scripts with the venv active:
# Pipeline scripts
python $DEEPRED_REPO/scripts/process_and_index.py
python $DEEPRED_REPO/scripts/extract_wikipedia.py /path/to/dump.xml.bz2
# Training scripts
python $DEEPRED_REPO/scripts/finetune_temporal.py --config my_config.yaml
python $DEEPRED_REPO/scripts/finetune_theme.py --config my_config.yaml

Deactivate the venv when done:

deactivate

Tip: If you prefer a one-liner without activating, use the venv's Python directly:

$DEEPRED_VENV/bin/python $DEEPRED_REPO/scripts/process_and_index.py

The systemd services (e.g., `mcp.service`) already use this approach.
The system uses two containers for different purposes:
| Container | Image | Purpose | Internal Python |
|---|---|---|---|
| `llama-rocm-7.2` | `kyuz0/amd-strix-halo-toolboxes:rocm-7.2` | llama.cpp inference, interactive AI work | 3.14 |
| `strix-halo-finetuning` | `kyuz0/amd-strix-halo-llm-finetuning:latest` | GPU training (gfx1151-compiled PyTorch) | 3.13 |
The inference container also powers the Quadlet services (llama-server-llm on port 1234 and llama-server-embed on port 1235).
# Check both containers exist
podman container exists llama-rocm-7.2 && echo "inference: OK" || echo "inference: MISSING"
podman container exists strix-halo-finetuning && echo "training: OK" || echo "training: MISSING"
# Check if containers are running
podman ps --format '{{.Names}} {{.Status}}' --filter name=llama-rocm --filter name=strix-halo
# Check Quadlet services (inference)
systemctl --user status llama-server-llm llama-server-embed

To work interactively in the inference container:

podman start llama-rocm-7.2
podman exec -it llama-rocm-7.2 bash
# Inside container:
source /mnt/data/DeepRedAI/deepred-env.sh

To work in the training container:

podman start strix-halo-finetuning
podman exec -it strix-halo-finetuning bash
# Inside container (bash-5.3$ prompt):
source /opt/venv/bin/activate
cd /mnt/data/DeepRedAI
python3 scripts/train_deepred_model.py --profile dev

Or run a single command without entering an interactive shell:
# GPU smoke test
podman exec strix-halo-finetuning /opt/venv/bin/python3 -c \
"import torch; x = torch.tensor([1.0]).cuda(); print('GPU OK:', x)"
# Run training directly
podman exec strix-halo-finetuning bash -c \
'source /opt/venv/bin/activate && cd /mnt/data/DeepRedAI && python3 scripts/train_deepred_model.py --profile dev'

Why two containers? Standard PyTorch ROCm wheels (used by the inference container) do not include compiled GPU code for Strix Halo's `gfx1151` architecture. GPU detection works, but `.cuda()` segfaults. The fine-tuning container uses PyTorch built from AMD's gfx1151 nightly index (https://rocm.nightlies.amd.com/v2-staging/gfx1151/) with native gfx1151 kernels, plus gfx1151-compiled bitsandbytes, flash-attention, and RCCL.
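You can check this directly: `torch.cuda.get_arch_list()` reports the GPU architectures a wheel was compiled for, and in the training container the list should include gfx1151:

```bash
# Expect gfx1151 to appear in the compiled-architecture list
podman exec strix-halo-finetuning /opt/venv/bin/python3 -c \
  "import torch; print(torch.cuda.get_arch_list())"
```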
| Script | Runs On | Container Needed |
|---|---|---|
| `process_and_index.py` | Host venv | None (calls embed server via HTTP :1235) |
| `mcp_server.py` | Host venv (systemd) | None (calls embed server via HTTP :1235) |
| `llm_temporal_analysis_augmentation.py` | Host venv | None (calls LLM server via HTTP :1234) |
| `train_deepred_model.py` | `strix-halo-finetuning` | Yes — must run inside |
| `create_training_corpus.py` | Host venv | None (CPU only, tokenization) |
| `extract_wikipedia.py` | Host venv | None (file I/O only) |
| `retrieve_gutenberg.py` | Host venv | None (HTTP downloads) |
| `retrieve_chess_content.py` | Host venv | None (HTTP + python-chess) |
# Check all services at once
sudo -E python3 $DEEPRED_REPO/scripts/setup_strixhalo.py --stage verify --force

| LM Studio Pattern | llama.cpp Server Equivalent |
|---|---|
| `lms load <model> --gpu=max` | `llm-swap /path/to/model.gguf` |
| `lms unload --all` | `sudo systemctl stop llama-server-llm` |
| `lms ps` | `curl localhost:1234/v1/models` |
| API on `localhost:1234` | Identical — no change |
| Embeddings on `localhost:1234` | Changed to `localhost:1235` (separate server) |
All `/v1/chat/completions`, `/v1/embeddings`, and `/v1/models` calls work identically with llama.cpp server.
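For example, an existing LM Studio client call ports over unchanged (the model name below is illustrative; list the loaded model with `curl localhost:1234/v1/models`):

```bash
# OpenAI-compatible chat completion against the local llama.cpp server
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5-14b-instruct",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```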
- AMD Strix Halo Toolboxes — Pre-built containers with ROCm + llama.cpp for gfx1151
- Strix Halo Toolboxes on DockerHub — Available image tags
- AMD Strix Halo LLM Fine-tuning Container — gfx1151-compiled PyTorch for training
- Strix Halo Benchmarks (Interactive) — Performance data across ROCm versions
- Strix Halo VRAM Estimator
- ROCm is very sensitive to kernel version (Issue #45) — Kernel/firmware/ROCm compatibility matrix
- ROCm 7 Performance Regression Workaround — Mitigated with `-mllvm --amdgpu-unroll-threshold-local=600`
- Read error: Bad address (Issue #41) — `--no-mmap` required
- Build 8070 prefill regression (Issue #58)