feat(cli): unified lattice CLI for users #91

@ohdearquant

Context

Lattice ships individual binaries (chat_metal, bench_decode_ab, quantize_q4, backfill_qwen3) but no unified user-facing CLI. To use lattice today, a consumer has to know which binary does what and how to wire up its features and env vars.

Comparable tools have a single entry point:

  • ollama run qwen3.5:0.8b "hello"
  • llama-cli -m model.gguf -p "hello"
  • mlx_lm.generate --model Qwen/... --prompt "hello"

Goal

A single lattice binary with subcommands covering the common user flows.

Proposed surface (v1)

# Model management
lattice pull qwen3.5-0.8b              # download from HF to ~/.lattice/models
lattice list                            # list local models with sizes
lattice rm qwen3.5-0.8b                 # delete local model

# Inference
lattice chat qwen3.5-0.8b               # interactive REPL
lattice chat qwen3.5-0.8b -p "hello"    # one-shot
lattice complete qwen3.5-0.8b -p "Once" # raw completion (no chat template)
lattice embed bge-small-en -t "hello"   # embedding output

# Quantization
lattice quantize qwen3.5-0.8b --to q4        # safetensors output
lattice quantize qwen3.5-0.8b --to q4-quarot # QuaRot variant

# Server (depends on #92)
lattice serve --model qwen3.5-0.8b --port 8080

# Inspection
lattice info qwen3.5-0.8b                # architecture, params, dtype
lattice bench qwen3.5-0.8b               # decode tok/s, latency p50/p99

Implementation notes

  • Use clap derive macros (already in workspace deps)
  • Default model dir: ~/.lattice/models (env override: LATTICE_MODEL_CACHE)
  • Auto-pull on first use if model not local (with --no-pull to disable)
  • Per-subcommand --help with examples
  • Single binary in crates/inference/src/bin/lattice.rs (or new crates/cli crate)
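
The model-directory and auto-pull notes above can be sketched with the standard library alone. This is a minimal illustration, not the proposed implementation: `resolve_model_dir`, `plan_model_action`, and `ModelAction` are hypothetical names, reading `HOME` to expand `~` is an assumption (it is not set on every platform), storing each model under a path named after its id is an assumption, and failing when `--no-pull` is set and the model is missing is one plausible reading of the note rather than something the issue specifies.

```rust
use std::env;
use std::path::PathBuf;

/// What the CLI would do before loading a model (hypothetical type).
#[derive(Debug, PartialEq)]
enum ModelAction {
    UseLocal,    // model already present in the model dir
    Pull,        // not present, auto-pull allowed
    FailMissing, // not present and --no-pull was given (assumed behavior)
}

/// Pick the model directory: LATTICE_MODEL_CACHE wins when set and non-empty,
/// otherwise fall back to <home>/.lattice/models. Returns None if neither is usable.
fn resolve_model_dir(cache_override: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    match cache_override {
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        _ => home
            .filter(|h| !h.is_empty())
            .map(|h| PathBuf::from(h).join(".lattice").join("models")),
    }
}

/// Decide between using the local copy, pulling, or failing.
fn plan_model_action(is_local: bool, no_pull: bool) -> ModelAction {
    match (is_local, no_pull) {
        (true, _) => ModelAction::UseLocal,
        (false, false) => ModelAction::Pull,
        (false, true) => ModelAction::FailMissing,
    }
}

fn main() {
    let cache = env::var("LATTICE_MODEL_CACHE").ok();
    let home = env::var("HOME").ok(); // assumption: Unix-style HOME
    match resolve_model_dir(cache.as_deref(), home.as_deref()) {
        Some(dir) => {
            // Assumption: each model lives at <model dir>/<model id>.
            let model_path = dir.join("qwen3.5-0.8b");
            let action = plan_model_action(model_path.exists(), false);
            println!("{} -> {:?}", model_path.display(), action);
        }
        None => eprintln!("no model directory: set LATTICE_MODEL_CACHE or HOME"),
    }
}
```

Keeping both functions free of I/O means the env lookup and the filesystem check stay in `main`, so the precedence rule and the `--no-pull` decision can be unit-tested without touching the real cache.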

Acceptance

  • lattice --help shows all subcommands
  • lattice chat qwen3.5-0.8b -p "hi" works end-to-end on a fresh machine (downloads, loads, generates)
  • Documented in README as the primary entry point
  • Existing binaries (chat_metal, bench_decode_ab) become aliases or get deprecated

Priority

P1 — gates real user adoption. Currently lattice is consumable only by Rust code authors who know the crate API.
