feat(cli): unified `lattice` CLI for users #91

## Context

Lattice ships individual binaries (`chat_metal`, `bench_decode_ab`, `quantize_q4`, `backfill_qwen3`) but no unified user-facing CLI. To use Lattice today, a user has to know which binary does what and how to wire up the right features and env vars.

Comparable tools have a single entry point:
- `ollama run qwen3.5:0.8b "hello"`
- `llama-cli -m model.gguf -p "hello"`
- `mlx_lm.generate --model Qwen/... --prompt "hello"`

## Goal

A single `lattice` binary with subcommands covering the common user flows.

## Proposed surface (v1)

```bash
# Model management
lattice pull qwen3.5-0.8b              # download from HF to ~/.lattice/models
lattice list                            # list local models with sizes
lattice rm qwen3.5-0.8b                 # delete local model

# Inference
lattice chat qwen3.5-0.8b               # interactive REPL
lattice chat qwen3.5-0.8b -p "hello"    # one-shot
lattice complete qwen3.5-0.8b -p "Once" # raw completion (no chat template)
lattice embed bge-small-en -t "hello"   # embedding output

# Quantization
lattice quantize qwen3.5-0.8b --to q4        # safetensors output
lattice quantize qwen3.5-0.8b --to q4-quarot # QuaRot variant

# Server (depends on #92)
lattice serve --model qwen3.5-0.8b --port 8080

# Inspection
lattice info qwen3.5-0.8b                # architecture, params, dtype
lattice bench qwen3.5-0.8b               # decode tok/s, latency p50/p99
```

## Implementation notes

- Use `clap` derive macros (already in workspace deps)
- Default model dir: `~/.lattice/models` (env override: `LATTICE_MODEL_CACHE`)
- Auto-pull on first use if model not local (with `--no-pull` to disable)
- Per-subcommand `--help` with examples
- Single binary in `crates/inference/src/bin/lattice.rs` (or new `crates/cli` crate)
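
The model-directory and auto-pull rules above could be sketched roughly as follows. This is a minimal, std-only illustration, not the proposed implementation: the function and enum names are hypothetical, reading `HOME` assumes a Unix-like system, treating an empty `LATTICE_MODEL_CACHE` as unset is an assumption, and so is failing (rather than prompting) when `--no-pull` is given and the model is missing.

```rust
use std::env;
use std::path::PathBuf;

/// Resolve the model directory (hypothetical helper).
/// `override_dir` stands in for the value of `LATTICE_MODEL_CACHE`,
/// `home` for the user's home directory.
fn resolve_model_dir(override_dir: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    match override_dir {
        // Assumption: an empty override is treated the same as an unset one.
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        // Default from the notes above: ~/.lattice/models
        _ => home.map(|h| PathBuf::from(h).join(".lattice").join("models")),
    }
}

/// Same lookup, reading the real process environment.
/// Assumption: `HOME` is set (Unix-like systems).
#[allow(dead_code)]
fn model_dir_from_env() -> Option<PathBuf> {
    let override_dir = env::var("LATTICE_MODEL_CACHE").ok();
    let home = env::var("HOME").ok();
    resolve_model_dir(override_dir.as_deref(), home.as_deref())
}

/// What to do when a subcommand needs a model (hypothetical type).
#[derive(Debug, PartialEq)]
enum PullDecision {
    /// Model is already in the local model directory.
    UseLocal,
    /// Model is missing and auto-pull is allowed.
    Pull,
    /// Model is missing and `--no-pull` was passed.
    ErrorMissing,
}

/// Auto-pull on first use unless `--no-pull` disables it.
fn pull_decision(model_is_local: bool, no_pull: bool) -> PullDecision {
    match (model_is_local, no_pull) {
        (true, _) => PullDecision::UseLocal,
        (false, false) => PullDecision::Pull,
        (false, true) => PullDecision::ErrorMissing,
    }
}
```

Keeping the resolution logic as pure functions over `Option<&str>` means it can be unit-tested without mutating the process environment; only the thin `model_dir_from_env` wrapper touches `env::var`.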

## Acceptance

- `lattice --help` shows all subcommands
- `lattice chat qwen3.5-0.8b -p "hi"` works end-to-end on a fresh machine (downloads, loads, generates)
- Documented in README as the primary entry point
- Existing binaries (`chat_metal`, `bench_decode_ab`) become aliases or get deprecated

## Priority

P1 — gates real user adoption. Currently lattice is consumable only by Rust code authors who know the crate API.

## Related

- #92 (daemon) — `lattice serve` is the CLI front-end for it
- #93 (OpenAI API) — server exposes that protocol
