
feat(cli): Support pulling and running sharded GGUF models #1893

@noahgift

Description

Currently, apr pull and aprender-serve do not support sharded .gguf files (e.g., model-00001-of-00002.gguf).
Sharded SafeTensors ship a central model.safetensors.index.json that the CLI parses. Sharded GGUFs have no such index file, so they require explicit multi-part download and stitch/load logic.

Many modern 7B+ GGUFs (like Qwen2.5-7B-Instruct-GGUF at Q4_K_M and higher) are sharded on HuggingFace, so apr pull and apr run cannot resolve and run them automatically.
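
As a rough illustration of the detection step, the sketch below parses a shard filename of the form shown above (model-00001-of-00002.gguf) and derives the names of all sibling parts. It assumes every shard follows a `<prefix>-<index>-of-<total>.gguf` pattern with equal-width, 1-based numbers. `ShardName`, `parse_shard_name`, and `sibling_shards` are hypothetical names for this example, not existing apr APIs.

```rust
/// Parsed pieces of a sharded GGUF filename such as `model-00001-of-00002.gguf`.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct ShardName {
    pub prefix: String, // everything before the shard counter, e.g. "model"
    pub index: u32,     // 1-based shard number
    pub total: u32,     // total number of shards
    pub width: usize,   // zero-padded digit width, e.g. 5
}

/// Returns `Some` if `file_name` looks like `<prefix>-<index>-of-<total>.gguf`.
pub fn parse_shard_name(file_name: &str) -> Option<ShardName> {
    let stem = file_name.strip_suffix(".gguf")?;
    // Split from the right so that dashes inside the prefix are preserved.
    let (rest, total_str) = stem.rsplit_once("-of-")?;
    let (prefix, index_str) = rest.rsplit_once('-')?;

    let all_digits = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if prefix.is_empty()
        || !all_digits(index_str)
        || !all_digits(total_str)
        || index_str.len() != total_str.len()
    {
        return None;
    }

    let index: u32 = index_str.parse().ok()?;
    let total: u32 = total_str.parse().ok()?;
    if index == 0 || index > total {
        return None;
    }

    Some(ShardName {
        prefix: prefix.to_string(),
        index,
        total,
        width: index_str.len(),
    })
}

/// Lists the filenames of every shard in the set, in order.
pub fn sibling_shards(shard: &ShardName) -> Vec<String> {
    (1..=shard.total)
        .map(|i| {
            format!(
                "{}-{:0width$}-of-{:0width$}.gguf",
                shard.prefix,
                i,
                shard.total,
                width = shard.width
            )
        })
        .collect()
}
```

Given any one shard name from a repository listing, a pull command could call `sibling_shards` to get the full download list and check that every listed part is actually present before starting.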

Acceptance Criteria

  • apr pull detects sharded GGUF assets on HuggingFace and downloads all parts.
  • aprender-serve can mmap all shards of a split GGUF model and run inference across them correctly, without manual pre-stitching.
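
One possible way to serve a split model without stitching is to keep one memory map per shard and build a single lookup table from tensor name to shard and byte range. The sketch below shows only that lookup table. It assumes each shard exposes its own list of tensors with offsets relative to that shard. `TensorEntry`, `TensorLocation`, and `build_tensor_index` are hypothetical names, not part of aprender-serve.

```rust
use std::collections::HashMap;

/// Hypothetical description of one tensor as listed inside a single shard.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TensorEntry {
    pub name: String,
    pub offset: u64, // byte offset within its own shard file
    pub len: u64,    // byte length of the tensor data
}

/// Where a tensor lives: which shard, and the byte range inside that shard.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TensorLocation {
    pub shard: usize,
    pub offset: u64,
    pub len: u64,
}

/// Merges per-shard tensor lists into one name -> location table.
/// Fails if the same tensor name is listed twice, which would make
/// lookups ambiguous.
pub fn build_tensor_index(
    shards: &[Vec<TensorEntry>],
) -> Result<HashMap<String, TensorLocation>, String> {
    let mut index = HashMap::new();
    for (shard, entries) in shards.iter().enumerate() {
        for entry in entries {
            let location = TensorLocation {
                shard,
                offset: entry.offset,
                len: entry.len,
            };
            if index.insert(entry.name.clone(), location).is_some() {
                return Err(format!("duplicate tensor `{}`", entry.name));
            }
        }
    }
    Ok(index)
}
```

With such a table, the inference code can resolve a tensor by name and slice the matching shard's mapping, so the shards never need to be concatenated on disk.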
