Description
Currently, apr pull and aprender-serve do not support sharded .gguf files (e.g., model-00001-of-00002.gguf) out of the box.
Unlike sharded SafeTensors, which have a centralized model.safetensors.index.json that the CLI parses, sharded GGUFs require explicit multi-part download and stitch/load logic.
Many modern 7B+ GGUFs (like Qwen2.5-7B-Instruct-GGUF at Q4_K_M and higher) are sharded on HuggingFace, so apr pull and apr run cannot resolve and run them automatically.
Acceptance Criteria
apr pull detects sharded GGUF assets on HuggingFace and downloads all parts.
aprender-serve can mmap a sharded GGUF model and run inference across all of its parts correctly, without manual pre-stitching.
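
As a starting point for the detection criterion, the sketch below shows one way to recognize a shard file name and expand it into the full list of parts to download. It is not taken from the apr codebase: ShardName, parse_shard_name and sibling_names are hypothetical names, and the code assumes the `<prefix>-NNNNN-of-MMMMM.gguf` pattern seen in the example above, with fixed-width 5-digit counters.

```rust
/// Parsed pieces of a shard file name such as `model-00001-of-00002.gguf`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct ShardName {
    prefix: String, // everything before `-NNNNN-of-MMMMM.gguf`
    index: u32,     // 1-based part number
    total: u32,     // number of parts in the set
}

/// Returns `None` for unsharded or malformed names.
/// Assumption: both counters are exactly 5 ASCII digits.
fn parse_shard_name(file_name: &str) -> Option<ShardName> {
    let stem = file_name.strip_suffix(".gguf")?;
    // Split from the right so that dashes inside the prefix are preserved.
    let (rest, total) = stem.rsplit_once("-of-")?;
    let (prefix, index) = rest.rsplit_once('-')?;

    let is_counter = |s: &str| s.len() == 5 && s.bytes().all(|b| b.is_ascii_digit());
    if !is_counter(index) || !is_counter(total) {
        return None;
    }

    let index: u32 = index.parse().ok()?;
    let total: u32 = total.parse().ok()?;
    if index == 0 || index > total {
        return None;
    }
    Some(ShardName { prefix: prefix.to_string(), index, total })
}

/// Lists every file name in the shard set, in part order.
fn sibling_names(shard: &ShardName) -> Vec<String> {
    (1..=shard.total)
        .map(|i| format!("{}-{:05}-of-{:05}.gguf", shard.prefix, i, shard.total))
        .collect()
}
```

With something like this in place, a pull of any single part could resolve the whole set: parse the requested name, call sibling_names, and fetch each entry. A plain model.gguf returns None and would keep the existing single-file path.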