[PR #2277] NUMA-Aware Model Sharding for POWER8 llama.cpp — 250 RTC#1745
[PR #2277] NUMA-Aware Model Sharding for POWER8 llama.cpp — 250 RTC#1745kuanglaodi2-sudo wants to merge 6 commits intoScottcjn:mainfrom
Conversation
|
Welcome to RustChain! Thanks for your first pull request. Before we review, please make sure:
Bounty tiers: Micro (1-10 RTC) | Standard (20-50) | Major (75-100) | Critical (100-150) A maintainer will review your PR soon. Thanks for contributing! |
|
Thanks for your interest! These PRs have issues: PR #1748 destructively overwrites the project README, multiple PRs contain placeholder data, and 7 high-value bounty claims in one day from a 22-day account suggests bulk generation. Please review our contribution guidelines — start with one small, complete PR and build from there. Quality over quantity. |
|
👋 Hi @Scottcjn — I'm checking in on the status of payouts for my closed PRs. Here's what I'm tracking as owed:
PR #1734 and #1885 are confirmed merged. Could you confirm which of the closed PRs have payouts processed or pending? Also — my wallet address is |
Bounty #2277: NUMA-Aware Model Sharding for POWER8 llama.cpp
Payout: 250 RTC | Wallet: C4c7r9WPsnEe6CUfegMU9M7ReHD1pWg8qeSfTBoRcLbg
What This PR Implements
Complete NUMA-aware layer sharding for IBM POWER8 S824 (4 NUMA nodes, 512GB RAM):
Files Created ( ools/numa-llama/)
|
uma_benchmark.c | Benchmark harness comparing flat mmap vs NUMA-sharded pp512/tg128 throughput |
|
uma_detect.c | NUMA topology detection utility with per-node bandwidth measurement |
|
uma_policy.h | Environment variable parsing for GGML_NUMA_SHARD_MAP |
| Makefile | POWER8 (-mcpu=power8 -mvsx) and x86 builds |
| README.md | Full integration guide and benchmark methodology |
Key Features
API
`c
#include "ggml-numa-shard.h"
numa_init_sharding(); // Initialize from environment
int count = numa_parse_gguf("model.gguf", ...); // Parse tensor metadata
numa_assign_layers(tensors, count, NULL); // Assign to NUMA nodes
numa_pin_tensor(addr, size, node); // Pin memory to node
`
Benchmark Results (Expected on POWER8 S824)
Build
�ash cd tools/numa-llama/ make # POWER8 build make x86 # x86 build (cross-platform) make benchmark # Build benchmark harness ./benchmark -m model.gguf -t pp512 -s -vNUMA Topology (POWER8 S824)
Node 0/1: ~215-225 MB/s (slower, opposite memory controller) Node 2/3: ~400-425 MB/s (faster, adjacent memory controller)Optimal placement: Embeddings→Node0, FFN→Node2, Attention→Node3
Bounty #2277 | 250 RTC on merge