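AMD EPYC processors use a chiplet-based design with multiple CCDs (Core Complex Dies). Memory sits behind several memory controllers, so access latency depends on which core touches which memory, and data shared between cores on different CCDs has to cross the interconnect. This creates a Non-Uniform Memory Access (NUMA) architecture, shown here in simplified form (on the real part the memory controllers live on a central I/O die, and a 7763 has eight CCDs):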
┌─────────────────────────────────────────────────────────────┐
│                    AMD EPYC (e.g., 7763)                    │
├───────────────┬───────────────┬───────────────┬─────────────┤
│     CCD 0     │     CCD 1     │     CCD 2     │    CCD 3    │
│   (8 cores)   │   (8 cores)   │   (8 cores)   │  (8 cores)  │
│       ↓       │       ↓       │       ↓       │      ↓      │
│    Memory     │    Memory     │    Memory     │   Memory    │
│ Controller 0  │ Controller 1  │ Controller 2  │ Controller 3│
│       ↓       │       ↓       │       ↓       │      ↓      │
│  DIMM Slot A  │  DIMM Slot B  │  DIMM Slot C  │ DIMM Slot D │
└───────────────┴───────────────┴───────────────┴─────────────┘
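To check which cores the OS actually groups into each NUMA node, something like the following could be used on Linux. It is only a sketch: it assumes the standard sysfs file /sys/devices/system/node/nodeN/cpulist is available, and the helper names ParseCpuList and CpusOfNode are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: expands a Linux cpulist string such as "0-3,8,10-11"
// into the individual CPU ids it describes.
std::vector<int> ParseCpuList(const std::string& Text) {
    std::vector<int> Cpus;
    std::stringstream Stream(Text);
    std::string Part;
    while (std::getline(Stream, Part, ',')) {
        if (Part.empty()) {
            continue;  // tolerate stray commas
        }
        const std::size_t Dash = Part.find('-');
        const int First = std::stoi(Part.substr(0, Dash));
        const int Last = (Dash == std::string::npos)
                             ? First
                             : std::stoi(Part.substr(Dash + 1));
        assert(First <= Last);
        for (int Cpu = First; Cpu <= Last; ++Cpu) {
            Cpus.push_back(Cpu);
        }
    }
    return Cpus;
}

// Hypothetical helper: reads the CPUs that belong to one NUMA node from sysfs.
// Returns an empty list if the file is missing (non-Linux, or no such node).
std::vector<int> CpusOfNode(int Node) {
    std::ifstream File("/sys/devices/system/node/node" +
                       std::to_string(Node) + "/cpulist");
    std::string Line;
    if (!File || !std::getline(File, Line)) {
        return {};
    }
    return ParseCpuList(Line);
}
```

Calling CpusOfNode(0), CpusOfNode(1), and so on until an empty list comes back shows how the cores are split across nodes. On a virtual machine the hypervisor may expose only a single node, so the result does not have to match the physical layout.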
Example code where the cross-CCD effect is very visible:
// GetTypedBuildable runs on Thread A (CCD 0)
// and writes the array data to memory on NUMA node 0
BuildableSubsystem->GetTypedBuildable<AFGBuildableResourceExtractor>(Extractors);

// Your endpoint code runs on Thread B (CCD 3)
for (AFGBuildableResourceExtractor* Extractor : Extractors) {
    // Without synchronization (a lock or a memory barrier), Thread B may see:
    // - Partially written array (corrupted size)
    // - Pointer to memory that hasn't been fully initialized
    // - Stale (old) view of the array
}
This can be reproduced on Hetzner Cloud: their AMD servers appear to use AMD EPYC processors, at least in the HEL region, which is where I tested.
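For comparison, here is a minimal sketch of a synchronized handoff in standard C++, not in Unreal code. It assumes the root cause is that the array is read on a second thread without any synchronization. SharedList, Publish, ConsumeSum and RunHandoff are hypothetical names, and a plain std::vector<int> stands in for the extractor array.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared extractor list; not the game's type.
struct SharedList {
    std::vector<int> Items;          // payload written by the producer
    std::atomic<bool> Ready{false};  // publication flag
};

// Producer: fill the payload first, then set the flag with release semantics
// so the writes above are visible to any thread that acquires the flag.
void Publish(SharedList& List, int Count) {
    assert(Count >= 0);
    List.Items.clear();
    for (int Index = 0; Index < Count; ++Index) {
        List.Items.push_back(Index);
    }
    List.Ready.store(true, std::memory_order_release);
}

// Consumer: wait for the flag with acquire semantics before touching the
// payload, then return the sum of the items it observed.
long long ConsumeSum(const SharedList& List) {
    while (!List.Ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    long long Sum = 0;
    for (int Value : List.Items) {
        Sum += Value;
    }
    return Sum;
}

// Runs one producer and one consumer thread and returns the consumer's sum.
long long RunHandoff(int Count) {
    SharedList List;
    long long Result = 0;
    std::thread Consumer([&] { Result = ConsumeSum(List); });
    std::thread Producer([&] { Publish(List, Count); });
    Producer.join();
    Consumer.join();
    return Result;
}
```

With the release/acquire pair in place the consumer either keeps waiting or sees the complete array, regardless of which CCD each thread lands on. A mutex around both the write and the read, or doing the read on the thread that owns the data, would give the same guarantee.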