feat: Heterogeneous Hardware Dispatch Plane #14
Merged
Description
This PR closes #5 by implementing a dynamic, multi-platform Heterogeneous Hardware Dispatch Plane for Project ORCHID.
Instead of locking compile targets and dynamic telemetry strictly to x86-64 micro-architectures featuring AVX-512 extensions, this change expands the compilation pipeline to target ARM64 and Apple Silicon architectures. It also modernizes the dynamic hardware telemetry with feature detection for ARM64 SVE and NEON instruction sets.
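As a rough illustration of the ARM64 feature detection described here, the sketch below queries `getauxval(AT_HWCAP)` and tests the `HWCAP_ASIMD` (NEON) and `HWCAP_SVE` bits. The `simd_caps` struct and the `caps_from_hwcap` / `detect_simd_caps` helpers are hypothetical names chosen for this example, not code from `locality/fair_harness.c`. The fallback bit values are assumed from the Linux arm64 hwcap ABI and are only used when the toolchain headers do not define them.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* Fallback definitions (assumption: Linux arm64 hwcap ABI bit positions).
   Used only when the platform headers did not already provide them, so the
   file still compiles on x86_64 and macOS hosts. */
#ifndef HWCAP_ASIMD
#define HWCAP_ASIMD (1UL << 1)
#endif
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)
#endif

/* Hypothetical result type for this sketch. */
typedef struct {
    int has_neon;  /* Advanced SIMD (NEON) available */
    int has_sve;   /* Scalable Vector Extension available */
} simd_caps;

/* Pure helper: decode a raw AT_HWCAP word into feature flags. */
simd_caps caps_from_hwcap(unsigned long hwcap) {
    simd_caps c;
    c.has_neon = (hwcap & HWCAP_ASIMD) != 0;
    c.has_sve = (hwcap & HWCAP_SVE) != 0;
    return c;
}

/* Query the running host. On anything other than Linux/ARM64 this reports
   no ARM features, leaving the caller on its scalar or x86 path. */
simd_caps detect_simd_caps(void) {
#if defined(__linux__) && defined(__aarch64__)
    return caps_from_hwcap(getauxval(AT_HWCAP));
#else
    return caps_from_hwcap(0UL);
#endif
}
```

Keeping the bit decoding in a separate pure function makes the logic checkable on any host: a harness could call `detect_simd_caps()` once at startup and choose its kernel from the result, while tests feed synthetic hwcap words to `caps_from_hwcap`.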
Proposed Changes
1. Emitter Target Refactoring (`orchid/assembler.py`)
- … `prefetcht0` latency masking.
- … (`v0`-`v31`) with native `prfm pldl1keep` software lookahead prefetching offsets.
- … `.word 0x00201000`/`0x00201020` directives for coprocessor startup/shutdown constraints).
- … `--target` supporting `x86_64`, `arm64`, and `apple_amx`).
- … `platform.machine()` and `sys.platform` to determine the native host target by default.
2. Multi-Platform Dynamic Telemetry (`locality/fair_harness.c`)
- … `<sys/auxv.h>` and queries system auxiliary vector flags (`getauxval(AT_HWCAP)`) for `HWCAP_SVE` and `HWCAP_ASIMD` support.
- … `_mm_prefetch` compiler intrinsics in the C scalar fallback with the architecture-independent `__builtin_prefetch` helper. This ensures compilation succeeds across diverse compilers (GCC/Clang) on all targets.
Verification Results
1. Timing Benchmarks (`./scripts/run_locality.sh`)
Verification passes on the native host (x86_64) using the scalar fallback path (which implements portable `__builtin_prefetch` instructions):
2. Multi-Target Compilation Assertions
Each target configuration compiles successfully:
3. Go Scheduler Regression Stability
No concurrency regressions or thread-safety issues detected in the Go plane: