[Feature] Add level-aware mode enforcement to DistWorker #440

@hw-native-sys-bot

Description

Summary

DistWorker currently accepts an int level_ constructor parameter (3=L3, 4=L4, …) but only stores it: no level-specific constraints are enforced and no level-specific capabilities are exposed. Add a level-aware mode that validates sub-worker types, enforces dispatch rules, and surfaces the level identity through the Python API.

Motivation / Use Case

The distributed runtime is designed as a recursive hierarchy — L3 dispatches to ChipWorker (L2) and SubWorker; L4 dispatches to DistWorker(level=3) and SubWorker; and so on. Today the level_ field is purely informational: nothing prevents an L4 node from being given a ChipWorker directly, or an L3 node from holding another L3 node as a CHIP sub-worker.

Concrete problems this causes:

  1. No validation at construction time: incorrect wiring (the wrong sub-worker type for the level) is silently accepted and fails only at runtime.
  2. No capability query: Python code cannot ask "what worker types are valid at this level?" and must know the hierarchy out-of-band.
  3. Dispatch routing is type-agnostic: WorkerType::CHIP is used for both L2 ChipWorker and lower-level DistWorker nodes; a level-aware routing table would make this unambiguous.
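
To make problem 3 concrete, here is a rough sketch of what a level-aware routing table could look like. It is a standalone Python model, not the existing runtime code: `route_target` is a hypothetical helper, the `WorkerType` enum is redefined locally with the `DIST` member this issue proposes, and it assumes a DIST sub-node sits exactly one level below its parent (as in the L4 → DistWorker(level=3) example above).

```python
from enum import Enum


class WorkerType(Enum):
    CHIP = "chip"  # L2 ChipWorker
    SUB = "sub"    # SubWorker
    DIST = "dist"  # lower-level DistWorker (proposed in this issue)


def route_target(level: int, worker_type: WorkerType) -> str:
    """Hypothetical helper: name the worker a type resolves to at a level."""
    if level < 3:
        # Assumption: DistWorker levels start at 3 (3=L3, 4=L4, ...).
        raise ValueError(f"unsupported DistWorker level: {level}")
    if worker_type is WorkerType.SUB:
        return "SubWorker"
    if level == 3 and worker_type is WorkerType.CHIP:
        return "ChipWorker (L2)"
    if level >= 4 and worker_type is WorkerType.DIST:
        # Assumption: the child is exactly one level down.
        return f"DistWorker(level={level - 1})"
    raise ValueError(f"{worker_type.name} is not routable at L{level}")


print(route_target(3, WorkerType.CHIP))
print(route_target(4, WorkerType.DIST))
```

With a table like this, CHIP always means a ChipWorker and DIST always means a lower-level DistWorker, so the type tag alone identifies the target.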

Proposed API / Behavior

// C++ — enforce valid sub-worker types per level
class DistWorker : public IWorker {
public:
    // level 3 → accepts ChipWorker (CHIP) + SubWorker (SUB)
    // level 4+ → accepts DistWorker (DIST) + SubWorker (SUB)
    void add_worker(WorkerType type, IWorker* worker);  // throws if invalid for level

    // Query what worker types are accepted at this level
    bool accepts_worker_type(WorkerType type) const;
};

# Python
dw = DistWorker(level=3)
dw.accepts_worker_type(WorkerType.CHIP)   # True
dw.accepts_worker_type(WorkerType.DIST)   # False (L3 does not host DistWorker sub-nodes)

dw4 = DistWorker(level=4)
dw4.accepts_worker_type(WorkerType.CHIP)  # False
dw4.accepts_worker_type(WorkerType.DIST)  # True

Level rules (proposed defaults):

  • L3: CHIP (ChipWorker / L2 device) + SUB (SubWorker fork/shm)
  • L4+: DIST (lower-level DistWorker) + SUB (SubWorker fork/shm)
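
The level rules above could be modelled roughly as follows. This is a pure-Python sketch of the proposed behavior, not the actual C++ class or its binding: `DistWorkerModel` and `accepted_types` are hypothetical names, raising `ValueError` stands in for the unspecified "throws if invalid for level", and rejecting levels below 3 is an assumption drawn from the 3=L3, 4=L4 numbering.

```python
from enum import Enum


class WorkerType(Enum):
    CHIP = 1  # ChipWorker / L2 device
    SUB = 2   # SubWorker fork/shm
    DIST = 3  # lower-level DistWorker (proposed)


def accepted_types(level: int) -> frozenset:
    """Hypothetical helper encoding the proposed default level rules."""
    if level < 3:
        # Assumption: levels below 3 are not valid DistWorker levels.
        raise ValueError(f"unsupported DistWorker level: {level}")
    if level == 3:
        return frozenset({WorkerType.CHIP, WorkerType.SUB})
    return frozenset({WorkerType.DIST, WorkerType.SUB})


class DistWorkerModel:
    """Stand-in for DistWorker that validates wiring when workers are added."""

    def __init__(self, level: int):
        self.level = level
        self._accepted = accepted_types(level)
        self._workers = []

    def accepts_worker_type(self, worker_type: WorkerType) -> bool:
        return worker_type in self._accepted

    def add_worker(self, worker_type: WorkerType, worker: object) -> None:
        # Reject invalid wiring immediately instead of at dispatch time.
        if not self.accepts_worker_type(worker_type):
            raise ValueError(
                f"L{self.level} does not accept {worker_type.name} workers"
            )
        self._workers.append((worker_type, worker))


dw4 = DistWorkerModel(level=4)
dw4.add_worker(WorkerType.DIST, DistWorkerModel(level=3))  # accepted
try:
    dw4.add_worker(WorkerType.CHIP, object())  # rejected at wiring time
except ValueError as err:
    print(err)
```

Keeping the rules in one lookup function means the capability query and the `add_worker` check cannot drift apart, and changing the defaults later touches a single place.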

Alternatives Considered

Leave validation entirely to the caller (current behaviour). Rejected because multi-level composition is error-prone and the misuse is only caught at dispatch time, not at construction time.

Additional Context

  • DistWorker introduced in PR for Phase 2 (feat/chip-worker branch)
  • src/common/distributed/dist_worker.{h,cpp}, python/bindings/dist_worker_bind.h
  • Related architectural context: .docs/PHASE2_HOST_WORKER.md, .docs/UNIFIED_RUNTIME_PLAN.md

Metadata

Labels

enhancement (New feature or request)

Projects

Status

Ready
