
[INFERENCE] Model Assignment#601

Open
samherring99 wants to merge 5 commits into main from checkpoint_tracking

Conversation

@samherring99 (Collaborator) commented Feb 27, 2026

This PR introduces the changes needed to specify how many inference nodes should load each model.

The two endpoints for admin access through the gateway are now:

  • /admin/assign-models - this endpoint was originally /admin/load-model, which broadcast a LoadModel message to all inference nodes in the network. The new endpoint reuses the LoadModel message type, but broadcasts only to a tracked list of specific node IDs.

  • /admin/assignments - this endpoint reports the status of nodes in the network: which node IDs are hosting a given model, and whether each node is loading, active, or idle (idle means no model is loaded).
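The assignments payload implies request types roughly like the following. This is a sketch only: field names are inferred from the example curl requests below, and the real definitions (with their serde derives) live in gateway-node.rs.

```rust
// Sketch of the /admin/assign-models request shape, inferred from the
// example JSON payloads; not the actual definitions.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelSourceType {
    Huggingface, // "huggingface" in the JSON
    // other variants (e.g. local checkpoints) elided
}

#[derive(Debug, Clone)]
struct ModelAssignmentSpec {
    model_name: String, // e.g. "gpt2"
    num_nodes: usize,   // how many inference nodes should host this model
    source_type: ModelSourceType,
}

#[derive(Debug, Clone)]
struct AssignModelsRequest {
    assignments: Vec<ModelAssignmentSpec>,
}

// Builds the request corresponding to the first curl example below.
fn example_request() -> AssignModelsRequest {
    AssignModelsRequest {
        assignments: vec![ModelAssignmentSpec {
            model_name: "gpt2".to_string(),
            num_nodes: 2,
            source_type: ModelSourceType::Huggingface,
        }],
    }
}

fn main() {
    let req = example_request();
    println!("{} assignment(s)", req.assignments.len());
}
```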

Changes to gateway-node.rs:

  • Model assignments are now written to disk on the gateway node. This is something we can change over time, but for now it was the easiest way to persist assignments.
  • New load_assignments and save_assignments methods handle this persistence.
  • GatewayState also gains a model_assignments field, a HashMap that stores assignments in memory.
  • LoadModelSource is being renamed to ModelSourceType as a part of [INFERENCE] Push-based model loading #569.
  • LoadModelRequest is also renamed to AssignModelsRequest, which carries a vector of ModelAssignmentSpec entries.
  • Inference requests are now routed to available nodes based on which model each node is serving.
  • handle_load_model is being renamed to handle_assign_models.
  • We introduce handle_get_assignments to parse assignments for the /assignments endpoint.
  • We now check for assignment drift every 60s with a reconciliation timer.
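To illustrate the load/save round-trip, here is a minimal std-only sketch. The real code keys assignments on EndpointId and its on-disk format may well differ; the line-based "node_id=model" format here is purely a stand-in.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

// Persist the assignment map as one "node_id=model" line per entry.
fn save_assignments(path: &Path, assignments: &HashMap<String, String>) -> io::Result<()> {
    let mut out = String::new();
    for (node_id, model) in assignments {
        out.push_str(&format!("{}={}\n", node_id, model));
    }
    fs::write(path, out)
}

// Reload the assignment map; a missing file means a fresh gateway.
fn load_assignments(path: &Path) -> io::Result<HashMap<String, String>> {
    let mut assignments = HashMap::new();
    if !path.exists() {
        return Ok(assignments); // nothing persisted yet
    }
    for line in fs::read_to_string(path)?.lines() {
        if let Some((node_id, model)) = line.split_once('=') {
            assignments.insert(node_id.to_string(), model.to_string());
        }
    }
    Ok(assignments)
}

fn main() {
    let path = std::env::temp_dir().join("assignments_demo.txt");
    let mut assignments = HashMap::new();
    assignments.insert("node-a".to_string(), "gpt2".to_string());
    save_assignments(&path, &assignments).unwrap();
    let loaded = load_assignments(&path).unwrap();
    println!("loaded {} assignment(s)", loaded.len());
}
```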

Changes to main.rs:

  • Inference nodes now check whether a given LoadModel request is directed at them, since the message is broadcast through gossip.
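That target check can be sketched as follows, with simplified stand-ins for InferenceGossipMessage and EndpointId; treating the target field as optional (None meaning "broadcast to all") is an assumption, not the confirmed signature.

```rust
// Simplified stand-ins for the real types in this PR.
type EndpointId = String;

enum InferenceGossipMessage {
    LoadModel {
        model_name: String,
        target_node_id: Option<EndpointId>, // assumed: None = untargeted broadcast
    },
    // other variants elided
}

// Decide whether this node should act on a gossiped LoadModel message.
fn should_handle(msg: &InferenceGossipMessage, own_id: &EndpointId) -> bool {
    match msg {
        InferenceGossipMessage::LoadModel { target_node_id, .. } => match target_node_id {
            Some(target) => target == own_id, // addressed: only the target loads
            None => true,                     // untargeted: everyone loads
        },
    }
}

fn main() {
    let msg = InferenceGossipMessage::LoadModel {
        model_name: "gpt2".to_string(),
        target_node_id: Some("node-a".to_string()),
    };
    println!("node-a handles: {}", should_handle(&msg, &"node-a".to_string()));
}
```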

Testing

nix develop .#dev-python

just test-model-assignment

Requests

curl http://127.0.0.1:8000/admin/assignments | jq
curl -X POST http://127.0.0.1:8000/admin/assign-models -H 'Content-Type: application/json' -d '{"assignments": [{"model_name": "gpt2", "num_nodes": 2, "source_type": "huggingface"}]}'

curl -X POST http://127.0.0.1:8000/admin/assign-models -H 'Content-Type: application/json' -d '{"assignments": [{"model_name": "meta-llama/Llama-3.2-1B-Instruct", "num_nodes": 1, "source_type": "huggingface"}]}'

Results from /assignments and inference with both models

Screenshot 2026-02-19 at 12 51 45 PM

@samherring99 samherring99 force-pushed the checkpoint_tracking branch 3 times, most recently from 612de9e to 7b8cd67 Compare March 2, 2026 17:45
@samherring99 samherring99 marked this pull request as ready for review March 2, 2026 17:45
@samherring99 samherring99 changed the title Checkpoint Tracking and Assignment [INFERENCE] Checkpoint Tracking and Assignment Mar 2, 2026
@dsocolobsky (Contributor)

I think the package might not be building? or is it an error on my side?

$ cargo build -p psyche-inference-node

error[E0425]: cannot find value `node` in this scope
   --> architectures/inference-only/inference-node/src/bin/gateway-node.rs:276:9
    |
276 |         node.model_name.as_deref().unwrap_or("unknown"),
    |         ^^^^ help: a local variable with a similar name exists: `nodes`

error[E0027]: pattern does not mention field `target_node_id`
   --> architectures/inference-only/inference-node/src/bin/test-network.rs:158:29
    |
158 | ...                   InferenceGossipMessage::LoadModel { model_name, model_source } => {
    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing field `target_node_id`
    |
help: include the missing field in the pattern
    |
158 |                             InferenceGossipMessage::LoadModel { model_name, model_source, target_node_id } => {
    |                                                                                         ++++++++++++++++
help: if you don't care about this missing field, you can explicitly ignore it
    |
158 |                             InferenceGossipMessage::LoadModel { model_name, model_source, target_node_id: _ } => {
    |                                                                                         +++++++++++++++++++
help: or always ignore missing fields here
    |
158 |                             InferenceGossipMessage::LoadModel { model_name, model_source, .. } => {
    |                                                                                         ++++

For more information about this error, try `rustc --explain E0027`.
error: could not compile `psyche-inference-node` (bin "test-network") due to 1 previous error
warning: build failed, waiting for other jobs to finish...
For more information about this error, try `rustc --explain E0425`.
error: could not compile `psyche-inference-node` (bin "gateway-node") due to 1 previous error

I've never tested the inference node so perhaps it's something on my side but it looks like you might've forgotten to push or update some lines of code.

@samherring99 (Collaborator, Author) commented Mar 2, 2026

> I think the package might not be building? or is it an error on my side? […]

try nix build .#psyche-inference-node instead; let me confirm this works too. we've since moved from cargo to nix to include the vllm dependency 🙂. also, as a note, make sure you're in the nix python devshell first

@dsocolobsky (Contributor)

> try nix build .#psyche-inference-node instead, let me confirm this works too. […]

oops, seems like I was missing a few of the latest commits in the branch and the git pull didn't work the first time, after resetting to origin it's working now! Thanks!

@dsocolobsky (Contributor)

ok I tested a few of the curl commands and it seemed to be working fine, although I'm not familiar with the inference nodes.

@samherring99 (Collaborator, Author)

> ok I tested a few of the curl commands and it seemed to be working fine, although I'm not familiar with the inference nodes.

you were able to load gpt2 and llama-1b on different nodes and inference with both of them? if so, awesome!! out of curiosity, was this tested on the H200 nodes?

@samherring99 samherring99 force-pushed the checkpoint_tracking branch 5 times, most recently from a5742ca to 72626d1 Compare March 6, 2026 16:06
@samherring99 samherring99 changed the title [INFERENCE] Checkpoint Tracking and Assignment [INFERENCE] Model Assignment Mar 6, 2026
@samherring99 samherring99 force-pushed the checkpoint_tracking branch from 72626d1 to 7cc28e5 Compare March 9, 2026 13:15
# Test dynamic model loading with multiple nodes (gateway + 2 inference nodes)
test-model-loading initial_model="gpt2":
# Test model assignment system with multiple nodes and models (gateway + 3 inference nodes)
test-model-assignment:
Contributor

Maybe it's a good time to move all this logic to a new script and just call the script from here, or only do the setup steps here and move the core logic somewhere else, just to avoid the Justfile getting too big and hard to read

Collaborator Author

Yeah, I've been thinking about this. My general plan is to later clean up all the justfile commands with 'test' in the name (the ones used just for testing / demo purposes), leaving pretty much just the inference-node, inference-stack, and gateway-node ones.

Let me know if you think that should be done sooner, but I definitely agree I'd rather cut down complexity here in the end state.


for (node_id, age) in stale_nodes {
    warn!("Removing stale node {} (no heartbeat for {:?})", node_id.fmt_short(), age);
    nodes.remove(&node_id);
Contributor

Shouldn't we also remove the assignments for the stale node_id here? I didn't see anywhere else where we remove the assignments when a node disconnects, so I think this is the best place to add it
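A minimal sketch of that suggestion, with String standing in for EndpointId and for the node-info value type:

```rust
use std::collections::HashMap;

// When a stale node is evicted, drop its assignment too so
// /admin/assignments stops reporting it.
fn remove_stale_node(
    nodes: &mut HashMap<String, String>,             // node_id -> node info (simplified)
    model_assignments: &mut HashMap<String, String>, // node_id -> assigned model
    node_id: &str,
) {
    nodes.remove(node_id);
    model_assignments.remove(node_id); // keep assignments in sync with live nodes
}

fn main() {
    let mut nodes = HashMap::from([("node-a".to_string(), "info".to_string())]);
    let mut assignments = HashMap::from([("node-a".to_string(), "gpt2".to_string())]);
    remove_stale_node(&mut nodes, &mut assignments, "node-a");
    println!("nodes: {}, assignments: {}", nodes.len(), assignments.len());
}
```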

Collaborator Author

Ah, yeah, the removal change was done concurrently with these and I forgot to add this in when I rebased. Will do, and great callout :-)


struct AssignmentInfo {
    node_id: String,
    model_name: String,
    status: String, // "loading", "loaded", "idle", "offline"
Contributor

Can we use an enum here with all the different variants?
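Something like the following, sketched without the serde derives the real struct would need:

```rust
// Enum replacement for the stringly-typed status field; the comment in
// the original code lists exactly these four states.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AssignmentStatus {
    Loading,
    Loaded,
    Idle,
    Offline,
}

impl AssignmentStatus {
    // Keeps the same lowercase strings the current API reports.
    fn as_str(&self) -> &'static str {
        match self {
            AssignmentStatus::Loading => "loading",
            AssignmentStatus::Loaded => "loaded",
            AssignmentStatus::Idle => "idle",
            AssignmentStatus::Offline => "offline",
        }
    }
}

struct AssignmentInfo {
    node_id: String,
    model_name: String,
    status: AssignmentStatus,
}

fn main() {
    let info = AssignmentInfo {
        node_id: "node-a".to_string(),
        model_name: "gpt2".to_string(),
        status: AssignmentStatus::Loading,
    };
    println!("{} {} {}", info.node_id, info.model_name, info.status.as_str());
}
```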

Comment on lines 99 to +102
struct GatewayState {
    available_nodes: RwLock<HashMap<EndpointId, InferenceNodeInfo>>,
    pending_requests: RwLock<HashMap<String, mpsc::Sender<InferenceResponse>>>,
    model_assignments: RwLock<HashMap<EndpointId, String>>, // node_id -> assigned model name
Contributor

I wonder if we could turn all the gateway state into a new gateway actor. I think this is starting to accumulate a lot of resource-locking logic that could become a potential issue at some point. We could create an actor that owns all these resources and communicate with it via channels, which would avoid any race conditions or deadlocks in the future. It might just be that I'm not a big fan of heavy locking and relying on that, but maybe this is just for future reference and not something to tackle now, just worth noting
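For future reference, a minimal std-only sketch of that actor shape. The message names are made up for illustration, and the real gateway is async, so it would use tokio channels and a task rather than std mpsc and a thread.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Commands the rest of the gateway would send to the actor.
enum GatewayCommand {
    Assign { node_id: String, model: String },
    GetAssignment { node_id: String, reply: mpsc::Sender<Option<String>> },
}

// Spawn an actor that exclusively owns the assignment map: all access
// goes through the channel, so no RwLocks are needed.
fn spawn_gateway_actor() -> mpsc::Sender<GatewayCommand> {
    let (tx, rx) = mpsc::channel::<GatewayCommand>();
    thread::spawn(move || {
        let mut model_assignments: HashMap<String, String> = HashMap::new();
        for cmd in rx {
            match cmd {
                GatewayCommand::Assign { node_id, model } => {
                    model_assignments.insert(node_id, model);
                }
                GatewayCommand::GetAssignment { node_id, reply } => {
                    let _ = reply.send(model_assignments.get(&node_id).cloned());
                }
            }
        }
    });
    tx
}

fn main() {
    let gateway = spawn_gateway_actor();
    gateway
        .send(GatewayCommand::Assign { node_id: "node-a".into(), model: "gpt2".into() })
        .unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    gateway
        .send(GatewayCommand::GetAssignment { node_id: "node-a".into(), reply: reply_tx })
        .unwrap();
    println!("{:?}", reply_rx.recv().unwrap());
}
```

Because a single sender's messages arrive in order, the Assign is always applied before the GetAssignment is answered, which is exactly the race-freedom the comment is after.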

Collaborator Author

Yeah I agree I don't love the heavy locking, and am not sure what an alternative looks like. We should discuss later on how this could best be optimized, will leave as a note for now!

@samherring99 samherring99 force-pushed the checkpoint_tracking branch 3 times, most recently from fe187ec to 7097950 Compare March 11, 2026 17:09
…ests, inference node changes for model and checkpoint reloading, gateway node changes to handle LoadModel messages, adding LoadModel request broadcasting to gateway node, updating test network file with the protocol changes, adding justfile command for testing, fixing model routing for idle nodes, and adding a delay for memory free on reload
…sary async calls, and removing manual drops, gateway node changes to allow for model assignment by node and tracking, Justfile updates for model assignment testing
… display full node status and updating justfile
…node id from assignments when we have a stale node
3 participants