
[INFERENCE] Model Assignment#601

Open
samherring99 wants to merge 5 commits into main from checkpoint_tracking

Conversation

@samherring99 (Collaborator) commented Feb 27, 2026

This PR introduces the changes needed to specify how many inference nodes should load each model.

The two endpoints for admin access through the gateway are now:

  • /admin/assign-models - this endpoint was originally /admin/load-model, which broadcast a LoadModel message to all inference nodes in the network. The new endpoint reuses the LoadModel message type, but broadcasts only to a tracked list of specific node IDs.

  • /admin/assignments - this endpoint reports the status of nodes in the network: which node IDs are hosting a given model, and whether each node is loading, active, or idle (idle means no model is loaded).
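The assignments payload implies request types roughly like the following. This is a sketch only: field names are inferred from the example curl requests below, and the real definitions (with their serde derives) live in gateway-node.rs.

```rust
// Sketch of the /admin/assign-models request shape, inferred from the
// example JSON payloads; not the actual definitions.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelSourceType {
    Huggingface, // "huggingface" in the JSON
    // other variants (e.g. local checkpoints) elided
}

#[derive(Debug, Clone)]
struct ModelAssignmentSpec {
    model_name: String, // e.g. "gpt2"
    num_nodes: usize,   // how many inference nodes should host this model
    source_type: ModelSourceType,
}

#[derive(Debug, Clone)]
struct AssignModelsRequest {
    assignments: Vec<ModelAssignmentSpec>,
}

// Builds the request corresponding to the first curl example below.
fn example_request() -> AssignModelsRequest {
    AssignModelsRequest {
        assignments: vec![ModelAssignmentSpec {
            model_name: "gpt2".to_string(),
            num_nodes: 2,
            source_type: ModelSourceType::Huggingface,
        }],
    }
}

fn main() {
    let req = example_request();
    println!("{} assignment(s)", req.assignments.len());
}
```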

Changes to gateway-node.rs:

  • Model assignments are now written to disk on the gateway node. This is something we can change over time, but for now it was the easiest way to persist assignments.
  • New load_assignments and save_assignments methods handle this persistence.
  • GatewayState also gains a model_assignments field, a HashMap that stores assignments in memory.
  • LoadModelSource is being renamed to ModelSourceType as a part of [INFERENCE] Push-based model loading #569.
  • LoadModelRequest is also renamed to AssignModelsRequest, which carries a vector of ModelAssignmentSpec entries.
  • Inference requests are now routed to available nodes based on which model each node is serving.
  • handle_load_model is being renamed to handle_assign_models.
  • We introduce handle_get_assignments to parse assignments for the /assignments endpoint.
  • We now check for assignment drift every 60s with a reconciliation timer.
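To illustrate the load/save round-trip, here is a minimal std-only sketch. The real code keys assignments on EndpointId and its on-disk format may well differ; the line-based "node_id=model" format here is purely a stand-in.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

// Persist the assignment map as one "node_id=model" line per entry.
fn save_assignments(path: &Path, assignments: &HashMap<String, String>) -> io::Result<()> {
    let mut out = String::new();
    for (node_id, model) in assignments {
        out.push_str(&format!("{}={}\n", node_id, model));
    }
    fs::write(path, out)
}

// Reload the assignment map; a missing file means a fresh gateway.
fn load_assignments(path: &Path) -> io::Result<HashMap<String, String>> {
    let mut assignments = HashMap::new();
    if !path.exists() {
        return Ok(assignments); // nothing persisted yet
    }
    for line in fs::read_to_string(path)?.lines() {
        if let Some((node_id, model)) = line.split_once('=') {
            assignments.insert(node_id.to_string(), model.to_string());
        }
    }
    Ok(assignments)
}

fn main() {
    let path = std::env::temp_dir().join("assignments_demo.txt");
    let mut assignments = HashMap::new();
    assignments.insert("node-a".to_string(), "gpt2".to_string());
    save_assignments(&path, &assignments).unwrap();
    let loaded = load_assignments(&path).unwrap();
    println!("loaded {} assignment(s)", loaded.len());
}
```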

Changes to main.rs:

  • Inference nodes now check whether a given LoadModel request is directed at them, since the message is broadcast through gossip.
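That target check can be sketched as follows, with simplified stand-ins for InferenceGossipMessage and EndpointId; treating the target field as optional (None meaning "broadcast to all") is an assumption, not the confirmed signature.

```rust
// Simplified stand-ins for the real types in this PR.
type EndpointId = String;

enum InferenceGossipMessage {
    LoadModel {
        model_name: String,
        target_node_id: Option<EndpointId>, // assumed: None = untargeted broadcast
    },
    // other variants elided
}

// Decide whether this node should act on a gossiped LoadModel message.
fn should_handle(msg: &InferenceGossipMessage, own_id: &EndpointId) -> bool {
    match msg {
        InferenceGossipMessage::LoadModel { target_node_id, .. } => match target_node_id {
            Some(target) => target == own_id, // addressed: only the target loads
            None => true,                     // untargeted: everyone loads
        },
    }
}

fn main() {
    let msg = InferenceGossipMessage::LoadModel {
        model_name: "gpt2".to_string(),
        target_node_id: Some("node-a".to_string()),
    };
    println!("node-a handles: {}", should_handle(&msg, &"node-a".to_string()));
}
```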

Testing

nix develop .#dev-python

just test-model-assignment

Requests

curl http://127.0.0.1:8000/admin/assignments | jq
curl -X POST http://127.0.0.1:8000/admin/assign-models -H 'Content-Type: application/json' -d '{"assignments": [{"model_name": "gpt2", "num_nodes": 2, "source_type": "huggingface"}]}'

curl -X POST http://127.0.0.1:8000/admin/assign-models -H 'Content-Type: application/json' -d '{"assignments": [{"model_name": "meta-llama/Llama-3.2-1B-Instruct", "num_nodes": 1, "source_type": "huggingface"}]}'

Results from /assignments and inference with both models

Screenshot 2026-02-19 at 12 51 45 PM

@samherring99 samherring99 force-pushed the checkpoint_tracking branch 3 times, most recently from 612de9e to 7b8cd67 Compare March 2, 2026 17:45
@samherring99 samherring99 marked this pull request as ready for review March 2, 2026 17:45
@samherring99 samherring99 changed the title Checkpoint Tracking and Assignment [INFERENCE] Checkpoint Tracking and Assignment Mar 2, 2026
@dsocolobsky (Contributor)

I think the package might not be building? or is it an error on my side?

$ cargo build -p psyche-inference-node

error[E0425]: cannot find value `node` in this scope
   --> architectures/inference-only/inference-node/src/bin/gateway-node.rs:276:9
    |
276 |         node.model_name.as_deref().unwrap_or("unknown"),
    |         ^^^^ help: a local variable with a similar name exists: `nodes`

error[E0027]: pattern does not mention field `target_node_id`
   --> architectures/inference-only/inference-node/src/bin/test-network.rs:158:29
    |
158 | ...                   InferenceGossipMessage::LoadModel { model_name, model_source } => {
    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing field `target_node_id`
    |
help: include the missing field in the pattern
    |
158 |                             InferenceGossipMessage::LoadModel { model_name, model_source, target_node_id } => {
    |                                                                                         ++++++++++++++++
help: if you don't care about this missing field, you can explicitly ignore it
    |
158 |                             InferenceGossipMessage::LoadModel { model_name, model_source, target_node_id: _ } => {
    |                                                                                         +++++++++++++++++++
help: or always ignore missing fields here
    |
158 |                             InferenceGossipMessage::LoadModel { model_name, model_source, .. } => {
    |                                                                                         ++++

For more information about this error, try `rustc --explain E0027`.
error: could not compile `psyche-inference-node` (bin "test-network") due to 1 previous error
warning: build failed, waiting for other jobs to finish...
For more information about this error, try `rustc --explain E0425`.
error: could not compile `psyche-inference-node` (bin "gateway-node") due to 1 previous error

I've never tested the inference node so perhaps it's something on my side but it looks like you might've forgotten to push or update some lines of code.

@samherring99 (Collaborator, Author) commented Mar 2, 2026

> I think the package might not be building? or is it an error on my side? […]

try nix build .#psyche-inference-node instead; let me confirm this works too. we've since moved from cargo to nix to include the vllm dependency 🙂. also, as a note, make sure you're in the nix python devshell first

@dsocolobsky (Contributor)

> try nix build .#psyche-inference-node instead, let me confirm this works too. […]

oops, seems like I was missing a few of the latest commits in the branch and the git pull didn't work the first time, after resetting to origin it's working now! Thanks!

@dsocolobsky (Contributor)

ok I tested a few of the curl commands and it seemed to be working fine, although I'm not familiar with the inference nodes.

@samherring99 (Collaborator, Author)

> ok I tested a few of the curl commands and it seemed to be working fine, although I'm not familiar with the inference nodes.

you were able to load gpt2 and llama-1b on different nodes and inference with both of them? if so, awesome!! out of curiosity, was this tested on the H200 nodes?

@samherring99 samherring99 force-pushed the checkpoint_tracking branch 5 times, most recently from a5742ca to 72626d1 Compare March 6, 2026 16:06
@samherring99 samherring99 changed the title [INFERENCE] Checkpoint Tracking and Assignment [INFERENCE] Model Assignment Mar 6, 2026
@samherring99 samherring99 force-pushed the checkpoint_tracking branch from 72626d1 to 7cc28e5 Compare March 9, 2026 13:15
# Test dynamic model loading with multiple nodes (gateway + 2 inference nodes)
test-model-loading initial_model="gpt2":
# Test model assignment system with multiple nodes and models (gateway + 3 inference nodes)
test-model-assignment:
Contributor

Maybe it's a good time to move all this logic to a new script and just call the script from here, or only do the setup steps here and move the core logic somewhere else, just to avoid the Justfile getting too big and hard to read

Collaborator Author

Yeah, I've been thinking about this. My general plan is to later clean up all the justfile commands with 'test' in the name (the ones used just for testing / demo purposes), leaving pretty much just the inference-node, inference-stack, and gateway-node ones.

Let me know if you think that should be done sooner, but I definitely agree I'd rather cut down complexity here in the end state.


for (node_id, age) in stale_nodes {
    warn!("Removing stale node {} (no heartbeat for {:?})", node_id.fmt_short(), age);
    nodes.remove(&node_id);
Contributor

Shouldn't we also remove the assignments for the stale node_id here? I didn't see anywhere else where we remove the assignments when a node disconnects, so I think this is the best place to add it
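A minimal sketch of that suggestion, with String standing in for EndpointId and for the node-info value type:

```rust
use std::collections::HashMap;

// When a stale node is evicted, drop its assignment too so
// /admin/assignments stops reporting it.
fn remove_stale_node(
    nodes: &mut HashMap<String, String>,             // node_id -> node info (simplified)
    model_assignments: &mut HashMap<String, String>, // node_id -> assigned model
    node_id: &str,
) {
    nodes.remove(node_id);
    model_assignments.remove(node_id); // keep assignments in sync with live nodes
}

fn main() {
    let mut nodes = HashMap::from([("node-a".to_string(), "info".to_string())]);
    let mut assignments = HashMap::from([("node-a".to_string(), "gpt2".to_string())]);
    remove_stale_node(&mut nodes, &mut assignments, "node-a");
    println!("nodes: {}, assignments: {}", nodes.len(), assignments.len());
}
```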

Collaborator Author

Ah, yeah, the removal change was done concurrently with these and I forgot to add this in when I rebased. Will do, and great callout :-)


struct AssignmentInfo {
    node_id: String,
    model_name: String,
    status: String, // "loading", "loaded", "idle", "offline"
Contributor

Can we use an enum here with all the different variants?
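Something like the following, sketched without the serde derives the real struct would need:

```rust
// Enum replacement for the stringly-typed status field; the comment in
// the original code lists exactly these four states.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AssignmentStatus {
    Loading,
    Loaded,
    Idle,
    Offline,
}

impl AssignmentStatus {
    // Keeps the same lowercase strings the current API reports.
    fn as_str(&self) -> &'static str {
        match self {
            AssignmentStatus::Loading => "loading",
            AssignmentStatus::Loaded => "loaded",
            AssignmentStatus::Idle => "idle",
            AssignmentStatus::Offline => "offline",
        }
    }
}

struct AssignmentInfo {
    node_id: String,
    model_name: String,
    status: AssignmentStatus,
}

fn main() {
    let info = AssignmentInfo {
        node_id: "node-a".to_string(),
        model_name: "gpt2".to_string(),
        status: AssignmentStatus::Loading,
    };
    println!("{} {} {}", info.node_id, info.model_name, info.status.as_str());
}
```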

Comment on lines 99 to +102
struct GatewayState {
    available_nodes: RwLock<HashMap<EndpointId, InferenceNodeInfo>>,
    pending_requests: RwLock<HashMap<String, mpsc::Sender<InferenceResponse>>>,
    model_assignments: RwLock<HashMap<EndpointId, String>>, // node_id -> assigned model name
Contributor

I wonder if we could turn all the gateway state into a new gateway actor. I think this is starting to accumulate a lot of resource-locking logic that could become a potential issue at some point. We could create an actor that owns all these resources and communicate with it via channels, which would avoid any race conditions or deadlocks in the future. It might just be that I'm not a big fan of heavy locking and relying on that, but maybe this is just for future reference and not something to tackle now, just worth noting
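For future reference, a minimal std-only sketch of that actor shape. The message names are made up for illustration, and the real gateway is async, so it would use tokio channels and a task rather than std mpsc and a thread.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Commands the rest of the gateway would send to the actor.
enum GatewayCommand {
    Assign { node_id: String, model: String },
    GetAssignment { node_id: String, reply: mpsc::Sender<Option<String>> },
}

// Spawn an actor that exclusively owns the assignment map: all access
// goes through the channel, so no RwLocks are needed.
fn spawn_gateway_actor() -> mpsc::Sender<GatewayCommand> {
    let (tx, rx) = mpsc::channel::<GatewayCommand>();
    thread::spawn(move || {
        let mut model_assignments: HashMap<String, String> = HashMap::new();
        for cmd in rx {
            match cmd {
                GatewayCommand::Assign { node_id, model } => {
                    model_assignments.insert(node_id, model);
                }
                GatewayCommand::GetAssignment { node_id, reply } => {
                    let _ = reply.send(model_assignments.get(&node_id).cloned());
                }
            }
        }
    });
    tx
}

fn main() {
    let gateway = spawn_gateway_actor();
    gateway
        .send(GatewayCommand::Assign { node_id: "node-a".into(), model: "gpt2".into() })
        .unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    gateway
        .send(GatewayCommand::GetAssignment { node_id: "node-a".into(), reply: reply_tx })
        .unwrap();
    println!("{:?}", reply_rx.recv().unwrap());
}
```

Because a single sender's messages arrive in order, the Assign is always applied before the GetAssignment is answered, which is exactly the race-freedom the comment is after.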

Collaborator Author

Yeah I agree I don't love the heavy locking, and am not sure what an alternative looks like. We should discuss later on how this could best be optimized, will leave as a note for now!

@samherring99 samherring99 force-pushed the checkpoint_tracking branch 3 times, most recently from fe187ec to 7097950 Compare March 11, 2026 17:09
…ests, inference node changes for model and checkpoint reloading, gateway node changes to handle LoadModel messages, adding LoadModel request broadcasting to gateway node, updating test network file with the protocol changes, adding justfile command for testing, fixing model routing for idle nodes, and adding a delay for memory free on reload
…sary async calls, and removing manual drops, gateway node changes to allow for model assignment by node and tracking, Justfile updates for model assignment testing
… display full node status and updating justfile
…node id from assignments when we have a stale node
3 participants