Conversation
Force-pushed from 612de9e to 7b8cd67.
I think the package might not be building? Or is it an error on my side?

```
$ cargo build -p psyche-inference-node
error[E0425]: cannot find value `node` in this scope
  --> architectures/inference-only/inference-node/src/bin/gateway-node.rs:276:9
   |
276 |         node.model_name.as_deref().unwrap_or("unknown"),
   |         ^^^^ help: a local variable with a similar name exists: `nodes`

error[E0027]: pattern does not mention field `target_node_id`
  --> architectures/inference-only/inference-node/src/bin/test-network.rs:158:29
   |
158 | ...     InferenceGossipMessage::LoadModel { model_name, model_source } => {
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing field `target_node_id`
   |
help: include the missing field in the pattern
   |
158 |     InferenceGossipMessage::LoadModel { model_name, model_source, target_node_id } => {
   |                                                                  ++++++++++++++++
help: if you don't care about this missing field, you can explicitly ignore it
   |
158 |     InferenceGossipMessage::LoadModel { model_name, model_source, target_node_id: _ } => {
   |                                                                  +++++++++++++++++++
help: or always ignore missing fields here
   |
158 |     InferenceGossipMessage::LoadModel { model_name, model_source, .. } => {
   |                                                                  ++++
For more information about this error, try `rustc --explain E0027`.
error: could not compile `psyche-inference-node` (bin "test-network") due to 1 previous error
warning: build failed, waiting for other jobs to finish...
For more information about this error, try `rustc --explain E0425`.
error: could not compile `psyche-inference-node` (bin "gateway-node") due to 1 previous error
```

I've never tested the inference node, so perhaps it's something on my side, but it looks like you might've forgotten to push or update some lines of code.
Oops, seems like I was missing a few of the latest commits in the branch and the git pull didn't work the first time. After resetting to origin it's working now, thanks!
OK, I tested a few of the curl commands and they seemed to be working fine, although I'm not familiar with the inference nodes.
you were able to load gpt2 and llama-1b on different nodes and inference with both of them? if so, awesome!! out of curiosity, was this tested on the H200 nodes? |
Force-pushed from a5742ca to 72626d1.
Force-pushed from 72626d1 to 7cc28e5.
```just
# Test dynamic model loading with multiple nodes (gateway + 2 inference nodes)
test-model-loading initial_model="gpt2":
# Test model assignment system with multiple nodes and models (gateway + 3 inference nodes)
test-model-assignment:
```
Maybe it's a good time to move all this logic into a new script and just call the script from here, or only do the setup steps here and move the core logic somewhere else, just to avoid the Justfile getting too big and hard to read.
Yeah, I've been thinking about this. My general plan is to later clean up all the Justfile commands with 'test' in the name (the ones used just for testing / demo purposes) and pretty much just leave the inference-node, inference-stack, and gateway-node ones.
Let me know if you think that should be done sooner, but I definitely agree I'd rather cut down complexity here in the end state.
```rust
for (node_id, age) in stale_nodes {
    warn!("Removing stale node {} (no heartbeat for {:?})", node_id.fmt_short(), age);
    nodes.remove(&node_id);
```
Shouldn't we also remove the assignments for the stale node_id here? I didn't see anywhere else where we remove the assignments when a node disconnects, so I think this is the best place to add it
Ah, yeah, the removal change was done concurrently with these and I forgot to add this in when I rebased. Will do, and great callout :-)
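The cleanup discussed above could look something like this, a sketch only: the real `GatewayState` uses `EndpointId` keys behind `RwLock`s, so the simplified types and the `remove_stale_nodes` name here are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Simplified stand-in for the gateway's node and assignment tracking.
struct Gateway {
    nodes: HashMap<String, Instant>,            // node_id -> last heartbeat
    model_assignments: HashMap<String, String>, // node_id -> assigned model
}

impl Gateway {
    fn remove_stale_nodes(&mut self, max_age: Duration, now: Instant) {
        // Collect stale IDs first to avoid mutating while iterating.
        let stale: Vec<String> = self
            .nodes
            .iter()
            .filter(|(_, last)| now.duration_since(**last) > max_age)
            .map(|(id, _)| id.clone())
            .collect();
        for node_id in stale {
            self.nodes.remove(&node_id);
            // Also drop the model assignment so the gateway stops
            // routing requests to a node that has disappeared.
            self.model_assignments.remove(&node_id);
        }
    }
}
```

The key point is that both maps are cleaned in one place, so an assignment can never outlive its node.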
```rust
struct AssignmentInfo {
    node_id: String,
    model_name: String,
    status: String, // "loading", "loaded", "idle", "offline"
```
Can we use an enum here with all the different variants?
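Something like the following, perhaps; the variant names come from the comment on the original `status` field, and `as_str` is a hypothetical helper to keep the existing wire strings stable:

```rust
// Replaces the stringly-typed `status: String` field with a closed set
// of states, so invalid statuses become unrepresentable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AssignmentStatus {
    Loading,
    Loaded,
    Idle,
    Offline,
}

impl AssignmentStatus {
    // Map back to the old string values for display / serialization.
    fn as_str(self) -> &'static str {
        match self {
            AssignmentStatus::Loading => "loading",
            AssignmentStatus::Loaded => "loaded",
            AssignmentStatus::Idle => "idle",
            AssignmentStatus::Offline => "offline",
        }
    }
}

struct AssignmentInfo {
    node_id: String,
    model_name: String,
    status: AssignmentStatus,
}
```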
```rust
struct GatewayState {
    available_nodes: RwLock<HashMap<EndpointId, InferenceNodeInfo>>,
    pending_requests: RwLock<HashMap<String, mpsc::Sender<InferenceResponse>>>,
    model_assignments: RwLock<HashMap<EndpointId, String>>, // node_id -> assigned model name
```
I wonder if we could turn all the gateway state into a new gateway actor. This is starting to accumulate a lot of resource-locking logic that could become an issue at some point. We could create an actor that owns all these resources and communicate with it via channels, which would avoid race conditions or deadlocks in the future. It might just be that I'm not a big fan of heavy locking, and this may be for future reference rather than something to tackle now, but it's worth noting.
Yeah, I agree I don't love the heavy locking, and I'm not sure what an alternative looks like. We should discuss later how this could best be optimized; will leave as a note for now!
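For future reference, the actor idea sketched in the comment above could look roughly like this. This uses `std` channels and threads for simplicity; the real code would presumably use tokio's async channels, and all names here (`GatewayCmd`, `spawn_gateway_actor`) are illustrative, not the actual API.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Commands sent to the gateway actor; queries carry a reply channel.
enum GatewayCmd {
    AssignModel { node_id: String, model: String },
    GetAssignment { node_id: String, reply: mpsc::Sender<Option<String>> },
}

fn spawn_gateway_actor() -> mpsc::Sender<GatewayCmd> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The actor exclusively owns the state: no RwLock needed,
        // and all mutations are serialized through the channel.
        let mut assignments: HashMap<String, String> = HashMap::new();
        for cmd in rx {
            match cmd {
                GatewayCmd::AssignModel { node_id, model } => {
                    assignments.insert(node_id, model);
                }
                GatewayCmd::GetAssignment { node_id, reply } => {
                    let _ = reply.send(assignments.get(&node_id).cloned());
                }
            }
        }
    });
    tx
}
```

The trade-off: callers pay a channel round-trip per query, but there is no lock ordering to reason about and no possibility of holding a guard across an await point.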
Force-pushed from fe187ec to 7097950.
- …ests, inference node changes for model and checkpoint reloading, gateway node changes to handle LoadModel messages, adding LoadModel request broadcasting to gateway node, updating test network file with the protocol changes, adding justfile command for testing, and fixing model routing for idle nodes and adding delay for memory free for reload
- …sary async calls, and removing manual drops, gateway node changes to allow for model assignment by node and tracking, Justfile updates for model assignment testing
- … display full node status and updating justfile
- …node id from assignments when we have a stale node
Force-pushed from 7097950 to cb710ed.
This PR introduces changes needed for specifying specific numbers of models to load on available inference nodes.

The two endpoints for admin access through the gateway are now:

- `/admin/assign-models` - this endpoint was originally `/admin/load-model`, which would broadcast a `LoadModel` message to all inference nodes in the network. The new endpoint re-uses the `LoadModel` message type, but instead broadcasts to a tracked list of specific node IDs to forward the message to.
- `/admin/assignments` - this endpoint displays the status of nodes in the network: in essence, which node IDs are hosting a given model, and whether each is loading, active, or idle. Idle means no model is loaded.

Changes to `gateway-node.rs`:

- `load_assignments` and `save_assignments` methods are introduced to leverage this. `GatewayState` also now has a field `model_assignments`, which is a HashMap to store assignments in memory.
- `LoadModelSource` is being renamed to `ModelSourceType` as a part of [INFERENCE] Push-based model loading #569.
- `LoadModelRequest` is also being renamed to `AssignModelsRequest`, which is a vector of the `ModelAssignmentSpec` type.
- `handle_load_model` is being renamed to `handle_assign_models`.
- `handle_get_assignments` is added to parse assignments for the `/assignments` endpoint.

Changes to `main.rs`:

Testing

Requests

Results from `/assignments` and inference with both models
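As a rough illustration, a request body for `/admin/assign-models` might look like the following. This is a guess based on the description above (`AssignModelsRequest` being a vector of `ModelAssignmentSpec`); the actual field names and schema live in the PR's code, not here.

```json
[
  { "model_name": "gpt2", "model_source": "huggingface", "num_nodes": 1 },
  { "model_name": "meta-llama/Llama-3.2-1B", "model_source": "huggingface", "num_nodes": 1 }
]
```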