
improve model debugging workflow#611

Open
ethernet8023 wants to merge 8 commits into main from train-refactor

Conversation

@ethernet8023
Contributor

@ethernet8023 ethernet8023 commented Mar 2, 2026

  • adds the ability to run `nix run .#train -- config config.toml` on a regular Psyche config TOML, to test things locally before deploying to the network
  • adds a `dump-config` command to run-manager that dumps the config for an existing run out as a TOML to be edited / played with.
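
To illustrate the intended loop (the `dump-config` invocation and every key below are hypothetical, shown for shape only; the dumped file is the real source of truth):

```toml
# Hypothetical sketch of the dump -> edit -> train loop:
#   run-manager dump-config > config.toml        # invocation is illustrative
#   nix run .#train -- config config.toml        # command from the PR description
#
# The keys below are made up for illustration; the dumped TOML defines the real shape.
[model]
architecture = "HfLlama"   # hypothetical key; value matches a variant from this PR

[training]
batch_size = 8             # hypothetical key
```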

@ethernet8023 ethernet8023 changed the title Train refactor improve model debuggingw orkflow Mar 2, 2026
@ethernet8023 ethernet8023 force-pushed the train-refactor branch 3 times, most recently from 628fbfb to 42b290b Compare March 3, 2026 19:23
@ethernet8023 ethernet8023 changed the title improve model debuggingw orkflow improve model debugging workflow Mar 3, 2026
@ethernet8023 ethernet8023 force-pushed the train-refactor branch 3 times, most recently from 969388d to 308eec8 Compare March 3, 2026 22:27
@ethernet8023 ethernet8023 force-pushed the train-refactor branch 6 times, most recently from 066f766 to d74613b Compare March 11, 2026 18:25
@ethernet8023 ethernet8023 marked this pull request as ready for review March 11, 2026 19:39
@ethernet8023 ethernet8023 force-pushed the train-refactor branch 5 times, most recently from 984ef03 to f4475ce Compare March 12, 2026 14:45
They're instantiated via `(LlamaForCausalLM|DeepseekForCausalLM)::from_pretrained` or `(PythonCausalLM::new|PythonDistributedCausalLM::new)`.

There's alpha-level support for models written in Python. See the [Python](./python.md) docs for more information.
We currently only support causal language models. To implement a new one, you can create a file similar to `llama_for_causal_lm` and implement your model, ensuring you provide a trait impl for `CausalLM` - or, preferably, add your model to [our TorchTitan fork](https://github.com/nousResearch/torchtitan).
Collaborator


add your model to our Torchtitan fork

what does this mean exactly? and is it torchtitan format or safetensors or something else?

Contributor Author


it's for if you're adding a new model architecture - it's just "implement your model architecture in our torch titan fork". new model shapes in existing architectures don't need anything fancy like this.

Collaborator


ahhh ok I see ty

Comment thread tools/rust-tools/run-manager/src/commands/run/dump_config.rs
@ethernet8023 ethernet8023 enabled auto-merge March 18, 2026 22:06
…o python model string

also, type UnsupportedArchitecture more strictly
useful for testing combinations of feature sets to ensure compilation
with all of them
- now you can `train` on a regular config toml!
- moved train to its own package
- modified train to work more like tp/dp from the main binary
- updated docs to reflect changes
- added a script to build a tiny version of any model architecture,
which dumps a config in the exact TOML shape required to feed it back
in when setting a config!
there was no path that returned an error here
we try a few dirs to see if they are executable first
Collaborator

@rob-maron rob-maron left a comment


This looks great Ari, I'm trying this out today. TY

Comment on lines 45 to +57

impl LLMArchitecture {
    /// Separate from a display impl, since we actually match on these in the codebase
    pub fn to_python_model_string(&self) -> String {
        match self {
            LLMArchitecture::HfLlama => "HfLlama",
            LLMArchitecture::HfDeepseek => "HfDeepseek",
            LLMArchitecture::HfAuto => "HfAuto",
            LLMArchitecture::Torchtitan => "Torchtitan",
        }
        .to_string()
    }
}
Collaborator


Nice 👍
