They're instantiated via `(LlamaForCausalLM|DeepseekForCausalLM)::from_pretrained` or `(PythonCausalLM::new|PythonDistributedCausalLM::new)`.

There's alpha-level support for models written in Python. See the [Python](./python.md) docs for more information.

We currently only support causal language models. To implement a new one, you can create a file similar to `llama_for_causal_lm` and implement your model, ensuring you provide a trait impl for `CausalLM`; or, preferably, add your model to [our TorchTitan fork](https://github.com/nousResearch/torchtitan). See the [Python](./python.md) docs for more information.
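To illustrate the "one struct per architecture, one shared trait impl" pattern described above, here is a minimal toy sketch. The real `CausalLM` trait in the codebase has a different and larger surface; the trait methods, the `MyTinyForCausalLM` type, and the `from_pretrained` signature below are all assumptions for illustration only.

```rust
// Stand-in for the real `CausalLM` trait (method name and signature assumed).
trait CausalLM {
    /// Produce next-token logits for a batch of token ids.
    fn forward(&self, input_ids: &[u32]) -> Vec<f32>;
}

// A new architecture would get its own file, like `llama_for_causal_lm`.
struct MyTinyForCausalLM {
    vocab_size: usize,
}

impl MyTinyForCausalLM {
    // Mirrors the `from_pretrained` constructors mentioned above (hypothetical signature).
    fn from_pretrained(vocab_size: usize) -> Self {
        Self { vocab_size }
    }
}

impl CausalLM for MyTinyForCausalLM {
    fn forward(&self, input_ids: &[u32]) -> Vec<f32> {
        // Dummy logits: a real model would run its layers here.
        vec![0.0; self.vocab_size * input_ids.len()]
    }
}

fn main() {
    let model = MyTinyForCausalLM::from_pretrained(8);
    let logits = model.forward(&[1, 2, 3]);
    assert_eq!(logits.len(), 24);
}
```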
Collaborator:

> add your model to our Torchtitan fork

What does this mean exactly? And is it TorchTitan format, or safetensors, or something else?
Contributor (Author):

It's for when you're adding a new model architecture: it just means "implement your model architecture in our TorchTitan fork". New model shapes in existing architectures don't need anything fancy like this.
…o python model string also, type UnsupportedArchitecture more strictly
Useful for testing combinations of feature sets, to ensure compilation succeeds with all of them.
Now you can `train` on a regular config toml!
- Moved train to its own package.
- Modified train to work more like tp/dp from the main binary.
- Updated docs to reflect the changes.
- Added a script to build a tiny version of any model architecture, which dumps a config in the exact toml shape required to feed it back in when setting a config.
there was no path that returned an error here
We try a few dirs to see if they are executable first.
rob-maron (Collaborator) approved these changes on Mar 19, 2026, leaving a comment:

This looks great Ari, I'm trying this out today. TY
Comment on lines 45 to 57:
```rust
impl LLMArchitecture {
    /// Separate from a `Display` impl, since we actually match on these in the codebase.
    pub fn to_python_model_string(&self) -> String {
        match self {
            LLMArchitecture::HfLlama => "HfLlama",
            LLMArchitecture::HfDeepseek => "HfDeepseek",
            LLMArchitecture::HfAuto => "HfAuto",
            LLMArchitecture::Torchtitan => "Torchtitan",
        }
        .to_string()
    }
}
```
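One benefit of matching explicitly instead of deriving `Display` is that the strings can round-trip. The commit notes in this PR mention typing `UnsupportedArchitecture` more strictly; a hedged sketch of what a strict reverse mapping could look like (the method name and the shape of the `UnsupportedArchitecture` type here are assumptions, not the actual code):

```rust
#[derive(Debug, PartialEq)]
enum LLMArchitecture {
    HfLlama,
    HfDeepseek,
    HfAuto,
    Torchtitan,
}

// Hypothetical error type carrying the unrecognized string.
#[derive(Debug, PartialEq)]
struct UnsupportedArchitecture(String);

impl LLMArchitecture {
    // Hypothetical inverse of `to_python_model_string`: each known string
    // maps back to its variant, anything else is a typed error.
    fn from_python_model_string(s: &str) -> Result<Self, UnsupportedArchitecture> {
        match s {
            "HfLlama" => Ok(LLMArchitecture::HfLlama),
            "HfDeepseek" => Ok(LLMArchitecture::HfDeepseek),
            "HfAuto" => Ok(LLMArchitecture::HfAuto),
            "Torchtitan" => Ok(LLMArchitecture::Torchtitan),
            other => Err(UnsupportedArchitecture(other.to_string())),
        }
    }
}

fn main() {
    assert_eq!(
        LLMArchitecture::from_python_model_string("HfLlama"),
        Ok(LLMArchitecture::HfLlama)
    );
    assert!(LLMArchitecture::from_python_model_string("Gpt2").is_err());
}
```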
- `nix run .#train -- config config.toml` on a regular Psyche config toml, to test things locally before deploying to the network.
- Added `dump-config` to the run manager, to dump a config for an existing run out as a toml to be edited / played with.