Hi.
Two things I've struggled to figure out on my own:
gpt-oss-120b and its smaller sibling ship with weights in mxfp4 format.
Can I use Ampere hardware (RTX 3090) to decensor gpt-oss-120b? CUDA arch is 8.6, so there is no native mxfp4 support. Does that matter?
Also, what are the hard memory requirements for successfully decensoring gpt-oss-120b? Is there a trivial way to calculate that? I see #83, but I still can't tell whether less memory just means the process takes longer, or whether it will fail outright.
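My own back-of-envelope attempt at the memory question, in case it helps frame an answer. I'm assuming full fine-tuning with the weights dequantized to bf16 plus Adam optimizer state; the parameter count and byte counts below are my guesses, not confirmed figures, and activations are ignored entirely:

```python
# Rough VRAM estimate for full fine-tuning in bf16 with Adam.
# All numbers are assumptions: bf16 weights (2 B), bf16 grads (2 B),
# fp32 Adam moments (8 B). Activations/KV cache are NOT included.

def vram_gib(n_params, weight_bytes=2, grad_bytes=2, optimizer_bytes=8):
    """Total GiB for weights + gradients + optimizer state."""
    total_bytes = n_params * (weight_bytes + grad_bytes + optimizer_bytes)
    return total_bytes / 2**30

# gpt-oss-120b: roughly 117e9 total parameters (approximate)
print(f"{vram_gib(117e9):.0f} GiB")
```

If this kind of estimate is even valid here, it only bounds the training-state side; I still don't know whether falling short of it makes the run slower (offloading) or impossible.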