fix: resolve NameError when MultiHeadAttention is called with w_init=None #12
Open
exopoiesis wants to merge 1 commit into deepmodeling:main from
Collaborator
Hi @exopoiesis, thanks for the PR and the detailed tests. Two notes from our side:
We still appreciate the |
…None

`w_init_scale` was referenced but never defined, causing a NameError whenever MultiHeadAttention is instantiated with the default w_init=None. The fix replaces the undefined variable with the literal 1.0, which matches the upstream haiku VarianceScaling default.

Adds two focused regression tests:
- test_w_init_none_does_not_raise: exercises the formerly broken code path
- test_w_init_explicit_still_works: confirms explicit w_init is unaffected

Fixes deepmodeling#11
d5a8866 to d3ee9c3
Author
Slimmed down to just the w_init fix + regression test as suggested. The tree_map and extension-module changes have been dropped.
Problem
`MultiHeadAttention.__init__` references `w_init_scale` in the fallback branch. `w_init_scale` is never defined in this scope, so any call with the default `w_init=None` raises `NameError: name 'w_init_scale' is not defined`.

Fix
Replace the undefined variable with the literal `1.0`, which matches the upstream haiku `VarianceScaling` default.

Tests
Two focused regression tests in `tests/test_attention.py`:
- `test_w_init_none_does_not_raise`: exercises the formerly broken default path
- `test_w_init_explicit_still_works`: confirms explicit `w_init` is unaffected

Scope
This PR is intentionally minimal: only `crystalformer/src/attention.py` and `tests/test_attention.py` are changed. No tree API or extension module changes.

Fixes #11
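
For readers who want the failure mode in isolation, here is a minimal sketch of the pattern described above. It is not the actual `attention.py` code and does not depend on haiku: `VarianceScaling`, `BrokenAttention`, `FixedAttention`, and `raises_name_error` are hypothetical stand-ins introduced only for illustration, and the stand-in initializer just stores its scale.

```python
# Hypothetical stand-ins for illustration only; not the crystalformer or haiku code.


class VarianceScaling:
    """Minimal stand-in for an initializer; it only records its scale."""

    def __init__(self, scale=1.0):
        self.scale = scale


class BrokenAttention:
    """Mimics the bug: the fallback branch reads a name that was never defined."""

    def __init__(self, w_init=None):
        if w_init is None:
            # w_init_scale does not exist in any enclosing scope, so this
            # line raises NameError as soon as the default path is taken.
            w_init = VarianceScaling(w_init_scale)  # noqa: F821
        self.w_init = w_init


class FixedAttention:
    """Mimics the fix: the fallback branch uses the literal 1.0 instead."""

    def __init__(self, w_init=None):
        if w_init is None:
            w_init = VarianceScaling(1.0)
        self.w_init = w_init


def raises_name_error(cls):
    """Return True if constructing cls with default arguments raises NameError."""
    try:
        cls()
    except NameError:
        return True
    return False
```

Because the undefined name is only evaluated inside the `w_init is None` branch, the broken variant works whenever an explicit `w_init` is passed. That is why the bug only surfaces on the default path, and why the two regression tests cover the default and explicit cases separately.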