This repository was archived by the owner on Aug 31, 2021. It is now read-only.
Hi,
I have been looking at the code and I'm not sure why the output vocabulary covers both the word embeddings and the key embeddings when the key is not tied (link to code). This projection is followed by a `narrow` operation that restricts the log-softmax to the word entries only. Is there a reason for this design choice, or can we drop the extra rows from z/R?
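Here is a minimal sketch of why the extra rows look removable. It assumes, as described above, that the output matrix R stacks V word rows above K key rows and that the narrow happens before the log-softmax. The function names and toy sizes are hypothetical, and plain Python lists stand in for tensors.

```python
import math

def matvec(rows, h):
    """Dot each row of a matrix (list of lists) with the vector h."""
    return [sum(r_i * h_i for r_i, h_i in zip(row, h)) for row in rows]

def log_softmax(z):
    """Numerically stable log-softmax over a list of logits."""
    m = max(z)
    log_norm = m + math.log(sum(math.exp(x - m) for x in z))
    return [x - log_norm for x in z]

def word_logprobs_full(R, h, num_words):
    # Hypothetical: project onto all V + K rows, then narrow to the first
    # num_words logits before normalizing (the order described in the question).
    z = matvec(R, h)
    return log_softmax(z[:num_words])

def word_logprobs_trimmed(R, h, num_words):
    # Hypothetical: drop the K key rows up front, so only V dot products are computed.
    return log_softmax(matvec(R[:num_words], h))

# Toy example: V = 3 word rows followed by K = 2 key rows, hidden size 2.
R = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.5], [-0.3, 0.1]]
h = [1.0, 2.0]
print(word_logprobs_full(R, h, 3) == word_logprobs_trimmed(R, h, 3))  # True
```

Under these assumptions the key rows never reach the normalizer, so they only add computation. They would matter if the log-softmax were taken jointly over all V + K entries, or if those rows were read somewhere else in the model.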