The homogeneity of RHNs makes it easy to learn sparse structures within them. In our recent work on ISS (https://arxiv.org/pdf/1709.05027.pdf), we find that we can reduce "#Units/Layer" of "Variational RHN + WT" in your Table 1 from 830 to 517 without losing perplexity. This reduces the model size from 23.5M to 11.1M parameters, which is much smaller than the model found by "Neural Architecture Search". If you are interested, the results are in Table 2 of our paper.
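As a rough sanity check on those sizes, here is a minimal sketch of the parameter count. Everything in it beyond the two widths is an assumption not stated in this comment: a recurrence depth of 10, a 10,000-word vocabulary, coupled carry gates (two weight blocks per layer), input embeddings tied to the softmax and equal in width to the hidden state, and biases ignored. The function name `rhn_weight_count` is only illustrative.

```python
def rhn_weight_count(units, depth=10, vocab=10_000):
    """Approximate weight count of a weight-tied RHN language model.

    Assumptions (not taken from the comment above): coupled carry gate,
    so each layer has two blocks (H and T); input weights only at the
    first micro-layer; embedding width equals `units`; biases ignored.
    """
    input_weights = 2 * units * units              # W_H and W_T, first layer only
    recurrent_weights = depth * 2 * units * units  # R_H and R_T at every layer
    tied_embedding = vocab * units                 # shared by input and softmax
    return input_weights + recurrent_weights + tied_embedding


for units in (830, 517):
    millions = rhn_weight_count(units) / 1e6
    print(f"{units} units/layer: about {millions:.1f}M weights")
```

Under these assumptions the count comes to about 23.5M for 830 units and about 11.1M for 517 units. The recurrent weights scale with the square of the width, which is why dropping the width by roughly 38% more than halves the model size.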
Let us know if this is interesting to you.