Variational RHN + WT (depth=10) with 517 units per layer is enough vs original 830

The homogeneity of RHNs makes it easy to learn sparse structures within them. In our recent work on ISS (https://arxiv.org/pdf/1709.05027.pdf), we find that we can reduce "#Units/Layer" of "Variational RHN + WT" in your Table 1 from **830** to **517** without losing perplexity. This reduces the model size from **23.5M** to **11.1M** parameters, which is much smaller than the model found by "Neural Architecture Search". In case you are interested, the results are in Table 2 of our paper.
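As a rough sanity check on those model sizes, here is a back-of-the-envelope parameter count. It is only a sketch under assumptions that are not stated in this issue: a 10,000-word vocabulary (Penn Treebank), an embedding width equal to the number of units, one embedding matrix shared between input and softmax (the "WT" part), a coupled carry gate so that each layer holds two square matrices, and input projections only in the first layer. The function name `rhn_param_count` is hypothetical.

```python
def rhn_param_count(units, depth=10, vocab=10_000):
    """Approximate parameter count of a weight-tied RHN language model.

    All structural choices here are assumptions made for illustration,
    not a description of any particular implementation.
    """
    embedding = vocab * units              # one matrix, shared by input and softmax
    input_proj = 2 * units * units         # input weights for H and T, first layer only
    recurrent = depth * 2 * units * units  # recurrent weights for H and T, every layer
    biases = depth * 2 * units + vocab     # gate biases plus softmax bias
    return embedding + input_proj + recurrent + biases


for units in (830, 517):
    print(units, f"{rhn_param_count(units) / 1e6:.1f}M")
```

Under these assumptions the estimate comes out at about 23.5M parameters for 830 units and about 11.1M for 517 units, in line with the sizes quoted above.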

Let us know if this is interesting to you.
