Added Project: NN Parameter Sparsity #8
Conversation
smdzvz
left a comment
Great project. I think you should remove the (≈ 30 seconds) and (≈ 2 minutes) markers in the section titles of the report.
I notice that you prune the dense network, reuse the same initialization, and retrain the sparse network. It would be interesting to also test whether the same pruned network still trains successfully when re-initialized with different random weights. That would show whether performance depends on the specific initialization or only on the positions of the remaining weights.
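A minimal sketch of that experiment, assuming a PyTorch model and one binary mask per weight matrix (the function and mask names here are hypothetical, not from the project):

```python
import torch
import torch.nn as nn

def reinit_with_mask(model: nn.Module, masks: dict[str, torch.Tensor]) -> None:
    """Re-draw masked weights from a fresh random init, then re-apply
    the pruning mask so pruned positions stay at zero."""
    for name, param in model.named_parameters():
        if name in masks:
            nn.init.kaiming_uniform_(param.data)  # fresh random weights (example init)
            param.data *= masks[name]             # keep only the surviving positions
```

Retraining after `reinit_with_mask` (versus restoring the original init) would isolate the contribution of the mask alone.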
Also, although you first perform a search to find a small dense model that performs well, I think you should have a baseline where you randomly remove 80% of the weights and train the network. If this random sparse network performs similarly, it would suggest that the task may be easy enough that many sparse subnetworks work well; if not, it would support your claim that the identified weights (potentially paired with initialization) are meaningfully structured.
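For the random baseline, one simple way to build the masks (again a sketch with hypothetical names, assuming PyTorch; each weight is kept with probability 1 − sparsity, uniformly at random):

```python
import torch
import torch.nn as nn

def random_masks(model: nn.Module, sparsity: float = 0.8) -> dict[str, torch.Tensor]:
    """One binary mask per weight matrix, dropping roughly `sparsity`
    of the entries uniformly at random; biases are left dense."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() >= 2:  # only prune weight matrices
            masks[name] = (torch.rand_like(param) >= sparsity).float()
    return masks
```

Training with these masks, instead of the magnitude-pruned ones, gives the comparison point described above.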
…tialization and masking
Thanks a lot for the review! I've made the changes and added them to the Sanity Check section.
No description provided.