I noticed that you don't cancel the gradient for large values when using the straight-through estimator here.
The QNN paper claims that "Not cancelling the gradient when r is too large significantly worsens performance".
Does this only matter for low-precision quantization (e.g. binary)?
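For reference, by "cancelling" I mean the clipped STE from the QNN paper, which zeroes the gradient wherever |r| falls outside the quantization range, rather than passing it through unchanged. A minimal NumPy sketch of the two backward variants (function names and the threshold of 1.0 are my own, for illustration):

```python
import numpy as np

def ste_backward_plain(grad_out, r):
    # Plain STE: the gradient passes through the quantizer unchanged.
    return grad_out

def ste_backward_cancel(grad_out, r, threshold=1.0):
    # Clipped STE: zero the gradient where |r| exceeds the threshold,
    # i.e. where the pre-quantization value is outside the clipping range.
    return grad_out * (np.abs(r) <= threshold)

r = np.array([-2.0, -0.5, 0.3, 1.5])
g = np.ones_like(r)
print(ste_backward_plain(g, r))   # gradient kept everywhere
print(ste_backward_cancel(g, r))  # gradient zeroed where |r| > 1
```

My question is whether skipping the `ste_backward_cancel` behaviour is harmless at higher bit widths, or whether it hurts here too.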