Hi, I've recently been trying to implement MCTSnet, and your repo has been very inspiring. I have several questions regarding the submodules.
- The authors claim they use residual blocks in both the embedding network and the prior policy network. Each residual block contains two convolutional layers and a 'residual' step. Did you simplify it for the experiment?
- The authors use an MLP with ONE hidden layer in the backup network as well as in the readout network. I think your code maps the input to the output directly, with no hidden layer.
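
To make the two points above concrete, here is a minimal sketch of what I understood from the paper, assuming PyTorch. The class names, the 3x3 kernel size, the ReLU activations, and all layer sizes are my own placeholders, not taken from the paper or from your repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Two convolutional layers followed by a skip connection.

    Kernel size, padding, and activation are assumptions for illustration.
    """

    def __init__(self, channels: int):
        super().__init__()
        # padding=1 keeps the spatial size unchanged so the input can be added back
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)  # the 'residual' step


class OneHiddenLayerMLP(nn.Module):
    """MLP with exactly one hidden layer, as I read it for backup/readout."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```

A direct input-to-output mapping would correspond to a single `nn.Linear(in_dim, out_dim)` instead of the `nn.Sequential` above.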
P.S. Have you tried limiting the path depth at each state?