In which part does it incorporate RL?

Nice work! I have a question, though. Since I'm not very familiar with reinforcement learning, I'm wondering which part of the method actually uses RL. Section 3.3.2 (fine-tuning) says: "Update the model P(G,S) on the fine-tuning set $D^f$ using policy gradient method", so it seems RL is used there. However, the code just computes the topology, atom type, and bond type losses between the expanded $S_i$ and $G_i^k$.
Thanks!