Reward list coordinates are mapped incorrectly in TD-Linear

![image](https://github.com/user-attachments/assets/8c4e5b3a-255d-4add-a771-c544c8fc35f1)

The reward list in TD-Linear is initialized incorrectly: its order does not match the order of the reward list used when GridEnv initializes the PSA matrix:
![image](https://github.com/user-attachments/assets/aecb06b7-1c49-41ff-ba50-c25613e6844a)
As a result, the `policy_evaluation` function in TD-Linear cannot compute the correct state values.
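
To illustrate why the ordering matters, here is a minimal, self-contained sketch. It is not the repository's code: the reward values, the 3-state chain, and the names `env_reward_list`, `td_reward_list`, `rsa`, `expected_rewards`, and `evaluate_policy` are all hypothetical. It only assumes that expected rewards are computed as a probability-weighted sum over the reward list, so index `k` must mean the same reward on both sides.

```python
import numpy as np

# Hypothetical reward order assumed when the environment builds its
# reward-probability matrix: [other, target, forbidden, boundary].
env_reward_list = [0.0, 1.0, -10.0, -1.0]

# Hypothetical reward list with the same values in a different order,
# standing in for the mismatched initialization described in this issue.
td_reward_list = [-1.0, -10.0, 1.0, 0.0]

# rsa[s, k] = probability of receiving the k-th reward in state s
# under a fixed policy (toy 3-state example).
rsa = np.array([
    [1.0, 0.0, 0.0, 0.0],  # state 0 gets reward index 0
    [0.0, 1.0, 0.0, 0.0],  # state 1 gets reward index 1
    [0.0, 0.0, 1.0, 0.0],  # state 2 gets reward index 2
])

# Toy state-transition matrix under the same policy: 0 -> 1 -> 2 -> 2.
p_pi = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])


def expected_rewards(rsa, reward_list):
    """Expected immediate reward per state: sum_k rsa[s, k] * reward_list[k]."""
    return rsa @ np.asarray(reward_list)


def evaluate_policy(p_pi, r_pi, gamma=0.9):
    """Solve the Bellman equation v = r_pi + gamma * p_pi @ v in closed form."""
    n = p_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * p_pi, r_pi)


# Same probabilities, different list order, different state values.
v_consistent = evaluate_policy(p_pi, expected_rewards(rsa, env_reward_list))
v_mismatched = evaluate_policy(p_pi, expected_rewards(rsa, td_reward_list))
print(v_consistent)
print(v_mismatched)
```

In this sketch the two value vectors differ even though the environment dynamics are identical, which is the kind of discrepancy a mismatched reward list order would produce. One possible way to avoid it is to have both sides read the reward list from a single shared definition.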