changes for latest versions of BenchMARL #3
karthiks1701 wants to merge 2 commits into proroklab:main from
  # tensor of shape [*batch, n_agents, n_actions], where the outputs
  # along the n_agent dimension are taken with the same (agent_index) agent network
- agent_out = self.agent_mlps.agent_networks[agent_index].forward(input)
+ # agent_out = self.agent_mlps.agent_networks[agent_index].forward(input)
This is the key change, as agent_networks is no longer supported in TorchRL.
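The code comment in the diff above describes evaluating every agent's input with one chosen agent's network, giving an output of shape [*batch, n_agents, n_actions]. A framework-free sketch of that idea follows. The toy linear functions and the names `make_linear` and `forward_with_agent` are hypothetical stand-ins for the per-agent MLPs; they are not TorchRL or DiCo APIs, and this is not the replacement used in the PR.

```python
def make_linear(weight, bias):
    """Return a toy 'network' mapping a feature list to one action value."""
    def net(features):
        return [sum(w * x for w, x in zip(weight, features)) + bias]
    return net


# Hypothetical per-agent networks (one per agent).
agent_networks = [
    make_linear([1.0, 0.0], 0.0),  # agent 0
    make_linear([0.0, 1.0], 1.0),  # agent 1
]


def forward_with_agent(observations, agent_index):
    """Apply the network of `agent_index` to every agent's observation.

    observations: nested list shaped [batch][n_agents][n_features]
    returns:      nested list shaped [batch][n_agents][n_actions]
    """
    net = agent_networks[agent_index]
    return [[net(obs) for obs in per_agent] for per_agent in observations]


# batch=1, n_agents=2, n_features=2
observations = [[[1.0, 2.0], [3.0, 4.0]]]
out_agent0 = forward_with_agent(observations, 0)
out_agent1 = forward_with_agent(observations, 1)
```

Comparing `out_agent0` with `out_agent1` on the same observations is what makes a behavioral distance between two agents measurable.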
matteobettini left a comment
Thanks a mil, just a few qs
  experiment:
-   max_n_frames: 5_000_000
+   max_n_frames: 1_000_000
I would not change the default config for reproducibility
Sorry about that. I didn't want to run for longer, so I pushed this by mistake.
distance = self.estimate_snd(input)
if update_estimate:
    self.estimated_snd[:] = distance.detach()
Could you explain this a bit?
If those conditions are met, we can avoid computing
I did this to be able to log estimated_snd during training when the desired SND is -1. Right now it logs NaNs. It was just to be able to see how the SND evolves during training as well. It can be removed if necessary.
I see, but you can still see it under eval/snd, no?
Yes, but if I understand correctly, that is only during evaluation, right? This was helpful for understanding how the SND evolves during training. But you are right, eval/snd is enough. Should we roll back to the previous version?
OK, got it. I'll take care of things, don't worry.
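For readers following the thread above, here is a minimal plain-Python sketch of the pattern under discussion: a stored diversity estimate that stays NaN until an update is explicitly requested. The class `DiversityTracker` and the helper `mean_pairwise_distance` are hypothetical, and the mean pairwise Euclidean distance is only a simplified stand-in for the SND estimate; the actual `estimate_snd` in the PR operates on tensors and may differ.

```python
import math
from itertools import combinations


def mean_pairwise_distance(agent_outputs):
    """Mean Euclidean distance over all pairs of per-agent output vectors."""
    pairs = list(combinations(agent_outputs, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


class DiversityTracker:
    """Hypothetical helper that keeps the last diversity estimate for logging."""

    def __init__(self):
        # NaN until the first update; a logger reading this value before any
        # update would report NaN, as described in the thread.
        self.estimated_snd = float("nan")

    def forward(self, agent_outputs, update_estimate=True):
        distance = mean_pairwise_distance(agent_outputs)
        if update_estimate:
            # Refresh the stored value so it can be logged during training.
            self.estimated_snd = distance
        return distance


tracker = DiversityTracker()
outputs = [[0.0, 0.0], [3.0, 4.0]]  # two agents, 2-D actions

tracker.forward(outputs, update_estimate=False)
stale = tracker.estimated_snd  # never updated, so still NaN

tracker.forward(outputs, update_estimate=True)
fresh = tracker.estimated_snd  # now holds the measured distance
```

The trade-off raised in the review is visible here: refreshing the estimate on every training step costs an extra distance computation, whereas relying on eval/snd avoids it at the price of seeing the value only during evaluation.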
A few adaptations for DiCo to work with the latest BenchMARL codebase. There is no longer any need to install the specific branches of tensordict, TorchRL, and BenchMARL. Let me know if additional testing is required.
SND curves for the sampling environment


Final learned policy
https://github.com/user-attachments/assets/4ba5ae75-02d2-4d80-868b-03bd538af34a