Hi,
I would like to revisit this issue, as I'm not sure where the language instruction is actually fed into the model, given that it is not one of the obs_shapes.
I see that you specify it as part of your batch data here. However, I am unable to locate where the language instruction from this batch actually gets passed to the model. From my understanding, the observation keys defined in file_utils.py (which do not seem to include language) are the only inputs to the observation encoders / feature concatenation in obs_nets.py. Additionally, I haven't seen the language instruction listed as one of the observation modalities in droid_runs_language_conditioned_rlds.py, so I was hoping to get insight into where and how the language instruction is fed into the model.
Is this understanding of the code structure correct? If so, where does the language instruction actually get fed into the model?
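To make my mental model concrete, here is a toy sketch (hypothetical names and shapes, not the actual robomimic/DROID API) of how I currently understand the encoder: only keys declared in obs_shapes are encoded and concatenated, so any extra batch entry, such as a language instruction, would be silently ignored.

```python
import numpy as np

class SimpleObsEncoder:
    """Toy stand-in for an observation encoder driven by obs_shapes."""

    def __init__(self, obs_shapes):
        # obs_shapes maps observation key -> flat feature dimension
        self.obs_shapes = obs_shapes

    def forward(self, batch):
        # Only keys declared in obs_shapes are encoded and concatenated;
        # everything else in the batch never reaches the model.
        feats = [np.asarray(batch[k]).reshape(-1) for k in self.obs_shapes]
        return np.concatenate(feats)

obs_shapes = {"camera_image": 4, "robot_state": 3}
batch = {
    "camera_image": np.zeros(4),
    "robot_state": np.ones(3),
    "language_instruction": "pick up the red block",  # present in the batch...
}
encoded = SimpleObsEncoder(obs_shapes).forward(batch)
print(encoded.shape)  # (7,) -- the instruction contributed nothing
```

If this matches the real data flow, then the instruction in the batch would have no effect, which is why I'm asking where it actually enters the network.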