Excellent work! I've been experimenting with the NAVSIM dataset, but my training loss seems to get stuck around 0.7 after 5 epoch training. I'm curious if this is the expected behavior for this architecture. Would it be possible for you to share your training loss graph for reference? It would be a huge help for my debugging process.
my data sample:
{"id": 0, "image": "/path/to/NAVSIM/dataset/sensor_blobs/trainval/2021.06.09.17.37.09_veh-12_04489_04816/CAM_F0/3021b0c8a6235b62.jpg", "conversations": [{"from": "system", "value": "You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language."}, {"from": "human", "value": "Here is front views of a driving vehicle:\n
\nThe navigation information is: straight\nThe historical motion context is: (-8.97,1.33,-0.26),(-5.96,0.62,-0.18),(-2.96,0.17,-0.10),(0.00,0.00,0.00)\nPredict the optimal driving action for the next 4 seconds with 8 new waypoints (x,y,heading)."}, {"from": "gpt", "value": "2.92,0.15,0.11,5.72,0.62,0.22,8.32,1.37,0.35,10.63,2.36,0.47,12.61,3.51,0.58,14.28,4.69,0.65,15.64,5.77,0.68,16.71,6.64,0.69"}]}
Excellent work! I've been experimenting with the NAVSIM dataset, but my training loss seems to get stuck around 0.7 after 5 epoch training. I'm curious if this is the expected behavior for this architecture. Would it be possible for you to share your training loss graph for reference? It would be a huge help for my debugging process.
my data sample:
\nThe navigation information is: straight\nThe historical motion context is: (-8.97,1.33,-0.26),(-5.96,0.62,-0.18),(-2.96,0.17,-0.10),(0.00,0.00,0.00)\nPredict the optimal driving action for the next 4 seconds with 8 new waypoints (x,y,heading)."}, {"from": "gpt", "value": "2.92,0.15,0.11,5.72,0.62,0.22,8.32,1.37,0.35,10.63,2.36,0.47,12.61,3.51,0.58,14.28,4.69,0.65,15.64,5.77,0.68,16.71,6.64,0.69"}]}
{"id": 0, "image": "/path/to/NAVSIM/dataset/sensor_blobs/trainval/2021.06.09.17.37.09_veh-12_04489_04816/CAM_F0/3021b0c8a6235b62.jpg", "conversations": [{"from": "system", "value": "You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language."}, {"from": "human", "value": "Here is front views of a driving vehicle:\n