I recorded a simulation by moving my camera around a set of static objects. The camera intrinsic parameters were recorded as camera_intrinsic, and the extrinsic parameters were recorded as the translation and rotation fields inside ego of the capture JSON (please correct me if this is wrong).
Now, I am experimenting with some triangulation methods (localizing objects using their 2D bounding box and the camera's projection matrix). Below is how I set up everything to get the projection matrix:
import numpy as np
from scipy.spatial.transform import Rotation as Rot

# 3x3 intrinsic matrix as reported in the capture JSON
int_mat = np.array([
    capture.sensor.camera_intrinsic[0],
    capture.sensor.camera_intrinsic[1],
    capture.sensor.camera_intrinsic[2]
])

# Ego pose: the quaternion is stored as (x, y, z, w), which is the
# order scipy's from_quat expects
r = Rot.from_quat(capture.ego.rotation)
rot_mat = r.as_matrix()
t = np.array([capture.ego.translation]).transpose()  # column vector, shape (3, 1)

# 3x4 extrinsic matrix [R | t], then the 3x4 projection matrix
ext_mat = np.hstack((rot_mat, t))
proj_mat = int_mat @ ext_mat
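As a shape sanity check, here is a self-contained toy version of the same multiplication, using the diagonal matrix Unity gives me, an identity pose, and a made-up world point (values purely illustrative):

```python
import numpy as np

# Toy stand-ins for the real capture data: the diagonal intrinsic
# matrix from my capture JSON, and an identity pose for [R | t]
int_mat = np.diag([0.705989, 1.73205078, -1.0006001])
ext_mat = np.hstack((np.eye(3), np.zeros((3, 1))))  # shape (3, 4)
proj_mat = int_mat @ ext_mat                        # (3, 3) @ (3, 4) -> (3, 4)

# Project a made-up homogeneous world point and dehomogenize
pt_world = np.array([1.0, 2.0, 5.0, 1.0])
u_h, v_h, w_h = proj_mat @ pt_world
u, v = u_h / w_h, v_h / w_h  # both come out negative because of the -1.0006 entry
```

The shapes work out, but the negative third diagonal entry flips the sign of the projected coordinates, which makes me suspect this matrix is not a pixel-space intrinsic matrix at all.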
When I fed the projection matrix and a 2D bounding box into my custom algorithm, the triangulated positions were way off. That is when I noticed that the intrinsic parameter matrix only has values on its main diagonal, and that one of them is actually negative. I come from a computer vision background, not a computer graphics one. Can anyone here guide me on how to convert the intrinsic matrix provided by Unity Perception into a standard one with fx, fy, cx, cy, and skew?
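For what it's worth, my current guess (which I'd like confirmed) is that Unity is handing back the top-left block of the OpenGL-style NDC projection matrix, in which case the pixel-space intrinsics would come from scaling by the image size. A minimal sketch of that guess, where width and height are placeholders I would read from the actual image:

```python
import math

# Diagonal entries of the matrix Unity Perception reports
m00, m11 = 0.705989, 1.73205078

# If this really is the GL-style projection, then m11 = cot(v_fov / 2)
v_fov_deg = math.degrees(2.0 * math.atan(1.0 / m11))  # ~60 degrees
aspect = m11 / m00                                    # width / height, if my guess holds

# Tentative conversion to pixel intrinsics (zero skew, principal point at
# the image center) -- this is the part I am unsure about
def to_pixel_intrinsics(width, height):
    fx = m00 * width / 2.0
    fy = m11 * height / 2.0
    return fx, fy, width / 2.0, height / 2.0
```

Is that the right interpretation, or am I misreading what camera_intrinsic contains?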
I looked at the 3D Ground Truth Bounding Boxes section of the Perception_Statistics notebook, but the authors seem to treat the intrinsic parameter matrix as the projection matrix. Isn't the projection matrix supposed to be 3 by 4, since it is the product of the 3 by 3 intrinsic matrix and the 3 by 4 extrinsic matrix?
Furthermore, I also had a look at a similar question, but the accepted answer there assumes the projection matrix is already provided, while another answer tries to use the FOV, which I don't see anywhere in the data I collected through Unity Perception.
For reference, below are the two relevant parts of the capture JSON file I am using:
"sensor": {
  "sensor_id": "7fcdda27-3029-4bb9-83f3-9d3eac23a1a1",
  "ego_id": "6a1ebd8b-1417-49a0-befc-8892801a9aa3",
  "modality": "camera",
  "translation": [0.0, 0.0, 0.0],
  "rotation": [0.0, 0.0, 0.0, 1.00000012],
  "camera_intrinsic": [
    [0.705989, 0.0, 0.0],
    [0.0, 1.73205078, 0.0],
    [0.0, 0.0, -1.0006001]
  ]
},
"ego": {
  "ego_id": "6a1ebd8b-1417-49a0-befc-8892801a9aa3",
  "translation": [2.32, 1.203, 2.378],
  "rotation": [-0.0229570474, 0.976061463, -0.173389584, -0.129279226],
  "velocity": null,
  "acceleration": null
}
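One more data point supporting the GL-projection theory: the strange -1.0006001 entry looks like the GL depth term -(far + near) / (far - near). Assuming Unity's default clip planes (near = 0.3, far = 1000; I have not verified my camera's actual settings), the numbers line up:

```python
# Check whether the odd third diagonal entry matches the GL depth term
# -(far + near) / (far - near), using Unity's default clip planes.
near, far = 0.3, 1000.0  # Unity camera defaults -- an assumption on my part
m22 = -(far + near) / (far - near)
print(m22)  # ~ -1.0006001, matching the capture JSON
```

If that holds, the matrix I'm getting mixes in depth-clipping information and is not a pure pinhole intrinsic matrix.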