Fix non-deterministic lidar odometry#368
Merged
JanuszBedkowski merged 1 commit intoMapsHD:mainfrom Feb 20, 2026
Merged
Conversation
Collaborator
Author
|
It seems that the Clang-Format 21 check is not working correctly. But I'm sure that the code is formatted correctly. I ran a formatting script before committing the code. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Lidar odometry produces different results across runs, in both single-threaded and
multithreaded modes.
Fix: Zero-initialize NDT bucket structs
Root cause
Eigen's default constructors intentionally leave matrix/vector members uninitialized
for performance. When NDT bucket structs (
Bucket,BucketCoef,Bucket2) arecreated, fields like
normal_vector,cov,meancontain whatever was in memory.In
update_rgd(), when a new bucket is created,mean,cov,cov_inverse, andnumber_of_pointsare explicitly set -- butnormal_vectoris not. It only getscomputed when
number_of_pointsreaches 3 (via PCA eigenvectors + viewport-basedorientation).
Meanwhile,
compute_hessian()has no point count guard -- it uses any bucket foundin the map, including those with only 1-2 points. The viewpoint check in
add_indoor_hessian_contribution()andadd_outdoor_hessian_contribution()readsnormal_vectorunconditionally:With garbage in
normal_vector, the dot product produces random values, causingvalid buckets to be randomly skipped or included across runs. This is both a
correctness bug (valid observations randomly rejected) and a source of
non-determinism.
Fix
Added default member initializers to all NDT bucket structs in
core/include/ndt.h:Bucket:cov,cov_inverse,mean,normal_vector=Zero();number_of_points,index_begin,index_end=0BucketCoef:mean,cov,normal_vector=Zero()Bucket2:index_begin_inclusive,index_end_exclusive=0With
normal_vectorinitialized to zero,Zero().dot(anything) = 0, which is not< 0, so the viewpoint check consistently passes for buckets with < 3 points.This matches the intended behavior -- the viewpoint check should only reject buckets
that have a valid surface normal pointing away from the sensor.
Added comments in
add_indoor_hessian_contribution()andadd_outdoor_hessian_contribution()documenting this behavior.Files changed
core/include/ndt.happs/lidar_odometry_step_1/lidar_odometry_utils_optimizers.cpp(comments only)Note on multithreaded Hessian accumulation
The multithreaded Hessian accumulation uses
tbb::combinable+combine_each,which sums floating-point matrices in an order that varies by thread scheduling.
Since floating-point addition is not associative, this could in theory produce
slightly different results across runs. However, testing showed no noticeable effect
on determinism after the bucket initialization fix above -- the rounding-level
differences from summation order do not propagate into observable output differences.
No changes were made to the accumulation code.
Testing
Tested on a 5.5 km dataset in both single-threaded and multithreaded modes.
The resulting trajectories are absolutely identical across runs.
Origin
It is difficult to pinpoint when this bug was introduced from the git log, as the
normal_vectorfield has existed uninitialized inBucketsince 2023, and theviewpoint dot-product check was added, commented out, and uncommented across several
commits. My best guess is
19ebcb0("hybhrid approach for LO", Mar 13, 2025) whichuncommented the
nv.dot(viewport - this_bucket.mean) < 0check in the hessian pathwithout adding a point count guard.