Ensure deterministic data loading #104

Merged: StephenOman merged 1 commit into NetHack-LE:main from jbcoe:jbcoe/deterministic-data-loading on Mar 18, 2026

jbcoe commented Mar 8, 2026

This commit resolves a non-determinism issue in TtyrecDataset where the order of loaded games could vary between runs. This is probably the cause of #88.

The previous implementation used a single data loading function with a shared lock (threading.Lock) for all workers. This created a race: multiple threads competed to fetch the next game, so which worker received which game depended on thread scheduling, and the resulting data order was unpredictable.
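
To illustrate the problem, here is a minimal sketch of a shared-lock loader. It is not the actual TtyrecDataset code; `make_shared_load_fn`, `run_workers`, and the integer game ids are hypothetical names chosen for illustration. The lock ensures each game is handed out exactly once, but it does not control which worker receives it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_shared_load_fn(gameids):
    """Hypothetical stand-in for the old design: one iterator behind one lock."""
    it = iter(gameids)
    lock = threading.Lock()

    def load_next():
        # The lock makes the fetch safe, but the caller that wins it
        # is decided by the thread scheduler, not by the program.
        with lock:
            return next(it, None)

    return load_next


def run_workers(gameids, num_workers):
    """Let num_workers threads drain the shared loader; return what each one saw."""
    load_next = make_shared_load_fn(gameids)

    def worker(_):
        seen = []
        while (game := load_next()) is not None:
            seen.append(game)
        return seen

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, range(num_workers)))


per_worker = run_workers(list(range(20)), num_workers=4)
# Every game is loaded exactly once overall...
all_games = sorted(g for seen in per_worker for g in seen)
# ...but the contents of per_worker[i] can differ from run to run.
```

In this sketch only the union of the per-worker lists is stable across runs; the split between workers is not.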

This fix introduces a deterministic assignment of games to each worker:

  • A new _make_load_fns method creates a separate, dedicated data loading function for each batch dimension.
  • Each function is assigned a unique, non-overlapping sequence of games by striding over the global gameids list. This eliminates the need for a shared lock and guarantees a deterministic loading order.
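
The striding assignment described above can be sketched as follows. This is an illustration under assumptions, not the merged implementation: `make_load_fns`, its parameters, and the example game ids are hypothetical, and the real `_make_load_fns` method may differ in signature and behavior.

```python
def make_load_fns(gameids, batch_size):
    """Build one loader per batch slot (hypothetical sketch).

    Slot i walks gameids[i], gameids[i + batch_size], ... so the slots
    never overlap and no lock is needed.
    """

    def make_one(start):
        # Each loader owns a private iterator over its own strided slice.
        it = iter(gameids[start::batch_size])

        def load_next():
            # Returns None once this slot has exhausted its games.
            return next(it, None)

        return load_next

    return [make_one(i) for i in range(batch_size)]


gameids = [101, 102, 103, 104, 105, 106, 107]
load_fns = make_load_fns(gameids, batch_size=3)

slot0 = [load_fns[0]() for _ in range(3)]  # [101, 104, 107]
slot1 = [load_fns[1]() for _ in range(2)]  # [102, 105]
slot2 = [load_fns[2]() for _ in range(2)]  # [103, 106]
```

Because each slot's sequence is a pure function of `gameids` and `batch_size`, the order seen by every slot is the same on every run, regardless of how threads are scheduled.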

The corresponding tests in test_minibatches have also been updated to validate this new, deterministic behavior.

jbcoe marked this pull request as ready for review March 9, 2026 08:11
StephenOman added the bug label Mar 10, 2026
StephenOman merged commit 4a5fb84 into NetHack-LE:main Mar 18, 2026
19 checks passed