
Add cross-platform execution backend (Windows/macOS support)#1256

Open
the-shadow-0 wants to merge 1 commit into google:main from the-shadow-0:feature/cross-platform-backend


the-shadow-0 commented Mar 21, 2026

Hi! 👋

This PR adds support for running Grain on Windows and macOS, while keeping the current multiprocessing behavior on Linux unchanged.

The goal was to make the library usable across platforms without introducing regressions or major refactors. The approach is minimal and focuses on correctness, determinism, safe shutdown behavior, and developer transparency.


What’s included

Execution backend abstraction

Introduced an ExecutionBackend layer to decouple concurrency logic from dataset iteration and data loading:

  • MultiprocessingBackend → Linux (uses fork/spawn as configured)
  • ThreadingBackend → Windows/macOS (threading fallback using queue.Queue and threading.Thread)

This lets each platform use the concurrency mechanism that suits it without modifying the core data pipeline logic.
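To make the abstraction concrete, here is a minimal sketch of what such a layer could look like. The class and method names mirror the ones mentioned in this PR, but the signatures are assumptions for illustration, not the code in execution_backend.py; produce_squares is a hypothetical example worker.

```python
import abc
import multiprocessing
import queue
import threading
from typing import Any, Callable, Optional


class ExecutionBackend(abc.ABC):
    """Hypothetical interface that hands out concurrency primitives."""

    @abc.abstractmethod
    def Queue(self, maxsize: int = 0) -> Any: ...

    @abc.abstractmethod
    def Event(self) -> Any: ...

    @abc.abstractmethod
    def Process(self, target: Callable[..., None], args: tuple = ()) -> Any: ...


class ThreadingBackend(ExecutionBackend):
    def Queue(self, maxsize: int = 0) -> Any:
        return queue.Queue(maxsize)

    def Event(self) -> Any:
        return threading.Event()

    def Process(self, target: Callable[..., None], args: tuple = ()) -> Any:
        # A thread stands in for a worker process; daemon=True so a stuck
        # worker does not keep the interpreter alive at exit.
        return threading.Thread(target=target, args=args, daemon=True)


class MultiprocessingBackend(ExecutionBackend):
    def __init__(self, start_method: Optional[str] = None):
        # get_context(None) returns the platform's default context.
        self._ctx = multiprocessing.get_context(start_method)

    def Queue(self, maxsize: int = 0) -> Any:
        return self._ctx.Queue(maxsize)

    def Event(self) -> Any:
        return self._ctx.Event()

    def Process(self, target: Callable[..., None], args: tuple = ()) -> Any:
        return self._ctx.Process(target=target, args=args, daemon=True)


def produce_squares(out_queue: Any, n: int) -> None:
    """Example worker: the same function runs unchanged under either backend."""
    for i in range(n):
        out_queue.put(i * i)
```

Because the pipeline code only calls backend.Queue(), backend.Event() and backend.Process(...), swapping backends does not touch the dataset logic.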

Cross-platform support

  • Linux → multiprocessing (default fork)
  • Windows/macOS → threading fallback
  • The backend is selected automatically based on the OS but can be overridden via environment variables.
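A sketch of the selection rule described above, assuming a hypothetical select_backend_name helper that honors GRAIN_EXECUTION_BACKEND first and otherwise picks multiprocessing only on Linux:

```python
import os
import sys
from typing import Mapping, Optional

_VALID_BACKENDS = ("threading", "multiprocessing")


def select_backend_name(environ: Optional[Mapping[str, str]] = None,
                        platform: Optional[str] = None) -> str:
    """Hypothetical selector: explicit override first, then the OS default."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    override = environ.get("GRAIN_EXECUTION_BACKEND")
    if override is not None:
        if override not in _VALID_BACKENDS:
            # Fail loudly rather than silently falling back to a default.
            raise ValueError(
                f"GRAIN_EXECUTION_BACKEND={override!r}; expected one of {_VALID_BACKENDS}")
        return override
    # sys.platform is "linux" on Linux, "win32" on Windows, "darwin" on macOS.
    return "multiprocessing" if platform.startswith("linux") else "threading"
```

Passing environ and platform explicitly keeps the rule testable without mutating os.environ.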

Deterministic behavior

  • Dataset output order is strictly index-based, independent of worker execution timing.
  • Multi-worker iterations produce identical results across Linux, Windows, and macOS.
  • Verified via cross-backend determinism tests.
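Index-based ordering can be implemented with a small reorder buffer: workers finish in any order, but the consumer only yields element i after elements 0..i-1. The sketch below uses plain threads and a hypothetical ordered_parallel_map helper; it illustrates the technique and is not Grain's actual iterator. Errors are stored at their index, so they surface at the same position on every run.

```python
import queue
import threading
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def ordered_parallel_map(fn: Callable[[int], T], num_elements: int,
                         num_workers: int) -> Iterator[T]:
    """Computes fn(i) for i in range(num_elements) on threads; yields in index order.

    Assumes num_workers >= 1.
    """
    results = queue.Queue()

    def worker(worker_id: int) -> None:
        # Static round-robin split: worker k handles k, k + W, k + 2W, ...
        for index in range(worker_id, num_elements, num_workers):
            try:
                results.put((index, fn(index), None))
            except Exception as e:  # Forward the error to the consumer.
                results.put((index, None, e))

    threads = [threading.Thread(target=worker, args=(k,), daemon=True)
               for k in range(num_workers)]
    for t in threads:
        t.start()

    pending = {}  # index -> (value, error) for elements that finished early
    next_index = 0
    while next_index < num_elements:
        index, value, error = results.get()
        pending[index] = (value, error)
        # Emit every element that is now contiguous with what was already yielded.
        while next_index in pending:
            value, error = pending.pop(next_index)
            if error is not None:
                raise error
            yield value
            next_index += 1
    for t in threads:
        t.join()
```

The output depends only on fn and the indices, never on thread scheduling, which is the property the cross-backend determinism tests check.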

Safer shutdown behavior

  • Every blocking queue read uses .get(timeout=...) in a loop that checks should_stop.is_set(), so workers notice a stop request within one timeout period.
  • Threads exit gracefully; a warning is emitted if a thread does not shut down in time.
  • Shutdown never blocks indefinitely, even on Windows/macOS where multiprocessing may fail.
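The timeout-plus-stop-flag pattern looks roughly like this. get_unless_stopped is a hypothetical helper, and returning None on stop is an assumed convention (a real implementation might use a dedicated sentinel so that None can be a valid element):

```python
import queue
import threading
from typing import Any, Optional


def get_unless_stopped(q: "queue.Queue", should_stop: threading.Event,
                       poll_interval: float = 0.1) -> Optional[Any]:
    """Blocks on q, but wakes every poll_interval seconds to check should_stop.

    Returns the next item, or None once should_stop is set.
    """
    while not should_stop.is_set():
        try:
            return q.get(timeout=poll_interval)
        except queue.Empty:
            continue  # Nothing yet; loop back and re-check the stop flag.
    return None
```

The poll interval bounds how long a worker can keep running after shutdown is requested, at the cost of one wake-up per interval while idle.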

Environment configuration

Grain behavior can be controlled via environment variables:

| Variable | Values | Description |
| --- | --- | --- |
| GRAIN_EXECUTION_BACKEND | threading, multiprocessing | Forces the backend type |
| GRAIN_MP_START | fork, spawn, forkserver | Controls the multiprocessing start method on Linux |
| GRAIN_STRICT_PICKLING | 0 or 1 | Enforces pickling rules for debugging (MultiprocessingBackend only) |

Example usage:

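# Force the threading backend, even on Linux
export GRAIN_EXECUTION_BACKEND=threading
# Start method for MultiprocessingBackend (not used by ThreadingBackend)
export GRAIN_MP_START=fork
# Enforce pickling rules while debugging
export GRAIN_STRICT_PICKLING=1

python train_model.py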

Invalid values raise a ValueError immediately instead of silently falling back to a default.
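Validation of GRAIN_MP_START could look like the following sketch (resolve_start_method is hypothetical). It also rejects start methods the current OS does not provide, since multiprocessing.get_all_start_methods() lists only the available ones; Windows, for example, only supports spawn.

```python
import multiprocessing
import os
from typing import Mapping, Optional

_VALID_START_METHODS = ("fork", "spawn", "forkserver")


def resolve_start_method(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical parser for GRAIN_MP_START; None means 'use the platform default'."""
    environ = os.environ if environ is None else environ
    value = environ.get("GRAIN_MP_START")
    if value is None:
        return None
    if value not in _VALID_START_METHODS:
        raise ValueError(
            f"GRAIN_MP_START={value!r}; expected one of {_VALID_START_METHODS}")
    if value not in multiprocessing.get_all_start_methods():
        raise ValueError(f"start method {value!r} is not available on this platform")
    return value
```

The result can be passed straight to multiprocessing.get_context(), where None selects the platform default.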


Pickling behavior

  • Multiprocessing: objects sent to workers must be picklable; lambdas and closures raise an explicit error.
  • Threading: no pickling required; lambdas/closures work normally.
  • Strict pickling mode (GRAIN_STRICT_PICKLING=1) ensures consistency across platforms during testing.
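A fail-fast pickling check can be as small as attempting pickle.dumps up front and turning the failure into a readable message. Both helpers below are hypothetical, and which backend calls check_picklable, and when, is left open here:

```python
import os
import pickle
from typing import Any, Mapping, Optional


def strict_pickling_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical parser for GRAIN_STRICT_PICKLING ('0' or '1', default '0')."""
    environ = os.environ if environ is None else environ
    value = environ.get("GRAIN_STRICT_PICKLING", "0")
    if value not in ("0", "1"):
        raise ValueError(f"GRAIN_STRICT_PICKLING={value!r}; expected '0' or '1'")
    return value == "1"


def check_picklable(obj: Any, what: str = "transform") -> None:
    """Raises a clear TypeError instead of a pickling error deep inside a worker."""
    try:
        pickle.dumps(obj)
    # Lambdas typically fail with PicklingError; local functions (closures)
    # may raise AttributeError depending on the Python version.
    except (pickle.PicklingError, AttributeError, TypeError) as e:
        raise TypeError(
            f"{what} {obj!r} cannot be pickled ({e}); use a module-level "
            "function instead of a lambda or closure") from e
```

Module-level functions and builtins pass the check because pickle stores them by qualified name; lambdas and closures have no importable name and fail.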

Shared memory handling

  • Enabled only in MultiprocessingBackend.
  • Disabled automatically in ThreadingBackend.
  • A performance warning is logged when falling back to threads, since the GIL may limit throughput for CPU-bound transforms.
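The gating can be sketched as a single decision keyed on the backend name. use_shared_memory is a hypothetical helper; the PR says the warning is logged, whereas this sketch uses the warnings module so the behavior is easy to observe in a test.

```python
import warnings


def use_shared_memory(backend_name: str) -> bool:
    """Hypothetical helper: shared memory only makes sense across processes."""
    if backend_name == "multiprocessing":
        return True
    if backend_name == "threading":
        # Threads already share one address space, so there is nothing to map;
        # the GIL, not data transfer, is the likely bottleneck here.
        warnings.warn(
            "Threading backend in use: shared memory is disabled and "
            "CPU-bound transforms may be limited by the GIL.",
            RuntimeWarning, stacklevel=2)
        return False
    raise ValueError(f"unknown backend {backend_name!r}")
```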

Usage examples

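from grain._src.python.data_loader import DataLoader
from grain._src.python.dataset.dataset import MapDataset


# A module-level function (not a lambda) so the transform can be pickled
# by the multiprocessing backend.
def double(x):
    return x * 2


if __name__ == "__main__":  # Required when workers are started with spawn/forkserver.
    # Example dataset: 100 integers, doubled, in batches of 10
    data = MapDataset.source(range(100)).map(double).batch(10)

    # Automatic backend selection
    loader = DataLoader(data, num_workers=4)

    for batch in loader:
        print(batch)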

On Linux, this will use multiprocessing with shared memory.

On Windows/macOS, this will use threading.

Overriding the backend:

import os

# Set this before the DataLoader is created so backend selection picks it up.
os.environ['GRAIN_EXECUTION_BACKEND'] = 'threading'
# ThreadingBackend is now enforced, even on Linux.

Internal architecture overview

ExecutionBackend abstraction

  • Process, Queue, Event, and SynchronizedInt are mapped to either native multiprocessing or threading primitives.
  • Interface is consistent across platforms.
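SynchronizedInt needs a lock-protected integer on both sides. The sketch below assumes a minimal value/increment interface (hypothetical names, not necessarily the PR's) backed by threading.Lock for threads and by a multiprocessing context's Value, which carries its own lock, for processes:

```python
import multiprocessing
import threading
from typing import Optional


class ThreadSynchronizedInt:
    """Hypothetical thread-side integer guarded by a threading.Lock."""

    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()

    @property
    def value(self) -> int:
        with self._lock:
            return self._value

    def increment(self, delta: int = 1) -> int:
        with self._lock:
            self._value += delta
            return self._value


class ProcessSynchronizedInt:
    """Hypothetical process-side integer in shared memory ('q' = signed 64-bit)."""

    def __init__(self, value: int = 0, ctx: Optional[object] = None):
        ctx = ctx or multiprocessing.get_context()
        self._shared = ctx.Value("q", value)  # Created with its own lock.

    @property
    def value(self) -> int:
        with self._shared.get_lock():
            return self._shared.value

    def increment(self, delta: int = 1) -> int:
        # += on a shared Value is read-modify-write, so hold the lock.
        with self._shared.get_lock():
            self._shared.value += delta
            return self._shared.value
```

Both classes expose the same two members, so worker code written against one runs unchanged against the other.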

DataLoader iterator

  • Workers use ExecutionBackend.Process and queues to fetch dataset elements.
  • Threading backend avoids pickling and shared memory, while preserving deterministic order.

Checkpointing

  • get_state() and set_state() behave the same under both backends.
  • Restoring a checkpoint resumes iteration correctly regardless of which backend is active.
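Because output order depends only on the element index, the iterator state can be reduced to the next index to emit, which is what keeps it backend-agnostic. The hypothetical IndexIterator below illustrates the idea with a single-threaded iterator; a real multi-worker iterator would additionally discard any prefetched but unconsumed elements on restore and recompute them.

```python
from typing import Any, Dict, Sequence


class IndexIterator:
    """Hypothetical iterator whose entire state is the next index to emit."""

    def __init__(self, data: Sequence[Any]):
        self._data = data
        self._next_index = 0

    def __iter__(self) -> "IndexIterator":
        return self

    def __next__(self) -> Any:
        if self._next_index >= len(self._data):
            raise StopIteration
        value = self._data[self._next_index]
        self._next_index += 1
        return value

    def get_state(self) -> Dict[str, int]:
        # No worker IDs, queue contents, or OS handles: nothing backend-specific.
        return {"next_index": self._next_index}

    def set_state(self, state: Dict[str, int]) -> None:
        self._next_index = state["next_index"]
```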

Testing

  • Functional tests: PASS
  • Determinism tests across backends: PASS
  • Pickling validation: PASS (multiprocessing raises errors for unpicklable objects; threading skips the check)
  • Shutdown & deadlock tests: PASS
  • Cross-platform simulation: PASS
  • Performance sanity: Linux throughput preserved, Threading backend limited by GIL (expected)

Performance

  • Linux: unchanged (native multiprocessing)
  • Windows/macOS: slower for CPU-bound transforms because threads share the GIL; output order stays deterministic

Known limitations

  • Python threads cannot be forcibly terminated, so a user transform stuck in an infinite loop cannot be stopped; shutdown emits a warning instead.
  • fork may cause deadlocks with libraries such as CUDA/JAX; set GRAIN_MP_START=spawn to use spawn instead.
  • Shared memory is disabled under the threading backend, which may increase memory usage for large datasets.
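Since Python has no API for killing a thread, the best shutdown can do is join with a timeout and report stragglers. A minimal sketch with a hypothetical join_or_warn helper:

```python
import threading
import warnings
from typing import Iterable, List


def join_or_warn(threads: Iterable[threading.Thread], timeout: float) -> List[threading.Thread]:
    """Joins each thread with a timeout; warns about and returns any still alive."""
    stuck = []
    for t in threads:
        t.join(timeout)
        if t.is_alive():
            # The thread keeps running; all we can do is report it.
            stuck.append(t)
            warnings.warn(f"worker thread {t.name!r} did not exit within {timeout}s",
                          RuntimeWarning, stacklevel=2)
    return stuck
```

The timeout applies per thread, so the worst-case wait is len(threads) * timeout; marking workers as daemon threads keeps a stuck one from blocking interpreter exit.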

Notes for maintainers

  • Minimally invasive changes; Linux behavior is untouched.
  • Backend selection logic is centralized in execution_backend.py.
  • Developers can extend ExecutionBackend for other concurrency mechanisms (e.g., Ray, Dask).

Thank you!

This PR aims to make Grain usable and deterministic on all major OS platforms, without sacrificing Linux performance or existing workflows.

Happy to iterate if maintainers have feedback! 🙏


📚 Documentation preview 📚: https://google-grain--1256.org.readthedocs.build/

google-cla bot commented Mar 21, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.
