Implement GPU sharing via NVIDIA MPS for small-system throughput #10

Enable multiple simulations to share a single GPU using NVIDIA MPS for small-to-medium systems that underutilize GPU compute.

## Motivation

For small systems, one simulation leaves a single GPU underutilized. MPS lets multiple processes share a GPU, with near-linear scaling until the GPU's compute capacity is saturated. Running 8-16 replicas via MPS can deliver 8-12x the aggregate throughput of a single simulation.

Related to, but not strictly dependent on, #9.

## Scope

- `mdfactory/performance/mps.py`
- MPS daemon lifecycle: start (`nvidia-cuda-mps-control -d`), stop (`echo quit | nvidia-cuda-mps-control`), health check
- Two packing strategies:
  - `-multidir` approach: a single `gmx_mpi mdrun -multidir dir1 dir2 ... dirN` call. Requires all systems to use the same `.mdp` parameters, which is natural for high-throughput (HT) batches.
  - Independent srun approach: each simulation gets its own `srun` with MPS arbitrating GPU access. More flexible (different `.mdp` per system).
- `gpu_replicas` field in `run_schedules.yaml`
- Error handling: MPS daemon failure, GPU OOM detection
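
The `-multidir` strategy comes down to assembling one command line from a list of run directories. A minimal sketch of such a helper is below. `build_multidir_command` and its defaults are hypothetical names, not an existing `mdfactory` API, the `-nb gpu -pme gpu` flags are copied from the lifecycle pattern in this issue, and the MPI launcher prefix (`srun`, `mpirun`) is left to the caller.

```python
from typing import List, Sequence


def build_multidir_command(
    sim_dirs: Sequence[str],
    gmx: str = "gmx_mpi",  # assumed binary name, as used elsewhere in this issue
) -> List[str]:
    """Return the argv for one mdrun call covering every directory.

    Hypothetical helper: the launcher prefix is not included here.
    """
    if not sim_dirs:
        raise ValueError("at least one simulation directory is required")
    # One -multidir flag followed by all directories, then the GPU offload flags.
    return [gmx, "mdrun", "-multidir", *sim_dirs, "-nb", "gpu", "-pme", "gpu"]


if __name__ == "__main__":
    print(" ".join(build_multidir_command(["run_000", "run_001", "run_002"])))
```

When launching this command, the number of MPI ranks has to be a multiple of the number of directories, so the launcher arguments would need to be derived from `len(sim_dirs)`.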

## MPS lifecycle pattern

```bash
# Start
nvidia-cuda-mps-control -d

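# Run N simulations sharing the GPU.
# SIM_DIRS is assumed to be a bash array holding one run directory per replica,
# so that replicas do not overwrite each other's output files.
for i in $(seq 0 $((NUM_REPLICAS - 1))); do
    srun --exact -n 1 --gpus=1 --cpus-per-task="${CORES}" \
        --cpu-bind=verbose,cores --distribution=block:block \
        --chdir="${SIM_DIRS[$i]}" \
        gmx_mpi mdrun -s topol.tpr -nb gpu -pme gpu &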
done
wait

# Stop
echo quit | nvidia-cuda-mps-control
```
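
On the Python side, the same start/stop pattern could be wrapped in a context manager so the daemon is shut down even if a simulation fails. This is a sketch under assumptions, not the final `mps.py`: `start_mps`, `stop_mps`, and `mps_daemon` are hypothetical names, and the `run` parameter exists only so tests can inject a mock in place of `subprocess.run`.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator

MPS_CONTROL = "nvidia-cuda-mps-control"


def start_mps(run: Callable = subprocess.run) -> None:
    """Start the MPS control daemon (equivalent to `nvidia-cuda-mps-control -d`)."""
    run([MPS_CONTROL, "-d"], check=True)


def stop_mps(run: Callable = subprocess.run) -> None:
    """Stop the daemon (equivalent to `echo quit | nvidia-cuda-mps-control`)."""
    run([MPS_CONTROL], input="quit\n", text=True, check=True)


@contextmanager
def mps_daemon(run: Callable = subprocess.run) -> Iterator[None]:
    """Keep MPS running for the duration of the with-block."""
    start_mps(run)
    try:
        yield
    finally:
        # Always attempt shutdown, even if a replica raised.
        stop_mps(run)
```

A caller would launch the replicas inside `with mps_daemon(): ...`. With `check=True`, a non-zero exit from the control binary surfaces as `subprocess.CalledProcessError`, which the error-handling layer could translate into a clearer message.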

## Acceptance criteria

- MPS start/stop functions work correctly (tested with mock subprocess)
- GPU replica count derived from benchmark results
- Combines with CPU affinity for CPU-side binding
- Graceful error when MPS unavailable (driver not configured)
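
One possible way to derive the replica count from benchmark results is to pick the smallest count whose aggregate throughput is close to the best observed. The sketch below assumes benchmarks arrive as a mapping from replica count to aggregate ns/day. That format, the `pick_gpu_replicas` name, and the 95% tolerance are all illustrative assumptions rather than anything fixed by this issue.

```python
from typing import Mapping


def pick_gpu_replicas(
    throughput_by_replicas: Mapping[int, float],
    tolerance: float = 0.95,  # assumed cutoff: accept counts within 5% of the best
) -> int:
    """Return the smallest replica count reaching `tolerance` of peak throughput."""
    if not throughput_by_replicas:
        raise ValueError("no benchmark results provided")
    best = max(throughput_by_replicas.values())
    good_enough = [
        n for n, ns_per_day in throughput_by_replicas.items()
        if ns_per_day >= tolerance * best
    ]
    # Prefer fewer replicas: less GPU memory pressure for the same throughput.
    return min(good_enough)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measured results.
    example = {1: 100.0, 2: 190.0, 4: 360.0, 8: 600.0, 16: 610.0}
    print(pick_gpu_replicas(example))
```

Preferring the smallest qualifying count leaves headroom against GPU OOM, which matters because every replica holds its own allocation on the shared device.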
