MILS hangs during text generation (Llama-3.1 / GPT-2, Transformers 4.57, Torch 2.8) #4

@12mv2

Title: MILS stalls at text generation after feature extraction (Llama-3.1-8B and GPT-2; HF 4.57, Torch 2.8, Py3.12)

Environment

  • Platform: RunPod Linux container
  • GPU: RTX 5090 32 GB (CUDA OK)
  • Python: 3.12
  • torch: 2.8.0+cu128
  • torchvision: 0.23.0+cu128
  • torchaudio: 2.8.0+cu128
  • transformers: 4.57.0
  • accelerate: 1.10.1
  • HF cache: /workspace/hf_cache (via HF_HOME)
  • MILS branch: main (HEAD matches upstream)
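For anyone comparing against this environment, the versions above can be collected in one step. The following is a minimal sketch using only the standard library; the helper name `collect_versions` and the package tuple are illustrative and not part of MILS.

```python
import platform
from importlib import metadata

# Packages listed in the Environment section of this report.
PACKAGES = ("torch", "torchvision", "torchaudio", "transformers", "accelerate")


def collect_versions(packages=PACKAGES):
    """Return a dict of installed versions; missing packages are marked as such."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```

Reading the installed distribution metadata avoids importing torch or transformers just to print a version string.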

Models

  • meta-llama/Meta-Llama-3.1-8B-Instruct
  • gpt2 (control)

Repro command

export HF_HOME=/workspace/hf_cache
python -u main_image_captioning.py \
  --batch_size 1 --iterations 1 --requested_number 1 \
  --text_model "meta-llama/Meta-Llama-3.1-8B-Instruct"
# also reproduced with: --text_model gpt2

Observed

  • Preprocessing and feature extraction complete, then MILS stalls at text generation. No errors are raised and no per-iteration outputs (e.g., iteration_0.txt) are written.
  • When run under a global timeout, the process ends with EXIT_CODE=143 (128 + 15, i.e., SIGTERM sent by timeout), which indicates long-running generation without observable progress rather than a crash.
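The reading of exit code 143 follows the common shell convention that a process killed by signal N is reported as 128 + N. A small sketch of that decoding is below; `describe_exit_code` is a hypothetical helper written for this report, not something MILS provides.

```python
import signal


def describe_exit_code(code):
    """Translate a shell-style exit code into a short human-readable label."""
    if code == 0:
        return "success"
    if code > 128:
        # Shells report "killed by signal N" as 128 + N.
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no matching signal)"
    return f"exited with error code {code}"


print(describe_exit_code(143))
```

A code above 128 therefore points at an external kill (here, the timeout) rather than a Python exception, which would normally surface as exit code 1 with a traceback.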

Log tail (representative):

Device set to use cuda:0
Length of the data is 49xx
100%|████████████████████████| 16/16 [00:01<00:00, 9.7x it/s]
# no further output before timeout; requires manual termination

Where it stalls

  • Call to self.text_pipeline(...) inside optimization_utils/generator.py (~lines 186/201).
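One way to confirm the stall location without attaching a debugger is the standard-library `faulthandler` module, which can dump every thread's traceback once a deadline passes. The sketch below is an assumption about how one might instrument the call; `hang_watchdog` is a hypothetical name, and the commented usage line stands in for the real call in generator.py.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(seconds, file=sys.stderr):
    """Dump all thread tracebacks to `file` every `seconds` while the block runs."""
    faulthandler.dump_traceback_later(seconds, repeat=True, file=file)
    try:
        yield
    finally:
        # Stop the watchdog as soon as the block finishes normally.
        faulthandler.cancel_dump_traceback_later()


# Hypothetical usage around the stalled call:
# with hang_watchdog(60):
#     outputs = self.text_pipeline(...)
```

If generation is merely slow, successive dumps should show the stack moving through different frames; if the same frame repeats in every dump, that frame is where the process is stuck.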

Control tests (succeed outside MILS)

from transformers import pipeline
import torch
p = pipeline('text-generation',
    model='meta-llama/Meta-Llama-3.1-8B-Instruct',
    model_kwargs={'torch_dtype': torch.bfloat16},
    device='cuda:0', token=True)
out = p('Hello', max_new_tokens=10, do_sample=False)  # returns text OK

p2 = pipeline('text-generation',
    model='meta-llama/Meta-Llama-3.1-8B-Instruct',
    device_map='cuda:0', token=True)
out2 = p2('Hello', max_new_tokens=10, do_sample=False)  # returns text OK

GPT-2 pipeline also returns text normally outside MILS.

Tried (no change)

  • Filtered terminators to valid ints; pad token fallback if missing.
  • Deterministic generation (do_sample=False, temperature=None), max_new_tokens clamped to 128.
  • Small run params: --batch_size 1 --iterations 1 --requested_number 1 --keep_previous 1.
  • Verified HF cache/auth and that standalone pipeline calls succeed.
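For reference, the first two workarounds amount to roughly the following. This is a sketch of the idea rather than the exact patch applied to MILS; the function names and the `cap` parameter are illustrative, and the resulting dict is meant to be passed as keyword arguments to the generation call.

```python
def valid_terminators(candidates):
    """Keep unique, non-negative int token ids; drop None and other junk."""
    kept = []
    for tid in candidates:
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(tid, int) and not isinstance(tid, bool) and tid >= 0:
            if tid not in kept:
                kept.append(tid)
    return kept


def build_generation_kwargs(eos_token_id, pad_token_id,
                            extra_terminators=(), max_new_tokens=128, cap=128):
    """Build deterministic generation kwargs with a pad-token fallback."""
    terminators = valid_terminators([eos_token_id, *extra_terminators])
    if pad_token_id is None and terminators:
        # Fall back to the first terminator (usually EOS) when no pad token exists.
        pad_token_id = terminators[0]
    return {
        "max_new_tokens": min(max_new_tokens, cap),
        "do_sample": False,
        "temperature": None,
        "eos_token_id": terminators or None,
        "pad_token_id": pad_token_id,
    }


# Example with GPT-2's EOS id (50256) and a missing extra terminator.
kwargs = build_generation_kwargs(eos_token_id=50256, pad_token_id=None,
                                 extra_terminators=[None], max_new_tokens=512)
print(kwargs)
```

With these settings the stall still occurs, which suggests the stop conditions alone are not the cause.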

Questions for maintainers

  1. Is MILS intended to use HF pipeline for generation with current transformers (4.57), or should we switch to AutoModelForCausalLM.generate() directly?
  2. Are there any known issues with MILS batching/stop conditions or chat-template handling on recent HF/Torch/Python versions?
  3. Do you have pinned versions (Python / torch / transformers / accelerate) or a commit known to be stable for captioning?
  4. Is there a minimal dataset/config you recommend for an end-to-end smoke test that should produce iteration_0.txt quickly?
  5. Is single-GPU cloud usage supported, and is it advisable?

Thanks,
Colin Rooney 12mv2
