Title: MILS stalls at text generation after feature extraction (Llama-3.1-8B and GPT-2; HF 4.57, Torch 2.8, Py3.12)
Environment
- Platform: RunPod Linux container
- GPU: RTX 5090 32 GB (CUDA OK)
- Python: 3.12
- torch: 2.8.0+cu128
- torchvision: 0.23.0+cu128
- torchaudio: 2.8.0+cu128
- transformers: 4.57.0
- accelerate: 1.10.1
- HF cache: /workspace/hf_cache (via HF_HOME)
- MILS branch: main (HEAD matches upstream)
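The environment list above could be collected automatically rather than typed by hand. The sketch below assumes only the standard library plus an optional `torch` import. `environment_report` is a hypothetical helper name, not part of MILS.

```python
import platform
import sys
from importlib import metadata

# Packages listed in the Environment section of this issue.
PACKAGES = ["torch", "torchvision", "torchaudio", "transformers", "accelerate"]


def environment_report(packages=PACKAGES):
    """Return Python/platform info plus installed versions (hypothetical helper)."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    # CUDA details are only added when torch is importable.
    try:
        import torch
        report["cuda_available"] = torch.cuda.is_available()
        if torch.cuda.is_available():
            report["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```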
Models
- meta-llama/Meta-Llama-3.1-8B-Instruct
- gpt2 (control)
Repro command
export HF_HOME=/workspace/hf_cache
python -u main_image_captioning.py \
--batch_size 1 --iterations 1 --requested_number 1 \
--text_model "meta-llama/Meta-Llama-3.1-8B-Instruct"
# also reproduced with: --text_model gpt2
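One way to reproduce the stall under a bounded wall-clock budget, as the timeout run below did, is a small Python wrapper. It is a sketch under assumptions. It mimics a shell `timeout` by sending SIGTERM first and SIGKILL after a grace period. It inherits HF_HOME from the calling environment. `run_with_timeout` is a hypothetical name.

```python
import signal
import subprocess


def run_with_timeout(cmd, timeout_s, grace_s=10.0):
    """Run cmd; on timeout send SIGTERM, then SIGKILL if it is still alive after grace_s.

    Returns the Popen return code. On POSIX a negative value -N means
    "terminated by signal N" (a shell would report this as 128 + N).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGTERM)
        try:
            return proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL as a last resort
            return proc.wait()
```

Example usage (30-minute budget): `run_with_timeout([sys.executable, "-u", "main_image_captioning.py", "--batch_size", "1", "--iterations", "1", "--requested_number", "1", "--text_model", "gpt2"], timeout_s=1800)`.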
Observed
- Preprocessing and feature extraction complete, then MILS stalls at text generation: no errors and no per-iteration outputs (e.g., iteration_0.txt).
- When run under a global timeout, the process ends with EXIT_CODE=143 (SIGTERM from timeout), i.e., generation runs for a long time without observable progress rather than crashing.
Log tail (representative):
Device set to use cuda:0
Length of the data is 49xx
100%|████████████████████████| 16/16 [00:01<00:00, 9.7x it/s]
# no further output before timeout; requires manual termination
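The EXIT_CODE=143 reading relies on the POSIX shell convention that a status above 128 means "terminated by signal (status − 128)". So 143 = 128 + 15, and signal 15 is SIGTERM. The sketch below decodes such a status. `describe_exit_code` is a hypothetical helper, and a program could in principle also exit with 143 on its own.

```python
import signal


def describe_exit_code(code):
    """Interpret a shell exit status; values above 128 conventionally mean 128 + signal number."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(143))  # killed by SIGTERM
```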
Where it stalls
- Call to self.text_pipeline(...) inside optimization_utils/generator.py (~lines 186/201).
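To confirm exactly where inside that call the process is waiting, Python's standard `faulthandler` module can dump every thread's traceback on a timer or on demand. A stack captured this way would show whether generation is in a CUDA kernel loop, a tokenizer call, or a lock. This is a sketch. Where to call these hypothetical helpers (for example, near the start of main_image_captioning.py) is an assumption.

```python
import faulthandler
import signal
import sys


def dump_stacks_periodically(interval_s=300.0, file=sys.stderr):
    """Write all threads' Python tracebacks to `file` every interval_s seconds."""
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=file)


def stop_dumping():
    """Cancel the periodic dump, e.g. once text generation has returned."""
    faulthandler.cancel_dump_traceback_later()


def enable_on_demand_dump(file=sys.stderr):
    """After this, `kill -USR1 <pid>` prints all tracebacks without stopping the process (POSIX only)."""
    faulthandler.register(signal.SIGUSR1, file=file, all_threads=True)
```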
Control tests (succeed outside MILS)
from transformers import pipeline
import torch
p = pipeline('text-generation',
model='meta-llama/Meta-Llama-3.1-8B-Instruct',
model_kwargs={'torch_dtype': torch.bfloat16},
device='cuda:0', token=True)
out = p('Hello', max_new_tokens=10, do_sample=False) # returns text OK
p2 = pipeline('text-generation',
model='meta-llama/Meta-Llama-3.1-8B-Instruct',
device_map='cuda:0', token=True)
out2 = p2('Hello', max_new_tokens=10, do_sample=False) # returns text OK
GPT-2 pipeline also returns text normally outside MILS.
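The control tests send one short prompt. To compare like-for-like, the call MILS makes could be wrapped so its input size and kwargs are logged next to its duration. That would show whether MILS is sending a much larger batch or different generation arguments. This is a sketch. `log_generation_calls` is hypothetical, and wrapping `self.text_pipeline` this way is an assumed debugging patch, not existing MILS code.

```python
import functools
import logging
import time

logger = logging.getLogger("mils_debug")


def log_generation_calls(fn):
    """Wrap a generation callable; log prompt count, rough input size, kwargs and elapsed time."""
    @functools.wraps(fn)
    def wrapper(prompts, *args, **kwargs):
        batch = prompts if isinstance(prompts, list) else [prompts]
        n_chars = sum(len(str(p)) for p in batch)  # rough size, works for strings or chat dicts
        logger.info("generation start: %d prompt(s), ~%d chars, kwargs=%s",
                    len(batch), n_chars, kwargs)
        start = time.perf_counter()
        result = fn(prompts, *args, **kwargs)
        logger.info("generation done in %.1f s", time.perf_counter() - start)
        return result
    return wrapper
```

Hypothetical usage inside the generator: `self.text_pipeline = log_generation_calls(self.text_pipeline)`, with `logging.basicConfig(level=logging.INFO)` set once at startup.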
Tried (no change)
- Filtered terminators to valid ints; added a pad token fallback if missing.
- Deterministic generation (do_sample=False, temperature=None), with max_new_tokens clamped to 128.
- Small run params: --batch_size 1 --iterations 1 --requested_number 1 --keep_previous 1.
- Verified HF cache/auth and that standalone pipeline calls succeed.
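The first two items in the list above could be combined into a single kwargs builder. This is a minimal sketch under assumptions. `build_generation_kwargs` is a hypothetical helper. It assumes the token ids come from the tokenizer or generation config (where `eos_token_id` may be an int or a list). It assumes the resulting dict is passed through to the pipeline call as generate kwargs.

```python
def _as_list(x):
    """Normalize None / int / list-of-ints into a list."""
    if x is None:
        return []
    if isinstance(x, (list, tuple)):
        return list(x)
    return [x]


def build_generation_kwargs(eos_token_id, extra_terminators=(), pad_token_id=None,
                            max_new_tokens=128, cap=128):
    """Hypothetical helper: valid terminators, pad fallback, greedy decoding, clamped length."""
    terminators = []
    for tok in _as_list(eos_token_id) + list(extra_terminators):
        # Keep only real, non-negative int ids (token lookups may yield None); drop duplicates.
        if isinstance(tok, int) and not isinstance(tok, bool) and tok >= 0 and tok not in terminators:
            terminators.append(tok)
    if not terminators:
        raise ValueError("no valid terminator token ids")
    if pad_token_id is None:
        pad_token_id = terminators[0]  # pad fallback when the tokenizer defines none
    return {
        "eos_token_id": terminators,
        "pad_token_id": pad_token_id,
        "do_sample": False,
        "temperature": None,
        "max_new_tokens": min(max_new_tokens, cap),
    }
```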
Questions for maintainers
- Is MILS intended to use the HF pipeline for generation with current transformers (4.57), or should we switch to AutoModelForCausalLM.generate() directly?
- Any known issues with MILS batching/stop conditions or chat-template handling on recent HF/Torch/Python?
- Do you have pinned versions (Python / torch / transformers / accelerate) or a commit known to be stable for captioning?
- A minimal dataset/config you recommend for an end-to-end smoke test that should produce iteration_0.txt quickly?
- Is single-GPU cloud usage supported, and is it advisable?
Thanks,
Colin Rooney 12mv2