MILS hangs during text generation (Llama-3.1 / GPT-2, Transformers 4.57, Torch 2.8)

**Title:** MILS stalls at text generation after feature extraction (Llama-3.1-8B and GPT-2; HF 4.57, Torch 2.8, Py3.12)

**Environment**

* Platform: RunPod Linux container
* GPU: RTX 5090 32 GB (CUDA OK)
* Python: 3.12
* torch: 2.8.0+cu128
* torchvision: 0.23.0+cu128
* torchaudio: 2.8.0+cu128
* transformers: 4.57.0
* accelerate: 1.10.1
* HF cache: `/workspace/hf_cache` (via `HF_HOME`)
* MILS branch: main (HEAD matches upstream)
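The version list above can be regenerated on the same pod with a short script. This makes it easier to diff against any pinned versions the maintainers share. Below is a minimal sketch that uses only the standard library. The function name `collect_versions` is hypothetical and is not part of MILS.

```python
import importlib.metadata as md
import platform


def collect_versions(packages):
    """Return {name: version or None}, plus the running Python version.

    Reads installed distribution metadata only. It does not import torch or
    transformers, so it never initializes CUDA.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = md.version(name)
        except md.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    pkgs = ["torch", "torchvision", "torchaudio", "transformers", "accelerate"]
    # Print in the same bullet style as the Environment list above.
    for name, version in collect_versions(pkgs).items():
        print(f"* {name}: {version}")
```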

**Models**

* `meta-llama/Meta-Llama-3.1-8B-Instruct`
* `gpt2` (control)

**Repro command**

```bash
export HF_HOME=/workspace/hf_cache
python -u main_image_captioning.py \
  --batch_size 1 --iterations 1 --requested_number 1 \
  --text_model "meta-llama/Meta-Llama-3.1-8B-Instruct"
# also reproduced with: --text_model gpt2
```

**Observed**

* Preprocessing and feature extraction complete, and then MILS stalls at text generation. No errors are raised, and no per-iteration outputs (e.g., `iteration_0.txt`) are written.
* When run under a global timeout, the process ends with `EXIT_CODE=143` (128 + 15, i.e. SIGTERM from `timeout`). This points to long-running generation with no observable progress, not a crash.
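The exit status can be decoded mechanically. On POSIX shells, a status above 128 means that the process was terminated by signal number (status - 128). The sketch below shows this decoding. The helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code):
    """Interpret a shell exit status; values above 128 encode 128 + signal number."""
    if code > 128:
        sig = signal.Signals(code - 128)  # raises ValueError for unknown signals
        return f"terminated by {sig.name} (signal {sig.value})"
    return f"exited normally with status {code}"


print(describe_exit_code(143))  # terminated by SIGTERM (signal 15)
```

By default, GNU `timeout` exits with status 124 when its limit is reached. It reports the child's own status (143 for SIGTERM) only when `--preserve-status` is given. Recording the exact wrapper invocation may therefore help anyone trying to reproduce this.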

Log tail (representative):

```
Device set to use cuda:0
Length of the data is 49xx
100%|████████████████████████| 16/16 [00:01<00:00, 9.7x it/s]
# (no further output; the process runs until it is killed manually or by the timeout)
```
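A streamer passed through to `generate()` can distinguish a slow but progressing generation from a true hang. `generate()` first calls `streamer.put()` with the prompt IDs, then calls it again with each newly generated token chunk, and finally calls `streamer.end()`. Below is a minimal duck-typed sketch. The class name `ProgressStreamer` is hypothetical. Passing it as `self.text_pipeline(..., streamer=ProgressStreamer())` assumes that the pipeline forwards extra keyword arguments to `generate()`, which should be checked against the installed transformers version.

```python
import sys
import time


class ProgressStreamer:
    """Duck-typed streamer (put/end) that logs how many tokens have been generated.

    Assumption: as with transformers' BaseStreamer, the first put() call carries
    the prompt IDs and later calls carry newly generated token IDs.
    """

    def __init__(self, report_every=16, out=sys.stderr):
        self.report_every = report_every
        self.out = out
        self.put_calls = 0
        self.new_tokens = 0
        self.next_report = report_every
        self.start = time.monotonic()
        self.finished = False

    def put(self, value):
        self.put_calls += 1
        if self.put_calls == 1:
            return  # prompt IDs, not generated tokens
        # Accept torch tensors (numel) or plain sequences (len).
        count = value.numel() if hasattr(value, "numel") else len(value)
        self.new_tokens += count
        if self.new_tokens >= self.next_report:
            elapsed = time.monotonic() - self.start
            print(f"[gen] {self.new_tokens} tokens after {elapsed:.1f}s",
                  file=self.out, flush=True)
            self.next_report = self.new_tokens + self.report_every

    def end(self):
        self.finished = True
        elapsed = time.monotonic() - self.start
        print(f"[gen] done: {self.new_tokens} tokens in {elapsed:.1f}s",
              file=self.out, flush=True)
```

Create a fresh instance for each pipeline call, because the first `put()` is treated as the prompt. If no `[gen]` lines ever appear, generation never got past the first step. If they appear slowly, the run is progressing but far more slowly than expected.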

**Where it stalls**

* Call to `self.text_pipeline(...)` inside `optimization_utils/generator.py` (~lines 186/201).
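Python's built-in `faulthandler` can confirm which frame is blocking without stopping the process. It dumps every thread's stack on a timer. If execution is inside C or CUDA code, the dump shows the innermost Python frame that called into it. The sketch below assumes that it is called near the top of `main_image_captioning.py`. The helper names are hypothetical.

```python
import faulthandler
import sys


def enable_hang_dumps(interval_s=120.0, file=sys.stderr):
    """Dump all thread stacks every `interval_s` seconds until cancelled.

    The process keeps running (exit=False). Each dump starts with a
    "Timeout (...)!" header followed by per-thread tracebacks.
    """
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=file, exit=False)


def disable_hang_dumps():
    """Stop the periodic dumps started by enable_hang_dumps()."""
    faulthandler.cancel_dump_traceback_later()
```

On Linux, `faulthandler.register(signal.SIGUSR1)` is an alternative that produces a dump on demand via `kill -USR1 <pid>`.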

**Control tests (succeed outside MILS)**

```python
from transformers import pipeline
import torch
p = pipeline('text-generation',
    model='meta-llama/Meta-Llama-3.1-8B-Instruct',
    model_kwargs={'torch_dtype': torch.bfloat16},
    device='cuda:0', token=True)
out = p('Hello', max_new_tokens=10, do_sample=False)  # returns text OK

p2 = pipeline('text-generation',
    model='meta-llama/Meta-Llama-3.1-8B-Instruct',
    device_map='cuda:0', token=True)
out2 = p2('Hello', max_new_tokens=10, do_sample=False)  # returns text OK
```

GPT-2 pipeline also returns text normally outside MILS.
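The standalone calls succeed, but the in-MILS call never returns. Wrapping the in-MILS call can turn the stall into an explicit error instead of a silent wait. Below is a minimal sketch that uses a daemon thread. `call_with_deadline` is a hypothetical helper. It only *reports* the timeout: the worker thread, and any GPU work it started, keeps running until the process exits.

```python
import threading


def call_with_deadline(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) and raise TimeoutError if it exceeds timeout_s.

    A daemon thread is used so that a hung call does not block interpreter exit.
    Exceptions raised inside fn are re-raised in the caller.
    """
    result = {}

    def target():
        try:
            result["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # propagate any failure to the caller
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s}s")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

For example, `call_with_deadline(self.text_pipeline, 300, <existing args>)` would surface a stall after five minutes, with the same positional and keyword arguments that `generator.py` already passes.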

**Tried (no change)**

* Filtered `terminators` down to valid integer token IDs, and added a pad-token fallback for when the tokenizer defines none.
* Forced deterministic generation (`do_sample=False`, `temperature=None`) and clamped `max_new_tokens` to 128.
* Minimal run parameters: `--batch_size 1 --iterations 1 --requested_number 1 --keep_previous 1`.
* Verified the HF cache and authentication, and confirmed that standalone pipeline calls succeed.
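The terminator filtering and pad-token fallback amount to roughly the following. This sketch rests on assumptions. The helper names are hypothetical. The inputs are the values a Hugging Face tokenizer would report (`eos_token_id`, `pad_token_id`, and the `convert_tokens_to_ids` result for any extra stop tokens). The code avoids importing transformers so that it can be checked in isolation.

```python
def clean_terminators(candidates):
    """Keep unique non-negative int token IDs, preserving order.

    Drops None (e.g. a missing eos_token_id), bools, negative and non-int values.
    One level of nesting is flattened because eos_token_id may itself be a list.
    """
    flat = []
    for item in candidates:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    kept = []
    for tok_id in flat:
        is_int = isinstance(tok_id, int) and not isinstance(tok_id, bool)
        if is_int and tok_id >= 0 and tok_id not in kept:
            kept.append(tok_id)
    return kept


def pad_id_with_fallback(pad_token_id, terminators):
    """Use the tokenizer's pad ID if set, otherwise fall back to the first terminator."""
    if pad_token_id is not None:
        return pad_token_id
    if not terminators:
        raise ValueError("no pad token and no valid terminator to fall back on")
    return terminators[0]
```

The results map onto the standard `eos_token_id=` (which accepts a list) and `pad_token_id=` generation arguments. A wrong stop ID usually makes generation run until `max_new_tokens` rather than hang. Clamping to 128 tokens had no effect, which suggests that the stall lies elsewhere.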

**Questions for maintainers**

1. Is MILS intended to use HF **`pipeline`** for generation with current `transformers` (4.57), or should we switch to **`AutoModelForCausalLM.generate()`** directly?
2. Are there any known issues with MILS batching, stop conditions, or chat-template handling on recent HF/Torch/Python versions?
3. Do you have **pinned versions** (Python / torch / transformers / accelerate) or a commit known to be stable for captioning?
4. Is there a minimal **dataset/config** you recommend for an end-to-end smoke test that should produce `iteration_0.txt` quickly?
5. Is single-GPU cloud usage supported, and is it advisable?

Thanks,
Colin Rooney 12mv2


MILS hangs during text generation (Llama-3.1 / GPT-2, Transformers 4.57, Torch 2.8) #4