Bug: padding_side not set for tokenizers with existing pad_token, causing empty responses
## Problem

When a tokenizer already has a `pad_token` defined (e.g., `allenai/Olmo-3-7B-Instruct`), `padding_side` is not set to `"left"`, causing empty responses during batched generation.

Decoder-only models require left-padding for generation. With right-padding (the default), the model sees `[PROMPT] [PAD] [PAD]` and stops generating immediately, returning empty strings.
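For illustration, here is a minimal, self-contained sketch of the failure mode (the model name is taken from this report; the prompts and decoding are illustrative, not the project's code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "allenai/Olmo-3-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# This tokenizer already ships a pad_token, so the repo's `if pad_token is None`
# branch never runs and padding_side stays at its default ("right").
prompts = [
    "Short prompt",
    "A much longer prompt that forces the short one to be padded in the batch",
]
batch = tokenizer(prompts, return_tensors="pt", padding=True)

# With right padding, the shorter sequence ends in PAD tokens, so generation
# starts after those PADs and the model tends to emit EOS immediately.
# With tokenizer.padding_side = "left", the PADs precede the prompt and
# generation continues from the real prompt tokens as intended.
outputs = model.generate(**batch, max_new_tokens=64)
new_tokens = outputs[:, batch["input_ids"].shape[1]:]
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
```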
## Impact
- Empty responses are counted as "non-refusals" in `is_refusal()` (no markers found → success); see the sketch after this list
- Results in falsely low refusal rates (e.g., 5/100 instead of 90/100)
- Only affects `batch_size > 1` (no padding is needed for single prompts): `batch_size=1` shows correct results, larger batches show near-zero refusals
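A minimal sketch of why an empty string slips through the marker check (the marker list and loop are illustrative; the repo's actual `is_refusal()` may differ):

```python
REFUSAL_MARKERS = ["i can't", "i cannot", "i won't", "as an ai"]  # illustrative markers

def is_refusal(response: str) -> bool:
    # No marker found -> treated as a successful (non-refusing) response.
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

print(is_refusal(""))  # False: an empty (failed) generation is scored as a success
```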
## Reproduction

Test with any model that ships its own `pad_token` (i.e., does not rely on the EOS fallback):

```python
# batch_size=1:   correct refusal count (e.g., 10/10)
# batch_size=128: near-zero refusals; all empty responses counted as "success"
```

Tested with `allenai/Olmo-3-7B-Instruct` and `squ11z1/Hypnos-i1-8B`.
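A quick sanity check (sketch) to confirm a model falls into this case, i.e. its tokenizer ships a `pad_token` so the EOS-fallback branch is skipped:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("allenai/Olmo-3-7B-Instruct")
print(tok.pad_token)     # already set -> the `if pad_token is None` branch never runs
print(tok.padding_side)  # typically "right" (the default), which triggers the bug
```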
## Current Code (model.py:50-52)

```python
if self.tokenizer.pad_token is None:
    self.tokenizer.pad_token = self.tokenizer.eos_token
    self.tokenizer.padding_side = "left"  # ← Only set when pad_token is None!
```
## Suggested Fix

```python
if self.tokenizer.pad_token is None:
    self.tokenizer.pad_token = self.tokenizer.eos_token

# CRITICAL: Always use left padding for decoder-only models during generation.
# Right padding causes empty outputs because the model sees PAD tokens after
# the prompt and thinks the sequence is complete.
self.tokenizer.padding_side = "left"
```
## Optional: Handle empty responses in evaluation

Additionally, `is_refusal()` in `evaluator.py` could treat empty or very short responses as failures rather than successes:

```python
def is_refusal(self, response: str) -> bool:
    # Empty response = generation failure, not successful uncensoring
    if len(response.strip()) < 5:
        return True
    # ... rest of marker checking
```
This provides defense-in-depth against other potential generation failures.
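Usage sketch (assuming an `evaluator` instance with the guard above; the names are illustrative, not the repo's API):

```python
assert evaluator.is_refusal("") is True       # empty response -> counted as a failure
assert evaluator.is_refusal("  \n ") is True  # whitespace-only -> counted as a failure
```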