Think masking #45
Co-authored-by: Yu Chin Fabian Lim <fabianlim@users.noreply.github.com>
The unit tests that @fabianlim wrote are failing. Please make sure this isn't indicative of a bug, and update the tests if necessary.
…n-instruct into think-masking
Hi @fabianlim @divya-kumari32, should we address the above concerns in order to get this PR merged? @Swanand-Kadhe Did you use this branch
```diff
     tokenizer: PreTrainedTokenizer,
     max_seq_length: int,
-    asst_tag: str = "<|start_of_role|>assistant<|end_of_role|>",
+    asst_tag: str = "<|start_of_role|>assistant<|end_of_role|>\n<think>\n",
```

Suggested change:

```diff
-    asst_tag: str = "<|start_of_role|>assistant<|end_of_role|>\n<think>\n",
+    asst_tag: str = "<|start_of_role|>assistant<|end_of_role|>",
```
```python
if message.get("role") == "assistant":
    # Check for an explicit 'thought' field or a '<think>' tag in the content.
    if message.get("thought") or (
        isinstance(message.get("content"), str) and "<think>" in message["content"]
```

Suggested change:

```diff
-        isinstance(message.get("content"), str) and "<think>" in message["content"]
+        isinstance(message.get("content"), str) and think_tag.strip() in message["content"]
```
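The suggested check can be sketched as a small standalone helper (a hypothetical sketch; the PR's actual `has_thinking_content` may differ in signature and defaults):

```python
def has_thinking_content(messages, think_tag="\n<think>\n"):
    """Return True if any assistant message carries reasoning content,
    either via an explicit 'thought' field or an inline think tag."""
    for message in messages:
        if message.get("role") != "assistant":
            continue
        # An explicit 'thought' field marks the sample as a thinking sample.
        if message.get("thought"):
            return True
        content = message.get("content")
        # Per the review suggestion, match the stripped tag so surrounding
        # newlines in the configured think_tag do not cause misses.
        if isinstance(content, str) and think_tag.strip() in content:
            return True
    return False
```

Matching on `think_tag.strip()` rather than a hard-coded `"<think>"` keeps the check consistent with whatever tag string the caller configures.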
**fabianlim** left a comment:

Can you accept the two suggestions?
```python
        **additional_inputs,
    )
```

```python
    # If the user has set `append_think_tag=True` and the current sample is a thinking sample,
```

Ok, thanks for the comment update. So can we then change `append_think_tag` to `mask_think_tag_if_present`? Would that be more clear?
```python
if append_think_tag:
    if has_thinking_content(messages):
        asst_tag += think_tag
```

Suggested change:

```diff
-if append_think_tag:
-    if has_thinking_content(messages):
-        asst_tag += think_tag
+if append_think_tag and has_thinking_content(messages):
+    asst_tag += think_tag
```
This PR adds support for optionally appending `\n<think>\n` to the default `asst_tag` during span-based label masking for SFT, but only for samples that contain `<think>` content and when the user explicitly enables it via the `mask_think_tag` flag. This feature additionally supports models that are thinking models but do not enable thinking mode by default, and covers the case where think and no-think samples are mixed in the dataset during training.

Some assistant responses in a sample might contain a `<think>` block that represents internal monologue/reasoning by the model. During SFT, the user might not want the model to learn to generate the `<think>` tag itself. Instead, we optionally allow `<think>` to be treated as part of the assistant tag span boundary, so masking cleanly includes or excludes it.

The behavior is controlled by two conditions:
1. `mask_think_tag` (passed by the user)
   - `False`: masking ignores `<think>` completely. Useful if the user wants `<think>` included, the model is not a "thinking" model, or for other design/training reasons.
   - `True`: masking may append `<think>` to the assistant tag, depending on the passed sample's content.
2. `has_thinking_content(messages)` (sample check)
   - Returns `True` if the assistant message has an explicit `thought` field, or `"<think>"` inside its content string.

The `<think>` suffix is appended to the assistant tag only when both are true.

Additional notes:
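One way to picture the two-condition gating described above is this minimal sketch (the flag and helper names mirror the PR's `mask_think_tag` and `has_thinking_content`, but the wrapper function and its defaults are hypothetical):

```python
def resolve_asst_tag(messages, mask_think_tag,
                     asst_tag="<|start_of_role|>assistant<|end_of_role|>",
                     think_tag="\n<think>\n"):
    """Append the think tag to the assistant tag only when the user opted in
    AND the sample actually contains thinking content."""
    def has_thinking_content(msgs):
        # Simplified stand-in for the PR's helper: explicit 'thought' field
        # or an inline '<think>' tag in an assistant message.
        return any(
            m.get("role") == "assistant"
            and (m.get("thought")
                 or (isinstance(m.get("content"), str) and "<think>" in m["content"]))
            for m in msgs
        )

    if mask_think_tag and has_thinking_content(messages):
        asst_tag += think_tag
    return asst_tag
```

With this gating, a plain (no-think) sample keeps the default tag even when `mask_think_tag=True`, which is what makes mixed think/no-think datasets safe.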
Other minor changes:
Although it is common for `add_special_tokens` to be set to `False` during tokenizer encoding, we explicitly set `add_special_tokens=False` here to disable the injection of tokenizer-specific BOS/EOS or SEP tokens that would otherwise be automatically added to the encoded sequence. This is essential for correct behavior in tasks like the span matching performed here.

Why is this necessary:
- Some tokenizers default to `add_bos_token=True`, which means they automatically prepend a BOS token. This would shift all token positions by 1 and break any span matching logic.
- Other tokenizers default to `add_bos_token=False`, so they behave differently, leading to inconsistent behavior across model families if not explicitly handled.
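The position-shift problem can be illustrated with a toy example (a contrived sketch using a fake whitespace tokenizer, not the PR's actual tokenizer; it only shows why an auto-prepended BOS breaks span matching):

```python
def encode(text, add_special_tokens):
    """Toy 'tokenization': one token per whitespace-separated piece."""
    tokens = text.split()
    if add_special_tokens:
        tokens = ["<bos>"] + tokens  # mimics add_bos_token=True behavior
    return tokens

def find_span(haystack_tokens, needle_tokens):
    """Return the start index of needle_tokens inside haystack_tokens, or -1."""
    n = len(needle_tokens)
    for i in range(len(haystack_tokens) - n + 1):
        if haystack_tokens[i:i + n] == needle_tokens:
            return i
    return -1

full = "user hello assistant hi there"
tag = "assistant"

seq = encode(full, add_special_tokens=True)      # ["<bos>", "user", ...]
# Encoding the tag WITH special tokens injects "<bos>" into the needle,
# so it no longer matches the tag's tokens inside the full sequence.
shifted = encode(tag, add_special_tokens=True)   # ["<bos>", "assistant"]
clean = encode(tag, add_special_tokens=False)    # ["assistant"]
```

Here `find_span(seq, shifted)` fails (returns -1) while `find_span(seq, clean)` finds the tag, which is exactly why the encoding of span boundaries must not inject special tokens.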