dataset_utils: add mask_out_tokens to train_on_responses_only (fixes unslothai/unsloth#6695) by pjordanandrsn · Pull Request #852 · unslothai/unsloth-zoo

pjordanandrsn · 2026-07-03T02:14:53Z

Implements the feature requested in unslothai/unsloth#6695: a way to keep specific tokens —
canonically the </think> closer — out of the loss even inside kept response spans (the
Nemotron-Ultra recipe the issue cites).

What it does

train_on_responses_only(trainer, instruction_part=..., response_part=...,
                        mask_out_tokens=["</think>"])

After response spans are unmasked, each listed string is re-masked to −100 wherever its token-id
sequence occurs. Matching is by tokenized id sequence plus a leading-space variant (covers
SentencePiece-style in-context merges); atomic added tokens like </think> always match exactly.
The scan is O(n) integer comparisons inside the existing per-example closure; sequences are
precomputed once outside it.

Verification (real Qwen3-0.6B chat render, two assistant turns)

Baseline behavior preserved: without the kwarg, output labels are identical to current main
(asserted elementwise on the rendered example).
The motivating case: current main trains on </think> (id 151668) inside the kept response;
with mask_out_tokens=["</think>"] the label diff is exactly that position set to −100 — nothing
else changes.
Multi-token strings: mask_out_tokens=["Final answer:"] masks exactly the trainable
positions of both in-context occurrences (matched via the bare [19357, 4226, 25] and
space-prefixed [13023, 4226, 25] variants), all set to −100, nothing else touched.

Happy to add a unit test in your suite mirroring the above (render → label-diff assertions) if
you'd like it in-tree — say the word and I'll push it to this branch.

Re-mask listed token strings to -100 inside kept response spans (e.g. mask_out_tokens=["</think>"], the Nemotron-Ultra recipe from unslothai/unsloth#6695). Each entry matches as its tokenized id sequence, plus a leading-space variant for SentencePiece-style in-context merges; atomic added tokens always match exactly. No behavior change when the kwarg is absent. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a mask_out_tokens parameter to the train_on_responses_only function in unsloth_zoo/dataset_utils.py, allowing specific token sequences (like </think>) to be masked out even within kept response spans. The reviewer suggests adding explicit type validation for mask_out_tokens to prevent potential runtime errors if invalid types are passed.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-07-03T02:16:25Z

+    mask_out_sequences = []
+    if mask_out_tokens:
+        if isinstance(mask_out_tokens, str): mask_out_tokens = [mask_out_tokens]
+        for token_string in mask_out_tokens:
+            for candidate in dict.fromkeys((token_string, " " + token_string)):
+                ids = tokenizer(candidate, add_special_tokens = False).input_ids
+                if ids and ids not in mask_out_sequences:
+                    mask_out_sequences.append(ids)
+    pass


To ensure robust defensive programming and prevent cryptic runtime errors (such as TypeError during string concatenation or iteration), we should explicitly validate that mask_out_tokens is either a string or a sequence of strings.

mask_out_sequences = [] if mask_out_tokens: if isinstance(mask_out_tokens, str): mask_out_tokens = [mask_out_tokens] elif not isinstance(mask_out_tokens, (list, tuple)): raise TypeError("Unsloth: mask_out_tokens must be a string, list, or tuple of strings.") for token_string in mask_out_tokens: if not isinstance(token_string, str): raise TypeError(f"Unsloth: mask_out_tokens elements must be strings, but got {type(token_string).__name__}") for candidate in dict.fromkeys((token_string, " " + token_string)): ids = tokenizer(candidate, add_special_tokens = False).input_ids if ids and ids not in mask_out_sequences: mask_out_sequences.append(ids) pass

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85a5fa6034

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-07-03T02:39:54Z

    return_function   = False, # Useful for iterating over lists
    num_proc          = None,
    last_response_only = False, # Train only on the last assistant turn
+    mask_out_tokens   = None,  # e.g. ["</think>"] - also mask these inside kept responses


Propagate mask_out_tokens to vision collators

When callers use this new kwarg with a vision trainer, train_on_responses_only takes the later _is_vision_collator branch and assigns data_collator.train_on_responses_only from a recursive call at lines 695-702, but that call omits mask_out_tokens, so the inner closure builds an empty mask_out_sequences. This means VLM response-only training still includes </think> or any requested token in the loss despite accepting mask_out_tokens=[...]; pass the kwarg through the collator masking setup as well.

Useful? React with 👍 / 👎.

Fixed in 76587db — the vision-collator recursive call now passes mask_out_tokens through, so the VLM path applies the same re-masking instead of silently ignoring the kwarg. Text-path label output is regression-identical.

chatgpt-codex-connector · 2026-07-03T02:39:54Z

    return_function   = False, # Useful for iterating over lists
    num_proc          = None,
    last_response_only = False, # Train only on the last assistant turn
+    mask_out_tokens   = None,  # e.g. ["</think>"] - also mask these inside kept responses


Forward mask_out_tokens through the MLX wrapper

When using the MLX API (unsloth_zoo.mlx.trainer.train_on_responses_only), this newly added parameter cannot be used because the wrapper signature at unsloth_zoo/mlx/trainer.py:2877-2886 and its call into this helper at 2933-2941 were not updated. Passing mask_out_tokens=[...] through the documented MLX mirror of the HF/Unsloth API raises TypeError before it reaches this implementation, so MLX training cannot apply the new masking behavior.

Useful? React with 👍 / 👎.

Fixed in 76587db — the MLX wrapper signature now accepts mask_out_tokens and forwards it to the HF implementation, mirroring the rest of the API surface.

…r and MLX paths The vision branch's recursive call omitted the new kwarg (silently ignoring it for VLM training) and the MLX wrapper's signature rejected it with a TypeError. Pass it through both. Text-path output is unchanged (label-diff regression identical). Addresses both Codex P2 review findings on unslothai#852. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

pjordanandrsn mentioned this pull request Jul 3, 2026

[Feature] Function to mask thinking closing token unslothai/unsloth#6695

Open

gemini-code-assist Bot reviewed Jul 3, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jul 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

dataset_utils: add mask_out_tokens to train_on_responses_only (fixes unslothai/unsloth#6695)#852

dataset_utils: add mask_out_tokens to train_on_responses_only (fixes unslothai/unsloth#6695)#852
pjordanandrsn wants to merge 2 commits into
unslothai:mainfrom
pjordanandrsn:fix/6695-mask-out-tokens

pjordanandrsn commented Jul 3, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jul 3, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jul 3, 2026

Uh oh!

pjordanandrsn Jul 3, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot Jul 3, 2026

Uh oh!

pjordanandrsn Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

pjordanandrsn commented Jul 3, 2026

What it does

Verification (real Qwen3-0.6B chat render, two assistant turns)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

pjordanandrsn Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

pjordanandrsn Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pjordanandrsn Jul 3, 2026 •

edited

Loading