dataset_utils: add mask_out_tokens to train_on_responses_only (fixes unslothai/unsloth#6695)#852
Conversation
Re-mask listed token strings to -100 inside kept response spans (e.g. mask_out_tokens=["</think>"], the Nemotron-Ultra recipe from unslothai/unsloth#6695). Each entry matches as its tokenized id sequence, plus a leading-space variant for SentencePiece-style in-context merges; atomic added tokens always match exactly. No behavior change when the kwarg is absent. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a mask_out_tokens parameter to the train_on_responses_only function in unsloth_zoo/dataset_utils.py, allowing specific token sequences (like </think>) to be masked out even within kept response spans. The reviewer suggests adding explicit type validation for mask_out_tokens to prevent potential runtime errors if invalid types are passed.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| mask_out_sequences = [] | ||
| if mask_out_tokens: | ||
| if isinstance(mask_out_tokens, str): mask_out_tokens = [mask_out_tokens] | ||
| for token_string in mask_out_tokens: | ||
| for candidate in dict.fromkeys((token_string, " " + token_string)): | ||
| ids = tokenizer(candidate, add_special_tokens = False).input_ids | ||
| if ids and ids not in mask_out_sequences: | ||
| mask_out_sequences.append(ids) | ||
| pass |
There was a problem hiding this comment.
To ensure robust defensive programming and prevent cryptic runtime errors (such as TypeError during string concatenation or iteration), we should explicitly validate that mask_out_tokens is either a string or a sequence of strings.
mask_out_sequences = []
if mask_out_tokens:
if isinstance(mask_out_tokens, str):
mask_out_tokens = [mask_out_tokens]
elif not isinstance(mask_out_tokens, (list, tuple)):
raise TypeError("Unsloth: mask_out_tokens must be a string, list, or tuple of strings.")
for token_string in mask_out_tokens:
if not isinstance(token_string, str):
raise TypeError(f"Unsloth: mask_out_tokens elements must be strings, but got {type(token_string).__name__}")
for candidate in dict.fromkeys((token_string, " " + token_string)):
ids = tokenizer(candidate, add_special_tokens = False).input_ids
if ids and ids not in mask_out_sequences:
mask_out_sequences.append(ids)
passThere was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 85a5fa6034
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return_function = False, # Useful for iterating over lists | ||
| num_proc = None, | ||
| last_response_only = False, # Train only on the last assistant turn | ||
| mask_out_tokens = None, # e.g. ["</think>"] - also mask these inside kept responses |
There was a problem hiding this comment.
Propagate mask_out_tokens to vision collators
When callers use this new kwarg with a vision trainer, train_on_responses_only takes the later _is_vision_collator branch and assigns data_collator.train_on_responses_only from a recursive call at lines 695-702, but that call omits mask_out_tokens, so the inner closure builds an empty mask_out_sequences. This means VLM response-only training still includes </think> or any requested token in the loss despite accepting mask_out_tokens=[...]; pass the kwarg through the collator masking setup as well.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 76587db — the vision-collator recursive call now passes mask_out_tokens through, so the VLM path applies the same re-masking instead of silently ignoring the kwarg. Text-path label output is regression-identical.
| return_function = False, # Useful for iterating over lists | ||
| num_proc = None, | ||
| last_response_only = False, # Train only on the last assistant turn | ||
| mask_out_tokens = None, # e.g. ["</think>"] - also mask these inside kept responses |
There was a problem hiding this comment.
Forward mask_out_tokens through the MLX wrapper
When using the MLX API (unsloth_zoo.mlx.trainer.train_on_responses_only), this newly added parameter cannot be used because the wrapper signature at unsloth_zoo/mlx/trainer.py:2877-2886 and its call into this helper at 2933-2941 were not updated. Passing mask_out_tokens=[...] through the documented MLX mirror of the HF/Unsloth API raises TypeError before it reaches this implementation, so MLX training cannot apply the new masking behavior.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 76587db — the MLX wrapper signature now accepts mask_out_tokens and forwards it to the HF implementation, mirroring the rest of the API surface.
…r and MLX paths The vision branch's recursive call omitted the new kwarg (silently ignoring it for VLM training) and the MLX wrapper's signature rejected it with a TypeError. Pass it through both. Text-path output is unchanged (label-diff regression identical). Addresses both Codex P2 review findings on unslothai#852. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Implements the feature requested in unslothai/unsloth#6695: a way to keep specific tokens —
canonically the
</think>closer — out of the loss even inside kept response spans (theNemotron-Ultra recipe the issue cites).
What it does
After response spans are unmasked, each listed string is re-masked to −100 wherever its token-id
sequence occurs. Matching is by tokenized id sequence plus a leading-space variant (covers
SentencePiece-style in-context merges); atomic added tokens like
</think>always match exactly.The scan is O(n) integer comparisons inside the existing per-example closure; sequences are
precomputed once outside it.
Verification (real Qwen3-0.6B chat render, two assistant turns)
(asserted elementwise on the rendered example).
</think>(id 151668) inside the kept response;with
mask_out_tokens=["</think>"]the label diff is exactly that position set to −100 — nothingelse changes.
mask_out_tokens=["Final answer:"]masks exactly the trainablepositions of both in-context occurrences (matched via the bare
[19357, 4226, 25]andspace-prefixed
[13023, 4226, 25]variants), all set to −100, nothing else touched.Happy to add a unit test in your suite mirroring the above (render → label-diff assertions) if
you'd like it in-tree — say the word and I'll push it to this branch.