fix(gguf): preserve Qwen3.5/3.6 MTP config so nextn tensors convert #847
LeoBorcherding wants to merge 1 commit into
convert_to_gguf stripped mtp_num_hidden_layers from config.json, which shrank the converter's block_count to num_hidden_layers. The leftover mtp.* weights were then remapped onto model.layers.<N>.eh_proj, a now out-of-range layer index, raising 'Can not map tensor ...eh_proj'. Keep the MTP config (gguf-py has full nextn support) and only strip the internal unsloth_fixed_mtp marker, so mtp.* maps cleanly to blk.N.nextn.*.
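A minimal sketch of the config handling this commit message describes, assuming the parsed `config.json` is a plain dict. The names `INTERNAL_KEYS` and `strip_internal_keys` are hypothetical illustrations, not the actual code in `unsloth_zoo/llama_cpp.py`, and the layer counts are toy values.

```python
from typing import Any, Dict

# Hypothetical list of Unsloth-internal marker keys. mtp_num_hidden_layers is
# deliberately NOT included: the converter needs it to size block_count and to
# map the mtp.* weights onto blk.N.nextn.* tensors.
INTERNAL_KEYS = ("unsloth_fixed_mtp",)


def strip_internal_keys(config: Dict[str, Any]) -> bool:
    """Delete internal marker keys from a parsed config.json dict in place.

    Returns True if anything was removed (i.e. the file should be rewritten).
    """
    changed = False
    for key in INTERNAL_KEYS:
        if key in config:
            del config[key]
            changed = True
    return changed


config = {"num_hidden_layers": 4, "mtp_num_hidden_layers": 1, "unsloth_fixed_mtp": True}
print(strip_internal_keys(config))  # True
print(config)  # {'num_hidden_layers': 4, 'mtp_num_hidden_layers': 1}
```

Keeping the allow-list of stripped keys explicit means a real model setting such as `mtp_num_hidden_layers` can only be removed by deliberately adding it to the tuple.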
Code Review
This pull request modifies the GGUF conversion process in unsloth_zoo/llama_cpp.py to preserve the mtp_num_hidden_layers configuration key instead of stripping it. This ensures that downstream conversion tools keep the extra nextn block and correctly map the MTP tensors for Qwen3.5/3.6 checkpoints, preventing tensor mapping errors. There are no review comments, so no feedback is provided.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 703b5eb803
```diff
 # Only strip Unsloth's internal marker, never the real MTP config.
 _changed = False
-for _key in ("mtp_num_hidden_layers", "unsloth_fixed_mtp"):
+for _key in ("unsloth_fixed_mtp",):
```
Preserve MTP only when tensors are present
When input_folder is a saved/fine-tuned Qwen checkpoint whose config.json still advertises mtp_num_hidden_layers but the exported safetensors only contain the base num_hidden_layers, this loop no longer removes that stale key. The downstream converter will therefore inflate block_count/nextn metadata for a layer that has no weights (the MLX path has _sync_gguf_nextn_layer_config specifically to handle this mismatch), so direct convert_to_gguf() callers regress from a convertible base model to a failed or unloadable GGUF unless they manually scrub the config first.
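One way this suggestion could be addressed, sketched under assumptions: keep `mtp_num_hidden_layers` only when at least one exported tensor name starts with `mtp.`. The helper `sync_mtp_config` and the `mtp.` prefix check are hypothetical; this is not the MLX path's `_sync_gguf_nextn_layer_config`, whose signature is not shown here. In a real checkpoint the tensor names could come from the safetensors index or file headers; here they are passed in as a list.

```python
from typing import Any, Dict, Iterable


def sync_mtp_config(config: Dict[str, Any], tensor_names: Iterable[str]) -> bool:
    """Keep mtp_num_hidden_layers only if MTP weights were actually exported.

    Hypothetical helper: assumes MTP weights carry an "mtp." prefix, as in the
    mtp.fc -> eh_proj remap described in this PR.
    Returns True if the config was modified.
    """
    has_mtp_weights = any(name.startswith("mtp.") for name in tensor_names)
    if not has_mtp_weights and "mtp_num_hidden_layers" in config:
        # Stale key: the config advertises an MTP layer with no weights behind
        # it, which would inflate block_count for a layer the GGUF cannot fill.
        del config["mtp_num_hidden_layers"]
        return True
    return False


# Base-only export: the stale MTP key is dropped.
base_cfg = {"num_hidden_layers": 4, "mtp_num_hidden_layers": 1}
print(sync_mtp_config(base_cfg, ["model.layers.0.mlp.down_proj.weight"]))  # True

# Export that still ships MTP weights: the key is kept.
mtp_cfg = {"num_hidden_layers": 4, "mtp_num_hidden_layers": 1}
print(sync_mtp_config(mtp_cfg, ["mtp.fc.weight"]))  # False
```

Deciding from the tensors rather than the config alone covers both cases the review raises: full MTP checkpoints keep the nextn block, and base-only exports fall back to `num_hidden_layers`.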
Summary
GGUF conversion of Qwen3.5/3.6 models that have an MTP (nextn) head fails with `Can not map tensor ...eh_proj`.
`convert_to_gguf` strips `mtp_num_hidden_layers` from `config.json` before invoking the converter. That shrinks the converter's `block_count` back to `num_hidden_layers`, but the checkpoint still ships the `mtp.*` weights. `_Qwen35MtpMixin.filter_tensors` then remaps `mtp.fc` → `model.layers.<num_hidden_layers>.eh_proj`, a layer index that no longer exists, so `map_tensor_name` raises.

gguf-py already has full nextn support (`blk.{bid}.nextn.eh_proj`, `enorm`, `hnorm`, `shared_head.*`), so the correct fix is to keep the MTP config and let the converter's existing nextn path map the `mtp.*` weights. Only the internal `unsloth_fixed_mtp` marker is stripped.

Test plan

- `mtp_num_hidden_layers: 1`) to GGUF: previously failed at `eh_proj`, now converts cleanly and emits `blk.<N>.nextn.*` tensors.

Companion to unslothai/unsloth#6107 (GGUF shard size control), which surfaced this on Qwen3.5.
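The index mismatch described in the Summary can be reproduced in a toy model, assuming `block_count = num_hidden_layers + mtp_num_hidden_layers` and that `mtp.fc` is remapped to layer index `num_hidden_layers`, as the PR describes. `block_count` and `map_mtp_fc` are hypothetical stand-ins that only mimic gguf-py's `map_tensor_name` behaviour; they are not its real implementation.

```python
from typing import Any, Dict


def block_count(config: Dict[str, Any]) -> int:
    # Assumption: the converter appends the MTP (nextn) layers after the base layers.
    return config["num_hidden_layers"] + config.get("mtp_num_hidden_layers", 0)


def map_mtp_fc(config: Dict[str, Any]) -> str:
    """Map mtp.fc to a GGUF name, mimicking the remap described in this PR.

    mtp.fc is first renamed to model.layers.<num_hidden_layers>.eh_proj; that
    index is only valid if block_count includes the MTP layer.
    """
    bid = config["num_hidden_layers"]
    if bid >= block_count(config):
        raise ValueError(f"Can not map tensor model.layers.{bid}.eh_proj")
    return f"blk.{bid}.nextn.eh_proj"


kept = {"num_hidden_layers": 4, "mtp_num_hidden_layers": 1}  # after this PR
stripped = {"num_hidden_layers": 4}                           # before this PR

print(map_mtp_fc(kept))  # blk.4.nextn.eh_proj
try:
    map_mtp_fc(stripped)
except ValueError as e:
    print(e)  # Can not map tensor model.layers.4.eh_proj
```

With the MTP key kept, index 4 falls inside the 5-block range and maps to a nextn tensor; with it stripped, the same index is one past the last block and the lookup fails.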