fix(gguf): preserve Qwen3.5/3.6 MTP config so nextn tensors convert #847

Open
LeoBorcherding wants to merge 1 commit into unslothai:main from LeoBorcherding:fix/qwen35-mtp-gguf-preserve

@LeoBorcherding (Contributor)

Summary

GGUF conversion of Qwen3.5/3.6 models that have an MTP (nextn) head fails with:

ValueError: Can not map tensor 'model.layers.<N>.eh_proj.weight'

convert_to_gguf strips mtp_num_hidden_layers from config.json before invoking the converter. That shrinks the converter's block_count back to num_hidden_layers, but the checkpoint still ships the mtp.* weights. _Qwen35MtpMixin.filter_tensors then remaps mtp.fc to model.layers.<num_hidden_layers>.eh_proj, a layer index that no longer exists, so map_tensor_name raises.
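The index mismatch described above can be illustrated with a minimal sketch. It assumes, as the summary implies, that the converter sets block_count to num_hidden_layers + mtp_num_hidden_layers and places the first MTP layer at index num_hidden_layers. The function names and the layer count of 24 are hypothetical and illustrative; they are not taken from unsloth_zoo or llama.cpp.

```python
def block_count(config: dict) -> int:
    # Assumed rule: MTP (nextn) layers are appended after the base decoder layers.
    return config["num_hidden_layers"] + config.get("mtp_num_hidden_layers", 0)


def mtp_layer_index(config: dict) -> int:
    # Assumed placement: the first MTP layer sits right after the last base layer.
    return config["num_hidden_layers"]


def mtp_tensor_maps(config: dict) -> bool:
    # A remapped mtp.* tensor only has a destination if its layer index is in range.
    return mtp_layer_index(config) < block_count(config)


# Illustrative values only, not a real Qwen3.5 config.
qwen = {"num_hidden_layers": 24, "mtp_num_hidden_layers": 1}
stripped = {k: v for k, v in qwen.items() if k != "mtp_num_hidden_layers"}

print(mtp_tensor_maps(qwen))      # True: layer 24 exists in a 25-block model
print(mtp_tensor_maps(stripped))  # False: layer 24 is out of range for 24 blocks
```

Stripping the key changes only the block count, not where the MTP weights land, which is why the failure surfaces as an unmappable tensor rather than a config error.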

gguf-py already has full nextn support (blk.{bid}.nextn.eh_proj, enorm, hnorm, shared_head.*), so the correct fix is to keep the MTP config and let the converter's existing nextn path map the mtp.* weights. Only the internal unsloth_fixed_mtp marker is stripped.
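A minimal sketch of the new stripping behavior, assuming config.json is edited in place before the converter runs. The helper name strip_internal_mtp_marker and the INTERNAL_KEYS tuple are hypothetical; only the key names come from the PR, and the real code in unsloth_zoo/llama_cpp.py may be structured differently.

```python
import json
import os
import tempfile

# Hypothetical: only Unsloth's internal bookkeeping key; the real MTP config is kept.
INTERNAL_KEYS = ("unsloth_fixed_mtp",)


def strip_internal_mtp_marker(config_path: str) -> bool:
    """Remove internal keys from config.json in place; return True if it changed."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    changed = False
    for key in INTERNAL_KEYS:
        if key in config:
            del config[key]
            changed = True
    if changed:
        with open(config_path, "w", encoding="utf-8") as f:
            json.dump(config, f, indent=2)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"num_hidden_layers": 24, "mtp_num_hidden_layers": 1,
                   "unsloth_fixed_mtp": True}, f)
    print(strip_internal_mtp_marker(path))  # True
    with open(path, encoding="utf-8") as f:
        print(json.load(f))  # mtp_num_hidden_layers is still present
```

Rewriting the file only when something changed keeps repeated calls idempotent.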

Test plan

  • Export Qwen3.5-2B (mtp_num_hidden_layers: 1) to GGUF: previously failed at eh_proj; now converts cleanly and emits blk.<N>.nextn.* tensors.
  • Verified end-to-end via Unsloth Studio GGUF export (BF16, incl. sharded output).
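One way to check the first test-plan item is to scan the exported tensor names for the nextn block. This is a minimal sketch assuming the gguf-py naming blk.{bid}.nextn.<name> with a .weight suffix; the helper nextn_tensors and the sample names are hypothetical.

```python
import re

# Assumed gguf-py naming for nextn tensors: blk.{bid}.nextn.<name>
NEXTN_PATTERN = re.compile(r"^blk\.(\d+)\.nextn\.(.+)$")


def nextn_tensors(names, expected_block):
    """Return the nextn tensor suffixes found at the expected block index."""
    found = set()
    for name in names:
        match = NEXTN_PATTERN.match(name)
        if match and int(match.group(1)) == expected_block:
            found.add(match.group(2))
    return found


# Hypothetical tensor names for illustration.
names = [
    "blk.23.attn_q.weight",
    "blk.24.nextn.eh_proj.weight",
    "blk.24.nextn.enorm.weight",
    "blk.24.nextn.hnorm.weight",
]
print(sorted(nextn_tensors(names, expected_block=24)))
# ['eh_proj.weight', 'enorm.weight', 'hnorm.weight']
```

For a real file, the names could be collected with gguf-py's GGUFReader, e.g. [t.name for t in GGUFReader(path).tensors].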

Companion to unslothai/unsloth#6107 (GGUF shard size control), which surfaced this on Qwen3.5.

convert_to_gguf stripped mtp_num_hidden_layers from config.json, which
shrank the converter's block_count to num_hidden_layers. The leftover
mtp.* weights were then remapped onto model.layers.<N>.eh_proj, a now
out-of-range layer index, raising 'Can not map tensor ...eh_proj'.
Keep the MTP config (gguf-py has full nextn support) and only strip the
internal unsloth_fixed_mtp marker, so mtp.* maps cleanly to blk.N.nextn.*.

@gemini-code-assist (bot) left a comment

Code Review

This pull request modifies the GGUF conversion process in unsloth_zoo/llama_cpp.py to preserve the mtp_num_hidden_layers configuration key instead of stripping it. This ensures that downstream conversion tools keep the extra nextn block and correctly map the MTP tensors for Qwen3.5/3.6 checkpoints, preventing tensor mapping errors. There are no review comments, so no feedback is provided.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 703b5eb803

Comment thread unsloth_zoo/llama_cpp.py
 # Only strip Unsloth's internal marker, never the real MTP config.
 _changed = False
-for _key in ("mtp_num_hidden_layers", "unsloth_fixed_mtp"):
+for _key in ("unsloth_fixed_mtp",):

P2: Preserve MTP only when tensors are present

When input_folder is a saved or fine-tuned Qwen checkpoint whose config.json still advertises mtp_num_hidden_layers but whose exported safetensors contain only the base num_hidden_layers, this loop no longer removes the stale key. The downstream converter will then inflate block_count and the nextn metadata for a layer that has no weights (the MLX path has _sync_gguf_nextn_layer_config specifically to handle this mismatch). Direct convert_to_gguf() callers therefore regress from a convertible base model to a failed or unloadable GGUF unless they scrub the config manually first.
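The review's suggestion could be sketched as a guard that keeps mtp_num_hidden_layers only when mtp.* weights were actually exported. This is a hypothetical illustration, not the MLX path's _sync_gguf_nextn_layer_config and not code from the PR; safetensors_tensor_names and sync_mtp_config are invented names. It reads tensor names from a single .safetensors header (8-byte little-endian length, then a JSON header); for sharded checkpoints the names would instead come from the weight_map in the index file.

```python
import json
import struct


def safetensors_tensor_names(path):
    """Read tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry.
    return [k for k in header if k != "__metadata__"]


def sync_mtp_config(config, tensor_names):
    """Drop mtp_num_hidden_layers when no mtp.* weights were exported.

    Returns a new dict; the input is not modified.
    """
    config = dict(config)
    has_mtp = any(name.startswith("mtp.") for name in tensor_names)
    if not has_mtp:
        config.pop("mtp_num_hidden_layers", None)
    # The internal marker is stripped in either case.
    config.pop("unsloth_fixed_mtp", None)
    return config


# Hypothetical tensor names and config for illustration.
base_only = ["model.embed_tokens.weight"]
with_mtp = base_only + ["mtp.fc.weight"]
cfg = {"num_hidden_layers": 24, "mtp_num_hidden_layers": 1, "unsloth_fixed_mtp": True}
print(sync_mtp_config(cfg, base_only))  # {'num_hidden_layers': 24}
print(sync_mtp_config(cfg, with_mtp))   # {'num_hidden_layers': 24, 'mtp_num_hidden_layers': 1}
```

Deciding from the exported weights rather than the config makes the config follow what was saved, so both base-only and MTP-carrying checkpoints convert.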
