
MLX: clear error when mlx-lm/mlx-vlm is too old for a QK-norm arch (gemma4/qwen3_5) #845

Open
danielhanchen wants to merge 5 commits into main from mlx-qk-norm-clear-error

@danielhanchen

Member

Problem

Loading a QK-norm MLX checkpoint (gemma4, qwen3_5) on an mlx-lm / mlx-vlm that predates the architecture fails with a raw, unactionable error:

Received 140 parameters not in model:
language_model.model.layers.15.self_attn.k_norm.weight, ...

This is what users hit on mlx-lm 0.31.3 (see mlx-lm #1242). The q_norm / k_norm weights are load-bearing, so silently dropping them would produce a numerically broken model.

Fix

Detect the QK-norm strict-load signature (a "parameters not in model" message naming q_norm / k_norm) at both strict re-raise sites in the loader. Instead of the raw ValueError, raise a clear, actionable error that names the cause and the fix (reinstall an arch-complete mlx-lm, not 0.31.3). The check is generic, so it covers gemma4, qwen3_5, and any future QK-norm arch. Non-QK-norm mismatches still fall through to the existing gemma4 KV-sharing / quant-projection filters.

The loader does not add a q_norm/k_norm-dropping fallback: those weights matter, so the correct resolution is upgrading the MLX stack (companion fix in the Studio installer excludes mlx-lm==0.31.3).
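
The detection described above could look roughly like the following. This is a minimal sketch, not the loader's actual code: the marker string, the regular expression, the `MLXVersionGapError` class, and the wording of the message are all assumptions made for illustration. Only the function's role and its `(model_type, message, error)` arguments come from the PR.

```python
import re

# Assumed markers; the real loader may match the message differently.
_NOT_IN_MODEL = "parameters not in model"
_QK_NORM_KEY = re.compile(r"\.(q_norm|k_norm)\.")


class MLXVersionGapError(RuntimeError):
    """Illustrative error type, not necessarily what the loader raises."""


def raise_if_qk_norm_version_gap(model_type: str, message: str, error: Exception) -> None:
    """Raise an actionable error when a strict load rejected q_norm/k_norm."""
    if _NOT_IN_MODEL not in message:
        return  # unrelated failure: the caller re-raises it unchanged
    if not _QK_NORM_KEY.search(message):
        return  # a weight mismatch, but not a QK-norm one
    raise MLXVersionGapError(
        f"The installed mlx-lm/mlx-vlm does not implement the QK-norm layers of "
        f"model_type={model_type!r}. Upgrade the MLX stack (avoid mlx-lm 0.31.3)."
    ) from error
```

Returning `None` for non-matching messages lets the caller keep its existing `raise` and fallback logic, and chaining with `from error` preserves the original ValueError for debugging.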

Validation

pytest tests/test_mlx_qk_norm_version_gap.py (3 passing): the tester's exact gemma4 message and a qwen3_5 q_norm message raise the actionable error; a non-QK-norm mismatch and an unrelated error pass through untouched.

Loading a QK-norm checkpoint (gemma4, qwen3_5) on an mlx-lm/mlx-vlm that predates
the arch fails with a raw mlx ValueError: "Received N parameters not in model:
...self_attn.k_norm.weight...". This is the tester-facing failure on mlx-lm
0.31.3 (see mlx-lm #1242). Dropping q_norm/k_norm would give a numerically broken
model, so detect that signature at both strict re-raise sites and raise an
actionable error naming the fix (reinstall an arch-complete mlx-lm, not 0.31.3).
Non-QK-norm mismatches still fall through to the existing gemma4 filters.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a safety check _raise_if_qk_norm_version_gap to detect when the installed mlx-lm or mlx-vlm packages are too old or regressed (e.g., version 0.31.3) to support QK-norm architectures like Gemma 4 and Qwen 3.5, raising a clear, actionable error instead of silently dropping weights. It also adds corresponding unit tests. The reviewer identified a critical issue where the new check could be bypassed if the error message matches other known fallbacks or filters, and suggested executing the QK-norm version gap check before checking the known fallbacks/filters to prevent silently dropping load-bearing weights.

Comment thread unsloth_zoo/mlx/loader.py
Comment on lines 414 to 418
message = str(error)
rule = _KNOWN_MLX_LM_STRICT_FALLBACKS.get(model_type)
if rule is None or not _message_matches_known_fallback(message, rule):
    _raise_if_qk_norm_version_gap(model_type, message, error)
    raise

Contributor

high

If the error message contains both the QK-norm weights (e.g., k_norm) and the KV-sharing weights (e.g., k_proj), the check _message_matches_known_fallback(message, rule) will evaluate to True. As a result, the loader will skip the QK-norm version gap check and proceed to load the model with strict=False. This will silently drop the load-bearing k_norm and q_norm weights, leading to a numerically broken model.

To prevent this, we should execute the QK-norm version gap check before checking the known fallbacks. If a QK-norm weight is rejected, we must raise the actionable error immediately.

Suggested change
- message = str(error)
- rule = _KNOWN_MLX_LM_STRICT_FALLBACKS.get(model_type)
- if rule is None or not _message_matches_known_fallback(message, rule):
-     _raise_if_qk_norm_version_gap(model_type, message, error)
-     raise
+ message = str(error)
+ _raise_if_qk_norm_version_gap(model_type, message, error)
+ rule = _KNOWN_MLX_LM_STRICT_FALLBACKS.get(model_type)
+ if rule is None or not _message_matches_known_fallback(message, rule):
+     raise
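
To see why the ordering matters, the two control flows can be compared on a mixed error message. The sketch below is a simplified model: `matches_kv_sharing_fallback` and `names_qk_norm` are hypothetical stand-ins for `_message_matches_known_fallback` and the QK-norm guard, the parameter paths are illustrative, and the functions return a label instead of raising or retrying.

```python
# Illustrative mixed message: one QK-norm key plus the KV-sharing extras.
MIXED = (
    "Received 126 parameters not in model: "
    "layers.15.self_attn.k_norm.weight, "
    "layers.15.self_attn.k_proj.weight, "
    "layers.15.self_attn.v_proj.weight"
)


def matches_kv_sharing_fallback(message: str) -> bool:
    # Stand-in for _message_matches_known_fallback with the gemma4 rule.
    return "self_attn.k_proj" in message and "self_attn.v_proj" in message


def names_qk_norm(message: str) -> bool:
    return ".q_norm." in message or ".k_norm." in message


def decide_fallback_first(message: str) -> str:
    """Original ordering: the guard runs only when no fallback rule matches."""
    if not matches_kv_sharing_fallback(message):
        return "raise upgrade error" if names_qk_norm(message) else "re-raise"
    return "retry strict=False"


def decide_guard_first(message: str) -> str:
    """Suggested ordering: the guard runs before any fallback rule."""
    if names_qk_norm(message):
        return "raise upgrade error"
    if not matches_kv_sharing_fallback(message):
        return "re-raise"
    return "retry strict=False"


print(decide_fallback_first(MIXED))  # retry strict=False
print(decide_guard_first(MIXED))     # raise upgrade error
```

With the original ordering the mixed message is accepted by the fallback rule and the norms would be dropped; with the guard first, the same message produces the upgrade error.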

Comment thread unsloth_zoo/mlx/loader.py
Comment on lines 455 to 459
message = str(error)
rule = _KNOWN_VLM_EXTRA_WEIGHT_FILTERS.get(model_type)
if rule is None or not _message_matches_known_fallback(message, rule):
    _raise_if_qk_norm_version_gap(model_type, message, error)
    raise

Contributor

high

Similarly, if the error message contains both the QK-norm weights and the extra quantized per-layer projection state, the check _message_matches_known_fallback(message, rule) will evaluate to True. This will bypass the QK-norm version gap check and proceed to load the model with the extra weight filter, silently dropping the load-bearing k_norm and q_norm weights.

Executing the QK-norm version gap check before checking the known filters ensures that we raise the actionable error immediately if any QK-norm weights are rejected.

Suggested change
- message = str(error)
- rule = _KNOWN_VLM_EXTRA_WEIGHT_FILTERS.get(model_type)
- if rule is None or not _message_matches_known_fallback(message, rule):
-     _raise_if_qk_norm_version_gap(model_type, message, error)
-     raise
+ message = str(error)
+ _raise_if_qk_norm_version_gap(model_type, message, error)
+ rule = _KNOWN_VLM_EXTRA_WEIGHT_FILTERS.get(model_type)
+ if rule is None or not _message_matches_known_fallback(message, rule):
+     raise

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef4098bca9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread unsloth_zoo/mlx/loader.py Outdated
Comment on lines +416 to +417
if rule is None or not _message_matches_known_fallback(message, rule):
    _raise_if_qk_norm_version_gap(model_type, message, error)

P1: Check QK-norm before falling back to strict=False

When a Gemma4 text load error contains both the known KV-sharing extras (self_attn.k_proj/self_attn.v_proj) and rejected QK-norm weights, this condition treats it as the safe fallback and skips the new guard, so the subsequent load_model(..., strict=False) will silently drop the load-bearing q_norm/k_norm tensors. Check _raise_if_qk_norm_version_gap before accepting the strict=False fallback so mixed mismatch messages do not return a numerically broken model.

Comment thread unsloth_zoo/mlx/loader.py Outdated
Comment on lines +457 to +458
if rule is None or not _message_matches_known_fallback(message, rule):
    _raise_if_qk_norm_version_gap(model_type, message, error)

P2: Check QK-norm before applying the VLM retry

When a Gemma4 VLM mismatch includes both the allow-listed projection quant state and rejected q_norm/k_norm weights, the existing filter matches first and skips the new QK-norm guard. The retry only removes the projection keys, leaving the QK-norm keys to fail again with the raw MLX ValueError, so users still miss the actionable version-gap message in this mixed-error case; run the QK-norm check before the known-extra retry.

The gemma4 KV-sharing fallback loads strict=False when the mismatch names
k_proj/v_proj. A broken mlx-lm reports k_norm in the same message (the tester's
'Received 126 parameters not in model: ...k_norm..., ...k_proj..., ...v_proj...'),
so the fallback would silently drop the load-bearing QK-norm weights and yield a
numerically broken model. Run the QK-norm version-gap check before the
KV-sharing / extra-weight fallbacks in both loaders so any q_norm/k_norm mismatch
raises the clear upgrade error instead of dropping the norms.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3b585bd32f

Comment thread unsloth_zoo/mlx/loader.py
message = str(error)
# QK-norm weights are load-bearing: never filter past them (see the
# mlx-lm path). Check before the known extra-weight filter below.
_raise_if_qk_norm_version_gap(model_type, message, error)

P2: Guard runtime-quantized VLM loads too

This QK-norm check only runs through _load_mlx_vlm_with_extra_weight_filter, but the VLM runtime-quantization branch calls vlm_load(...) directly before quantizing (want_runtime_quant, lines 4082-4088). In that environment, loading a Gemma4/Qwen3.5 VLM on an mlx-vlm build that rejects q_norm/k_norm still surfaces the raw parameters not in model ValueError instead of the new actionable version-gap message. Route that direct load through the same guard or catch its ValueError before applying quantization.

The want_runtime_quant VLM branch calls vlm_load() directly before quantizing,
bypassing _load_mlx_vlm_with_extra_weight_filter and its QK-norm guard. Wrap it
so a q_norm/k_norm strict-load failure raises the clear upgrade error there as
well. The text runtime-quant path already routes through
_load_mlx_lm_with_strict_fallback, which is guarded.
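
A wrapper of the kind this commit describes might be sketched as follows. The names `guarded_load`, `qk_norm_guard`, and `StaleMLXError` are hypothetical, and the guard is a simplified stand-in for `_raise_if_qk_norm_version_gap`. The loader call is passed in as a callable so the sketch runs without MLX installed.

```python
from typing import Any, Callable


class StaleMLXError(RuntimeError):
    """Illustrative error type, not the loader's actual exception class."""


def qk_norm_guard(message: str, error: Exception) -> None:
    # Simplified stand-in for _raise_if_qk_norm_version_gap.
    if "parameters not in model" in message and (
        "q_norm" in message or "k_norm" in message
    ):
        raise StaleMLXError("mlx-vlm is too old for this QK-norm arch") from error


def guarded_load(load_fn: Callable[[], Any]) -> Any:
    """Run a direct load call, translating QK-norm strict-load failures."""
    try:
        return load_fn()
    except ValueError as error:
        qk_norm_guard(str(error), error)  # raises when the signature matches
        raise  # any other ValueError propagates unchanged
```

In the runtime-quantization branch, the direct call would then be wrapped along the lines of `guarded_load(lambda: vlm_load(...))` before quantizing, with the arguments left as they are in the existing code.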
@chatgpt-codex-connector

💡 Codex Review

f'`pip install "mlx-lm!=0.31.3" "mlx-vlm"`. See mlx-lm #1242.{installed}'

P2: Force an upgrade in the stale-MLX remediation

When the installed package is too old but not exactly 0.31.3 (for example an older MLX stack that already satisfies mlx-lm!=0.31.3), this suggested pip install can leave the stale package in place; I checked python -m pip install --help, where -U, --upgrade is the option documented to upgrade specified packages. Since this error path is reached specifically because the current MLX install rejects required QK-norm weights, the remediation should force an upgrade or specify a known-good minimum, otherwise users can follow the message and hit the same load failure again.

@danielhanchen

Member Author

The QK-norm guard now runs before every strict=False / extra-weight fallback (3b585bd), so an error listing both KV-sharing keys and q_norm/k_norm raises the clear upgrade error instead of dropping the norms. Also guarded the runtime-quantized VLM branch (0eabafd), which calls vlm_load() directly before quantizing and previously bypassed the check; the text runtime-quant path already routes through the guarded loader.

Comment thread unsloth_zoo/mlx/loader.py Fixed
…rsion probe

The suggested fix string used a plain pip install, so a user already on an
old-but-not-0.31.3 mlx-lm would keep the same too-old build and hit the
error again. Add -U and the known-good floor: pip install -U "mlx-lm>=0.22.0,!=0.31.3".
Also add a one-line comment on the best-effort version-probe except block.
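
One way to build such a remediation string with a best-effort version probe is sketched below. The helper names `installed_suffix` and `upgrade_hint` and the exact wording are assumptions. The `0.22.0` floor, the `0.31.3` exclusion, and the `mlx-vlm` package come from this thread. The requirement is quoted because an unquoted `>=` would be interpreted by the shell as a redirection.

```python
from importlib import metadata


def installed_suffix(packages=("mlx-lm", "mlx-vlm")) -> str:
    """Return ' Installed: name==version, ...' for the packages that are found."""
    found = []
    for name in packages:
        try:
            found.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Best-effort probe: a missing package must not mask the real error.
            continue
    return f" Installed: {', '.join(found)}." if found else ""


def upgrade_hint() -> str:
    """Build the remediation text shown to the user (illustrative wording)."""
    return (
        'Upgrade with: pip install -U "mlx-lm>=0.22.0,!=0.31.3" "mlx-vlm". '
        "See mlx-lm #1242." + installed_suffix()
    )
```

Using `-U` together with an explicit floor means following the hint changes the environment even when the installed version already satisfies `!=0.31.3`.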
@danielhanchen

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 158e2de688

Comment thread unsloth_zoo/mlx/loader.py
# first so an error carrying both KV-sharing keys and q_norm/k_norm (a
# broken mlx-lm) raises a clear error instead of silently dropping the
# norms via the strict=False fallback below.
_raise_if_qk_norm_version_gap(model_type, message, error)

P2: Preserve the Gemma4 shared-KV fallback

When a gemma4_text checkpoint hits the known shared-KV mismatch and the error includes the shared-layer k_norm alongside k_proj/v_proj (the exact 126-parameter error from mlx-lm #1242), this guard now runs before _KNOWN_MLX_LM_STRICT_FALLBACKS and raises instead of retrying load_model(..., strict=False). Those shared-layer K/V tensors are intentionally absent from the MLX model -- the repo’s own converter skips k_proj/v_proj and self_attn.k_norm for gemma4_kv_shared_layers in unsloth_zoo/vllm_utils.py:1147-1150,1188 -- so treating that signature as an unsupported QK-norm version gap unnecessarily breaks the existing safe Gemma4 text load path. Please exempt the Gemma4 shared-KV signature or let that fallback handle errors that only contain the shared k_norm with K/V extras.
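
The exemption requested here could be approximated by grouping the rejected keys per layer. The following is a hypothetical heuristic, not something this PR is known to implement: it treats a message as the shared-KV signature only when no q_norm is rejected and every rejected k_norm sits in a layer whose k_proj and v_proj were rejected too. The key pattern and function names are assumptions, and a per-layer rule like this cannot by itself tell a genuinely broken mlx-lm apart from the intended shared-layer omission.

```python
import re
from collections import defaultdict

# Assumed key layout: "...layers.<idx>.self_attn.<name>.<suffix>".
_KEY = re.compile(r"layers\.(\d+)\.self_attn\.(q_norm|k_norm|k_proj|v_proj)\.")


def rejected_by_layer(message: str) -> dict[int, set[str]]:
    """Group the rejected attention keys named in the message by layer index."""
    layers: dict[int, set[str]] = defaultdict(set)
    for layer, name in _KEY.findall(message):
        layers[int(layer)].add(name)
    return dict(layers)


def is_shared_kv_only(message: str) -> bool:
    """True when every rejected norm is explained by a shared-KV layer."""
    layers = rejected_by_layer(message)
    if not layers:
        return False
    for names in layers.values():
        if "q_norm" in names:
            return False  # shared-KV layers still own their q_norm
        if "k_norm" in names and not {"k_proj", "v_proj"} <= names:
            return False  # a lone k_norm points at a real version gap
    return True
```

A guard using this predicate would skip the upgrade error when `is_shared_kv_only(message)` is true and let the existing strict=False fallback handle the load.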

@danielhanchen

Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Another round soon, please!

Reviewed commit: 158e2de688

@danielhanchen

Member Author

@codex review

@danielhanchen

Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

Reviewed commit: 158e2de688

@danielhanchen

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8766c18845

Comment thread unsloth_zoo/mlx/loader.py
# first so an error carrying both KV-sharing keys and q_norm/k_norm (a
# broken mlx-lm) raises a clear error instead of silently dropping the
# norms via the strict=False fallback below.
_raise_if_qk_norm_version_gap(model_type, message, error)

P1: Keep Gemma4 KV-sharing fallback from being preempted

When a Gemma4 text checkpoint has extra weights only for KV-shared layers, the existing fallback below is meant to retry with strict=False once the message contains self_attn.k_proj and self_attn.v_proj. Those same safe KV-sharing mismatches can also include self_attn.k_norm for layers that the MLX Gemma4 model intentionally does not instantiate, so this new guard raises before the fallback and turns supported Gemma4 text loads into hard failures instead of dropping the unused shared-layer extras.

@danielhanchen

Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

Reviewed commit: 8766c18845

@danielhanchen

Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍

Reviewed commit: 8766c18845
