MLX: clear error when mlx-lm/mlx-vlm is too old for a QK-norm arch (gemma4/qwen3_5) by danielhanchen · Pull Request #845 · unslothai/unsloth-zoo

danielhanchen · 2026-07-02T09:26:22Z

Problem

Loading a QK-norm MLX checkpoint (gemma4, qwen3_5) on an mlx-lm / mlx-vlm that predates the architecture fails with a raw, unactionable error:

Received 140 parameters not in model:
language_model.model.layers.15.self_attn.k_norm.weight, ...

This is what users hit on mlx-lm 0.31.3 (see mlx-lm #1242). The q_norm / k_norm weights are load-bearing, so silently dropping them would produce a numerically broken model.

Fix

Detect the QK-norm strict-load signature (a "parameters not in model" error that names q_norm / k_norm) at both strict re-raise sites in the loader, and replace the raw ValueError with a clear, actionable error that names the cause and the fix: reinstall an arch-complete mlx-lm, not 0.31.3. The check is generic, so it covers gemma4, qwen3_5, and any future QK-norm arch. Non-QK-norm mismatches still fall through to the existing gemma4 KV-sharing / quant-projection filters.

The loader does not add a q_norm/k_norm-dropping fallback: those weights matter, so the correct resolution is upgrading the MLX stack (companion fix in the Studio installer excludes mlx-lm==0.31.3).
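
For readers skimming the thread, the detection described above could look roughly like the sketch below. It is not the code from this PR: the regex, the MLXVersionGapError class, and the wording of the message are assumptions made for illustration. Only the (model_type, message, error) argument order is taken from the diff.

```python
import re

# Assumed pattern: a rejected parameter name that ends in q_norm/k_norm.
_QK_NORM_KEY = re.compile(r"\bself_attn\.(?:q_norm|k_norm)\.weight\b")


class MLXVersionGapError(RuntimeError):
    """Hypothetical error type: the MLX stack is too old for the checkpoint."""


def raise_if_qk_norm_version_gap(model_type, message, error):
    """Raise an actionable error if a strict load rejected QK-norm weights."""
    if "parameters not in model" not in message:
        return  # unrelated error: the caller re-raises it unchanged
    if not _QK_NORM_KEY.search(message):
        return  # non-QK-norm mismatch: left to the existing filters
    raise MLXVersionGapError(
        f"The installed mlx-lm / mlx-vlm does not implement the QK-norm "
        f"layers that '{model_type}' checkpoints need, so loading would "
        f"drop q_norm / k_norm. Upgrade the MLX stack (avoid mlx-lm 0.31.3)."
    ) from error
```

Chaining with `from error` keeps the raw ValueError available as `__cause__`, so the full list of rejected parameters is still visible in the traceback.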

Validation

pytest tests/test_mlx_qk_norm_version_gap.py (3 passing): the tester's exact gemma4 message and a qwen3_5 q_norm message raise the actionable error; a non-QK-norm mismatch and an unrelated error pass through untouched.

Loading a QK-norm checkpoint (gemma4, qwen3_5) on an mlx-lm/mlx-vlm that predates the arch fails with a raw mlx ValueError: "Received N parameters not in model: ...self_attn.k_norm.weight...". This is the tester-facing failure on mlx-lm 0.31.3 (see mlx-lm #1242). Dropping q_norm/k_norm would give a numerically broken model, so detect that signature at both strict re-raise sites and raise an actionable error naming the fix (reinstall an arch-complete mlx-lm, not 0.31.3). Non-QK-norm mismatches still fall through to the existing gemma4 filters.

gemini-code-assist

Code Review

This pull request introduces a safety check _raise_if_qk_norm_version_gap to detect when the installed mlx-lm or mlx-vlm packages are too old or regressed (e.g., version 0.31.3) to support QK-norm architectures like Gemma 4 and Qwen 3.5, raising a clear, actionable error instead of silently dropping weights. It also adds corresponding unit tests. The reviewer identified a critical issue where the new check could be bypassed if the error message matches other known fallbacks or filters, and suggested executing the QK-norm version gap check before checking the known fallbacks/filters to prevent silently dropping load-bearing weights.

gemini-code-assist · 2026-07-02T09:28:34Z

        message = str(error)
        rule = _KNOWN_MLX_LM_STRICT_FALLBACKS.get(model_type)
        if rule is None or not _message_matches_known_fallback(message, rule):
+            _raise_if_qk_norm_version_gap(model_type, message, error)
            raise


If the error message contains both the QK-norm weights (e.g., k_norm) and the KV-sharing weights (e.g., k_proj), the check _message_matches_known_fallback(message, rule) will evaluate to True. As a result, the loader will skip the QK-norm version gap check and proceed to load the model with strict=False. This will silently drop the load-bearing k_norm and q_norm weights, leading to a numerically broken model.

To prevent this, we should execute the QK-norm version gap check before checking the known fallbacks. If a QK-norm weight is rejected, we must raise the actionable error immediately.

Suggested change

-        message = str(error)
-        rule = _KNOWN_MLX_LM_STRICT_FALLBACKS.get(model_type)
-        if rule is None or not _message_matches_known_fallback(message, rule):
-            _raise_if_qk_norm_version_gap(model_type, message, error)
-            raise
+        message = str(error)
+        _raise_if_qk_norm_version_gap(model_type, message, error)
+        rule = _KNOWN_MLX_LM_STRICT_FALLBACKS.get(model_type)
+        if rule is None or not _message_matches_known_fallback(message, rule):
+            raise

gemini-code-assist · 2026-07-02T09:28:34Z

        message = str(error)
        rule = _KNOWN_VLM_EXTRA_WEIGHT_FILTERS.get(model_type)
        if rule is None or not _message_matches_known_fallback(message, rule):
+            _raise_if_qk_norm_version_gap(model_type, message, error)
            raise


Similarly, if the error message contains both the QK-norm weights and the extra quantized per-layer projection state, the check _message_matches_known_fallback(message, rule) will evaluate to True. This will bypass the QK-norm version gap check and proceed to load the model with the extra weight filter, silently dropping the load-bearing k_norm and q_norm weights.

Executing the QK-norm version gap check before checking the known filters ensures that we raise the actionable error immediately if any QK-norm weights are rejected.

Suggested change

-        message = str(error)
-        rule = _KNOWN_VLM_EXTRA_WEIGHT_FILTERS.get(model_type)
-        if rule is None or not _message_matches_known_fallback(message, rule):
-            _raise_if_qk_norm_version_gap(model_type, message, error)
-            raise
+        message = str(error)
+        _raise_if_qk_norm_version_gap(model_type, message, error)
+        rule = _KNOWN_VLM_EXTRA_WEIGHT_FILTERS.get(model_type)
+        if rule is None or not _message_matches_known_fallback(message, rule):
+            raise

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef4098bca9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-02T09:31:04Z

        if rule is None or not _message_matches_known_fallback(message, rule):
+            _raise_if_qk_norm_version_gap(model_type, message, error)


Check QK-norm before falling back to strict=False

When a Gemma4 text load error contains both the known KV-sharing extras (self_attn.k_proj/self_attn.v_proj) and rejected QK-norm weights, this condition treats it as the safe fallback and skips the new guard, so the subsequent load_model(..., strict=False) will silently drop the load-bearing q_norm/k_norm tensors. Check _raise_if_qk_norm_version_gap before accepting the strict=False fallback so mixed mismatch messages do not return a numerically broken model.

chatgpt-codex-connector · 2026-07-02T09:31:04Z

        if rule is None or not _message_matches_known_fallback(message, rule):
+            _raise_if_qk_norm_version_gap(model_type, message, error)


Check QK-norm before applying the VLM retry

When a Gemma4 VLM mismatch includes both the allow-listed projection quant state and rejected q_norm/k_norm weights, the existing filter matches first and skips the new QK-norm guard. The retry only removes the projection keys, leaving the QK-norm keys to fail again with the raw MLX ValueError, so users still miss the actionable version-gap message in this mixed-error case; run the QK-norm check before the known-extra retry.

The gemma4 KV-sharing fallback loads strict=False when the mismatch names k_proj/v_proj. A broken mlx-lm reports k_norm in the same message (the tester's 'Received 126 parameters not in model: ...k_norm..., ...k_proj..., ...v_proj...'), so the fallback would silently drop the load-bearing QK-norm weights and yield a numerically broken model. Run the QK-norm version-gap check before the KV-sharing / extra-weight fallbacks in both loaders so any q_norm/k_norm mismatch raises the clear upgrade error instead of dropping the norms.
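
The ordering this commit describes can be illustrated with a small stand-alone sketch. The helper names and the injected load_model callable are hypothetical stand-ins, not the loader's real functions; the point is only that the QK-norm check runs before the strict=False retry is considered.

```python
from typing import Callable


def has_qk_norm(message: str) -> bool:
    # Simplified stand-in for the QK-norm signature check.
    return "q_norm" in message or "k_norm" in message


def matches_kv_sharing_fallback(message: str) -> bool:
    # Simplified stand-in for the known gemma4 KV-sharing rule.
    return "self_attn.k_proj" in message and "self_attn.v_proj" in message


def load_with_guarded_fallback(load_model: Callable[[bool], object]) -> object:
    """Try a strict load; retry non-strict only for safe, known mismatches."""
    try:
        return load_model(True)  # strict=True
    except ValueError as error:
        message = str(error)
        # Guard first: a mixed message must never reach the strict=False retry.
        if has_qk_norm(message):
            raise RuntimeError(
                "MLX stack rejected q_norm/k_norm; upgrade mlx-lm / mlx-vlm."
            ) from error
        if not matches_kv_sharing_fallback(message):
            raise  # unknown mismatch: surface the original error
        return load_model(False)  # strict=False drops only the known extras
```

With the two checks swapped, a message listing k_norm next to k_proj and v_proj would match the fallback rule and the norms would be dropped without any warning.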

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3b585bd32f

chatgpt-codex-connector · 2026-07-02T10:35:00Z

        message = str(error)
+        # QK-norm weights are load-bearing: never filter past them (see the
+        # mlx-lm path). Check before the known extra-weight filter below.
+        _raise_if_qk_norm_version_gap(model_type, message, error)


Guard runtime-quantized VLM loads too

This QK-norm check only runs through _load_mlx_vlm_with_extra_weight_filter, but the VLM runtime-quantization branch calls vlm_load(...) directly before quantizing (want_runtime_quant, lines 4082-4088). In that environment, loading a Gemma4/Qwen3.5 VLM on an mlx-vlm build that rejects q_norm/k_norm still surfaces the raw parameters not in model ValueError instead of the new actionable version-gap message. Route that direct load through the same guard or catch its ValueError before applying quantization.

The want_runtime_quant VLM branch calls vlm_load() directly before quantizing, bypassing _load_mlx_vlm_with_extra_weight_filter and its QK-norm guard. Wrap it so a q_norm/k_norm strict-load failure raises the clear upgrade error there as well. The text runtime-quant path already routes through _load_mlx_lm_with_strict_fallback, which is guarded.
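
One way to wrap a direct load call like this is a small context manager, sketched below. qk_norm_guard is a hypothetical name and the signature test is simplified; the PR may structure the wrapper differently.

```python
from contextlib import contextmanager


@contextmanager
def qk_norm_guard(model_type: str):
    """Turn a QK-norm strict-load ValueError into an actionable error."""
    try:
        yield
    except ValueError as error:
        message = str(error)
        rejected_norms = "q_norm" in message or "k_norm" in message
        if "parameters not in model" in message and rejected_norms:
            raise RuntimeError(
                f"The installed mlx-vlm rejects the QK-norm weights that "
                f"'{model_type}' needs; upgrade the MLX stack."
            ) from error
        raise  # anything else propagates unchanged
```

A call site would then read `with qk_norm_guard(model_type): model = vlm_load(path)`, which keeps the quantization code that follows untouched.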

chatgpt-codex-connector · 2026-07-02T12:17:11Z

💡 Codex Review

unsloth-zoo/unsloth_zoo/mlx/loader.py

Line 302 in 0eabafd

f'`pip install "mlx-lm!=0.31.3" "mlx-vlm"`. See mlx-lm #1242.{installed}'

Force an upgrade in the stale-MLX remediation

When the installed package is too old but not exactly 0.31.3 (for example an older MLX stack that already satisfies mlx-lm!=0.31.3), this suggested pip install can leave the stale package in place; I checked python -m pip install --help, where -U, --upgrade is the option documented to upgrade specified packages. Since this error path is reached specifically because the current MLX install rejects required QK-norm weights, the remediation should force an upgrade or specify a known-good minimum, otherwise users can follow the message and hit the same load failure again.

danielhanchen · 2026-07-02T12:19:45Z

The QK-norm guard now runs before every strict=False / extra-weight fallback (3b585bd), so an error listing both KV-sharing keys and q_norm/k_norm raises the clear upgrade error instead of dropping the norms. Also guarded the runtime-quantized VLM branch (0eabafd), which calls vlm_load() directly before quantizing and previously bypassed the check; the text runtime-quant path already routes through the guarded loader.

…rsion probe

The suggested fix string used a plain pip install, so a user already on an old-but-not-0.31.3 mlx-lm would keep the same too-old build and hit the error again. Add -U and the known-good floor: pip install -U mlx-lm>=0.22.0,!=0.31.3. Also add a one-line comment on the best-effort version-probe except block.
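
A rough sketch of how such a remediation string and best-effort version probe might be assembled is below. The function names are hypothetical, and including "mlx-vlm" in the command is an assumption carried over from the earlier fix string; the floor and the excluded release are the ones quoted in this commit message.

```python
from importlib import metadata

MIN_MLX_LM = "0.22.0"  # floor quoted in the commit message
BAD_MLX_LM = "0.31.3"  # release excluded throughout this thread


def installed_suffix(package: str) -> str:
    """Return ' (installed: pkg X.Y.Z)', or '' if the probe fails."""
    try:
        return f" (installed: {package} {metadata.version(package)})"
    except Exception:
        # Best-effort probe: a failed lookup must not mask the real error.
        return ""


def remediation_message() -> str:
    """Build the upgrade hint appended to the version-gap error."""
    spec = f"mlx-lm>={MIN_MLX_LM},!={BAD_MLX_LM}"
    # The specifier is quoted so a shell does not read '>' as a redirect.
    return f'Run: pip install -U "{spec}" "mlx-vlm"' + installed_suffix("mlx-lm")
```

The -U flag matters because pip otherwise treats an already-installed version that satisfies the specifier as done and leaves it in place.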

danielhanchen · 2026-07-03T09:13:26Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 158e2de688

chatgpt-codex-connector · 2026-07-03T09:17:42Z

+        # first so an error carrying both KV-sharing keys and q_norm/k_norm (a
+        # broken mlx-lm) raises a clear error instead of silently dropping the
+        # norms via the strict=False fallback below.
+        _raise_if_qk_norm_version_gap(model_type, message, error)


Preserve the Gemma4 shared-KV fallback

When a gemma4_text checkpoint hits the known shared-KV mismatch and the error includes the shared-layer k_norm alongside k_proj/v_proj (the exact 126-parameter error from mlx-lm #1242), this guard now runs before _KNOWN_MLX_LM_STRICT_FALLBACKS and raises instead of retrying load_model(..., strict=False). Those shared-layer K/V tensors are intentionally absent from the MLX model -- the repo’s own converter skips k_proj/v_proj and self_attn.k_norm for gemma4_kv_shared_layers in unsloth_zoo/vllm_utils.py:1147-1150,1188 -- so treating that signature as an unsupported QK-norm version gap unnecessarily breaks the existing safe Gemma4 text load path. Please exempt the Gemma4 shared-KV signature or let that fallback handle errors that only contain the shared k_norm with K/V extras.

danielhanchen · 2026-07-03T11:12:46Z

@codex review

chatgpt-codex-connector · 2026-07-03T11:15:52Z

Codex Review: Didn't find any major issues. Another round soon, please!

Reviewed commit: 158e2de688

danielhanchen · 2026-07-03T11:49:57Z

@codex review

chatgpt-codex-connector · 2026-07-03T11:50:04Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

danielhanchen · 2026-07-03T12:32:12Z

@codex review

chatgpt-codex-connector · 2026-07-03T12:36:08Z

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

Reviewed commit: 158e2de688

chatgpt-codex-connector · 2026-07-03T13:28:34Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

danielhanchen · 2026-07-03T14:35:01Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8766c18845

chatgpt-codex-connector · 2026-07-03T14:40:01Z

+        # first so an error carrying both KV-sharing keys and q_norm/k_norm (a
+        # broken mlx-lm) raises a clear error instead of silently dropping the
+        # norms via the strict=False fallback below.
+        _raise_if_qk_norm_version_gap(model_type, message, error)


Keep Gemma4 KV-sharing fallback from being preempted

When a Gemma4 text checkpoint has extra weights only for KV-shared layers, the existing fallback below is meant to retry with strict=False once the message contains self_attn.k_proj and self_attn.v_proj. Those same safe KV-sharing mismatches can also include self_attn.k_norm for layers that the MLX Gemma4 model intentionally does not instantiate, so this new guard raises before the fallback and turns supported Gemma4 text loads into hard failures instead of dropping the unused shared-layer extras.
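
One possible shape for the exemption requested here is sketched below; the thread does not show how, or whether, it was implemented. The heuristic is an assumption: treat a mismatch as the safe shared-KV signature only when no q_norm is rejected and every layer that rejects k_norm also rejects k_proj and v_proj. The function name and regex are hypothetical.

```python
import re

# Assumed key layout: "...layers.<idx>.self_attn.<name>.<suffix>".
_ATTN_KEY = re.compile(r"layers\.(\d+)\.self_attn\.(q_norm|k_norm|k_proj|v_proj)\b")


def qk_norm_confined_to_shared_kv(message: str) -> bool:
    """True if rejected norms appear only alongside rejected K/V projections."""
    per_layer = {}
    for layer, name in _ATTN_KEY.findall(message):
        per_layer.setdefault(int(layer), set()).add(name)
    if not per_layer:
        return False  # nothing recognisable: do not claim the safe signature
    for names in per_layer.values():
        if "q_norm" in names:
            return False  # shared-KV layers still own their query norm
        if "k_norm" in names and not {"k_proj", "v_proj"} <= names:
            return False  # a lone k_norm points at a real version gap
    return True
```

A guard built on this would skip the upgrade error only when the function returns True. Since MLX error messages can be truncated, a production check would more likely compare against the checkpoint's declared KV-shared layer indices than rely on message parsing alone.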

danielhanchen · 2026-07-03T18:43:12Z

@codex review

chatgpt-codex-connector · 2026-07-03T18:45:55Z

Codex Review: Didn't find any major issues. 🎉

Reviewed commit: 8766c18845

danielhanchen · 2026-07-03T20:11:31Z

@codex review

chatgpt-codex-connector · 2026-07-03T20:16:03Z

Codex Review: Didn't find any major issues. 👍

Reviewed commit: 8766c18845

github-code-quality Bot found potential problems Jul 2, 2026

Comment thread unsloth_zoo/mlx/loader.py Fixed

Merge remote-tracking branch 'origin/main' into pr845-main-update

8766c18
