MLX: clear error when mlx-lm/mlx-vlm is too old for a QK-norm arch (gemma4/qwen3_5) #845
danielhanchen wants to merge 5 commits into
Loading a QK-norm checkpoint (gemma4, qwen3_5) on an mlx-lm/mlx-vlm that predates the arch fails with a raw mlx ValueError: "Received N parameters not in model: ...self_attn.k_norm.weight...". This is the tester-facing failure on mlx-lm 0.31.3 (see mlx-lm #1242). Dropping q_norm/k_norm would give a numerically broken model, so detect that signature at both strict re-raise sites and raise an actionable error naming the fix (reinstall an arch-complete mlx-lm, not 0.31.3). Non-QK-norm mismatches still fall through to the existing gemma4 filters.
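The detection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name mirrors the `_raise_if_qk_norm_version_gap` helper mentioned in the review, but the error class, the marker list, and the message wording here are assumptions.

```python
# Hypothetical markers; the real guard may match on different substrings.
_QK_NORM_MARKERS = ("q_norm", "k_norm")


class MLXVersionGapError(RuntimeError):
    """Hypothetical error type: the MLX stack is too old for the checkpoint."""


def raise_if_qk_norm_version_gap(model_type, message, error):
    """Raise an actionable error if a strict load rejected QK-norm weights.

    Returns None (and raises nothing) for any other kind of mismatch, so
    the caller can fall through to its existing filters.
    """
    if "parameters not in model" not in message:
        return  # not a strict-load mismatch at all
    if not any(marker in message for marker in _QK_NORM_MARKERS):
        return  # mismatch does not involve QK-norm weights
    raise MLXVersionGapError(
        f"The installed mlx-lm/mlx-vlm does not implement QK-norm for "
        f"model_type={model_type!r}. Reinstall an arch-complete mlx-lm "
        f"(not 0.31.3)."
    ) from error
```

Chaining with `from error` keeps the raw `ValueError` available in the traceback while the user sees the actionable message first.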
Code Review
This pull request introduces a safety check _raise_if_qk_norm_version_gap to detect when the installed mlx-lm or mlx-vlm packages are too old or regressed (e.g., version 0.31.3) to support QK-norm architectures like Gemma 4 and Qwen 3.5, raising a clear, actionable error instead of silently dropping weights. It also adds corresponding unit tests. The reviewer identified a critical issue where the new check could be bypassed if the error message matches other known fallbacks or filters, and suggested executing the QK-norm version gap check before checking the known fallbacks/filters to prevent silently dropping load-bearing weights.
message = str(error)
rule = _KNOWN_MLX_LM_STRICT_FALLBACKS.get(model_type)
if rule is None or not _message_matches_known_fallback(message, rule):
    _raise_if_qk_norm_version_gap(model_type, message, error)
    raise
If the error message contains both the QK-norm weights (e.g., k_norm) and the KV-sharing weights (e.g., k_proj), the check _message_matches_known_fallback(message, rule) will evaluate to True. As a result, the loader will skip the QK-norm version gap check and proceed to load the model with strict=False. This will silently drop the load-bearing k_norm and q_norm weights, leading to a numerically broken model.
To prevent this, we should execute the QK-norm version gap check before checking the known fallbacks. If a QK-norm weight is rejected, we must raise the actionable error immediately.
- message = str(error)
- rule = _KNOWN_MLX_LM_STRICT_FALLBACKS.get(model_type)
- if rule is None or not _message_matches_known_fallback(message, rule):
-     _raise_if_qk_norm_version_gap(model_type, message, error)
-     raise
+ message = str(error)
+ _raise_if_qk_norm_version_gap(model_type, message, error)
+ rule = _KNOWN_MLX_LM_STRICT_FALLBACKS.get(model_type)
+ if rule is None or not _message_matches_known_fallback(message, rule):
+     raise
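To make the ordering issue concrete, here is a small self-contained sketch of the two orderings applied to a mixed error message. The helpers are simplified stand-ins (assumptions, not the loader's real functions), and the rule contents are guessed from the `k_proj`/`v_proj` keys named in the review.

```python
# Hypothetical stand-in for the gemma4 KV-sharing fallback rule.
KV_SHARING_RULE = ("self_attn.k_proj", "self_attn.v_proj")


def matches_known_fallback(message, rule):
    """True when every key of the rule appears in the error message."""
    return all(key in message for key in rule)


def rejects_qk_norm(message):
    """True when the message names a rejected q_norm or k_norm weight."""
    return "q_norm" in message or "k_norm" in message


def decide_guard_last(message):
    """Original ordering: the guard only runs if the fallback does not match."""
    if matches_known_fallback(message, KV_SHARING_RULE):
        return "retry strict=False"
    if rejects_qk_norm(message):
        return "raise version-gap error"
    return "re-raise original"


def decide_guard_first(message):
    """Suggested ordering: the guard runs before any fallback is considered."""
    if rejects_qk_norm(message):
        return "raise version-gap error"
    if matches_known_fallback(message, KV_SHARING_RULE):
        return "retry strict=False"
    return "re-raise original"


mixed = (
    "Received 3 parameters not in model: "
    "model.layers.5.self_attn.k_norm.weight, "
    "model.layers.5.self_attn.k_proj.weight, "
    "model.layers.5.self_attn.v_proj.weight"
)
print(decide_guard_last(mixed))   # retry strict=False
print(decide_guard_first(mixed))  # raise version-gap error
```

With the guard last, the mixed message is accepted by the fallback and the norm weights are dropped; with the guard first, the same message raises.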
message = str(error)
rule = _KNOWN_VLM_EXTRA_WEIGHT_FILTERS.get(model_type)
if rule is None or not _message_matches_known_fallback(message, rule):
    _raise_if_qk_norm_version_gap(model_type, message, error)
    raise
Similarly, if the error message contains both the QK-norm weights and the extra quantized per-layer projection state, the check _message_matches_known_fallback(message, rule) will evaluate to True. This will bypass the QK-norm version gap check and proceed to load the model with the extra weight filter, silently dropping the load-bearing k_norm and q_norm weights.
Executing the QK-norm version gap check before checking the known filters ensures that we raise the actionable error immediately if any QK-norm weights are rejected.
- message = str(error)
- rule = _KNOWN_VLM_EXTRA_WEIGHT_FILTERS.get(model_type)
- if rule is None or not _message_matches_known_fallback(message, rule):
-     _raise_if_qk_norm_version_gap(model_type, message, error)
-     raise
+ message = str(error)
+ _raise_if_qk_norm_version_gap(model_type, message, error)
+ rule = _KNOWN_VLM_EXTRA_WEIGHT_FILTERS.get(model_type)
+ if rule is None or not _message_matches_known_fallback(message, rule):
+     raise
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef4098bca9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
if rule is None or not _message_matches_known_fallback(message, rule):
    _raise_if_qk_norm_version_gap(model_type, message, error)
Check QK-norm before falling back to strict=False
When a Gemma4 text load error contains both the known KV-sharing extras (self_attn.k_proj/self_attn.v_proj) and rejected QK-norm weights, this condition treats it as the safe fallback and skips the new guard, so the subsequent load_model(..., strict=False) will silently drop the load-bearing q_norm/k_norm tensors. Check _raise_if_qk_norm_version_gap before accepting the strict=False fallback so mixed mismatch messages do not return a numerically broken model.
if rule is None or not _message_matches_known_fallback(message, rule):
    _raise_if_qk_norm_version_gap(model_type, message, error)
Check QK-norm before applying the VLM retry
When a Gemma4 VLM mismatch includes both the allow-listed projection quant state and rejected q_norm/k_norm weights, the existing filter matches first and skips the new QK-norm guard. The retry only removes the projection keys, leaving the QK-norm keys to fail again with the raw MLX ValueError, so users still miss the actionable version-gap message in this mixed-error case; run the QK-norm check before the known-extra retry.
The gemma4 KV-sharing fallback loads strict=False when the mismatch names k_proj/v_proj. A broken mlx-lm reports k_norm in the same message (the tester's 'Received 126 parameters not in model: ...k_norm..., ...k_proj..., ...v_proj...'), so the fallback would silently drop the load-bearing QK-norm weights and yield a numerically broken model. Run the QK-norm version-gap check before the KV-sharing / extra-weight fallbacks in both loaders so any q_norm/k_norm mismatch raises the clear upgrade error instead of dropping the norms.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3b585bd32f
message = str(error)
# QK-norm weights are load-bearing: never filter past them (see the
# mlx-lm path). Check before the known extra-weight filter below.
_raise_if_qk_norm_version_gap(model_type, message, error)
Guard runtime-quantized VLM loads too
This QK-norm check only runs through _load_mlx_vlm_with_extra_weight_filter, but the VLM runtime-quantization branch calls vlm_load(...) directly before quantizing (want_runtime_quant, lines 4082-4088). In that environment, loading a Gemma4/Qwen3.5 VLM on an mlx-vlm build that rejects q_norm/k_norm still surfaces the raw parameters not in model ValueError instead of the new actionable version-gap message. Route that direct load through the same guard or catch its ValueError before applying quantization.
The want_runtime_quant VLM branch calls vlm_load() directly before quantizing, bypassing _load_mlx_vlm_with_extra_weight_filter and its QK-norm guard. Wrap it so a q_norm/k_norm strict-load failure raises the clear upgrade error there as well. The text runtime-quant path already routes through _load_mlx_lm_with_strict_fallback, which is guarded.
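One way to wrap a direct load call so it gets the same treatment is sketched below. The wrapper name, the `load_fn` parameter, and the error text are illustrative assumptions; the sketch does not call `vlm_load` itself and makes no claim about its signature.

```python
def load_with_qk_norm_guard(load_fn, model_type, *args, **kwargs):
    """Call load_fn and translate a QK-norm strict-load failure.

    load_fn is any callable that may raise the raw
    "parameters not in model" ValueError (hypothetical parameter).
    """
    try:
        return load_fn(*args, **kwargs)
    except ValueError as error:
        message = str(error)
        if "parameters not in model" in message and (
            "q_norm" in message or "k_norm" in message
        ):
            raise RuntimeError(
                f"mlx-vlm is too old for model_type={model_type!r}: "
                "it rejected QK-norm weights. Upgrade the MLX stack."
            ) from error
        raise  # unrelated failures propagate unchanged
```

Keeping the bare `raise` for everything else means the wrapper changes behavior only for the QK-norm signature.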
💡 Codex Review
unsloth-zoo/unsloth_zoo/mlx/loader.py Line 302 in 0eabafd
When the installed package is too old but not exactly
The QK-norm guard now runs before every strict=False / extra-weight fallback (3b585bd), so an error listing both KV-sharing keys and
…rsion probe The suggested fix string used a plain pip install, so a user already on an old-but-not-0.31.3 mlx-lm would reinstall the same too-old build and hit the error again. Add -U and the known-good floor: pip install -U mlx-lm>=0.22.0,!=0.31.3. Also add a one-line comment on the best-effort version-probe except block.
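A best-effort version probe plus the upgrade hint might look like the sketch below. The requirement string is the one quoted in the commit message; the function names and the hint wording are assumptions. The shell quoting is an addition of this sketch rather than something the commit message states: in a POSIX shell an unquoted `>` is parsed as output redirection, so the specifier generally needs quoting when pasted into a terminal.

```python
import shlex
from importlib import metadata

# Floor and exclusion as quoted in the commit message above.
REQUIREMENT = "mlx-lm>=0.22.0,!=0.31.3"


def installed_version(package):
    """Best-effort probe: return the installed version string, or None."""
    try:
        return metadata.version(package)
    except Exception:
        # The probe only decorates the message; it must never mask the
        # original load error, so any failure here is swallowed.
        return None


def upgrade_hint(package="mlx-lm"):
    """Build a copy-pasteable fix line for the error message."""
    found = installed_version(package) or "unknown"
    # shlex.quote wraps the specifier in single quotes so the shell does
    # not treat '>' as a redirection.
    command = f"pip install -U {shlex.quote(REQUIREMENT)}"
    return f"Installed {package}: {found}. Fix: {command}"
```

The `-U` flag matters for the case described above: without it, pip treats an already-installed but too-old build as satisfying a bare `pip install mlx-lm`.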
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 158e2de688
# first so an error carrying both KV-sharing keys and q_norm/k_norm (a
# broken mlx-lm) raises a clear error instead of silently dropping the
# norms via the strict=False fallback below.
_raise_if_qk_norm_version_gap(model_type, message, error)
Preserve the Gemma4 shared-KV fallback
When a gemma4_text checkpoint hits the known shared-KV mismatch and the error includes the shared-layer k_norm alongside k_proj/v_proj (the exact 126-parameter error from mlx-lm #1242), this guard now runs before _KNOWN_MLX_LM_STRICT_FALLBACKS and raises instead of retrying load_model(..., strict=False). Those shared-layer K/V tensors are intentionally absent from the MLX model -- the repo’s own converter skips k_proj/v_proj and self_attn.k_norm for gemma4_kv_shared_layers in unsloth_zoo/vllm_utils.py:1147-1150,1188 -- so treating that signature as an unsupported QK-norm version gap unnecessarily breaks the existing safe Gemma4 text load path. Please exempt the Gemma4 shared-KV signature or let that fallback handle errors that only contain the shared k_norm with K/V extras.
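The exemption the reviewer asks for could be sketched as a per-layer check: treat a rejected `k_norm` as a shared-KV extra only when the same layer also rejects `k_proj` and `v_proj`, and no `q_norm` is rejected anywhere. This is purely illustrative. The thread does not show how the PR resolved the point, and the parameter-name pattern and the function name below are assumptions.

```python
import re
from collections import defaultdict

# Assumed naming scheme: "...layers.<idx>.self_attn.<name>.<suffix>".
_PARAM = re.compile(r"layers\.(\d+)\.self_attn\.(q_norm|k_norm|k_proj|v_proj)\b")


def is_shared_kv_only(message):
    """True if the mismatch looks like shared-KV extras and nothing else.

    Every layer that rejects k_norm must also reject k_proj and v_proj,
    and no layer may reject q_norm.
    """
    per_layer = defaultdict(set)
    for layer, name in _PARAM.findall(message):
        per_layer[int(layer)].add(name)
    if not per_layer:
        return False  # no attention parameters named at all
    for names in per_layer.values():
        if "q_norm" in names:
            return False  # q_norm is never part of the shared-KV signature
        if "k_norm" in names and not {"k_proj", "v_proj"} <= names:
            return False  # lone k_norm: treat as a version gap
    return True
```

A caller would skip the QK-norm guard only when this returns `True`. Whether that distinction is actually safe depends on which layers the MLX model instantiates, which this sketch does not verify.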
@codex review
Codex Review: Didn't find any major issues. Another round soon, please!
@codex review
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
@codex review
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8766c18845
# first so an error carrying both KV-sharing keys and q_norm/k_norm (a
# broken mlx-lm) raises a clear error instead of silently dropping the
# norms via the strict=False fallback below.
_raise_if_qk_norm_version_gap(model_type, message, error)
Keep Gemma4 KV-sharing fallback from being preempted
When a Gemma4 text checkpoint has extra weights only for KV-shared layers, the existing fallback below is meant to retry with strict=False once the message contains self_attn.k_proj and self_attn.v_proj. Those same safe KV-sharing mismatches can also include self_attn.k_norm for layers that the MLX Gemma4 model intentionally does not instantiate, so this new guard raises before the fallback and turns supported Gemma4 text loads into hard failures instead of dropping the unused shared-layer extras.
@codex review
Codex Review: Didn't find any major issues. 🎉
@codex review
Codex Review: Didn't find any major issues. 👍
Problem
Loading a QK-norm MLX checkpoint (gemma4, qwen3_5) on an mlx-lm/mlx-vlm that predates the architecture fails with a raw, unactionable error:
ValueError: Received N parameters not in model: ...self_attn.k_norm.weight...
This is what users hit on mlx-lm 0.31.3 (see mlx-lm #1242). The q_norm/k_norm weights are load-bearing, so silently dropping them would produce a numerically broken model.
Fix
Detect the QK-norm strict-load signature ("parameters not in model" naming q_norm/k_norm) at both strict re-raise sites in the loader and raise a clear, actionable error that names the cause and the fix (reinstall an arch-complete mlx-lm, not 0.31.3) instead of the raw ValueError. Covers gemma4, qwen3_5, and any future QK-norm arch generically. Non-QK-norm mismatches still fall through to the existing gemma4 KV-sharing / quant-projection filters.
The loader does not add a q_norm/k_norm-dropping fallback: those weights matter, so the correct resolution is upgrading the MLX stack (companion fix in the Studio installer excludes mlx-lm==0.31.3).
Validation
pytest tests/test_mlx_qk_norm_version_gap.py (3 passing): the tester's exact gemma4 message and a qwen3_5 q_norm message raise the actionable error; a non-QK-norm mismatch and an unrelated error pass through untouched.