Keep DeepSeek-V4 hyper-connection mixers eager to stop backward inf by danielhanchen · Pull Request #859 · unslothai/unsloth-zoo

danielhanchen · 2026-07-03T18:40:09Z

Summary

Training DeepSeek-V4 with compiled modules produced inf grad_norm at step 2 and NaN weights by step 3 while the forward loss stayed finite. Bisecting the compiled cache per function isolated the manifold-constrained hyper-connection stream mixers (DeepseekV4HyperConnection / DeepseekV4HyperHead): their Sinkhorn-Knopp normalization chains twenty comb/(sum+eps) divisions with sigmoid and softmax mixing, and Inductor's fused backward of that chain overflows to inf at realistic gradient magnitudes. A random-init repro stays finite; the real model's gradient scale is required. Compiling everything except these two modules is stable.
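To make the shape of that division chain concrete, here is a minimal pure-Python sketch of a Sinkhorn-Knopp normalization. It is an illustration under stated assumptions, not the module's code: the function name `sinkhorn_knopp`, the `exp` positivity step, the `eps` value, and the choice of ten row/column rounds (which would yield twenty `comb/(sum+eps)` divisions) are all assumed, and the sigmoid and softmax mixing is omitted.

```python
import math

def sinkhorn_knopp(logits, iters=10, eps=1e-6):
    """Alternately normalize rows and columns of a positive matrix.

    Assumption: 10 rounds of (row, column) normalization, i.e. 20 divisions.
    """
    # Exponentiate so every entry is strictly positive.
    comb = [[math.exp(v) for v in row] for row in logits]
    for _ in range(iters):
        # Row step: comb / (row_sum + eps)
        comb = [[v / (sum(row) + eps) for v in row] for row in comb]
        # Column step: comb / (col_sum + eps)
        col_sums = [sum(col) for col in zip(*comb)]
        comb = [[v / (s + eps) for v, s in zip(row, col_sums)] for row in comb]
    return comb

# Illustrative 3x3 input; the values are arbitrary.
mix = sinkhorn_knopp([[0.2, -1.0, 0.5], [1.5, 0.0, -0.3], [-0.7, 0.9, 0.1]])
```

Each output entry depends on every earlier division, so the backward pass multiplies through the same long chain of reciprocals. That is the part the summary reports as overflowing once Inductor fuses it.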

What this does

Adds both classes to DISABLE_COMPILE_MODULES alongside the other numerically sensitive exclusions. RMSNorm, MLP, router, MLA attention, MoE, and the fused cross entropy stay compiled; the mixers are tiny, so throughput is unchanged.
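As a rough sketch of how a name-based exclusion list like this can gate compilation: the helper `should_compile`, the exact-match rule, and the `ExampleMLP` stand-in class are hypothetical and only illustrate the idea. The real list and matching logic in unsloth_zoo/compiler.py may differ.

```python
# Hypothetical sketch: only DISABLE_COMPILE_MODULES and the two DeepSeek-V4
# class names come from the PR; everything else is illustrative.
DISABLE_COMPILE_MODULES = [
    "DeepseekV4HyperConnection",
    "DeepseekV4HyperHead",
]

def should_compile(module_class_name, disabled=DISABLE_COMPILE_MODULES):
    """Return False when the class name is on the exclusion list (exact match assumed)."""
    return module_class_name not in disabled

# Stand-in classes so the example runs without any model code.
class DeepseekV4HyperHead:
    pass

class ExampleMLP:
    pass

# Partition the classes into those kept eager and those left to the compiler.
classes = (DeepseekV4HyperHead, ExampleMLP)
eager = [c.__name__ for c in classes if not should_compile(c.__name__)]
compiled = [c.__name__ for c in classes if should_compile(c.__name__)]
```

Keeping the decision keyed on class names means the exclusion stays a two-entry change, which is consistent with the small footprint the PR describes.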

Testing

tiny-DeepseekV4: 15 finite steps with a fresh cache and default env, loss matching a fully uncompiled reference run to 0.0002.
tiny-DeepseekV3 (no hyper-connection modules): no regression.
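
The acceptance criteria above (every step finite, loss within 0.0002 of the uncompiled reference) could be checked with a small helper along these lines. This is a sketch only: `check_run` and its arguments are hypothetical names, and the numbers in the usage lines are placeholders, not measurements from the PR.

```python
import math

def check_run(losses, grad_norms, reference_losses, tol=2e-4):
    """Return True if all values are finite and losses track the reference within tol."""
    if len(losses) != len(reference_losses):
        return False
    # Any inf or NaN in the loss or grad_norm history fails the run.
    if not all(math.isfinite(v) for v in list(losses) + list(grad_norms)):
        return False
    # Per-step loss must match the uncompiled reference within the tolerance.
    return all(abs(a - b) <= tol for a, b in zip(losses, reference_losses))

# Placeholder values for illustration only.
ok = check_run([2.0, 1.9], [1.0, 0.8], [2.0001, 1.9001])
diverged = check_run([2.0, 1.9], [1.0, float("inf")], [2.0001, 1.9001])
```

Checking `grad_norm` alongside the loss matters here because the failure described in the summary kept the forward loss finite while the gradients had already overflowed.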

chatgpt-codex-connector · 2026-07-03T18:40:15Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

gemini-code-assist

Code Review

This pull request updates unsloth_zoo/compiler.py to add DeepseekV4HyperConnection and DeepseekV4HyperHead to the list of modules that bypass Inductor compilation in favor of eager execution. This change prevents numerical overflow issues (overflowing to infinity) during the fused backward pass of their Sinkhorn-Knopp division chain at real gradient scales. There are no review comments, so we have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

danielhanchen · 2026-07-04T12:50:57Z

@codex review

danielhanchen · 2026-07-04T12:50:57Z

/gemini review

gemini-code-assist

Code Review

This pull request updates unsloth_zoo/compiler.py to add DeepseekV4HyperConnection and DeepseekV4HyperHead to the list of modules that bypass Inductor compilation and run in eager mode. This prevents PyTorch Inductor's fused backward pass for their Sinkhorn-Knopp division chain from overflowing to infinity at real gradient scales. Since these are tiny modules, running them in eager mode has negligible cost. There are no review comments, and I have no additional feedback to provide.

chatgpt-codex-connector · 2026-07-04T12:54:51Z

Codex Review: Didn't find any major issues. More of your lovely PRs please.

Reviewed commit: 07bb291709

Tighten comments

96297ff

danielhanchen wants to merge 2 commits into main from dsv4-mhc-eager-compile
