10 changes: 10 additions & 0 deletions unsloth/models/llama.py
@@ -3736,3 +3736,13 @@ def _for_training(m):
from .rl import PatchFastRL

PatchFastRL(FastLanguageModel = FastLlamaModel)


# Auto-enable grouped-GEMM MoE (transformers<5 ModuleList experts) on built / PEFT'd
# models. Wraps the loader leaves once; guarded so it never breaks model loading.
try:
    from unsloth_zoo.temporary_patches.moe_grouped_modulelist import wrap_loader_for_grouped_moe


P2: Point the grouped-MoE import at a shipped module

In normal installs that use the declared unsloth_zoo>=2026.6.7 dependency, this module path does not resolve, and the broad except Exception silently skips the whole wrapping block. As a result, transformers<5 ModuleList MoE models loaded through this path (and through the matching vision.py block) keep using the old expert-loop implementation instead of the intended grouped-GEMM patch. The new auto-enable behavior is effectively a no-op unless users happen to have an unpublished zoo build.
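One way to keep the guard without hiding a missing module is to catch only the import failure and report it. The following is a minimal sketch, not the project's code: `try_import_attr` is a hypothetical helper name, and emitting a warning (rather than logging or raising) is an assumption about the desired behavior.

```python
import importlib
import warnings


def try_import_attr(module_path, attr_name):
    """Return `attr_name` from `module_path`, or None (with a warning) if unavailable.

    Hypothetical helper for illustration: it narrows the guard to ImportError
    so a missing optional module is reported instead of silently skipped.
    """
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        warnings.warn(
            f"optional patch skipped: cannot import {module_path!r} ({exc})",
            stacklevel=2,
        )
        return None
    attr = getattr(module, attr_name, None)
    if attr is None:
        warnings.warn(
            f"optional patch skipped: {module_path!r} has no attribute {attr_name!r}",
            stacklevel=2,
        )
    return attr
```

A caller would wrap the loaders only when the helper returns a callable, so the skip path stays safe for model loading but is visible to the user.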


    FastLlamaModel.from_pretrained = staticmethod(wrap_loader_for_grouped_moe(FastLlamaModel.from_pretrained))


P1: Enable grouped MoE only after PEFT adapters attach

When FastLanguageModel.from_pretrained loads a PEFT adapter repo, loader.py calls this leaf loader first (dispatch_model.from_pretrained, around line 788) and attaches the adapter only later, with PeftModel.from_pretrained around lines 865-878. Wrapping the leaf here therefore enables grouped MoE on the base model before any LoRA modules exist, so the eligibility check cannot reject LoRA-on-expert adapters, and nothing rechecks afterwards. Adapters that target MoE expert projections (for example the default gate/up/down targets) then keep the grouped forward built from base weights and silently ignore those LoRA weights. Move the enable step to after PEFT attachment, or rerun or restore the grouped patch once adapters are loaded.
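The ordering issue can be shown with a toy model. This is a sketch under stated assumptions: the dict-based "model", `load_base`, `attach_adapter`, `maybe_enable_grouped`, and `run_after` are all hypothetical stand-ins, not functions from unsloth, unsloth_zoo, or PEFT.

```python
import functools


def run_after(loader, step):
    """Wrap `loader` so that `step(model)` runs on whatever the loader returns."""
    @functools.wraps(loader)
    def wrapped(*args, **kwargs):
        model = loader(*args, **kwargs)
        step(model)
        return model
    return wrapped


# Hypothetical stand-ins: a "model" is just a dict of flags here.
def load_base():
    return {"expert_lora": False, "grouped_moe": False}


def attach_adapter(model):
    model["expert_lora"] = True  # the adapter targets expert projections
    return model


def maybe_enable_grouped(model):
    # Eligibility check: refuse when LoRA sits on the experts.
    if not model["expert_lora"]:
        model["grouped_moe"] = True


# Order described in the comment: enable on the leaf, attach the adapter later.
early = attach_adapter(run_after(load_base, maybe_enable_grouped)())


# Suggested order: attach first, then run the eligibility check.
def load_with_adapter():
    return attach_adapter(load_base())


late = run_after(load_with_adapter, maybe_enable_grouped)()
```

In the first ordering the check runs before the adapter exists, so `early` ends up with both flags set; in the second the check sees the adapter and leaves grouped MoE off.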


    FastLlamaModel.get_peft_model = staticmethod(wrap_loader_for_grouped_moe(FastLlamaModel.get_peft_model))
except Exception:
    pass
9 changes: 9 additions & 0 deletions unsloth/models/vision.py
@@ -2119,3 +2119,12 @@ def check_dataset_for_missing_videos(
warnings.warn(error_msg, stacklevel = 2)

return missing


# Auto-enable grouped-GEMM MoE (transformers<5 ModuleList experts); see llama.py.
try:
    from unsloth_zoo.temporary_patches.moe_grouped_modulelist import wrap_loader_for_grouped_moe
    FastBaseModel.from_pretrained = staticmethod(wrap_loader_for_grouped_moe(FastBaseModel.from_pretrained))
    FastBaseModel.get_peft_model = staticmethod(wrap_loader_for_grouped_moe(FastBaseModel.get_peft_model))
except Exception:
    pass