Description
On Windows with AMD ROCm (Strix Halo / Radeon 8060S, gfx1151), the Unsloth installer installs torchao==0.17.0 alongside torch 2.11.0+rocm7.13.0. However, torchao 0.17.0 crashes on import because torch.ops._c10d_functional.all_gather_into_tensor does not exist in ROCm Windows builds of PyTorch.
This causes a cascading failure:
- sentence-transformers → transformers.PreTrainedModel → transformers.quantizers.quantizer_torchao → torchao import → CRASH
- The sentence-transformers embedder is unavailable, so Studio falls back to the llama-server GGUF embedder
- The llama-server embedder also fails, so Studio falls back to CPU
- End result: Unsloth Studio runs entirely on CPU despite having a working ROCm GPU
Environment
- OS: Windows 10
- GPU: AMD Radeon(TM) 8060S Graphics (Strix Halo APU, gfx1151)
- ROCm: 7.13.99004 (pip SDK from repo.amd.com)
- Python: 3.12.9
- Unsloth venv: C:\Users\Jerry\.unsloth\studio\unsloth_studio\
Installer Output
The installer explicitly chooses this version combination:
```text
torch 2.11.0+rocm7.13.0 detected -- installing torchao==0.17.0
```
Reproduction
```powershell
# Fresh install via:
irm https://unsloth.ai/install.ps1 | iex
# After install, verify the crash:
& "C:\Users\Jerry\.unsloth\studio\unsloth_studio\Scripts\python.exe" -c "import torchao"
```
Error Details
```text
AttributeError: '_OpNamespace' '_c10d_functional' object has no attribute 'all_gather_into_tensor'
```
The crash is in torchao/dtypes/nf4tensor.py line 67, a module-level NF4_OPS_TABLE dict that unconditionally references:
```python
NF4_OPS_TABLE: Dict[Any, Any] = {
    torch.ops._c10d_functional.all_gather_into_tensor.default: nf4_all_gather_into_tensor,
    ...
}
```
On ROCm Windows torch 2.11.0, torch.ops._c10d_functional exists but only contains ["name"]. There is no all_gather_into_tensor op, so building this table raises the AttributeError above and aborts the whole import.
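To confirm the missing op directly, without going through torchao, a check along these lines can be run with the venv's interpreter. This is a minimal sketch: the function name c10d_functional_has_op is hypothetical, and it relies on hasattr() because torch.ops namespaces raise AttributeError for unknown ops, which is the same error shown in the traceback. On the setup described here it should return False.

```python
import importlib.util
from typing import Optional


def c10d_functional_has_op(op_name: str = "all_gather_into_tensor") -> Optional[bool]:
    """Report whether torch.ops._c10d_functional exposes `op_name`.

    Hypothetical diagnostic helper. Returns None when torch itself is not
    installed, so the check can run in any environment without raising.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # torch.ops namespaces resolve ops lazily in __getattr__ and raise
    # AttributeError for unknown names, which hasattr() turns into False.
    return hasattr(torch.ops._c10d_functional, op_name)


if __name__ == "__main__":
    print("all_gather_into_tensor registered:", c10d_functional_has_op())
```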
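Root Cause
torchao 0.17.0 builds NF4_OPS_TABLE at import time and assumes torch.ops._c10d_functional.all_gather_into_tensor is always registered, which does not hold for ROCm Windows builds of torch 2.11.0. The installer nonetheless pairs this torchao version with torch 2.11.0+rocm7.13.0 without checking that it imports. It should do one of the following:
- Skip installing torchao on ROCm Windows if it is known to be incompatible
- Install a torchao version that is compatible with ROCm Windows
- Run a post-install import smoke test and warn or revert on failure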
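The first and third options could look roughly like the following Python sketch. The real installer is a PowerShell script (install.ps1), so this only illustrates the logic. The helper names is_rocm_windows_torch and import_smoke_test are hypothetical, and detecting a ROCm build from the "+rocm" local version tag is an assumption. The import runs in a separate interpreter so that a crashing package cannot take down the installer process itself.

```python
import subprocess
import sys


def is_rocm_windows_torch(torch_version: str, platform: str = sys.platform) -> bool:
    """Hypothetical check: a '+rocm' local version tag on a Windows host."""
    return platform == "win32" and "+rocm" in torch_version


def import_smoke_test(module: str, python: str = sys.executable) -> bool:
    """Import `module` in a fresh interpreter and report whether it succeeded.

    `python` should point at the target venv's interpreter; it defaults to
    the current one only for convenience.
    """
    result = subprocess.run(
        [python, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface the last traceback line (e.g. the AttributeError) as a warning.
        lines = result.stderr.strip().splitlines()
        detail = lines[-1] if lines else "unknown error"
        print(f"warning: 'import {module}' failed: {detail}", file=sys.stderr)
        return False
    return True
```

For example, the installer could skip torchao when is_rocm_windows_torch("2.11.0+rocm7.13.0") is true, or otherwise call import_smoke_test("torchao", python=r"C:\Users\Jerry\.unsloth\studio\unsloth_studio\Scripts\python.exe") after installing and uninstall or swap the torchao version if it returns False.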
Workaround
Users can work around this by patching transformers/quantizers/quantizer_torchao.py to wrap the import torchao statement in a try/except, but this is fragile: the patch is lost whenever transformers is reinstalled or upgraded.
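The try/except guard amounts to something like the sketch below. The helper name optional_import is hypothetical and this is not transformers' actual code. The important detail is catching AttributeError as well as ImportError, because torchao 0.17.0 fails here with an AttributeError raised during module initialization rather than an ImportError.

```python
import importlib
import types
from typing import Optional


def optional_import(name: str) -> Optional[types.ModuleType]:
    """Import `name`, returning None instead of raising if the import fails.

    Hypothetical helper. AttributeError is caught alongside ImportError
    because a module-level attribute lookup that fails during import (as in
    torchao's NF4_OPS_TABLE) surfaces as AttributeError.
    """
    try:
        return importlib.import_module(name)
    except (ImportError, AttributeError):
        return None
```

A patched quantizer would then call optional_import("torchao") and treat a None result as torchao being unavailable.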
Note
The ROCm GPU itself works fine: torch.cuda.is_available() returns True after install. The issue is specifically torchao's module-level initialization.
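The GPU check in this note can be extended slightly to confirm that the ROCm build is the one in use. This is a small sketch that assumes only the standard torch.version.hip attribute (None on non-ROCm builds) and the torch.cuda APIs, which ROCm builds also use. The function name rocm_gpu_status is hypothetical.

```python
import importlib.util


def rocm_gpu_status() -> dict:
    """Summarize whether torch sees a ROCm/HIP GPU; empty dict if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {}
    import torch

    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        "hip": torch.version.hip,  # HIP version string on ROCm builds, None otherwise
        "gpu_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    print(rocm_gpu_status())
```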