[Bug] Windows ROCm: torchao==0.17.0 crashes on import, breaking sentence-transformers and llama-server embedder #6833

@jerrydong1988

Description

On Windows with AMD ROCm (Strix Halo / Radeon 8060S, gfx1151), the Unsloth installer installs torchao==0.17.0 alongside torch 2.11.0+rocm7.13.0. However, torchao 0.17.0 crashes on import because torch.ops._c10d_functional.all_gather_into_tensor does not exist in ROCm Windows builds of PyTorch.

This causes a cascade failure:

  1. sentence-transformers → transformers.PreTrainedModel → transformers.quantizers.quantizer_torchao → torchao import → CRASH
  2. sentence-transformers embedder unavailable → falls back to llama-server GGUF embedder
  3. llama-server embedder also fails → falls back to CPU
  4. End result: Unsloth Studio runs entirely on CPU despite having a working ROCm GPU

Environment

  • OS: Windows 10
  • GPU: AMD Radeon(TM) 8060S Graphics (Strix Halo APU, gfx1151)
  • ROCm: 7.13.99004 (pip SDK from repo.amd.com)
  • Python: 3.12.9
  • Unsloth venv: C:\Users\Jerry\.unsloth\studio\unsloth_studio\

Installer Output

The installer explicitly chooses this version combination:

torch 2.11.0+rocm7.13.0 detected -- installing torchao==0.17.0

Reproduction

# Fresh install via:
irm https://unsloth.ai/install.ps1 | iex

# After install, verify the crash:
& "C:\Users\Jerry\.unsloth\studio\unsloth_studio\Scripts\python.exe" -c "import torchao"

Error Details

AttributeError: '_OpNamespace' '_c10d_functional' object has no attribute 'all_gather_into_tensor'

The crash is in torchao/dtypes/nf4tensor.py line 67, a module-level NF4_OPS_TABLE dict that unconditionally references:

NF4_OPS_TABLE: Dict[Any, Any] = {
    torch.ops._c10d_functional.all_gather_into_tensor.default: nf4_all_gather_into_tensor,
    ...
}

On ROCm Windows torch 2.11.0, torch.ops._c10d_functional exists but only contains ["name"] — no all_gather_into_tensor op.
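
To confirm whether the op is registered in a given environment, a check along these lines may help. This is a minimal sketch: the helper names has_op and check_torch_collective_op are hypothetical, and it relies only on the behavior shown in the traceback above, where a missing op surfaces as an AttributeError on the namespace.

```python
def has_op(namespace, op_name):
    """Return True if `namespace` exposes an attribute called `op_name`.

    The traceback above shows that a missing op raises AttributeError
    on the torch.ops namespace, so hasattr() is enough to detect it.
    """
    return hasattr(namespace, op_name)


def check_torch_collective_op(op_name="all_gather_into_tensor"):
    """Return True/False for the op, or None if torch is not installed."""
    try:
        import torch  # imported lazily so the helper loads without torch
    except ImportError:
        return None
    return has_op(torch.ops._c10d_functional, op_name)


if __name__ == "__main__":
    print("all_gather_into_tensor registered:", check_torch_collective_op())
```

On the affected setup this would be expected to report False, matching the AttributeError above.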

Root Cause

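The installer pairs torchao==0.17.0 with a ROCm Windows build of torch that lacks the op torchao references at import time. It should do one of the following: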

  1. Not install torchao on ROCm Windows if it's known to be incompatible
  2. Install a torchao version compatible with ROCm Windows
  3. Perform a post-install import smoke test and warn/revert on failure
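
Option 3 could be sketched roughly as follows. This is an illustration under assumptions, not the installer's actual code: the function name import_smoke_test and its parameters are hypothetical. It runs the import in a separate interpreter so that an import-time crash cannot take down the calling process.

```python
import subprocess
import sys


def import_smoke_test(module, python=sys.executable, timeout=120):
    """Try `import <module>` in a fresh interpreter.

    Returns (ok, message). On failure, message is the last line of
    stderr, which for a Python traceback is the exception summary.
    """
    try:
        proc = subprocess.run(
            [python, "-c", f"import {module}"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "import timed out"
    if proc.returncode == 0:
        return True, ""
    lines = proc.stderr.strip().splitlines()
    return False, lines[-1] if lines else f"exit code {proc.returncode}"
```

An installer could call import_smoke_test("torchao", python=<path to the venv's python.exe>) after the install step and, if ok is False, print a warning containing the message or uninstall the package. Which reaction is appropriate is a design decision for the maintainers.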

Workaround

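Users can work around this by patching transformers/quantizers/quantizer_torchao.py to wrap the torchao import in a try/except, but this is fragile: the patch lives inside an installed package and is lost whenever transformers is reinstalled or upgraded.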
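
The general shape of such a guard is sketched below. This is not the actual contents of quantizer_torchao.py, which are not shown here; optional_import is a hypothetical helper. Note that the failure in this report is an AttributeError, so a guard that catches only ImportError would not stop the crash.

```python
import importlib


def optional_import(module_name):
    """Import a module, returning None if its import-time code fails.

    AttributeError is caught alongside ImportError because the torchao
    failure described above is an AttributeError raised during import.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, AttributeError) as exc:
        print(f"{module_name} unavailable: {type(exc).__name__}: {exc}")
        return None
```

Code that uses the result then has to check for None before touching any torchao functionality.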

Note

The ROCm GPU itself works fine — torch.cuda.is_available() returns True after install. The issue is specifically with torchao's module-level initialization.
