[Bug] Windows ROCm: torchao==0.17.0 crashes on import, breaking sentence-transformers and llama-server embedder #6833

## Description

On Windows with AMD ROCm (Strix Halo / Radeon 8060S, gfx1151), the Unsloth installer installs `torchao==0.17.0` alongside `torch 2.11.0+rocm7.13.0`. However, torchao 0.17.0 crashes on import because `torch.ops._c10d_functional.all_gather_into_tensor` does not exist in ROCm Windows builds of PyTorch.

This causes a cascade failure:
1. `sentence-transformers` → `transformers.PreTrainedModel` → `transformers.quantizers.quantizer_torchao` → `import torchao` → crash
2. The `sentence-transformers` embedder is unavailable → falls back to the `llama-server` GGUF embedder
3. The `llama-server` embedder also fails → falls back to CPU
4. End result: Unsloth Studio runs entirely on CPU despite having a working ROCm GPU

## Environment

- **OS**: Windows 10
- **GPU**: AMD Radeon(TM) 8060S Graphics (Strix Halo APU, gfx1151)
- **ROCm**: 7.13.99004 (pip SDK from repo.amd.com)
- **Python**: 3.12.9
- **Unsloth venv**: `C:\Users\Jerry\.unsloth\studio\unsloth_studio\`

## Installer Output

The installer explicitly chooses this version combination:
```
torch 2.11.0+rocm7.13.0 detected -- installing torchao==0.17.0
```

## Reproduction

```powershell
# Fresh install via:
irm https://unsloth.ai/install.ps1 | iex

# After install, verify the crash:
& "C:\Users\Jerry\.unsloth\studio\unsloth_studio\Scripts\python.exe" -c "import torchao"
```

## Error Details

```
AttributeError: '_OpNamespace' '_c10d_functional' object has no attribute 'all_gather_into_tensor'
```

The crash is in `torchao/dtypes/nf4tensor.py` line 67, a module-level `NF4_OPS_TABLE` dict that unconditionally references:
```python
NF4_OPS_TABLE: Dict[Any, Any] = {
    torch.ops._c10d_functional.all_gather_into_tensor.default: nf4_all_gather_into_tensor,
    ...
}
```

On ROCm Windows torch 2.11.0, `torch.ops._c10d_functional` exists but contains only `["name"]`, with no `all_gather_into_tensor` op.
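
To confirm this on a given install without importing torchao, the namespace can be probed directly. This is a minimal sketch: the helper name `has_op` is hypothetical, and it assumes a missing op surfaces as `AttributeError`, as in the traceback above.

```python
def has_op(namespace, op_name: str) -> bool:
    """Return True if `namespace` exposes an attribute named `op_name`."""
    try:
        getattr(namespace, op_name)
    except AttributeError:
        # Same exception type as in the torchao traceback above.
        return False
    return True


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        ns = torch.ops._c10d_functional
        print("all_gather_into_tensor present:",
              has_op(ns, "all_gather_into_tensor"))
```

Run it with the venv's `python.exe`; a `False` result would indicate that `import torchao` is expected to fail in that environment.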

## Root Cause

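The installer pins `torchao==0.17.0` for this torch build without verifying that it imports on ROCm Windows. It should do one of the following: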
1. Not install torchao on ROCm Windows if it's known to be incompatible
2. Install a torchao version compatible with ROCm Windows
3. Perform a post-install import smoke test and warn/revert on failure
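
Option 3 could look roughly like the following. This is a sketch only, not the installer's actual code: the function name `import_smoke_test` and the warning text are hypothetical, and the import runs in a fresh interpreter so a crash cannot take down the installer process.

```python
import subprocess
import sys


def import_smoke_test(python_exe: str, module: str,
                      timeout: float = 120.0) -> tuple[bool, str]:
    """Try `import <module>` in a fresh interpreter.

    Returns (ok, tail of stderr) so the caller can warn or revert.
    """
    try:
        proc = subprocess.run(
            [python_exe, "-c", f"import {module}"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    return proc.returncode == 0, proc.stderr.strip()[-500:]


if __name__ == "__main__":
    # In the installer this would be the venv's python.exe, not sys.executable.
    ok, err = import_smoke_test(sys.executable, "torchao")
    if not ok:
        print("WARNING: torchao failed to import; consider removing it")
        print(err)
```

On failure the installer could uninstall torchao or print the captured error, rather than leaving a package that breaks every downstream import.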

## Workaround

Users can work around this by patching `transformers/quantizers/quantizer_torchao.py` to wrap the `import torchao` statement in a try/except, but the patch is fragile.
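
The guard pattern is sketched below for illustration. It is not the actual contents of `quantizer_torchao.py`; the helper `optional_import` is a hypothetical name. Note that `AttributeError` has to be caught as well as `ImportError`, because that is the exception torchao raises here.

```python
import importlib


def optional_import(module_name: str):
    """Import a module by name, returning None if the import fails.

    AttributeError is caught in addition to ImportError because the torchao
    failure in this report is an AttributeError raised at import time.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, AttributeError) as exc:
        print(f"{module_name} unavailable "
              f"({type(exc).__name__}: {exc}); continuing without it")
        return None


if __name__ == "__main__":
    torchao = optional_import("torchao")
    print("torchao available:", torchao is not None)
```

Any code that later uses torchao would need to check for `None` first, which is part of why this workaround is fragile.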

## Note

The ROCm GPU itself works fine: `torch.cuda.is_available()` returns `True` after install. The issue is specifically with torchao's module-level initialization.
