[staging] torch-index override (mirror of unslothai/unsloth#6692) #227
danielhanchen wants to merge 10 commits
…tection
get_torch_index_url (and the studio-update mirror _detect_cuda_torch_index_url)
chose the torch wheel family solely by probing the host GPU, with no override.
In a headless / container / CI build the host driver is visible via the
/proc/driver/nvidia/gpus fallback but nvidia-smi cannot report a CUDA version,
so the function fell back to its cu126 default and installed the wrong wheels
(e.g. a cu128 image got cu126 torch).
Add an explicit override checked before any probing, in both the shell installer
and the Python studio-update path:
- UNSLOTH_TORCH_INDEX_URL full index URL, used verbatim (wins)
- UNSLOTH_TORCH_INDEX_FAMILY family (cpu, cu128, rocm6.4, ...) appended to the
mirror base (UNSLOTH_PYTORCH_MIRROR still honoured)
This matches how the published GPU images select CUDA -- vLLM and SGLang take the
CUDA version from an explicit build ARG rather than detecting it, and the Unsloth
Docker base image already pins the cu128 index directly. Desktop installs are
unchanged: with no override set, detection runs exactly as before.
Adds test_get_torch_index_url.sh cases for the override (family, full URL,
precedence, mirror base, trailing-slash strip, empty-ignored).
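The precedence described above can be sketched in Python as follows. This is a minimal illustration, not the code from the PR: the function name `pinned_torch_index_url` and the `DEFAULT_MIRROR` value are assumptions, as is stripping trailing slashes from the full URL (the commit message only says the URL is used verbatim and that the tests cover a trailing-slash strip).

```python
import os

# Assumed default base; the PR only states that UNSLOTH_PYTORCH_MIRROR is still honoured.
DEFAULT_MIRROR = "https://download.pytorch.org/whl"


def pinned_torch_index_url(env=None):
    """Return the pinned wheel index URL, or None when no override is set."""
    env = os.environ if env is None else env

    # 1. A full URL wins over everything else.
    url = env.get("UNSLOTH_TORCH_INDEX_URL", "").strip().rstrip("/")
    if url:
        return url

    # 2. A family (cpu, cu128, rocm6.4, ...) is appended to the mirror base.
    #    Leading and trailing slashes are dropped so the join never yields "//".
    family = env.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip().strip("/")
    if family:
        base = env.get("UNSLOTH_PYTORCH_MIRROR", "").strip() or DEFAULT_MIRROR
        return f"{base.rstrip('/')}/{family}"

    # 3. Empty or unset values are ignored; the caller falls back to GPU probing.
    return None
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to test, for example `pinned_torch_index_url({"UNSLOTH_TORCH_INDEX_FAMILY": "cu128"})`.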
Address review feedback on the override added in this PR so a pinned index is honoured everywhere, not just in get_torch_index_url:
- Skip the WSL ROCm bootstrap (root privilege + large downloads, probes /dev/dxg) when UNSLOTH_TORCH_INDEX_URL / _FAMILY is set; it previously ran before the override was consulted.
- Skip the Radeon/Strix rerouting (which re-probes the GPU and overwrites the resolved URL with repo.radeon.com / repo.amd.com) when the index is pinned, so an explicit ROCm override (e.g. UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4) is kept.
- install_python_stack.py: derive _TORCH_BACKEND from the override when UNSLOTH_TORCH_BACKEND is unset (standalone studio update), so _ensure_rocm_torch / _ensure_cuda_torch repair to the requested family instead of re-detecting.
- Strip ALL leading/trailing slashes in the shell override to match the Python side (avoids 404s on strict pip proxies).
Adds test cases for double-slash and leading/trailing-slash overrides.
Follow-up to the override work in this PR: the get_torch_index_url / install.sh reroute already respect a pinned UNSLOTH_TORCH_INDEX_URL / _FAMILY, but the Python repair helpers in install_python_stack.py still re-probed the GPU and could overwrite the pinned family. Make the pin authoritative there too:
- _ensure_cuda_torch: an explicit cu* pin commits to CUDA wheels, so repair a ROCm-poisoned venv even when no NVIDIA GPU is visible here (headless / container / CI cross-install), instead of bailing on the GPU-presence gate.
- _ensure_rocm_torch: skip the AMD per-gfx (Strix) reroute when a ROCm index is pinned, and in the generic reinstall path install from the pinned URL verbatim rather than re-detecting the host ROCm version. gfx*/rocm7.2 indexes serve torch 2.11+, so select the 2.11 package specs for a gfx leaf.
- install.sh: raise the torch constraint to 2.11 for */gfx* indexes too, matching rocm7.2, so a pinned full-URL/family override that returns early keeps a valid constraint.
Add _explicit_torch_index_url / _explicit_rocm_torch_index_url helpers and tests covering the no-GPU CUDA pin repair and the explicit gfx index honored verbatim.
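The constraint rule in this commit (gfx* and rocm7.2 indexes need the torch 2.11 floor) could look roughly like the sketch below. The helper names `index_leaf`, `needs_torch_211` and `torch_constraint` are hypothetical, and the two spec strings are taken from the bounds quoted in this PR's commit messages rather than from the installer source.

```python
import re


def index_leaf(index_url):
    """Last path segment of the index URL, lowercased (e.g. 'cu128', 'gfx1151')."""
    return index_url.rstrip("/").rsplit("/", 1)[-1].lower()


def needs_torch_211(leaf):
    """True for gfx* leaves and for rocm leaves at version 7.2 or newer."""
    if leaf.startswith("gfx"):
        return True
    match = re.fullmatch(r"rocm(\d+)\.(\d+)(?:\.\d+)?", leaf)
    if match is None:
        return False
    # Compare (major, minor) as integers so rocm10.0 sorts after rocm7.2.
    return (int(match.group(1)), int(match.group(2))) >= (7, 2)


def torch_constraint(index_url):
    """Pick the torch spec for an index; bounds assumed from the PR text."""
    if needs_torch_211(index_leaf(index_url)):
        return "torch>=2.11"
    return "torch>=2.4,<2.11"
```

Comparing the version as an integer tuple rather than as a string avoids misordering two-digit major versions.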
for more information, see https://pre-commit.ci
The pinned-index work landed for install.sh and install_python_stack.py, but the Windows installers still picked the wheel index from GPU probing. Extend the same UNSLOTH_TORCH_INDEX_URL / _FAMILY contract so a pinned index wins on every platform:
- install.ps1: Get-TorchIndexUrl returns the pinned URL/family before nvidia-smi probing; the AMD ROCm reroute is skipped when the index is pinned, so an explicit cpu/cu* pin on an AMD host is not overwritten.
- studio/setup.ps1: add shared Get-PinnedTorchIndexUrl / Get-TorchIndexLeaf helpers; the stale-venv check, the install selection and the AMD reroute all honor the pin, and the CPU/CUDA install pulls from the resolved index URL.
- tests: parity test that all four installers read both override vars and the two Windows installers gate the AMD reroute on the pinned flag.
Follow-ups to the override work flagged in review:
- install.ps1: a pinned gfx*/rocm>=7.2 index previously skipped the AMD reroute that sets the torch>=2.11 floor, so the generic install used torch>=2.4,<2.11 and could resolve the known-bad _grouped_mm wheel. Route a pinned ROCm index through the ROCm install path with the 2.11 floor + companions, and guard the companion-spec lookup so a skipped reroute block cannot null-deref.
- studio/setup.ps1: the stale-venv check compared the installed flavor (cuXXX/cpu, with +rocm misread as cpu) against the raw pinned leaf (gfx1151 / rocm6.4), so a correct pinned ROCm venv was always marked stale. Classify +rocm wheels as the generic 'rocm' flavor and normalize a pinned rocm*/gfx* leaf to 'rocm' before comparing (cu* stays specific so cu126-vs-cu128 still rebuilds).
- install_python_stack.py: _ensure_cuda_torch now also reinstalls from a pinned CUDA index when the venv carries a CPU wheel (headless CPU-venv-to-CUDA cross-install via 'studio update'), not only when it finds a ROCm build.
- tests: parity assertions already cover all four installers honoring the override.
Follow-ups to the previous round:
- studio/setup.ps1: a pinned gfx*/rocm>=7.2 index now routes through the ROCm install path with the 2.11 floor + companions (it previously fell through to the CUDA branch with bare torch/torchvision/torchaudio against the ROCm index). The CPU/CUDA fallback index is forced to the CPU wheel index when a ROCm index is active, so a failed pinned-ROCm install does not retry the ROCm mirror.
- studio/setup.ps1: the stale-venv check no longer treats an unrecognized pinned URL leaf (e.g. a PEP 503 mirror ending in /simple) as a torch flavor tag, which was marking a correct venv stale; cu*/cpu/rocm/gfx leaves are still compared.
- install.ps1: the post-failure CPU fallback uses an explicit CPU index instead of , which for a pinned ROCm index was the ROCm mirror itself (so the 'fallback' just retried the failing index and aborted the installer).
- install_python_stack.py: _ensure_cuda_torch now also reinstalls when the venv's CUDA family differs from a pinned one (installed cu126 vs pinned cu128), not only CPU->CUDA; the probe reports the installed cuXXX tag for the comparison.
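The stale-venv comparison from the last two rounds (ROCm flavors collapsed to a generic tag, cu* kept specific, unrecognized leaves ignored) is easier to follow as code. The real check lives in studio/setup.ps1; this Python sketch uses hypothetical function names and assumes a torch version string without a local tag (such as `2.8.0`) counts as a CPU build.

```python
import re


def installed_flavor(torch_version):
    """Map a version string such as '2.8.0+cu128' to a flavor tag."""
    local = torch_version.partition("+")[2].lower()
    if local.startswith("rocm"):
        return "rocm"          # every +rocmX.Y wheel is the generic 'rocm' flavor
    if re.fullmatch(r"cu\d+", local):
        return local           # cu126 and cu128 stay distinct
    return "cpu"               # assumption: '+cpu' or no local tag means CPU


def pinned_flavor(index_url):
    """Flavor implied by the pinned index leaf, or None if it is not a flavor tag."""
    leaf = index_url.rstrip("/").rsplit("/", 1)[-1].lower()
    if leaf.startswith(("rocm", "gfx")):
        return "rocm"
    if leaf == "cpu" or re.fullmatch(r"cu\d+", leaf):
        return leaf
    return None                # e.g. a PEP 503 mirror ending in /simple


def venv_is_stale(torch_version, index_url):
    """A venv is stale only when the pin names a flavor and it differs."""
    wanted = pinned_flavor(index_url)
    return wanted is not None and installed_flavor(torch_version) != wanted
```

Returning `None` for an unrecognized leaf means the pin simply does not take part in the staleness decision, which matches the behaviour described for mirrors ending in /simple.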
Code Review
This pull request introduces support for explicit PyTorch wheel index overrides via the UNSLOTH_TORCH_INDEX_URL and UNSLOTH_TORCH_INDEX_FAMILY environment variables across all installation scripts (install.sh, install.ps1, setup.ps1, and install_python_stack.py). This allows headless, container, and CI environments to bypass automatic GPU probing and Radeon/Strix rerouting. Feedback on the changes suggests using .ToLowerInvariant() instead of .ToLower() in install.ps1 to prevent potential locale-specific string comparison issues and ensure consistency with other scripts.
# bug). Route a pinned ROCm index through the ROCm install path with the same
# 2.11 floor/companions the unpinned reroute derives from the gfx arch.
if ($TorchIndexPinned -and -not $ROCmIndexUrl -and -not $SkipTorch) {
    $_pinLeaf = ($TorchIndexUrl.TrimEnd('/') -split '/')[-1].ToLower()
Use .ToLowerInvariant() instead of .ToLower() to prevent potential locale-specific issues (such as the Turkish 'I' bug) when parsing the wheel index URL leaf. This also ensures consistency with the implementation of Get-TorchIndexLeaf in setup.ps1.
$_pinLeaf = ($TorchIndexUrl.TrimEnd('/') -split '/')[-1].ToLowerInvariant()
…r window
The pinned-ROCm CPU fallback computes an explicit CPU index, but the comment explaining why it cannot reuse $TorchIndexUrl pushed the actual Invoke-InstallCommandRetry / --force-reinstall call more than 600 chars past the "ROCm PyTorch install failed" message, so test_pr5940_followups's window check no longer saw the retry helper. Move the CPU-index computation and its comment above the failure substep so the retrying force-reinstall stays adjacent to the message.
No behavior change: same explicit CPU index, same retry, same --force-reinstall.
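For readers unfamiliar with this style of test, a window check of the kind described might be sketched as follows. The actual test_pr5940_followups code is not shown in this PR, so the function name `appears_within`, the choice to measure from the start of the message, and the default of 600 characters are assumptions for illustration.

```python
def appears_within(source, marker, needle, window=600):
    """Return True if `needle` occurs within `window` chars of `marker`'s start."""
    start = source.find(marker)
    if start < 0:
        return False  # marker missing: nothing to anchor the window on
    return needle in source[start:start + window]


# Hypothetical script text: a long comment between the failure message and
# the retry call pushes the call out of the window.
message = "ROCm PyTorch install failed"
helper = "Invoke-InstallCommandRetry"
close_call = message + "\n" + "# short note\n" + helper
far_call = message + "\n" + "# " + "x" * 700 + "\n" + helper
```

With these inputs `appears_within(close_call, message, helper)` holds while the same call on `far_call` does not, which is why moving the comment above the failure substep restores the test without changing behavior.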
Staging mirror of unslothai#6692 to exercise the cross-OS install matrix on real Windows / macOS / Linux runners.
Changes under test: UNSLOTH_TORCH_INDEX_URL / UNSLOTH_TORCH_INDEX_FAMILY override wired through install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py, plus pinned ROCm/CUDA edge cases. No override set means the default detection path is unchanged.
Goal: confirm normal Studio install + smoke still passes on all three OSes, and the override + CUDA-spoof unit suites stay green.