
[staging] torch-index override (mirror of unslothai/unsloth#6692) #227

Open
danielhanchen wants to merge 10 commits into cuda-override-base from cuda-torch-index-override

@danielhanchen (Owner)

Staging mirror of unslothai#6692 to exercise the cross-OS install matrix on real Windows / macOS / Linux runners.

Changes under test: UNSLOTH_TORCH_INDEX_URL / UNSLOTH_TORCH_INDEX_FAMILY override wired through install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py, plus pinned ROCm/CUDA edge cases. No override set means the default detection path is unchanged.

Goal: confirm normal Studio install + smoke still passes on all three OSes, and the override + CUDA-spoof unit suites stay green.

danielhanchen and others added 9 commits June 26, 2026 04:59
…tection

get_torch_index_url (and the studio-update mirror _detect_cuda_torch_index_url)
chose the torch wheel family solely by probing the host GPU, with no override.
In a headless / container / CI build the host driver is visible via the
/proc/driver/nvidia/gpus fallback but nvidia-smi cannot report a CUDA version,
so the function fell back to its cu126 default and installed the wrong wheels
(e.g. a cu128 image got cu126 torch).

Add an explicit override checked before any probing, in both the shell installer
and the Python studio-update path:
  - UNSLOTH_TORCH_INDEX_URL   full index URL, used verbatim (wins)
  - UNSLOTH_TORCH_INDEX_FAMILY family (cpu, cu128, rocm6.4, ...) appended to the
                               mirror base (UNSLOTH_PYTORCH_MIRROR still honoured)

This matches how the published GPU images select CUDA -- vLLM and SGLang take the
CUDA version from an explicit build ARG rather than detecting it, and the Unsloth
Docker base image already pins the cu128 index directly. Desktop installs are
unchanged: with no override set, detection runs exactly as before.
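
The precedence described above can be sketched in Python as follows. This is a minimal illustration, not the PR's code: the function name `explicit_torch_index_url` is borrowed from a later commit in this PR, but its body here is a guess, and the default mirror base and the exact points at which slashes are stripped are assumptions.

```python
import os

# Assumed default base for this sketch (the public PyTorch wheel index).
DEFAULT_MIRROR = "https://download.pytorch.org/whl"


def explicit_torch_index_url(env=None):
    """Return the pinned index URL, or None when no override is set."""
    env = os.environ if env is None else env

    # 1. A full URL wins and is returned as given.
    url = env.get("UNSLOTH_TORCH_INDEX_URL") or ""
    if url.strip():
        return url

    # 2. Otherwise a family (cpu, cu128, rocm6.4, ...) is appended to the
    #    mirror base; surrounding slashes are stripped (assumed behaviour).
    family = (env.get("UNSLOTH_TORCH_INDEX_FAMILY") or "").strip().strip("/")
    if family:
        base = (env.get("UNSLOTH_PYTORCH_MIRROR") or DEFAULT_MIRROR).rstrip("/")
        return f"{base}/{family}"

    # 3. Empty or unset values are ignored, so GPU detection runs as before.
    return None
```

A caller would fall back to the existing probing path only when this returns `None`.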

Adds test_get_torch_index_url.sh cases for the override (family, full URL,
precedence, mirror base, trailing-slash strip, empty-ignored).

Address review feedback on the override added in this PR so a pinned index is
honoured everywhere, not just in get_torch_index_url:

- Skip the WSL ROCm bootstrap (root privilege + large downloads, probes
  /dev/dxg) when UNSLOTH_TORCH_INDEX_URL / _FAMILY is set; it previously ran
  before the override was consulted.
- Skip the Radeon/Strix rerouting (which re-probes the GPU and overwrites the
  resolved URL with repo.radeon.com / repo.amd.com) when the index is pinned, so
  an explicit ROCm override (e.g. UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4) is kept.
- install_python_stack.py: derive _TORCH_BACKEND from the override when
  UNSLOTH_TORCH_BACKEND is unset (standalone studio update), so _ensure_rocm_torch
  / _ensure_cuda_torch repair to the requested family instead of re-detecting.
- Strip ALL leading/trailing slashes in the shell override to match the Python
  side (avoids 404s on strict pip proxies).
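
One way the backend derivation in the third item could look is sketched below. The helper names and the backend labels (`cpu`, `cuda`, `rocm`) are hypothetical; the PR text only says that `_TORCH_BACKEND` is derived from the override when `UNSLOTH_TORCH_BACKEND` is unset.

```python
import re


def index_leaf(url):
    """Last path segment of an index URL, lower-cased (e.g. 'cu128')."""
    return url.rstrip("/").rsplit("/", 1)[-1].lower()


def backend_from_leaf(leaf):
    """Map an index leaf to a backend label (labels are assumptions)."""
    if leaf == "cpu":
        return "cpu"
    if re.fullmatch(r"cu\d+", leaf):
        return "cuda"
    if leaf.startswith(("rocm", "gfx")):
        return "rocm"
    return None  # unrecognized leaf: leave the decision to detection


def resolve_torch_backend(env, pinned_url):
    """Prefer an explicit UNSLOTH_TORCH_BACKEND, then the pinned index."""
    explicit = (env.get("UNSLOTH_TORCH_BACKEND") or "").strip()
    if explicit:
        return explicit
    if pinned_url:
        return backend_from_leaf(index_leaf(pinned_url))
    return None
```

With a backend resolved this way, the repair helpers can target the requested family without probing the host again.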

Adds test cases for double-slash and leading/trailing-slash overrides.

Follow-up to the override work in this PR: the get_torch_index_url / install.sh
reroute already respect a pinned UNSLOTH_TORCH_INDEX_URL / _FAMILY, but the
Python repair helpers in install_python_stack.py still re-probed the GPU and
could overwrite the pinned family. Make the pin authoritative there too:

- _ensure_cuda_torch: an explicit cu* pin commits to CUDA wheels, so repair a
  ROCm-poisoned venv even when no NVIDIA GPU is visible here (headless /
  container / CI cross-install), instead of bailing on the GPU-presence gate.
- _ensure_rocm_torch: skip the AMD per-gfx (Strix) reroute when a ROCm index is
  pinned, and in the generic reinstall path install from the pinned URL verbatim
  rather than re-detecting the host ROCm version. gfx*/rocm7.2 indexes serve
  torch 2.11+, so select the 2.11 package specs for a gfx leaf.
- install.sh: raise the torch constraint to 2.11 for */gfx* indexes too, matching
  rocm7.2, so a pinned full-URL/family override that returns early keeps a valid
  constraint.
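
As a rough illustration of the constraint selection in the last item, the sketch below picks a torch requirement from the index leaf. The function name is hypothetical, and the lower-range spec `torch>=2.4,<2.11` is taken from a later commit message about install.ps1, so treating it as the default here is an assumption.

```python
import re


def torch_constraint_for_index(index_url):
    """Pick a torch requirement string from the index URL's last segment."""
    leaf = index_url.rstrip("/").rsplit("/", 1)[-1].lower()

    # Per-gfx indexes serve torch 2.11+, so they need the raised floor.
    if leaf.startswith("gfx"):
        return "torch>=2.11"

    # ROCm 7.2 and newer indexes get the same floor.
    match = re.fullmatch(r"rocm(\d+)\.(\d+)(?:\.\d+)?", leaf)
    if match and (int(match.group(1)), int(match.group(2))) >= (7, 2):
        return "torch>=2.11"

    # Everything else keeps the generic range (assumed default).
    return "torch>=2.4,<2.11"
```

Because the choice depends only on the URL, it still applies when a pinned override returns early and skips GPU probing.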

Add _explicit_torch_index_url / _explicit_rocm_torch_index_url helpers and tests
covering the no-GPU CUDA pin repair and the explicit gfx index honored verbatim.

The pinned-index work landed for install.sh and install_python_stack.py, but the
Windows installers still picked the wheel index from GPU probing. Extend the same
UNSLOTH_TORCH_INDEX_URL / _FAMILY contract so a pinned index wins on every platform:

- install.ps1: Get-TorchIndexUrl returns the pinned URL/family before nvidia-smi
  probing; the AMD ROCm reroute is skipped when the index is pinned, so an explicit
  cpu/cu* pin on an AMD host is not overwritten.
- studio/setup.ps1: add shared Get-PinnedTorchIndexUrl / Get-TorchIndexLeaf helpers;
  the stale-venv check, the install selection and the AMD reroute all honor the pin,
  and the CPU/CUDA install pulls from the resolved index URL.
- tests: parity test that all four installers read both override vars and the two
  Windows installers gate the AMD reroute on the pinned flag.

Follow-ups to the override work flagged in review:

- install.ps1: a pinned gfx*/rocm>=7.2 index previously skipped the AMD reroute
  that sets the torch>=2.11 floor, so the generic install used torch>=2.4,<2.11
  and could resolve the known-bad _grouped_mm wheel. Route a pinned ROCm index
  through the ROCm install path with the 2.11 floor + companions, and guard the
  companion-spec lookup so a skipped reroute block cannot null-deref.
- studio/setup.ps1: the stale-venv check compared the installed flavor (cuXXX/cpu,
  with +rocm misread as cpu) against the raw pinned leaf (gfx1151 / rocm6.4), so a
  correct pinned ROCm venv was always marked stale. Classify +rocm wheels as the
  generic 'rocm' flavor and normalize a pinned rocm*/gfx* leaf to 'rocm' before
  comparing (cu* stays specific so cu126-vs-cu128 still rebuilds).
- install_python_stack.py: _ensure_cuda_torch now also reinstalls from a pinned
  CUDA index when the venv carries a CPU wheel (headless CPU-venv-to-CUDA
  cross-install via 'studio update'), not only when it finds a ROCm build.
- tests: parity assertions already cover all four installers honoring the override.

Follow-ups to the previous round:

- studio/setup.ps1: a pinned gfx*/rocm>=7.2 index now routes through the ROCm
  install path with the 2.11 floor + companions (it previously fell through to the
  CUDA branch with bare torch/torchvision/torchaudio against the ROCm index). The
  CPU/CUDA fallback index is forced to the CPU wheel index when a ROCm index is
  active, so a failed pinned-ROCm install does not retry the ROCm mirror.
- studio/setup.ps1: the stale-venv check no longer treats an unrecognized pinned
  URL leaf (e.g. a PEP 503 mirror ending in /simple) as a torch flavor tag, which
  was marking a correct venv stale; cu*/cpu/rocm/gfx leaves are still compared.
- install.ps1: the post-failure CPU fallback uses an explicit CPU index instead of
  the resolved torch index URL, which for a pinned ROCm index was the ROCm mirror
  itself (so the 'fallback' just retried the failing index and aborted the installer).
- install_python_stack.py: _ensure_cuda_torch now also reinstalls when the venv's
  CUDA family differs from a pinned one (installed cu126 vs pinned cu128), not only
  CPU->CUDA; the probe reports the installed cuXXX tag for the comparison.
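
The reinstall conditions that `_ensure_cuda_torch` has accumulated across these commits (ROCm build, CPU wheel, or a different CUDA family than the pinned one) can be summarized in a small sketch. The function names are hypothetical, and the sketch assumes the installed build is identified from the local-version tag of the torch version string, such as `2.8.0+cu126`.

```python
import re


def installed_flavor(torch_version):
    """Classify an installed torch build from its '+tag' suffix."""
    local = torch_version.partition("+")[2].lower()
    if local.startswith("rocm"):
        return "rocm"
    if re.fullmatch(r"cu\d+", local):
        return local  # keep the specific family, e.g. 'cu126'
    return "cpu"  # '+cpu' or no tag is treated as a CPU wheel here


def should_reinstall_cuda(torch_version, pinned_leaf):
    """True when a pinned cu* family differs from what is installed."""
    if not re.fullmatch(r"cu\d+", pinned_leaf):
        return False  # only an explicit cu* pin commits to CUDA wheels
    return installed_flavor(torch_version) != pinned_leaf
```

Note that the decision does not consult GPU presence, which is what lets a headless or container build repair the venv.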

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for explicit PyTorch wheel index overrides via the UNSLOTH_TORCH_INDEX_URL and UNSLOTH_TORCH_INDEX_FAMILY environment variables across all installation scripts (install.sh, install.ps1, setup.ps1, and install_python_stack.py). This allows headless, container, and CI environments to bypass automatic GPU probing and Radeon/Strix rerouting. Feedback on the changes suggests using .ToLowerInvariant() instead of .ToLower() in install.ps1 to prevent potential locale-specific string comparison issues and ensure consistency with other scripts.


Comment thread install.ps1
# bug). Route a pinned ROCm index through the ROCm install path with the same
# 2.11 floor/companions the unpinned reroute derives from the gfx arch.
if ($TorchIndexPinned -and -not $ROCmIndexUrl -and -not $SkipTorch) {
$_pinLeaf = ($TorchIndexUrl.TrimEnd('/') -split '/')[-1].ToLower()


medium

Use .ToLowerInvariant() instead of .ToLower() to prevent potential locale-specific issues (such as the Turkish 'I' bug) when parsing the wheel index URL leaf. This also ensures consistency with the implementation of Get-TorchIndexLeaf in setup.ps1.

        $_pinLeaf = ($TorchIndexUrl.TrimEnd('/') -split '/')[-1].ToLowerInvariant()

…r window

The pinned-ROCm CPU fallback computes an explicit CPU index, but the comment
explaining why it cannot reuse $TorchIndexUrl pushed the actual
Invoke-InstallCommandRetry / --force-reinstall call more than 600 chars past the
"ROCm PyTorch install failed" message, so test_pr5940_followups's window check
no longer saw the retry helper. Move the CPU-index computation and its comment
above the failure substep so the retrying force-reinstall stays adjacent to the
message. No behavior change: same explicit CPU index, same retry, same
--force-reinstall.
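
For readers unfamiliar with this kind of source-text test, a window check of the sort described might look like the sketch below. It is not the code of test_pr5940_followups; the function name and the assumption that the window starts at the failure message are illustrative only.

```python
def helper_within_window(source, message, helper, window=600):
    """Check that `helper` appears within `window` chars after `message`."""
    start = source.find(message)
    if start == -1:
        return False  # the failure message itself is missing
    return helper in source[start:start + window]
```

A long explanatory comment placed between the message and the retry call is enough to push the call out of the window, which is why moving the comment above the failure substep restores the check without changing behavior.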