install: let UNSLOTH_TORCH_INDEX_FAMILY / _URL override CUDA wheel detection #6692
danielhanchen wants to merge 14 commits into
get_torch_index_url (and the studio-update mirror _detect_cuda_torch_index_url)
chose the torch wheel family solely by probing the host GPU, with no override.
In a headless / container / CI build the host driver is visible via the
/proc/driver/nvidia/gpus fallback but nvidia-smi cannot report a CUDA version,
so the function fell back to its cu126 default and installed the wrong wheels
(e.g. a cu128 image got cu126 torch).
Add an explicit override checked before any probing, in both the shell installer
and the Python studio-update path:
- UNSLOTH_TORCH_INDEX_URL: full index URL, used verbatim (takes precedence)
- UNSLOTH_TORCH_INDEX_FAMILY: wheel family (cpu, cu128, rocm6.4, ...) appended to the
  mirror base (UNSLOTH_PYTORCH_MIRROR is still honoured)
This matches how the published GPU images select CUDA -- vLLM and SGLang take the
CUDA version from an explicit build ARG rather than detecting it, and the Unsloth
Docker base image already pins the cu128 index directly. Desktop installs are
unchanged: with no override set, detection runs exactly as before.
Adds test_get_torch_index_url.sh cases for the override (family, full URL,
precedence, mirror base, trailing-slash strip, empty-ignored).
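The precedence described above can be modelled in a few lines of Python. This is a minimal, hypothetical sketch rather than the installer's code: resolve_torch_index_url, the detect callback and the default base https://download.pytorch.org/whl are assumptions for illustration, and UNSLOTH_PYTORCH_MIRROR is assumed to replace that base wholesale.

```python
from typing import Callable, Mapping

# Assumed default wheel base; the real installer's constant may differ.
DEFAULT_WHL_BASE = "https://download.pytorch.org/whl"


def resolve_torch_index_url(env: Mapping[str, str], detect: Callable[[str], str]) -> str:
    """Hypothetical model of the override order: full URL, then family, then GPU probing."""
    base = env.get("UNSLOTH_PYTORCH_MIRROR", "").strip().rstrip("/") or DEFAULT_WHL_BASE

    # 1. A full index URL wins and is used verbatim (only trailing slashes dropped).
    url = env.get("UNSLOTH_TORCH_INDEX_URL", "").strip().rstrip("/")
    if url:
        return url

    # 2. A family such as cpu / cu128 / rocm6.4 is appended to the (mirror) base.
    family = env.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip().strip("/")
    if family:
        return f"{base}/{family}"

    # 3. No override (or only blank values): probe the host exactly as before.
    return detect(base)
```

With UNSLOTH_TORCH_INDEX_FAMILY=cu128 and no URL set, the sketch returns the cu128 index without ever calling detect, which is the headless-container case the description targets.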
Code Review
This pull request introduces support for overriding the PyTorch wheel index URL and family via the UNSLOTH_TORCH_INDEX_URL and UNSLOTH_TORCH_INDEX_FAMILY environment variables in both the shell installer (install.sh) and the Python updater (install_python_stack.py). This allows headless, container, or CI environments to bypass GPU probing and pin a specific wheel family. Unit tests have also been added to verify these overrides. Feedback was provided regarding an inconsistency in how trailing and leading slashes are stripped between the shell and Python implementations, with a suggestion to use a loop in the shell script to robustly strip multiple slashes.
```sh
if [ -n "${UNSLOTH_TORCH_INDEX_URL:-}" ]; then
    echo "${UNSLOTH_TORCH_INDEX_URL%/}"; return
fi
if [ -n "${UNSLOTH_TORCH_INDEX_FAMILY:-}" ]; then
    _family="${UNSLOTH_TORCH_INDEX_FAMILY#/}"
    echo "$_base/${_family%/}"; return
fi
```
The Python implementation in install_python_stack.py uses .rstrip('/') and .strip('/') to strip all leading and trailing slashes from the override variables. However, the shell implementation in install.sh currently only strips a single leading or trailing slash (using ${VAR%/} and ${VAR#/}).
To ensure consistency between the shell installer and the Python updater, and to prevent potential 404 errors on strict pip proxies (as noted in the comment on line 2499), we should strip all leading and trailing slashes in install.sh using a loop.
```diff
-if [ -n "${UNSLOTH_TORCH_INDEX_URL:-}" ]; then
-    echo "${UNSLOTH_TORCH_INDEX_URL%/}"; return
-fi
-if [ -n "${UNSLOTH_TORCH_INDEX_FAMILY:-}" ]; then
-    _family="${UNSLOTH_TORCH_INDEX_FAMILY#/}"
-    echo "$_base/${_family%/}"; return
-fi
+if [ -n "${UNSLOTH_TORCH_INDEX_URL:-}" ]; then
+    _url="${UNSLOTH_TORCH_INDEX_URL}"
+    while [ "${_url%/}" != "$_url" ]; do _url="${_url%/}"; done
+    echo "$_url"; return
+fi
+if [ -n "${UNSLOTH_TORCH_INDEX_FAMILY:-}" ]; then
+    _family="${UNSLOTH_TORCH_INDEX_FAMILY}"
+    while [ "${_family#/}" != "$_family" ]; do _family="${_family#/}"; done
+    while [ "${_family%/}" != "$_family" ]; do _family="${_family%/}"; done
+    echo "$_base/$_family"; return
+fi
```
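The inconsistency is easy to see by modelling the two shell forms in Python. This is an illustrative sketch with hypothetical names: strip_one mimics a single ${VAR#/} followed by ${VAR%/}, and strip_all mimics the suggested while-loops, which behave like Python's str.strip('/').

```python
def strip_one(value: str) -> str:
    """Like ${v#/} then ${v%/}: drops at most one slash from each end."""
    if value.startswith("/"):
        value = value[1:]
    if value.endswith("/"):
        value = value[:-1]
    return value


def strip_all(value: str) -> str:
    """Like the suggested while-loops: drops every leading and trailing slash."""
    while value.startswith("/"):
        value = value[1:]
    while value.endswith("/"):
        value = value[:-1]
    return value


base = "https://download.pytorch.org/whl"
for family in ("cu128", "/cu128/", "//cu128//"):
    print(f"{base}/{strip_one(family)}", f"{base}/{strip_all(family)}")
```

Only the doubled-slash input differs: the single-strip form yields .../whl//cu128/, which a strict pip proxy may reject, while the loop yields .../whl/cu128.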
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 138c0949b4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```sh
if [ -n "${UNSLOTH_TORCH_INDEX_URL:-}" ]; then
    echo "${UNSLOTH_TORCH_INDEX_URL%/}"; return
fi
if [ -n "${UNSLOTH_TORCH_INDEX_FAMILY:-}" ]; then
```
Skip ROCm WSL bootstrap when torch index is pinned
When either override is set, this branch is still reached only after _maybe_bootstrap_rocm_wsl has already run (_maybe_bootstrap_rocm_wsl || true precedes TORCH_INDEX_URL=$(get_torch_index_url)). On a WSL Strix Halo host with no ROCm runtime, a headless build that pins CPU/CUDA via UNSLOTH_TORCH_INDEX_URL or UNSLOTH_TORCH_INDEX_FAMILY can still probe /dev/dxg and run the ROCm bootstrap with sudo/large downloads before the override is honored. Please short-circuit that bootstrap when an explicit index override is present, or resolve the override before calling it.
```sh
_family="${UNSLOTH_TORCH_INDEX_FAMILY#/}"
echo "$_base/${_family%/}"; return
```
Bypass Radeon rerouting for pinned ROCm indexes
When UNSLOTH_TORCH_INDEX_FAMILY is set to a ROCm tag on a Radeon host, this returns the pinned mirror URL, but the later _amd_gpu_radeon fresh-install path still probes the GPU and installs from repo.radeon.com instead of $TORCH_INDEX_URL (falling back only on failure). That means air-gapped/CI installs using UNSLOTH_PYTORCH_MIRROR=... UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 won't use the requested mirror or pinned family whenever ROCm detection succeeds; guard the Radeon/Strix reroutes when an explicit index override was supplied.
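Both comments ask for the same ordering: decide whether an index is pinned first, then skip every probe that could override it. A minimal Python sketch of that control flow, where choose_index is hypothetical and bootstrap_rocm_wsl and radeon_reroute are stand-ins for _maybe_bootstrap_rocm_wsl and the Radeon/Strix reroute block:

```python
from typing import Callable, Mapping

OVERRIDE_VARS = ("UNSLOTH_TORCH_INDEX_URL", "UNSLOTH_TORCH_INDEX_FAMILY")


def is_index_pinned(env: Mapping[str, str]) -> bool:
    """True when either override variable holds a non-blank value."""
    return any(env.get(name, "").strip() for name in OVERRIDE_VARS)


def choose_index(
    env: Mapping[str, str],
    resolve: Callable[[Mapping[str, str]], str],
    bootstrap_rocm_wsl: Callable[[], None],
    radeon_reroute: Callable[[str], str],
) -> str:
    pinned = is_index_pinned(env)
    if not pinned:
        bootstrap_rocm_wsl()  # sudo + large downloads: only when nothing was pinned
    url = resolve(env)
    if not pinned:
        url = radeon_reroute(url)  # GPU probing may swap in repo.radeon.com
    return url
```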
```python
_override_family = os.environ.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip()
if _override_family:
    return f"{_PYTORCH_WHL_BASE}/{_override_family.strip('/')}"
```
Honor family overrides before ROCm repair
In standalone unsloth studio update, this override is only read inside the CUDA repair helper, while _TORCH_BACKEND remains empty and _ensure_rocm_torch() still runs afterward. On an AMD Linux host with CPU torch, setting UNSLOTH_TORCH_INDEX_FAMILY=cpu (or a CUDA family for a headless build) therefore still lets ROCm GPU detection reinstall ROCm torch from the detected ROCm index, ignoring the requested family; derive the backend from the override before both CUDA and ROCm repair paths.
Address review feedback on the override added in this PR so a pinned index is honoured everywhere, not just in get_torch_index_url:
- Skip the WSL ROCm bootstrap (root privilege + large downloads, probes /dev/dxg) when UNSLOTH_TORCH_INDEX_URL / _FAMILY is set; it previously ran before the override was consulted.
- Skip the Radeon/Strix rerouting (which re-probes the GPU and overwrites the resolved URL with repo.radeon.com / repo.amd.com) when the index is pinned, so an explicit ROCm override (e.g. UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4) is kept.
- install_python_stack.py: derive _TORCH_BACKEND from the override when UNSLOTH_TORCH_BACKEND is unset (standalone studio update), so _ensure_rocm_torch / _ensure_cuda_torch repair to the requested family instead of re-detecting.
- Strip ALL leading/trailing slashes in the shell override to match the Python side (avoids 404s on strict pip proxies).
Adds test cases for double-slash and leading/trailing-slash overrides.
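Deriving the backend from the override only needs the final path segment of the resolved index URL. A minimal sketch follows; index_leaf, backend_from_index and derive_torch_backend are hypothetical names, the cpu branch is an assumption, and the cu match here requires cu followed by digits, which is slightly stricter than the startswith("cu") shown in the review diffs.

```python
import re
from typing import Mapping, Optional
from urllib.parse import urlsplit


def index_leaf(url: str) -> str:
    """Final path segment of an index URL, e.g. 'cu128' or 'gfx1151'."""
    return urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1].lower()


def backend_from_index(url: str) -> Optional[str]:
    leaf = index_leaf(url)
    if leaf.startswith(("rocm", "gfx")):
        return "rocm"
    if re.fullmatch(r"cu\d+", leaf):
        return "cuda"
    if leaf == "cpu":
        return "cpu"
    return None  # unrecognised leaf, e.g. a mirror ending in /simple


def derive_torch_backend(env: Mapping[str, str], pinned_url: Optional[str]) -> str:
    """An explicit UNSLOTH_TORCH_BACKEND wins; otherwise classify the pinned index."""
    explicit = env.get("UNSLOTH_TORCH_BACKEND", "").strip()
    if explicit:
        return explicit
    return (backend_from_index(pinned_url) if pinned_url else None) or ""
```

Looking only at the leaf keeps a mirror path such as /gfx-cache/cu128 classified as CUDA rather than ROCm.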
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b02a609af5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```sh
# one like UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4) must not be rerouted to the
# Radeon/Strix repos by GPU probing.
_amd_gpu_radeon=false
if [ "$_torch_index_pinned" = false ]; then
```
Set 2.11 constraint for pinned gfx indexes
When a caller pins an AMD per-arch index, for example UNSLOTH_TORCH_INDEX_URL=https://repo.amd.com/rocm/whl/gfx1151/ for a Strix/air-gapped install, this guard skips the Strix block that would have set TORCH_CONSTRAINT to torch>=2.11.0,<2.12.0. The fresh install then reaches the generic PyTorch install with the default <2.11.0 constraint, but these gfx* indexes are the 2.11 ROCm wheel path, so uv cannot select the intended torch build. Please derive the 2.11 constraint from a pinned gfx* leaf before bypassing GPU probing.
```python
if _idx_leaf.startswith(("rocm", "gfx")):
    _TORCH_BACKEND = "rocm"
```
Honor pinned ROCm family in Studio repair
The new override handling classifies UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 or a full .../rocm6.4 URL only as _TORCH_BACKEND = "rocm"; it does not pass the requested family/index into _ensure_rocm_torch(), which still computes its install URL from the detected host ROCm version. In standalone unsloth studio update on an AMD Linux host with CPU torch, pinning rocm6.4 on a ROCm 7.2 machine will therefore reinstall from rocm7.2 instead of the requested family/mirror. Please make the ROCm repair path consume the explicit override, not just the backend classification.
Addressed the review feedback in the latest commit so a pinned index is authoritative everywhere, not just inside
Added test cases for double-slash and leading/trailing-slash overrides; the
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b02a609af5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
if _idx_leaf.startswith(("rocm", "gfx")):
    _TORCH_BACKEND = "rocm"
```
Preserve the pinned ROCm index in the Python repair path
When unsloth studio update (or the post-install Python stack step) is run with UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 or a ROCm UNSLOTH_TORCH_INDEX_URL, this block only collapses the override to _TORCH_BACKEND = "rocm". _ensure_rocm_torch() still re-detects the host ROCm version and installs from f"{_PYTORCH_WHL_BASE}/{tag}" (and can still take the Strix per-gfx override), so a CPU/CUDA torch repair ignores the exact pinned family/URL that the shell path just promised to honor. This breaks the explicit override contract for standalone Studio updates and for installs where the Python repair step runs after a pinned ROCm selection.
```python
elif _idx_leaf.startswith("cu"):
    _TORCH_BACKEND = "cuda"
```
Honor CUDA overrides without requiring visible hardware
For a standalone unsloth studio update in the headless/container case this change classifies UNSLOTH_TORCH_INDEX_FAMILY=cu128 as CUDA, but _ensure_cuda_torch() still returns before using _detect_cuda_torch_index_url() unless _has_usable_nvidia_gpu() succeeds. That means an existing ROCm-poisoned venv on a GPU-less builder is left as ROCm even though the caller explicitly pinned a CUDA wheel family, so the new Python-side override does not actually cover the no-visible-GPU scenario it was added for.
```sh
# pip proxies (artifactory, sonatype).
_url="${UNSLOTH_TORCH_INDEX_URL}"
while [ "${_url%/}" != "$_url" ]; do _url="${_url%/}"; done
echo "$_url"; return
```
Allow gfx URL overrides to select torch 2.11 wheels
When UNSLOTH_TORCH_INDEX_URL points at an AMD per-gfx index such as https://repo.amd.com/rocm/whl/gfx1151, this return makes TORCH_INDEX_URL end in gfx1151 without taking the Strix reroute path that normally also raises TORCH_CONSTRAINT to torch>=2.11.0,<2.12.0. The only generic constraint adjustment below handles */rocm7.2, so the fresh install still asks that per-gfx index for torch>=2.4,<2.11.0 even though the existing Strix path documents that this index serves the torch 2.11+ ROCm build, causing the pinned full-URL install to fail to resolve.
Follow-up to the override work in this PR: the get_torch_index_url / install.sh reroute already respect a pinned UNSLOTH_TORCH_INDEX_URL / _FAMILY, but the Python repair helpers in install_python_stack.py still re-probed the GPU and could overwrite the pinned family. Make the pin authoritative there too:
- _ensure_cuda_torch: an explicit cu* pin commits to CUDA wheels, so repair a ROCm-poisoned venv even when no NVIDIA GPU is visible here (headless / container / CI cross-install), instead of bailing on the GPU-presence gate.
- _ensure_rocm_torch: skip the AMD per-gfx (Strix) reroute when a ROCm index is pinned, and in the generic reinstall path install from the pinned URL verbatim rather than re-detecting the host ROCm version. gfx*/rocm7.2 indexes serve torch 2.11+, so select the 2.11 package specs for a gfx leaf.
- install.sh: raise the torch constraint to 2.11 for */gfx* indexes too, matching rocm7.2, so a pinned full-URL/family override that returns early keeps a valid constraint.
Add _explicit_torch_index_url / _explicit_rocm_torch_index_url helpers and tests covering the no-GPU CUDA pin repair and the explicit gfx index honored verbatim.
Thanks, all three addressed in 997155d.
Added
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0c0e2cbbc7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```diff
-if ver < (7, 2):
+# An explicit ROCm wheel-index pin is authoritative: never auto-reroute it to
+# the AMD per-gfx index (the caller already chose the family/URL).
+if ver < (7, 2) and _explicit_rocm_torch_index_url() is None:
```
Honor pinned ROCm indexes when HIP torch exists
When a ROCm/GFX index is pinned, this guard suppresses the Strix auto-reroute, but the pinned index is only installed later inside the elif not has_hip_torch path. On a Strix ROCm 7.1 venv that already has HIP torch, running UNSLOTH_TORCH_INDEX_URL=https://repo.amd.com/rocm/whl/gfx1151 unsloth studio update now skips the Strix fix and never reinstalls from the pinned index, leaving the known-bad 7.1 wheel in place.
```python
_override_idx = _explicit_rocm_torch_index_url()
if _override_idx is not None:
    index_url = _override_idx
```
Apply ROCm overrides before host GPU gates
The explicit ROCm index is only consulted here, after _ensure_rocm_torch() has already required a visible AMD GPU and a readable host ROCm version. In a headless/container cross-install, or on a builder with only NVIDIA visible, UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 sets _TORCH_BACKEND to rocm but the function returns before reaching this block, so the pinned ROCm wheel is never installed and the venv keeps its CPU/CUDA torch.
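The fix being asked for is an ordering change: consult the pinned ROCm index before any host gate. A hypothetical Python sketch (rocm_repair_index, its parameters and the default base URL are illustrative, not the actual _ensure_rocm_torch signature) in which the gates only apply when nothing is pinned:

```python
from typing import Optional


def rocm_repair_index(
    pinned_url: Optional[str],
    has_amd_gpu: bool,
    host_rocm_tag: Optional[str],
    whl_base: str = "https://download.pytorch.org/whl",
) -> Optional[str]:
    """Return the index to repair torch from, or None to leave the venv alone."""
    # An explicit pin is authoritative: no GPU-visibility or ROCm-version gate.
    if pinned_url:
        return pinned_url
    # Without a pin, keep the old behaviour: need a visible AMD GPU and a host ROCm version.
    if not has_amd_gpu or host_rocm_tag is None:
        return None
    return f"{whl_base}/{host_rocm_tag}"
```

Under this ordering, UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 on an NVIDIA-only or GPU-less builder still yields the pinned index, while unpinned hosts behave as before.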
```diff
 # All other ROCm tags and CUDA stay within <2.11.0.
 case "$TORCH_INDEX_URL" in
-    */rocm7.2) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;
+    */rocm7.2|*/gfx*) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;
```
Match gfx constraints on the index leaf
This pattern checks the whole URL, so a custom mirror whose base path contains a gfx* segment, e.g. UNSLOTH_PYTORCH_MIRROR=https://mirror.local/gfx-cache with a CUDA family, is treated like an AMD per-arch index and raises TORCH_CONSTRAINT to torch>=2.11.0,<2.12.0. The backend classification just above intentionally uses only the final path segment to avoid this mirror-path false positive; the constraint should do the same or CUDA/CPU installs from such mirrors can resolve the wrong torch family.
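Matching on the leaf rather than the whole URL is a small change. A Python sketch under stated assumptions: the two constraint strings are taken from the comments in this thread (torch>=2.4,<2.11.0 as the default, torch>=2.11.0,<2.12.0 for rocm7.2 and gfx* indexes), and torch_constraint is a hypothetical helper.

```python
DEFAULT_CONSTRAINT = "torch>=2.4,<2.11.0"
ROCM_211_CONSTRAINT = "torch>=2.11.0,<2.12.0"


def torch_constraint(index_url: str) -> str:
    """Choose the torch range from the index leaf only, never from mirror path segments."""
    leaf = index_url.rstrip("/").rsplit("/", 1)[-1]
    if leaf == "rocm7.2" or leaf.startswith("gfx"):
        return ROCM_211_CONSTRAINT
    return DEFAULT_CONSTRAINT
```

In install.sh, one way to get the same effect would be to take the leaf with ${TORCH_INDEX_URL##*/} and run the case statement on that instead of the full URL.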
The pinned-index work landed for install.sh and install_python_stack.py, but the Windows installers still picked the wheel index from GPU probing. Extend the same UNSLOTH_TORCH_INDEX_URL / _FAMILY contract so a pinned index wins on every platform:
- install.ps1: Get-TorchIndexUrl returns the pinned URL/family before nvidia-smi probing; the AMD ROCm reroute is skipped when the index is pinned, so an explicit cpu/cu* pin on an AMD host is not overwritten.
- studio/setup.ps1: add shared Get-PinnedTorchIndexUrl / Get-TorchIndexLeaf helpers; the stale-venv check, the install selection and the AMD reroute all honor the pin, and the CPU/CUDA install pulls from the resolved index URL.
- tests: parity test that all four installers read both override vars and the two Windows installers gate the AMD reroute on the pinned flag.
The 5-reviewer pass found the override was only wired into
The "deleted mainline" items the stale integration diff showed (Colab Cloudflare tunnel hardening, offline export/load helpers, etc.) are commits that landed on Tests:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a017ecfea
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```diff
 $ROCmIndexUrl = $null
 $ROCmTorchFloor = $null
-if (($HasROCm -or $ROCmGfxArch) -and $TorchIndexUrl -like "*/cpu" -and -not $SkipTorch) {
+if (-not $TorchIndexPinned -and ($HasROCm -or $ROCmGfxArch) -and $TorchIndexUrl -like "*/cpu" -and -not $SkipTorch) {
```
Derive ROCm constraints for pinned gfx indexes
When a Windows install pins an AMD per-arch index, e.g. UNSLOTH_TORCH_INDEX_URL=https://repo.amd.com/rocm/whl/gfx1151, this new guard skips the only branch that sets $ROCmTorchFloor; the later generic install still runs with "torch>=2.4,<2.11.0" against $TorchIndexUrl. For the gfx115x/gfx120x indexes that the unpinned path already constrains to torch 2.11, this either resolves the known-bad older wheel or fails to select the intended wheel, so the pinned path should derive the same floor/companion constraints from the pinned leaf before bypassing reroute.
```diff
 if (-not $shouldRebuild) {
-    $expectedTorchTag = if ($HasNvidiaSmi) { Get-PytorchCudaTag } else { "cpu" }
+    $_pinnedIdx = Get-PinnedTorchIndexUrl
+    $expectedTorchTag = if ($_pinnedIdx) { Get-TorchIndexLeaf $_pinnedIdx } elseif ($HasNvidiaSmi) { Get-PytorchCudaTag } else { "cpu" }
```
Treat pinned ROCm indexes as rocm in stale checks
With a pinned ROCm/GFX index, this compares the installed tag to the raw leaf such as gfx1151 or rocm6.4, but the probe above only ever records cuXXX or cpu and treats a +rocm... torch version as cpu. As a result, unsloth studio update on an existing ROCm Windows venv pinned to a gfx/rocm index is always marked stale (or exits in installer-managed mode) even when the installed ROCm torch is correct; normalize these leaves to the same rocm flavor and parse +rocm before comparing.
```python
# An explicit CUDA wheel-index pin (headless / container / CI cross-install)
# commits to CUDA wheels regardless of whether a GPU is visible here, so it
# overrides the GPU-presence gate.
if not _has_usable_nvidia_gpu() and _explicit_torch_index_url() is None:
```
Reinstall CPU torch when CUDA is explicitly pinned
In standalone unsloth studio update with UNSLOTH_TORCH_INDEX_FAMILY=cu128 on a headless/no-visible-GPU builder, this bypasses the GPU gate, but the helper still returns unless the existing torch probe prints hip; an existing CPU torch is treated as deliberate and never reinstalled from the pinned CUDA index. Since the update path preserves torch rather than preinstalling it from install.sh, the explicit CUDA override remains ineffective for the common CPU-venv-to-CUDA cross-install case.
Follow-ups to the override work flagged in review:
- install.ps1: a pinned gfx*/rocm>=7.2 index previously skipped the AMD reroute that sets the torch>=2.11 floor, so the generic install used torch>=2.4,<2.11 and could resolve the known-bad _grouped_mm wheel. Route a pinned ROCm index through the ROCm install path with the 2.11 floor + companions, and guard the companion-spec lookup so a skipped reroute block cannot null-deref.
- studio/setup.ps1: the stale-venv check compared the installed flavor (cuXXX/cpu, with +rocm misread as cpu) against the raw pinned leaf (gfx1151 / rocm6.4), so a correct pinned ROCm venv was always marked stale. Classify +rocm wheels as the generic 'rocm' flavor and normalize a pinned rocm*/gfx* leaf to 'rocm' before comparing (cu* stays specific so cu126-vs-cu128 still rebuilds).
- install_python_stack.py: _ensure_cuda_torch now also reinstalls from a pinned CUDA index when the venv carries a CPU wheel (headless CPU-venv-to-CUDA cross-install via 'studio update'), not only when it finds a ROCm build.
- tests: parity assertions already cover all four installers honoring the override.
Thanks, all three were real gaps in the pinned-index handling. Fixed:
Validation:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d2d5f90175
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```diff
 # Gating on $HasROCm alone left Strix Halo / Radeon 8060S on CPU torch; a failed
 # ROCm install still falls back to CPU below, so this is safe.
-if (($HasROCm -or $ROCmGfxArch) -and $CuTag -eq "cpu") {
+if (-not $TorchIndexPinned -and ($HasROCm -or $ROCmGfxArch) -and $CuTag -eq "cpu") {
```
Route pinned ROCm leaves through the ROCm install path
When UNSLOTH_TORCH_INDEX_URL or _FAMILY points at a ROCm/GFX index during standalone studio/setup.ps1 updates, this guard skips the only branch that initializes $ROCmIndexUrl, $ROCmTorchSpec, and the pinned torchvision/torchaudio ranges. The script then falls through with $CuTag set to the pinned leaf and installs via the generic CUDA branch using bare torch torchvision torchaudio from that ROCm index, so pinned Strix/RDNA4 indexes can resolve the known-bad <2.11 wheels or ABI-mismatched companions instead of the constrained ROCm triplet used by the unpinned AMD path.
```powershell
# Normalize a pinned rocm*/gfx* leaf to the generic "rocm" flavor so it
# compares against the installed +rocm wheel (also "rocm"); cu*/cpu
# leaves stay specific so a cu126-vs-cu128 mismatch still rebuilds.
$expectedTorchTag = if ($_pinLeaf -like 'gfx*' -or $_pinLeaf -like 'rocm*') { "rocm" } else { $_pinLeaf }
```
Skip stale checks for unknown pinned URL leaves
When UNSLOTH_TORCH_INDEX_URL is a full custom index whose final path segment is not itself a flavor, e.g. a PEP 503 mirror ending in /simple, this assigns simple as the required torch tag. A fresh Windows install can successfully install cu128 from that URL in install.ps1, then the installer-managed setup.ps1 pass compares cu128 != simple, marks the venv stale, and exits with a repair error; direct updates similarly rebuild a correct environment. Unknown leaves should not be treated as torch flavor tags.
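The stale check compares two flavors: the one parsed from the installed torch version and the one implied by the pin. A hypothetical Python model of the normalisation described in this thread (the PR's version is PowerShell; these function names are illustrative): +rocm wheels and rocm*/gfx* leaves collapse to "rocm", cuXXX and cpu stay specific, and an unrecognised leaf such as simple yields no expectation at all.

```python
import re
from typing import Optional


def installed_flavor(torch_version: str) -> str:
    """'2.10.0+cu128' -> 'cu128', '2.9.1+rocm6.4' -> 'rocm', anything else -> 'cpu'."""
    local = torch_version.partition("+")[2]
    if local.startswith("rocm"):
        return "rocm"
    if re.fullmatch(r"cu\d+", local):
        return local
    return "cpu"


def expected_flavor(pinned_leaf: str) -> Optional[str]:
    leaf = pinned_leaf.lower()
    if leaf.startswith(("rocm", "gfx")):
        return "rocm"
    if leaf == "cpu" or re.fullmatch(r"cu\d+", leaf):
        return leaf
    return None  # not a flavor tag: skip the comparison


def is_stale(torch_version: str, pinned_leaf: str) -> bool:
    expected = expected_flavor(pinned_leaf)
    return expected is not None and installed_flavor(torch_version) != expected
```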
```python
else:
    return  # healthy CUDA torch, or a deliberate CPU wheel -- leave as-is
```
Reinstall wrong CUDA family when pinned
With standalone unsloth studio update on Linux/headless builders, UNSLOTH_TORCH_INDEX_FAMILY=cu128 now bypasses the visible-GPU gate, but the probe collapses every installed CUDA wheel to the generic marker cuda. If the venv already has torch+cu126, this else returns without reinstalling from the pinned cu128 index, so the explicit override still leaves the wrong CUDA family in place; the repair should compare the installed +cuXXX tag when a CUDA family is pinned.
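Taken together with the earlier comments on this helper, the CUDA repair decision becomes a small table. A hypothetical sketch: should_reinstall_cuda is not a function in the PR, and installed is assumed to be the probe's report ('hip' for a ROCm build, 'cpu', or a cuXXX tag).

```python
from typing import Optional


def should_reinstall_cuda(installed: str, pinned_family: Optional[str], gpu_visible: bool) -> bool:
    """Decide whether the CUDA repair step should reinstall torch.

    installed: probe result, 'hip' for a ROCm build, 'cpu', or a tag like 'cu126'.
    pinned_family: explicit cuXXX pin from the override, or None.
    """
    if pinned_family is None:
        # Unpinned: only undo a ROCm-poisoned venv, and only when a GPU is visible.
        return gpu_visible and installed == "hip"
    # Pinned: GPU visibility is irrelevant; anything but the pinned family is wrong.
    return installed != pinned_family
```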
```powershell
    $_pinRocm211 = ([int]$Matches[1] -gt 7) -or ([int]$Matches[1] -eq 7 -and [int]$Matches[2] -ge 2)
}
if ($_pinLeaf -like 'gfx*' -or $_pinRocm211) {
    $ROCmIndexUrl = $TorchIndexUrl
```
Fall back to CPU from the CPU index after pinned ROCm fails
For a pinned ROCm/GFX index, this sets $ROCmIndexUrl to the same URL stored in $TorchIndexUrl; if the ROCm install later fails, the intended “CPU fallback” path still runs uv pip install ... --index-url $TorchIndexUrl, so it retries the failing ROCm mirror instead of using the CPU wheel index. This makes transient or unreachable pinned ROCm mirrors abort the Windows installer rather than leaving a CPU base for Studio to repair later, unlike the unpinned AMD path where $TorchIndexUrl remains */cpu.
Follow-ups to the previous round:
- studio/setup.ps1: a pinned gfx*/rocm>=7.2 index now routes through the ROCm install path with the 2.11 floor + companions (it previously fell through to the CUDA branch with bare torch/torchvision/torchaudio against the ROCm index). The CPU/CUDA fallback index is forced to the CPU wheel index when a ROCm index is active, so a failed pinned-ROCm install does not retry the ROCm mirror.
- studio/setup.ps1: the stale-venv check no longer treats an unrecognized pinned URL leaf (e.g. a PEP 503 mirror ending in /simple) as a torch flavor tag, which was marking a correct venv stale; cu*/cpu/rocm/gfx leaves are still compared.
- install.ps1: the post-failure CPU fallback uses an explicit CPU index instead of , which for a pinned ROCm index was the ROCm mirror itself (so the 'fallback' just retried the failing index and aborted the installer).
- install_python_stack.py: _ensure_cuda_torch now also reinstalls when the venv's CUDA family differs from a pinned one (installed cu126 vs pinned cu128), not only CPU->CUDA; the probe reports the installed cuXXX tag for the comparison.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4a5baba8ce
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```diff
     Fast-Install torch torchvision torchaudio --index-url $TorchInstallIndexUrl
     $torchInstallExit = $LASTEXITCODE
     $output = ""
 } else {
-    $output = Fast-Install torch torchvision torchaudio --index-url "$PyTorchWhlBase/$CuTag" | Out-String
+    $output = Fast-Install torch torchvision torchaudio --index-url $TorchInstallIndexUrl | Out-String
```
Force the pinned ROCm CPU fallback to reinstall
When a pinned rocm*/gfx* index fails, $TorchInstallIndexUrl has already been set to the CPU index, but $CuTag remains the pinned leaf, so execution skips the CPU fallback branch with @cpuForce and lands here instead. These CUDA-branch calls lack --force-reinstall, so uv/pip can keep a partially installed ROCm torch that still satisfies bare torch even though the script warned it was falling back to CPU; reset $CuTag/route through the CPU branch or include the force reinstall for this fallback path.
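Whatever branch the script ends up in, the fallback needs two properties: it targets the CPU wheel index, and it forces a reinstall so a partial ROCm torch cannot satisfy the bare torch requirement. A Python sketch that only builds the uv argument list (cpu_fallback_command and the default base URL are assumptions; --force-reinstall and --index-url are standard uv pip install options):

```python
from typing import List

TORCH_PACKAGES = ["torch", "torchvision", "torchaudio"]


def cpu_fallback_command(whl_base: str = "https://download.pytorch.org/whl") -> List[str]:
    """uv command for the CPU fallback after a pinned ROCm install fails."""
    cpu_index = f"{whl_base.rstrip('/')}/cpu"  # never reuse the failing ROCm index
    return ["uv", "pip", "install", "--force-reinstall", *TORCH_PACKAGES, "--index-url", cpu_index]
```

Passing the resulting list to subprocess.run(cmd, check=True) would execute it.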
```diff
 # All other ROCm tags and CUDA stay within <2.11.0.
 case "$TORCH_INDEX_URL" in
-    */rocm7.2) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;
+    */rocm7.2|*/gfx*) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;
```
Pin ROCm companion wheels for pinned gfx installs
With a full UNSLOTH_TORCH_INDEX_URL=.../gfx1151/gfx120X override, this new */gfx* case only raises the torch range; the later shell install still passes bare torchvision torchaudio from the same AMD per-arch index. The PowerShell/Python paths in this change pin those companions because AMD's per-gfx indexes publish them independently, so the shell path can resolve a torchvision/torchaudio build for a different torch minor even while torch is constrained to <2.12, leaving the venv ABI-mismatched.
All four were real gaps in the pinned-index handling. Fixed in
Validation: both
The pinned-ROCm CPU fallback computes an explicit CPU index, but the comment explaining why it cannot reuse $TorchIndexUrl pushed the actual Invoke-InstallCommandRetry / --force-reinstall call more than 600 chars past the "ROCm PyTorch install failed" message, so test_pr5940_followups's window check no longer saw the retry helper. Move the CPU-index computation and its comment above the failure substep so the retrying force-reinstall stays adjacent to the message.
No behavior change: same explicit CPU index, same retry, same --force-reinstall.
Tested on Windows 11 with an RTX 5060 Ti (driver 595.97, CUDA 13.2), Python 3.13.13. Checked the default detection plus both overrides, including a full reinstall to confirm the override actually changes what gets installed. Default (no override) detects the right family for this card. Fresh local install picked cu130: With URL resolution checked against the actual So Studio also came up cleanly on the branch earlier (UI loaded, GGUF chat worked) on the default cu130 build: Default desktop installs behave exactly as before with nothing set, and the override does what it says for headless/CI/Docker builds. Looks good.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f6c5e46dad
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

    } elseif ($torchVer -match '\+rocm') {
        # Any +rocm / gfx wheel -> generic "rocm" flavor. The exact ROCm
        # version is repaired later by install_python_stack.py; here we
        # only need the flavor so a correct ROCm venv is not marked stale.
        $installedTorchTag = "rocm"

Treat unpinned ROCm venvs as expected ROCm
On a normal Windows AMD/ROCm install with no UNSLOTH_TORCH_INDEX_* pin and no NVIDIA GPU, this new branch records an existing +rocm torch as rocm, but the expected tag below still falls back to cpu. That makes a healthy ROCm venv look stale (torch rocm != required cpu); for installs launched from install.ps1 the managed setup path exits before finishing, and direct studio update rebuilds unnecessarily. Derive the expected tag from $HasROCm/$ROCmGfxArch here or avoid turning unpinned ROCm into a CPU mismatch.
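Here is a rough Python sketch of the flavor comparison this comment asks for. It assumes the installed flavor comes from the `+local` suffix of `torch.__version__`, and that an AMD host with no NVIDIA GPU should expect `rocm`. Both helper names are hypothetical stand-ins for the PowerShell logic in setup.ps1.

```python
from typing import Optional


def installed_torch_tag(version: str) -> str:
    """Coarse flavor from torch.__version__: '+cu128' -> 'cu128', '+rocm6.4' -> 'rocm'.

    A '+cpu' suffix, or no local suffix at all, is treated as CPU.
    """
    local = version.partition("+")[2].lower()
    if local.startswith("rocm"):
        return "rocm"
    if local.startswith("cu"):
        return local
    return "cpu"


def expected_torch_tag(cuda_tag: Optional[str], has_rocm: bool) -> str:
    """Expected flavor for an unpinned host: NVIDIA first, then ROCm, else CPU."""
    if cuda_tag:
        return cuda_tag
    if has_rocm:
        return "rocm"
    return "cpu"


# An AMD-only host with a healthy ROCm venv should not be flagged stale.
stale = installed_torch_tag("2.10.0+rocm6.4") != expected_torch_tag(None, True)
print(stale)
```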

        Fast-Install torch torchvision torchaudio --index-url $TorchInstallIndexUrl
        $torchInstallExit = $LASTEXITCODE
        $output = ""
    } else {
        $output = Fast-Install torch torchvision torchaudio --index-url "$PyTorchWhlBase/$CuTag" | Out-String
        $output = Fast-Install torch torchvision torchaudio --index-url $TorchInstallIndexUrl | Out-String

Force CPU fallback after pinned ROCm failures
When a pinned ROCm index such as UNSLOTH_TORCH_INDEX_FAMILY=rocm7.2 or gfx1151 fails to install, $CuTag remains the ROCm leaf, so execution skips the CPU branch with @cpuForce and reaches this CUDA branch instead. Since $TorchInstallIndexUrl was precomputed as the CPU URL while $ROCmIndexUrl was set, this installs from the CPU index without --force-reinstall, allowing uv to keep a partially installed ROCm torch that already satisfies torch—the exact case the CPU fallback was guarding against.
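The bug reduces to a branch condition that only looks at the CUDA tag. The sketch below shows the corrected condition. It assumes a hypothetical `rocm_cpu_fallback` flag that is set when a pinned ROCm install fails. The helper only builds the argument list and does not run uv.

```python
from typing import List


def torch_install_cmd(cu_tag: str, index_url: str, rocm_cpu_fallback: bool) -> List[str]:
    """Build a uv install command for the torch trio (hypothetical helper).

    The CPU branch adds --force-reinstall so uv cannot keep a partially
    installed ROCm torch that already satisfies the bare 'torch' requirement.
    Checking rocm_cpu_fallback matters because cu_tag still holds the
    rocm/gfx leaf after a pinned ROCm install fails.
    """
    cmd = ["uv", "pip", "install"]
    if cu_tag == "cpu" or rocm_cpu_fallback:
        cmd.append("--force-reinstall")
    cmd += ["torch", "torchvision", "torchaudio", "--index-url", index_url]
    return cmd


print(torch_install_cmd("rocm7.2", "https://download.pytorch.org/whl/cpu", True))
```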

    _override_idx = _explicit_rocm_torch_index_url()
    if _override_idx is not None:
        index_url = _override_idx
        tag = index_url.rstrip("/").rsplit("/", 1)[-1].lower()

Bypass GPU gates for explicit ROCm pins
In standalone unsloth studio update with UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 or a full ROCm URL, this override is only consulted after _ensure_rocm_torch() has already returned for a visible NVIDIA GPU, no visible AMD GPU, or an unreadable ROCm version. That leaves an existing CPU/CUDA torch in place in headless CI/container or mixed-GPU cross-install scenarios, even though the new override is meant to skip host GPU probing like the CUDA pin path does.
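One way to express the requested ordering is sketched below in Python. The parameters are hypothetical stand-ins for the host probes. The explicit pin is consulted before any of the three gates that would otherwise return early.

```python
from typing import Optional


def should_repair_rocm(explicit_rocm_pin: Optional[str], has_nvidia: bool,
                       has_amd: bool, rocm_version: Optional[str]) -> bool:
    """Hypothetical gate for the ROCm repair step.

    An explicit ROCm wheel-index pin is authoritative, so it is checked before
    the host-probing gates, the same way an explicit CUDA pin skips the
    NVIDIA-presence check.
    """
    if explicit_rocm_pin is not None:
        return True
    # Unpinned: fall back to the original host-probing gates.
    if has_nvidia or not has_amd or rocm_version is None:
        return False
    return True


# Headless CI builder: no GPU visible at all, but ROCm 6.4 wheels are pinned.
print(should_repair_rocm("https://download.pytorch.org/whl/rocm6.4",
                         has_nvidia=False, has_amd=False, rocm_version=None))
```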

    elif _idx_leaf == "cpu":
        _TORCH_BACKEND = "cpu"

Add a CPU repair path for explicit CPU pins
When standalone update runs with UNSLOTH_TORCH_INDEX_FAMILY=cpu or a URL ending in /cpu, this sets _TORCH_BACKEND to cpu; both repair helpers then treat that as a skip signal (_ensure_cuda_torch only runs for ""/cuda, and _ensure_rocm_torch returns for cuda/cpu). There is no CPU reinstall path, so an existing CUDA/ROCm torch remains in place even though the explicit CPU pin is supposed to be authoritative.
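Below is a minimal sketch of a pin classifier plus a CPU repair decision. It assumes the backend is read from the last path segment of the pinned URL, as the `_idx_leaf` excerpt suggests. `classify_pin` and `needs_cpu_repair` are illustrative names, not the functions added in this PR.

```python
import re
from typing import Optional


def classify_pin(index_url: Optional[str]) -> Optional[str]:
    """Classify a pinned wheel index by its last path segment (hypothetical).

    Returns 'cuda', 'rocm', 'cpu', or None for an unpinned or unrecognised index.
    """
    if not index_url:
        return None
    leaf = index_url.rstrip("/").rsplit("/", 1)[-1].lower()
    if leaf == "cpu":
        return "cpu"
    if re.fullmatch(r"cu\d+", leaf):
        return "cuda"
    if leaf.startswith(("rocm", "gfx")):
        return "rocm"
    return None


def needs_cpu_repair(index_url: Optional[str], installed_version: str) -> bool:
    """A CPU pin should replace a +cu/+rocm build rather than act as a skip signal."""
    if classify_pin(index_url) != "cpu":
        return False
    local = installed_version.partition("+")[2].lower()
    return local.startswith(("cu", "rocm"))


print(needs_cpu_repair("https://download.pytorch.org/whl/cpu", "2.10.0+cu128"))
```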

    # All other ROCm tags and CUDA stay within <2.11.0.
    case "$TORCH_INDEX_URL" in
      */rocm7.2) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;
      */rocm7.2|*/gfx*) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;

Pin companion wheels for gfx overrides
For a newly supported pinned gfx override such as UNSLOTH_TORCH_INDEX_URL=.../gfx1151, this only raises the torch constraint; the install/repair commands still pass bare torchvision torchaudio to uv with --index-url "$TORCH_INDEX_URL". I checked uv pip install --help: --index-url selects the package index and extra indexes require --extra-index-url, so the AMD gfx index can resolve newer independently published companions than the torch 2.11 wheel (the PS/Python paths pin these for the same ABI-mismatch reason).
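For reference, the `case` statement's behaviour can be approximated in Python by matching the final path segment. This is a sketch only. The `torch<2.11.0` fallback string is a hypothetical placeholder, because the real default constraint is not shown in this excerpt.

```python
def torch_constraint(index_url: str) -> str:
    """Approximate the install.sh case statement (sketch).

    rocm7.2 and per-gfx AMD indexes serve torch 2.11 wheels; every other
    index stays below 2.11.0. The fallback string is a placeholder.
    """
    leaf = index_url.rstrip("/").rsplit("/", 1)[-1]
    if leaf == "rocm7.2" or leaf.startswith("gfx"):
        return "torch>=2.11.0,<2.12.0"
    return "torch<2.11.0"


for url in ("https://download.pytorch.org/whl/rocm7.2",
            "https://example.invalid/whl/gfx1151",
            "https://download.pytorch.org/whl/cu128"):
    print(url, "->", torch_constraint(url))
```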

    if ($_pinLeaf -like 'gfx*' -or $_pinLeaf -like 'rocm*') {
        $expectedTorchTag = "rocm"

Preserve exact ROCm pins during update
When a user changes an explicit ROCm pin from one ROCm family to another, for example from an existing +rocm6.4 venv to UNSLOTH_TORCH_INDEX_URL=.../gfx1151, this collapses both sides to the generic rocm flavor. If the installed package version is already current, the later dependency fast-path skips the PyTorch install entirely because the venv is not CPU-only, so the requested pinned index is never applied.
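The sketch below keeps the pin's identity instead of collapsing it to `rocm`. It assumes two things: the installed ROCm version can be read from the `+rocmX.Y` suffix, and gfx pins target the torch 2.11 line, as the follow-up commit message in this thread describes. Both function names are hypothetical.

```python
import re
from typing import Optional


def rocm_version_of(text: str) -> Optional[str]:
    """Extract 'X.Y' from a 'rocmX.Y' leaf or a '+rocmX.Y' torch local tag."""
    m = re.search(r"rocm(\d+\.\d+)", text.lower())
    return m.group(1) if m else None


def rocm_pin_changed(pin_leaf: str, installed_version: str) -> bool:
    """True when an explicit ROCm pin asks for a different build than the venv has."""
    wanted = rocm_version_of(pin_leaf)
    have = rocm_version_of(installed_version)
    if have is None:
        return True  # the venv is not a ROCm build at all
    if wanted is not None:
        return wanted != have
    # gfx pin (no ROCm version in the leaf): compare on the torch 2.11 line.
    major_minor = installed_version.split("+", 1)[0].split(".")[:2]
    return major_minor != ["2", "11"]


# Switching an existing +rocm6.4 venv to a gfx1151 pin must trigger a reinstall.
print(rocm_pin_changed("gfx1151", "2.10.0+rocm6.4"))
```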
setup.ps1:

- Stale-venv check: treat an AMD/ROCm host (HasROCm or a resolved gfx arch) with no explicit pin as expecting "rocm", not "cpu", so a healthy +rocm venv is not flagged stale (which made installer-managed setup exit and direct update rebuild).
- Pinned-ROCm install failure now routes into the force-reinstall CPU branch: CuTag stays the rocm/gfx leaf on failure, so the condition also checks ROCmCpuFallback; otherwise the CUDA branch installed from the CPU index without --force-reinstall and kept the partial ROCm torch.
- Explicit ROCm pin compare no longer collapses gfx*/rocm* to a generic "rocm": it compares the +rocmX.Y version (and the torch 2.11 line for gfx pins) so changing the pinned family (e.g. rocm6.4 -> gfx1151) rebuilds and applies it.

install_python_stack.py:

- _ensure_rocm_torch: an explicit ROCm wheel-index pin now bypasses the NVIDIA-present / no-AMD-GPU / unreadable-ROCm gates (headless/container/CI cross-install), mirroring the explicit-CUDA-pin bypass in _ensure_cuda_torch.
- Add _ensure_cpu_torch: an explicit CPU pin (FAMILY=cpu or /cpu URL) now has a repair path that reinstalls CPU torch over an existing CUDA/ROCm build on a standalone update (which skips install.sh's flavor enforcement).

install.sh:

- Pin torchvision/torchaudio companions alongside torch for the rocm7.2 / per-gfx index and the Strix reroute (those AMD indexes publish companions independently and a bare name can resolve a torch-2.12-built wheel, an ABI mismatch).
for more information, see https://pre-commit.ci
Thanks for the review. Addressed all six in c290cac.
Local checks: test_get_torch_index_url.sh 46/46, test_cross_platform_parity.py 17/17, test_torch_flavor.ps1 green, setup.ps1 Pester 31 passed / 0 failed.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4018a697d2

        }
    } elseif ($HasNvidiaSmi) {
        $expectedTorchTag = Get-PytorchCudaTag
    } elseif ($HasROCm -or $script:ROCmGfxArch) {

Don't require ROCm torch for unmapped AMD GPUs
This stale-venv check now treats any $HasROCm or inferred $script:ROCmGfxArch as requiring a ROCm torch build, but the install branch later only maps gfx120x/gfx115x/gfx110x/gfx90a/gfx908 and explicitly falls back to CPU when $archFamily is null. On an AMD host with an unsupported/unknown arch such as name-inferred RDNA2 gfx1030, a correct CPU venv is marked stale (torch cpu != required rocm), so direct updates rebuild every run and installer-managed setup aborts before the CPU fallback can run. Fresh evidence is the later $archFamily null path in this same script that warns and installs CPU for those arches.
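This sketch ties the expected flavor to the arch families the install step can actually serve. The prefix list approximates the gfx120x/gfx115x/gfx110x/gfx90a/gfx908 families named in the comment, and the helper name is hypothetical.

```python
from typing import Optional

# Hypothetical prefixes approximating the arch families the installer maps.
_MAPPED_PREFIXES = ("gfx120", "gfx115", "gfx110", "gfx90a", "gfx908")


def expected_tag_on_amd(gfx_arch: Optional[str]) -> str:
    """Expect ROCm torch only when the install step would actually install it.

    An unmapped or unknown arch (e.g. gfx1030) falls back to CPU at install
    time, so the stale check expects 'cpu' there too.
    """
    if gfx_arch and gfx_arch.lower().startswith(_MAPPED_PREFIXES):
        return "rocm"
    return "cpu"


print(expected_tag_on_amd("gfx1030"), expected_tag_on_amd("gfx1151"))
```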

    # An explicit CUDA wheel-index pin (headless / container / CI cross-install)
    # commits to CUDA wheels regardless of whether a GPU is visible here, so it
    # overrides the GPU-presence gate.
    if not _has_usable_nvidia_gpu() and _explicit_torch_index_url() is None:

Limit CUDA gate bypass to cu* pins
When UNSLOTH_TORCH_INDEX_URL is a custom mirror whose final segment is not a CUDA family, for example a PEP 503 URL ending in /simple, _TORCH_BACKEND remains empty but this condition still bypasses the no-NVIDIA gate merely because some explicit URL exists. On a GPU-less or AMD builder with an existing ROCm torch, the later hip branch runs the CUDA repair against that arbitrary URL even though no cu* pin was provided, replacing the current build instead of leaving unknown custom indexes alone. Restrict this bypass to classified CUDA pins.
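Below is a minimal sketch of restricting the bypass to classified CUDA pins. It assumes a pin counts as CUDA only when its last path segment is `cu` followed by digits. The names are illustrative; a later commit in this thread adds its own `_explicit_cuda_torch_index_url()` helper for this purpose.

```python
import re
from typing import Optional

_CUDA_LEAF = re.compile(r"cu\d+")


def explicit_cuda_pin(index_url: Optional[str]) -> Optional[str]:
    """Return the URL only when its final path segment is a cu* family."""
    if not index_url or not index_url.strip():
        return None
    leaf = index_url.strip().rstrip("/").rsplit("/", 1)[-1].lower()
    return index_url if _CUDA_LEAF.fullmatch(leaf) else None


def skip_cuda_repair(has_usable_nvidia: bool, index_url: Optional[str]) -> bool:
    """Skip unless a GPU is visible or a genuine CUDA pin was provided."""
    return not has_usable_nvidia and explicit_cuda_pin(index_url) is None


# A custom PEP 503 mirror ending in /simple no longer forces a CUDA reinstall.
print(skip_cuda_repair(False, "https://mirror.example/simple"))
```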

    } else {
        # gfx pin or unreadable version: compare on the torch 2.11 line.
        $expectedTorchTag = if ($_pinNeeds211) { "rocm(torch>=2.11)" } else { "rocm(torch<2.11)" }
        $installedTorchTag = if ($_instIs211) { "rocm(torch>=2.11)" } else { "rocm(torch<2.11)" }

Preserve non-ROCm flavor during ROCm pin checks
When a ROCm/gfx index is pinned but the installed wheel is CPU or CUDA, this fallback overwrites $installedTorchTag based only on the torch major/minor version. For example, UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 with CPU torch 2.10, or a gfx1151 pin with CPU/CUDA torch 2.11, can compare equal to the expected ROCm bucket, so an up-to-date Studio venv skips the dependency pass and never installs the pinned ROCm wheel. Keep the original non-ROCm flavor unless torch.__version__ actually contains +rocm.
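The sketch below performs the bucket comparison while preserving the non-ROCm flavor. It reuses the tag strings from the excerpt. The function name and the `+local` parsing are assumptions for illustration.

```python
from typing import Tuple


def pin_check_tags(pin_needs_211: bool, installed_version: str) -> Tuple[str, str]:
    """Return (expected, installed) tags for a gfx/unreadable ROCm pin (sketch).

    The installed side only gets a 'rocm(...)' bucket when the wheel really is
    a ROCm build; CPU and CUDA wheels keep their own flavor, so they can never
    compare equal to a ROCm expectation.
    """
    expected = "rocm(torch>=2.11)" if pin_needs_211 else "rocm(torch<2.11)"
    base, _, local = installed_version.partition("+")
    if not local.lower().startswith("rocm"):
        return expected, (local.lower() or "cpu")
    major, minor = (int(p) for p in base.split(".")[:2])
    is_211 = (major, minor) >= (2, 11)
    return expected, ("rocm(torch>=2.11)" if is_211 else "rocm(torch<2.11)")


# A CPU torch 2.11 no longer satisfies a gfx pin just because the version matches.
print(pin_check_tags(True, "2.11.0+cpu"))
```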

    if ($_expectedKnown -and $installedTorchTag -and $installedTorchTag -ne $expectedTorchTag) {
        $shouldRebuild = $true

Let explicit pins repair instead of deleting the venv
For standalone Windows unsloth studio update, an explicit pin that intentionally changes flavor now trips the stale-venv rebuild before the later install/repair code can run. For example, on a NVIDIA venv with torch+cu128, setting UNSLOTH_TORCH_INDEX_FAMILY=cpu makes this set $shouldRebuild, the direct-update path removes .venv, and the script then errors because the virtual environment is missing; the new CPU repair path is never reached. Explicit pin mismatches should be routed to the torch install/repair step instead of the destructive stale check.
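The suggested routing can be summarised as a three-way decision. This hypothetical helper only returns a label. In the real flow, setup.ps1 would then run either the torch install/repair step or the rebuild.

```python
def stale_check_action(expected: str, installed: str, explicit_pin: bool) -> str:
    """Pick 'ok', 'repair', or 'rebuild' for an existing venv (hypothetical).

    A mismatch caused by an explicit pin is an intentional flavor change, so
    it goes to the torch install/repair step instead of deleting .venv.
    """
    if installed == expected:
        return "ok"
    return "repair" if explicit_pin else "rebuild"


# NVIDIA venv with torch+cu128, user sets UNSLOTH_TORCH_INDEX_FAMILY=cpu.
print(stale_check_action("cpu", "cu128", explicit_pin=True))
```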
Native install verified across Windows, macOS and arm Linux

To confirm this PR's torch-index resolution does not break the real install path, I ran a native
The install completes and resolves the right wheel index on each OS, the prebuilt macOS (Intel) and arm Linux, same flow: Notes:
…rrides _ensure_cuda_torch only overrode the NVIDIA-presence gate for *any* pinned index, so a non-CUDA mirror URL (or a ROCm/CPU pin) on a non-NVIDIA host with ROCm torch could force a CUDA reinstall over a working ROCm venv. Add _explicit_cuda_torch_index_url() (leaf cu*), matching the ROCm/CPU helpers, and gate on it instead.

install.sh::get_torch_index_url treated a whitespace-only UNSLOTH_TORCH_INDEX_URL / _FAMILY as authoritative (yielding an invalid index), unlike the Python .strip() and PowerShell IsNullOrWhiteSpace paths; trim leading/trailing whitespace first.
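Here is a minimal Python equivalent of the normalisation described for install.sh. It treats unset, empty, and whitespace-only values the same way. `env_override` is a hypothetical name, and accepting a plain dict keeps it testable without touching the real environment.

```python
import os
from typing import Mapping, Optional


def env_override(name: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the trimmed override, or None when unset, empty, or whitespace-only."""
    value = environ.get(name, "").strip()
    return value or None


print(env_override("UNSLOTH_TORCH_INDEX_FAMILY", {"UNSLOTH_TORCH_INDEX_FAMILY": "   "}))
```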
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1e956c80a

        "torch",
        "torchvision",
        "torchaudio",

Constrain CPU repair torch versions
When a standalone update is run with an explicit CPU pin on a venv that currently has CUDA/ROCm torch, this repair installs bare torch, torchvision, and torchaudio from the CPU index with constrain = False. Unlike the fresh CPU/CUDA paths and _CUDA_TORCH_PKG_SPEC, this can resolve the newest CPU trio outside the repo’s supported <2.11 torch range (and mismatched companion bounds), so the CPU override can leave Studio on an unsupported PyTorch stack; use the same bounded specs as the other CPU/CUDA installs.
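Below is a sketch of a bounded CPU repair command. The spec strings are illustrative assumptions, not the repo's real bounds, which this thread does not show. `cpu_repair_cmd` is a hypothetical helper that only builds the argument list.

```python
from typing import List

# Illustrative bounds only; the repo's real spec strings are not shown here.
CPU_TORCH_SPECS = ["torch<2.11.0", "torchvision<0.26.0", "torchaudio<2.11.0"]


def cpu_repair_cmd(cpu_index_url: str) -> List[str]:
    """Build the CPU-pin repair command with bounded specs and a forced reinstall."""
    return ["uv", "pip", "install", "--force-reinstall",
            "--index-url", cpu_index_url, *CPU_TORCH_SPECS]


print(" ".join(cpu_repair_cmd("https://download.pytorch.org/whl/cpu")))
```

Passing bounded specs instead of bare names keeps the CPU override on the same supported torch range as the other install paths.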

        # false -- name-inferred Adrenalin hosts still get ROCm torch below.)
        # Without this an unpinned ROCm venv compares "rocm" != "cpu" and is
        # needlessly rebuilt, and an installer-managed setup exits as stale.
        $expectedTorchTag = "rocm"

Route CPU-on-AMD venvs to repair
On an AMD/ROCm Windows host with no explicit pin, this makes every existing CPU torch venv stale just because $HasROCm or $script:ROCmGfxArch is true. That catches legitimate recovery cases such as an installer ROCm failure that intentionally fell back to a CPU base, or an older CPU-only AMD venv that the later dependency block would repair; before that repair can run, installer-managed setup exits and direct studio update removes the venv then errors. Treat existing +rocm as non-stale here without forcing CPU mismatches through the destructive stale-venv path.
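This hypothetical outcome table covers an unpinned AMD host and follows the review's suggestion that a CPU venv be repaired rather than deleted. How to handle a CUDA build on an AMD-only host is an assumption; the review does not address that case.

```python
def unpinned_amd_action(installed: str) -> str:
    """Stale-check outcome on an AMD host with no explicit pin (sketch).

    'rocm' -> healthy, keep it.
    'cpu'  -> may be a deliberate ROCm-failure fallback or an older CPU venv;
              let the later dependency step repair it instead of deleting .venv.
    other  -> e.g. a CUDA build on an AMD-only host; treated as stale here
              (an assumption, not covered by the review).
    """
    if installed == "rocm":
        return "ok"
    if installed == "cpu":
        return "repair"
    return "rebuild"


print(unpinned_amd_action("cpu"))
```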

    # An explicit CUDA wheel-index pin (headless / container / CI cross-install)
    # commits to CUDA wheels regardless of whether a GPU is visible here, so it
    # overrides the GPU-presence gate.
    if not _has_usable_nvidia_gpu() and _explicit_cuda_torch_index_url() is None:

Apply CUDA pins before the CVD hide gate
When CI or a container hides GPUs with CUDA_VISIBLE_DEVICES=-1/empty and runs standalone unsloth studio update with UNSLOTH_TORCH_INDEX_FAMILY=cu128, the earlier CVD return still fires before this new explicit-pin gate. That leaves an existing CPU/ROCm torch in place even though the caller pinned CUDA; fresh evidence is that the explicit CUDA override check was added below the CVD short-circuit rather than before it.
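Below is a sketch of the ordering fix. It assumes an empty or `-1` value of `CUDA_VISIBLE_DEVICES` is the hide signal the earlier short-circuit checks. The function names are hypothetical.

```python
from typing import Mapping, Optional


def cuda_devices_hidden(environ: Mapping[str, str]) -> bool:
    """CUDA_VISIBLE_DEVICES set to '' or '-1' hides every GPU from CUDA."""
    if "CUDA_VISIBLE_DEVICES" not in environ:
        return False
    return environ["CUDA_VISIBLE_DEVICES"].strip() in ("", "-1")


def run_cuda_repair(environ: Mapping[str, str], explicit_cuda_pin: Optional[str],
                    has_usable_nvidia: bool) -> bool:
    """Check the explicit pin first, so a hidden-GPU CI job still gets CUDA wheels."""
    if explicit_cuda_pin is not None:
        return True
    if cuda_devices_hidden(environ):
        return False
    return has_usable_nvidia


print(run_cuda_repair({"CUDA_VISIBLE_DEVICES": "-1"},
                      "https://download.pytorch.org/whl/cu128", False))
```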



Summary
get_torch_index_url (and its studio update counterpart _detect_cuda_torch_index_url) choose the PyTorch wheel family purely by probing the host GPU, with no way to override it. That is correct for a normal desktop curl | sh install, but it breaks any headless / container / CI install where the build host's GPU must not decide the wheel family.

Concretely, in a GPU-less Docker build the builder still sees the host's NVIDIA driver through the /proc/driver/nvidia/gpus/ sysfs fallback, so _has_usable_nvidia_gpu returns true ("NVIDIA GPU detected"). But there is no queryable nvidia-smi CUDA version, so the function hits its last-ditch branch and installs a +cu126 torch even when the image is meant to be +cu128. There was no env var to force the family.

Fix
Add an explicit override, checked before any GPU probing, to both the shell installer and the Python studio update path:

- UNSLOTH_TORCH_INDEX_URL -- a full index URL, used verbatim (highest precedence).
- UNSLOTH_TORCH_INDEX_FAMILY -- the convenience form (cpu, cu124, cu126, cu128, cu130, rocm6.4, ...), appended to the mirror base so UNSLOTH_PYTORCH_MIRROR is still honoured.

This is the same "tell the build, don't ask the hardware" approach the published GPU images take: vLLM and SGLang both pick the CUDA wheel index from an explicit build ARG (CUDA_VERSION -> cu1xx) rather than detecting it, and the Unsloth Docker base image already pins --extra-index-url .../cu128 directly. This change lets the Studio installer participate in that contract instead of guessing. Normal desktop installs are unaffected: with no override set, detection runs exactly as before.

Tests
tests/sh/test_get_torch_index_url.sh gains cases for: family override with no GPU, family override winning over a detected 12.6 (the exact Docker case), full-URL override winning over detection, family override appended to UNSLOTH_PYTORCH_MIRROR, trailing-slash stripping, URL-over-family precedence, and empty overrides falling through to detection. Full suite passes (44/44). The Python path was verified with the same matrix.
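Here is a compact Python sketch of the override precedence, exercised against the same cases the shell tests list. `resolve_torch_index_url` is a hypothetical name, and `DEFAULT_BASE` assumes the public PyTorch index. `detect` stands in for the existing GPU probe and runs only when neither override is set.

```python
from typing import Callable, Mapping

# Assumption: the default base is PyTorch's public wheel index.
DEFAULT_BASE = "https://download.pytorch.org/whl"


def resolve_torch_index_url(environ: Mapping[str, str],
                            detect: Callable[[], str]) -> str:
    """Resolve the wheel index: full URL > family > GPU detection (sketch)."""
    url = environ.get("UNSLOTH_TORCH_INDEX_URL", "").strip()
    if url:
        return url  # used verbatim
    family = environ.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip().strip("/")
    if family:
        base = environ.get("UNSLOTH_PYTORCH_MIRROR", "").strip() or DEFAULT_BASE
        return f"{base.rstrip('/')}/{family}"
    return detect()  # no override: fall through to host probing


def fake_detect() -> str:
    # Stand-in for the GPU probe's cu126 fallback in the Docker case.
    return f"{DEFAULT_BASE}/cu126"


print(resolve_torch_index_url({"UNSLOTH_TORCH_INDEX_FAMILY": "cu128"}, fake_detect))
```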