
install: let UNSLOTH_TORCH_INDEX_FAMILY / _URL override CUDA wheel detection #6692

Open
danielhanchen wants to merge 14 commits into main from cuda-torch-index-override


@danielhanchen

Member

Summary

get_torch_index_url (and its studio update counterpart _detect_cuda_torch_index_url) chooses the PyTorch wheel family purely by probing the host GPU, with no way to override it. That is correct for a normal desktop curl | sh install, but it breaks any headless / container / CI install where the build host's GPU must not decide the wheel family.

Concretely, in a GPU-less Docker build the builder still sees the host's NVIDIA driver through the /proc/driver/nvidia/gpus/ procfs fallback, so _has_usable_nvidia_gpu returns true ("NVIDIA GPU detected"), but nvidia-smi cannot report a CUDA version, so the function hits its last-ditch branch:

[WARN] Could not determine CUDA version from nvidia-smi, defaulting to cu126

and installs a +cu126 torch even when the image is meant to be +cu128. There was no env var to force the family.

Fix

Add an explicit override, checked before any GPU probing, to both the shell installer and the Python studio update path:

  • UNSLOTH_TORCH_INDEX_URL -- a full index URL, used verbatim (highest precedence).
  • UNSLOTH_TORCH_INDEX_FAMILY -- the convenience form (cpu, cu124, cu126, cu128, cu130, rocm6.4, ...), appended to the mirror base so UNSLOTH_PYTORCH_MIRROR is still honoured.
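A minimal Python sketch of that precedence, for illustration only. It assumes the mirror base defaults to the public PyTorch index (https://download.pytorch.org/whl) when UNSLOTH_PYTORCH_MIRROR is unset, and it models GPU probing as a hypothetical detect() callback that returns a family; resolve_torch_index_url is not the installer's actual function.

```python
from collections.abc import Callable, Mapping

# Assumption: the public PyTorch wheel index is the default mirror base
# when UNSLOTH_PYTORCH_MIRROR is unset.
DEFAULT_BASE = "https://download.pytorch.org/whl"


def resolve_torch_index_url(env: Mapping[str, str], detect: Callable[[], str]) -> str:
    """Hypothetical resolver: explicit overrides first, GPU probing last.

    1. UNSLOTH_TORCH_INDEX_URL    -> used verbatim (trailing slashes stripped)
    2. UNSLOTH_TORCH_INDEX_FAMILY -> appended to the mirror base
    3. neither set                -> family returned by detect() (GPU probing)
    Empty or whitespace-only values count as unset and fall through.
    """
    url = env.get("UNSLOTH_TORCH_INDEX_URL", "").strip().rstrip("/")
    if url:
        return url

    base = (env.get("UNSLOTH_PYTORCH_MIRROR", "").strip() or DEFAULT_BASE).rstrip("/")
    family = env.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip().strip("/")
    if not family:
        family = detect()  # e.g. the "cu126" last-ditch default described above
    return f"{base}/{family}"


# The Docker case: a cu128 pin wins over a detector that would fall back to cu126.
print(resolve_torch_index_url({"UNSLOTH_TORCH_INDEX_FAMILY": "cu128"}, detect=lambda: "cu126"))
```

Because detection is only reached when both overrides are blank, a desktop install with no variables set behaves exactly as before.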

This is the same "tell the build, don't ask the hardware" approach the published GPU images take: vLLM and SGLang both pick the CUDA wheel index from an explicit build ARG (CUDA_VERSION -> cu1xx) rather than detecting it, and the Unsloth Docker base image already pins --extra-index-url .../cu128 directly. This change lets the Studio installer participate in that contract instead of guessing. Normal desktop installs are unaffected: with no override set, detection runs exactly as before.

Tests

tests/sh/test_get_torch_index_url.sh gains cases for: family override with no GPU, family override winning over a detected 12.6 (the exact Docker case), full-URL override winning over detection, family override appended to UNSLOTH_PYTORCH_MIRROR, trailing-slash stripping, URL-over-family precedence, and empty overrides falling through to detection. Full suite passes (44/44). The Python path was verified with the same matrix.

…tection

get_torch_index_url (and the studio-update mirror _detect_cuda_torch_index_url)
chose the torch wheel family solely by probing the host GPU, with no override.
In a headless / container / CI build the host driver is visible via the
/proc/driver/nvidia/gpus fallback but nvidia-smi cannot report a CUDA version,
so the function fell back to its cu126 default and installed the wrong wheels
(e.g. a cu128 image got cu126 torch).

Add an explicit override checked before any probing, in both the shell installer
and the Python studio-update path:
  - UNSLOTH_TORCH_INDEX_URL   full index URL, used verbatim (wins)
  - UNSLOTH_TORCH_INDEX_FAMILY family (cpu, cu128, rocm6.4, ...) appended to the
                               mirror base (UNSLOTH_PYTORCH_MIRROR still honoured)

This matches how the published GPU images select CUDA -- vLLM and SGLang take the
CUDA version from an explicit build ARG rather than detecting it, and the Unsloth
Docker base image already pins the cu128 index directly. Desktop installs are
unchanged: with no override set, detection runs exactly as before.

Adds test_get_torch_index_url.sh cases for the override (family, full URL,
precedence, mirror base, trailing-slash strip, empty-ignored).

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces support for overriding the PyTorch wheel index URL and family via the UNSLOTH_TORCH_INDEX_URL and UNSLOTH_TORCH_INDEX_FAMILY environment variables in both the shell installer (install.sh) and the Python updater (install_python_stack.py). This allows headless, container, or CI environments to bypass GPU probing and pin a specific wheel family. Unit tests have also been added to verify these overrides. Feedback was provided regarding an inconsistency in how trailing and leading slashes are stripped between the shell and Python implementations, with a suggestion to use a loop in the shell script to robustly strip multiple slashes.


Comment thread install.sh Outdated
Comment on lines +1996 to +2002
if [ -n "${UNSLOTH_TORCH_INDEX_URL:-}" ]; then
echo "${UNSLOTH_TORCH_INDEX_URL%/}"; return
fi
if [ -n "${UNSLOTH_TORCH_INDEX_FAMILY:-}" ]; then
_family="${UNSLOTH_TORCH_INDEX_FAMILY#/}"
echo "$_base/${_family%/}"; return
fi

Contributor


medium

The Python implementation in install_python_stack.py uses .rstrip('/') and .strip('/') to strip all leading and trailing slashes from the override variables. However, the shell implementation in install.sh currently only strips a single leading or trailing slash (using ${VAR%/} and ${VAR#/}).

To ensure consistency between the shell installer and the Python updater, and to prevent potential 404 errors on strict pip proxies (as noted in the comment on line 2499), we should strip all leading and trailing slashes in install.sh using a loop.

Suggested change
-if [ -n "${UNSLOTH_TORCH_INDEX_URL:-}" ]; then
-    echo "${UNSLOTH_TORCH_INDEX_URL%/}"; return
-fi
-if [ -n "${UNSLOTH_TORCH_INDEX_FAMILY:-}" ]; then
-    _family="${UNSLOTH_TORCH_INDEX_FAMILY#/}"
-    echo "$_base/${_family%/}"; return
-fi
+if [ -n "${UNSLOTH_TORCH_INDEX_URL:-}" ]; then
+    _url="${UNSLOTH_TORCH_INDEX_URL}"
+    while [ "${_url%/}" != "$_url" ]; do _url="${_url%/}"; done
+    echo "$_url"; return
+fi
+if [ -n "${UNSLOTH_TORCH_INDEX_FAMILY:-}" ]; then
+    _family="${UNSLOTH_TORCH_INDEX_FAMILY}"
+    while [ "${_family#/}" != "$_family" ]; do _family="${_family#/}"; done
+    while [ "${_family%/}" != "$_family" ]; do _family="${_family%/}"; done
+    echo "$_base/$_family"; return
+fi
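The difference between the two shell forms can be modelled in Python. In POSIX sh, ${VAR%/} and ${VAR#/} each remove at most one trailing or leading slash, so a value like //cu128// keeps a slash on each side, while the suggested loops keep stripping until nothing changes, which matches Python's str.strip('/'). The helper names below are hypothetical.

```python
def strip_one_trailing(s: str) -> str:
    """Model of POSIX ${VAR%/}: remove one trailing '/' if present."""
    return s[:-1] if s.endswith("/") else s


def strip_one_leading(s: str) -> str:
    """Model of POSIX ${VAR#/}: remove one leading '/' if present."""
    return s[1:] if s.startswith("/") else s


def strip_all(s: str) -> str:
    """Model of the suggested while-loops: repeat each expansion until stable."""
    while strip_one_leading(s) != s:
        s = strip_one_leading(s)
    while strip_one_trailing(s) != s:
        s = strip_one_trailing(s)
    return s


for raw in ["cu128", "/cu128/", "//cu128//"]:
    once = strip_one_trailing(strip_one_leading(raw))
    print(f"{raw!r}: single={once!r} loop={strip_all(raw)!r} python={raw.strip('/')!r}")
```

Only the doubled-slash input separates the single expansion from the loop, which is the case the reviewer flags as a source of 404s on strict pip proxies.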

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 138c0949b4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread install.sh Outdated
Comment on lines +1996 to +1999
if [ -n "${UNSLOTH_TORCH_INDEX_URL:-}" ]; then
echo "${UNSLOTH_TORCH_INDEX_URL%/}"; return
fi
if [ -n "${UNSLOTH_TORCH_INDEX_FAMILY:-}" ]; then


P2: Skip ROCm WSL bootstrap when torch index is pinned

When either override is set, this branch is still reached only after _maybe_bootstrap_rocm_wsl has already run (_maybe_bootstrap_rocm_wsl || true precedes TORCH_INDEX_URL=$(get_torch_index_url)). On a WSL Strix Halo host with no ROCm runtime, a headless build that pins CPU/CUDA via UNSLOTH_TORCH_INDEX_URL or UNSLOTH_TORCH_INDEX_FAMILY can still probe /dev/dxg and run the ROCm bootstrap with sudo/large downloads before the override is honored. Please short-circuit that bootstrap when an explicit index override is present, or resolve the override before calling it.


Comment thread install.sh Outdated
Comment on lines +2000 to +2001
_family="${UNSLOTH_TORCH_INDEX_FAMILY#/}"
echo "$_base/${_family%/}"; return


P2: Bypass Radeon rerouting for pinned ROCm indexes

When UNSLOTH_TORCH_INDEX_FAMILY is set to a ROCm tag on a Radeon host, this returns the pinned mirror URL, but the later _amd_gpu_radeon fresh-install path still probes the GPU and installs from repo.radeon.com instead of $TORCH_INDEX_URL (falling back only on failure). That means air-gapped/CI installs using UNSLOTH_PYTORCH_MIRROR=... UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 won't use the requested mirror or pinned family whenever ROCm detection succeeds; guard the Radeon/Strix reroutes when an explicit index override was supplied.


Comment on lines +947 to +949
_override_family = os.environ.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip()
if _override_family:
    return f"{_PYTORCH_WHL_BASE}/{_override_family.strip('/')}"


P2: Honor family overrides before ROCm repair

In standalone unsloth studio update, this override is only read inside the CUDA repair helper, while _TORCH_BACKEND remains empty and _ensure_rocm_torch() still runs afterward. On an AMD Linux host with CPU torch, setting UNSLOTH_TORCH_INDEX_FAMILY=cpu (or a CUDA family for a headless build) therefore still lets ROCm GPU detection reinstall ROCm torch from the detected ROCm index, ignoring the requested family; derive the backend from the override before both CUDA and ROCm repair paths.


Address review feedback on the override added in this PR so a pinned index is
honoured everywhere, not just in get_torch_index_url:

- Skip the WSL ROCm bootstrap (root privilege + large downloads, probes
  /dev/dxg) when UNSLOTH_TORCH_INDEX_URL / _FAMILY is set; it previously ran
  before the override was consulted.
- Skip the Radeon/Strix rerouting (which re-probes the GPU and overwrites the
  resolved URL with repo.radeon.com / repo.amd.com) when the index is pinned, so
  an explicit ROCm override (e.g. UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4) is kept.
- install_python_stack.py: derive _TORCH_BACKEND from the override when
  UNSLOTH_TORCH_BACKEND is unset (standalone studio update), so _ensure_rocm_torch
  / _ensure_cuda_torch repair to the requested family instead of re-detecting.
- Strip ALL leading/trailing slashes in the shell override to match the Python
  side (avoids 404s on strict pip proxies).

Adds test cases for double-slash and leading/trailing-slash overrides.
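Roughly, the ordering these fixes establish: compute the pinned flag once, before any probing, and let it gate both the WSL ROCm bootstrap and the Radeon/Strix reroute. This is a hypothetical Python model of the shell control flow (plan_install and the step names are invented for illustration), assuming "pinned" means either override holds a non-blank value.

```python
from collections.abc import Mapping

OVERRIDE_VARS = ("UNSLOTH_TORCH_INDEX_URL", "UNSLOTH_TORCH_INDEX_FAMILY")


def torch_index_pinned(env: Mapping[str, str]) -> bool:
    """True when either override holds a non-blank value (hypothetical helper)."""
    return any(env.get(var, "").strip() for var in OVERRIDE_VARS)


def plan_install(env: Mapping[str, str], wsl_amd_host: bool, radeon_host: bool) -> list[str]:
    """Order of the GPU-dependent steps; probing steps are skipped when pinned."""
    pinned = torch_index_pinned(env)
    steps = []
    if not pinned and wsl_amd_host:
        steps.append("bootstrap_rocm_wsl")  # needs sudo and large downloads
    steps.append("resolve_torch_index_url")
    if not pinned and radeon_host:
        steps.append("reroute_to_amd_repo")  # would overwrite the resolved URL
    steps.append("install_torch")
    return steps


print(plan_install({"UNSLOTH_TORCH_INDEX_FAMILY": "rocm6.4"}, wsl_amd_host=True, radeon_host=True))
```

The key design point is that the flag is computed up front, so neither probing step can run before the override has been consulted.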

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b02a609af5


Comment thread install.sh
# one like UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4) must not be rerouted to the
# Radeon/Strix repos by GPU probing.
_amd_gpu_radeon=false
if [ "$_torch_index_pinned" = false ]; then


P2: Set 2.11 constraint for pinned gfx indexes

When a caller pins an AMD per-arch index, for example UNSLOTH_TORCH_INDEX_URL=https://repo.amd.com/rocm/whl/gfx1151/ for a Strix/air-gapped install, this guard skips the Strix block that would have set TORCH_CONSTRAINT to torch>=2.11.0,<2.12.0. The fresh install then reaches the generic PyTorch install with the default <2.11.0 constraint, but these gfx* indexes are the 2.11 ROCm wheel path, so uv cannot select the intended torch build. Please derive the 2.11 constraint from a pinned gfx* leaf before bypassing GPU probing.


Comment on lines +1443 to +1444
if _idx_leaf.startswith(("rocm", "gfx")):
    _TORCH_BACKEND = "rocm"


P2: Honor pinned ROCm family in Studio repair

The new override handling classifies UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 or a full .../rocm6.4 URL only as _TORCH_BACKEND = "rocm"; it does not pass the requested family/index into _ensure_rocm_torch(), which still computes its install URL from the detected host ROCm version. In standalone unsloth studio update on an AMD Linux host with CPU torch, pinning rocm6.4 on a ROCm 7.2 machine will therefore reinstall from rocm7.2 instead of the requested family/mirror. Please make the ROCm repair path consume the explicit override, not just the backend classification.


@danielhanchen

Member Author

Addressed the review feedback in the latest commit so a pinned index is authoritative everywhere, not just inside get_torch_index_url:

  • Skip the WSL ROCm bootstrap when UNSLOTH_TORCH_INDEX_URL / UNSLOTH_TORCH_INDEX_FAMILY is set (it previously ran before the override was consulted).
  • Skip the Radeon/Strix rerouting when the index is pinned, so an explicit ROCm override (e.g. UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4) is not overwritten by GPU probing.
  • install_python_stack.py: derive _TORCH_BACKEND from the override when UNSLOTH_TORCH_BACKEND is unset (standalone studio update), so the ROCm/CUDA repair helpers honour the requested family instead of re-detecting.
  • Strip ALL leading/trailing slashes in the shell override to match the Python side's .rstrip("/") / .strip("/").

Added test cases for double-slash and leading/trailing-slash overrides; the get_torch_index_url suite is green.
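A sketch of deriving the backend from the override's final path segment, based on the startswith checks quoted in the review hunks (rocm/gfx to rocm, cu to cuda). The cpu branch, the None return for unpinned or unrecognised leaves, and the function names are assumptions rather than the code in install_python_stack.py. Classifying only the leaf keeps a mirror path such as .../gfx-cache from being mistaken for an AMD index.

```python
from collections.abc import Mapping
from typing import Optional


def index_leaf(url_or_family: str) -> str:
    """Final path segment, e.g. 'https://m/whl/rocm6.4/' -> 'rocm6.4'."""
    return url_or_family.strip().rstrip("/").rsplit("/", 1)[-1]


def backend_from_env(env: Mapping[str, str]) -> Optional[str]:
    explicit = env.get("UNSLOTH_TORCH_BACKEND", "").strip()
    if explicit:
        return explicit  # an explicitly set backend always wins
    pinned = (env.get("UNSLOTH_TORCH_INDEX_URL", "").strip()
              or env.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip())
    if not pinned:
        return None  # no pin: leave the backend to hardware detection
    leaf = index_leaf(pinned)
    if leaf.startswith(("rocm", "gfx")):
        return "rocm"
    if leaf.startswith("cu"):
        return "cuda"
    if leaf == "cpu":
        return "cpu"  # assumption: this branch is not shown in the review hunks
    return None
```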

@danielhanchen

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b02a609af5


Comment on lines +1443 to +1444
if _idx_leaf.startswith(("rocm", "gfx")):
    _TORCH_BACKEND = "rocm"


P2: Preserve the pinned ROCm index in the Python repair path

When unsloth studio update (or the post-install Python stack step) is run with UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 or a ROCm UNSLOTH_TORCH_INDEX_URL, this block only collapses the override to _TORCH_BACKEND = "rocm". _ensure_rocm_torch() still re-detects the host ROCm version and installs from f"{_PYTORCH_WHL_BASE}/{tag}" (and can still take the Strix per-gfx override), so a CPU/CUDA torch repair ignores the exact pinned family/URL that the shell path just promised to honor. This breaks the explicit override contract for standalone Studio updates and for installs where the Python repair step runs after a pinned ROCm selection.


Comment on lines +1447 to +1448
elif _idx_leaf.startswith("cu"):
    _TORCH_BACKEND = "cuda"


P2: Honor CUDA overrides without requiring visible hardware

For a standalone unsloth studio update in the headless/container case this change classifies UNSLOTH_TORCH_INDEX_FAMILY=cu128 as CUDA, but _ensure_cuda_torch() still returns before using _detect_cuda_torch_index_url() unless _has_usable_nvidia_gpu() succeeds. That means an existing ROCm-poisoned venv on a GPU-less builder is left as ROCm even though the caller explicitly pinned a CUDA wheel family, so the new Python-side override does not actually cover the no-visible-GPU scenario it was added for.


Comment thread install.sh
# pip proxies (artifactory, sonatype).
_url="${UNSLOTH_TORCH_INDEX_URL}"
while [ "${_url%/}" != "$_url" ]; do _url="${_url%/}"; done
echo "$_url"; return


P2: Allow gfx URL overrides to select torch 2.11 wheels

When UNSLOTH_TORCH_INDEX_URL points at an AMD per-gfx index such as https://repo.amd.com/rocm/whl/gfx1151, this return makes TORCH_INDEX_URL end in gfx1151 without taking the Strix reroute path that normally also raises TORCH_CONSTRAINT to torch>=2.11.0,<2.12.0. The only generic constraint adjustment below handles */rocm7.2, so the fresh install still asks that per-gfx index for torch>=2.4,<2.11.0 even though the existing Strix path documents that this index serves the torch 2.11+ ROCm build, causing the pinned full-URL install to fail to resolve.


Follow-up to the override work in this PR: the get_torch_index_url / install.sh
reroute already respect a pinned UNSLOTH_TORCH_INDEX_URL / _FAMILY, but the
Python repair helpers in install_python_stack.py still re-probed the GPU and
could overwrite the pinned family. Make the pin authoritative there too:

- _ensure_cuda_torch: an explicit cu* pin commits to CUDA wheels, so repair a
  ROCm-poisoned venv even when no NVIDIA GPU is visible here (headless /
  container / CI cross-install), instead of bailing on the GPU-presence gate.
- _ensure_rocm_torch: skip the AMD per-gfx (Strix) reroute when a ROCm index is
  pinned, and in the generic reinstall path install from the pinned URL verbatim
  rather than re-detecting the host ROCm version. gfx*/rocm7.2 indexes serve
  torch 2.11+, so select the 2.11 package specs for a gfx leaf.
- install.sh: raise the torch constraint to 2.11 for */gfx* indexes too, matching
  rocm7.2, so a pinned full-URL/family override that returns early keeps a valid
  constraint.

Add _explicit_torch_index_url / _explicit_rocm_torch_index_url helpers and tests
covering the no-GPU CUDA pin repair and the explicit gfx index honored verbatim.
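One plausible shape for the two helpers this commit adds, written as a hypothetical sketch. The real _explicit_torch_index_url / _explicit_rocm_torch_index_url bodies are not shown in this thread, so the environment handling and the default base (https://download.pytorch.org/whl) are assumptions. rocm_install_url illustrates the fix to the generic reinstall path: a ROCm pin is installed from verbatim instead of from the host-detected tag.

```python
from collections.abc import Mapping
from typing import Optional

DEFAULT_BASE = "https://download.pytorch.org/whl"  # assumed default mirror base


def mirror_base(env: Mapping[str, str]) -> str:
    return (env.get("UNSLOTH_PYTORCH_MIRROR", "").strip() or DEFAULT_BASE).rstrip("/")


def explicit_torch_index_url(env: Mapping[str, str]) -> Optional[str]:
    """Pinned index URL, or None when neither override is set."""
    url = env.get("UNSLOTH_TORCH_INDEX_URL", "").strip().rstrip("/")
    if url:
        return url
    family = env.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip().strip("/")
    if family:
        return f"{mirror_base(env)}/{family}"
    return None


def explicit_rocm_torch_index_url(env: Mapping[str, str]) -> Optional[str]:
    """The pinned URL, but only when it is a ROCm or AMD per-gfx index."""
    url = explicit_torch_index_url(env)
    if url is None:
        return None
    leaf = url.rsplit("/", 1)[-1]
    return url if leaf.startswith(("rocm", "gfx")) else None


def rocm_install_url(env: Mapping[str, str], detected_tag: str) -> str:
    """Index used by the ROCm reinstall path (hypothetical)."""
    pinned = explicit_rocm_torch_index_url(env)
    if pinned is not None:
        return pinned  # the pin is honoured verbatim
    return f"{mirror_base(env)}/{detected_tag}"  # previous behaviour: host ROCm tag
```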
@danielhanchen

Member Author

Thanks, all three addressed in 997155d.

  • _ensure_rocm_torch honoring the pin: the Strix/per-gfx reroute now skips when a ROCm index is pinned, and the generic reinstall branch installs from the pinned URL verbatim instead of re-detecting the host ROCm version. A gfx*/rocm7.2 leaf selects the torch 2.11 package specs.
  • _ensure_cuda_torch GPU gate: an explicit cu* pin commits to CUDA wheels, so a ROCm-poisoned venv is now repaired even when no NVIDIA GPU is visible here (headless / container / CI cross-install), rather than bailing on the presence check.
  • install.sh constraint: */gfx* now raises the torch constraint to >=2.11.0,<2.12.0 alongside */rocm7.2, so a pinned full-URL/family override that returns early keeps a valid constraint.

Added _explicit_torch_index_url / _explicit_rocm_torch_index_url helpers plus tests for the no-GPU CUDA pin repair and the explicit gfx index being honored verbatim. Full CUDA + ROCm install suites pass (350 passed, 1 skipped).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c0e2cbbc7


-if ver < (7, 2):
+# An explicit ROCm wheel-index pin is authoritative: never auto-reroute it to
+# the AMD per-gfx index (the caller already chose the family/URL).
+if ver < (7, 2) and _explicit_rocm_torch_index_url() is None:


P2: Honor pinned ROCm indexes when HIP torch exists

When a ROCm/GFX index is pinned, this guard suppresses the Strix auto-reroute, but the pinned index is only installed later inside the elif not has_hip_torch path. On a Strix ROCm 7.1 venv that already has HIP torch, running UNSLOTH_TORCH_INDEX_URL=https://repo.amd.com/rocm/whl/gfx1151 unsloth studio update now skips the Strix fix and never reinstalls from the pinned index, leaving the known-bad 7.1 wheel in place.


Comment on lines +1355 to +1357
_override_idx = _explicit_rocm_torch_index_url()
if _override_idx is not None:
    index_url = _override_idx


P2: Apply ROCm overrides before host GPU gates

The explicit ROCm index is only consulted here, after _ensure_rocm_torch() has already required a visible AMD GPU and a readable host ROCm version. In a headless/container cross-install, or on a builder with only NVIDIA visible, UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 sets _TORCH_BACKEND to rocm but the function returns before reaching this block, so the pinned ROCm wheel is never installed and the venv keeps its CPU/CUDA torch.


Comment thread install.sh Outdated
 # All other ROCm tags and CUDA stay within <2.11.0.
 case "$TORCH_INDEX_URL" in
-    */rocm7.2) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;
+    */rocm7.2|*/gfx*) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;


P2: Match gfx constraints on the index leaf

This pattern checks the whole URL, so a custom mirror whose base path contains a gfx* segment, e.g. UNSLOTH_PYTORCH_MIRROR=https://mirror.local/gfx-cache with a CUDA family, is treated like an AMD per-arch index and raises TORCH_CONSTRAINT to torch>=2.11.0,<2.12.0. The backend classification just above intentionally uses only the final path segment to avoid this mirror-path false positive; the constraint should do the same or CUDA/CPU installs from such mirrors can resolve the wrong torch family.
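A small Python model of the problem, using fnmatch.fnmatchcase to stand in for the shell case patterns (in both, * also matches across '/'). The two constraint strings come from the review comments; the function names and the mirror URL are illustrative.

```python
from fnmatch import fnmatchcase

DEFAULT_CONSTRAINT = "torch>=2.4,<2.11.0"
TORCH_211_CONSTRAINT = "torch>=2.11.0,<2.12.0"


def constraint_whole_url(index_url: str) -> str:
    """Mirrors the flagged case arm: the pattern is matched against the full URL."""
    if fnmatchcase(index_url, "*/rocm7.2") or fnmatchcase(index_url, "*/gfx*"):
        return TORCH_211_CONSTRAINT
    return DEFAULT_CONSTRAINT


def constraint_from_leaf(index_url: str) -> str:
    """Reviewer's suggestion: only the final path segment decides."""
    leaf = index_url.rstrip("/").rsplit("/", 1)[-1]
    if leaf == "rocm7.2" or leaf.startswith("gfx"):
        return TORCH_211_CONSTRAINT
    return DEFAULT_CONSTRAINT


mirror_cuda = "https://mirror.local/gfx-cache/cu128"
print(constraint_whole_url(mirror_cuda))   # false positive: the 2.11 range
print(constraint_from_leaf(mirror_cuda))   # the default range, as intended
```

Matching on the leaf also keeps the constraint consistent with the backend classification, which already looks only at the last segment.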


danielhanchen and others added 2 commits June 26, 2026 08:33
The pinned-index work landed for install.sh and install_python_stack.py, but the
Windows installers still picked the wheel index from GPU probing. Extend the same
UNSLOTH_TORCH_INDEX_URL / _FAMILY contract so a pinned index wins on every platform:

- install.ps1: Get-TorchIndexUrl returns the pinned URL/family before nvidia-smi
  probing; the AMD ROCm reroute is skipped when the index is pinned, so an explicit
  cpu/cu* pin on an AMD host is not overwritten.
- studio/setup.ps1: add shared Get-PinnedTorchIndexUrl / Get-TorchIndexLeaf helpers;
  the stale-venv check, the install selection and the AMD reroute all honor the pin,
  and the CPU/CUDA install pulls from the resolved index URL.
- tests: parity test that all four installers read both override vars and the two
  Windows installers gate the AMD reroute on the pinned flag.
@danielhanchen

Member Author

The 5-reviewer pass found the override was only wired into install.sh and the Python detector, so it was being ignored on the platforms that did not get the change. Completed the contract everywhere:

  • install.ps1 (Windows installer): Get-TorchIndexUrl now returns UNSLOTH_TORCH_INDEX_URL verbatim, or UNSLOTH_PYTORCH_MIRROR + UNSLOTH_TORCH_INDEX_FAMILY, before any nvidia-smi probing. The AMD ROCm reroute is skipped when the index is pinned, so an explicit cpu / cu* pin on an AMD host is not rewritten to a ROCm index.
  • studio/setup.ps1 (Windows Studio): added shared Get-PinnedTorchIndexUrl / Get-TorchIndexLeaf helpers; the stale-venv freshness check, the install selection (Get-PytorchCudaTag) and the AMD ROCm reroute all honor the pin, and the CPU/CUDA install pulls from the resolved index URL so a full pinned URL (custom mirror) is used as-is.
  • studio/install_python_stack.py ROCm/CUDA repair path: made authoritative in the earlier follow-up commit on this branch (_ensure_cuda_torch / _ensure_rocm_torch skip the host GPU/version probe and the Strix per-gfx reroute when pinned, and install from the pinned family/URL instead of re-detecting the host ROCm version).
  • Added a cross-platform parity test asserting all four installers read both override vars and that the two Windows installers gate the AMD reroute on the pinned-index flag, so this cannot silently drift back to per-OS coverage.
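The parity check can be approximated as a text scan. This is a hypothetical sketch, not the contents of tests/python/test_cross_platform_parity.py: it takes each installer's source as a string (reading the four files is left to the caller) and reports which override variable names never appear. It does not cover the second assertion, that the Windows installers gate the AMD reroute on the pinned flag.

```python
import re
from collections.abc import Mapping

OVERRIDE_VARS = ("UNSLOTH_TORCH_INDEX_URL", "UNSLOTH_TORCH_INDEX_FAMILY")


def missing_override_vars(sources: Mapping[str, str]) -> dict[str, list[str]]:
    """Map each installer name to the override variables its source never mentions."""
    missing: dict[str, list[str]] = {}
    for name, text in sources.items():
        # Word boundaries stop e.g. UNSLOTH_TORCH_INDEX_URL_X from counting.
        absent = [v for v in OVERRIDE_VARS if not re.search(rf"\b{re.escape(v)}\b", text)]
        if absent:
            missing[name] = absent
    return missing


# Made-up snippets standing in for real file contents.
snippets = {
    "install.sh": 'if [ -n "${UNSLOTH_TORCH_INDEX_URL:-}" ]; then ... ${UNSLOTH_TORCH_INDEX_FAMILY:-}',
    "install.ps1": "$env:UNSLOTH_TORCH_INDEX_URL",
}
print(missing_override_vars(snippets))
```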

The "deleted mainline" items the stale integration diff showed (Colab Cloudflare tunnel hardening, offline export/load helpers, etc.) are commits that landed on main after this branch forked; I merged current main in so the branch carries them and the PR diff is just the three intended install changes plus the parity test.

Tests: tests/sh/test_get_torch_index_url.sh 46/46; tests/python/test_cross_platform_parity.py, tests/studio/install/test_cuda_repair.py, tests/studio/install/test_rocm_support.py green; both .ps1 files parse and the test_torch_flavor.ps1 helper suite passes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a017ecfea


Comment thread install.ps1
 $ROCmIndexUrl = $null
 $ROCmTorchFloor = $null
-if (($HasROCm -or $ROCmGfxArch) -and $TorchIndexUrl -like "*/cpu" -and -not $SkipTorch) {
+if (-not $TorchIndexPinned -and ($HasROCm -or $ROCmGfxArch) -and $TorchIndexUrl -like "*/cpu" -and -not $SkipTorch) {


P2: Derive ROCm constraints for pinned gfx indexes

When a Windows install pins an AMD per-arch index, e.g. UNSLOTH_TORCH_INDEX_URL=https://repo.amd.com/rocm/whl/gfx1151, this new guard skips the only branch that sets $ROCmTorchFloor; the later generic install still runs with "torch>=2.4,<2.11.0" against $TorchIndexUrl. For the gfx115x/gfx120x indexes that the unpinned path already constrains to torch 2.11, this either resolves the known-bad older wheel or fails to select the intended wheel, so the pinned path should derive the same floor/companion constraints from the pinned leaf before bypassing reroute.


Comment thread studio/setup.ps1 Outdated
 if (-not $shouldRebuild) {
-    $expectedTorchTag = if ($HasNvidiaSmi) { Get-PytorchCudaTag } else { "cpu" }
+    $_pinnedIdx = Get-PinnedTorchIndexUrl
+    $expectedTorchTag = if ($_pinnedIdx) { Get-TorchIndexLeaf $_pinnedIdx } elseif ($HasNvidiaSmi) { Get-PytorchCudaTag } else { "cpu" }


P2: Treat pinned ROCm indexes as rocm in stale checks

With a pinned ROCm/GFX index, this compares the installed tag to the raw leaf such as gfx1151 or rocm6.4, but the probe above only ever records cuXXX or cpu and treats a +rocm... torch version as cpu. As a result, unsloth studio update on an existing ROCm Windows venv pinned to a gfx/rocm index is always marked stale (or exits in installer-managed mode) even when the installed ROCm torch is correct; normalize these leaves to the same rocm flavor and parse +rocm before comparing.


Comment thread studio/install_python_stack.py Outdated
# An explicit CUDA wheel-index pin (headless / container / CI cross-install)
# commits to CUDA wheels regardless of whether a GPU is visible here, so it
# overrides the GPU-presence gate.
if not _has_usable_nvidia_gpu() and _explicit_torch_index_url() is None:


P2: Reinstall CPU torch when CUDA is explicitly pinned

In standalone unsloth studio update with UNSLOTH_TORCH_INDEX_FAMILY=cu128 on a headless/no-visible-GPU builder, this bypasses the GPU gate, but the helper still returns unless the existing torch probe prints hip; an existing CPU torch is treated as deliberate and never reinstalled from the pinned CUDA index. Since the update path preserves torch rather than preinstalling it from install.sh, the explicit CUDA override remains ineffective for the common CPU-venv-to-CUDA cross-install case.


Follow-ups to the override work flagged in review:

- install.ps1: a pinned gfx*/rocm>=7.2 index previously skipped the AMD reroute
  that sets the torch>=2.11 floor, so the generic install used torch>=2.4,<2.11
  and could resolve the known-bad _grouped_mm wheel. Route a pinned ROCm index
  through the ROCm install path with the 2.11 floor + companions, and guard the
  companion-spec lookup so a skipped reroute block cannot null-deref.
- studio/setup.ps1: the stale-venv check compared the installed flavor (cuXXX/cpu,
  with +rocm misread as cpu) against the raw pinned leaf (gfx1151 / rocm6.4), so a
  correct pinned ROCm venv was always marked stale. Classify +rocm wheels as the
  generic 'rocm' flavor and normalize a pinned rocm*/gfx* leaf to 'rocm' before
  comparing (cu* stays specific so cu126-vs-cu128 still rebuilds).
- install_python_stack.py: _ensure_cuda_torch now also reinstalls from a pinned
  CUDA index when the venv carries a CPU wheel (headless CPU-venv-to-CUDA
  cross-install via 'studio update'), not only when it finds a ROCm build.
- tests: parity assertions already cover all four installers honoring the override.
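The resulting CUDA repair rule, as a hypothetical decision table in Python. It simplifies the real helper in two stated ways: it treats only a cu* pin as lifting the GPU-presence gate (assuming a ROCm pin is handled by the ROCm helper instead), and it does not model CUDA version mismatches. The "hip"/"cpu"/"cuda" labels stand in for whatever the torch probe actually reports.

```python
from typing import Optional


def should_reinstall_cuda_torch(installed: str, nvidia_gpu_visible: bool,
                                pinned_leaf: Optional[str]) -> bool:
    """Hypothetical decision for the CUDA repair step.

    installed: "hip" (ROCm build), "cpu" or "cuda", as the venv's torch reports.
    pinned_leaf: final segment of an explicit index override, or None if unpinned.
    """
    cuda_pinned = pinned_leaf is not None and pinned_leaf.startswith("cu")
    # Without a CUDA pin, keep the original GPU-presence gate.
    if not nvidia_gpu_visible and not cuda_pinned:
        return False
    if installed == "hip":
        return True  # ROCm-poisoned venv: repair toward CUDA
    if installed == "cpu":
        return cuda_pinned  # a CPU wheel is deliberate unless CUDA was pinned
    return False  # already CUDA; version mismatches are not modelled here
```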
@danielhanchen

Member Author

Thanks, all three were real gaps in the pinned-index handling. Fixed:

  • install.ps1: a pinned gfx*/rocm>=7.2 index was skipping the AMD reroute block, which is the only place that set the torch>=2.11.0,<2.12.0 floor and the matching torchvision/torchaudio companions, so the generic install fell back to torch>=2.4,<2.11.0 and could resolve the known-bad _grouped_mm wheel. A pinned ROCm index is now routed through the ROCm install path with the same 2.11 floor and companions derived from the pinned leaf (gfx* and rocm>=7.2 serve torch 2.11+). The companion-spec lookups are also guarded so a skipped reroute block cannot null-deref the per-arch maps.
  • studio/setup.ps1: the stale-venv check compared the installed flavor (which only recorded cuXXX/cpu and misread a +rocm wheel as cpu) against the raw pinned leaf (gfx1151 / rocm6.4), so a correct pinned ROCm venv was always marked stale. The probe now classifies any +rocm wheel as the generic rocm flavor, and a pinned rocm*/gfx* leaf is normalized to rocm before comparing. cu* leaves stay specific so a cu126-vs-cu128 mismatch still rebuilds.
  • studio/install_python_stack.py: _ensure_cuda_torch now also reinstalls from the pinned CUDA index when the venv carries a CPU wheel, not only a ROCm build. This is the headless CPU-venv-to-CUDA cross-install (studio update with UNSLOTH_TORCH_INDEX_FAMILY=cu128); the update path preserves torch rather than preinstalling it from install.sh, so the explicit CUDA pin was previously a no-op. A CPU wheel with no CUDA pin is still treated as deliberate.
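The CUDA repair rule in the last bullet can be sketched in Python. This sketch assumes the torch probe has already reduced the installed build to one of the hypothetical flavor strings "cuda", "hip", or "cpu", and that an explicit pin is passed in as its index leaf (for example "cu128"). `needs_cuda_reinstall` is an illustrative name, not the actual helper in install_python_stack.py.

```python
from typing import Optional

def needs_cuda_reinstall(installed_flavor: str, pinned_leaf: Optional[str]) -> bool:
    """Should the CUDA repair step reinstall torch? (hypothetical helper)

    installed_flavor: "cuda", "hip" (ROCm build) or "cpu", as reported by a torch probe.
    pinned_leaf: leaf of an explicit UNSLOTH_TORCH_INDEX_* pin (e.g. "cu128"), or None.
    """
    cuda_pinned = pinned_leaf is not None and pinned_leaf.lower().startswith("cu")
    if installed_flavor == "hip":
        return True  # a ROCm build is always wrong on the CUDA repair path
    if installed_flavor == "cpu":
        # Only an explicit CUDA pin overrides a CPU wheel (headless CPU-venv-to-CUDA
        # cross-install); without one the CPU wheel is treated as deliberate.
        return cuda_pinned
    return False  # healthy CUDA build: leave as-is
```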

Validation: tests/sh/test_get_torch_index_url.sh 46/46; test_cross_platform_parity.py (incl. the override-parity assertions for all four installers), test_cuda_repair.py, test_rocm_support.py, test_gpu_detection_followups.py green; both .ps1 parse and test_torch_flavor.ps1 passes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2d5f90175

Comment thread studio/setup.ps1
# Gating on $HasROCm alone left Strix Halo / Radeon 8060S on CPU torch; a failed
# ROCm install still falls back to CPU below, so this is safe.
if (($HasROCm -or $ROCmGfxArch) -and $CuTag -eq "cpu") {
if (-not $TorchIndexPinned -and ($HasROCm -or $ROCmGfxArch) -and $CuTag -eq "cpu") {

P2: Route pinned ROCm leaves through the ROCm install path

When UNSLOTH_TORCH_INDEX_URL or _FAMILY points at a ROCm/GFX index during standalone studio/setup.ps1 updates, this guard skips the only branch that initializes $ROCmIndexUrl, $ROCmTorchSpec, and the pinned torchvision/torchaudio ranges. The script then falls through with $CuTag set to the pinned leaf and installs via the generic CUDA branch using bare torch torchvision torchaudio from that ROCm index, so pinned Strix/RDNA4 indexes can resolve the known-bad <2.11 wheels or ABI-mismatched companions instead of the constrained ROCm triplet used by the unpinned AMD path.

Comment thread studio/setup.ps1 Outdated
# Normalize a pinned rocm*/gfx* leaf to the generic "rocm" flavor so it
# compares against the installed +rocm wheel (also "rocm"); cu*/cpu
# leaves stay specific so a cu126-vs-cu128 mismatch still rebuilds.
$expectedTorchTag = if ($_pinLeaf -like 'gfx*' -or $_pinLeaf -like 'rocm*') { "rocm" } else { $_pinLeaf }

P2: Skip stale checks for unknown pinned URL leaves

When UNSLOTH_TORCH_INDEX_URL is a full custom index whose final path segment is not itself a flavor, e.g. a PEP 503 mirror ending in /simple, this assigns simple as the required torch tag. A fresh Windows install can successfully install cu128 from that URL in install.ps1, then the installer-managed setup.ps1 pass compares cu128 != simple, marks the venv stale, and exits with a repair error; direct updates similarly rebuild a correct environment. Unknown leaves should not be treated as torch flavor tags.
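The suggested handling of unknown leaves can be sketched in Python. The function maps a pinned URL to the tag the stale-venv check should expect, and it returns `None` ("trust the pin, do not compare") for any leaf that is not a torch flavor. The name `expected_torch_tag` and the mirror URLs in the test are hypothetical. The normalization of rocm*/gfx* leaves to "rocm" follows the setup.ps1 snippet above.

```python
import re
from typing import Optional

def expected_torch_tag(pin_url: str) -> Optional[str]:
    """Flavor tag the stale-venv check should require for a pinned index (sketch).

    Returns None for leaves that are not a recognized torch flavor, such as a
    PEP 503 mirror ending in /simple, so the venv is not compared at all.
    """
    leaf = pin_url.rstrip("/").rsplit("/", 1)[-1].lower()
    if leaf.startswith(("gfx", "rocm")):
        return "rocm"  # per-arch / ROCm leaves compare as the generic flavor
    if leaf == "cpu" or re.fullmatch(r"cu\d+", leaf):
        return leaf    # stay specific so cu126 vs cu128 still triggers a rebuild
    return None
```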

Comment thread studio/install_python_stack.py Outdated
Comment on lines 1092 to 1093
else:
return # healthy CUDA torch, or a deliberate CPU wheel -- leave as-is

P2: Reinstall wrong CUDA family when pinned

With standalone unsloth studio update on Linux/headless builders, UNSLOTH_TORCH_INDEX_FAMILY=cu128 now bypasses the visible-GPU gate, but the probe collapses every installed CUDA wheel to the generic marker cuda. If the venv already has torch+cu126, this else returns without reinstalling from the pinned cu128 index, so the explicit override still leaves the wrong CUDA family in place; the repair should compare the installed +cuXXX tag when a CUDA family is pinned.

Comment thread install.ps1
$_pinRocm211 = ([int]$Matches[1] -gt 7) -or ([int]$Matches[1] -eq 7 -and [int]$Matches[2] -ge 2)
}
if ($_pinLeaf -like 'gfx*' -or $_pinRocm211) {
$ROCmIndexUrl = $TorchIndexUrl

P2: Fall back to CPU from the CPU index after pinned ROCm fails

For a pinned ROCm/GFX index, this sets $ROCmIndexUrl to the same URL stored in $TorchIndexUrl; if the ROCm install later fails, the intended “CPU fallback” path still runs uv pip install ... --index-url $TorchIndexUrl, so it retries the failing ROCm mirror instead of using the CPU wheel index. This makes transient or unreachable pinned ROCm mirrors abort the Windows installer rather than leaving a CPU base for Studio to repair later, unlike the unpinned AMD path where $TorchIndexUrl remains */cpu.
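The fix this comment asks for can be sketched in Python: when a ROCm index is active, the fallback forces the CPU wheel index and does not reuse `$TorchIndexUrl`. The parameter names are hypothetical, and `whl_base` is assumed to be the PyTorch wheel base without a family leaf.

```python
def cpu_fallback_index(torch_index_url: str, rocm_index_active: bool, whl_base: str) -> str:
    """Index to use for the CPU fallback after a failed ROCm install (hypothetical)."""
    if rocm_index_active:
        # With a pinned ROCm/GFX index, torch_index_url *is* the ROCm mirror, so
        # reusing it would just retry the failing index. Force the CPU wheel index.
        return whl_base.rstrip("/") + "/cpu"
    return torch_index_url
```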

Follow-ups to the previous round:

- studio/setup.ps1: a pinned gfx*/rocm>=7.2 index now routes through the ROCm
  install path with the 2.11 floor + companions (it previously fell through to the
  CUDA branch with bare torch/torchvision/torchaudio against the ROCm index). The
  CPU/CUDA fallback index is forced to the CPU wheel index when a ROCm index is
  active, so a failed pinned-ROCm install does not retry the ROCm mirror.
- studio/setup.ps1: the stale-venv check no longer treats an unrecognized pinned
  URL leaf (e.g. a PEP 503 mirror ending in /simple) as a torch flavor tag, which
  was marking a correct venv stale; cu*/cpu/rocm/gfx leaves are still compared.
- install.ps1: the post-failure CPU fallback uses an explicit CPU index instead of
  , which for a pinned ROCm index was the ROCm mirror itself (so the
  'fallback' just retried the failing index and aborted the installer).
- install_python_stack.py: _ensure_cuda_torch now also reinstalls when the venv's
  CUDA family differs from a pinned one (installed cu126 vs pinned cu128), not only
  CPU->CUDA; the probe reports the installed cuXXX tag for the comparison.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a5baba8ce

Comment thread studio/setup.ps1
Comment on lines +2928 to +2932
Fast-Install torch torchvision torchaudio --index-url $TorchInstallIndexUrl
$torchInstallExit = $LASTEXITCODE
$output = ""
} else {
$output = Fast-Install torch torchvision torchaudio --index-url "$PyTorchWhlBase/$CuTag" | Out-String
$output = Fast-Install torch torchvision torchaudio --index-url $TorchInstallIndexUrl | Out-String

P2: Force the pinned ROCm CPU fallback to reinstall

When a pinned rocm*/gfx* index fails, $TorchInstallIndexUrl has already been set to the CPU index, but $CuTag remains the pinned leaf, so execution skips the CPU fallback branch with @cpuForce and lands here instead. These CUDA-branch calls lack --force-reinstall, so uv/pip can keep a partially installed ROCm torch that still satisfies bare torch even though the script warned it was falling back to CPU; reset $CuTag/route through the CPU branch or include the force reinstall for this fallback path.

Comment thread install.sh Outdated
# All other ROCm tags and CUDA stay within <2.11.0.
case "$TORCH_INDEX_URL" in
*/rocm7.2) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;
*/rocm7.2|*/gfx*) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;

P2: Pin ROCm companion wheels for pinned gfx installs

With a full UNSLOTH_TORCH_INDEX_URL=.../gfx1151/gfx120X override, this new */gfx* case only raises the torch range; the later shell install still passes bare torchvision torchaudio from the same AMD per-arch index. The PowerShell/Python paths in this change pin those companions because AMD's per-gfx indexes publish them independently, so the shell path can resolve a torchvision/torchaudio build for a different torch minor even while torch is constrained to <2.12, leaving the venv ABI-mismatched.

@danielhanchen

Member Author

All four were real gaps in the pinned-index handling. Fixed in 4a5baba8c:

  • studio/setup.ps1 (pinned ROCm routing): a pinned gfx*/rocm>=7.2 index now routes through the ROCm install path with the >=2.11 torch floor and the constrained torchvision/torchaudio companions, instead of falling through to the generic CUDA branch with bare torch torchvision torchaudio resolved off the ROCm index. That bare-triplet path could pull the known-bad <2.11 wheels or ABI-mismatched companions.
  • studio/setup.ps1 (unknown leaf stale check): the stale-venv comparison no longer treats an unrecognized pinned URL leaf as a torch flavor tag. A full custom index ending in e.g. /simple was setting the required tag to simple, so a freshly-installed cu128 venv compared cu128 != simple and got marked stale and rebuilt. Only cu*/cpu/rocm/gfx leaves are compared now; an unknown leaf trusts the pin.
  • install.ps1 (CPU fallback index): for a pinned ROCm/GFX index, the post-failure CPU fallback now installs from an explicit CPU wheel index rather than $TorchIndexUrl, which for a pinned ROCm index was the ROCm mirror itself. Previously the "CPU fallback" just retried the failing ROCm mirror and aborted the installer, instead of leaving a CPU base for Studio to repair later (matching the unpinned AMD path).
  • studio/install_python_stack.py (wrong CUDA family): _ensure_cuda_torch now reinstalls from the pinned index when the venv's CUDA family differs from the pinned one (installed cu126 vs pinned cu128), not only on CPU to CUDA. The probe reports the installed +cuXXX tag so the comparison is exact rather than collapsing every CUDA wheel to a generic cuda marker.
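The exact family comparison in the last bullet can be sketched in Python by reading the PEP 440 local-version label of `torch.__version__` (for example "cu126" in "2.10.0+cu126"). The function names are hypothetical. One assumption is made here: a version without a local label, such as a plain PyPI wheel, is conservatively treated as a mismatch.

```python
from typing import Optional

def local_tag(torch_version: str) -> Optional[str]:
    """Local-version label of a torch version string, e.g. "cu126"; None if absent."""
    _, sep, local = torch_version.partition("+")
    return local.lower() if sep else None

def cuda_family_mismatch(torch_version: str, pinned_leaf: str) -> bool:
    """True when a cu* pin differs from the installed build (hypothetical helper)."""
    if not pinned_leaf.lower().startswith("cu"):
        return False  # only cu* pins are compared on this path
    # cu126 vs cu128 and cpu vs cu128 both count as mismatches.
    return local_tag(torch_version) != pinned_leaf.lower()
```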

Validation: both .ps1 scripts parse, test_torch_flavor.ps1 passes, and the Python install suites stay green.

…r window

The pinned-ROCm CPU fallback computes an explicit CPU index, but the comment
explaining why it cannot reuse $TorchIndexUrl pushed the actual
Invoke-InstallCommandRetry / --force-reinstall call more than 600 chars past the
"ROCm PyTorch install failed" message, so test_pr5940_followups's window check
no longer saw the retry helper. Move the CPU-index computation and its comment
above the failure substep so the retrying force-reinstall stays adjacent to the
message. No behavior change: same explicit CPU index, same retry, same
--force-reinstall.
@LeoBorcherding

Collaborator

Tested on Windows 11 with an RTX 5060 Ti (driver 595.97, CUDA 13.2), Python 3.13.13. Checked the default detection plus both overrides, including a full reinstall to confirm the override actually changes what gets installed.

Default (no override) detects the right family for this card. Fresh local install picked cu130:

gpu            NVIDIA GPU detected
               installing PyTorch (https://download.pytorch.org/whl/cu130)...
 + torch==2.10.0+cu130

With UNSLOTH_TORCH_INDEX_FAMILY=cu128, a clean reinstall on the same host pulled the cu128 wheels instead, no GPU re-probe:

torch        2.10.0+cu128
torchvision  0.25.0+cu128
torchaudio   2.11.0+cu128

URL resolution checked against the actual Get-TorchIndexUrl from install.ps1:

No override (auto-detect):           https://download.pytorch.org/whl/cu130
UNSLOTH_TORCH_INDEX_FAMILY=cu128:    https://download.pytorch.org/whl/cu128
UNSLOTH_TORCH_INDEX_URL=.../cu124:   https://download.pytorch.org/whl/cu124   (beats _FAMILY)

So _FAMILY appends the leaf to the mirror base and skips detection, and _URL takes precedence over _FAMILY and is used as-is, which matches the description.
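The precedence confirmed above can be restated as a short Python sketch. It assumes `UNSLOTH_PYTORCH_MIRROR` holds a base URL without a family leaf, and it treats whitespace-only values as unset, as the later install.sh trimming fix does. `resolve_torch_index_url` and its `detect_family` callback are hypothetical names, not functions from the installers.

```python
from typing import Callable, Mapping

DEFAULT_WHL_BASE = "https://download.pytorch.org/whl"

def resolve_torch_index_url(env: Mapping[str, str], detect_family: Callable[[], str]) -> str:
    """Resolve the PyTorch wheel index: _URL > _FAMILY > GPU detection (sketch)."""
    # 1. A full index URL wins and is used verbatim (whitespace-only counts as unset).
    url = env.get("UNSLOTH_TORCH_INDEX_URL", "").strip()
    if url:
        return url
    # Assumption: the mirror variable is a base URL without a family leaf.
    base = (env.get("UNSLOTH_PYTORCH_MIRROR", "").strip() or DEFAULT_WHL_BASE).rstrip("/")
    # 2. A family leaf is appended to the (possibly mirrored) base.
    family = env.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip()
    if family:
        return f"{base}/{family}"
    # 3. No override: probe the host as before.
    return f"{base}/{detect_family()}"
```

Passing the environment in as a mapping, for example `resolve_torch_index_url(os.environ, probe)`, keeps the precedence rules testable without touching the real environment.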

Studio also came up cleanly on the branch earlier (UI loaded, GGUF chat worked) on the default cu130 build:

Hardware detected: CUDA -- NVIDIA GeForce RTX 5060 Ti
INFO:     Unsloth Studio running on http://127.0.0.1:8889

Default desktop installs behave exactly as before with nothing set, and the override does what it says for headless/CI/Docker builds. Looks good.

@danielhanchen

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6c5e46dad

Comment thread studio/setup.ps1
Comment on lines +2543 to +2547
} elseif ($torchVer -match '\+rocm') {
# Any +rocm / gfx wheel -> generic "rocm" flavor. The exact ROCm
# version is repaired later by install_python_stack.py; here we
# only need the flavor so a correct ROCm venv is not marked stale.
$installedTorchTag = "rocm"

P1: Treat unpinned ROCm venvs as expected ROCm

On a normal Windows AMD/ROCm install with no UNSLOTH_TORCH_INDEX_* pin and no NVIDIA GPU, this new branch records an existing +rocm torch as rocm, but the expected tag below still falls back to cpu. That makes a healthy ROCm venv look stale (torch rocm != required cpu); for installs launched from install.ps1 the managed setup path exits before finishing, and direct studio update rebuilds unnecessarily. Derive the expected tag from $HasROCm/$ROCmGfxArch here or avoid turning unpinned ROCm into a CPU mismatch.

Comment thread studio/setup.ps1
Comment on lines +2928 to +2932
Fast-Install torch torchvision torchaudio --index-url $TorchInstallIndexUrl
$torchInstallExit = $LASTEXITCODE
$output = ""
} else {
$output = Fast-Install torch torchvision torchaudio --index-url "$PyTorchWhlBase/$CuTag" | Out-String
$output = Fast-Install torch torchvision torchaudio --index-url $TorchInstallIndexUrl | Out-String

P2: Force CPU fallback after pinned ROCm failures

When a pinned ROCm index such as UNSLOTH_TORCH_INDEX_FAMILY=rocm7.2 or gfx1151 fails to install, $CuTag remains the ROCm leaf, so execution skips the CPU branch with @cpuForce and reaches this CUDA branch instead. Since $TorchInstallIndexUrl was precomputed as the CPU URL while $ROCmIndexUrl was set, this installs from the CPU index without --force-reinstall, allowing uv to keep a partially installed ROCm torch that already satisfies torch, which is the exact case the CPU fallback was guarding against.

Comment on lines +1376 to +1379
_override_idx = _explicit_rocm_torch_index_url()
if _override_idx is not None:
index_url = _override_idx
tag = index_url.rstrip("/").rsplit("/", 1)[-1].lower()

P2: Bypass GPU gates for explicit ROCm pins

In standalone unsloth studio update with UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 or a full ROCm URL, this override is only consulted after _ensure_rocm_torch() has already returned for a visible NVIDIA GPU, no visible AMD GPU, or an unreadable ROCm version. That leaves an existing CPU/CUDA torch in place in headless CI/container or mixed-GPU cross-install scenarios, even though the new override is meant to skip host GPU probing like the CUDA pin path does.

Comment on lines +1507 to +1508
elif _idx_leaf == "cpu":
_TORCH_BACKEND = "cpu"

P2: Add a CPU repair path for explicit CPU pins

When standalone update runs with UNSLOTH_TORCH_INDEX_FAMILY=cpu or a URL ending in /cpu, this sets _TORCH_BACKEND to cpu; both repair helpers then treat that as a skip signal (_ensure_cuda_torch only runs for ""/cuda, and _ensure_rocm_torch returns for cuda/cpu). There is no CPU reinstall path, so an existing CUDA/ROCm torch remains in place even though the explicit CPU pin is supposed to be authoritative.
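The dispatch this comment implies can be sketched in Python. An explicit pin decides which repair (if any) should run, so a CPU pin is no longer read as "skip". The function name and the flavor strings "cuda", "rocm", and "cpu" are hypothetical, and the sketch shows only the pinned case, not the GPU-gated behavior used without a pin.

```python
from typing import Optional

def repair_to_run(pinned_leaf: Optional[str], installed_flavor: str) -> Optional[str]:
    """Which repair an explicit pin should trigger, or None (hypothetical helper)."""
    if pinned_leaf is None:
        return None  # no explicit pin: keep the existing GPU-gated behavior
    leaf = pinned_leaf.lower()
    if leaf == "cpu":
        # An explicit CPU pin is authoritative: replace a CUDA/ROCm build.
        return "cpu" if installed_flavor != "cpu" else None
    if leaf.startswith("cu"):
        return "cuda" if installed_flavor != "cuda" else None
    if leaf.startswith(("rocm", "gfx")):
        return "rocm" if installed_flavor != "rocm" else None
    return None  # unrecognized leaf (e.g. /simple): leave the venv alone
```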

Comment thread install.sh Outdated
# All other ROCm tags and CUDA stay within <2.11.0.
case "$TORCH_INDEX_URL" in
*/rocm7.2) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;
*/rocm7.2|*/gfx*) TORCH_CONSTRAINT="torch>=2.11.0,<2.12.0" ;;

P2: Pin companion wheels for gfx overrides

For a newly supported pinned gfx override such as UNSLOTH_TORCH_INDEX_URL=.../gfx1151, this only raises the torch constraint; the install/repair commands still pass bare torchvision torchaudio to uv with --index-url "$TORCH_INDEX_URL". I checked uv pip install --help: --index-url selects the package index and extra indexes require --extra-index-url, so the AMD gfx index can resolve newer independently published companions than the torch 2.11 wheel (the PS/Python paths pin these for the same ABI-mismatch reason).

Comment thread studio/setup.ps1 Outdated
Comment on lines +2574 to +2575
if ($_pinLeaf -like 'gfx*' -or $_pinLeaf -like 'rocm*') {
$expectedTorchTag = "rocm"

P2: Preserve exact ROCm pins during update

When a user changes an explicit ROCm pin from one ROCm family to another, for example from an existing +rocm6.4 venv to UNSLOTH_TORCH_INDEX_URL=.../gfx1151, this collapses both sides to the generic rocm flavor. If the installed package version is already current, the later dependency fast-path skips the PyTorch install entirely because the venv is not CPU-only, so the requested pinned index is never applied.

danielhanchen and others added 2 commits June 27, 2026 09:11
setup.ps1:
- Stale-venv check: treat an AMD/ROCm host (HasROCm or a resolved gfx arch) with
  no explicit pin as expecting "rocm", not "cpu", so a healthy +rocm venv is not
  flagged stale (which made installer-managed setup exit and direct update rebuild).
- Pinned-ROCm install failure now routes into the force-reinstall CPU branch:
  CuTag stays the rocm/gfx leaf on failure, so the condition also checks
  ROCmCpuFallback; otherwise the CUDA branch installed from the CPU index without
  --force-reinstall and kept the partial ROCm torch.
- Explicit ROCm pin compare no longer collapses gfx*/rocm* to a generic "rocm":
  it compares the +rocmX.Y version (and the torch 2.11 line for gfx pins) so
  changing the pinned family (e.g. rocm6.4 -> gfx1151) rebuilds and applies it.

install_python_stack.py:
- _ensure_rocm_torch: an explicit ROCm wheel-index pin now bypasses the
  NVIDIA-present / no-AMD-GPU / unreadable-ROCm gates (headless/container/CI
  cross-install), mirroring the explicit-CUDA-pin bypass in _ensure_cuda_torch.
- Add _ensure_cpu_torch: an explicit CPU pin (FAMILY=cpu or /cpu URL) now has a
  repair path that reinstalls CPU torch over an existing CUDA/ROCm build on a
  standalone update (which skips install.sh's flavor enforcement).

install.sh:
- Pin torchvision/torchaudio companions alongside torch for the rocm7.2 / per-gfx
  index and the Strix reroute (those AMD indexes publish companions independently
  and a bare name can resolve a torch-2.12-built wheel, an ABI mismatch).
@danielhanchen

Member Author

Thanks for the review. Addressed all six in c290cac.

  1. Unpinned ROCm expected tag (setup.ps1). The stale-venv check now derives the expected tag from $HasROCm / $script:ROCmGfxArch, so an unpinned AMD host expects rocm instead of cpu. A healthy +rocm venv is no longer flagged stale, so the installer-managed path no longer exits and a direct studio update no longer rebuilds.

  2. CPU fallback after a pinned ROCm failure (setup.ps1). The CPU branch now fires on $ROCmCpuFallback as well as $CuTag -eq "cpu". After a failed pinned ROCm install $CuTag is still the rocm/gfx leaf, so without this it fell through to the CUDA branch and installed from the CPU index without --force-reinstall, keeping the partial ROCm torch. It now reinstalls CPU torch with --force-reinstall.

  3. Bypass GPU gates for explicit ROCm pins (install_python_stack.py). An explicit ROCm pin (rocm*/gfx* leaf) now bypasses the NVIDIA-present / no-AMD-GPU / unreadable-ROCm-version gates in _ensure_rocm_torch, mirroring the explicit-CUDA-pin bypass in _ensure_cuda_torch. A headless / container / CI cross-install now applies the pinned index.

  4. CPU repair path for explicit CPU pins (install_python_stack.py). Added _ensure_cpu_torch, which force-reinstalls CPU torch over an existing CUDA/ROCm build when UNSLOTH_TORCH_INDEX_FAMILY=cpu or a /cpu URL is pinned. It runs after the CUDA/ROCm repairs at both call sites, and only acts on an explicit pin (an auto-detected CPU backend already installed CPU wheels via install.sh).

  5. Pin companion wheels for gfx overrides (install.sh). torchvision/torchaudio are now pinned alongside torch (>=0.26,<0.27 / >=2.11,<2.12) for the rocm7.2 and per-gfx index and the Strix reroute, through new TORCHVISION_CONSTRAINT / TORCHAUDIO_CONSTRAINT variables. They stay bare for cu*/cpu and the curated pytorch.org rocm indexes, so those paths are byte-for-byte unchanged. The bare names could otherwise resolve a torch-2.12-built companion from the AMD index.

  6. Preserve exact ROCm pins during update (setup.ps1). The explicit-ROCm-pin compare no longer collapses gfx*/rocm* to a generic rocm. It compares the +rocmX.Y version when both sides are readable, and the torch 2.11 line for gfx pins, so changing the pinned family (for example rocm6.4 to gfx1151) rebuilds and applies the new index. One known limitation: switching between two gfx arches that both serve torch 2.11 (for example gfx1151 to gfx1200) is not distinguished by this venv-level check.
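The companion pinning from item 5 and the "serves torch 2.11+" rule (gfx* or rocm>=7.2) can be combined in one Python sketch. The version bounds are the ones quoted above. The function names and the `default_torch_spec` parameter are hypothetical, and the sketch does not reproduce the install.sh TORCHVISION_CONSTRAINT / TORCHAUDIO_CONSTRAINT logic verbatim.

```python
import re
from typing import List

def needs_torch_211(leaf: str) -> bool:
    """Per the PR discussion, gfx* per-arch indexes and rocm>=7.2 serve torch 2.11+."""
    leaf = leaf.lower()
    if leaf.startswith("gfx"):
        return True
    m = re.match(r"rocm(\d+)\.(\d+)", leaf)
    return m is not None and (int(m.group(1)), int(m.group(2))) >= (7, 2)

def torch_install_specs(index_url: str, default_torch_spec: str) -> List[str]:
    """torch/torchvision/torchaudio requirement strings for one index (sketch)."""
    leaf = index_url.rstrip("/").rsplit("/", 1)[-1]
    if needs_torch_211(leaf):
        # AMD indexes publish companions independently, so pin all three to the
        # same torch 2.11 line to avoid an ABI-mismatched torchvision/torchaudio.
        return ["torch>=2.11.0,<2.12.0", "torchvision>=0.26,<0.27", "torchaudio>=2.11,<2.12"]
    # cu*/cpu and the curated pytorch.org rocm indexes keep bare companion names.
    return [default_torch_spec, "torchvision", "torchaudio"]
```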

Local checks: test_get_torch_index_url.sh 46/46, test_cross_platform_parity.py 17/17, test_torch_flavor.ps1 green, setup.ps1 Pester 31 passed / 0 failed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4018a697d2

Comment thread studio/setup.ps1
}
} elseif ($HasNvidiaSmi) {
$expectedTorchTag = Get-PytorchCudaTag
} elseif ($HasROCm -or $script:ROCmGfxArch) {

P2: Don't require ROCm torch for unmapped AMD GPUs

This stale-venv check now treats any $HasROCm or inferred $script:ROCmGfxArch as requiring a ROCm torch build, but the install branch later only maps gfx120x/gfx115x/gfx110x/gfx90a/gfx908 and explicitly falls back to CPU when $archFamily is null. On an AMD host with an unsupported/unknown arch such as name-inferred RDNA2 gfx1030, a correct CPU venv is marked stale (torch cpu != required rocm), so direct updates rebuild every run and installer-managed setup aborts before the CPU fallback can run. Fresh evidence is the later $archFamily null path in this same script that warns and installs CPU for those arches.
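This suggestion can be sketched in Python. An unpinned host expects ROCm only when its arch is one the install branch actually maps; otherwise it expects the CPU fallback that branch installs. The prefix list mirrors the families named in the comment. The sketch assumes a resolved gfx arch is needed before a ROCm wheel is chosen, and the function name is hypothetical.

```python
from typing import Optional

# Arch families the setup.ps1 ROCm branch maps, as listed in the review comment.
MAPPED_GFX_PREFIXES = ("gfx120", "gfx115", "gfx110", "gfx90a", "gfx908")

def unpinned_expected_tag(cuda_tag: Optional[str], gfx_arch: Optional[str]) -> str:
    """Torch flavor an unpinned host should end up with (sketch of the suggestion)."""
    if cuda_tag:
        return cuda_tag  # NVIDIA host: the detected cuXXX family
    if gfx_arch and gfx_arch.lower().startswith(MAPPED_GFX_PREFIXES):
        return "rocm"    # mapped AMD arch: the ROCm branch installs a +rocm wheel
    # Unmapped or unknown AMD arch (e.g. gfx1030) falls back to CPU in the install
    # branch, so a CPU venv must not be flagged stale.
    return "cpu"
```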

Comment thread studio/install_python_stack.py Outdated
# An explicit CUDA wheel-index pin (headless / container / CI cross-install)
# commits to CUDA wheels regardless of whether a GPU is visible here, so it
# overrides the GPU-presence gate.
if not _has_usable_nvidia_gpu() and _explicit_torch_index_url() is None:

P2: Limit CUDA gate bypass to cu* pins

When UNSLOTH_TORCH_INDEX_URL is a custom mirror whose final segment is not a CUDA family, for example a PEP 503 URL ending in /simple, _TORCH_BACKEND remains empty but this condition still bypasses the no-NVIDIA gate merely because some explicit URL exists. On a GPU-less or AMD builder with an existing ROCm torch, the later hip branch runs the CUDA repair against that arbitrary URL even though no cu* pin was provided, replacing the current build instead of leaving unknown custom indexes alone. Restrict this bypass to classified CUDA pins.

Comment thread studio/setup.ps1
} else {
# gfx pin or unreadable version: compare on the torch 2.11 line.
$expectedTorchTag = if ($_pinNeeds211) { "rocm(torch>=2.11)" } else { "rocm(torch<2.11)" }
$installedTorchTag = if ($_instIs211) { "rocm(torch>=2.11)" } else { "rocm(torch<2.11)" }

P2: Preserve non-ROCm flavor during ROCm pin checks

When a ROCm/gfx index is pinned but the installed wheel is CPU or CUDA, this fallback overwrites $installedTorchTag based only on the torch major/minor version. For example, UNSLOTH_TORCH_INDEX_FAMILY=rocm6.4 with CPU torch 2.10, or a gfx1151 pin with CPU/CUDA torch 2.11, can compare equal to the expected ROCm bucket, so an up-to-date Studio venv skips the dependency pass and never installs the pinned ROCm wheel. Keep the original non-ROCm flavor unless torch.__version__ actually contains +rocm.
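The requested behavior can be sketched in Python. A CPU or CUDA build never satisfies a ROCm pin; the ROCm version is compared when both sides are readable; and the torch 2.11 line is used as the fallback, as described in the setup.ps1 fix. The helper names are hypothetical. As the PR notes, two gfx arches that both serve torch 2.11 are still indistinguishable at this level.

```python
import re
from typing import Optional, Tuple

def _rocm_ver(text: str) -> Optional[Tuple[int, int]]:
    m = re.match(r"rocm(\d+)\.(\d+)", text.lower())
    return (int(m.group(1)), int(m.group(2))) if m else None

def satisfies_rocm_pin(torch_version: str, pin_leaf: str) -> bool:
    """Does an installed torch satisfy a ROCm/gfx pin? (hypothetical helper)"""
    public, _, local = torch_version.partition("+")
    if not local.lower().startswith("rocm"):
        # CPU/CUDA builds keep their own flavor and never match a ROCm pin,
        # regardless of the torch major/minor version.
        return False
    pin_ver, inst_ver = _rocm_ver(pin_leaf), _rocm_ver(local)
    if pin_ver and inst_ver:
        return pin_ver == inst_ver  # e.g. +rocm6.4 vs a rocm7.2 pin
    # gfx pin or unreadable ROCm version: compare on the torch 2.11 line.
    pin_needs_211 = pin_leaf.lower().startswith("gfx") or (pin_ver is not None and pin_ver >= (7, 2))
    major, minor = (int(p) for p in public.split(".")[:2])
    return ((major, minor) >= (2, 11)) == pin_needs_211
```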

Comment thread studio/setup.ps1
Comment on lines +2625 to 2626
if ($_expectedKnown -and $installedTorchTag -and $installedTorchTag -ne $expectedTorchTag) {
$shouldRebuild = $true

P2: Let explicit pins repair instead of deleting the venv

For standalone Windows unsloth studio update, an explicit pin that intentionally changes flavor now trips the stale-venv rebuild before the later install/repair code can run. For example, on a NVIDIA venv with torch+cu128, setting UNSLOTH_TORCH_INDEX_FAMILY=cpu makes this set $shouldRebuild, the direct-update path removes .venv, and the script then errors because the virtual environment is missing; the new CPU repair path is never reached. Explicit pin mismatches should be routed to the torch install/repair step instead of the destructive stale check.
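The reviewer's suggestion can be sketched in Python as a three-way decision. This sketches the proposal only; it is not what setup.ps1 implements. A flavor mismatch caused by an explicit pin keeps the venv and hands off to the torch repair step, while an unpinned mismatch still rebuilds. The enum and function names are hypothetical.

```python
from enum import Enum

class VenvAction(Enum):
    KEEP = "keep"
    REPAIR_TORCH = "repair"  # reinstall torch in place from the pinned index
    REBUILD = "rebuild"      # delete and recreate .venv

def stale_check_action(installed_tag: str, expected_tag: str, explicitly_pinned: bool) -> VenvAction:
    """Route a torch-flavor mismatch to repair or rebuild (sketch of the suggestion)."""
    if installed_tag == expected_tag:
        return VenvAction.KEEP
    if explicitly_pinned:
        # An intentional flavor change (e.g. cu128 -> cpu) keeps the venv so the
        # later install/repair step can apply the pin.
        return VenvAction.REPAIR_TORCH
    return VenvAction.REBUILD
```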

@danielhanchen

Member Author

Native install verified across Windows, macOS and arm Linux

To confirm this PR's torch-index resolution does not break the real install path, I ran a native install.sh / install.ps1 --local on the free GitHub-hosted runners for every non-x64-Linux OS (the published Docker image is linux/amd64 and cannot run on these hosts, so this is the path Mac / Windows / arm users actually take), then loaded unsloth/gemma-4-E4B-it-GGUF:UD-Q4_K_XL and exercised the Studio.

| Runner | install | gemma load | chat | RAG | web search | llama-server prebuilt |
|---|---|---|---|---|---|---|
| windows-latest (x64) | PASS | PASS | PASS | PASS | PASS | ggml-org b9827 |
| windows-11-arm | PASS | PASS | PASS | PASS | PASS | ggml-org b9827 |
| macos-15-intel (x64) | PASS | PASS | PASS | n/a | n/a | unslothai b9813-mix |
| ubuntu-24.04-arm | PASS | PASS | PASS | PASS | PASS | ggml-org b9827 |
| macos-latest (arm, 7GB) | PASS | PASS | see note | PASS | n/a | unslothai b9813-mix |

The install completes and resolves the right wheel index on each OS, the prebuilt llama-server installs (not a source build) and runs, and the Studio loads + chats with the GGUF. Native Windows, with the model loaded and the web-search tool firing:

[Screenshot: native Windows Studio chat with gemma-4-E4B-it-GGUF]

macOS (Intel) and arm Linux, same flow:

[Screenshot: macOS Intel Studio chat]

[Screenshot: arm Linux Studio chat]

Notes:

  • The llama-server prebuilt source is per-OS by design: Windows/Linux pull the upstream ggml-org/llama.cpp release, macOS pulls unslothai/llama.cpp (the Metal build). Both install as prebuilts and run.
  • macos-latest is the smallest free runner (M1, 3-core, 7GB). The GGUF loads fine there, but CPU generation returns an empty completion even after a retry with a longer deadline; it generates fine on the 14GB Intel Mac and everywhere else. This is a runner memory limit, not an install/PR issue.
  • The update button (stage an older unsloth, run unsloth studio update) and the llama prebuilt confirmation passed on every runner.

…rrides

_ensure_cuda_torch overrode the NVIDIA-presence gate for *any* pinned index,
so a non-CUDA mirror URL (or a ROCm/CPU pin) on a non-NVIDIA host with ROCm torch
could force a CUDA reinstall over a working ROCm venv. Add
_explicit_cuda_torch_index_url() (leaf cu*), matching the ROCm/CPU helpers, and
gate on it instead.
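A minimal Python sketch of what a leaf-cu* check like _explicit_cuda_torch_index_url() might look like. The function below is a hypothetical re-implementation, not the PR's code: it assumes a UNSLOTH_TORCH_INDEX_FAMILY pin is checked the same way when no full URL is set, and that https://download.pytorch.org/whl is the base when UNSLOTH_PYTORCH_MIRROR is unset. It returns a URL only when the pinned index's last path segment is a CUDA family such as cu128, so a ROCm, CPU, or generic mirror pin no longer opens the NVIDIA gate.

```python
import os
import re
from typing import Mapping, Optional
from urllib.parse import urlparse

# Leaf segments such as "cu124", "cu126", "cu128", "cu130".
_CUDA_LEAF = re.compile(r"cu\d+")

# Assumption: the default base when UNSLOTH_PYTORCH_MIRROR is unset.
_DEFAULT_MIRROR = "https://download.pytorch.org/whl"


def explicit_cuda_torch_index_url(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the pinned torch index only if it names a CUDA (cu*) wheel family.

    Hypothetical re-implementation of _explicit_cuda_torch_index_url().
    """
    env = os.environ if env is None else env
    url = env.get("UNSLOTH_TORCH_INDEX_URL", "").strip()
    if url:
        # The full URL has the highest precedence, so a non-CUDA URL (ROCm, CPU,
        # or a generic mirror) means "not a CUDA pin", with no fallback to FAMILY.
        leaf = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
        return url if _CUDA_LEAF.fullmatch(leaf) else None
    family = env.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip().lower()
    if _CUDA_LEAF.fullmatch(family):
        mirror = env.get("UNSLOTH_PYTORCH_MIRROR", "").strip() or _DEFAULT_MIRROR
        return f"{mirror.rstrip('/')}/{family}"
    return None
```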

install.sh::get_torch_index_url treated a whitespace-only UNSLOTH_TORCH_INDEX_URL
/ _FAMILY as authoritative (yielding an invalid index), unlike the Python .strip()
and PowerShell IsNullOrWhiteSpace paths; trim leading/trailing whitespace first.
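The trimming rule can be illustrated with a Python model of the override precedence (full URL, then family appended to the mirror base, then GPU detection), since the Python path already uses .strip(). This is a sketch, not the install.sh code: resolve_torch_index_url, the detect callback, and the https://download.pytorch.org/whl default base are hypothetical. A whitespace-only value is treated as unset and falls through to the next step instead of producing an invalid index.

```python
from typing import Callable, Mapping

DEFAULT_MIRROR = "https://download.pytorch.org/whl"  # assumption: upstream PyTorch index base


def resolve_torch_index_url(env: Mapping[str, str], detect: Callable[[], str]) -> str:
    """Resolve the torch wheel index: explicit URL, then family, then GPU detection.

    Whitespace-only values count as unset, matching the .strip() /
    IsNullOrWhiteSpace behaviour described for the Python and PowerShell paths.
    """
    url = env.get("UNSLOTH_TORCH_INDEX_URL", "").strip()
    if url:
        return url  # highest precedence, used verbatim after trimming
    family = env.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip()
    if family:
        # The family is appended to the mirror base, so UNSLOTH_PYTORCH_MIRROR still applies.
        mirror = env.get("UNSLOTH_PYTORCH_MIRROR", "").strip() or DEFAULT_MIRROR
        return f"{mirror.rstrip('/')}/{family}"
    return detect()  # no override: fall back to hardware probing
```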

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1e956c80a


Comment on lines +1206 to +1208
"torch",
"torchvision",
"torchaudio",


P2: Constrain CPU repair torch versions

When a standalone update is run with an explicit CPU pin on a venv that currently has CUDA/ROCm torch, this repair installs bare torch, torchvision, and torchaudio from the CPU index with constrain = False. Unlike the fresh CPU/CUDA paths, which use bounded specs such as _CUDA_TORCH_PKG_SPEC, this can resolve the newest CPU trio outside the repo’s supported <2.11 torch range (with mismatched companion bounds), leaving Studio on an unsupported PyTorch stack. Use the same bounded specs as the other CPU/CUDA installs.
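One way to act on this suggestion, sketched in Python under stated assumptions: keep a single bounded spec tuple and build every torch install, including the CPU repair, from it, rejecting any bare requirement. SHARED_TORCH_PKG_SPEC and torch_repair_args are hypothetical names. Only the torch <2.11 ceiling comes from the review; the torchvision/torchaudio ceilings are placeholders that would need to match the repo's real _CUDA_TORCH_PKG_SPEC.

```python
import re
from typing import List, Sequence

# Hypothetical stand-in for the repo's bounded torch spec. Only the "<2.11" torch
# ceiling comes from the review; the companion ceilings are illustrative placeholders.
SHARED_TORCH_PKG_SPEC = ("torch<2.11", "torchvision<0.26", "torchaudio<2.11")

# Rough check: treat a requirement as version-constrained if it carries any
# comparison operator character.
_VERSION_OPERATOR = re.compile(r"[<>=!~]")


def torch_repair_args(index_url: str, spec: Sequence[str] = SHARED_TORCH_PKG_SPEC) -> List[str]:
    """Build `pip install` arguments for a torch (re)install from one shared spec."""
    bare = [req for req in spec if not _VERSION_OPERATOR.search(req)]
    if bare:
        # Bare names would let the resolver pick the newest trio on the index.
        raise ValueError(f"unbounded torch requirements: {bare}")
    return ["install", "--index-url", index_url, *spec]
```

Because the CPU repair and the CUDA install would differ only in index_url, their version bounds cannot drift apart.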


Comment thread studio/setup.ps1
# false -- name-inferred Adrenalin hosts still get ROCm torch below.)
# Without this an unpinned ROCm venv compares "rocm" != "cpu" and is
# needlessly rebuilt, and an installer-managed setup exits as stale.
$expectedTorchTag = "rocm"


P2: Route CPU-on-AMD venvs to repair

On an AMD/ROCm Windows host with no explicit pin, this marks every existing CPU torch venv as stale just because $HasROCm or $script:ROCmGfxArch is true. That catches legitimate recovery cases, such as an installer ROCm failure that intentionally fell back to a CPU base, or an older CPU-only AMD venv that the later dependency block would repair. Before that repair can run, installer-managed setup exits, and a direct studio update removes the venv and then errors. Treat an existing +rocm venv as non-stale here without forcing CPU mismatches through the destructive stale-venv path.
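The decision the review asks for can be modeled as a three-way classification of the installed torch version on an unpinned AMD host. This Python sketch is hypothetical (the real logic lives in PowerShell in studio/setup.ps1). It assumes that a version with no local segment, such as a plain PyPI wheel on Windows, is a CPU build, and that CUDA or unrecognised builds still take the stale path.

```python
from typing import Optional


def torch_wheel_family(version: str) -> Optional[str]:
    """Map a torch version string to its wheel family via the local segment.

    "2.8.0+rocm6.4" -> "rocm", "2.8.0+cu128" -> "cu", "2.8.0+cpu" -> "cpu".
    Assumption: an untagged version (a plain PyPI wheel on Windows) is a CPU build.
    """
    if "+" not in version:
        return "cpu"
    local = version.split("+", 1)[1]
    for family in ("rocm", "cu", "cpu"):
        if local.startswith(family):
            return family
    return None  # unrecognised local tag


def classify_unpinned_amd_venv(torch_version: str) -> str:
    """Hypothetical staleness decision for an AMD host with no explicit torch pin."""
    family = torch_wheel_family(torch_version)
    if family == "rocm":
        return "keep"    # already the expected ROCm build: not stale
    if family == "cpu":
        return "repair"  # let the later dependency block install ROCm torch
    return "stale"       # assumption: CUDA or unknown builds still take the stale path
```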


# An explicit CUDA wheel-index pin (headless / container / CI cross-install)
# commits to CUDA wheels regardless of whether a GPU is visible here, so it
# overrides the GPU-presence gate.
if not _has_usable_nvidia_gpu() and _explicit_cuda_torch_index_url() is None:


P2: Apply CUDA pins before the CVD hide gate

When CI or a container hides GPUs with CUDA_VISIBLE_DEVICES=-1 (or empty) and runs a standalone unsloth studio update with UNSLOTH_TORCH_INDEX_FAMILY=cu128, the earlier CVD return still fires before this new explicit-pin gate. That leaves an existing CPU/ROCm torch in place even though the caller pinned CUDA, because the explicit CUDA override check was added below the CVD short-circuit rather than before it.
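A minimal Python sketch of the ordering the review suggests: check for an explicit CUDA pin first, then the CUDA_VISIBLE_DEVICES hide, then the hardware probe. should_ensure_cuda_torch and its helpers are hypothetical stand-ins for the gates in _ensure_cuda_torch, and the probe is passed in as a callback so the order can be exercised without a GPU.

```python
import re
from typing import Callable, Mapping

_CUDA_FAMILY = re.compile(r"cu\d+")


def has_explicit_cuda_pin(env: Mapping[str, str]) -> bool:
    """True if the UNSLOTH_TORCH_INDEX_URL leaf (or, if unset, _FAMILY) is a cu* family."""
    url = env.get("UNSLOTH_TORCH_INDEX_URL", "").strip()
    if url:
        leaf = url.rstrip("/").rsplit("/", 1)[-1]
    else:
        leaf = env.get("UNSLOTH_TORCH_INDEX_FAMILY", "").strip().lower()
    return _CUDA_FAMILY.fullmatch(leaf) is not None


def gpus_hidden(env: Mapping[str, str]) -> bool:
    """True if CUDA_VISIBLE_DEVICES is set to -1 or to an empty value."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    return value is not None and value.strip() in ("", "-1")


def should_ensure_cuda_torch(env: Mapping[str, str], has_usable_nvidia_gpu: Callable[[], bool]) -> bool:
    # 1. An explicit CUDA pin commits to CUDA wheels, so it is checked first.
    if has_explicit_cuda_pin(env):
        return True
    # 2. Only then may a CVD hide short-circuit the CUDA path.
    if gpus_hidden(env):
        return False
    # 3. Otherwise fall back to probing the hardware.
    return has_usable_nvidia_gpu()
```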


