fix: clearer Studio setup error when GPU driver is too old for the installed CUDA toolkit by danielhanchen · Pull Request #94 · unslothai/unsloth-staging-1

danielhanchen · 2026-06-07T10:11:26Z

Staging mirror of unslothai#5993

Original PR: unslothai#5993
Author: mvanhorn

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.

Original description

Summary

When studio/setup.sh runs on a host whose installed CUDA toolkit (for example 13.3) is newer than the GPU driver can run, the installer now names the real blocker: the toolkit is supported, but the current driver is too old to run it. Both studio/setup.sh and studio/setup.ps1 emit the same guidance, pointing the user at a driver update or the prebuilt CUDA bundle.

Why this matters

Issue unslothai#5917 (with duplicate feedback in unslothai#5918) reported that a CUDA 13.3 toolkit paired with an older NVIDIA driver made the Studio installer behave as though 13.3 was unsupported, when Unsloth does support 13.3 and the actual gap was the driver. PR unslothai#5826 already taught setup.sh to cope with fresh toolkits, but the diagnostic still could not distinguish "toolkit too new for the project" from "driver too old for this toolkit." This compares the driver-reported CUDA capability from nvidia-smi against the installed toolkit version near the _nvcc_meets_llama_minimum check and surfaces the explanatory message only when the source build would otherwise abort, keeping the prebuilt-bundle path non-fatal.

Testing

Covered by the new cases in tests/studio/install/test_selection_logic.py: driver supports the toolkit (no message), driver too old (message names both versions), no nvidia-smi present (branch skipped), and unparseable nvidia-smi output (graceful fallback to the prior generic message). Full suite runs in CI.

Fixes unslothai#5917

This PR tracks the moving review branch (pr-5993-head). Iteration fix commits land here directly. Review-added tests are in a separate PR.

Changed files:

.github/workflows/consolidated-tests-ci.yml
.github/workflows/lint-ci.yml
.github/workflows/lockfile-audit.yml
.github/workflows/mlx-ci.yml
.github/workflows/notebooks-ci.yml
.github/workflows/release-desktop.yml
.github/workflows/security-audit.yml
.github/workflows/stale.yml
.github/workflows/studio-api-smoke.yml
.github/workflows/studio-backend-ci.yml
.github/workflows/studio-frontend-ci.yml
.github/workflows/studio-inference-smoke.yml
.github/workflows/studio-load-orchestrator-ci.yml
.github/workflows/studio-mac-api-smoke.yml
.github/workflows/studio-mac-inference-smoke.yml
.github/workflows/studio-mac-install-matrix.yml
.github/workflows/studio-mac-ui-smoke.yml
.github/workflows/studio-mac-update-smoke.yml
.github/workflows/studio-tauri-smoke.yml
.github/workflows/studio-ui-smoke.yml
.github/workflows/studio-update-smoke.yml
.github/workflows/studio-windows-api-smoke.yml
.github/workflows/studio-windows-inference-smoke.yml
.github/workflows/studio-windows-ui-smoke.yml
.github/workflows/studio-windows-update-smoke.yml
.github/workflows/version-compat-ci.yml
.github/workflows/wheel-smoke.yml
studio/setup.ps1
studio/setup.sh
tests/studio/install/test_selection_logic.py

…stalled CUDA toolkit

for more information, see https://pre-commit.ci

…gs render in C_WARN

…ly for PR unslothai#5993

setup.ps1 now routes the too-new-toolkit case through Write-CudaDriverToolkitMismatch instead of the old 'is installed but INCOMPATIBLE' banner. Extract that helper alongside Resolve-CudaToolkit so the child pwsh can run it, and assert the new driver-too-old guidance (and the one-line source-build error) instead of the removed INCOMPATIBLE text.

…lkit/driver edge tests setup.ps1: note that only a forced source build reaches the hard-exit branch (the prebuilt path returned above), unlike setup.sh which degrades to CPU. test_selection_logic.py: cover the CUDA UMD Version variant, the empty nvcc version guard, and the too_old (< 12.4) short-circuit.

…ts before CPU fallback The driver check now compares CUDA major versions only, per NVIDIA minor-version compatibility, and when the selected nvcc is still too new the setup iterates other installed toolkits and uses the newest driver-compatible one before falling back to a CPU llama.cpp build. Same rule mirrored in setup.ps1.

…llback The major-only compatibility fix updated the side-by-side scan (Find-Nvcc -MaxVersion) and the CUDA_PATH check, but the fallback that runs when Find-Nvcc -MaxVersion returns null still recorded any plain Find-Nvcc result as an incompatible toolkit without re-checking the major. A same-major toolkit discoverable only via PATH, process CUDA_PATH, or a custom location (e.g. toolkit 13.3 with a driver supporting CUDA 13.2) was therefore rejected even though it is compatible. Re-apply the same major-only rule in the fallback: use the toolkit when its major is within the driver's, otherwise record it as too-new. Adds a regression test covering the PATH-only same-major case.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13137021b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-07T10:17:46Z

@@ -586,7 +159,7 @@ jobs:
            > Linux AppImage on Ubuntu 24.04+ may require: `sudo apt install libfuse2t64`
            > First-run system dependency elevation is supported on Ubuntu/Debian. Other Linux distributions should install system packages manually.
          releaseDraft: ${{ inputs.draft }}


Restore desktop-latest publishing for updater releases

For non-draft desktop releases this workflow now only creates the versioned release; I searched .github/workflows and there is no remaining step that uploads desktop-latest/latest.json, while studio/src-tauri/tauri.conf.json still points the updater endpoint at .../releases/download/desktop-latest/latest.json. That means shipped desktop clients will not discover new releases even though the release workflow succeeds.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-07T10:17:46Z

+            const appVersion = match[1];
+            const response = await fetch(`https://pypi.org/pypi/unsloth/${appVersion}/json`);


Verify the backend version the desktop actually requires

This checks PyPI for the Tauri/Cargo app version, but the backend package version the desktop requires is MIN_DESKTOP_BACKEND_VERSION in studio/src-tauri/src/preflight/version.rs (currently 2026.5.3, while Cargo.toml is 2026.4.8). When those diverge, the release can pass after checking the wrong unsloth artifact and ship a desktop build whose required backend version was never published.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-07T10:17:46Z

@@ -1,2289 +0,0 @@
-# SPDX-License-Identifier: AGPL-3.0-only


Restore PR and push CI coverage

This delete removes one of the PR/push CI workflows, and after checking the current .github/workflows directory only release-desktop.yml (workflow_dispatch) and stale.yml (schedule) remain, so future PRs and pushes no longer run lint, tests, smoke tests, lockfile audits, or security audits at all. That makes regressions in experiments and release-critical paths land without any automated validation.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-07T10:17:46Z


      # ── Linux dependencies ──


Avoid persisting a write token into release build steps

With contents: write now granted at workflow scope, this checkout uses the default persisted credentials, so the repository token remains in the local git config while later npm install, Tauri, signing, and build steps execute dependency code. The previous workflow explicitly disabled credential persistence; keep that here so any compromised package script or build hook cannot read and exfiltrate a write-capable GitHub token from the release runner.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-07T10:17:46Z

            > First-run system dependency elevation is supported on Ubuntu/Debian. Other Linux distributions should install system packages manually.
          releaseDraft: ${{ inputs.draft }}
-          prerelease: ${{ needs.prepare-version.outputs.prerelease }}
+          prerelease: false


Preserve prerelease status for beta desktop builds

This hard-codes every desktop release as non-prerelease, whereas the removed preparation step derived this from versions like v0.1.39-beta. If the Cargo/Tauri version contains a prerelease suffix for a beta build, the workflow will still publish the GitHub release as a stable release, making beta artifacts indistinguishable from normal desktop releases.

Useful? React with 👍 / 👎.

danielhanchen · 2026-06-07T10:19:18Z

/gemini review

gemini-code-assist

Code Review

This pull request updates the CUDA compatibility logic in both setup.ps1 and setup.sh to only enforce that the CUDA Toolkit's major version is less than or equal to the driver's supported major version, rather than checking minor versions. It also introduces robust fallback logic to search for alternative compatible toolkits and adds comprehensive unit tests. The review comments are highly constructive and should all be addressed: they suggest skipping the Bash-dependent tests on Windows to prevent failures, using the standard-compliant head -n 1 instead of head -1 in the shell script, and resolving bash dynamically from the system path instead of hardcoding /bin/bash to improve portability.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-07T10:21:35Z

+# ===========================================================================
+
+
+class TestCudaDriverToolkitMismatchMessage:


Since setup.sh and these test cases depend on bash, they will fail with FileNotFoundError when run on Windows systems. It is recommended to skip this test class on Windows using pytest.mark.skipif.

Suggested change

class TestCudaDriverToolkitMismatchMessage:

@pytest.mark.skipif(sys.platform == "win32", reason="Bash tests are not supported on Windows")

class TestCudaDriverToolkitMismatchMessage:

gemini-code-assist · 2026-06-07T10:21:36Z

+    command -v nvidia-smi >/dev/null 2>&1 || return 0
+    nvidia-smi 2>/dev/null \
+        | sed -nE 's/.*CUDA( UMD)? Version:[[:space:]]*([0-9]+)\.([0-9]+).*/\2.\3/p' \
+        | head -1 || true


Using head -1 is deprecated in POSIX and may not be supported on all platforms or could trigger warnings. It is more portable and standard-compliant to use head -n 1 instead.

Suggested change

| head -1 || true

| head -n 1 || true

gemini-code-assist · 2026-06-07T10:21:36Z

+
+    def _run_bash(self, script, *, env = None):
+        proc = subprocess.run(
+            ["/bin/bash", "-c", script],


Hardcoding "/bin/bash" can cause test failures on systems where bash is located elsewhere (e.g., /usr/local/bin/bash on BSD systems, or in the PATH on Windows/macOS). Using "bash" is more portable as it resolves the executable from the system PATH.

Suggested change

["/bin/bash", "-c", script],

["bash", "-c", script],

danielhanchen · 2026-06-07T10:54:54Z

/gemini review

gemini-code-assist

Code Review

This pull request updates the CUDA toolkit compatibility check in both the PowerShell (setup.ps1) and Bash (setup.sh) setup scripts to allow toolkits with a compatible major version (toolkit major <= driver major) rather than strictly enforcing minor version limits. It also introduces helper functions for mismatch diagnostics, fallback logic to find alternative compatible toolkits, and comprehensive unit tests. The review feedback suggests optimizing the Bash script by replacing subshell invocations of sed with pure Bash parameter expansion when splitting multi-line command outputs, which avoids process creation overhead inside loops.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-07T10:58:52Z

+        _status="$(printf '%s\n' "$_check" | sed -n '1p')"
+        _version="$(printf '%s\n' "$_check" | sed -n '2p')"


Spawning subshells and invoking external sed processes twice for every candidate in the loop is inefficient. Since _check is guaranteed to contain two lines separated by a newline, you can use pure Bash parameter expansion to split the status and version. This is significantly faster and avoids any process creation overhead.

Suggested change

_status="$(printf '%s\n' "$_check" | sed -n '1p')"

_version="$(printf '%s\n' "$_check" | sed -n '2p')"

_status="${_check%%$'\\n'*}"\n _version="${_check#*$'\\n'}"

gemini-code-assist · 2026-06-07T10:58:52Z

+                            NVCC_PATH="$(printf '%s\n' "$_ALT_NVCC_CHECK" | sed -n '1p')"
+                            _NVCC_VER="$(printf '%s\n' "$_ALT_NVCC_CHECK" | sed -n '2p')"


Similarly, you can avoid spawning subshells and invoking sed twice here by using pure Bash parameter expansion to split the path and version from _ALT_NVCC_CHECK.

Suggested change

NVCC_PATH="$(printf '%s\n' "$_ALT_NVCC_CHECK" | sed -n '1p')"

_NVCC_VER="$(printf '%s\n' "$_ALT_NVCC_CHECK" | sed -n '2p')"

NVCC_PATH="${_ALT_NVCC_CHECK%%$'\\n'*}"\n _NVCC_VER="${_ALT_NVCC_CHECK#*$'\\n'}"

mvanhorn and others added 10 commits June 5, 2026 18:45

fix: clearer Studio setup error when GPU driver is too old for the in…

4416945

…stalled CUDA toolkit

[pre-commit.ci] auto fixes from pre-commit.com hooks

01cdb70

for more information, see https://pre-commit.ci

Honor optional color arg in setup.sh substep so driver/toolkit warnin…

43c4941

…gs render in C_WARN

Add regression test that setup.sh _cuda_version_gt compares numerical…

97ac1df

…ly for PR unslothai#5993

style: apply ruff kwarg-spacing format after rebase

e4cdddf

Scrub .github/workflows for staging push (matches staging base)

1313702

chatgpt-codex-connector Bot reviewed Jun 7, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 7, 2026

View reviewed changes

		const appVersion = match[1];
		const response = await fetch(`https://pypi.org/pypi/unsloth/${appVersion}/json`);

		# ===========================================================================


		class TestCudaDriverToolkitMismatchMessage:

	class TestCudaDriverToolkitMismatchMessage:
	@pytest.mark.skipif(sys.platform == "win32", reason="Bash tests are not supported on Windows")
	class TestCudaDriverToolkitMismatchMessage:

		_status="$(printf '%s\n' "$_check" \| sed -n '1p')"
		_version="$(printf '%s\n' "$_check" \| sed -n '2p')"

	_status="$(printf '%s\n' "$_check" \| sed -n '1p')"
	_version="$(printf '%s\n' "$_check" \| sed -n '2p')"
	_status="${_check%%$'\\n'}"\n _version="${_check#$'\\n'}"

		NVCC_PATH="$(printf '%s\n' "$_ALT_NVCC_CHECK" \| sed -n '1p')"
		_NVCC_VER="$(printf '%s\n' "$_ALT_NVCC_CHECK" \| sed -n '2p')"

	NVCC_PATH="$(printf '%s\n' "$_ALT_NVCC_CHECK" \| sed -n '1p')"
	_NVCC_VER="$(printf '%s\n' "$_ALT_NVCC_CHECK" \| sed -n '2p')"
	NVCC_PATH="${_ALT_NVCC_CHECK%%$'\\n'}"\n _NVCC_VER="${_ALT_NVCC_CHECK#$'\\n'}"

Uh oh!

Conversation

danielhanchen commented Jun 7, 2026

Staging mirror of unslothai#5993

Original description

Summary

Why this matters

Testing

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Jun 7, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Jun 7, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants