fix: clearer Studio setup error when GPU driver is too old for the installed CUDA toolkit#94
fix: clearer Studio setup error when GPU driver is too old for the installed CUDA toolkit#94danielhanchen wants to merge 10 commits into
Conversation
…stalled CUDA toolkit
for more information, see https://pre-commit.ci
…gs render in C_WARN
setup.ps1 now routes the too-new-toolkit case through Write-CudaDriverToolkitMismatch instead of the old 'is installed but INCOMPATIBLE' banner. Extract that helper alongside Resolve-CudaToolkit so the child pwsh can run it, and assert the new driver-too-old guidance (and the one-line source-build error) instead of the removed INCOMPATIBLE text.
…lkit/driver edge tests setup.ps1: note that only a forced source build reaches the hard-exit branch (the prebuilt path returned above), unlike setup.sh which degrades to CPU. test_selection_logic.py: cover the CUDA UMD Version variant, the empty nvcc version guard, and the too_old (< 12.4) short-circuit.
…ts before CPU fallback The driver check now compares CUDA major versions only, per NVIDIA minor-version compatibility, and when the selected nvcc is still too new the setup iterates other installed toolkits and uses the newest driver-compatible one before falling back to a CPU llama.cpp build. Same rule mirrored in setup.ps1.
…llback The major-only compatibility fix updated the side-by-side scan (Find-Nvcc -MaxVersion) and the CUDA_PATH check, but the fallback that runs when Find-Nvcc -MaxVersion returns null still recorded any plain Find-Nvcc result as an incompatible toolkit without re-checking the major. A same-major toolkit discoverable only via PATH, process CUDA_PATH, or a custom location (e.g. toolkit 13.3 with a driver supporting CUDA 13.2) was therefore rejected even though it is compatible. Re-apply the same major-only rule in the fallback: use the toolkit when its major is within the driver's, otherwise record it as too-new. Adds a regression test covering the PATH-only same-major case.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13137021b8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -586,7 +159,7 @@ jobs: | |||
| > Linux AppImage on Ubuntu 24.04+ may require: `sudo apt install libfuse2t64` | |||
| > First-run system dependency elevation is supported on Ubuntu/Debian. Other Linux distributions should install system packages manually. | |||
| releaseDraft: ${{ inputs.draft }} | |||
There was a problem hiding this comment.
Restore desktop-latest publishing for updater releases
For non-draft desktop releases this workflow now only creates the versioned release; I searched .github/workflows and there is no remaining step that uploads desktop-latest/latest.json, while studio/src-tauri/tauri.conf.json still points the updater endpoint at .../releases/download/desktop-latest/latest.json. That means shipped desktop clients will not discover new releases even though the release workflow succeeds.
Useful? React with 👍 / 👎.
| const appVersion = match[1]; | ||
| const response = await fetch(`https://pypi.org/pypi/unsloth/${appVersion}/json`); |
There was a problem hiding this comment.
Verify the backend version the desktop actually requires
This checks PyPI for the Tauri/Cargo app version, but the backend package version the desktop requires is MIN_DESKTOP_BACKEND_VERSION in studio/src-tauri/src/preflight/version.rs (currently 2026.5.3, while Cargo.toml is 2026.4.8). When those diverge, the release can pass after checking the wrong unsloth artifact and ship a desktop build whose required backend version was never published.
Useful? React with 👍 / 👎.
| @@ -1,2289 +0,0 @@ | |||
| # SPDX-License-Identifier: AGPL-3.0-only | |||
There was a problem hiding this comment.
Restore PR and push CI coverage
This delete removes one of the PR/push CI workflows, and after checking the current .github/workflows directory only release-desktop.yml (workflow_dispatch) and stale.yml (schedule) remain, so future PRs and pushes no longer run lint, tests, smoke tests, lockfile audits, or security audits at all. That makes regressions in experiments and release-critical paths land without any automated validation.
Useful? React with 👍 / 👎.
|
|
||
| # ── Linux dependencies ── |
There was a problem hiding this comment.
Avoid persisting a write token into release build steps
With contents: write now granted at workflow scope, this checkout uses the default persisted credentials, so the repository token remains in the local git config while later npm install, Tauri, signing, and build steps execute dependency code. The previous workflow explicitly disabled credential persistence; keep that here so any compromised package script or build hook cannot read and exfiltrate a write-capable GitHub token from the release runner.
Useful? React with 👍 / 👎.
| > First-run system dependency elevation is supported on Ubuntu/Debian. Other Linux distributions should install system packages manually. | ||
| releaseDraft: ${{ inputs.draft }} | ||
| prerelease: ${{ needs.prepare-version.outputs.prerelease }} | ||
| prerelease: false |
There was a problem hiding this comment.
Preserve prerelease status for beta desktop builds
This hard-codes every desktop release as non-prerelease, whereas the removed preparation step derived this from versions like v0.1.39-beta. If the Cargo/Tauri version contains a prerelease suffix for a beta build, the workflow will still publish the GitHub release as a stable release, making beta artifacts indistinguishable from normal desktop releases.
Useful? React with 👍 / 👎.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates the CUDA compatibility logic in both setup.ps1 and setup.sh to only enforce that the CUDA Toolkit's major version is less than or equal to the driver's supported major version, rather than checking minor versions. It also introduces robust fallback logic to search for alternative compatible toolkits and adds comprehensive unit tests. The review comments are highly constructive and should all be addressed: they suggest skipping the Bash-dependent tests on Windows to prevent failures, using the standard-compliant head -n 1 instead of head -1 in the shell script, and resolving bash dynamically from the system path instead of hardcoding /bin/bash to improve portability.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| # =========================================================================== | ||
|
|
||
|
|
||
| class TestCudaDriverToolkitMismatchMessage: |
There was a problem hiding this comment.
Since setup.sh and these test cases depend on bash, they will fail with FileNotFoundError when run on Windows systems. It is recommended to skip this test class on Windows using pytest.mark.skipif.
| class TestCudaDriverToolkitMismatchMessage: | |
| @pytest.mark.skipif(sys.platform == "win32", reason="Bash tests are not supported on Windows") | |
| class TestCudaDriverToolkitMismatchMessage: |
| command -v nvidia-smi >/dev/null 2>&1 || return 0 | ||
| nvidia-smi 2>/dev/null \ | ||
| | sed -nE 's/.*CUDA( UMD)? Version:[[:space:]]*([0-9]+)\.([0-9]+).*/\2.\3/p' \ | ||
| | head -1 || true |
|
|
||
| def _run_bash(self, script, *, env = None): | ||
| proc = subprocess.run( | ||
| ["/bin/bash", "-c", script], |
There was a problem hiding this comment.
Hardcoding "/bin/bash" can cause test failures on systems where bash is located elsewhere (e.g., /usr/local/bin/bash on BSD systems, or in the PATH on Windows/macOS). Using "bash" is more portable as it resolves the executable from the system PATH.
| ["/bin/bash", "-c", script], | |
| ["bash", "-c", script], |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates the CUDA toolkit compatibility check in both the PowerShell (setup.ps1) and Bash (setup.sh) setup scripts to allow toolkits with a compatible major version (toolkit major <= driver major) rather than strictly enforcing minor version limits. It also introduces helper functions for mismatch diagnostics, fallback logic to find alternative compatible toolkits, and comprehensive unit tests. The review feedback suggests optimizing the Bash script by replacing subshell invocations of sed with pure Bash parameter expansion when splitting multi-line command outputs, which avoids process creation overhead inside loops.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| _status="$(printf '%s\n' "$_check" | sed -n '1p')" | ||
| _version="$(printf '%s\n' "$_check" | sed -n '2p')" |
There was a problem hiding this comment.
Spawning subshells and invoking external sed processes twice for every candidate in the loop is inefficient. Since _check is guaranteed to contain two lines separated by a newline, you can use pure Bash parameter expansion to split the status and version. This is significantly faster and avoids any process creation overhead.
| _status="$(printf '%s\n' "$_check" | sed -n '1p')" | |
| _version="$(printf '%s\n' "$_check" | sed -n '2p')" | |
| _status="${_check%%$'\\n'*}"\n _version="${_check#*$'\\n'}" |
| NVCC_PATH="$(printf '%s\n' "$_ALT_NVCC_CHECK" | sed -n '1p')" | ||
| _NVCC_VER="$(printf '%s\n' "$_ALT_NVCC_CHECK" | sed -n '2p')" |
There was a problem hiding this comment.
Similarly, you can avoid spawning subshells and invoking sed twice here by using pure Bash parameter expansion to split the path and version from _ALT_NVCC_CHECK.
| NVCC_PATH="$(printf '%s\n' "$_ALT_NVCC_CHECK" | sed -n '1p')" | |
| _NVCC_VER="$(printf '%s\n' "$_ALT_NVCC_CHECK" | sed -n '2p')" | |
| NVCC_PATH="${_ALT_NVCC_CHECK%%$'\\n'*}"\n _NVCC_VER="${_ALT_NVCC_CHECK#*$'\\n'}" |
Staging mirror of unslothai#5993
Original PR: unslothai#5993
Author: mvanhorn
This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.
Original description
Summary
When
studio/setup.shruns on a host whose installed CUDA toolkit (for example 13.3) is newer than the GPU driver can run, the installer now names the real blocker: the toolkit is supported, but the current driver is too old to run it. Bothstudio/setup.shandstudio/setup.ps1emit the same guidance, pointing the user at a driver update or the prebuilt CUDA bundle.Why this matters
Issue unslothai#5917 (with duplicate feedback in unslothai#5918) reported that a CUDA 13.3 toolkit paired with an older NVIDIA driver made the Studio installer behave as though 13.3 was unsupported, when Unsloth does support 13.3 and the actual gap was the driver. PR unslothai#5826 already taught
setup.shto cope with fresh toolkits, but the diagnostic still could not distinguish "toolkit too new for the project" from "driver too old for this toolkit." This compares the driver-reported CUDA capability fromnvidia-smiagainst the installed toolkit version near the_nvcc_meets_llama_minimumcheck and surfaces the explanatory message only when the source build would otherwise abort, keeping the prebuilt-bundle path non-fatal.Testing
Covered by the new cases in
tests/studio/install/test_selection_logic.py: driver supports the toolkit (no message), driver too old (message names both versions), nonvidia-smipresent (branch skipped), and unparseablenvidia-smioutput (graceful fallback to the prior generic message). Full suite runs in CI.Fixes unslothai#5917
This PR tracks the moving review branch (pr-5993-head). Iteration fix commits land here directly. Review-added tests are in a separate PR.
Changed files:
.github/workflows/consolidated-tests-ci.yml.github/workflows/lint-ci.yml.github/workflows/lockfile-audit.yml.github/workflows/mlx-ci.yml.github/workflows/notebooks-ci.yml.github/workflows/release-desktop.yml.github/workflows/security-audit.yml.github/workflows/stale.yml.github/workflows/studio-api-smoke.yml.github/workflows/studio-backend-ci.yml.github/workflows/studio-frontend-ci.yml.github/workflows/studio-inference-smoke.yml.github/workflows/studio-load-orchestrator-ci.yml.github/workflows/studio-mac-api-smoke.yml.github/workflows/studio-mac-inference-smoke.yml.github/workflows/studio-mac-install-matrix.yml.github/workflows/studio-mac-ui-smoke.yml.github/workflows/studio-mac-update-smoke.yml.github/workflows/studio-tauri-smoke.yml.github/workflows/studio-ui-smoke.yml.github/workflows/studio-update-smoke.yml.github/workflows/studio-windows-api-smoke.yml.github/workflows/studio-windows-inference-smoke.yml.github/workflows/studio-windows-ui-smoke.yml.github/workflows/studio-windows-update-smoke.yml.github/workflows/version-compat-ci.yml.github/workflows/wheel-smoke.ymlstudio/setup.ps1studio/setup.shtests/studio/install/test_selection_logic.py