Add Unsloth Docker images (base + Studio) for any NVIDIA GPU host, Ampere through Blackwell #5748

Open
danielhanchen wants to merge 101 commits into main from docker-blackwell-build
@danielhanchen danielhanchen commented May 24, 2026

Member

Summary

Adds a Docker setup for Unsloth that runs on any NVIDIA GPU host from Ampere through Blackwell (sm_80 through sm_120: A100, RTX 30/40, H100, B100/B200, RTX 50-series, RTX 6000 Pro Blackwell) and natively on aarch64 (GB10 / Grace, DGX Spark). Two images are published to docker.io/unsloth/unsloth:

  • Lean base image (docker/Dockerfile, tag :base): the full training stack -- torch 2.10.0+cu128, Unsloth + unsloth_zoo, TRL / PEFT / accelerate, bitsandbytes, triton, xformers (amd64), vLLM -- plus llama.cpp for GGUF tooling, JupyterLab, and the baked Unsloth notebooks. Run it headless for training, unsloth-run <notebook|url>, or jupyter lab.
  • Full image (docker/Dockerfile.studio, tag :latest): layers Unsloth Studio on top of the base image and runs the production service trio under supervisord -- Studio on 8000, JupyterLab on 8888, key-only sshd on 22 -- plus an optional Cloudflare tunnel for JupyterLab.

The build itself requires no GPU at all: it runs on free GitHub-hosted runners, on a developer laptop without an NVIDIA card, or on any datacenter GPU host. All produce byte-identical images.

Multi-arch. amd64 and arm64 are built in parallel on native GitHub runners (ubuntu-latest + ubuntu-24.04-arm, both free on public repos since Aug 2025), pushed by digest, then merged into a single multi-platform manifest, so docker pull selects the right child automatically. Native arm64 is ~3x faster than QEMU and runs on DGX Spark / Grace with CUDA working as normal (no runtime emulation).

Why the build does not need a GPU

There are four places where a naive Docker build silently couples to the build-host GPU. The Dockerfile breaks each one:

1. Wheel selection. The README install line uv pip install unsloth --torch-backend=auto introspects the build host's driver. This Dockerfile pins torch==2.10.0 against --extra-index-url https://download.pytorch.org/whl/cu128 explicitly. No --torch-backend=auto, no install.sh.

2. Dep resolution order. Splitting installs into multiple pip install calls lets bitsandbytes 0.49.x's transitive cuda-toolkit==13.0.2 dep silently upgrade torch 2.10.0+cu128 -> 2.12.0+cu130 in a later pass, leaving cu128 xformers stranded. This Dockerfile collapses everything into a single uv pip install with --index-strategy unsafe-best-match so the resolver sees all constraints at once.

3. Build-time verification. torch.cuda.get_arch_list() returns [] when no GPU is visible. The Dockerfile uses the raw C++ accessor torch._C._cuda_getArchFlags(), which reads compiled wheel metadata directly ('sm_70 sm_75 sm_80 sm_86 sm_90 sm_100 sm_120' on amd64; 'sm_80 sm_90 sm_100 sm_120' on aarch64). Required packages are checked via importlib.metadata.version() instead of importing them, because import unsloth triggers torch.cuda.get_device_properties(0), which can't be satisfied on a GPU-less host. Import-time correctness is exercised at deploy time by smoke_test.py.

4. Compiled-kernel cache. If anything imports unsloth during the build, Triton JITs kernels keyed to the build host's compute capability and bakes them into unsloth_compiled_cache/. UNSLOTH_COMPILE_DISABLE=1 and UNSLOTH_COMPILE_OVERWRITE=0 prevent this. The deploy GPU produces its own cache on first use.

The underlying reason all of this works is that cu128 PyTorch wheels are already fat binaries, cross-compiled upstream to include SASS for every architecture from sm_70 through sm_120 (and sm_80/90/100/120 on aarch64). The build host's GPU was never needed for the binary content, only by code that pretended to need it.
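
As a rough illustration of points 3 and 4, a GPU-free build check might look like the sketch below. The helper names (`installed_versions`, `missing_arches`, `torch_arch_flags`) and the required-arch set are assumptions made for illustration; only `importlib.metadata.version()` and `torch._C._cuda_getArchFlags()` are named in this PR, and the check in the real Dockerfile may differ.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Reads package metadata only, so nothing is imported and no CUDA
    device is touched.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing_arches(arch_flags, required=("sm_80", "sm_90", "sm_100", "sm_120")):
    """Return required arches absent from a flag string such as 'sm_80 sm_90'."""
    compiled = set(arch_flags.split())
    return [arch for arch in required if arch not in compiled]


def torch_arch_flags():
    """Read the wheel's compiled arch list (private accessor named in this PR)."""
    import torch  # imported lazily so the helpers above work without torch

    return torch._C._cuda_getArchFlags()


if __name__ == "__main__":
    print(installed_versions(["torch", "unsloth", "unsloth_zoo"]))
    print(missing_arches("sm_80 sm_90 sm_100 sm_120"))
```

The default required set is the intersection of the amd64 and aarch64 lists quoted above, so the same check could run on both architectures.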

What is in the images

Base image (:base)

  • Training stack pinned at torch 2.10.0+cu128 (held against the cu cascade), with TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6;8.9;9.0;10.0;10.3;12.0+PTX" so any source build inside the container honours the full Ampere-through-Blackwell range. The +PTX suffix on the top arch gives forward-compat JIT-PTX for future consumer Blackwell SKUs.
  • vLLM and llama.cpp (prebuilt) for inference and GGUF tooling.
  • The curated notebook extras (JupyterLab, ipywidgets, audio/TTS codecs, etc.) pinned to tested versions for reproducible rebuilds.
  • unslothai/notebooks baked in as a read-only template and synced to /workspace/unsloth-notebooks on boot (edit-preserving refresh from GitHub when reachable).
  • Notebook ergonomics: a !pip/!uv shim that keeps the core GPU stack pinned while letting a notebook install its own extras, per-notebook transformers-version sidecars, and unsloth-run for headless execution of any notebook or URL.
  • arm64 swaps to the [huggingface] extra (no cu128 aarch64 xformers/vLLM wheel) and installs cuda-nvrtc/nvcc from NVIDIA's sbsa repo.

Full image (:latest)

  • Everything in the base image, plus Unsloth Studio.
  • supervisord runs Studio (8000), JupyterLab (8888), an optional Cloudflare tunnel for JupyterLab, and key-only sshd (22). The studio venv dedups its CUDA libraries against the base venv to keep the image size down.

arm64 / aarch64

A native arm64 child is published in the same manifest. CI covers it two ways: docker-build-arm64-native.yml builds + smoke-checks on the free ubuntu-24.04-arm runner (the path DGX Spark / Grace users take), and docker-build-arm64-qemu.yml cross-builds under QEMU as a fallback signal and to validate the documented setup_qemu.sh recipe.

Validation

Base image validated end-to-end on a B200 (sm_100) host. With CUDA_VISIBLE_DEVICES="" (simulating the GPU-less CI runner), the resolved stack is torch 2.10.0+cu128, triton, xformers (cu128 wheel from the PyTorch index), bitsandbytes, unsloth + unsloth_zoo, transformers, trl, peft, accelerate, with arch flags ['sm_70','sm_75','sm_80','sm_86','sm_90','sm_100','sm_120']. Runtime path on B200 GPU 0: smoke_test.py imports unsloth, loads Llama-3.2-1B-Instruct-bnb-4bit in 4-bit, and completes LoRA steps with loss decreasing. The full image was launched on the same host and Studio, JupyterLab, and sshd all came up under supervisord.

Files

| Path | Purpose |
| --- | --- |
| docker/Dockerfile | multi-stage cu128 base image, no GPU required (amd64 + arm64) |
| docker/Dockerfile.studio | full image: base + Studio + JupyterLab + sshd under supervisord |
| docker/entrypoint.sh, docker/supervisord.conf, docker/studio_launch.sh | boot + service orchestration |
| docker/unsloth_sync_notebooks.sh, docker/unsloth_nb_compat.py, docker/unsloth_nb_content_sig.py | baked-notebook sync + per-notebook transformers sidecars |
| docker/unsloth_pip_shim.py, docker/unsloth_run.py, docker/unsloth_ipython_startup.py | !pip/!uv shim, headless runner, IPython startup hook |
| docker/unsloth_studio_update.sh, docker/unsloth_llama_update.sh, docker/unsloth_jupyter_tunnel.sh, docker/fetch_llama_prebuilt.py | in-image updaters + llama.cpp fetch + Cloudflare tunnel |
| docker/run.sh, docker/build.sh, docker/test_locally.sh, docker/docker_confirm.sh, docker/docker_confirm.ps1, docker/setup_qemu.sh | local build / run / cross-OS confirm helpers |
| docker/freeze.sh, docker/hf_pull.sh, docker/hf_push.sh, docker/smoke_test.py, docker/.dockerignore | lockfile extract, HF helpers, runtime smoke test |
| .github/workflows/docker-publish.yml | multi-arch build + push of both images to Docker Hub |

Design notes

  • Base images: nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04 for the build stage, -cudnn-runtime-ubuntu24.04 for deploy. No nvcc in the published amd64 image.
  • A lockfile is emitted at /opt/unsloth-venv/requirements.lock.txt inside the image and can be extracted with docker/freeze.sh for a fully-pinned rebuild later.

Test plan

  • Set repo secrets DOCKERHUB_USERNAME and DOCKERHUB_TOKEN
  • Trigger the publish workflow (or merge to main) and confirm both images build on the native amd64 + arm64 runners in under ~90 min and merge into one manifest
  • Pull unsloth/unsloth:latest on an RTX 50-series host and confirm Studio (8000) + JupyterLab (8888) come up
  • Pull the same tag on a B200 host and confirm the image works with no rebuild
  • Pull on a DGX Spark / Grace (arm64) host and confirm the arm64 child runs natively
  • Optional: set vars.HAS_GPU_RUNNER=true once a self-hosted GPU runner is registered so the post-publish smoke test exercises real sm_120 paths

danielhanchen and others added 2 commits May 24, 2026 06:52
Adds a multi-stage Dockerfile producing an image that works on Ampere through
Blackwell (sm_80 through sm_120: A100, RTX 30/40, H100, B100/B200, RTX 50-series,
RTX 6000 Pro Blackwell). The build itself requires no GPU at all and runs on a
free GitHub-hosted ubuntu-latest runner.

How the GPU-less build works:

1. cu128 PyTorch wheels are fat binaries. torch._C._cuda_getArchFlags() returns
   'sm_70 sm_75 sm_80 sm_86 sm_90 sm_100 sm_120' regardless of which GPU
   compiled the image, because the wheels are cross-compiled upstream by the
   PyTorch team.

2. All deps resolve in a single uv pip install pass with explicit pins
   (torch==2.10.0, --extra-index-url cu128, no --torch-backend=auto, no
   install.sh). This prevents the silent cu cascade where bitsandbytes'
   transitive cuda-toolkit==13 dep upgrades torch to 2.12+cu130 in a later
   resolver pass, leaving xformers and other cu128 wheels stranded.

3. Build-time verification uses package metadata (importlib.metadata.version)
   and the raw torch._C._cuda_getArchFlags() accessor. We deliberately avoid
   import unsloth at build time because unsloth.__init__ calls
   torch.cuda.get_device_properties(0), which requires an actual CUDA device
   and is not bypassable. Import-time correctness is exercised at deploy time
   by smoke_test.py with --gpus all.

4. UNSLOTH_COMPILE_DISABLE=1 and CUDA_VISIBLE_DEVICES="" during the build stage
   prevent any code path from JIT-compiling kernels for the build host's
   compute capability and baking the resulting cache into the image. The
   deploy GPU produces its own cache on first use.

Other notes:

- --index-strategy unsafe-best-match is needed because the PyTorch wheel index
  serves an old requests==2.28.1 that conflicts with datasets>=2.32.2, which
  the default first-index-wins strategy rejects.
- Extra is cu128-ampere-torch2100 (ampere precedes the torch version in the
  pyproject ordering).
- No flash-attn in the base image. FA3 is hard-refused on Blackwell upstream
  and unsloth gracefully falls back to xformers + SDPA. Users on Ampere /
  Ada / Hopper who want FA2 can pip install flash-attn on top.
- Two stages: nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04 for the build,
  -cudnn-runtime for the deploy image. No nvcc in the published image.
- A lockfile is emitted at /opt/unsloth-venv/requirements.lock.txt inside
  the image and can be extracted with docker/freeze.sh for byte-identical
  rebuilds even after PyPI moves on.

CI workflow .github/workflows/docker-publish.yml:

- Builds on ubuntu-latest on every push to main, every tag, weekly via cron,
  and manually via workflow_dispatch. Pushes to docker.io/unsloth/unsloth
  with cache via type=gha.
- Optional smoke-test job runs on a self-hosted GPU runner if vars.HAS_GPU_RUNNER
  is set; skipped otherwise. End-to-end verification on sm_120 hardware is a
  nice-to-have, not a publish blocker.

Validation:

- Install path validated on a B200 host with CUDA_VISIBLE_DEVICES="" set
  (simulating the GPU-less CI runner): torch 2.10.0+cu128 holds, xformers
  0.0.34, bitsandbytes 0.49.2, triton 3.6.0, transformers 5.5.0, trl 0.24.0,
  peft 0.19.1, accelerate 1.13.0. Arch flags include sm_100 and sm_120.
- Runtime path validated end-to-end on B200: smoke_test.py imports unsloth,
  loads Llama-3.2-1B-Instruct-bnb-4bit in 4-bit, completes 5 LoRA steps with
  loss decreasing 4.11 -> 3.75. xformers fallback active as designed.

Files:

- docker/Dockerfile             multi-stage cu128 build
- docker/build.sh               local build wrapper
- docker/freeze.sh              extract lockfile from a built image
- docker/smoke_test.py          runtime verification, run with --gpus all
- docker/.dockerignore
- .github/workflows/docker-publish.yml

Comment thread .github/workflows/docker-publish.yml Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c6d92160f6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/docker-publish.yml Outdated
Comment on lines +115 to +117
docker pull ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
docker run --rm --gpus all \
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest \

P1 Badge Smoke-test the image built in this run, not latest

The smoke-test job always pulls :latest, but this workflow also runs on tag pushes where latest is not guaranteed to be among the tags produced by metadata-action (it is only enabled on the default branch in this workflow). In that case, the smoke test can validate an older image and miss regressions in the freshly built tag from the current run.

Comment thread .github/workflows/docker-publish.yml Outdated
Comment on lines +96 to +97
UNSLOTH_REF=${{ github.event.inputs.unsloth_ref || 'main' }}
UNSLOTH_ZOO_REF=${{ github.event.inputs.unsloth_zoo_ref || 'main' }}

P1 Badge Build from triggering ref instead of hardcoding main

For non-workflow_dispatch events (including tag pushes), github.event.inputs.* is unset, so these build args always resolve to main. That means images produced for v* tags can contain unsloth and unsloth-zoo code from main rather than the release ref that triggered the run, which breaks release correctness and reproducibility.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a Dockerized environment for Unsloth and unsloth-zoo, specifically optimized for NVIDIA Blackwell GPUs (sm_100 and sm_120). The changes include a multi-stage Dockerfile, build and freeze scripts, and a comprehensive smoke test to verify GPU compatibility and training functionality. Review feedback suggests optimizing the Dockerfile by removing a redundant installation of the uv tool, correcting a version mismatch for torchaudio to ensure consistency with the PyTorch stack, and relocating cache directories outside of the workspace to prevent issues when mounting host volumes at runtime.

Comment thread docker/Dockerfile Outdated
Comment on lines +99 to +101
RUN ${VENV}/bin/pip install uv \
&& ${VENV}/bin/uv pip install \
--python ${VENV}/bin/python \

medium

The uv tool is already installed at the system level in line 63. Installing it again inside the virtual environment at line 99 is redundant. Using the system-wide uv binary to install packages into the venv is more efficient and avoids unnecessary layers.

RUN uv pip install \
        --python ${VENV}/bin/python \

Comment thread docker/Dockerfile Outdated
--python ${VENV}/bin/python \
--index-strategy unsafe-best-match \
--extra-index-url https://download.pytorch.org/whl/cu128 \
"torch==2.10.0" "torchvision==0.25.0" "torchaudio==2.11.0" \

medium

There appears to be a version mismatch for torchaudio. While torch is pinned to 2.10.0 and torchvision to 0.25.0 (which correctly follows the standard 0.(Y+15) mapping for Torch 2.10), torchaudio is set to 2.11.0. Typically, PyTorch and Torchaudio versions are released in sync (e.g., Torch 2.6.0 with Torchaudio 2.6.0). Using 2.10.0 ensures consistency across the stack.

        "torch==2.10.0" "torchvision==0.25.0" "torchaudio==2.10.0" \

Comment thread docker/Dockerfile
Comment on lines +177 to +178
HF_HOME=/workspace/.cache/huggingface \
TRITON_CACHE_DIR=/workspace/.cache/triton \

medium

Setting HF_HOME and TRITON_CACHE_DIR to subdirectories of /workspace (the WORKDIR) can lead to issues when users mount a host directory to /workspace at runtime. The mount will obscure the directories created during the build, forcing the application to recreate them at runtime, which can cause permission issues or redundant downloads. Moving these caches to a location outside of the workspace, such as /opt/cache, avoids these issues.

    HF_HOME=/opt/cache/huggingface \
    TRITON_CACHE_DIR=/opt/cache/triton \

Comment thread docker/Dockerfile
COPY --from=builder /opt/unsloth-venv /opt/unsloth-venv

WORKDIR /workspace
RUN mkdir -p ${HF_HOME} ${TRITON_CACHE_DIR}

medium

When moving the cache directories outside of /workspace, creating them with broad permissions (e.g., 777) ensures that non-root users can write to the cache at runtime without encountering permission errors.

RUN mkdir -p ${HF_HOME} ${TRITON_CACHE_DIR} && chmod -R 777 /opt/cache

When someone launches the unsloth container, the common failure modes are not
unsloth bugs -- they're Docker / nvidia-container-toolkit / driver issues that
surface as cryptic CUDA errors deep in torch. The entrypoint catches the three
that cover ~95% of "it doesn't work" reports up front:

1. nvidia-smi inside the container sees no GPU
   -> user forgot --gpus all, or host is missing nvidia-container-toolkit
   -> entrypoint prints the exact docker run flag and the toolkit install URL
2. nvidia-smi works but torch.cuda.is_available() is False
   -> host driver is older than CUDA 12.8 requires
   -> entrypoint prints the minimum driver version per architecture
3. compute capability < sm_80
   -> entrypoint prints the supported architecture table and exits

Each check fails with a clear, actionable message rather than a stack trace.
Set UNSLOTH_SKIP_GPU_CHECK=1 to bypass (for docs builds, offline tooling, CI).
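
The three checks can be summarised as a decision function. The sketch below is in Python for illustration only; the real entrypoint is a shell script, and the function name, messages, and return shape here are assumptions.

```python
def preflight(gpu_visible, cuda_available, capability, skip=False):
    """Return (exit_code, message) for the three checks, in order.

    capability is a (major, minor) tuple such as (10, 0) for sm_100,
    or None when no device could be queried.
    """
    if skip:  # mirrors UNSLOTH_SKIP_GPU_CHECK=1
        return 0, "GPU check skipped"
    if not gpu_visible:
        return 1, "No GPU visible: pass --gpus all and install nvidia-container-toolkit"
    if not cuda_available:
        return 1, "GPU visible but CUDA unusable: host driver is too old for CUDA 12.8"
    if capability is None or capability < (8, 0):
        return 1, "Unsupported GPU: compute capability below sm_80"
    return 0, f"GPU ok: sm_{capability[0]}{capability[1]}"
```

Ordering matters: each later check is only meaningful once the earlier one has passed, which is why the function returns at the first failure.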

run.sh wraps `docker run` with the flags people most often forget:
  --gpus all           (without it, the new entrypoint refuses to start)
  --ipc=host           (DataLoader workers need >64MB shm)
  --ulimit memlock=-1  (NCCL + CUDA pinned host buffers)
  --ulimit stack=64MB  (some torch kernels overflow the default 8MB stack)

Plus it mounts the host HF cache + Triton JIT cache so model downloads and
compiled kernels persist across container runs, and forwards HF_TOKEN /
WANDB_API_KEY / UNSLOTH_LICENSE only when they are set on the host.

Usage:
  bash docker/run.sh                                  # interactive python REPL
  bash docker/run.sh bash                             # shell in container
  bash docker/run.sh python /workspace/smoke_test.py
  bash docker/run.sh python /workspace/host/train.py  # $PWD mounted at /workspace/host

Verified locally:
- No GPU visible: entrypoint refuses with driver-version message, exit 1
- B200 sm_100 visible: entrypoint prints GPU banner, exits cleanly into the
  user command (rc=0)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58693c4c73

Comment thread docker/run.sh Outdated
Comment on lines +55 to +59
[[ -n "${HF_TOKEN:-}" ]] && ENV_FORWARD+=(-e "HF_TOKEN=${HF_TOKEN}")
[[ -n "${WANDB_API_KEY:-}" ]] && ENV_FORWARD+=(-e "WANDB_API_KEY=${WANDB_API_KEY}")
[[ -n "${UNSLOTH_LICENSE:-}" ]] && ENV_FORWARD+=(-e "UNSLOTH_LICENSE=${UNSLOTH_LICENSE}")

set -x

P1 Badge Remove xtrace before invoking docker run with secrets

This script conditionally forwards HF_TOKEN, WANDB_API_KEY, and UNSLOTH_LICENSE, then enables set -x right before docker run, which prints the fully expanded command line. In any environment where those variables are set (local terminals with history/log capture or CI logs), their raw values are exposed in plaintext, creating an avoidable credential leak.
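
One way to avoid the leak is to forward variables by name, since `docker run -e NAME` (with no `=value`) copies the value from the caller's environment. A minimal sketch, written in Python rather than the Bash of run.sh; `forward_flags` and the variable tuple are illustrative assumptions, not the fix adopted in this PR.

```python
import os

FORWARDED = ("HF_TOKEN", "WANDB_API_KEY", "UNSLOTH_LICENSE")


def forward_flags(env=None, names=FORWARDED):
    """Build `-e NAME` flags for variables that are set and non-empty.

    Only the name is placed on the command line, so logging the argv
    (the equivalent of `set -x`) never prints the secret itself.
    """
    env = os.environ if env is None else env
    flags = []
    for name in names:
        if env.get(name):
            flags += ["-e", name]
    return flags


if __name__ == "__main__":
    argv = ["docker", "run", "--rm", "--gpus", "all", *forward_flags()]
    print(" ".join(argv))  # safe to log: contains names only
```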

Comment thread docker/smoke_test.py
Comment on lines +61 to +63
import transformers

print(f"transformers {transformers.__version__}")

P2 Badge Import unsloth before transformers in smoke test

The default execution path runs training (check_tiny_train) but check_imports imports transformers first, which contradicts the later requirement that Unsloth be imported first for patched training behavior. This means the smoke test is not validating the intended Unsloth training path and can produce misleading pass/fail results (e.g., patches not applied or warnings/fallback paths during the LoRA step).
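
A small helper can make the required ordering explicit. The sketch below only reorders module names and then imports them in that order; `unsloth_first` and `import_in_order` are hypothetical names, and the actual smoke_test.py may be structured differently.

```python
import importlib

PRIORITY = ("unsloth", "unsloth_zoo")


def unsloth_first(modules):
    """Return modules with unsloth and unsloth_zoo moved to the front.

    The relative order of all other modules is preserved.
    """
    head = [name for name in PRIORITY if name in modules]
    tail = [name for name in modules if name not in PRIORITY]
    return head + tail


def import_in_order(modules):
    """Import each module in the corrected order and report its version."""
    for name in unsloth_first(modules):
        module = importlib.import_module(name)
        print(f"{name} {getattr(module, '__version__', 'unknown')}")
```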

Single bash script that runs the full validation flow against the image:

  1. Host pre-flight: docker version, nvidia-smi, nvidia-container-toolkit
     runtime registered with docker.
  2. Build the image (auto-detects the build context -- current dir,
     docker/ subdir, or clones the docker-blackwell-build branch into
     /tmp/unsloth-pr/).
  3a. Smoke test: 5-step LoRA on Llama-3.2-1B-Instruct-bnb-4bit.
  3b. Real workload: gpt-oss-20B fine-tuning notebook from
      unslothai/notebooks, patched to max_steps=10, with the three
      pre-train demo generations dropped for brevity. Auto-installs
      triton_kernels at the SHA the upstream notebook pins for MXFP4.

All output is teed to /tmp/unsloth-docker-test/ (or --log-dir).

Usage:
  bash docker/test_locally.sh                  # full run, ~15 min
  bash docker/test_locally.sh --skip-notebook  # blocks 1-3a only, ~3 min
  bash docker/test_locally.sh --skip-build     # reuse existing TAG
  TAG=my:tag HF_TOKEN=hf_xxx bash docker/test_locally.sh

Each block fails fast with the exact log path to paste back.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: acbb16c8a1

Comment thread docker/Dockerfile Outdated
--python ${VENV}/bin/python \
--index-strategy unsafe-best-match \
--extra-index-url https://download.pytorch.org/whl/cu128 \
"torch==2.10.0" "torchvision==0.25.0" "torchaudio==2.11.0" \

P1 Badge Align torchaudio pin with torch 2.10 stack

The install set mixes torch==2.10.0 with torchaudio==2.11.0, but TorchAudio wheels are release-coupled to specific Torch versions; this can make dependency resolution fail or force an unexpected Torch change, which breaks the Docker build’s stated guarantee that Torch stays on 2.10.0. This is especially risky here because the same layer also installs many transitive deps from multiple indexes, so one incompatible pin can fail the image build on CI.
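
The release coupling can be expressed as a small pin check. This is a sketch of a heuristic only (torchvision minor = torch minor + 15, torchaudio matching torch exactly), assumed to hold for the torch 2.x pins discussed in this thread; `expected_companions` and `check_pins` are hypothetical helpers, not part of PyTorch or this PR.

```python
def expected_companions(torch_version):
    """Return (torchvision, torchaudio) versions implied by a torch 2.x pin."""
    base = torch_version.split("+")[0]  # drop a local tag such as +cu128
    major, minor, patch = (int(part) for part in base.split("."))
    if major != 2:
        raise ValueError("heuristic only covers torch 2.x")
    return f"0.{minor + 15}.{patch}", f"{major}.{minor}.{patch}"


def check_pins(torch_version, torchvision_version, torchaudio_version):
    """List human-readable mismatches between the pins and the heuristic."""
    vision, audio = expected_companions(torch_version)
    problems = []
    if torchvision_version != vision:
        problems.append(f"torchvision {torchvision_version} != expected {vision}")
    if torchaudio_version != audio:
        problems.append(f"torchaudio {torchaudio_version} != expected {audio}")
    return problems
```

Applied to the pins in the quoted install line, `check_pins("2.10.0", "0.25.0", "2.11.0")` flags only the torchaudio entry.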

Comment thread docker/entrypoint.sh Outdated
Comment on lines +28 to +31
if ! command -v nvidia-smi >/dev/null 2>&1; then
err "nvidia-smi not found inside the container."
err "The CUDA runtime in this image is broken. Re-pull the image."
exit 1

P2 Badge Avoid hard-failing when nvidia-smi binary is unavailable

Startup currently exits before any CUDA check if nvidia-smi is missing, but there are valid GPU runtimes (for example compute-only capability profiles) where CUDA is usable while nvidia-smi/NVML tools are not mounted. In those environments the container will refuse to start even though torch.cuda.is_available() could succeed, causing false negatives in production deployments that intentionally limit driver capabilities.

The Dockerfile uses BuildKit-only features (the # syntax=docker/dockerfile:1.7
parser directive and RUN ... <<'PY' heredocs, stable since dockerfile 1.4). The
legacy builder rejects the --progress flag at the CLI level and would fail
later at the heredocs anyway.

Detect docker buildx and use it when available (preserves --progress=plain
output). Otherwise fall back to plain `docker build` with DOCKER_BUILDKIT=1
exported, which gets the BuildKit features without buildx's nicer formatting.

Reproduces the failure path seen on Docker 28.2.2 without buildx installed:
  unknown flag: --progress
  ERROR docker build exited 125

Docker 28 removed the legacy image builder entirely. Setting
DOCKER_BUILDKIT=1 no longer falls back to a builtin builder -- it
delegates to buildx, which then errors out if buildx isn't installed:

  ERROR: BuildKit is enabled but the buildx component is missing
         or broken.

The Ubuntu docker.io package omits buildx by default, so users on
that path hit this immediately. Detect missing buildx up front and
print exact install commands for apt / dnf / manual binary instead
of attempting a fallback that cannot work.

If the user is not in the 'docker' group, every docker command after the
pre-flight returns "permission denied while trying to connect to the Docker
daemon socket at /var/run/docker.sock". This used to surface as a confusing
buildx failure mid-Block-2, but the actual problem is a host permissions
issue that's detectable up front.

Detect by running 'docker info' and checking its exit code (not just grep
on its output -- a permission failure prints to stderr and returns non-zero,
so the old grep-based check was a silent skip).

Also clarify the nvidia-runtime WARN: on Docker 28+ with CDI mode this is
a false positive most of the time. The real GPU-attach test is the smoke
run in Block 3a, where the container entrypoint catches missing GPUs with
an actionable message.
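
The exit-code check might be sketched as follows. It is written in Python for illustration (the PR's script is Bash); `docker_daemon_status` and its return labels are hypothetical, and the `run` parameter exists only so the logic can be exercised without Docker installed.

```python
import subprocess


def docker_daemon_status(run=subprocess.run):
    """Classify `docker info` by exit code rather than by grepping stdout."""
    try:
        result = run(["docker", "info"], capture_output=True, text=True)
    except FileNotFoundError:
        return "docker-missing"
    if result.returncode == 0:
        return "ok"
    # Failures print to stderr, so a stdout-only grep would see nothing.
    if "permission denied" in result.stderr.lower():
        return "permission-denied"
    return "daemon-unreachable"


if __name__ == "__main__":
    print(docker_daemon_status())
```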

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23a5b43180

Comment thread docker/test_locally.sh
cd /workspace/host

echo "=== install triton_kernels (MXFP4 support for unsloth/gpt-oss-20b) ==="
pip install -q 'git+https://github.com/triton-lang/triton.git@0add68262ab0a2e33b84524346cb27cbb2787356#subdirectory=python/triton_kernels' 2>&1 | tail -5

P2 Badge Propagate pip install failures through the tail pipeline

The generated run_notebook.sh script runs with set -e but not pipefail, and this line pipes pip install into tail -5; in Bash that means the pipeline exits with tail's status, so a failed pip install can be treated as success and the script continues until later steps fail with misleading errors (for example missing triton_kernels imports). This causes false-positive notebook validation and makes debugging CI/local failures much harder whenever installation fails (network hiccups, dependency conflicts, or bad commit hash).
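
The same pitfall and its remedy can be shown outside Bash. The sketch below, an illustration rather than the fix used in this PR, runs a command, prints only the last few lines of its output, and still fails loudly on a non-zero exit; `run_and_tail` is a hypothetical helper. In the generated Bash script itself, `set -o pipefail` has the equivalent effect.

```python
import subprocess
import sys


def run_and_tail(cmd, lines=5):
    """Run cmd, echo the last `lines` lines of output, raise on failure."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    tail = result.stdout.splitlines()[-lines:]
    print("\n".join(tail))
    # Unlike `cmd | tail -5` without pipefail, the command's own exit
    # status decides success here.
    result.check_returncode()
    return tail


if __name__ == "__main__":
    run_and_tail([sys.executable, "-c", "print('line 1'); print('line 2')"])
```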

Ubuntu 24.04 (noble) marks the system Python interpreter as
externally-managed per PEP 668, so:

  curl get-pip.py | python
  python -m pip install -U pip uv

fails inside the builder image with:

  error: externally-managed-environment
  This environment is externally managed

The system-level pip and uv were never used: the very next RUN creates
the venv at /opt/unsloth-venv, which bootstraps its own pip via the
ensurepip module (provided by the python3.12-venv apt package). uv is
then installed INTO the venv with the venv's pip, and used from there.

Drop the two system-pip bootstrap lines. The venv path is unchanged.

Reproduces on any Docker build of the unsloth-blackwell image against
a noble base image (which our nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04
is).

… / peft

unsloth_zoo/__init__.py guards against being imported standalone:

  if "UNSLOTH_IS_PRESENT" not in os.environ:
      raise ImportError("Please install Unsloth via `pip install unsloth`!")

The env var is set by unsloth/__init__.py at import time, so importing
unsloth must happen first. The old check_imports() imported xformers,
bnb, transformers, trl, peft, then unsloth_zoo -- which fired the guard
because unsloth had not been imported yet.

Reorder check_imports() to import unsloth (and unsloth_zoo) first, then
the rest. check_unsloth_import() becomes a thin re-import to keep the
"FastLanguageModel reachable" banner in the output.

Same fix the unsloth README has been recommending for years: "import
unsloth at the top of your file, before transformers/trl/peft."

Triton's nvidia backend lazily JIT-compiles a small C extension
(CudaUtils, in triton/backends/nvidia/driver.py) on first GPU access.
Without a C compiler and Python headers in the runtime image, the
very first forward pass of any Unsloth model dies with:

  RuntimeError: Failed to find C compiler.
                Please specify via CC environment variable.

The builder stage has build-essential and python3.12-dev so this
worked during the build's verification step (no GPU = no Triton kernel
call = no C extension build). But the runtime stage stripped those
out for size, so the failure only surfaces when a real user runs
training inside the container.

Add gcc + g++ + python3.12-dev to the runtime stage. Increases the
runtime image by ~250MB, which is the cost of letting Triton JIT
correctly. Pre-compiling CudaUtils at build time would need a real
CUDA device (the constructor calls cuda runtime functions), so
shipping the toolchain is the right trade-off.
@danielhanchen

Member Author

Smoke-test validation on a fresh deploy host (AWS B200, not the build host)

End-to-end validated docker/test_locally.sh --skip-notebook on an AWS EC2 instance with 8x B200, driver 590.48.01, Docker 28.2.2 (Ubuntu docker.io + docker-buildx 0.30.1). The build host was a separate GCP B200 — so this confirms cross-host reproducibility.

What was validated

  • Build: builder + runtime stages completed cleanly
  • Built-in arch check: arches: ['sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_120']
  • Entrypoint pre-flight: Unsloth container: 8 GPU(s). Primary: NVIDIA B200 sm_100 bf16=True
  • Imports: unsloth → unsloth_zoo → xformers → bnb → transformers → trl → peft (no order violations)
  • Model load: unsloth/Llama-3.2-1B-Instruct-bnb-4bit loaded in 4-bit, 16 QKV + 16 O LoRA layers patched
  • Training: 5 LoRA steps completed, loss decreased monotonically

Unsloth's own banner inside the container:

NVIDIA B200. Num GPUs = 8. Max memory: 178.353 GB. Platform: Linux.
Torch: 2.10.0+cu128. CUDA: 10.0. CUDA Toolkit: 12.8. Triton: 3.6.0
Bfloat16 = TRUE. FA [Xformers = 0.0.34. FA2 = False]

Loss progression

Bit-for-bit identical to the internal validation on a GCP B200 (different host, same image):

step 0  loss=4.1108
step 1  loss=4.0617
step 2  loss=3.9786
step 3  loss=3.8695
step 4  loss=3.7511

Bugs caught and fixed during validation

Each was a real defect surfaced only by running the image on a fresh host:

  1. `docker build --progress` rejected by the legacy builder → require buildx (c6d9216, 56d2701)
  2. Pre-flight missed `permission denied` on docker socket → check `docker info` exit code, print `usermod -aG docker` recipe (23a5b43)
  3. PEP 668 rejected the system-wide `pip install -U pip uv` bootstrap → dropped, the venv self-bootstraps via ensurepip (fd55ed0)
  4. `smoke_test.py` imported `unsloth_zoo` before `unsloth` → reorder (00cbc82)
  5. Triton's lazily-built CudaUtils C extension needed gcc + python3-dev at runtime → add to runtime stage (1cdc5f1)

Full gpt-oss-20B fine-tuning notebook run still pending; will post follow-up.

…ia.com/cuda/gpus

TORCH_CUDA_ARCH_LIST now covers the full set of compute capabilities
NVIDIA publishes on https://developer.nvidia.com/cuda/gpus for x86_64
hardware, from Turing onward:

  sm_75    Turing       T4, RTX 20-series, Quadro RTX
  sm_80    Ampere DC    A100, A30
  sm_86    Ampere       A40, RTX A6000, RTX 30-series
  sm_89    Ada          L4, L40, L40S, RTX 40-series
  sm_90    Hopper       H100, H200, GH200
  sm_100   Blackwell DC B100, B200, GB200
  sm_103   Blackwell DC B300, GB300
  sm_120   Blackwell    RTX 50-series, RTX PRO 6000 Blackwell
  sm_121   Blackwell    GB10 (DGX Spark)

with +PTX on the highest entry so future arch revisions can JIT.

Setting TORCH_CUDA_ARCH_LIST only affects nvcc invocations for any
source build the user adds on top of this image (e.g. flash-attn, a
custom CUDA op). The prebuilt cu128 wheels already include SASS for
sm_70/75/80/86/90/100/120 (verified at build time via
torch._C._cuda_getArchFlags()). Ada (sm_89), B300 (sm_103) and DGX
Spark (sm_121) GPUs run via JIT-PTX from the nearest available arch.

Jetson archs (sm_87 Orin, sm_110 Thor) are intentionally NOT included
-- they require aarch64 wheels and this image is linux/amd64 only.

Also lower the entrypoint's compute-capability gate from sm_80 to
sm_75. Turing GPUs work, with the caveat that bfloat16 is unavailable;
the entrypoint prints a NOTE in that case so Unsloth's fp16 fallback
isn't a surprise.
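
The lowered gate can be sketched as a small pure function. This is an illustration under assumptions: the function name and message wording are hypothetical, and the real check lives in docker/entrypoint.sh. Only the sm_75 threshold and the bfloat16 NOTE come from the commit message.

```python
def check_compute_capability(major: int, minor: int):
    """Return (ok, message) for a device's CUDA compute capability."""
    sm = f"sm_{major}{minor}"
    if (major, minor) < (7, 5):
        # Anything older than Turing is rejected up front.
        return False, f"{sm}: older than Turing (sm_75), unsupported"
    if major < 8:
        # Turing is allowed, but bfloat16 is unavailable there.
        return True, f"{sm}: NOTE: no bfloat16 on Turing, expect fp16 fallback"
    return True, f"{sm}: supported, bfloat16 available"
```

For example, `check_compute_capability(7, 5)` passes with the NOTE, while `check_compute_capability(7, 0)` is rejected.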

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dde5170e7a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread docker/entrypoint.sh
("sm_120", "Blackwell", "RTX 50-series, RTX PRO 6000 Blackwell"),
("sm_121", "Blackwell", "GB10 (DGX Spark)"),
)
if major < 7 or (major == 7 and minor < 5):

P2 Badge Reject pre-Ampere GPUs in entrypoint check

The entrypoint claims to fail fast on GPUs older than Ampere, but this condition only rejects devices below sm_75, so Turing (sm_75) is allowed through. That contradicts both the script’s own header (Unsloth requires sm_80+) and smoke_test.py, which exits on any cap[0] < 8; on T4/RTX20 hosts the container starts successfully and then fails later during validation/workloads instead of giving the intended immediate, actionable error.

HF Hub does not act as a generic OCI registry for arbitrary Docker
images -- the registry.hf.space endpoint only serves images that
Spaces have built, not images pushed by `docker push`. So we cannot
do `docker push huggingface.co/user/repo:tag` for an Unsloth image.

For cross-host testing where we want one canonical place to pull
from (and Docker Hub credentials are not yet configured), wrap the
manual flow into push/pull-shaped commands:

  hf_push.sh: docker save | pigz | huggingface-cli upload
  hf_pull.sh: huggingface-cli download | gunzip | docker load

This is an approximation, not real OCI semantics -- every push uploads
the full ~4 GB blob, no layer dedup, no manifest negotiation. Good
for testing across A100 / H100 / RTX 6000 boxes; the real release
should go through .github/workflows/docker-publish.yml to Docker Hub,
which gets layer dedup, multi-arch manifest support, and standard
`docker pull` UX for users.

Usage:
  bash docker/hf_push.sh unsloth-blackwell:test danielhanchen/unsloth-blackwell-docker
  bash docker/hf_pull.sh danielhanchen/unsloth-blackwell-docker unsloth-blackwell-test.tar.gz unsloth-blackwell:test

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 4bfb4b891a

Comment thread .github/workflows/docker-publish.yml Outdated
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=raw,value=latest,enable={{is_default_branch}}

P1 Badge Prevent workflow_dispatch builds from overwriting latest

This tag rule publishes latest whenever the run is on the default branch, but workflow_dispatch also allows overriding UNSLOTH_REF/UNSLOTH_ZOO_REF to arbitrary refs. A manual test run from main can therefore push a non-release image as latest, which makes downstream users pull an unintended build. Gate latest to trusted release flows (or only when both refs are the defaults) to avoid accidental retagging.

Comment thread docker/hf_pull.sh
set -euo pipefail

REPO="${1:?usage: hf_pull.sh <hf_repo> [<blob>] [<verify_tag>]}"
BLOB="${2:-unsloth-blackwell.tar.gz}"

P2 Badge Match hf_pull default blob name to push naming

hf_push.sh uploads archives as <image-name>-<tag>.tar.gz, but hf_pull.sh defaults to unsloth-blackwell.tar.gz. If users run the documented short form bash docker/hf_pull.sh <hf_repo> after a normal push, the download will target a filename that was never uploaded and fail. Keeping both scripts on the same default naming convention avoids this broken default path.

…face-cli`

In huggingface_hub >= 0.27 the `huggingface-cli` binary is deprecated
and prints a "Use hf instead" notice then exits without doing the
operation. The previous wrappers ran `huggingface-cli upload/download`
silently, treated the deprecation exit as success, and uploaded
nothing.

Detect the new `hf` binary first and use that. If only the legacy
`huggingface-cli` is on PATH (older installs), fall back with a WARN
so users know the failure mode if anything goes sideways.

Also: hf_pull.sh now asserts the downloaded file is non-empty
(`test -s`) so we catch silent download failures before the
`docker load` step.
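
The detection order and the non-empty check can be sketched as follows. The actual wrappers are shell scripts, so this Python version only illustrates the logic; `pick_hf_cli` and `require_non_empty` are hypothetical names.

```python
import os
import shutil
import sys


def pick_hf_cli(which=shutil.which):
    """Return the Hugging Face CLI to call, preferring `hf` over the legacy name."""
    if which("hf"):
        return "hf"
    if which("huggingface-cli"):
        print("WARN: only legacy huggingface-cli found; it may no-op", file=sys.stderr)
        return "huggingface-cli"
    raise FileNotFoundError("neither hf nor huggingface-cli found on PATH")


def require_non_empty(path):
    """Rough equivalent of `test -s path`: fail unless the file exists and has bytes."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        raise RuntimeError(f"download produced no data: {path}")
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it is `shutil.which`.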
…t call

`jupyter nbconvert --to script nb.ipynb --output nb 2>/dev/null` was
silently exiting 0 without producing the output file in some
environments (likely because jupyter/jupyter_core wasn't on PATH or
nbconvert's --output handling differed across versions). The 2>/dev/null
hid the underlying error, and `set -e` did not catch the missing-output
case because nbconvert itself returned 0.

Switch to a direct nbformat-based conversion:

  pip install -q nbformat
  python -c "import nbformat; nb=nbformat.read('nb.ipynb', as_version=4);
             code='\n\n'.join(c.source for c in nb.cells if c.cell_type == 'code')
             open('nb.py','w').write(code + '\n')"

Smaller dep set, no shell-out to a jupyter wrapper script, and an
explicit `test -s nb.py` afterwards catches any silent failure
before downstream steps try to read the file.

Reproduces the failure on RTX PRO 6000 Blackwell (sm_120, docker
29.2.1, ubuntu 24.04) where nbconvert's CLI silently no-op'd.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 8344fa0a56

Comment thread docker/Dockerfile Outdated
TRITON_CACHE_DIR=/workspace/.cache/triton \
# Keep the arch list visible at runtime in case the user source-builds anything
# extra inside the container (e.g. a custom CUDA op).
TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;10.0;12.0+PTX"

P2 Badge Include sm_75 in runtime arch list

The runtime image advertises support from Turing onward (see entrypoint.sh supported list), but TORCH_CUDA_ARCH_LIST here starts at 8.0. Any CUDA extension compiled inside the running container (the exact use case this env var comment describes) will be built without sm_75, so on T4/RTX20 hosts those kernels can fail at runtime with “no kernel image is available” despite the container claiming that architecture is supported.

The previous nbformat-based conversion dumped raw cell.source for every
code cell. The gpt-oss-20B notebook's first cell uses Jupyter !shell
magic to install dependencies:

  !pip install --upgrade -qqq uv
  !uv pip install -qqq ... \
  git+https://github.com/triton-lang/triton.git@0add68... ...

Dumped verbatim, the `@0add68...` token tripped the Python parser with
"SyntaxError: invalid decimal literal" before training could even start.

The container already has unsloth, triton, transformers, etc. baked in,
so we don't need the notebook's install cell. Skip any cell whose source
contains pip/install markers, and comment out stray !cmd / %magic lines
in any other cells. Then assert nb.py parses with ast.parse() before
trying to run it -- catches conversion failures up front instead of at
training time.

Reproduces on RTX PRO 6000 Blackwell (sm_120, fresh Docker 29.2.1
host) where the previous conversion produced an invalid nb.py.
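
A minimal sketch of the filtering described above, operating on a plain list of code-cell source strings rather than an nbformat object. The `INSTALL_MARKER` regex is an assumed heuristic and `cells_to_script` is a hypothetical name; the markers the real script uses may differ. Magics nested inside an indented block or continued with a backslash are not handled here, although `ast.parse()` would still flag the result.

```python
import ast
import re

# Assumed heuristic for an "install cell": a !/% line that runs (uv) pip install.
INSTALL_MARKER = re.compile(r"^\s*[!%]\s*(uv\s+)?pip\s+install", re.MULTILINE)


def cells_to_script(code_cells):
    """Join code-cell sources into one Python script that is known to parse."""
    kept = []
    for source in code_cells:
        if INSTALL_MARKER.search(source):
            continue  # dependencies are already baked into the image
        lines = [
            "# " + line if line.lstrip().startswith(("!", "%")) else line
            for line in source.splitlines()
        ]
        kept.append("\n".join(lines))
    script = "\n\n".join(kept) + "\n"
    ast.parse(script)  # raises SyntaxError here rather than at training time
    return script
```

With nbformat, the input list would be something like `[c.source for c in nb.cells if c.cell_type == "code"]`.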

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 391532c031

Comment thread docker/test_locally.sh Outdated
Comment on lines +122 to +123
git clone --depth 1 -b docker-blackwell-build \
https://github.com/unslothai/unsloth.git /tmp/unsloth-pr 2>&1 | tail -3

P2 Badge Clone stable ref in fallback build-context path

When this script is run outside the repo tree, the fallback path clones a hardcoded docker-blackwell-build branch. That branch name is PR-specific and can disappear after merge, so the fallback clone will fail and Block 2 cannot build at all. This breaks the script’s advertised “clone if needed” flow for users validating from a clean host; use a stable default ref (or a configurable ref input) instead of a transient PR branch.

@danielhanchen

Member Author

Cross-host validation #2: sm_120 (RTX PRO 6000 Blackwell, 96GB) — full gpt-oss-20B fine-tuning

End-to-end validated on a fresh GCP RTX PRO 6000 Blackwell Server Edition host. Image pulled from HF (huggingface.co/danielhanchen/unsloth-blackwell-docker), loaded into Docker, and exercised with both the smoke test AND the full gpt-oss-20B fine-tuning notebook.

Setup

  • Host: GCP, NVIDIA RTX PRO 6000 Blackwell Server Edition (sm_120), 97887 MiB, driver 580.126.09
  • Docker: 29.2.1, buildx 0.30.1
  • Image: identical `unsloth-blackwell:test` built on the AWS B200, distributed via HF Hub

Smoke test (5-step LoRA on Llama-3.2-1B-bnb-4bit)

```
Unsloth container: 1 GPU(s). Primary: NVIDIA RTX PRO 6000 Blackwell Server Edition sm_120 bf16=True
arches: ['sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_120']
step 0 loss=4.0946
step 1 loss=4.0556
step 2 loss=3.9781
step 3 loss=3.8706
step 4 loss=3.7453
=== all checks passed ===
```

Loss is ~0.4% off the B200 (sm_100) run (4.1108 → 3.7511); expected because sm_100 vs sm_120 Triton kernels produce slightly different bf16 rounding paths, deterministically.

Full gpt-oss-20B fine-tuning (10 LoRA steps, MXFP4 + MoE)

10 SFT steps on `HuggingFaceH4/Multilingual-Thinking` with the gpt-oss-20B MXFP4 model. Real workload, real MoE expert LoRA, real Harmony format inference at multiple reasoning_effort levels:

```
==((====))== Unsloth 2026.5.6: Fast Gpt_Oss patching. Transformers: 5.5.0.
\\ /| NVIDIA RTX PRO 6000 Blackwell Server Edition. Num GPUs = 1.
Max memory: 94.971 GB. Platform: Linux.
O^O/ \/ \ Torch: 2.10.0+cu128. CUDA: 12.0. CUDA Toolkit: 12.8. Triton: 3.6.0
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.34. FA2 = False]
"-
___-"

Unsloth: Detected MoE model with num_experts = 32 and target_modules = [...].
Enabling LoRA on MoE parameters: ['mlp.experts.gate_up_proj', 'mlp.experts.down_proj']

Trainable parameters = 3,981,312 of 20,918,738,496 (0.02% trained)
Num examples = 933 | Num Epochs = 1 | Total steps = 10
Batch size per device = 1 | Gradient accumulation steps = 4

step 1: loss=1.071 grad_norm=2.805
step 2: loss=1.633 grad_norm=3.155
step 3: loss=1.053 grad_norm=2.815
step 4: loss=0.835 grad_norm=2.14
step 5: loss=1.363 grad_norm=2.239
step 6: loss=0.936 grad_norm=1.645
step 7: loss=0.958 grad_norm=1.641
step 8: loss=1.203 grad_norm=1.843
step 9: loss=1.296 grad_norm=2.013
step 10: loss=0.952 grad_norm=1.679

train_runtime = 141.7s (2.36 minutes)
peak reserved memory = 12.625 GB / 94.971 GB (13.3%)
```

Post-train inference at `reasoning_effort="medium"` and `"high"` produced coherent French reasoning output via the Harmony format -- confirming MXFP4 weights are loading correctly, MoE expert routing works, and the Triton kernels JIT-compile for sm_120 at first use.

What this validates

  • cu128 wheel's sm_120 SASS works on real sm_120 hardware
  • xformers + bnb + triton fat binaries run cross-arch (this is the same image that worked on B200 sm_100)
  • MXFP4 quantization (`unsloth/gpt-oss-20b`, no bnb-4bit variant) works inside the container
  • MoE expert LoRA targeting (`mlp.experts.gate_up_proj`, `mlp.experts.down_proj`) works
  • The whole `docker save → HF Hub upload → HF Hub download → docker load → docker run` flow works for a 13 GB image
  • Triton 3.6.0 + sm_120 first-run JIT cost is acceptable (the 1st training step is slow at 105s as Triton compiles; subsequent steps drop to ~3-7s/step)

Additional bugs caught and fixed during this validation

  • `huggingface-cli` is deprecated in huggingface_hub >= 0.27, silently exits without doing the upload/download. Switched `hf_{push,pull}.sh` to the new `hf` CLI (7354642).
  • `jupyter nbconvert` was silently failing in the container. Replaced with direct `nbformat`-based conversion (8344fa0).
  • `nbformat` conversion dumped raw `!pip install` shell magic from notebook install cells, breaking Python parse. Added install-cell skip + `ast.parse()` assertion (391532c).

Make the docker image multi-arch so DGX Spark (GB10, sm_121, aarch64) and
the Grace-Hopper / Grace-Blackwell SoCs (GH200 arm64, GB200 arm64) pull a
natively-built arm64 child from the same manifest. Runtime emulation is
NOT involved -- QEMU is used only for the cross-compile step on x86_64
CI runners; consumers on aarch64 hosts get a normal arm64 image and CUDA
works as on any other host.

Dockerfile:
  * ARG TARGETARCH; switch unsloth extras between cu128-ampere-torch2100
    (amd64, with xformers) and huggingface (arm64, no xformers -- there
    is no cu128 aarch64 xformers wheel as of 0.0.34, so we fall back to
    Unsloth's native SDPA path; ~5-10% slowdown but functionally complete).
  * Build-time torch._C._cuda_getArchFlags() assertion: amd64 still
    requires sm_120, arm64 accepts sm_120 or sm_121.
  * Same TORCH_CUDA_ARCH_LIST on both arches; nvcc emits whatever's listed.

docker/setup_qemu.sh (new):
  One-time host setup -- registers binfmt_misc handlers via
  tonistiigi/binfmt and creates a 'unsloth-multiarch' docker-container
  buildx builder. Required only on x86_64 build hosts targeting arm64.

docker/test_locally.sh:
  --platform amd64|arm64 flag. Cross-builds verify QEMU is registered,
  then build through the in-image arch-flags assertion. Smoke + notebook
  blocks auto-skip when image arch != host arch (CUDA cannot run under
  user-space QEMU + nvidia-container-toolkit cannot bridge a QEMU guest
  to a real GPU).

.github/workflows/docker-publish.yml:
  platforms: linux/amd64,linux/arm64 (single manifest, two children).
  Timeout bumped 60 -> 150 min for the slower arm64-under-QEMU leg.
  docker/setup-qemu-action@v3 with platforms: arm64 (was implicit before).
@danielhanchen

Member Author

DGX Spark / linux/arm64 support added via QEMU at build time (e7cfcea).

The image is now multi-arch: one Docker manifest with linux/amd64 + linux/arm64 children. docker pull unsloth/unsloth:latest on x86_64 hosts gets the amd64 layer; on DGX Spark / Grace / Grace-Hopper it gets the arm64 layer natively, with normal CUDA access. QEMU is only used at build time on the x86_64 CI runner -- never at runtime, where it would break CUDA.

What changed:

  • docker/Dockerfile: ARG TARGETARCH switches the unsloth extras between cu128-ampere-torch2100 (amd64, with xformers) and huggingface (arm64, falls back to Unsloth's native SDPA path -- xformers does not yet publish a cu128 aarch64 wheel as of 0.0.34). Build-time torch._C._cuda_getArchFlags() assertion now accepts sm_120 or sm_121 on arm64.
  • docker/setup_qemu.sh: one-time host setup that registers binfmt_misc handlers via tonistiigi/binfmt and creates a multi-arch buildx builder. Only needed when building arm64 on an x86_64 host.
  • docker/test_locally.sh: --platform amd64|arm64 flag. When the target arch differs from the host arch, the smoke test and notebook blocks auto-skip with a clear warning (CUDA does not work under QEMU runtime emulation and nvidia-container-toolkit cannot bridge a QEMU guest to a real GPU). The build itself still runs through the arch-flags assertion, so PRs get build-time validation of the arm64 leg even on amd64 runners.
  • .github/workflows/docker-publish.yml: platforms: linux/amd64,linux/arm64 (single manifest). Timeout bumped 60 -> 150 min for the slower arm64-under-QEMU leg.

Arm64 GPU SoCs supported by the new variant:

| Compute Cap | SoC | Examples |
| --- | --- | --- |
| sm_90 | Grace-Hopper | GH200 |
| sm_100 | Grace-Blackwell | GB200 |
| sm_121 | Blackwell + Grace | GB10 (DGX Spark) |

End-to-end arm64 validation on actual hardware is pending -- the build-time assertion exercises the wheel resolution and the cu128 aarch64 fat binary, but the final proof is docker run --gpus all on a real DGX Spark / GB200. That can run once we have access; the published manifest will already include the arm64 child for early users to test.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: e7cfceadab

Comment thread docker/smoke_test.py Outdated
import unsloth_zoo

print(f"unsloth_zoo {unsloth_zoo.__version__}")
import xformers

P1 Badge Gate xformers import by platform in smoke test

The arm64 image path intentionally omits xformers (UNSLOTH_EXTRA="huggingface" in docker/Dockerfile) because no cu128 aarch64 wheel is expected, but check_imports() unconditionally does import xformers here. On arm64 runs this makes /workspace/smoke_test.py fail before training, so the published arm64 variant cannot pass the repository’s own runtime validation despite being a supported target.

Comment thread docker/smoke_test.py
print(f"cuda build {torch.version.cuda}")
print(f"arches {arches}")
assert "sm_100" in arches, f"sm_100 missing: {arches}"
assert "sm_120" in arches, f"sm_120 missing: {arches}"

P2 Badge Accept sm_121 in smoke-test arch validation

This assertion hard-requires sm_120, but the same commit’s Dockerfile build validation explicitly treats arm64 Blackwell as valid when either sm_120 or sm_121 is present. As written, a valid arm64 build that reports only sm_121 will fail the smoke test with a false negative, even though the image is intended to support GB10/DGX Spark.
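
One way the smoke-test assertion could mirror the Dockerfile rule is sketched below. The function name and the `machine` parameter are hypothetical; the sm_100 requirement is kept exactly as in the quoted snippet, which is an assumption about the arm64 wheel rather than something this thread confirms.

```python
def check_arch_flags(arches, machine):
    """Validate a wheel's compiled arch list for the given CPU architecture."""
    assert "sm_100" in arches, f"sm_100 missing: {arches}"
    if machine in ("aarch64", "arm64"):
        # arm64 Blackwell may report sm_120 or sm_121 (GB10 / DGX Spark).
        assert {"sm_120", "sm_121"} & set(arches), f"sm_120/sm_121 missing: {arches}"
    else:
        assert "sm_120" in arches, f"sm_120 missing: {arches}"
```

In the smoke test this might be called as `check_arch_flags(torch.cuda.get_arch_list(), platform.machine())`.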

…st tag, arm64 decord

- pip shim: count editable/local/url/vcs targets (-e ., ., git+https, wheel
  URLs) as install targets, not just canonical package names, so they are no
  longer silently skipped inside notebooks
- notebook sync: never overwrite a pre-existing user notebook on first boot
  (match the refresh path's ownership rule); skip .unsloth_sync_state.tmp when
  recording state so it is not tracked as a managed file
- docker-publish: set flavor latest=false on the base image metadata so a v*
  tag push cannot publish :latest from the base image (the Studio image owns it)
- notebook deps: pin to tested versions and install decord on its own, hard on
  amd64 and fail-soft on arm64 (no aarch64 wheel) so the arm64 base build works
The base build-args never passed LLAMA_PREBUILT_TAG, so the Dockerfile fell back
to latest and each matrix leg resolved whatever unslothai/llama.cpp release was
current at its own build time. If latest moved between the amd64 and arm64 legs,
one published manifest could carry different GGUF binaries per arch.

Resolve the release once in a new prepare job (explicit llama_prebuilt_tag
dispatch input for a frozen build, else follow the /releases/latest redirect to a
concrete tag, mirroring docker/build.sh) and pass that single tag to both legs.
@danielhanchen

Member Author

Thanks, all four addressed.

Gate decord on amd64 builds (8f693c6): decord now installs on its own line, hard on amd64 and fail-soft on arm64 (no aarch64 wheel, no sdist), so the arm64 base build no longer dies in that layer.

Invoke pip for requirement-file installs (8f693c6): the final guard now counts any non-flag token as an install target (has_install_target), so -e ., ., git+https://..., a wheel URL, and -r reqs.txt all run pip instead of being silently skipped.

Disable auto latest for tag refs (8f693c6): flavor: latest=false is set on the base image metadata (merge job and the smoke-test recompute), so a v* tag push can't publish :latest from the lean base image. I left latest=auto on the Studio image deliberately: the Studio image owns :latest, and on a tagged release we do want :latest to follow the newest semver tag. The base image's base- tag prefix keeps the two namespaces separate.

Pin llama.cpp prebuilt in release builds (8402dce): the base build-args never passed LLAMA_PREBUILT_TAG, so each matrix leg resolved latest independently and a release moving between the amd64 and arm64 legs could put different GGUF binaries under one manifest. A new prepare job resolves the release once (explicit llama_prebuilt_tag dispatch input for a frozen build, else the /releases/latest redirect to a concrete tag, mirroring docker/build.sh) and passes that single tag to both legs.

@danielhanchen

Member Author

@codex review

@danielhanchen

Member Author

Went through the latest review pass (the 5-reviewer run plus the Codex review) and fixed the genuinely-open items on this branch:

  • pip shim (docker/unsloth_pip_shim.py): it counted only canonical package names as install targets, so pip install -e ., pip install ., git+https://... and direct wheel URLs were silently dropped inside notebooks. It now treats any non-flag token as a target, so editable / local / VCS / URL installs run while a flags-only line (e.g. pip install --upgrade) still no-ops.
  • notebook sync (docker/unsloth_sync_notebooks.sh): the first-boot populate cp -a'd the baked template over any pre-existing notebook, which could clobber a bind-mounted or hand-created file on the very first start. It now applies the same ownership rule the refresh path already had (keep a pre-existing file unless it is byte-identical to the template). Also excluded .unsloth_sync_state.tmp when recording state so the temp file is not tracked as a managed notebook.
  • docker-publish.yml: set flavor: latest=false on the base image metadata steps. metadata-action defaults to latest=auto, which would have tagged :latest on a v* (semver) tag push from the base image and collided with the Studio image that legitimately owns :latest.
  • notebook deps / decord: pinned the notebook dependency set to the tested versions and split decord into its own step, hard on amd64 (a missing/incompatible wheel is a real regression there) and fail-soft on arm64/other (decord ships no aarch64 wheel), so the native arm64 base build no longer breaks on it.

Also re-merged current main so the branch carries the latest offline checkpoint load/export and Colab tunnel work and stays conflict-free.

Most of the older comments in this PR were already resolved by the branch's history (build-args prose moved out of the build-args: blocks, least-privilege permissions: default, run.sh secret handling / TTY-gated -it / --gpus translation, torchaudio==2.10.0, preserved numpy/_core/tests, pinned UNSLOTH_REF / UNSLOTH_ZOO_REF).

One follow-up that is intentionally out of scope here: the reviewers flagged asymmetric CPU-mode CUDA-probe guards in unsloth_zoo (compiler.py, loss_utils.py). Those live in the unslothai/unsloth_zoo repo and need a separate PR there; the Docker CPU-mode confirmation scripts stay torch-only until that lands so they do not fail on the unguarded probe.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 8402dcebdd

Comment thread docker/unsloth_sync_notebooks.sh Outdated
fi
cp -a "$TEMPLATE/$rel" "$DEST/$rel" 2>/dev/null || true
done
record_state

P2 Badge Preserve pre-existing notebooks outside the sync state

When /workspace/unsloth-notebooks already contains a user-edited file on first boot, the branch above skips copying the template, but this record_state call records every file under DEST as if the sync wrote it. On the subsequent GitHub refresh, that file's current hash matches the recorded hash, so it is treated as pristine and can be overwritten by the upstream notebook despite being pre-existing user data.
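
A sketch of the fix this comment points toward: record state only for files the sync actually wrote, so a kept user file never looks pristine on a later refresh. The real logic is a shell script (docker/unsloth_sync_notebooks.sh); the Python below is an illustration with hypothetical names, not the script's implementation.

```python
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    """Hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def first_boot_sync(template: Path, dest: Path) -> dict:
    """Copy template notebooks into dest; return state for managed files only."""
    state = {}
    for src in sorted(p for p in template.rglob("*") if p.is_file()):
        rel = src.relative_to(template)
        target = dest / rel
        if target.exists() and sha256(target) != sha256(src):
            continue  # pre-existing user file: keep it and leave it untracked
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(src.read_bytes())
        state[rel.as_posix()] = sha256(target)
    return state
```

Because the skipped file has no entry in the returned state, a later refresh has no recorded hash to match against and cannot mistake it for an unmodified template copy.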

Comment thread docker/unsloth_pip_shim.py Outdated
# not just tokens with a canonical pkg name: editable / local / url / vcs
# installs (`-e .`, `.`, `git+https://...`, a wheel URL) carry no canonical
# name but must still run, and a `-r`/`-c` file pulls in real requirements.
has_install_target = any(not t.startswith("-") for t in keep_args)

P2 Badge Ignore option values when deciding whether pip has targets

For notebook install cells that only request baked packages plus an index/link flag, e.g. pip install --extra-index-url https://download.pytorch.org/whl/cu128 torch transformers==5.3.0, the shim drops the package specs but leaves the flag and its URL in keep_args. This test then treats the URL as an install target and execs pip install --extra-index-url <url> with no requirement, causing the cell to fail instead of no-oping after preserving the baked stack.
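A minimal Python sketch of a target test that skips option values is shown here for illustration. The flag sets are assumptions built from standard pip options, and `has_install_target` is a hypothetical helper rather than the shim's actual function.

```python
# pip options that consume the following token as their value (assumed subset).
VALUE_FLAGS = {
    "-i", "--index-url", "--extra-index-url",
    "-f", "--find-links",
    "-c", "--constraint",
    "-r", "--requirement",
}
# Of those, only a requirements file pulls in real requirements.
REQ_FILE_FLAGS = {"-r", "--requirement"}


def has_install_target(args: list) -> bool:
    """Return True if the remaining pip arguments still name something to install."""
    prev_flag = None
    for tok in args:
        if prev_flag is not None:
            # This token is the value of the previous flag, not a positional.
            if prev_flag in REQ_FILE_FLAGS:
                return True
            prev_flag = None
            continue
        if tok in VALUE_FLAGS:
            prev_flag = tok
            continue
        if not tok.startswith("-"):
            return True  # positional: package spec, ".", URL, VCS reference
    return False
```

Under these assumptions, `["--extra-index-url", "https://download.pytorch.org/whl/cu128"]` has no target and the cell can no-op, while `["-r", "requirements.txt"]` and `["."]` still count as targets.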

# The full Studio image owns the unprefixed namespace, headed by
# :latest. Same :latest gating rationale as the base job.
type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', github.event.repository.default_branch) && github.event.inputs.unsloth_ref == '' }}
type=ref,event=tag

P2 Badge Disable implicit latest tags for Studio tag builds

This Studio metadata block omits flavor: latest=false, so type=ref,event=tag can still emit an implicit :latest tag under metadata-action's default latest=auto behavior (docs), bypassing the explicit branch-only gate above on v* tag pushes. The mirrored smoke-test metadata block has the same config, so it can also pull :latest instead of the tag that was just published.

Comment thread docker/unsloth_studio_update.sh Outdated
# (or any branch/tag/sha); otherwise take the latest PyPI release.
if [ -n "$REF" ]; then
  SPECS="git+https://github.com/unslothai/unsloth.git@${REF}#egg=unsloth"
  SPECS="$SPECS git+https://github.com/unslothai/unsloth-zoo.git@${REF}#egg=unsloth_zoo"

P2 Badge Resolve unsloth-zoo separately for ref updates

When --ref is an Unsloth release tag or commit SHA, this installs unsloth-zoo from the same ref even though that repo is not guaranteed to have matching tags or SHAs; the publish workflow already has separate zoo-ref resolution for this reason. In those cases the advertised unsloth-studio-update --ref <tag|sha> path fails before updating Studio, so the script should resolve the zoo ref independently or leave it on a known default.
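One way to resolve the zoo ref independently is sketched below in Python. It is a hypothetical model, not the update script itself: it assumes the caller has already captured the text output of `git ls-remote` for the zoo repository, and both function names are invented for illustration.

```python
def parse_ls_remote(output: str) -> dict:
    """Parse `git ls-remote` output ("<sha>\t<ref>" per line) into ref -> sha."""
    refs = {}
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref:
            refs[ref.strip()] = sha.strip()
    return refs


def resolve_zoo_ref(unsloth_ref: str, zoo_ref: str, remote_refs: dict,
                    default: str = "main") -> str:
    """Pick the unsloth-zoo ref without assuming it mirrors the unsloth ref."""
    if zoo_ref:
        return zoo_ref  # an explicit request always wins
    if unsloth_ref:
        # Reuse the unsloth ref only if the zoo repo advertises it as a tag or
        # branch. A bare commit SHA is not advertised, so it falls through.
        for candidate in (f"refs/tags/{unsloth_ref}", f"refs/heads/{unsloth_ref}"):
            if candidate in remote_refs:
                return unsloth_ref
    return default
```

The fallback to a known default keeps an update with an Unsloth-only tag or SHA from failing before Studio is touched.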

Comment thread docker/Dockerfile
# omegaconf TTS families + both NeMo-Gym RL notebooks' config objects
# einx TTS codec tensor-rearrange (Llasa / Oute / Spark TTS)
# librosa Whisper audio feature extraction (pairs with soundfile + torchcodec)
# ftfy Oute TTS text normalisation

P2 Badge Remove unsupported sm_103 from the CUDA 12.8 arch list

This runtime arch list is consumed by PyTorch/CUDA extension builds inside the container, but the image only installs CUDA 12.8 nvcc on amd64 while NVIDIA documents compiler target support for sm_103 as added in CUDA 12.9 (CUDA features archive). Any pip install/JIT path that honors TORCH_CUDA_ARCH_LIST will pass an unsupported compute_103 target to nvcc 12.8 and fail, even on non-B300 hosts; drop 10.3 here or ship a 12.9+/13 compiler wherever it is advertised.
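As a small illustration, an arch list in the `TORCH_CUDA_ARCH_LIST` format can be filtered like this. The helper name is hypothetical, and the set of unsupported targets is an input the caller must supply (here `10.3`, per the comment above).

```python
def drop_arches(arch_list: str, unsupported: set) -> str:
    """Remove entries such as "10.3" from a ';'- or space-separated arch list."""
    entries = arch_list.replace(";", " ").split()
    # Compare on the numeric part so a "+PTX" suffix does not hide a match.
    kept = [a for a in entries if a.split("+", 1)[0] not in unsupported]
    return ";".join(kept)
```

For example, `drop_arches("8.0;9.0;10.0;10.3;12.0+PTX", {"10.3"})` yields a list without the `10.3` entry while leaving the `+PTX` suffix on `12.0` intact.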

…gs, zoo ref, arch list)

- pip shim: do not treat the value of an index-url / find-links / constraint flag
  as an install target. A cell like 'pip install --extra-index-url <url> torch'
  now no-ops after keeping the baked stack instead of exec'ing a bare
  'pip install --extra-index-url <url>' that fails. Positional . / url / vcs and
  -r/--requirement files still count as targets.
- notebook sync: on first boot, record only files we actually wrote (or that are
  byte-identical to the template), never a kept pre-existing user file; and on the
  GitHub refresh, treat a file present in DEST but absent from the sync state as
  user-owned and keep it. Previously a bind-mounted notebook was recorded as
  managed and then overwritten by upstream.
- docker-publish: add flavor latest=false to the Studio metadata steps too, so a
  v* tag push cannot emit an implicit :latest via metadata-action's latest=auto;
  :latest stays default-branch-only, and the smoke test pulls the published tag.
- unsloth-studio-update: resolve the unsloth-zoo ref independently of --ref (new
  --zoo-ref, else use the ref only when the zoo repo has it, else fall back to
  main) so 'update --ref <unsloth-tag/sha>' does not fail on a missing zoo ref.
- Dockerfile: drop 10.3 (compute_103) from TORCH_CUDA_ARCH_LIST in both the
  builder and runtime stages. B300 runs sm_100 SASS, and the bundled CUDA 12.8
  nvcc cannot compile compute_103 (added in 12.9), which broke arch-list-honoring
  source / JIT builds.
@danielhanchen

Member Author

All five are valid; fixed:

  • docker/unsloth_sync_notebooks.sh: two-part fix. On first boot, a kept pre-existing user file is no longer recorded in the sync state (only files we actually wrote, or that are byte-identical to the template, are recorded), and the GitHub refresh now treats a file that exists in DEST but is absent from the state as user-owned and keeps it. Previously a bind-mounted notebook was recorded as managed, then overwritten by upstream on the next refresh.
  • docker/unsloth_pip_shim.py: the install-target test no longer counts the value of an index-url / find-links / constraint flag as a target. pip install --extra-index-url <url> torch transformers==5.3.0 now no-ops after keeping the baked stack instead of exec'ing a bare pip install --extra-index-url <url> that fails. Positional . / url / vcs / editable targets and -r/--requirement files still count.
  • .github/workflows/docker-publish.yml: added flavor: latest=false to the Studio metadata steps as well (the merge step and the smoke-test step). They omitted it, so type=ref,event=tag under the default latest=auto could still emit an implicit :latest on a v* tag and bypass the default-branch-only gate, and the smoke test could pull :latest instead of the just-published tag. :latest (and the :studio alias) now publish only via the explicit default-branch rules.
  • docker/unsloth_studio_update.sh: --ref no longer forces the same ref onto unsloth-zoo. There is a new --zoo-ref, and otherwise the unsloth ref is used for the zoo only when the zoo repo actually has it (checked via git ls-remote), falling back to main so update --ref <unsloth-tag/sha> does not fail on a tag/SHA the zoo never cut. This mirrors the publish workflow's separate zoo-ref resolution.
  • docker/Dockerfile: dropped 10.3 (compute_103) from TORCH_CUDA_ARCH_LIST in both the builder and runtime stages. B300 runs sm_100 SASS (already noted in the wheel-coverage comment), and the bundled CUDA 12.8 nvcc cannot compile compute_103 (added in CUDA 12.9), so listing it broke any source/JIT build inside the container that honors the arch list.

Validation: bash -n on the shell scripts, py_compile on the Python helpers, and the workflow parses as valid YAML.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0ebbdbb9cc

Comment on lines +132 to +133
if prev_flag in _REQ_FILE_FLAGS:
    has_target = True

P2 Badge Filter requirement files before invoking pip

When a notebook install cell uses pip install -r requirements.txt, this branch marks the requirements file as a real target and later execs pip with the file unchanged. Any protected packages inside that file, such as torch, transformers, vLLM, or nvidia wheels, bypass _KEEP and the Transformers sidecar marker logic, so a requirements file with common training deps can overwrite the baked cu128 stack or install Transformers into the base venv. Parse/filter requirement files or reject files that mention protected packages before passing them through.
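The per-line filtering suggested here could look roughly like the following Python sketch. The protected set is a small assumed sample, the function name is hypothetical, and real requirement lines have more forms (markers, hashes, line continuations) than this handles.

```python
import re

# Assumed sample of protected distributions; the shim's real list is larger.
PROTECTED = {"torch", "transformers", "vllm"}
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def filter_requirement_lines(lines, protected=PROTECTED):
    """Split requirement lines into (kept_lines, dropped_names)."""
    kept, dropped = [], []
    for line in lines:
        stripped = line.strip()
        # Blank lines, comments and option lines are passed through untouched.
        if not stripped or stripped.startswith(("#", "-")):
            kept.append(line)
            continue
        match = _NAME.match(stripped)
        # Normalise the name the way package indexes do: runs of "-", "_", "."
        # collapse to "-" and the result is lowercased.
        name = re.sub(r"[-_.]+", "-", match.group(1)).lower() if match else None
        if name in protected:
            dropped.append(name)
        else:
            kept.append(line)
    return kept, dropped
```

The kept lines can then be written to a temporary file and handed to pip in place of the original path.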

# out of build-args -- forwarded lines must be KEY=VALUE only.)
build-args: |
BASE_IMAGE=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.merge.outputs.digest }}
UNSLOTH_STUDIO_REF=${{ github.event.inputs.unsloth_ref || (startsWith(github.ref, 'refs/tags/') && github.ref_name) || github.sha || 'main' }}

P2 Badge Forward the resolved zoo ref into Studio builds

For workflow dispatches that set unsloth_zoo_ref (or future tag pushes where the zoo resolver returns something other than main), the base image bakes steps.zoo_ref.outputs.ref but the Studio job only forwards the Unsloth ref. Dockerfile.studio runs install.sh --local, and that local-install path overlays unsloth-zoo from git main, so the published full image can run a Studio backend with a different zoo than the base image and the operator-requested ref; pass the resolved zoo ref through this build and install that ref in the Studio venv.

Comment thread docker/Dockerfile Outdated
# see unsloth_sync_notebooks.sh + unsloth_nb_content_sig.py. Inherited as-is by
# the studio image (FROM base).
RUN set -eux \
&& git clone --depth 1 https://github.com/unslothai/notebooks /opt/unsloth-notebooks \

P2 Badge Pin baked notebooks to one resolved commit

Each architecture leg runs this clone independently against unslothai/notebooks HEAD, so if that repo advances between the amd64 and arm64 builds (or between release reruns), the same Docker tag seeds different baked notebook templates and .unsloth_template_commit state depending on the pulled platform. Resolve the notebooks commit once in the workflow/build script and clone that ref, like the llama.cpp prebuilt tag, so the multi-arch image contents stay consistent.

…inned notebooks commit)

- unsloth_pip_shim.py: filter protected packages out of a notebook
  `pip install -r requirements.txt`. The -r value was passed to the real pip
  unchanged, so torch / transformers / vLLM / nvidia pins inside the file could
  overwrite the baked cu128 stack or push transformers into the base venv.
  _filter_requirements_file() applies the same _KEEP / transformers-sidecar
  rules per line, writes the survivors to a temp file, keeps comments, option
  lines, nested includes and urls verbatim, and records a pinned transformers
  version for the sidecar.

- install.sh + Dockerfile.studio + docker-publish.yml: forward the resolved
  unsloth-zoo ref into the Studio build. install.sh --local overlaid
  unsloth-zoo from git main regardless of the operator-requested or base-image
  ref, so the full image could run a different zoo than the base. install.sh
  now honors UNSLOTH_ZOO_REF across all four --local overlays, Dockerfile.studio
  passes UNSLOTH_STUDIO_ZOO_REF through to it, and the workflow resolves one zoo
  ref in the prepare job and shares it with both the base and Studio builds.

- Dockerfile + docker-publish.yml: pin unslothai/notebooks to one resolved
  commit. Each arch leg cloned HEAD independently, so the same tag could seed
  different baked templates and .unsloth_template_commit depending on the pulled
  platform. The prepare job freezes notebooks to one sha (like the llama.cpp
  prebuilt tag) and the Dockerfile fetches that single ref at depth 1.
@danielhanchen

Member Author

All three were valid. Fixed in d476c7764:

  • docker/unsloth_pip_shim.py (filter -r files): a notebook pip install -r requirements.txt previously passed the file to the real pip unchanged, so torch / transformers / vLLM / nvidia pins inside the file bypassed _KEEP and the transformers sidecar and could overwrite the baked cu128 stack or push transformers into the base venv. _filter_requirements_file() now applies the same _KEEP / transformers rules per requirement line, writes the survivors to a temp file passed to pip, and records a pinned transformers version for the sidecar. Comments, option lines, nested -r/-c includes and url/vcs/path lines are kept verbatim; when nothing is protected or the file cannot be read, the original path is passed through unchanged.
  • install.sh + docker/Dockerfile.studio + .github/workflows/docker-publish.yml (forward the zoo ref into Studio): install.sh --local overlaid unsloth-zoo from git main regardless of the resolved/operator-requested ref, so the full image could run a different zoo than the base. install.sh now honors UNSLOTH_ZOO_REF across all four --local overlays, Dockerfile.studio forwards UNSLOTH_STUDIO_ZOO_REF into it, and the workflow resolves one zoo ref in the prepare job and shares it with both the base build and the Studio build (so they always agree).
  • docker/Dockerfile + .github/workflows/docker-publish.yml (pin the notebooks commit): each arch leg cloned unslothai/notebooks HEAD independently, so the same tag could seed different baked templates and .unsloth_template_commit across platforms or release reruns. The prepare job now freezes notebooks to one resolved sha (the same pattern as the llama.cpp prebuilt tag) and the Dockerfile fetches that single ref at depth 1.

Validation: bash -n on the shell scripts, py_compile plus a functional check of _filter_requirements_file (torch/transformers/nvidia/vLLM dropped, transformers version recorded, numpy/omegaconf/nested-r/git-url kept), and the workflow parses as valid YAML. Propagated into the stacked Studio PR.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d476c7764b

Comment on lines +85 to +86
if re.match(r"^[a-z]+\+", token) or "://" in token or token.startswith((".", "/")):
    return None  # vcs / url / local path -> let it pass through

P2 Badge Keep protected direct-reference installs out

When a notebook uses a quoted PEP 508 direct reference for a protected package, e.g. pip install "torch @ https://.../torch.whl" or "unsloth @ git+https://...", this early URL check returns None before extracting the distribution name. The token is then kept and treated as a real target, so the shim can reinstall torch/Unsloth into the base venv even though _KEEP is supposed to preserve the baked CUDA stack.
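A minimal sketch of extracting the distribution name from a PEP 508 direct reference (`name [extras] @ url`) before any URL check runs is shown below. The regex and the helper name `direct_ref_name` are illustrative assumptions, not the shim's `_canon`.

```python
import re

# name, optional [extras], then "@" and a non-empty reference.
_DIRECT_REF = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*@\s*\S+"
)


def direct_ref_name(token: str):
    """Return the normalised name of a `name @ url` spec, else None."""
    match = _DIRECT_REF.match(token)
    if not match:
        return None  # plain URL, VCS URL, path or ordinary specifier
    # Same normalisation package indexes use for project names.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()
```

A caller would check the returned name against the protected set first and only fall back to the URL passthrough when this returns None.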

Comment on lines +196 to +200
if tok in _VALUE_FLAGS:
    keep_args.append(tok)
    skip_next = True
    prev_flag = tok
    continue

P2 Badge Handle equals-form requirement files

For a valid pip invocation like pip install --requirement=requirements.txt (the same --requirement <file> option also accepts the standard --option=value form), this exact-token check does not recognize the requirements file. The argument starts with -, so it is kept as an option but has_target remains false; cells whose only install target is that file return as a no-op and skip all of its dependencies.
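Splitting the equals form before the flag lookup is a small step. The helper below is a hypothetical sketch of it and only handles long options, which is the case described in this comment.

```python
def split_inline_value(tok: str):
    """Split "--flag=value" into (flag, value); other tokens return (tok, None)."""
    if tok.startswith("--") and "=" in tok:
        flag, value = tok.split("=", 1)
        return flag, value
    return tok, None
```

After the split, `--requirement=requirements.txt` can go through the same branch as the two-token `--requirement requirements.txt`, while a spec such as `torch==2.10.0` is left alone because it does not start with `--`.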

…value req files

Two more notebook-shim gaps from review:

- A quoted PEP 508 direct reference for a protected package, e.g.
  `pip install "torch @ https://.../torch.whl"` or `"unsloth @ git+https://..."`,
  bypassed _KEEP: _canon hit the url guard and returned None before pulling the
  distribution name, so the token was treated as a real target and reinstalled
  into the base venv. _canon now extracts the name from the `name [extras] @ url`
  form first, so a protected package pinned through a URL/VCS is still dropped; a
  non-protected direct reference returns its name and is kept exactly as before.

- The `--requirement=reqs.txt` equals-form (pip accepts `--option=value` for any
  value-taking flag) was not recognized: the token starts with `-`, so it was
  kept as an opaque option, the file was never filtered, and has_target stayed
  false -- a cell whose only target was that file silently no-op'd. The scan now
  splits `--flag=value`, filters the requirements file for `-r`/`--requirement`,
  and counts it as a target; other inline-value options stay options.
@danielhanchen

Member Author

Both valid. Fixed in 7083a2d9f:

  • Direct-reference protected installs: a quoted PEP 508 direct reference for a protected package, e.g. pip install "torch @ https://.../torch.whl" or "unsloth @ git+https://...", hit the url guard in _canon and returned None before the distribution name was extracted, so the token was kept as a real target and reinstalled into the base venv. _canon now extracts the name from the name [extras] @ <url> form first, so a protected package pinned through a URL/VCS is dropped like any other _KEEP entry. A non-protected direct reference still returns its name and is kept exactly as before, so genuine extra installs are unaffected.
  • --requirement= equals-form: pip accepts --option=value for any value-taking flag, but the scan only matched the exact --requirement / -r tokens, so --requirement=requirements.txt started with -, was kept as an opaque option, and has_target stayed false. A cell whose only target was that file silently no-op'd and installed nothing. The scan now splits --flag=value, filters the requirements file for -r/--requirement, and counts it as a target; other inline-value options (--index-url=...) stay options.

Validation: py_compile plus a functional matrix: protected direct refs (torch @ url, unsloth @ git+..., unsloth-zoo@git+..., vllm[extra] @ url) drop and no-op; non-protected (mypkg @ url) and plain url/vcs/path pass through; --requirement=FILE and the space forms all filter the file and record the transformers pin; --index-url=URL pkg keeps the option and installs only the package. Propagated into the stacked Studio PR.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ee7f4b644

Comment thread docker/Dockerfile.studio
&& git checkout -q FETCH_HEAD \
&& UNSLOTH_STUDIO_HOME="${UNSLOTH_STUDIO_HOME}" \
UNSLOTH_TORCH_INDEX_FAMILY="${TORCH_FAMILY}" \
UNSLOTH_ZOO_REF="${UNSLOTH_STUDIO_ZOO_REF}" \

P2 Badge Use the resolved zoo ref during Studio setup

When a Studio build passes UNSLOTH_STUDIO_ZOO_REF other than main, this only pins the first install.sh overlay; the same --local flow then runs studio/setup.sh, whose install_python_stack.py local branch force-reinstalls unsloth-zoo @ git+https://github.com/unslothai/unsloth-zoo with no @ref, so the Studio venv ends up on zoo main while the base venv used the resolved ref. This still affects workflow_dispatch/tag publishes that intentionally resolve a non-main zoo ref; thread this env through setup/install_python_stack before the final overlay.

Comment on lines +129 to +130
if not stripped or stripped.startswith(("#", "-")):
    out.append(line)  # comment / blank / option / nested include -> keep

P2 Badge Filter nested requirement includes too

For a notebook cell like pip install -r reqs.txt where reqs.txt contains -r torch-reqs.txt, this branch copies the nested include verbatim; pip recursively processes those requirement includes, so protected specs in the nested file never pass through _KEEP or the Transformers marker logic and can still reinstall torch/transformers into the baked venv. The fresh issue in the current helper is the one-level filter: nested -r/--requirement lines are explicitly preserved instead of being filtered or rejected.
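Walking nested includes so that every file can be filtered (or rejected) might be sketched as follows. This is an assumption-laden illustration: it only recognises the two-token `-r <file>` / `--requirement <file>` forms, resolves nested paths relative to the including file, and lets a missing file raise.

```python
import os


def iter_requirement_files(path: str, _seen=None):
    """Yield a requirements file and every file it includes, each once."""
    seen = set() if _seen is None else _seen
    real = os.path.realpath(path)
    if real in seen:
        return  # guard against include cycles
    seen.add(real)
    yield real
    with open(real, encoding="utf-8") as handle:
        for line in handle:
            parts = line.strip().split(None, 1)
            if len(parts) == 2 and parts[0] in ("-r", "--requirement"):
                # Nested paths are taken relative to the including file.
                nested = os.path.join(os.path.dirname(real), parts[1].strip())
                yield from iter_requirement_files(nested, seen)
```

Each yielded path can then be run through the same per-line protected-package filter as the top-level file.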

fi
fi
mkdir -p "$(dirname "$dst")" 2>/dev/null || true
if cp -a "$f" "$dst" 2>/dev/null; then

P2 Badge Honor deleted notebooks during refresh

When UNSLOTH_KEEP_DELETED_NOTEBOOKS=1 is set, the offline restore block is skipped, but the GitHub refresh still falls through here for any tracked notebook the user deleted and copies it back from the fresh clone whenever upstream advances. That makes the documented deletion opt-out only work until the next remote update; skip paths that are recorded in the state but absent from DEST when this option is enabled.
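The refresh decision for a single notebook can be written as a small truth table. The sketch below is a hypothetical Python model of it; the action labels and parameter names are invented, and the real logic lives in the shell sync script.

```python
def refresh_action(in_state: bool, in_dest: bool, pristine: bool,
                   keep_deleted: bool) -> str:
    """Decide what a refresh does with one upstream notebook.

    in_state:     the path is recorded in the sync state (sync-managed)
    in_dest:      the path currently exists in DEST
    pristine:     the DEST copy still matches its recorded hash
    keep_deleted: models UNSLOTH_KEEP_DELETED_NOTEBOOKS=1
    """
    if not in_state:
        # Untracked: an existing file is user-owned, a missing one is new upstream.
        return "keep" if in_dest else "copy"
    if not in_dest:
        # Tracked but deleted by the user: honour the opt-out if set.
        return "skip" if keep_deleted else "copy"
    # Tracked and present: only overwrite copies the user has not edited.
    return "copy" if pristine else "keep"
```

The branch for a tracked but absent path is the one this comment asks for: with the option enabled it returns "skip" instead of restoring the notebook.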

@LeoBorcherding

Collaborator

reviewed on windows 11 / WSL2 (ubuntu 24.04, docker 29.6.1 + nvidia-container-toolkit). built the :core image with no gpu used during the build, then ran it on my RTX 5060 Ti (sm_120 blackwell).

image came out with torch 2.10.0+cu128 and arch flags sm_70 sm_75 sm_80 sm_86 sm_90 sm_100 sm_120, so the sm_100/sm_120 kernels are baked in even though nothing built against a gpu.

ran it with --gpus all and /workspace/smoke_test.py passed:

NVIDIA GeForce RTX 5060 Ti. Torch: 2.10.0+cu128. CUDA Toolkit: 12.8
loaded unsloth/Llama-3.2-1B-Instruct-bnb-4bit in 4-bit
step 0 loss=4.1073 ... step 4 loss=3.7499
OK: 5 LoRA steps completed

@danielhanchen

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ee7f4b644

Comment thread .github/workflows/docker-publish.yml Outdated
ok_studio=0; ok_jupyter=0
for i in $(seq 1 60); do
  if curl -fsS http://localhost:18000/api/health >/dev/null 2>&1; then ok_studio=1; fi
  if curl -fsS http://localhost:18888/api >/dev/null 2>&1; then ok_jupyter=1; fi

P2 Badge Probe a public Jupyter endpoint

In the full-image smoke test this request hits Jupyter's /api without any token or login cookie. The launcher always configures Jupyter with a password hash when booting the image, so unauthenticated API calls return 403; because this uses curl -f, ok_jupyter never flips and any HAS_GPU_RUNNER publish run reports the full image unhealthy even when JupyterLab is up. The same /api probe is duplicated in docker/docker_confirm.sh, so use a public endpoint such as /login or authenticate the request.

Comment thread docker/entrypoint.sh
# setup is not silently ignored.
if [[ "${UNSLOTH_ALLOW_CPU:-0}" == "1" ]]; then
  if ! command -v nvidia-smi >/dev/null 2>&1 || ! nvidia-smi -L 2>/dev/null | grep -q '^GPU'; then
    warn "UNSLOTH_ALLOW_CPU=1 and no GPU visible -- continuing on CPU."

P2 Badge Gate CPU mode before Studio model loads

When the image is started on a CPU-only host with UNSLOTH_ALLOW_CPU=1, this branch lets the full Studio image continue and the comments/docs advertise Studio chat as usable. However UNSLOTH_ALLOW_CPU makes Unsloth report DEVICE_TYPE == "cuda", and Studio inference then calls FastLanguageModel.from_pretrained(...), whose CUDA path still unconditionally executes torch.cuda.get_device_properties(0) (see unsloth/models/llama.py:2310 and unsloth/models/vision.py:747). On Docker Desktop/macOS or Windows+AMD, loading a chat model therefore raises instead of falling back to CPU; either keep CPU mode to tooling/Jupyter or guard those CUDA probes.

Comment on lines +19 to +20
absolute path so there is no recursion. `python -m pip` / `%pip` bypass PATH and
are not intercepted -- the driven `unsloth-run` handles those by parsing the

P2 Badge Intercept %pip before it mutates the baked stack

For notebooks that use %pip or python -m pip, this explicitly bypasses the PATH shim, but neither the IPython startup hook nor unsloth-run rewrites those cells or installs a pip module wrapper. In that scenario a cell like %pip install transformers==... or %pip install torch... runs the real pip inside /opt/unsloth-venv and can overwrite the cu128 torch/transformers stack that the shim is meant to protect, so the safe notebook execution path is only safe for !pip/!uv shell commands.

danielhanchen and others added 2 commits June 27, 2026 08:46
…XDEV, %pip shim)

- docker-publish smoke + docker_confirm.sh probe Jupyter /login, not /api: the
  launcher always configures a password hash so /api returns 403 and curl -f
  would never flip the health flag (false build failure).
- entrypoint.sh CPU messaging: CPU mode covers Jupyter, GGUF tooling and
  llama.cpp (GGUF) Studio chat; training AND loading an Unsloth model
  (FastLanguageModel) still need a GPU, since from_pretrained runs CUDA probes.
- install_llama_prebuilt.py: rollback/activation moves used bare os.replace,
  which fails with EXDEV across overlayfs in a Docker build and fell back to a
  broken source build (no nvcc). Add is_cross_device_error + move_install_dir_aside
  (os.replace fast path, copy+remove on EXDEV; busy errors still re-raise).
- notebooks: %pip / %uv line magics and the `!python -m pip` form bypassed the
  PATH pip/uv shim and could overwrite the baked cu128 torch/vLLM stack. Add
  unsloth_nb_pip_magic.py to re-point them at the shim, wired via the IPython
  startup hook and installed into the venv site-packages.
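For illustration, the EXDEV fallback described in the install_llama_prebuilt.py item can be sketched as below. `is_cross_device_error` is named in the commit message, but this body and the `move_dir` helper are assumptions written for this note, not the committed implementation.

```python
import errno
import os
import shutil


def is_cross_device_error(exc: BaseException) -> bool:
    """True when a rename failed because source and target are on different filesystems."""
    return isinstance(exc, OSError) and exc.errno == errno.EXDEV


def move_dir(src: str, dst: str) -> None:
    """Move a directory: atomic rename when possible, copy + remove on EXDEV."""
    try:
        os.replace(src, dst)  # fast path, same filesystem
    except OSError as exc:
        if not is_cross_device_error(exc):
            raise  # busy / permission errors are not masked
        shutil.copytree(src, dst, symlinks=True)
        shutil.rmtree(src)
```

The copy path is not atomic, which is the trade-off for working across overlayfs layers during an image build.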

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

https://github.com/unslothai/unsloth/blob/f1525695e55fe5c85d3f33efb585d4bff3dcadb9/docker/unsloth_nb_content_sig.py#L256-L257
P2 Badge Do not drop captured body cells from signatures

For any real tutorial cell that starts with %%capture or %%bash to hide noisy output or run shell preprocessing, this helper excludes the entire cell from the middle digest even if it does not install packages. If upstream later changes that cell, middle_unchanged reports SAME, the refresh treats the notebook as only header/footer churn, and users keep stale executable content; only classify these magics as boilerplate when the cell is actually an install/setup cell.

Comment thread docker/docker_confirm.ps1 Outdated
$okStudio = $false; $okJupyter = $false
foreach ($i in 1..60) {
    if (-not $okStudio) { try { Invoke-WebRequest -UseBasicParsing -Uri "http://localhost:$PORT_STUDIO/api/health" -TimeoutSec 4 | Out-Null; $okStudio = $true } catch {} }
    if (-not $okJupyter) { try { Invoke-WebRequest -UseBasicParsing -Uri "http://localhost:$PORT_JUPYTER/api" -TimeoutSec 4 | Out-Null; $okJupyter = $true } catch {} }

P2 Badge Probe Jupyter login in the Windows confirmer

When the Windows confirmation script reaches this full-image check, studio_launch.sh has already configured Jupyter with a hashed password, so an unauthenticated request to /api returns 403 even when JupyterLab is healthy. The Linux confirmer and workflow use /login for this reason; leaving the PowerShell path on /api makes Windows users see a false Jupyter failure unless they authenticate the request or probe /login.

Comment on lines +97 to +98
if re.match(r"^[a-z]+\+", token) or "://" in token or token.startswith((".", "/")):
    return None  # vcs / url / local path -> let it pass through

P2 Badge Block VCS egg installs for protected packages

Fresh evidence beyond the earlier fixed PEP 508 case: legacy VCS requirements such as pip install git+https://github.com/unslothai/unsloth.git#egg=unsloth still take this URL/VCS passthrough branch, so _canon() returns None and the shim later executes the token as a real install target. In notebooks using that valid pip form, protected packages can still be reinstalled into the baked venv and bypass _KEEP; parse #egg=/editable values before treating VCS URLs as passthrough.
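Reading the `#egg=` fragment before falling back to passthrough could look like this minimal sketch. `egg_name` is a hypothetical helper; it relies only on the standard library's URL parsing and does not cover the editable (`-e`) case.

```python
import re
from urllib.parse import parse_qs, urlsplit


def egg_name(token: str):
    """Return the normalised `#egg=` name of a VCS/URL requirement, else None."""
    fragment = urlsplit(token).fragment
    if not fragment:
        return None
    # The fragment may carry several keys, e.g. "egg=pkg&subdirectory=src".
    values = parse_qs(fragment).get("egg")
    if not values:
        return None
    name = values[0].split("[", 1)[0]  # drop any extras such as "pkg[extra]"
    return re.sub(r"[-_.]+", "-", name).lower()
```

A token whose egg name is in the protected set can then be dropped like any other protected spec, and tokens without an egg fragment keep the existing passthrough behavior.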

Comment on lines +74 to +77
# Of those value-flags, the ones whose VALUE is itself an install target: a
# requirements file pulls real requirements. An index-url / find-links /
# constraint / target value is an option, not something to install.
_REQ_FILE_FLAGS = {"-r", "--requirement"}

P2 Badge Filter constraint files before pip sees them

When a notebook runs something like pip install -c constraints.txt peft and that constraints file pins transformers or torch, this branch keeps -c verbatim because only requirement files are inspected. Pip applies constraints to dependency resolution, so installing a kept package can still downgrade/reinstall protected packages from the constraint file without _KEEP or the sidecar marker ever seeing those specs; filter or reject protected entries in constraint files too.
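
A rough sketch of scanning a constraints file for protected pins follows. The `PROTECTED` set and both helper names are hypothetical, and the parsing is deliberately simplified (no environment markers, no nested `-c` includes, and comment stripping would also cut URL fragments).

```python
import re
from typing import Iterable, List

PROTECTED = {"torch", "transformers"}  # hypothetical protected set


def _canon_name(line: str) -> str:
    """Extract and normalise the leading package name of a requirement line."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
    return re.sub(r"[-_.]+", "-", match.group(1)).lower() if match else ""


def protected_constraints(lines: Iterable[str]) -> List[str]:
    """Return the constraint lines that pin a protected package."""
    hits = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line and not line.startswith("-") and _canon_name(line) in PROTECTED:
            hits.append(line)
    return hits
```

A caller could then either reject the install when the returned list is non-empty or write a filtered copy of the file without those lines.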

Comment on lines +18 to +21
# Persistence: the update is written to the container's writable layer, so it
# survives `docker restart`. To keep it across a full `docker rm` + `docker run`
# (and to keep your chats/users/models), run Studio with its home on a named
# volume: -v unsloth_studio_home:/opt/unsloth-studio

P2 Badge Do not mount over the Studio install home

If users follow this persistence instruction with a fresh named volume, Docker masks the baked /opt/unsloth-studio tree, including bin/unsloth and the Studio venv that supervisord.conf starts. The full image then boots with an empty home and the Studio service cannot exec; persist only a data subdirectory or seed the volume before recommending this mount.

@danielhanchen

Member Author

Two parts here: the WSL2 report and the Codex review on the same commit.

@LeoBorcherding thanks for the thorough Windows 11 / WSL2 pass.

The llama.cpp EXDEV failure is fixed in 2c31686. install_llama_prebuilt.py no longer uses a bare os.replace for the rollback and activation moves: is_cross_device_error plus move_install_dir_aside fall back to copy + remove on EXDEV, while busy/in-use errors still re-raise so a live install is never half-copied. The updater now survives the cross-overlayfs move inside a Docker build instead of giving up and source-building (which is broken here with no nvcc). Good catch on why it did not show in our own testing: the base already had the latest llama tag, so the updater never ran.
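
For readers unfamiliar with the pattern, the fallback can be sketched roughly as below. This is a simplified illustration, not the code in install_llama_prebuilt.py: `is_cross_device_error` here is a guess at what a helper of that name might check, and `move_dir` is a hypothetical stand-in for the real move helpers.

```python
import errno
import os
import shutil


def is_cross_device_error(exc: OSError) -> bool:
    """True when a rename failed because source and target are on different filesystems."""
    return exc.errno == errno.EXDEV


def move_dir(src: str, dst: str) -> None:
    """Rename src to dst, falling back to copy + remove only on EXDEV."""
    try:
        os.replace(src, dst)
    except OSError as exc:
        if not is_cross_device_error(exc):
            raise  # busy / in-use errors propagate, so nothing is half-copied
        shutil.copytree(src, dst, symlinks=True)
        shutil.rmtree(src)
```

Restricting the fallback to EXDEV keeps the fast atomic rename for the common case and avoids copying a directory that failed to move for some other reason.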

On the cu128 dependency: correct, the studio build needs install.sh's UNSLOTH_TORCH_INDEX_FAMILY=cu128 support, which lives in #6692. These three PRs are stacked: #6692 lands first (or point UNSLOTH_STUDIO_REF at the branch when building before merge), #5748 is the base image, and #6681 sits on top.

Codex review 4583085472 (same commit) is also addressed in 2c31686:

  • Jupyter smoke probe (docker-publish workflow + docker_confirm.sh) now hits /login instead of /api. The launcher always configures a password hash, so /api returns 403 and curl -f would never flip the health flag.
  • entrypoint.sh CPU messaging clarified: CPU mode covers Jupyter, the GGUF tooling and llama.cpp (GGUF) Studio chat; training and loading an Unsloth model (FastLanguageModel) still need a GPU, since from_pretrained runs CUDA probes.
  • %pip / %uv / python -m pip: the PATH shim only intercepted !pip / !uv. Added unsloth_nb_pip_magic.py so the %pip / %uv line magics and the !python -m pip form also route through the shim and cannot overwrite the baked cu128 torch/vLLM stack.

@danielhanchen

Member Author

End to end validation of the published Docker image

I ran the published Blackwell Docker image end to end on the free GitHub-hosted Linux x64 runners. The image is danielhanchen/unsloth-blackwell-docker (unsloth-blackwell-studio.tar.gz), pulled with docker/hf_pull.sh, loaded with docker load, then booted in CPU mode (UNSLOTH_ALLOW_CPU=1) so that it exercises the GGUF / llama.cpp path the image ships. Matrix: ubuntu-latest, ubuntu-24.04, ubuntu-22.04, all green.

Feature probe (identical PASS on all three runners):

| Feature | Result |
| --- | --- |
| Studio auth (login + forced password rotation) | PASS |
| Load unsloth/gemma-4-E4B-it-GGUF:UD-Q4_K_XL | PASS (is_gguf, ctx=2048) |
| Chat completion | PASS (412 char reply) |
| RAG (create KB, ingest a doc, search) | PASS (1 hit) |
| Web search tool | PASS (tool call observed) |
| llama.cpp prebuilt | PASS (tag b9813-mix-1f1aaa4, repo unslothai/llama.cpp, llama-server v9813) |
| unsloth-llama-update --check | PASS (up to date) |
| Branding integrity (python -m unsloth_branding --verify) | PASS |
| Studio versions | unsloth 2026.6.9, unsloth_zoo 2026.6.7 |

gemma-4-E4B-it-GGUF at UD-Q4_K_XL loaded and answering in the Studio chat, with the web-search tool firing live and citing sources (CPU generation ~5.7 tok/s):

[Screenshot: Studio chat with gemma-4-E4B-it-GGUF and web search]

Live session:

[Screenshot: Studio chat session]

Boot path: docker load of the published tarball, then /api/health (Studio) and /login (JupyterLab) both healthy in CPU mode. The update button is covered in a separate check (staged unsloth 2026.5.5, ran unsloth-studio-update, version moved forward and the service restarted clean).

A Jupyter password hash is always configured, so /api returns 403; the Windows
confirmation reported a healthy full image as a hard failure. Matches the fix
already in docker_confirm.sh and docker-publish.yml.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8fc483ec62

echo = False,
name = "vLLM STDERR",
ready_regex = None,
# vLLM >= 0.19 emits "Starting vLLM API server ... on ..." (and

P2 Badge Preserve explicit infinite vLLM startup waits

When callers pass timeout=None, the previous Event.wait(timeout=None) waited indefinitely, which is a useful escape hatch for large models or slow first-time downloads. This expression now converts None (and 0) back to 1200 seconds, so those runs are killed after 20 minutes even though the caller explicitly disabled the timeout; handle None as an unbounded deadline instead of falling back to the default.
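
A common way to keep `None` meaningful is a sentinel default, sketched below. The names `_UNSET`, `resolve_timeout` and `wait_until_ready` are hypothetical and the surrounding vLLM launcher is not reproduced; only the 1200-second default comes from the comment above.

```python
import threading
from typing import Optional

_UNSET = object()  # sentinel: distinguishes "not passed" from an explicit None
DEFAULT_TIMEOUT = 1200.0  # seconds, the 20-minute default mentioned above


def resolve_timeout(timeout=_UNSET) -> Optional[float]:
    """Map an omitted timeout to the default; keep None (unbounded) and 0 as given."""
    if timeout is _UNSET:
        return DEFAULT_TIMEOUT
    return timeout


def wait_until_ready(ready: threading.Event, timeout=_UNSET) -> bool:
    """Block until the event is set; timeout=None waits indefinitely."""
    return ready.wait(timeout=resolve_timeout(timeout))
```

Unlike an expression of the form `timeout or 1200`, this does not treat falsy values as "use the default", so an explicit `None` reaches `Event.wait` unchanged.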

Comment on lines +132 to +134
v = requested_version()
if v and "transformers" not in sys.modules:
    activate(v)

P2 Badge Scope transformer pins to the current kernel

This hook reads one shared /tmp/unsloth_nb/requested_transformers marker for every Jupyter kernel in the container. If two notebooks run concurrently with different install-cell pins, whichever cell writes the marker last controls the other kernel's next pre-run hook, so that notebook can activate the wrong transformers sidecar before its model cell; make the marker per-kernel/notebook rather than global.
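
One possible shape for a per-kernel marker is sketched below. It assumes each kernel runs in its own OS process and exports a key that `!pip` subprocesses inherit; the environment variable `UNSLOTH_NB_KERNEL_KEY` and both function names are hypothetical, and only the directory and file stem come from the comment above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

MARKER_DIR = Path("/tmp/unsloth_nb")  # directory named in the review comment


def kernel_key(environ: Mapping[str, str] = os.environ) -> str:
    """Key shared by a kernel and its child processes (hypothetical env var)."""
    return environ.get("UNSLOTH_NB_KERNEL_KEY") or str(os.getpid())


def marker_path(key: Optional[str] = None, base: Path = MARKER_DIR) -> Path:
    """Return a marker file path unique to one kernel."""
    return base / f"requested_transformers.{key or kernel_key()}"
```

In this sketch the kernel would set the variable once at startup (for example to its own PID), so that the install shim and the pre-run hook resolve the same file while other kernels resolve different ones.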

if not changed:
    return path, None, []
try:
    fd, tmp = tempfile.mkstemp(prefix = "unsloth-nb-req-", suffix = ".txt")

P2 Badge Keep filtered requirement files beside the original

When a requirements file is changed because a protected package was dropped, the filtered copy is written under the default temp directory. For a valid file that also contains a relative nested include such as -r extras.txt, pip resolves that include relative to the requirements file it is currently reading, so after this rewrite it looks in /tmp instead of the notebook/project directory and the install fails; create the temporary file next to path or rewrite relative include paths.
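
The first suggested fix maps onto the `dir` argument of `tempfile.mkstemp`. The sketch below is illustrative only: `write_filtered_copy` is a hypothetical helper, and it assumes the directory holding the original file is writable (a real implementation might need to fall back to rewriting include paths when it is not).

```python
import os
import tempfile
from typing import List


def write_filtered_copy(path: str, kept_lines: List[str]) -> str:
    """Write kept_lines to a temp file in the same directory as path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(prefix="unsloth-nb-req-", suffix=".txt", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write("\n".join(kept_lines) + "\n")
    return tmp
```

Because the copy sits beside the original, a relative `-r extras.txt` line inside it is resolved against the same directory as before the rewrite.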

Comment thread docker/Dockerfile
Comment on lines +686 to +687
&& mkdir -p /root/.ipython/profile_default/startup \
&& cp /opt/unsloth-nb/unsloth_ipython_startup.py /root/.ipython/profile_default/startup/00-unsloth-nb.py \

P2 Badge Install the notebook startup hook outside root home

When users start the base image with --user (common with mounted workspaces to avoid root-owned files), IPython uses that user's home rather than /root, so this startup file is never loaded. In that context UNSLOTH_NB_SHIM is not set and the PATH shim deliberately execs the real pip, letting notebook !pip/%pip cells mutate the baked torch/transformers stack; install the hook in a system-wide IPython/Jupyter startup location or otherwise enable it per kernel.

Comment on lines +58 to +59
first = t.lstrip().split("\n", 1)[0].strip().lower()
return first.startswith("%%capture") or first.startswith("%%bash")

P2 Badge Hash substantive captured or bash notebook cells

This treats every %%capture or %%bash cell as boilerplate, even when the cell is real tutorial logic such as data prep, launches, or captured training code. For an untouched notebook where upstream changes one of those cells, both content signatures drop the changed cell and middle_unchanged can return SAME, so the boot refresh skips a substantive upstream fix; only exclude these magics after confirming they are the generated install/setup cell.
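
A stricter check could look at the cell body rather than only its first line. The heuristic below is a deliberately narrow, hypothetical example: it treats a cell as boilerplate only when every non-comment body line is a pip or uv install command, whereas the real generated install cell may contain other statements and would need its own signature.

```python
import re

# Matches lines such as "!pip install x", "%pip install x" or "!uv pip install x".
_INSTALL_LINE = re.compile(r"^\s*[!%]?\s*(?:uv\s+)?pip\s+install\b")


def is_install_boilerplate(source: str) -> bool:
    """True only for %%capture / %%bash cells whose body is all install commands."""
    lines = [ln for ln in source.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].strip().lower().startswith(("%%capture", "%%bash")):
        return False
    body = [ln for ln in lines[1:] if not ln.strip().startswith("#")]
    return bool(body) and all(_INSTALL_LINE.match(ln) for ln in body)
```

Cells that fail this test would stay in the content signature, so an upstream change to captured training or data-prep code is still detected.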

#
# Persistence: the swap lands in the container's writable layer (survives
# docker restart). To keep it across a full recreate, mount the prebuilt dir on
# a named volume: -v unsloth_llama:/opt/unsloth/llama.cpp

P2 Badge Do not mount over the baked llama.cpp bundle

If users follow this persistence example with a fresh named volume, Docker masks /opt/unsloth/llama.cpp, including the baked binaries and converter that GGUF export and the Studio symlink rely on. The image then boots with an empty llama.cpp install and GGUF tooling fails until the update command seeds it; recommend seeding the volume first or mounting a parent/data path instead.

3 participants