Add shared Xet to HTTP stall fallback (hf_xet_fallback, hf_cache_state) by danielhanchen · Pull Request #829 · unslothai/unsloth-zoo

danielhanchen · 2026-06-24T11:12:14Z

Summary

Hugging Face Xet (hf_xet) is the fast default transport, but a Xet transfer can hang on a blob with no progress and no exception, and a blocked native Xet thread cannot be killed in-process. Unsloth Studio already recovers from this (a no-progress watchdog respawns the download over plain HTTP with HF_HUB_DISABLE_XET=1). This PR lands the same logic in unsloth_zoo so it has a single home that both Unsloth main and Studio can share, rather than living only inside Studio.

This is the foundation PR. A follow-up in the unsloth repo wires FastModel.from_pretrained to it and re-points Studio's copy to import from here.

What this adds

unsloth_zoo/hf_cache_state.py: the sparse-aware HF cache primitives. On-disk byte accounting prefers st_blocks so a partially written Xet / hf_transfer .incomplete is not mistaken for full-size progress, plus active .incomplete detection over the current HF_HUB_CACHE root.
unsloth_zoo/hf_xet_fallback.py:
- start_watchdog: a 180s no-progress watchdog that fires only while a .incomplete is present and the on-disk size is unchanged, and resets when none is present (so model init is never misread as a stall).
- hf_hub_download_with_xet_fallback: a single-file download. A cached file short-circuits with no child; otherwise the download runs in a spawn child (its own process group) supervised by the watchdog. On a stall it kills the child group, makes the partial safe for HTTP, and respawns once with HF_HUB_DISABLE_XET=1. Cancellation and deterministic errors (auth / missing / disk full) propagate without a fallback.
- snapshot_download_with_xet_fallback: the whole-repo variant. It warms a repo in a killable child (the entrypoint Unsloth from_pretrained will use before its in-process load), with an in-process local_files_only fast path so a warm cache spawns nothing.
- An UNSLOTH_DISABLE_XET knob (also honors UNSLOTH_STABLE_DOWNLOADS and HF_HUB_DISABLE_XET) to force HTTP from the first attempt, plus injectable prepare_for_http_fn / scrub hooks so Studio can pass its marker-aware cache management.

Both new files are AGPL-3.0 (per-file SPDX), matching the Studio source they are lifted from; the rest of unsloth_zoo stays LGPL-3.0.

Tests

tests/test_hf_xet_fallback.py: CPU-only, no network, no real subprocess (the per-attempt download seam is monkeypatched). Covers the watchdog state machine (constant vs growing vs absent .incomplete, sparse-aware bytes, fires at most once), the transport decision (cached short-circuit, cancel and deterministic errors take no fallback, exactly one Xet to HTTP retry, per-file independence), the injected prepare seam, the UNSLOTH_DISABLE_XET knob, the snapshot fast path and stall fallback, and the HF_HUB_DISABLE_XET precondition in a fresh interpreter.

Result: 22 passed, 1 skipped (the sparse-blocks assertion skips on filesystems that report st_blocks == 0).

Hugging Face Xet downloads can hang on a blob with no progress and no exception, and a blocked native Xet thread cannot be killed in-process. Unsloth Studio already recovers from this; this lands the same logic in unsloth_zoo so both Unsloth main and Studio can share one implementation. - hf_cache_state.py: sparse-aware HF cache primitives (st_blocks based byte accounting, active .incomplete detection) over the active HF_HUB_CACHE. - hf_xet_fallback.py: a 180s no-progress watchdog plus a spawn-child download that keeps Xet primary and falls back to plain HTTP exactly once on a stall. hf_hub_download_with_xet_fallback handles a single file; the new snapshot_download_with_xet_fallback warms a whole repo in a killable child (the entrypoint Unsloth from_pretrained uses before its in-process load), with an in-process local_files_only fast path for a warm cache. Honors an UNSLOTH_DISABLE_XET knob and injectable prepare-for-http / scrub hooks so Studio can pass its marker-aware cache management. Both files are AGPL-3.0 (per-file), matching the Studio source. Tests: CPU-only, no network, no real subprocess (the download seam is faked).

chatgpt-codex-connector · 2026-06-24T11:12:18Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

gemini-code-assist

Code Review

This pull request introduces sparse-aware Hugging Face cache primitives and a robust fallback mechanism from Xet to HTTP downloads on no-progress stalls, accompanied by a comprehensive test suite. The feedback focuses on enhancing the reliability and robustness of the implementation. Key recommendations include removing a redundant import check, catching broader OSError exceptions during file cleanup to handle permission issues, improving Windows path detection, resetting the watchdog timer during unmeasurable states to prevent false stalls, and adding a small timeout when draining the process queue to avoid race conditions.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-24T11:14:15Z

+    try:
+        from huggingface_hub import constants as hf_constants
+    except ImportError:
+        return None


The try...except ImportError block for huggingface_hub is redundant because huggingface_hub is a required dependency of the project. Importing it directly is cleaner and adheres to the general rules.

from huggingface_hub import constants as hf_constants

References

Avoid using try...except ImportError for libraries that are required dependencies of the project, as the check is redundant.

gemini-code-assist · 2026-06-24T11:14:15Z

+                    try:
+                        blob.unlink()
+                    except FileNotFoundError:
+                        continue


If blob.unlink() raises a PermissionError (which is common on Windows or when files are locked), the outer except Exception block will catch it and abort the entire cleanup loop. This leaves other incomplete files untouched. Catching OSError (which covers both FileNotFoundError and PermissionError) inside the loop ensures maximum cleanup effort.

Suggested change

try:

blob.unlink()

except FileNotFoundError:

continue

try:

blob.unlink()

except OSError:

continue

gemini-code-assist · 2026-06-24T11:14:15Z

+            if not repo_id or repo_id.startswith(("/", ".", "~")) or "\\" in repo_id:
+                continue


Windows absolute paths using forward slashes (e.g., C:/path) or containing drive letters won't be skipped by the current check, potentially leading to unnecessary cache directory lookups. Checking for : (which is never present in valid HF repo IDs) is a robust way to skip Windows absolute paths.

Suggested change

if not repo_id or repo_id.startswith(("/", ".", "~")) or "\\" in repo_id:

continue

if not repo_id or repo_id.startswith(("/", ".", "~")) or "\\" in repo_id or ":" in repo_id:

continue

gemini-code-assist · 2026-06-24T11:14:15Z

+            if state is None:
+                if on_heartbeat is not None:
+                    on_heartbeat(f"Downloading ({transport} transport)...")
+                continue


If get_hf_download_state returns None due to transient issues, last_change is not updated. If this state persists longer than stall_timeout, the watchdog will immediately trigger a false stall as soon as the state becomes measurable again. Resetting last_change = now during unmeasurable periods avoids this.

Suggested change

if state is None:

if on_heartbeat is not None:

on_heartbeat(f"Downloading ({transport} transport)...")

continue

if state is None:

last_change = now

if on_heartbeat is not None:

on_heartbeat(f"Downloading ({transport} transport)...")

continue

gemini-code-assist · 2026-06-24T11:14:15Z

+                result = result_queue.get_nowait()
+            except queue.Empty:
+                result = None
+    finally:


Under heavy CPU/disk load, there can be a slight delay between the child process exiting and the OS flushing the queue's pipe. get_nowait() will fail instantly if the pipe is temporarily empty, whereas a small timeout (e.g., 1.0s) provides robustness against scheduling delays.

Suggested change

result = result_queue.get_nowait()

except queue.Empty:

result = None

finally:

try:

result = result_queue.get(timeout = 1.0)

except queue.Empty:

result = None

gemini-code-assist · 2026-06-24T11:14:15Z

+
+def test_get_state_skips_local_paths(hf_cache):
+    # Filesystem paths are not HF repo IDs and must be ignored without error.
+    assert xf.get_hf_download_state(["/abs/path", "./rel", "~user", "c:\\x"]) == (0, False)


Update the test assertion to verify that Windows absolute paths using forward slashes (e.g., c:/x) are also correctly skipped.

Suggested change

assert xf.get_hf_download_state(["/abs/path", "./rel", "~user", "c:\\x"]) == (0, False)

assert xf.get_hf_download_state(["/abs/path", "./rel", "~user", "c:\\x", "c:/x"]) == (0, False)

…d sys import - The no-progress watchdog and the HTTP-retry cache purge now honor a caller supplied snapshot cache_dir, not just HF_HUB_CACHE, so a stall under a custom cache is detected and the partial is purged before the HTTP fallback. Threaded through hf_cache_root / iter_active_repo_cache_dirs / has_active_incomplete_blobs (hf_cache_state) and get_hf_download_state / start_watchdog / _default_prepare_for_http / _run_download_attempt (hf_xet_fallback). - Remove the unused 'import sys' (ruff F401). - Add a regression test that a stall under a custom cache_dir is watched and cleaned.

… on file path - Critical: the HTTP retry could still use Xet. The spawn child re-imports the (heavy) unsloth_zoo package -- importing huggingface_hub, which reads HF_HUB_DISABLE_XET into a module constant at import time -- BEFORE the child body ran, so a child-side os.environ assignment landed too late. Now set the transport env in the parent (under a lock) around proc.start() so the child inherits it from creation; the child still sets it defensively. Tests assert the child inherits HF_HUB_DISABLE_XET=1 on the HTTP retry and is left untouched on the Xet attempt. - hf_hub_download_with_xet_fallback now accepts cache_dir (symmetric with the snapshot variant); threaded into try_to_load_from_cache and the child download.

danielhanchen · 2026-06-24T23:26:04Z

@codex review

danielhanchen · 2026-06-24T23:26:04Z

/gemini review

gemini-code-assist · 2026-06-24T23:26:08Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8af1b5debd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-24T23:34:59Z

+
+logger = logging.getLogger(__name__)
+
+_CTX = mp.get_context("spawn")


Avoid forcing spawn for library downloads

When this helper is called from the normal unguarded top-level model-loading scripts that FastModel.from_pretrained will use, the spawn context re-imports the user's __main__ module in the child before _download_child_entry runs. That re-executes the same top-level download/model load and typically hits multiprocessing's bootstrap RuntimeError or exits without putting a queue result, so the wrapper reports a download error instead of downloading. Use an entry point that does not import the caller's main module, or avoid forced spawn where safe.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-24T23:34:59Z

+            # The generic purge is cache_dir-aware; an injected (Studio) hook owns
+            # its own cache accounting and keeps the (repo_type, repo_id) signature.


Abort when HTTP prep cannot remove unsafe partials

If _default_prepare_for_http or an injected prep hook fails after a Xet stall, this handler only logs at debug level and still starts the HTTP retry. In the locked-file/permission-error cases where the .incomplete sparse partial is still present, the retry can resume over exactly the unsafe partial the preceding comment says can silently corrupt the blob, leaving a bad model in cache instead of failing safely.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-24T23:34:59Z

+    return (blobs_dir.is_dir() and _blob_dir_is_partial(blobs_dir)) or (
+        _repo_dir_has_broken_snapshot_symlinks(repo_dir)
+    )


Don't treat stale snapshot gaps as active downloads

In a cache that already has a broken snapshot symlink from a previous interrupted or evicted download but no *.incomplete file, this still reports has_incomplete=True. The watchdog is supposed to reset while no active partial exists, but this state starts the stall timer before the new attempt has written anything and can kill/retry otherwise healthy downloads that spend stall_timeout on metadata or lock waits.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-24T23:34:59Z

+            for blob in blobs_dir.iterdir():
+                if blob.is_file() and blob.name.endswith(INCOMPLETE_SUFFIX):
+                    try:
+                        blob.unlink()


Limit HTTP cleanup to this stalled attempt

When two files from the same repo are downloading concurrently, a stall in one wrapper call enters the HTTP retry and this loop unlinks every *.incomplete blob in the repo cache, including a sibling file that is still actively making progress. That can make the other download fail or lose its partial even though it did not stall, so the cleanup needs to be scoped or locked to the attempt being retried.

Useful? React with 👍 / 👎.

- prepare_for_http: continue past a locked/permission-denied partial (catch OSError, not just FileNotFoundError) so one bad blob does not abort cleanup of the rest. - get_hf_download_state: also skip drive-letter (":") repo ids so Windows absolute paths like C:/models are not mistaken for Hub ids. - watchdog: reset the no-progress timer on an unmeasurable tick so a long unmeasurable gap cannot trip a false stall the instant state is readable. - result drain: get(timeout=1.0) instead of get_nowait() so a child that exits microseconds before its queue feeder flushes is not misreported as 'exited without a result'.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6266d0811d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T00:00:35Z

+    return _download_with_xet_fallback(
+        repo_id = repo_id,
+        label = repo_id,
+        kind = "snapshot",


Preserve local model paths before Hub fallback

When this warm-up helper receives a filesystem model path (for example /mnt/model), the local_files_only probe above raises HFValidationError and this fall-through sends the path into the child snapshot_download, which rejects local paths instead of letting a normal from_pretrained('/mnt/model') proceed. Since the docstring says this is the from_pretrained cache warm-up path, guard or return existing local directories before entering the Hub download loop.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T00:00:35Z

+    out = re.sub(r"hf_[A-Za-z0-9]{8,}", "***", out)
+    out = re.sub(r"([Bb]earer\s+)[A-Za-z0-9._\-]+", r"\1***", out)


Redact signed URL query credentials

When running outside Studio, child exceptions fall back to this default scrubber, but these patterns only cover hf_ tokens and Bearer headers. If an HTTP/Xet exception includes the resolved presigned blob URL (for example a 403/timeout message with ...?X-Amz-Signature=...), the signed query string is returned to the parent and can be logged with the raised RuntimeError, despite the child-entry contract saying signed URLs are not exposed; please strip query strings or known signing parameters here as well.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T00:00:35Z

+            cache_dir = cache_dir,
+            allow_patterns = allow_patterns,
+            ignore_patterns = ignore_patterns,
+            local_files_only = True,


Preserve Hub freshness checks for moving revisions

When revision is omitted or is a branch/tag like main, this local_files_only fast path returns any cached snapshot without the normal Hub ref/etag refresh. After a model is updated upstream, snapshot_download_with_xet_fallback can keep serving the stale cached snapshot indefinitely instead of matching snapshot_download/from_pretrained semantics; restrict this no-network shortcut to immutable commit hashes or offline mode, or perform the refresh in the child.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T00:00:35Z

+            os.environ.update(child_env)
+            proc.start()


Avoid leaking temporary Xet env to sibling downloads

During an HTTP retry, this mutates process-global os.environ to HF_HUB_DISABLE_XET=1 while spawning the child. The lock only serializes writers in this helper; a concurrent download can call xet_force_disabled() or import huggingface_hub during this window and incorrectly force/cache HTTP for its own first attempt, so multi-model loads can lose Xet because a sibling stalled. Use a spawn bootstrap that does not expose the flag process-wide, or guard all reads/imports with the same lock.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T00:00:35Z

+    target = target_dir_name(repo_type, repo_id)
+    try:
+        for entry in root.iterdir():
+            if entry.name.lower() == target:


Match Hub cache dirs case-sensitively

Hugging Face accepts uppercase repo IDs and its cache folder naming preserves case (for example models--Foo--Bar and models--foo--bar are distinct), but this lowercased comparison makes both directories match either repo. In a cache containing repos that differ only by case, the watchdog and HTTP cleanup can count or unlink the other repo's active partials, causing false stalls or cross-repo download disruption; compare against the exact repo_cache_dir_name instead.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T00:00:35Z

+        if isinstance(cached, str) and os.path.exists(cached):
+            return cached


Honor cancellation before cached fast paths

If cancel_event is already set and this file is cached, the helper returns the cached path here before reaching _download_with_xet_fallback's cancellation check, contradicting its documented RuntimeError("Cancelled") behavior. In cancellation-driven model loads this can continue into expensive local loading after the caller has already aborted; check the event before the cache short-circuit (and similarly before the snapshot fast path).

Useful? React with 👍 / 👎.

danielhanchen · 2026-06-25T00:40:16Z

@codex review

danielhanchen · 2026-06-25T07:51:03Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6266d0811d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T08:00:49Z

+
+logger = logging.getLogger(__name__)
+
+_CTX = mp.get_context("spawn")


Avoid spawn re-importing unguarded training scripts

When this helper is used from FastModel.from_pretrained in a regular Python training script that calls it at module top level, the spawn context re-imports and executes the user's __main__ in the child before _download_child_entry runs. That re-enters the download during multiprocessing bootstrapping and Python aborts with the usual safe-import RuntimeError, so the child exits without warming the cache or reaching the fallback. Please launch a dedicated helper subprocess that only imports this module, or avoid spawn on POSIX for this path.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T08:00:49Z

+                    except OSError:
+                        # A locked / permission-denied blob (common on Windows)
+                        # must not abort cleanup of the rest of the partials.
+                        continue


Abort HTTP fallback when cleanup cannot remove partials

When a stalled .incomplete is locked or permission-denied, this except leaves it in place and reports success to the caller. The fallback then launches HTTP against the same cache even though this function documents that HTTP resume over a sparse Xet/hf_transfer partial can corrupt the blob, so Windows locks or restricted caches can turn a cleanup failure into a corrupted model file instead of a safe failure.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T08:00:49Z

+            for blob in blobs_dir.iterdir():
+                if blob.is_file() and blob.name.endswith(INCOMPLETE_SUFFIX):
+                    try:
+                        blob.unlink()


Limit cleanup to the stalled attempt

When two files from the same repo are downloaded concurrently and only one child stalls, this repo-wide loop removes every *.incomplete, including a sibling file that another live child is still writing. On Unix that can unlink the active writer's target and on Windows it collides with the locked-file path above, so the single-file fallback is not independent under parallel shard downloads; cleanup needs to target only the stalled attempt's partials or coordinate at repo scope.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T08:00:49Z

+    if hf_token:
+        out = out.replace(hf_token, "***")


Handle boolean Hugging Face tokens when scrubbing

Hugging Face download APIs accept token=True to use the saved token, and callers commonly pass that through from from_pretrained. If the child then raises a deterministic error, this path calls str.replace with a bool, causing a TypeError inside the exception handler before result_queue.put; the parent then reports that the process exited without a result instead of the real 401/404/network error. Guard the replacement with isinstance(hf_token, str).

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T08:00:49Z

+    with _SPAWN_ENV_LOCK:
+        saved_env = {k: os.environ.get(k) for k in child_env}
+        try:
+            os.environ.update(child_env)
+            proc.start()


Keep temporary Xet env out of other threads

During an HTTP retry this mutates process-wide os.environ while the child is started, but other threads are not protected by _SPAWN_ENV_LOCK; for example, another concurrent helper can call xet_force_disabled() during this window and incorrectly decide to skip Xet and run HTTP cleanup for an unrelated download. Pass the environment to an isolated helper process or keep the transport decision and spawn under a synchronization scheme that other callers also use.

Useful? React with 👍 / 👎.

…ce_download (review 3) - spawn from notebook / stdin / python -c / unguarded script: multiprocessing 'spawn' re-opens __main__.__file__, which is missing or '<stdin>' there, so proc.start() raised before the child ran and the download never happened. Point __main__ at this importable module just for the spawn (serialized under the existing lock) and restore it. Verified from a real <stdin> context: the child now runs and returns the expected HF error instead of failing to spawn. - _default_prepare_for_http only deletes from an exact-case cache dir, or a single unambiguous case-insensitive match, so preparing HTTP for Org/Repo no longer purges a case-colliding org/repo on a case-sensitive filesystem. - HTTP prep now also clears broken snapshot symlinks, which the incomplete-state detector counts as active; otherwise the HTTP retry inherited stale state and re-tripped the watchdog. - snapshot/file fallbacks accept and forward force_download, skipping the warm cache short-circuit so force_download=True re-fetches in the killable child. - Added regression tests for all four (30 passed, 1 skipped).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2619c825f8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T08:30:25Z

+
+
+def repo_cache_dir_name(repo_type: str, repo_id: str) -> str:
+    return f"{repo_type}s--{repo_id.replace('/', '--')}"


Normalize repo_type=None before cache scans

When callers forward Hugging Face's default repo_type=None for a model download, the child hf_hub_download/snapshot_download normalizes it to "model" and writes under models--..., but the watchdog and HTTP-prep cache probes use this formatter first and look for Nones--... instead. In that scenario an actual stalled model download never exposes has_incomplete=True, so the Xet child is not killed and the HTTP fallback this helper is meant to provide does not run; normalize None to "model" before building cache directory names.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T08:30:25Z

+                for blob in blobs_dir.iterdir():
+                    try:
+                        if blob.is_file():
+                            total += blob_bytes_present(blob)


Scope single-file progress to the watched file

When two hf_hub_download_with_xet_fallback calls download different files from the same repo, this total includes every blob in the repo cache. A stalled child sees a sibling's growing .incomplete as progress and keeps resetting its stall timer, so it will not retry over HTTP after stall_timeout until all other repo downloads stop; large parallel shard downloads can therefore hang on the stuck shard much longer than configured.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T08:30:25Z

+                cache_dir = cache_dir,
+                allow_patterns = allow_patterns,
+                ignore_patterns = ignore_patterns,
+                local_files_only = True,


Do not treat partial snapshots as warm caches

When a previous interrupted download left a snapshot directory for this revision with only some files linked and no active .incomplete, Hugging Face's local_files_only path returns the snapshot folder as soon as it exists and does not verify that every allow_patterns file is present. This fast path then skips the supervised child entirely, so the later model load either fails on missing files or downloads them in-process, defeating the Xet stall fallback for exactly the partial-cache case it is meant to recover.

Useful? React with 👍 / 👎.

…safe (review 4) - repo_type=None (Hugging Face's default model repo) resolved to a bogus Nones--<id> cache dir, so get_hf_download_state/the watchdog missed the real models--<id> partial and could let a stalled Xet child run forever. Normalize repo_type to 'model' in repo_cache_dir_name and at the public entrypoints. - The destructive HTTP-prep path was case-collision safe but the read/watchdog path still yielded every case-insensitive match, so a stale partial in a colliding org/repo could make the watchdog kill an active Org/Repo download. Move the exact-or-unambiguous guard into iter_active_repo_cache_dirs so both paths share one rule, and drop the now-redundant _destructive_repo_cache_dirs. - Added regression tests for both (32 passed, 1 skipped).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc0fddb108

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T09:00:11Z

+import time
+from typing import Any, Callable, Optional
+
+from unsloth_zoo.hf_cache_state import (


Keep the fallback importable without full Unsloth init

This new module is meant to be shared with Studio/standalone callers, but a normal import unsloth_zoo.hf_xet_fallback executes unsloth_zoo/__init__.py before this submodule can load; that pre-existing init raises at unsloth_zoo/__init__.py:200-202 when the full unsloth package is not installed and also performs heavy Torch/GPU setup otherwise. The tests only avoid this by loading the file via spec_from_file_location, so CPU/Studio workers that import the documented package path can fail before any fallback code runs; expose this through a lightweight import path or make the package init skip heavy checks for these cache/download submodules.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T09:00:11Z

+    if proc.is_alive():
+        _signal_group(getattr(signal, "SIGKILL", signal.SIGTERM))
+        proc.join(timeout = 5.0)


Kill the group even after the wrapper exits

When the Xet transfer has spawned a helper in the child's process group and the Python wrapper exits on SIGTERM before that helper does, this guard skips the SIGKILL pass because proc.is_alive() is already false. The surviving helper can keep the stalled writer alive while the parent starts HTTP cleanup/retry, so the fallback should still send the post-grace SIGKILL to the process group (or otherwise reap the group) even if the leader has exited.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T09:00:11Z

+                return
+
+            if on_heartbeat is not None:
+                on_heartbeat(f"Downloading ({transport} transport)...")


Isolate status hook failures from the watchdog

If a caller-supplied on_status/heartbeat hook raises here (for example because a Studio client disconnected), the exception terminates the daemon watchdog thread before it can set the stalled event. In that environment a genuinely hung Xet child can remain supervised only by the parent poll loop and never trigger the HTTP retry, so nonessential status callbacks should be wrapped/logged instead of being allowed to kill stall detection.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T09:00:11Z

+    if cache_dir is not None:
+        root = Path(cache_dir)


Normalize cache_dir before scanning it

When callers pass a custom cache_dir that needs user expansion or resolution, such as ~/hf-cache, Hugging Face expands and resolves that path before writing, but this cache-state code scans the literal Path(cache_dir). In that scenario the child writes the expanded cache while the watchdog reports (0, False) and HTTP prep misses the partial, so a stalled Xet download under that custom cache never triggers or cleans up the fallback.

Useful? React with 👍 / 👎.

…(review 4 P2) - _terminate_process_group always sends the post-grace SIGKILL to the whole process group, even when the Python leader already exited on SIGTERM, so a surviving Xet helper cannot keep the stalled writer alive during HTTP cleanup. - Watchdog status/heartbeat callbacks are wrapped: a raising on_status (e.g. a disconnected Studio client) no longer kills the daemon thread and stops stall detection for a genuinely hung child. - hf_cache_root expands ~ in a custom cache_dir (as huggingface_hub does on write), so a stall under e.g. ~/hf-cache is still seen and cleaned. - Added regression tests (34 passed, 1 skipped).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27c2795880

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T10:12:34Z

+            if on_status is not None:
+                on_status(f"{label}: Xet stalled, retrying over HTTP")


Guard the retry status callback

When a stalled Xet attempt reaches the HTTP fallback, this direct on_status(...) call can raise (for example if a Studio client disconnected) and abort the retry path before disable_xet is set for the next attempt. The heartbeat callback is now isolated with _safe_status, but this remaining direct call still lets a nonessential status hook turn an otherwise recoverable Xet stall into a failed download.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T10:12:34Z

+def hf_hub_download_with_xet_fallback(
+    repo_id: str,
+    filename: str,
+    token: Optional[str],


Default the optional token argument

When callers use this single-file helper like Hugging Face's hf_hub_download for a public repo and omit token, the current signature raises TypeError before it can hit the cache or start the supervised download. The snapshot helper already defaults token=None; doing the same here keeps unauthenticated single-file downloads from requiring a dummy third argument.

Useful? React with 👍 / 👎.

The Xet fallback runs the download in a spawn child -- a fresh interpreter that re-imports unsloth_zoo.hf_xet_fallback and so re-runs unsloth_zoo/__init__.py, which on any non-MLX host imports torch + transformers and does device init the child never uses (~5-7s per download child). The cache/download submodules only need stdlib + huggingface_hub. Reuse the existing MLX lightweight-import path: __init__ now also honors an opt-in UNSLOTH_ZOO_DISABLE_GPU_INIT=1 (off by default, so normal CUDA/CPU runs are byte-for-byte unchanged), and the parent sets it in the child's environment only around spawning the download child (restored immediately, like the transport env). The unconditional HF cache redirect still runs, so the child writes the same cache. MLX stub injection is now gated on actual MLX, not the generic skip flag. Verified: normal import still runs full init (DEVICE_TYPE=cuda, torch+transformers imported); with the flag the helper imports with torch/transformers absent; a real download through the spawn child completes with zero heavy-init banners (child stays light) and the parent env is restored. 35 passed, 1 skipped.

danielhanchen · 2026-06-25T12:38:52Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e93e669ee6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T12:50:22Z

+        if main_module is not None:
+            main_file = getattr(main_module, "__file__", None)
+            if not main_file or str(main_file).startswith("<"):
+                saved_main_file = main_file
+                main_module.__file__ = __file__


Avoid importing caller scripts in spawn children

When this wrapper is invoked from a regular Python script at module top level, __main__.__file__ is a real path, so this branch leaves it unchanged. The spawn child then imports that user script as __mp_main__, re-executes the top-level download/from_pretrained call during multiprocessing bootstrap, and exits with the standard “start a new process before bootstrapping” error; the parent then reports exited ... without a result instead of downloading. This breaks the common unguarded script pattern the comment says this workaround should handle, so the helper should avoid re-opening the caller script even when __file__ is a normal path.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T12:50:22Z

+                cache_dir = cache_dir,
+                allow_patterns = allow_patterns,
+                ignore_patterns = ignore_patterns,
+                local_files_only = True,


Verify cached snapshots before the fast path

When a previous snapshot download was interrupted after Hub wrote refs/<branch> and created snapshots/<sha>, snapshot_download(..., local_files_only=True) returns that directory without checking that all allowed files are present. This fast path therefore skips the killable child even while the cache is partial or has stale incomplete state, so from_pretrained can proceed with missing model files instead of completing the download; validate the snapshot or active incomplete state before returning here.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T12:50:22Z

+    # spawning that child, never for a training/inference process. The
+    # unconditional HF cache redirect above still runs, so the child writes to the
+    # same cache as the parent.
+    _SKIP_GPU_INIT = os.environ.get("UNSLOTH_ZOO_DISABLE_GPU_INIT", "0") == "1"


Allow the shared downloader to import standalone

When Studio or a lightweight utility imports unsloth_zoo.hf_xet_fallback directly in an environment without the unsloth package, package import still leaves _SKIP_GPU_INIT false unless the caller already set this child-only env var, so the later find_spec("unsloth") guard raises before the new shared module can load. That makes the advertised standalone/Studio import path unusable unless consumers know a private pre-import knob; select the lightweight path for these cache/download modules without requiring that external setup.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T12:50:22Z

+            if on_status is not None:
+                on_status(f"{label}: Xet stalled, retrying over HTTP")


Isolate stall status callback failures

When a transfer stalls, this direct on_status call runs outside _safe_status. If a UI/status callback raises at that moment (for example a disconnected Studio client, which the watchdog heartbeat path already tolerates), the exception escapes before disable_xet = True, so the HTTP retry is skipped and a recoverable Xet stall fails; wrap this notification with the same safe callback handling used for heartbeats.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T12:50:22Z

+                            blob.unlink()
+                        except OSError:


Do not resume HTTP when cleanup failed

When a stalled Xet partial cannot be removed (for example a locked file on Windows or a permission error), this branch suppresses the unlink failure and the caller still proceeds to the HTTP retry. The surrounding comments note that resuming HTTP over a sparse Xet/hf_transfer partial can silently corrupt the blob, so the retry should not continue unless the unsafe partial was actually cleared.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T12:50:22Z

+    if hf_token:
+        out = out.replace(hf_token, "***")


Handle boolean HF tokens while scrubbing errors

Hugging Face callers commonly pass token=True to use the configured token; if such a download fails in the child, this scrubber treats the boolean as a replacement string and raises TypeError, so the child exits without putting the real auth/network error on the queue and the parent reports a misleading “exited without a result”. Guard this replacement to string tokens only.

Useful? React with 👍 / 👎.

…ety, robust callbacks/tokens P1: - Spawn child now always repoints __main__.__file__ at this side-effect-free module (not only for stdin/notebook callers). An unguarded top-level script (python script.py with a bare from_pretrained) previously had its file re-imported as __mp_main__ in the child, re-running the download and failing with the multiprocessing bootstrapping error; now the child imports this module instead and downloads. Verified end-to-end with a real unguarded script. - snapshot fast path no longer short-circuits a cached-but-incomplete snapshot (an interrupted download leaving .incomplete blobs / broken symlinks): it checks has_active_incomplete_blobs and completes the download in the killable child rather than letting from_pretrained load missing files. P2: - Xet to HTTP retry status callback wrapped in _safe_status so a raising on_status (disconnected client) cannot abort the recoverable retry. - hf_hub_download_with_xet_fallback defaults token to None (parity with hf_hub_download). - If an unsafe partial cannot be cleared before HTTP, force a clean re-download instead of an unsafe resume over a sparse partial. - _default_scrub_secrets tolerates token=True (no TypeError in the child scrubber). - Docstring: state the package imports with unsloth installed; the spawn child uses the UNSLOTH_ZOO_DISABLE_GPU_INIT lightweight path. 41 passed, 1 skipped.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2300e5580

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T14:54:04Z

+            repo_type = repo_type,
+            token = token,
+            revision = params.get("revision"),
+            cache_dir = params.get("cache_dir"),


Normalize snapshot cache_dir before spawning

When snapshot_download_with_xet_fallback is called with a cache_dir containing ~, the watchdog and HTTP cleanup scan Path(cache_dir).expanduser() via hf_cache_root, but the child passes the original value to snapshot_download here. In the supported huggingface_hub code path, snapshot_download builds its storage folder from that string without expanding it, so a stalled snapshot download writes under the literal ~/... path while the parent watches $HOME/... and never trips the Xet fallback; normalize the cache_dir once and use the same value for both the child and cache-state probes.

Useful? React with 👍 / 👎.

The snapshot fast path and the incomplete-state detector only inspected the newest snapshot by mtime. A caller requesting an older revision whose snapshot had a dangling symlink (an interrupted download) while a newer revision was clean would read the repo as healthy and load with missing files. - hf_cache_state.py: add snapshot_dir_has_broken_symlinks(snapshot_dir) and make the broken-symlink detector iterate every snapshot, not just the latest. - hf_xet_fallback.py: validate the exact returned revision dir in the local_files_only fast path; clear broken symlinks across all snapshots during HTTP prep; write the forced-stall test partial under the caller's cache_dir and the repo_type-correct dir name so the watchdog actually sees it. - tests: add regression coverage for the per-snapshot primitive, broken older snapshot detection, fast-path rejection of a broken requested revision, and all-snapshot symlink cleanup.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a04a983576

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T15:36:31Z

+    if len(entries) <= 1:
+        return entries


Reject folded-only cache matches

When downloading Org/Repo on a case-sensitive cache that only has models--org--repo from another repo/call, this branch returns that sole folded match because len(entries) <= 1. The watchdog/HTTP prep for Org/Repo then counts or deletes the lowercase repo's .incomplete partial before HF has created the exact models--Org--Repo directory, so a stale case-colliding cache can still disrupt the wrong download. Fresh evidence in the current patch is that the new exact-match helper still accepts a folded-only match here; it should require entry.name == target for Hub cache dirs.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T15:36:31Z

+            if (
+                not snapshot_dir_has_broken_symlinks(Path(cached_dir))
+                and not has_active_incomplete_blobs(repo_type, repo_id, cache_dir = cache_dir)
+            ):


Verify absent snapshot files before returning cache

When an interrupted snapshot wrote refs/<rev> and the snapshot directory but died before creating a pointer/symlink for one allowed file, snapshot_download(..., local_files_only=True) returns the snapshot folder; these checks pass because there is no dangling symlink and no remaining .incomplete. This still short-circuits to a partial snapshot, so later from_pretrained can fail or download in-process instead of under the fallback. Fresh evidence is that the new guard only validates broken symlinks/partials, not missing expected files.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T15:36:31Z

+        if main_module is not None:
+            saved_main_file = getattr(main_module, "__file__", _UNSET)
+            main_module.__file__ = __file__


Clear spec before spawn

When the caller is launched with python -m train or any __main__ that has __spec__.name, multiprocessing's spawn preparation uses that module name in preference to __file__. This assignment therefore does not repoint the child; it re-imports the user's module as __mp_main__, re-running top-level from_pretrained and hitting the bootstrapping RuntimeError instead of reaching _download_child_entry. Clear or replace __spec__ as well, or avoid spawn importing __main__.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T15:36:31Z

+def snapshot_download_with_xet_fallback(
+    repo_id: str,
+    *,
+    revision: Optional[str] = None,
+    token: Optional[str] = None,


Honor local-only warmups

When a from_pretrained caller sets local_files_only=True (common for offline runs), this wrapper has no way to receive or forward that setting. If the snapshot is not fully cached, it falls through to _download_with_xet_fallback and starts a network-capable child download instead of matching Hugging Face/Transformers local-only semantics, or a caller that forwards the kwarg gets TypeError; add and honor local_files_only before spawning.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T15:36:31Z

+            cached = try_to_load_from_cache(
+                repo_id, filename, repo_type = repo_type, revision = revision, cache_dir = cache_dir
+            )


Expand cache_dir before probing file cache

When cache_dir is passed as ~/hf-cache, a finalized file already present in the expanded cache is missed here because try_to_load_from_cache joins the string directly, while the child hf_hub_download expands/resolves it before writing. Warm single-file loads then spawn a child and may perform Hub metadata/network work instead of the documented no-child/no-network cache short-circuit; normalize the cache dir before this probe and reuse the same value for the attempt.

Useful? React with 👍 / 👎.

danielhanchen · 2026-06-26T03:15:55Z

/gemini review

danielhanchen · 2026-07-02T09:52:31Z

One of the four is real and fixed in 61a629a; the other three I did not take (the license one per maintainer direction).

Fixed:

Retry transient LocalEntryNotFoundError cases (P2): huggingface_hub raises LocalEntryNotFoundError both for a genuine offline / uncached miss AND as its wrapper around a transient HEAD connection error / timeout for an uncached file ("... Please check your connection and try again"). The classifier matched the class NAME against the deterministic set before the transient-hint check, so the transient connection wrapper (which a raw ConnectionError with the same message would retry) was surfaced immediately instead of taking the HTTP retry. _is_retryable_download_error now retries a LocalEntryNotFoundError only when its message carries a transient hint (connection / timeout / reset / 5xx ...); a true offline miss still falls through to the deterministic set, stays non-retryable, and keeps its reconstructed type across the spawn boundary. Watchdog / spawn / retry-loop mechanism untouched.

Not taken:

Require all declared Diffusers component weights (P2): a static "weight-bearing component" heuristic false-rejects a complete download whenever a component is disabled at the repo level (safety_checker: [null, null]) or optional -- I verified it rejects a real StableDiffusionPipeline that legitimately omits the safety_checker. POST-download stays lenient by design: snapshot_download already guarantees manifest completeness on the normal path, and "at least one declared component weight present" already separates a real pipeline warm from a config-only / checkpoint-only / undeclared-leftover stale cache.
Keep broad globs from satisfying root model warms (P2): the loaders warm a bare model with allow_patterns=None (routed through the canonical-scoped _root_model_has_weight), never a broad ['*.safetensors'] glob. A broad glob's load intent is ambiguous (adapter / subfolder / diffusers are all valid), so requiring a canonical root weight would false-reject those legitimate broad-glob downloads.
Align package license metadata (P2): the AGPL-3.0-only headers on the new files are intentional and the header already documents the split from the LGPL package; reconciling the package-level license expression is a maintainer decision, out of scope for this PR.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61a629a050

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-02T10:08:48Z

+            cached = try_to_load_from_cache(
+                repo_id, probe_filename, repo_type = repo_type, revision = revision, cache_dir = cache_dir
+            )
+            if isinstance(cached, str) and os.path.exists(cached):
+                return cached


Don't skip online checks for mutable file revisions

When revision is omitted or is a branch/tag, try_to_load_from_cache only resolves the locally cached ref and this returns without contacting the Hub. Unlike hf_hub_download, this bypasses the online HEAD/ETag check for mutable refs, so after main or a tag moves callers can keep receiving a stale file instead of refreshing it under the supervised child; restrict this fast path to immutable commit hashes or perform the ref check in the child.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-02T10:08:48Z

+    return all(
+        any(candidate in present for candidate in candidates) for candidates in groups.values()
+    )


Treat exact allow patterns as filters after download

When allow_patterns enumerates optional exact files, such as ['tokenizer.json', 'tokenizer_config.json'] for a repo that only publishes one of them, snapshot_download succeeds because the allow list filters the repo contents rather than requiring every pattern to exist. This all(...) turns that valid subset into an incomplete result, so the wrapper retries over HTTP and then raises DownloadStallError even though both transports produced the complete filtered snapshot.

Useful? React with 👍 / 👎.

danielhanchen · 2026-07-02T12:19:12Z

Neither of these two is a bug I'm taking; reasoning below.

Don't skip online checks for mutable file revisions (P2): this is the single-file twin of the mutable-revision report on the snapshot path from the previous review, and I'm not taking it for the same reason. The try_to_load_from_cache short-circuit (like the snapshot local_files_only=True fast path) is an intentional offline-capable warm-cache resolution that avoids a per-load child spawn / network round-trip for an already-finalized blob. Gating it on immutable commit hashes would force a child spawn -- and, offline, a regression -- for the common mutable-ref (main/tag) case even when the file is already warm, which the design deliberately avoids. A moved main/tag is resolved by the caller's own hf_hub_download / from_pretrained (which performs the online HEAD), matching Hugging Face's own local-cache resolution semantics; the wrapper's job is to make the killable-child download available on a miss, not to re-implement ref freshness on the warm path.
Treat exact allow patterns as filters after download (P2): correct in isolation -- _requested_exact_files_present_grouped requires each distinct exact-named file, so a multi-file exact list for a repo that publishes a subset would report incomplete. But it does not fire for any warm the loaders generate: every allow list they build includes a glob in the aux patterns (*.py, *.tiktoken, additional_chat_templates/*.jinja), and the check short-circuits to True as soon as any pattern is a glob (I verified both: the all-exact subset returns incomplete, the same list plus a glob returns complete). For a general all-exact list, requiring each named file is the intended guard against a silent-stale config-only snapshot: a requested file that was never fetched leaves no dangling symlink, so this all-present check is its only guard, and relaxing it to a pure filter would let that stale case through. Interchangeable weights are already grouped (the either-format pair needs any one). So this is deliberate all-present semantics for an exact-named subset, not a path the loaders can hit.

danielhanchen · 2026-07-02T20:48:24Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61a629a050

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-02T21:04:42Z

+    if "pytorch_model.bin" in root_files and _format_kept("pytorch_model.bin"):
+        return True


Require safetensors before skipping supervised download

When a previous cache contains only pytorch_model.bin for a repo that also publishes model.safetensors, this returns True and lets snapshot_download_with_xet_fallback hand the cached snapshot back without running the killable download child. A subsequent default Transformers load probes model.safetensors before falling back to .bin, so it can download the missing preferred file through the normal in-process Hub path and reintroduce the unkillable Xet hang this fallback is meant to avoid. Since the local cache alone cannot prove the safetensors file is absent remotely, a bin-only snapshot should not satisfy the conservative fast path.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-02T21:04:42Z

+            rels.append(rel)
+    except OSError:
+        return False
+    return bool(_filter_paths(rels, None, ignore_patterns))


Require declared Diffusers components before accepting

When snapshot_download returns a stale Diffusers snapshot after a transient Hub error, this accepts the result as soon as any declared component weight is present. For example, if model_index.json declares weight-bearing components like unet and vae but the cache only contains unet/diffusion_pytorch_model.safetensors, _download_result_usable returns true and DiffusionPipeline.from_pretrained can then fetch the missing component weights in-process, potentially through the unkillable Xet path the fallback is trying to avoid. The post-download check should reject partial declared component sets instead of treating one component as proof that the pipeline is warm.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-02T21:04:42Z

+                continue
+            try:
+                if entry.is_file():
+                    weights.append(entry.relative_to(snapshot_dir).as_posix())


Reject noncanonical weights for broad patterned warms

When callers use broad allow patterns such as ['*'] or ['*.safetensors'] to warm a default model, a stale snapshot containing only a selected noncanonical weight (for example adapter_model.safetensors, consolidated.safetensors, or a GGUF file) is appended here and then makes _download_result_usable return true. A subsequent default from_pretrained does not read those files as the base model weight and can fetch the missing canonical model.safetensors/pytorch_model.bin in-process over Xet, so broad patterned requests need the same canonical-weight guard used by the unpatterned path.

Useful? React with 👍 / 👎.

…fetensors is preferred A default transformers load probes model.safetensors BEFORE pytorch_model.bin, but the pre-download fast path treated a cache holding only pytorch_model.bin as complete and handed it back without the killable child. For a repo that ALSO publishes model.safetensors (which the local cache cannot rule out), the in-process load then fetches the preferred model.safetensors over un-killable Xet -- the exact hang this fallback exists to prevent. The strict pre gate (_cache_can_skip_download) now passes prefer_safetensors=True to snapshot_dir_is_complete: a .bin weight satisfies the fast path only when safetensors is IGNORED (use_safetensors=False reads .bin), otherwise a bin-only cache defers to the child, which does the authoritative Hub check. A safetensors cache, a dual-format cache, and a use_safetensors=False bin cache all still fast-path, so the common load is unaffected. The flag is pre-only. The POST shard check (_has_incomplete_canonical_root_shards -> snapshot_dir_is_complete) leaves it False, so a finished bin-only download -- a genuinely bin-only repo -- is still accepted and never false-rejected into a DownloadStallError (POST stays lenient; a bin-only cache is not positive breakage evidence there). Adds a regression test (184 -> 185). Verified against the safety-invariant fuzz (0 false-accepts) and the stall->HTTP e2e sim. Not taken (same review): - Require ALL declared Diffusers components: a static weight-bearing-component heuristic false-rejects a complete download when a component is disabled at the repo level (safety_checker: [null, null]) or optional; POST stays lenient and one declared component weight already separates a real warm from a config-only / undeclared-leftover stale cache. - Reject noncanonical weights for broad patterned warms: a base load passes allow_patterns=None (routed through the canonical-scoped _root_model_has_weight, which this change makes safetensors-aware), never a broad ['*'] / ['*.safetensors'] glob; a broad glob's intent is ambiguous (adapter / subfolder / diffusers), so requiring a canonical root weight would false-reject those legitimate broad-glob downloads.

danielhanchen · 2026-07-03T06:16:43Z

Thanks for the review. Addressed in 26bd3d56.

Fixed (finding 1, real strict-PRE hang): bin-only cache when safetensors is preferred.
A default transformers load probes model.safetensors before pytorch_model.bin, but the pre-download fast path treated a cache holding only pytorch_model.bin as complete and loaded in-process. For a repo that also publishes model.safetensors (which the local cache cannot rule out), that in-process load then fetches the preferred safetensors over un-killable Xet, exactly the hang this fallback exists to prevent. The strict pre gate (_cache_can_skip_download) now passes prefer_safetensors=True to snapshot_dir_is_complete: a .bin weight satisfies the fast path only when safetensors is ignored (use_safetensors=False), otherwise a bin-only cache defers to the killable child, which does the authoritative Hub check. A safetensors cache, a dual-format cache, and a use_safetensors=False bin cache all still fast-path, so the common load is unaffected. The flag is pre-only: the POST shard check leaves it False, so a genuinely bin-only repo's finished download is still accepted and never false-rejected into a DownloadStallError. Added a regression test (185 pass); safety-invariant fuzz stays at 0 false-accepts and the stall to HTTP e2e sim stays green.

Not taken (finding 2): require all declared Diffusers components.
A static weight-bearing-component heuristic false-rejects a genuinely complete download: a component can be disabled at the repo level (safety_checker: [null, null] ships no weights) or be optional, and requiring its weights fails a download that correctly omitted them. The POST path stays lenient by design, and one declared component weight already separates a real warm from a config-only or undeclared-leftover stale cache.

Not taken (finding 3): reject noncanonical weights for broad patterned warms.
A base load passes allow_patterns=None (routed through the canonical-scoped _root_model_has_weight, which this change makes safetensors-aware), never a broad ['*'] / ['*.safetensors'] glob. A broad glob's intent is ambiguous (adapter / subfolder / diffusers), so requiring a canonical root weight there would false-reject those legitimate broad-glob downloads. No loader reaches the gate with a broad glob for a base load.

danielhanchen · 2026-07-03T06:22:11Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26bd3d56f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-03T06:35:08Z

+            rels.append(rel)
+    except OSError:
+        return False
+    return bool(_filter_paths(rels, None, ignore_patterns))


Require missing Diffusers components before accepting cache

When model_index.json declares multiple weight-bearing components, this accepts the snapshot as soon as any declared component weight is present. For example, a stale cache from a prior vae/*-only download with unet also declared but absent passes _download_result_usable, so the later DiffusionPipeline.from_pretrained still has to fetch the missing unet weights in-process over Xet, bypassing the stall fallback this code is meant to enforce. Please verify the declared weight-bearing components needed by the pipeline, not just that one declared component has a weight.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T06:35:08Z

+        # a numbered shard of the read (preferred) format with NO index in its directory: the load cannot
+        # enumerate the set and would fetch the index + remaining shards over Xet.
+        preferred = "safetensors" if "safetensors" in fmts else "bin"
+        if preferred not in index_fmts.get(dir_rel, set()):


Match shard indexes to the selected shard set

This check keys index_fmts only by directory and format, so any safetensors index in the same directory can mask a selected lone shard even when it belongs to a different weight. For example, with allow_patterns=['adapter_model*'], a cache containing adapter_model-00001-of-00002.safetensors but no adapter_model.safetensors.index.json is accepted if an unrelated safetensors index is present, and the subsequent load will fetch the missing adapter index/shards in-process over Xet instead of using the fallback.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T06:35:08Z

+    if not allow or any(not isinstance(p, str) or _has_glob(p) for p in allow):
+        return True


Require exact files even when allow_patterns has globs

When an allow list mixes a glob with exact required files, this returns early and never verifies the exact names. A common adapter warm like allow_patterns=['adapter_model*', 'adapter_config.json'] is accepted with only adapter_model.safetensors on disk, so the later PEFT load has to fetch the missing adapter_config.json in-process instead of going through the watched fallback.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T06:35:08Z

+                continue
+    except OSError:
+        return False
+    return bool(_filter_paths(weights, allow_patterns, ignore_patterns))


Distinguish root model globs from adapter weights

For a root-level glob such as allow_patterns=['*.safetensors'], this treats any selected loadable weight as satisfying the default model load. If a stale cache contains only adapter_model.safetensors, _download_result_usable returns true even though from_pretrained will still probe model.safetensors / pytorch_model.bin and fetch the missing base weight in-process over Xet; root-model glob requests need to require a canonical root weight rather than any selected weight.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T06:35:08Z

+                # No psutil / /proc: fall back to following only newly-created (post-baseline) partials.
+                owned = {name: n for name, n in sizes.items() if name not in baseline}
+            return (sum(owned.values()), len(owned) > 0)
+        return get_hf_download_state(repo_ids, repo_type = repo_type, cache_dir = cache_dir)


Scope snapshot stall progress to the child process

Snapshot attempts fall back to the repo-wide cache measurement here, so a concurrent same-repo download that is still writing any other blob keeps changing current_size and resets the stalled snapshot child's timer. In that environment the child with its own unchanged .incomplete can hang indefinitely without switching to HTTP until the sibling stops; the snapshot watchdog should use the same child-owned partial scoping as single-file downloads, or otherwise exclude sibling progress.

Useful? React with 👍 / 👎.

danielhanchen · 2026-07-03T07:19:26Z

Thanks for the pass. Checked all five against the actual loader allow-lists (maybe_prefetch_hf_snapshot) and real HF layouts; none holds up, so no change on 26bd3d56. Details, with the empirical result each rests on:

1. Require missing Diffusers components (POST) - not taken.
model_index.json declares weightless components right alongside weight-bearing ones. A standard StableDiffusionPipeline declares unet/vae/text_encoder (weights) and scheduler/tokenizer/feature_extractor (configs only) and safety_checker: [null, null] (disabled). Requiring a weight per declared component false-rejects that complete pipeline on all four config-only components, and there is no static way to tell a required weight-bearing component from a declared-but-weightless one without instantiating the pipeline class. POST is the lenient silent-stale backstop (the child's snapshot_download already did the authoritative manifest compare); one declared component weight already separates a real warm from a config-only / undeclared-leftover stale cache. The vae-only-vs-unet example is two different downloads: a vae/*-scoped request accepting a vae-only result is correct, and a later full-pipeline load runs its own pre-download gate.

2. Match shard indexes to the selected shard set - not taken (not reachable).
The only same-directory index in a real layout is the root model index (model.safetensors.index.json / pytorch_model.bin.index.json), and it is already excluded from index_fmts (the _ROOT_MODEL_SHARD_INDEX_RE skip), so a lone adapter_model-00001-of-00002.safetensors without its index is flagged incomplete. Reproduced: with a root model index co-resident, _selected_shard_index_incomplete returns True. Masking only occurs if a differently-based non-root index (foobar.safetensors.index.json) sits in the same dir as the lone shard, which no real repo ships. An adapter/component dir holds exactly one weight base, so there is no second index to mask it.

3. Require exact files even when allow_patterns has globs - not taken.
The loader allow-list is a best-effort superset, not a manifest: the adapter warm is ["adapter_config.json", "adapter_model*", ...ROOT_AUX...], where ROOT_AUX lists ~25 optional aux names (generation_config.json, added_tokens.json, merges.txt, spiece.model, vocab.txt, ...) most repos never ship. Reproduced: requiring the exact names present false-rejects a complete adapter cache on 11 absent optional files, turning a good download into a DownloadStallError. That is why the exact-file check stays lenient once a glob is present. The shared helper is generic (also used by Studio) and cannot encode "adapter_config.json is mandatory but generation_config.json is optional"; the weight the load reads (the real stall risk) is already verified by _selected_readable_weight_complete.

4. Distinguish root model globs from adapter weights - not taken (not reachable).
No loader emits a bare ['*.safetensors'] for a base-model load. A bare from_pretrained passes allow_patterns=None (routed through the canonical-scoped root-weight check); only gguf / tokenizer / adapter / subfolder warms are patterned, and each is scoped (adapter_model*, <subfolder>/*, the gguf path). A broad root glob whose intent is "the base model" is never generated, so requiring a canonical root weight there would only false-reject legitimate broad-glob warms without protecting any real load.

5. Scope snapshot stall progress to the child process - not taken.
This is the documented dual-mode choice: a snapshot pull is every blob of the repo, so the repo-wide measurement tracks that one pull's progress; single-file downloads scope to the child's own partial precisely because a repo can hold unrelated siblings. The failure described needs two concurrent snapshot pulls of the same repo in separate processes, each running this fallback - uncommon, and HF's per-repo cache locks largely serialize same-repo pulls. Switching snapshots to child-pid fd scoping risks the opposite and worse failure: if any in-progress blob is briefly not among the child's open fds, a genuinely-progressing snapshot reads as stalled and a working snapshot_download gets killed and retried over HTTP. The repo-wide measure reliably observes progress across hf_xet's internal workers, which is the common single-process case the watchdog must never misfire on.

danielhanchen · 2026-07-03T07:44:21Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26bd3d56f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-03T08:00:51Z

+    rels: list = []
+    try:
+        for entry in snapshot_dir.rglob("*"):
+            if not _is_default_load_weight_file(entry.name):


Exclude variant component weights from plain diffusers checks

When a previous variant='fp16' diffusers download left only files such as unet/diffusion_pytorch_model.fp16.safetensors, a later plain warmup with variant=None accepts that snapshot because this check only verifies the suffix. _download_result_usable can then return true even though the plain pipeline load needs the non-variant component weight, so the subsequent in-process load may fetch the missing diffusion_pytorch_model.safetensors through Xet instead of retrying over HTTP.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T08:00:51Z

+            if not _is_loadable_weight_file(entry.name):
+                continue
+            try:
+                if entry.is_file():
+                    weights.append(entry.relative_to(snapshot_dir).as_posix())


Ignore training checkpoints for patterned model warms

When a patterned model request is broad enough to include training checkpoints, such as allow_patterns=['*'] or ['*.bin'], this loop treats checkpoint-500/pytorch_model.bin as the selected readable model weight. Those checkpoint subtrees are not read by the subsequent base from_pretrained, so a stale checkpoint-only snapshot can pass _download_result_usable and then fetch the missing root weights in-process via Xet instead of retrying over HTTP.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T08:00:51Z

+            base = index_name.split(marker, 1)[0]
+            if not base:
+                return None
+            prefix = f"{dir_rel}/" if dir_rel else ""
+            return f"{prefix}{base}-00001-of-00002.{ext}"


Preserve variant tokens in malformed-index probes

For variant shard indices like diffusion_pytorch_model.safetensors.index.fp16.json, this builds a probe without the fp16 token. With an allow pattern such as unet/*fp16*, a truncated or malformed variant index is treated as out of scope; if one matching shard is present, the snapshot can be accepted even though the loader will read the broken index or missing shards in-process rather than retrying over HTTP.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T08:00:51Z

+    if _patterns_are_exact_names(allow_patterns):
+        return False  # an exact-named subset defers to the exact-file presence check


Validate exact sharded weight requests

When allow_patterns is an exact list, this returns before checking shard indices. An exact request that includes a sharded weight index plus only some of the listed shards can therefore pass _download_result_usable as soon as the named files exist, even though the loader will read the index and fetch the missing shards in-process through Xet instead of retrying the incomplete snapshot over HTTP.

Useful? React with 👍 / 👎.

…oad check _has_diffusers_component_weight counted ANY loadable weight in a declared component subfolder, including a variant one (unet/diffusion_pytorch_model.fp16.safetensors). So a PLAIN diffusers warm (variant=None) whose silently-stale snapshot_download returned a cache left by a prior variant='fp16' pull passed _download_result_usable, and the in-process DiffusionPipeline load -- which reads the NON-variant diffusion_pytorch_model.safetensors -- then fetched the missing plain weight over un-killable Xet, the hang this fallback exists to prevent. The plain component-presence check now counts only CANONICAL (non-variant) component weights via _CANONICAL_COMPONENT_WEIGHT_RE (a base with no intermediate dotted token, single or numbered shard, safetensors/bin). This mirrors the checks that were already variant-aware: the root check (_CANONICAL_ROOT_MODEL_WEIGHT_RE excludes model.fp16.safetensors) and the plain component SHARD regex in _diffusers_component_shards_incomplete. A complete plain pipeline, a pipeline shipping both plain and variant weights, and the variant='fp16' warm of the same cache are all still accepted -- the variant analog (_has_diffusers_component_variant_weight) is untouched, so no complete download is false-rejected into a DownloadStallError. Adds a regression test (185 -> 186). Safety-invariant fuzz stays at 0 false-accepts; stall->HTTP e2e sim stays green. Not taken (same review): - Ignore training checkpoints for patterned model warms (allow=['*'] / ['*.bin']): not loader-reachable -- a base load passes allow_patterns=None (root-scoped, checkpoint subtrees already excluded), never a broad ['*'] / ['*.bin']; the patterned warms are gguf / adapter / subfolder-scoped. - Preserve variant tokens in malformed-index probes (allow=['unet/*fp16*']): not loader-reachable -- a variant diffusers warm passes allow=None or the subfolder ['unet/*'] glob, both of which already flag a malformed variant component index (the probe survives an allow=None or 'unet/*' filter); no loader emits a variant-infix glob. - Validate exact sharded weight requests: not loader-reachable -- every loader warm carries a glob (*.py, ...), so _patterns_are_exact_names is never true for a loader; a direct caller that lists an index but only some of its shards is honoring exactly the files it named.

danielhanchen · 2026-07-03T08:10:47Z

Thanks for the pass. Checked all four; one was real and is fixed in 21fcb558, the other three are not reachable from any loader.

Fixed (finding 1): variant component weights leaking into the plain Diffusers check.
_has_diffusers_component_weight counted any loadable weight in a declared component subfolder, including a variant one (unet/diffusion_pytorch_model.fp16.safetensors). So a plain warm (variant=None) whose silently-stale snapshot_download returned a cache left by a prior variant='fp16' pull passed _download_result_usable, and the in-process DiffusionPipeline load (which reads the non-variant diffusion_pytorch_model.safetensors) then fetched the missing plain weight over un-killable Xet. The plain presence check now counts only canonical (non-variant) component weights via _CANONICAL_COMPONENT_WEIGHT_RE, mirroring the checks that were already variant-aware: the root check (_CANONICAL_ROOT_MODEL_WEIGHT_RE excludes model.fp16.safetensors) and the plain component shard regex. Reproduced: the variant-only cache is now rejected for a plain load, while a complete plain pipeline, a pipeline shipping both plain and variant, and the variant='fp16' warm of the same cache all stay accepted (no false-reject). Regression test added (186 pass); safety-invariant fuzz stays at 0 false-accepts.

Not taken (finding 2): ignore training checkpoints for patterned model warms.
Reachability requires a base-model request with allow_patterns=['*'] or ['*.bin']. No loader emits that: a bare from_pretrained passes allow_patterns=None (routed through the root-scoped canonical check, which already excludes checkpoint-N/ subtrees), and the only patterned warms are gguf / adapter / subfolder-scoped. A broad glob that pulls checkpoint weights for a base load is never generated, so _has_selected_weight counting a checkpoint weight is not on any real load path.

Not taken (finding 3): preserve variant tokens in malformed-index probes.
The miss needs allow_patterns=['unet/*fp16*'] (a variant-infix glob). No loader emits one: a variant Diffusers warm passes allow_patterns=None or the subfolder glob ['unet/*'], and both already flag a malformed variant component index. Reproduced: with a malformed unet/diffusion_pytorch_model.safetensors.index.fp16.json, the allow=None path (_diffusers_component_shards_incomplete) filters the probe against None and the ['unet/*'] path matches the variant-stripped probe against unet/*, so both return incomplete. Only a hand-crafted *fp16* allow slips through, which no load path constructs.

Not taken (finding 4): validate exact sharded weight requests.
The exact fast-path (_patterns_are_exact_names -> skip the shard check) is only hit by a pure exact allow list. Every loader warm carries a glob (*.py, additional_chat_templates/*.jinja, adapter_model*, ...), so _patterns_are_exact_names is never true for a loader-generated request. A direct API caller that explicitly lists a shard index plus only some of its shards is requesting exactly those files -- snapshot_download fetches only what is named, and honoring that exact scope is correct rather than a bug.

danielhanchen · 2026-07-03T08:16:17Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21fcb5580d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-03T08:30:06Z

+                repo_id, probe_filename, repo_type = repo_type, revision = revision, cache_dir = cache_dir
+            )
+            if isinstance(cached, str) and os.path.exists(cached):
+                return cached


Refresh mutable file revisions before returning cache

When revision is omitted or is a branch/tag, try_to_load_from_cache only resolves the local cached ref and does not perform the HEAD/etag check that hf_hub_download normally does. Returning here means a previously cached main file is reused forever after the Hub repo moves, unless the caller knows to pass force_download=True, so users can silently load stale weights/configs.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T08:30:06Z

+                ignore_patterns = ignore_patterns,
+                variant = variant,
+            ):
+                return cached_dir


Refresh mutable snapshots before taking the fast path

For the common revision=None/main case, this local_files_only=True probe validates only the cached snapshot currently recorded in refs/main; it never checks whether the remote branch now points at a newer commit as snapshot_download normally would. A complete old snapshot therefore bypasses the child download and users keep loading stale model contents until they force a download.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T08:30:06Z

+        logical = _exact_weight_logical(base)
+        groups.setdefault((parent, logical), []).append(rel)
+    return all(
+        any(candidate in present for candidate in candidates) for candidates in groups.values()


Allow unmatched exact snapshot filters

When allow_patterns is an exact list, Hugging Face treats those names as filters, not required files: an unmatched pattern is fine if the repo simply does not contain that optional file. This check instead requires every exact non-weight-equivalent name to be present after a successful child download, so a request such as ['tokenizer.json', 'tokenizer.model'] against a repo that only has one tokenizer file is retried over HTTP and then fails with DownloadStallError despite the snapshot being valid.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T08:30:06Z

+    # spawning that child, never for a training/inference process. The
+    # unconditional HF cache redirect above still runs, so the child writes to the
+    # same cache as the parent.
+    _SKIP_GPU_INIT = os.environ.get("UNSLOTH_ZOO_DISABLE_GPU_INIT", "0") == "1"


Make shared fallback importable without Unsloth

When Studio or another lightweight caller imports unsloth_zoo.hf_xet_fallback in a normal process, this opt-in gate leaves _SKIP_GPU_INIT false unless the caller already knew to set UNSLOTH_ZOO_DISABLE_GPU_INIT=1, so package import continues into the existing torch/unsloth requirements and can raise before the new shared fallback module is available. Since the new module is intended to be imported outside the training process, it needs a default import-light path for that context rather than only for spawned children.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T08:30:06Z

+                continue
+    except OSError:
+        return False
+    return bool(_filter_paths(weights, allow_patterns, ignore_patterns))


Require base weights for broad patterned model warms

When a model snapshot is checked with a broad allow_patterns such as ['*.safetensors'], this accepts any selected loadable weight as proof the model warm is usable. A stale cache containing only adapter_model.safetensors therefore passes _download_result_usable even though a default base-model load reads model.safetensors/pytorch_model.bin and will fetch the missing base weight in-process over Xet after the protective child returns.

Useful? React with 👍 / 👎.

danielhanchen · 2026-07-03T08:56:48Z

Thanks. Checked all five against the loader paths and the module's actual consumers; none needs a change on 21fcb558.

Not taken (1 & 2): refresh mutable file/snapshot revisions before the fast path.
This is the intentional offline-capable warm short-circuit, and it is skipped under force_download=True (the escape hatch, alongside pinning a revision/SHA). The fast path is a pre-warm, not the loader: after it returns, the normal from_pretrained still runs and governs freshness exactly as it does today, so a moved main is handled by the load's own resolution, not silently pinned by the prefetch. The whole point of the short-circuit is to avoid spawning the killable child (and to stay offline-capable) when the cached ref is already complete; forcing a HEAD/etag check on every warm cache would defeat both and reintroduce the per-load Xet exposure this PR exists to bound. Same behavior HF itself gives a warm cache.

Not taken (3): allow unmatched exact snapshot filters.
Reproduced: this only bites a pure exact list such as ['tokenizer.json', 'tokenizer.model'], and no loader emits one. Every prefetch allow-list carries a glob (*.py, additional_chat_templates/*.jinja, adapter_model*, ...), so _requested_exact_files_present_grouped bails (returns accept) before requiring any name; the tokenizer-only warm with a repo shipping just tokenizer.json is accepted, not retried. The other real consumer (Studio) uses its own cache-state fork and does not call this. A direct caller passing a pure-alternative exact list is the only way to reach the strict branch, and none exists today; if one ever does, the fix belongs there rather than loosening the guard that catches a stale cache genuinely missing a named file.

Not taken (4): make the shared fallback importable without Unsloth.
The import-light path is deliberately opt-in (UNSLOTH_ZOO_DISABLE_GPU_INIT=1, off by default) so a normal training/inference import unsloth_zoo is byte-for-byte unchanged and still runs the required torch/allocator/device init. The download child sets that flag (that is exactly its purpose); Studio imports its own fork, not this module. Defaulting to the light path would regress the primary consumer (the training process) by skipping init it needs. There is no lightweight caller that both imports this module and cannot set the flag, so the current default is correct.

Not taken (5): require base weights for broad patterned model warms.
Not loader-reachable: a base-model load passes allow_patterns=None (routed through the canonical root-weight check), never a broad ['*.safetensors']. The patterned warms are gguf / adapter / subfolder-scoped, none of which is a base-model glob, so _has_selected_weight is never asked to prove a base model from a broad glob. This is the same case raised (and declined) in the prior two reviews.

…atchdog, and honor exact variant shard requests Three fixes from a consolidated 15-reviewer pass (12x reviewer.py personas + 3 forked reviewers). 1. from_tf / from_flax POST false-reject (HIGH, loader-reachable). A base load with from_tf=True / from_flax=True prefetches with ignore=['*.safetensors','*.bin',...] and reads tf_model.h5 / flax_model.msgpack. _root_model_has_weight recognized a root weight only via _CANONICAL_ROOT_MODEL_WEIGHT_RE (safetensors/bin), so a COMPLETE h5/msgpack download read as "no weight" -> POST rejected it -> HTTP retry -> same h5-only result -> DownloadStallError on a working load. It now also counts a canonical TF/Flax root weight (_CANONICAL_ROOT_TF_FLAX_WEIGHT_RE), gated on _pytorch_root_weight_formats_ignored (BOTH model.safetensors and pytorch_model.bin dropped by the ignore filter -- the from_tf/from_flax signature). A normal load keeps a PyTorch format so the gate is False and behavior is unchanged; a stray leftover h5 under a normal load still does not count. 2. Single-file watchdog resumed-partial hang without pid ownership (Med, Windows/macOS without psutil). When a child_pid is supplied but its open files cannot be inspected (no psutil AND no /proc), _measure followed only post-baseline partials; a RESUMED partial reuses a baseline blob-hash name, so a frozen Xet resume was excluded forever and never tripped the stall -- the exact hang the fallback exists to prevent. That branch now falls back to the repo-wide measure (as snapshots use), so a resumed partial is watched. The child_pid=None path (no pid at all) keeps the post-baseline name filter unchanged; the precise open-fds path (psutil / /proc) is untouched. 3. Exact-named variant shard requests false-rejected (Low). The plain branch skipped shard-set completeness for an exact-name allow list, but the variant branch did not, so allow=['model.fp16-00001-of-00002.safetensors'] + variant='fp16' -- a satisfied exact request -- was rejected as an incomplete shard set. The _patterns_are_exact_names escape now sits above both branches, so an exact-named subset (variant or plain) defers to the exact-file presence check. Adds 3 regression tests (186 -> 189). Safety-invariant fuzz stays at 0 false-accepts (PRE stays conservative -- these are all POST leniency / watchdog fixes); stall->HTTP e2e sim stays green. Not taken (same review pass): - Injected prepare_for_http_fn missing owned_incomplete_blobs (flagged by several personas): by design and documented at the call site -- an injected (Studio) hook owns its own marker-based cache accounting and keeps the plain (repo_type, repo_id) signature; the default hook gets the owned set. - Missing HF kwargs (local_dir / max_workers / etag_timeout): the helper is a cache-based prefetch wrapper, not a full snapshot_download drop-in; local_dir bypasses the cache the completeness gate reasons about. Current callers pass only what the prefetch needs. - token dropped on the in-process cache probes: both are local_files_only=True (cache-only, no network), so no auth is needed there. - License metadata (LGPL vs AGPL): out of scope per maintainer. - Snapshot repo-wide stall masked by a concurrent same-repo pull in a separate process: documented tradeoff; per-pid snapshot scoping would risk false-stalling a progressing snapshot.

…t-checkpoint shard sets, harden the process-group kill, and drop dead code Four fixes from a second consolidated review pass (12x reviewer.py personas + 3 forked reviewers) on 930d027. 1. Sharded TF/Flax POST false-accept -> hang (from the prior from_tf/from_flax fix). TF/Flax weights can be SHARDED (transformers TF2_WEIGHTS_INDEX_NAME=tf_model.h5.index.json, FLAX_WEIGHTS_INDEX_NAME=flax_model.msgpack.index.json). _CANONICAL_ROOT_TF_FLAX_WEIGHT_RE matched the shard form, so a LONE shard read as a present weight and _download_result_usable accepted an INCOMPLETE sharded set -> the in-process load then fetched the missing shard over un-killable Xet. The regex is now single-file only (tf_model.h5 / flax_model.msgpack); a sharded set is judged through its index via _weight_shard_index_complete (every listed shard present). Complete sharded and single-file downloads are accepted; an incomplete shard set or a lone shard without its index is retried over HTTP. 2. Explicit-checkpoint shard set accepted incomplete -> hang. _selected_shard_index_incomplete skipped EVERY training-checkpoint subtree, but subfolder=checkpoint-N (allow=['checkpoint-N/*']) is an explicit load of that checkpoint. A lone checkpoint-7/model-00001-of-00002.safetensors with no index was accepted, then the load fetched the missing index/shards over Xet. The skip is now conditional (_request_scopes_into_dir): a checkpoint the request explicitly targets is completeness-checked, while a leftover checkpoint a base/adapter/other-subfolder warm does not read is still skipped (no false-reject). 3. _terminate_process_group could signal the WRONG process group. The SIGTERM path called os.killpg(pid) whenever pid was non-null; before the child runs os.setsid() its pid is not yet a pgid, and a recycled pid could collide with an unrelated group. It now signals the group only when os.getpgid(pid) == pid (the child is confirmed its own leader), else falls back to a single-process signal -- matching the caution the post-reap SIGKILL branch already documents. 4. Removed dead _has_any_weight (defined, never called; the readable-weight checks supersede it) and the two docstrings that named it. Adds 2 regression tests (189 -> 191). Safety-invariant fuzz stays at 0 false-accepts; stall->HTTP e2e sim (which exercises the kill path) stays green. Not taken (same review pass): - Require ALL declared Diffusers components in POST: a static heuristic false-rejects a complete pipeline that declares weightless components (scheduler/tokenizer/feature_extractor, safety_checker:[null,null]); POST stays lenient and one declared component weight already separates a real warm from a config-only stale cache. - Injected prepare_for_http_fn missing owned_incomplete_blobs (5 personas): by design and documented at the call site -- an injected (Studio) hook owns its own marker-based cache accounting and keeps the plain (repo_type, repo_id) signature. - Transient-error / crashed-child HTTP retry uses the coarse mtime guard rather than an owned-partial set: the stall path captures owned partials from the child's open fds, but a crashed/errored child has no fds to inspect, so the coarse guard (spares actively-progressing siblings) is the only option there -- inherent, not an asymmetry that can be closed. - Broad ['*.safetensors'] base-model warm accepts an adapter-only cache: not loader-reachable (a base load passes allow_patterns=None, never a broad glob). - Missing HF kwargs (local_dir / max_workers / etag_timeout): the helper is a cache-based prefetch wrapper, not a full snapshot_download drop-in; local_dir bypasses the cache the completeness gate reasons about. Current callers pass only what the prefetch needs.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2089bfd95

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-03T10:37:39Z

+    if want_variant is None:
+        shard_file_re = re.compile(r"^[^.]+-\d{5}-of-\d{5}\.(?:safetensors|bin)$")
+    else:
+        v = re.escape(want_variant)
+        shard_file_re = re.compile(rf"^[^.]+\.{v}-\d{{5}}-of-\d{{5}}\.(?:safetensors|bin)$")


Check TF/Flax shard sets for patterned warms

For patterned from_tf/from_flax warms such as allow_patterns=['*.h5'] or ['*.msgpack'], this helper only recognizes safetensors/bin shard files and indices. A stale snapshot containing only tf_model-00001-of-00002.h5 (or an *.h5.index.json missing a shard) therefore passes _download_result_usable, and the later from_pretrained(..., from_tf=True) / from_flax=True load still has to fetch the index or remaining shard in-process over Xet instead of taking the supervised HTTP retry path. Please include the TF/Flax shard/index names in the selected-shard completeness check, not just the unpatterned root check.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T10:37:39Z

+                if index_path.is_file() and _weight_shard_index_complete(index_path):
+                    return True


Honor ignores for TF/Flax shard indices

When the request excludes one of the TF/Flax formats, the single-file path above applies ignore_patterns, but this sharded-index path accepts any complete TF/Flax index regardless of whether that format was ignored. For example, a from_tf warm that ignores *.msgpack can be satisfied by a stale complete flax_model.msgpack.index.json while tf_model.h5 is absent, so the later from_pretrained(..., from_tf=True) fetches the missing TF weights in-process over Xet instead of using the supervised retry. Apply the same format filter to these index checks before accepting them.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T10:37:39Z

+    "adapter_model.safetensors",
+    "adapter_model.bin",
+    "consolidated.safetensors",
+    "consolidated.00.pth",
+    "diffusion_pytorch_model.safetensors",


Classify dashed adapter/component shard globs as weights

The probe set omits dashed adapter and component shard names, so filters like allow_patterns=['adapter_model-*', 'adapter_model.safetensors.index.json'] or ['unet/diffusion_pytorch_model-*', ...] are classified as weightless. In those patterned shard warms, _download_result_usable skips the readable-weight and shard-completeness checks entirely, so an index-only or config-only stale snapshot can be accepted and the later PEFT/Diffusers load fetches the missing shards in-process over Xet. Add representative dashed adapter/component shard probes so these requests stay weight-bearing.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T10:37:39Z

+    if not is_diffusers and _request_selects_canonical_root_shards(
+        allow_patterns, ignore_patterns
+    ) and _has_incomplete_canonical_root_shards(snapshot_dir, ignore_patterns = ignore_patterns):


Scope root shard rejects to the allowed format

For patterned root warms, this only checks that the request selects some canonical root shard and then runs the root-shard completeness check without allow_patterns. With allow_patterns=['pytorch_model*'], a successful bin-only snapshot containing pytorch_model.bin is rejected if an unrelated stale model.safetensors.index.json is missing shards, even though that safetensors set is outside the filter and a bin load will not read it; after the HTTP retry this becomes a DownloadStallError for a valid filtered snapshot. Thread the allow scope into the root shard check or choose the selected format before failing.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T10:37:39Z

+_CANONICAL_COMPONENT_WEIGHT_RE = re.compile(
+    r"^[^.]+(?:-\d{5}-of-\d{5})?\.(?:safetensors|bin)$"


Reject noncanonical weights in Diffusers components

This regex accepts any no-dot safetensors/bin basename inside a declared Diffusers component, including files such as unet/adapter_model.safetensors. A stale cache with model_index.json declaring unet and only that adapter sidecar present passes _download_result_usable, but DiffusionPipeline will still look for the component's canonical weight such as diffusion_pytorch_model.safetensors/model.safetensors and can fetch it in-process over Xet. Restrict declared component evidence to the weight basenames Diffusers/Transformers components actually load.

Useful? React with 👍 / 👎.

…n CI Two fixes from a third consolidated review pass (12x reviewer.py personas + 3 forked reviewers) on b2089bf. 1. Nested checkpoint subfolder POST false-accept -> hang. _request_scopes_into_dir matched the checkpoint dir only against the FIRST path segment, so a nested subfolder=foo/checkpoint-7 (allow=['foo/checkpoint-7/*']) did not register as an explicit checkpoint load: the checkpoint subtree was skipped, a lone incomplete shard there was accepted, and the in-process load then fetched the missing index/shards over un-killable Xet -- the same hang the flat-checkpoint fix closed, one directory deeper. The scope check now matches the target dir against ALL literal leading segments (up to the first glob), so an explicitly-loaded checkpoint at any depth is completeness-checked while a leftover checkpoint the request does not target is still skipped. 2. test_hf_xet_fallback.py now runs in CI. The 191-test fallback suite was only reached by `pytest --collect-only` (continue-on-error), so it could regress without failing CI. It is now in the behavioral HARD GATE alongside the other CPU-pure gated files, so a future change that breaks the watchdog / cache-completeness / Xet->HTTP fallback behavior blocks the merge. Adds a nested-checkpoint regression assertion. Full suite 191 passed; safety-invariant fuzz stays at 0 false-accepts. Known low-reachability edge (documented, not fixed): a from_tf/from_flax load with a non-root subfolder AND a SHARDED TF/Flax weight AND a silent-stale transient can POST-accept an incomplete shard set -- the sharded TF/Flax completeness check landed on the allow=None root path, and the patterned-subfolder path's index detection is safetensors/bin-only. The conjunction (from_tf is already niche + subfolder + sharding + a transient-error stale return) is essentially unreachable, and PRE still defers safely; extending the generic patterned-shard machinery to h5/msgpack is not worth its surface for that edge. Not taken (same review pass): - Retryable-error / crashed-child HTTP retry uses the coarse mtime guard, not an owned-partial set: the stall path captures owned partials from the LIVE child's open fds, but a child that returned a retryable error / crashed has already exited, so there are no fds to inspect -- the coarse guard (which spares an actively-progressing sibling by mtime) is the only option there. Inherent, not a closable asymmetry. - Injected prepare_for_http_fn missing owned_incomplete_blobs: by design and documented at the call site -- a Studio hook owns its own marker-based cache accounting. - Broad ['*.safetensors'] / ['model*','adapter_model*'] accepting one selected weight: not loader-reachable (a base load passes allow=None; no loader requests base+adapter together). - Exact ['config.json','tokenizer.json'] over-requiring optional files: not loader-reachable (every loader allow-list carries a glob, so the exact-file check is never the deciding gate). - import unsloth_zoo.hf_xet_fallback not import-light by default: intentional opt-in (UNSLOTH_ZOO_DISABLE_GPU_INIT=1); the download child sets it, Studio imports its own fork.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb80cf9a16

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-03T11:17:52Z

+    if base.endswith(_WEIGHT_FILE_SUFFIXES):
+        return True


Treat trainer state files as weightless filters

When an exact allow_patterns entry names a trainer artifact such as optimizer.pt, training_args.bin, or rng_state_0.pth, this branch marks the request as weight-bearing solely because the basename has a weight-like suffix. The later readable-weight check excludes those same artifacts via _is_loadable_weight_file, so a successful filtered snapshot containing the requested state file is classified as incomplete and retried/failed with DownloadStallError. Apply the same non-weight basename/prefix exclusion here before returning true for exact suffix matches.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T11:17:52Z

+    "model.fp16.safetensors",
+    "model-00001-of-00002.safetensors",
+    "pytorch_model.bin",
+    "pytorch_model-00001-of-00002.bin",


Classify pytorch variant globs as weights

For variant-specific bin filters such as allow_patterns=['pytorch_model.fp16*'], this probe set has no representative pytorch_model.fp16.bin / sharded variant name, so _pattern_can_select_weight classifies the request as weightless. A stale config-only snapshot is then accepted even with variant='fp16', and the later from_pretrained(..., variant='fp16', use_safetensors=False) load can fetch the missing pytorch_model.fp16.bin in-process over Xet; add PyTorch-bin variant probes alongside the safetensors variant probe.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T11:17:52Z

+                        _active_incomplete_blob_sizes(repo_type, repo_id, params.get("cache_dir"))
+                    )
+                    owned = current - baseline_partials
+                params["_owned_incomplete_blobs"] = owned


Don't scope HTTP cleanup to an empty owned set

When a snapshot watchdog fires before the child has opened any .incomplete file, such as a stale dangling symlink from an older snapshot making has_active_incomplete_blobs true during a slow metadata call, _child_open_incomplete_blobs returns an empty set and this stores it as the owned cleanup scope. _default_prepare_for_http treats that empty set as authoritative and skips every stale blob/link, leaving the same broken symlink to retrip the HTTP attempt; use None for the no-owned-partials case or avoid child-owned scoping for snapshot-wide stale state.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T11:17:52Z

+        for shard_rels in by_fmt.get("safetensors") or by_fmt.get("bin") or []:
+            for shard in shard_rels:
+                try:
+                    if not (snapshot_dir / shard).exists():
+                        return True


Let single subfolder weights beat stale indices

When a patterned subfolder warm selects both encoder/model.safetensors and a stale encoder/model.safetensors.index.json, this loop rejects the snapshot as soon as any shard listed by the index is absent. Transformers checks the single safetensors file before the index in that same directory, so this valid warmed subfolder is retried and can end in DownloadStallError; mirror the root-weight precedence here and ignore a selected index when the corresponding single weight of the preferred format is present.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb80cf9a16

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-03T11:23:21Z

+    if base.endswith(_WEIGHT_FILE_SUFFIXES):
+        return True
+    if any(ch in base for ch in _GLOB_CHARS):
+        return any(fnmatch.fnmatchcase(probe, base) for probe in _WEIGHT_PATTERN_PROBES)


Treat directory-prefix globs as weight-bearing

When allow_patterns uses a directory-prefix glob without a slash, such as ['checkpoint-*'] or ['unet*'], Hugging Face's filter selects files below matching directories, but this check only compares the final basename to weight-file probes and returns False. That makes request_can_include_weights treat these weight-bearing snapshot requests as weightless, so a stale config-only subset can be accepted and the later checkpoint/component load may fetch the missing weights in-process instead of through the fallback.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T11:23:21Z

+        target = Path(os.readlink(link))
+        if not target.is_absolute():
+            target = link.parent / target
+        incomplete_partner = target.with_name(target.name + INCOMPLETE_SUFFIX)


Spare unique incomplete partners for live symlinks

In supported newer huggingface_hub versions, the temporary partial for blob <etag> is named like <etag>.<uuid>.incomplete, not exactly <etag>.incomplete. When default HTTP prep runs without an owned-partial set, a concurrent same-repo sibling can have a dangling snapshot symlink to <etag> plus a fresh unique partial, but this exact lookup misses the active partner and unlinks the sibling's live symlink instead of sparing it, leaving that sibling snapshot vulnerable to being returned with the pointer removed.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-03T11:23:21Z

+                    "retrying with HF_HUB_DISABLE_XET=1", label
+                )
+                _safe_status(on_status, f"{label}: download crashed, retrying over HTTP")
+                disable_xet = True


Clear crashed snapshot partials before HTTP retry

When a snapshot child crashes shortly after creating a partial, this retry path flips to HTTP without recording any child-owned partials. The next loop's HTTP prep treats the just-written .incomplete as a fresh active sibling and spares it, so the snapshot watchdog still sees a constant repo-wide incomplete blob and can declare the HTTP retry stalled during metadata or lock wait even though the crashed child's partial should have been removed before retrying.

Useful? React with 👍 / 👎.

The shared Xet->HTTP fallback does not require a version bump; keep the base version.

Condense docstrings and inline comments across hf_xet_fallback.py, hf_cache_state.py, the __init__.py opt-in block, and the test suite: drop restating-the-obvious comments, collapse multi-line blocks, cut small-helper docstrings to one line. Comments/docstrings only, no code changed (AST-verified). Preserves the load-bearing rationale tersely (STRICT pre-download / LENIENT post-download, the un-killable Xet hang reason, and the scoping / format / variant / from_tf / diffusers / checkpoint / redaction notes). Full suite 191 passed; safety fuzz 0 false-accepts.

chatgpt-codex-connector · 2026-07-03T12:28:45Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

Consolidated review pass (12x reviewer.py personas + 3 forked reviewers). 8 of 12 reviewers independently reproduced a post-download false-accept: _has_diffusers_component_weight returned True as soon as ANY declared component held a weight, so a stale partial pipeline (model_index.json declaring unet + vae with only unet present, or vae present with its config but no weight) was accepted -- then the in-process DiffusionPipeline load fetched the missing component over un-killable Xet, the exact hang this fallback prevents. Diffusion is a real loader consumer (diffusion.py warms with allow_patterns=None), so it is reachable. An unpatterned diffusers warm now requires that every DECLARED ACTIVE component (a [library, class] spec with both non-null) is materialised as a non-empty subfolder AND that each model-style component (one carrying config.json) holds a readable weight of the read format. A [null, null] disabled component (e.g. safety_checker) and weightless components (scheduler / tokenizer / feature_extractor, which carry *_config.json, not config.json) are not required, so a complete or safety-checker-disabled pipeline is not false-rejected into a DownloadStallError. A variant load accepts a component's canonical weight as diffusers' per-component fallback, so a mixed variant/canonical pipeline still passes. A malformed / empty model_index.json fails OPEN to the prior lenient any-weight check. Adds a regression over component-absent, config-only, complete-with-null-safety- checker, variant-absent, and mixed-variant cases. Full suite 192 passed; safety- invariant fuzz stays at 0 false-accepts. Not taken (re-verified, no new evidence): - Injected prepare_for_http_fn dropping owned_incomplete_blobs: by design; a Studio hook owns its marker-based cache accounting (documented at the call site). - Retryable / crashed-child HTTP prep using the coarse mtime guard: an exited child has no open fds to scope owned partials; inherent, not closable. - Missing hf_hub kwargs / non-import-light default: a purpose-built prefetch helper (local_dir breaks the cache model) and an intentional opt-in (UNSLOTH_ZOO_DISABLE_GPU_INIT), not a general hf_hub_download replacement. - Patterned TF/Flax sharded-subfolder edge: documented low-reachability; PRE never accepts tf/flax, so it can only ever cause a safe retry.

chatgpt-codex-connector · 2026-07-03T13:40:23Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

Follow-up review pass (12x reviewer.py personas + 3 forks) on the diffusers component-completeness gate. A residual false-accept remained: an active component dir that is non-empty but carries NEITHER a top-level config.json NOR a readable weight NOR a weightless sidecar (only a stray file or a nested training-checkpoint) passed the presence rule and skipped the config-gated weight rule, so a stale partial mid-download component was accepted and the in-process pipeline load then fetched its weight over un-killable Xet. A component now satisfies the warm only if it holds a readable weight of the read format, OR is WEIGHTLESS-shaped (a scheduler / tokenizer / feature_extractor sidecar: a *_config.json / tokenizer file and no config.json). A dir with neither is an incomplete model component and is retried over HTTP. Complete pipelines (including SDXL dual text encoders, sharded components, fp16 mixed / canonical fallback) still accept; the safety-invariant fuzz stays at 0 false-accepts and the suite is 193 passed. Not changed (re-verified this pass): the variant floor requiring at least one variant weight is correct -- diffusers itself raises on a zero-variant repo, so a canonical-only request for a variant is not a load diffusers would complete. The recurring reviewer.py "stale-branch revert" flag is a harness artifact (GitHub reports the PR mergeable / clean).

chatgpt-codex-connector · 2026-07-04T01:27:45Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

gemini-code-assist Bot reviewed Jun 24, 2026

View reviewed changes

danielhanchen mentioned this pull request Jun 24, 2026

Auto Xet to HTTP download fallback in from_pretrained; share Studio's fallback via unsloth_zoo unslothai/unsloth#6638

Open

danielhanchen added 2 commits June 24, 2026 11:56

chatgpt-codex-connector Bot reviewed Jun 24, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 25, 2026

View reviewed changes

danielhanchen added 2 commits June 25, 2026 15:23

Merge remote-tracking branch 'origin/main' into hf-xet-fallback-shared

a04a983

chatgpt-codex-connector Bot reviewed Jun 25, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jul 2, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jul 3, 2026

View reviewed changes

danielhanchen added 2 commits July 3, 2026 09:27

chatgpt-codex-connector Bot reviewed Jul 3, 2026

View reviewed changes

danielhanchen added 2 commits July 3, 2026 11:52

Revert version bump to 2026.6.7

346ae86

The shared Xet->HTTP fallback does not require a version bump; keep the base version.

		if not repo_id or repo_id.startswith(("/", ".", "~")) or "\\" in repo_id:
		continue

	assert xf.get_hf_download_state(["/abs/path", "./rel", "~user", "c:\\x"]) == (0, False)
	assert xf.get_hf_download_state(["/abs/path", "./rel", "~user", "c:\\x", "c:/x"]) == (0, False)


		logger = logging.getLogger(__name__)

		_CTX = mp.get_context("spawn")

		# The generic purge is cache_dir-aware; an injected (Studio) hook owns
		# its own cache accounting and keeps the (repo_type, repo_id) signature.

		out = re.sub(r"hf_[A-Za-z0-9]{8,}", "***", out)
		out = re.sub(r"([Bb]earer\s+)[A-Za-z0-9._\-]+", r"\1***", out)

		if isinstance(cached, str) and os.path.exists(cached):
		return cached



		def repo_cache_dir_name(repo_type: str, repo_id: str) -> str:
		return f"{repo_type}s--{repo_id.replace('/', '--')}"

		if on_status is not None:
		on_status(f"{label}: Xet stalled, retrying over HTTP")

		if "pytorch_model.bin" in root_files and _format_kept("pytorch_model.bin"):
		return True

		if not allow or any(not isinstance(p, str) or _has_glob(p) for p in allow):
		return True

		if _patterns_are_exact_names(allow_patterns):
		return False # an exact-named subset defers to the exact-file presence check

		if index_path.is_file() and _weight_shard_index_complete(index_path):
		return True

		_CANONICAL_COMPONENT_WEIGHT_RE = re.compile(
		r"^[^.]+(?:-\d{5}-of-\d{5})?\.(?:safetensors\|bin)$"

Uh oh!

Conversation

danielhanchen commented Jun 24, 2026

Summary

What this adds

Tests

Uh oh!

chatgpt-codex-connector Bot commented Jun 24, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Jun 24, 2026

Uh oh!

danielhanchen commented Jun 24, 2026

Uh oh!

gemini-code-assist Bot commented Jun 24, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Jun 25, 2026

Uh oh!

danielhanchen commented Jun 25, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment