Fix CI read-only cache failures by patching `cached_files` in conftest #47043
Open
ydshieh wants to merge 2 commits into
Context
In CPU CI, the shared HF cache is read-only (pre-populated with most models). Tests that call `from_pretrained` with a model that is not yet in the cache, or with one whose files have been updated, hit `OSError: [Errno 30] Read-only file system` and fail with an opaque error.
This is a temporary workaround. We are discussing with the infra team whether a proper writable cache system is feasible. Until then, we want a lightweight fix that handles the common cases without over-engineering.
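For reference, the failure above is an `OSError` whose `errno` is `errno.EROFS` (30 on Linux, which is where the `[Errno 30]` in the message comes from). Checking the `errno` attribute, rather than matching the message text, is one way to recognize exactly this case. The helper name below is hypothetical and only illustrates that check.

```python
import errno


def is_read_only_fs_error(exc: BaseException) -> bool:
    """Hypothetical helper: True only for 'Read-only file system' (EROFS) errors."""
    return isinstance(exc, OSError) and exc.errno == errno.EROFS


# The message seen in CI is the str() of such an error.
erofs = OSError(errno.EROFS, "Read-only file system")
print(erofs)  # [Errno 30] Read-only file system  (EROFS is 30 on Linux)

# A plain permission error carries a different errno and is not matched.
denied = PermissionError(errno.EACCES, "Permission denied")
print(is_read_only_fs_error(erofs), is_read_only_fs_error(denied))  # True False
```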
Why not #46768
#46768 addressed this with a per-test `@use_temp_cache_if_readonly` decorator. We moved away from that approach because it is opt-in, and contributors adding a new test can forget to apply it.
Why patch `cached_files` instead of `from_pretrained`
Patching `from_pretrained` directly would require importing several large classes (`PreTrainedModel`, `PreTrainedTokenizerBase`, `PretrainedConfig`, `ImageProcessingMixin`, etc.) into `conftest.py`, which we want to avoid. `transformers.utils.hub.cached_files` is the common code path that all `from_pretrained` implementations go through for Hub downloads, so a single patch there covers everything without any class imports.
What this does
Wraps `transformers.utils.hub.cached_files` in `conftest.py` with a try/except that catches `EROFS` only. On failure, a session-scoped tmp dir is created (lazily, once) and the call is retried with `HF_HUB_CACHE` and `HF_XET_CACHE` patched to it. All other tests continue to read from the pre-populated read-only cache as before.
This covers the common cases well enough for a temporary fix. Edge cases that don't go through `cached_files` are out of scope for now.