
fix(core): load ONNX Runtime dynamically so headroom._core imports on non-AVX2 x86-64 #1715

Open
Parideboy wants to merge 1 commit into headroomlabs-ai:main from Parideboy:fix/1278-non-avx2-import

@Parideboy

Contributor

Description

import headroom._core dies with SIGILL (Illegal instruction) on x86-64 CPUs without AVX2 (Pentium N4200, Celeron N4500, AMD FX 8350 — all reported on the issue). The repo sets no RUSTFLAGS/target-cpu anywhere, so first-party Rust code is baseline x86-64; the AVX2 code comes from Microsoft's prebuilt ONNX Runtime, statically linked into the extension by fastembed's ort-download-binaries-rustls-tls feature on non-Windows targets. Because it is statically linked, its code is mapped and initialized when the extension module loads — before the runtime AVX2 guard from #1162 can run, which is why that fix helped Magika init but not the import-time crash.

Fix, mirroring what Windows already does for its own reasons (DirectML link libs): build with ort-load-dynamic on every platform, so ONNX Runtime is only dlopen'd at first use, where the #1162 AVX2 guard falls back to the non-ONNX detection tiers on unsupported CPUs. Since both target blocks became identical, they are collapsed into one platform-independent fastembed dependency.
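
The ordering this fix relies on (guard first, library load second) can be sketched in Python. This is only an illustration under stated assumptions: the real guard from #1162 is not shown in this PR, and `cpu_flags_from_cpuinfo` and `LazyBackend` are hypothetical names, not part of headroom.

```python
from typing import Callable, Optional, Set


def cpu_flags_from_cpuinfo(text: str) -> Set[str]:
    """Parse the first 'flags' line from Linux /proc/cpuinfo-style text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


class LazyBackend:
    """Hypothetical wrapper: load an optional backend only on first use.

    Nothing from the backend is touched at construction (import) time, so a
    CPU guard gets the chance to run before any backend code is loaded.
    """

    def __init__(self, has_avx2: bool, loader: Callable[[], object]) -> None:
        self._has_avx2 = has_avx2
        self._loader = loader
        self._backend: Optional[object] = None

    def get(self) -> Optional[object]:
        # Guard first: on an unsupported CPU, never call the loader.
        if not self._has_avx2:
            return None  # caller falls back to a non-ONNX tier
        if self._backend is None:
            self._backend = self._loader()  # the deferred load happens here
        return self._backend
```

With a statically linked library there is no equivalent of `get()`: the library's code is already mapped by the time the module has imported, which is the failure mode described above.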

To keep Magika/fastembed working out of the box on Linux/macOS, the existing ORT_DYLIB_PATH auto-pin (headroom/_ort.py, previously Windows-only) now resolves the pip onnxruntime package's shared library on all platforms (onnxruntime.dll / libonnxruntime.so* / libonnxruntime*.dylib). The pip onnxruntime CPU wheels use runtime CPU dispatch, so they also work on pre-AVX2 machines — non-AVX2 users get working ML detection instead of a crash. Without the onnxruntime package, ML detection degrades gracefully to the non-ONNX tiers exactly as it already does on Windows.
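
A minimal sketch of the cross-platform lookup described above, for readers who have not opened the diff. It is not the code in headroom/_ort.py: `find_ort_dylib` and `pin_ort_dylib` are hypothetical names, the recursive search is an assumption about where the wheel keeps the library, and respecting an already-set ORT_DYLIB_PATH is assumed behavior. Only the three filename patterns come from this description.

```python
import os
import sys
from pathlib import Path
from typing import MutableMapping, Optional

# Filename patterns taken from the PR description; Linux is the default case.
_PATTERNS = {
    "win32": ("onnxruntime.dll",),
    "darwin": ("libonnxruntime*.dylib",),
}
_DEFAULT_PATTERNS = ("libonnxruntime.so*",)  # also matches versioned .so.X.Y.Z


def find_ort_dylib(package_dir: Path, platform: str = sys.platform) -> Optional[Path]:
    """Return the first matching shared library under package_dir, if any."""
    for pattern in _PATTERNS.get(platform, _DEFAULT_PATTERNS):
        # Assumption: search recursively rather than hard-coding a subfolder.
        matches = sorted(p for p in package_dir.rglob(pattern) if p.is_file())
        if matches:
            return matches[0]
    return None


def pin_ort_dylib(
    package_dir: Path,
    env: MutableMapping[str, str] = os.environ,
    platform: str = sys.platform,
) -> None:
    """Set ORT_DYLIB_PATH to the library found in package_dir, if any."""
    if env.get("ORT_DYLIB_PATH"):
        return  # assumption: an explicit user setting wins
    found = find_ort_dylib(package_dir, platform)
    if found is not None:
        env["ORT_DYLIB_PATH"] = str(found)
    # If nothing is found, leave the environment untouched so callers can
    # degrade to the non-ONNX tiers.
```

In real use, `package_dir` would be the directory of the installed onnxruntime package, for example `Path(onnxruntime.__file__).parent`.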

Fixes #1278

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

  • crates/headroom-core/Cargo.toml: replaced the per-target fastembed blocks (ort-download-binaries-rustls-tls on non-Windows, ort-load-dynamic on Windows) with a single platform-independent dependency on ort-load-dynamic, with a comment documenting both the DirectML and the AVX2/#1278 rationale.
  • Cargo.lock: regenerated — ort-sys drops its static-download dependencies (hmac-sha256, lzma-rust2, ureq); no version bumps.
  • headroom/_ort.py: ORT_DYLIB_PATH auto-pin extended from Windows-only to all platforms via a small _find_dylib helper that resolves the platform's shared-library name inside the pip onnxruntime package.
  • tests/test_transforms/test_ort_dylib.py: replaced the obsolete test_noop_on_non_windows with Linux (versioned .so) and macOS (.dylib) pin tests; module docstring updated.
  • docs/content/docs/configuration.mdx: ORT_DYLIB_PATH row updated from Windows-only wording to the cross-platform behavior.

Testing

  • Unit tests pass (pytest)
  • Linting passes (ruff check .)
  • Type checking passes (mypy headroom)
  • New tests added for new functionality
  • Manual testing performed

Test Output

$ cargo fmt --all -- --check && cargo clippy --workspace -- -D warnings && cargo test -p headroom-core --lib
clean
test result: 844 passed; 0 failed; 1 ignored

$ python -m pytest tests/test_transforms/test_ort_dylib.py -q
8 passed

$ ruff check headroom/_ort.py tests/test_transforms/test_ort_dylib.py
All checks passed!

Real Behavior Proof

  • Environment: Windows 11 (AVX2-capable — the SIGILL itself is not reproducible on this machine), Python 3.13, Rust 1.95.0, local checkout branched from upstream/main (9fbd47b).
  • Exact command / steps: cargo check -p headroom-core after the feature switch; inspected the Cargo.lock diff; rebuilt and ran python -c "import headroom; from headroom._core import detect_content_type; print(detect_content_type('hello world'))"; ran the ort-pin test suite with monkeypatched linux/darwin platforms.
  • Observed result: build succeeds with ort-load-dynamic; the lockfile shows ort-sys no longer pulls the binary-download machinery (hmac-sha256, lzma-rust2, ureq removed), confirming the statically-linked prebuilt ORT is gone; import + content detection works with ORT_DYLIB_PATH auto-pinned to the pip onnxruntime library; all 8 pin tests pass including the new Linux/macOS branches.
  • Not tested: actual pre-AVX2 x86-64 hardware (none available; the fix removes AVX2 code from the import path by construction, and the issue reporters on #1278 can verify); Linux/macOS wheel runtime behavior beyond CI's ubuntu/macOS wheel-build jobs; embedding quality/performance under a pip-provided ORT version differing from the previously vendored one.
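
For anyone verifying on pre-AVX2 hardware, a small check along these lines may help. It is a sketch, not part of this PR, and `describe_exit` and `check_import` are hypothetical helper names. It imports the module in a child process so that a SIGILL kills the child rather than the shell session, and it relies on `subprocess` reporting death by signal as a negative return code on POSIX.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code -N means "killed by signal N".
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def check_import(module: str = "headroom._core") -> str:
    """Import `module` in a child interpreter and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    return describe_exit(proc.returncode)
```

Calling `print(check_import())` inside the environment where the wheel is installed reports the import's outcome. A non-zero positive status usually means an ordinary Python exception such as a missing module, not an illegal instruction.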

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

… non-AVX2 x86-64

fastembed's ort-download-binaries-rustls-tls feature statically links
Microsoft's prebuilt ONNX Runtime into the extension on non-Windows
targets. That prebuilt x86_64 binary requires AVX2 and its code runs as
soon as the module loads, so import headroom._core died with SIGILL on
pre-AVX2 CPUs before the runtime AVX2 guard from headroomlabs-ai#1162 could intervene.

Build with ort-load-dynamic on every platform (Windows already did, for
DirectML link-lib reasons), collapsing the two identical target blocks
into one dependency. ORT is now only dlopen'd at first use, where the
AVX2 guard falls back to the non-ONNX detection tiers.

To keep Magika/fastembed working out of the box, extend the Windows-only
ORT_DYLIB_PATH auto-pin in headroom/_ort.py to all platforms: it now
resolves the pip onnxruntime package's shared library (.dll/.so/.dylib),
whose CPU wheels use runtime dispatch and run on pre-AVX2 machines too.

Fixes headroomlabs-ai#1278

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

PR governance

This PR follows the template and is marked ready for human review.

@github-actions github-actions Bot added the status: ready for review label Jul 2, 2026

@JerrettDavis JerrettDavis left a comment

Collaborator

Code review looks good. The change removes the static ONNX Runtime download path from the Rust extension, keeps the Windows dynamic-load behavior, and extends the import-time ORT_DYLIB_PATH pinning in a way that is covered by Linux/macOS/Windows-oriented tests. I did not find a correctness blocker in the diff.

One process note before merge: the current check rollup is not fully green because several test jobs are marked cancelled rather than successful. I am treating that as CI state, not a requested code change.

@github-actions github-actions Bot added the status: ci failing label and removed the status: ready for review label Jul 3, 2026

Labels

status: ci failing Required or reported CI checks are failing

Development

Successfully merging this pull request may close these issues.

[BUG] SIGILL crash on headroom._core import: prebuilt wheel requires AVX2, incompatible with non-AVX2 x86-64 CPUs

2 participants