Fix Qwen Omni batched text post-processing by Sunt-ing · Pull Request #47060 · huggingface/transformers

Sunt-ing · 2026-07-04T08:56:43Z

What does this PR do?

Qwen2_5OmniProcessor and Qwen3OmniMoeProcessor decoded generated_outputs[0] for text post-processing. That is correct when generate() returns a tuple-like multimodal result, where the text sequences are the first element. It is wrong for text-only generation, where generate() returns the full (batch, seq) tensor directly, so indexing [0] selects only the first row of the batch.

For a batched pipeline("any-to-any") text call with two prompts, the model generates a two-row tensor, but the processor decodes only the first row, so the pipeline returns one record for a two-item batch.

This PR keeps the generated_outputs[0] path for tuple/list multimodal outputs and decodes the whole tensor for text-only outputs. A regression test covers both Omni processors.
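The dispatch described above can be illustrated with a minimal sketch. This is not the PR's diff: `TokenBatch`, `select_text_sequences`, `decode_batch`, and the toy `vocab` are hypothetical names introduced only for illustration, and a small class stands in for a real (batch, seq) tensor so the example runs without PyTorch.

```python
class TokenBatch:
    """Hypothetical stand-in for a (batch, seq) tensor of token ids."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __getitem__(self, index):
        return self.rows[index]

    def __len__(self):
        return len(self.rows)


def select_text_sequences(generated_outputs):
    """Return the batch of text sequences from either output shape."""
    # Multimodal result: a tuple/list whose first element holds the text sequences.
    if isinstance(generated_outputs, (tuple, list)):
        return generated_outputs[0]
    # Text-only result: the object is already the full (batch, seq) batch.
    return generated_outputs


def decode_batch(sequences, vocab):
    """Toy decoder: one string per row, using a hypothetical id-to-text vocab."""
    return ["".join(vocab[token] for token in sequences[i]) for i in range(len(sequences))]


vocab = {1: "a", 2: "b", 3: "c", 4: "d"}
text_only = TokenBatch([[1, 2], [3, 4]])
multimodal = (text_only, "audio-waveform-placeholder")

# Unconditional [0] on a text-only result yields a single row, not the batch.
old_behaviour = text_only[0]
print(old_behaviour)  # [1, 2]

# Dispatching on the container type keeps both rows in both cases.
print(decode_batch(select_text_sequences(text_only), vocab))   # ['ab', 'cd']
print(decode_batch(select_text_sequences(multimodal), vocab))  # ['ab', 'cd']
```

Checking the container type rather than the number of dimensions keeps the multimodal path untouched, which matches the intent stated above; the actual condition used in the processors may differ.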

End-to-end pipeline repro (real AnyToAnyPipeline + real Qwen3-Omni processor)

Environment:

Hardware: AutoDL c4090, NVIDIA GeForce RTX 4090 present; repro run CPU-only with CUDA_VISIBLE_DEVICES=""
OS: Linux AutoDL container
Python: 3.12
PyTorch: 2.8.0+cu128
Transformers main: b70d02fc724d
This PR head: 651c109d67
Processor metadata: Qwen/Qwen3-Omni-30B-A3B-Instruct via AutoProcessor.from_pretrained
Model: small fake Qwen3OmniMoeForConditionalGeneration; it only replaces 30B weight download and still exercises pipeline("any-to-any") plus the real processor post-processing path
Command: CUDA_VISIBLE_DEVICES="" python repro.py

import torch

from transformers import AutoProcessor, GenerationConfig, pipeline


class Qwen3OmniMoeForConditionalGeneration(torch.nn.Module):
    input_modalities = ("image", "video", "audio", "text")
    output_modalities = ("text", "audio")

    def __init__(self):
        super().__init__()
        self.config = type("Cfg", (), {"_commit_hash": None, "model_type": "qwen3_omni_moe"})()
        self.generation_config = GenerationConfig(max_new_tokens=1)

    @property
    def device(self):
        return torch.device("cpu")

    def can_generate(self):
        return True

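    def generate(self, input_ids=None, **kwargs):
        # Text-only path: return a plain (batch, seq) tensor, not a tuple.
        print("GEN_BATCH", tuple(input_ids.shape))
        # Append one distinct token id per row (42, 43, ...) so each row decodes differently.
        extra = torch.arange(42, 42 + input_ids.shape[0], dtype=input_ids.dtype).unsqueeze(1)
        return torch.cat([input_ids, extra], dim=1)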


processor = AutoProcessor.from_pretrained("Qwen/Qwen3-Omni-30B-A3B-Instruct")
pipe = pipeline("any-to-any", model=Qwen3OmniMoeForConditionalGeneration(), processor=processor)
out = pipe(text=["hello", "world"], return_full_text=False, max_new_tokens=1)
print("PIPE_OUT_LEN", len(out), out)

# main b70d02fc724d
GEN_BATCH (2, 1)
PIPE_OUT_LEN 1 [{'input_text': 'hello', 'generated_text': 'K'}]

# after this PR
GEN_BATCH (2, 1)
PIPE_OUT_LEN 2 [{'input_text': 'hello', 'generated_text': 'K'}, {'input_text': 'world', 'generated_text': 'L'}]
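To make the two outputs easier to read, here is a plain-Python sketch of what the fake generate() in the repro returns for a (2, 1) batch. The prompt ids `7` and `8` are hypothetical, and `new_tokens_only` is only an assumed simplification of the prompt stripping that return_full_text=False implies, not the pipeline's actual code.

```python
def fake_generate(input_ids):
    """Append one new token id per row: 42 for row 0, 43 for row 1, and so on."""
    extra = range(42, 42 + len(input_ids))
    return [row + [token] for row, token in zip(input_ids, extra)]


def new_tokens_only(input_ids, output_ids):
    """Drop the prompt prefix from each row, leaving only the generated tokens."""
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


prompt_ids = [[7], [8]]  # hypothetical ids; shape (2, 1) as in GEN_BATCH
output_ids = fake_generate(prompt_ids)
print(output_ids)  # [[7, 42], [8, 43]]
print(new_tokens_only(prompt_ids, output_ids))  # [[42], [43]]
```

Each row ends in a different token id, so a correct post-processing step should yield two distinct generated texts, one per input, as in the "after this PR" output.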

Regression tests and repository checks also passed.

Regression tests and repository checks

Tests:

python -m pytest -q \
  tests/models/qwen2_5_omni/test_processing_qwen2_5_omni.py::test_qwen2_5_omni_post_process_multimodal_output_keeps_text_batch \
  tests/models/qwen3_omni_moe/test_processing_qwen3_omni_moe.py::test_qwen3_omni_moe_post_process_multimodal_output_keeps_text_batch

2 passed in 0.16s

Also ran:

python -m ruff check src/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py tests/models/qwen2_5_omni/test_processing_qwen2_5_omni.py tests/models/qwen3_omni_moe/test_processing_qwen3_omni_moe.py
python -m ruff format --check src/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py tests/models/qwen2_5_omni/test_processing_qwen2_5_omni.py tests/models/qwen3_omni_moe/test_processing_qwen3_omni_moe.py
python utils/check_modular_conversion.py --files src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py
python utils/check_copies.py --file src/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py
python utils/check_copies.py --file src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py
git diff --check

Code Agent Policy

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline and the Pull Request checks?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes according to the guidelines?
Did you write any new necessary tests?

Who can review?

cc @zucchini-nlp

github-actions · 2026-07-04T08:57:58Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen2_5_omni, qwen3_omni_moe

github-actions · 2026-07-04T09:04:43Z

CI recap

Dashboard: View test results in Grafana
Latest run: 28701194782:2
Result: success | Jobs: 3 | Tests: 140 | Failures: 0 | Duration: 15m 14s

Fix Qwen Omni batched text post-processing

651c109

Sunt-ing wants to merge 1 commit into huggingface:main from Sunt-ing:15
