
Fix Qwen Omni batched text post-processing#47060

Open: Sunt-ing wants to merge 1 commit into huggingface:main from Sunt-ing:15



@Sunt-ing Sunt-ing commented Jul 4, 2026

Contributor


What does this PR do?

Qwen2_5OmniProcessor and Qwen3OmniMoeProcessor decoded generated_outputs[0] for text post-processing. That is correct when generate() returns a tuple-like multimodal result, where the text sequences are the first element. It is wrong for text-only generation, where generate() returns the full (batch, seq) tensor directly.

For a batched text call through pipeline("any-to-any") with two inputs, the model generates a two-row tensor, but the processor decodes only the first row, so the pipeline returns one record instead of two.
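The indexing difference behind this can be sketched without any dependencies. This is only an illustration: nested lists stand in for a (batch, seq) tensor of token IDs, and the token values and the "audio-placeholder" string are made up.

```python
# Nested lists stand in for a (batch=2, seq=2) tensor of token IDs (made-up values).
text_sequences = [[101, 42], [102, 43]]

# Multimodal-style result: the text sequences are the first element of a tuple.
multimodal_result = (text_sequences, "audio-placeholder")
from_multimodal = multimodal_result[0]  # the whole text batch, both rows

# Text-only result: the batch itself is returned directly.
text_only_result = text_sequences
from_text_only = text_only_result[0]  # only row 0; row 1 is silently dropped

print(len(from_multimodal))  # 2
print(from_text_only)        # [101, 42]
```

The same `[0]` that unwraps a tuple selects a single row when it is applied to the batch itself, which matches the one-record symptom described above.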

This PR keeps the tuple/list path for multimodal outputs and decodes the whole tensor for text-only outputs. A regression test for each of the two Omni processors guards the fix.
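A minimal sketch of that branching, assuming the distinction is made with a tuple/list type check; the actual diff may differ. `select_text_sequences`, `TokenBatch`, and `fake_batch_decode` are hypothetical names used only for illustration and are not part of Transformers.

```python
from dataclasses import dataclass


@dataclass
class TokenBatch:
    """Hypothetical stand-in for a (batch, seq) tensor of token IDs."""

    rows: list

    def __getitem__(self, index):
        return self.rows[index]

    def __iter__(self):
        return iter(self.rows)


def select_text_sequences(generated_outputs):
    """Return the text sequences to decode (illustrative helper, not the real code)."""
    # Multimodal result: a tuple/list whose first element holds the text sequences.
    if isinstance(generated_outputs, (tuple, list)):
        return generated_outputs[0]
    # Text-only result: the object is already the full batch, so keep every row.
    return generated_outputs


def fake_batch_decode(sequences):
    """Hypothetical decoder: one string per row, joining the token IDs."""
    return [" ".join(str(token_id) for token_id in row) for row in sequences]


text_only = TokenBatch([[101, 42], [102, 43]])
multimodal = (TokenBatch([[101, 42], [102, 43]]), "audio-placeholder")

text_only_decoded = fake_batch_decode(select_text_sequences(text_only))
multimodal_decoded = fake_batch_decode(select_text_sequences(multimodal))

print(text_only_decoded)   # ['101 42', '102 43']
print(multimodal_decoded)  # ['101 42', '102 43']
```

Both call shapes now yield one decoded string per batch item, which is the behaviour the pipeline needs in order to return a record for every input.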

End-to-end pipeline repro (real AnyToAnyPipeline + real Qwen3-Omni processor)

Environment:

Hardware: AutoDL c4090, NVIDIA GeForce RTX 4090 present; repro run CPU-only with CUDA_VISIBLE_DEVICES=""
OS: Linux AutoDL container
Python: 3.12
PyTorch: 2.8.0+cu128
Transformers main: b70d02fc724d
This PR head: 651c109d67
Processor metadata: Qwen/Qwen3-Omni-30B-A3B-Instruct via AutoProcessor.from_pretrained
Model: small fake Qwen3OmniMoeForConditionalGeneration; it only replaces 30B weight download and still exercises pipeline("any-to-any") plus the real processor post-processing path
Command: CUDA_VISIBLE_DEVICES="" python repro.py
```python
import torch

from transformers import AutoProcessor, GenerationConfig, pipeline


class Qwen3OmniMoeForConditionalGeneration(torch.nn.Module):
    input_modalities = ("image", "video", "audio", "text")
    output_modalities = ("text", "audio")

    def __init__(self):
        super().__init__()
        self.config = type("Cfg", (), {"_commit_hash": None, "model_type": "qwen3_omni_moe"})()
        self.generation_config = GenerationConfig(max_new_tokens=1)

    @property
    def device(self):
        return torch.device("cpu")

    def can_generate(self):
        return True

    def generate(self, input_ids=None, **kwargs):
        print("GEN_BATCH", tuple(input_ids.shape))
        extra = torch.arange(42, 42 + input_ids.shape[0], dtype=input_ids.dtype).unsqueeze(1)
        return torch.cat([input_ids, extra], dim=1)


processor = AutoProcessor.from_pretrained("Qwen/Qwen3-Omni-30B-A3B-Instruct")
pipe = pipeline("any-to-any", model=Qwen3OmniMoeForConditionalGeneration(), processor=processor)
out = pipe(text=["hello", "world"], return_full_text=False, max_new_tokens=1)
print("PIPE_OUT_LEN", len(out), out)
```

Output:

```
# main b70d02fc724d
GEN_BATCH (2, 1)
PIPE_OUT_LEN 1 [{'input_text': 'hello', 'generated_text': 'K'}]

# after this PR
GEN_BATCH (2, 1)
PIPE_OUT_LEN 2 [{'input_text': 'hello', 'generated_text': 'K'}, {'input_text': 'world', 'generated_text': 'L'}]
```

Regression tests and repository checks also passed.

Regression tests and repository checks

Tests:

```
python -m pytest -q \
  tests/models/qwen2_5_omni/test_processing_qwen2_5_omni.py::test_qwen2_5_omni_post_process_multimodal_output_keeps_text_batch \
  tests/models/qwen3_omni_moe/test_processing_qwen3_omni_moe.py::test_qwen3_omni_moe_post_process_multimodal_output_keeps_text_batch

2 passed in 0.16s
```

Also ran:

```
python -m ruff check src/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py tests/models/qwen2_5_omni/test_processing_qwen2_5_omni.py tests/models/qwen3_omni_moe/test_processing_qwen3_omni_moe.py
python -m ruff format --check src/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py tests/models/qwen2_5_omni/test_processing_qwen2_5_omni.py tests/models/qwen3_omni_moe/test_processing_qwen3_omni_moe.py
python utils/check_modular_conversion.py --files src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py
python utils/check_copies.py --file src/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py
python utils/check_copies.py --file src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py
git diff --check
```

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline and the Pull Request checks?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes according to the guidelines?
  • Did you write any new necessary tests?

Who can review?

cc @zucchini-nlp


github-actions Bot commented Jul 4, 2026

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen2_5_omni, qwen3_omni_moe


github-actions Bot commented Jul 4, 2026

Contributor

CI recap

Dashboard: View test results in Grafana
Latest run: 28701194782:2
Result: success | Jobs: 3 | Tests: 140 | Failures: 0 | Duration: 15m 14s
