Flash attention broken in transformers 5.6.0 #10610

Description

@Aure20

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • Platform: Linux
  • Python version: 3.12
  • PyTorch version: 2.11.0 (CUDA 13.0)
  • Flash Attention version: 2.8.3
  • Transformers version: == 5.6.0
  • Datasets version: >=4.0.0
  • Accelerate version: 1.14.0 (override)
  • PEFT version: >=0.17.0
  • LLaMA Factory version: >=0.9.5

Reproduction

Run the CLI with the following config:

model_name_or_path: "Qwen/Qwen3-4B-Instruct-2507"
template: "qwen"
flash_attn: "fa2"
bf16: False
fp16: True

Training then fails with the following error:

File ".../transformers/integrations/flash_attention.py", line 84, in flash_attention_forward
    s_aux=s_aux.to(query.dtype),  # FA only accepts half precision
          ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'to'
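The traceback suggests that `s_aux` is `None` when `.to(...)` is called on it. Below is a minimal sketch of that failure pattern and of a simple `None` guard. It is not the transformers source or the upstream fix: `FakeTensor`, `cast_unguarded`, and `cast_guarded` are hypothetical names used only for illustration, and dtypes are modeled as plain strings so the snippet runs without PyTorch.

```python
from typing import Optional


class FakeTensor:
    """Hypothetical stand-in for a tensor; it only models `.to(dtype)`."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype

    def to(self, dtype: str) -> "FakeTensor":
        # Return a new object carrying the requested dtype.
        return FakeTensor(dtype)


def cast_unguarded(s_aux: Optional[FakeTensor], query: FakeTensor) -> FakeTensor:
    # Same shape as the failing line: `.to()` is called without a None check,
    # so passing s_aux=None raises AttributeError.
    return s_aux.to(query.dtype)


def cast_guarded(s_aux: Optional[FakeTensor], query: FakeTensor) -> Optional[FakeTensor]:
    # Cast only when s_aux is provided; otherwise pass None through unchanged.
    return s_aux.to(query.dtype) if s_aux is not None else None
```

With `s_aux=None`, `cast_unguarded` raises the same `AttributeError` as in the traceback, while `cast_guarded` returns `None`. Whether the upstream fix takes this form is not confirmed here.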

The same error is reported in huggingface/transformers#45588.

Upgrading to transformers 5.7.0 resolves it.
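To check whether an environment has the affected release installed, something like the following sketch may help. It assumes, based only on this report, that exactly 5.6.0 is affected; `REPORTED_BROKEN`, `parse_release`, `is_reported_broken`, and `installed_transformers_version` are hypothetical helper names, not part of transformers or LLaMA Factory.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Assumption taken from this issue: 5.6.0 is the release reported as broken.
REPORTED_BROKEN = (5, 6, 0)


def parse_release(version: str) -> Optional[Tuple[int, ...]]:
    """Extract the leading major.minor.patch numbers, or None if absent."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    return tuple(int(part) for part in match.groups()) if match else None


def is_reported_broken(version: str) -> bool:
    """True when the version's release numbers equal the reported one."""
    return parse_release(version) == REPORTED_BROKEN


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is not None and is_reported_broken(found):
        print(f"transformers {found} is the release reported as broken; consider 5.7.0")
```

Note that only the leading three numbers are compared, so a pre-release string that begins with `5.6.0` would also be flagged.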

Others

No response

Labels: bug (Something isn't working), pending (This problem is yet to be addressed)
