Flash attention broken in transformers 5.6.0

### Reminder

- [x] I have read the above rules and searched the existing issues.

### System Info

- Platform: Linux 
- Python version: 3.12
- PyTorch version: 2.11.0 (CUDA 13.0)
- Flash Attention version: 2.8.3
- Transformers version: ==5.6.0
- Datasets version: >=4.0.0 
- Accelerate version: 1.14.0 (override)
- PEFT version: >=0.17.0 
- LLaMA Factory version: >=0.9.5

### Reproduction

Run the CLI with the following config:
```yaml
model_name_or_path: "Qwen/Qwen3-4B-Instruct-2507"
template: "qwen"
flash_attn: "fa2"
bf16: False
fp16: True
```

Running it fails with this error:
```python
File ".../transformers/integrations/flash_attention.py", line 84, in flash_attention_forward
    s_aux=s_aux.to(query.dtype),  # FA only accepts half precision
          ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'to'
```
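The traceback suggests that `s_aux` is `None` on this code path while the call site assumes it is a tensor. A minimal sketch of that failure mode and a None-safe cast follows. `FakeTensor` and `cast_optional` are hypothetical names used only for illustration; this is not the transformers code and not necessarily how 5.7.0 fixes it.

```python
from typing import Optional


class FakeTensor:
    """Hypothetical stand-in for a tensor; it only models `.to(dtype)`."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype

    def to(self, dtype: str) -> "FakeTensor":
        return FakeTensor(dtype)


def cast_optional(s_aux: Optional[FakeTensor], dtype: str) -> Optional[FakeTensor]:
    """Cast only when a value is present (illustrative helper, an assumption)."""
    return s_aux.to(dtype) if s_aux is not None else None


# Unguarded call, mirroring the traceback: raises when s_aux is None.
s_aux = None
try:
    s_aux.to("float16")
except AttributeError as exc:
    print(f"unguarded: {exc}")

# Guarded call: None passes through, a real value is cast.
print(cast_optional(None, "float16"))
print(cast_optional(FakeTensor("float32"), "float16").dtype)
```

The guard only shows why the unguarded `.to(...)` call raises; whether passing `None` through is the right behaviour for the flash attention kernel is a question for the upstream fix.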

The same error is reported upstream:
https://github.com/huggingface/transformers/issues/45588

Upgrading to transformers 5.7.0 resolves it.
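To check whether an environment has the affected release installed, something like the following could be used. It is a sketch using only the standard library; `parse_release` and `is_affected` are hypothetical helper names, and the only version treated as affected is 5.6.0, the one named in this report.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

AFFECTED = (5, 6, 0)  # the release named in this report


def parse_release(v: str) -> Tuple[int, int, int]:
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def is_affected(v: str) -> bool:
    """True when the version string matches the affected release."""
    return parse_release(v) == AFFECTED


try:
    installed = version("transformers")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("transformers is not installed")
elif is_affected(installed):
    print(f"transformers {installed} is affected; consider upgrading to 5.7.0")
else:
    print(f"transformers {installed} is not the affected release")
```

The parser ignores pre-release and local suffixes, so a string such as `5.6.0.dev0` is also treated as affected; that is a simplification of this sketch, not a statement about which builds actually contain the bug.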

### Others

_No response_

Flash attention broken in transformers 5.6.0 #10610
