Reminder
System Info
- Platform: Linux
- Python version: 3.12
- PyTorch version: 2.11.0 (CUDA 13.0)
- Flash Attention version: 2.8.3
- Transformers version: 5.6.0
- Datasets version: >=4.0.0
- Accelerate version: 1.14.0 (override)
- PEFT version: >=0.17.0
- LLaMA Factory version: >=0.9.5
Reproduction
Run the CLI with the following config:
model_name_or_path: "Qwen/Qwen3-4B-Instruct-2507"
template: "qwen"
flash_attn: "fa2"
bf16: False
fp16: True
The run fails with the following error:
File ".../transformers/integrations/flash_attention.py", line 84, in flash_attention_forward
s_aux=s_aux.to(query.dtype), # FA only accepts half precision
^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'to'
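The failing line casts `s_aux` to the query dtype without first checking that it was supplied. For this model it arrives as `None`, so `.to` is looked up on `None`. Below is a minimal pure-Python sketch of that failure mode and a None-guarded alternative. It assumes a hypothetical helper `cast_aux` and a stand-in `FakeTensor` class in place of a real `torch.Tensor`. It is not the actual transformers code or its upstream fix.

```python
class FakeTensor:
    """Stand-in for torch.Tensor: only records the dtype passed to .to()."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype

    def to(self, dtype: str) -> "FakeTensor":
        return FakeTensor(dtype)


def cast_aux_unguarded(s_aux, dtype):
    # Mirrors the failing pattern: assumes s_aux is always a tensor.
    return s_aux.to(dtype)


def cast_aux(s_aux, dtype):
    # Hypothetical guarded version: skip the cast when no s_aux was supplied.
    return None if s_aux is None else s_aux.to(dtype)


if __name__ == "__main__":
    try:
        cast_aux_unguarded(None, "float16")
    except AttributeError as exc:
        print(f"AttributeError: {exc}")  # same message as in the traceback

    print(cast_aux(None, "float16"))
    print(cast_aux(FakeTensor("float32"), "float16").dtype)
```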
The same error is also reported upstream in huggingface/transformers#45588, and it can be resolved by upgrading to transformers 5.7.0.
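To fail fast before launching training, the installed transformers version can be checked against 5.7.0. The following is a minimal stdlib-only sketch. It assumes, based only on this report, that 5.7.0 is the first release without the bug. The helper names `release_tuple` and `needs_upgrade` are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption (from this report): transformers 5.7.0 is the first release without the bug.
MIN_FIXED = (5, 7, 0)


def release_tuple(ver: str) -> tuple[int, int, int]:
    """Return (major, minor, patch); pre-release/dev suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    nums = [int(p) for p in match.group(0).split(".")][:3]
    nums += [0] * (3 - len(nums))  # pad so '5.7' compares equal to '5.7.0'
    return (nums[0], nums[1], nums[2])


def needs_upgrade(ver: str) -> bool:
    """True if this transformers version predates the assumed fix."""
    return release_tuple(ver) < MIN_FIXED


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        if needs_upgrade(installed):
            print(f"transformers {installed} < 5.7.0: upgrade (see huggingface/transformers#45588)")
        else:
            print(f"transformers {installed} should include the fix")
```

Run it in the training environment before invoking the CLI. If `packaging` is installed, `packaging.version.Version` gives a more complete comparison, including correct ordering of pre-releases, which this sketch deliberately ignores.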
Others
No response