Releases: hiyouga/LlamaFactory
v0.9.5: Qwen3.5/3.6, Gemma 4, Transformers v5
Added primary support for Qwen3.5/Qwen3.6/Gemma4 models and compatibility with Transformers v5.
What's Changed
- [misc] set dev version by @hiyouga in #9703
- fix(fp8): add Transformer Engine backend support by @sbhavani in #9705
- [misc] Compatible with an empty architectures field in config.json by @tangefly in #9709
- [model] support Youtu-LLM-2B by @isLinXu in #9707
- [misc] lint by @hiyouga in #9710
- Update pyproject.toml and requirements by @jiaqiw09 in #9714
- [v1] add init plugin by @hiyouga in #9716
- [misc] Add a PyTorch version warning for Conv3D. by @tangefly in #9715
- [feature] add support for EAFT loss by @ymxyll in #9720
- [v1] add cli sampler by @hiyouga in #9721
- [v1] add renderer ut by @hiyouga in #9722
- Update README.md by @tangefly in #9724
- [CI]improve cuda ci cache by @frozenleaves in #9725
- Add support for LiquidAI's LFM2.5 (Liquid Foundation Models) to LLaMA-Factory. by @vovanphuc in #9726
- Add support for LiquidAI's LFM2.5-VL vision-language model by @vovanphuc in #9729
- [misc] fix parser by @hiyouga in #9730
- [refactor] rename lfm template to lfm2 and add LFM 2.5 to README by @vovanphuc in #9731
- [fix] correct ktransformers example config paths and templates by @JimmyPeilinLi in #9732
- [model] support for microsoft's Phi-4-mini by @ctx289 in #9734
- [misc] fix fp8 by @hiyouga in #9742
- [v1] add batch generator by @hiyouga in #9744
- [deps] fix package by @hiyouga in #9745
- [model] support HY-MT model by @isLinXu in #9746
- [v1] upgrade batching by @hiyouga in #9751
- [model] fixed&added Hunyuan models by @isLinXu in #9750
- [v1] add sft by @hiyouga in #9752
- using mp to run kernel test by @frozenleaves in #9754
- [v1] fix kernel moe patch by @jiaqiw09 in #9867
- [misc] update mcore related docker and mca supported models by @Kuangdd01 in #10114
- [feat] support `all_exhausted_without_replacement` in datasets.interleave_datasets by @Moenupa in #10112
- chore: Update outdated GitHub Actions versions by @pgoslatara in #10123
- [v1] support training with fsdp2 by @frozenleaves in #9773
- [v0] Fix reward model training safetensors saving by @jiaqiw09 in #10137
- Fix : add visual.pos_embed to Qwen3-VL visual model keys by @je1lee in #10139
- [feature] support using ray.remote to start distributed training. by @xvxuopop in #10109
- update peft, deepspeed, adapt transformers v5 by @frozenleaves in #10147
- [model] support youtu-vl model by @isLinXu in #10152
- Fix race condition in LoggerHandler during multi-GPU training by @yurekami in #10156
- [assets] update readme by @hiyouga in #10159
- [model] support MiniCPM-o-4.5 by @isLinXu in #10163
- add dpo/kto fsdp fsdp2 support by @UsernameFull in #10127
- [model] support GLM-4.7-Flash SFT by @Shanay-Mehta in #10173
- [v1] init commit for v1 docs by @frozenleaves in #10145
- [model] support GLM-OCR SFT by @Ataraxy33 in #10183
- [model] add liger kernel support for Qwen3-Next by @Shanay-Mehta in #10176
- [V1] Add v1 LoRA/Freeze support and merge workflow by @jiaqiw09 in #10157
- Add ASFT by @susjunyou in #10174
- [V1] support deepspeed by @frozenleaves in #10181
- [v1] support quantization by @sunyi0505 in #10161
- [v0/v1] fix ut huggingface hub 429 error when transformers>=5.0.0 by @jiaqiw09 in #10155
- [mca] update supported models by @Kuangdd01 in #10196
- fix: remove safe_serialization arg for transformers v5 compatibility by @Alm0stSurely in #10208
- Add DeepSpeed Z3 leaf module for Qwen3-Next by @Shanay-Mehta in #10194
- [model] Adapt Qwen3.5 by @frozenleaves in #10213
- [model] update constants by @hiyouga in #10220
- [model] support Aeva by @louzongzhi in #10214
- upgrade to ROCm 7.2 base image, drop PyTorch reinstall by @mjkvaak-amd in #10223
- [fix] register visual part for Qwen3.5 by @Kuangdd01 in #10227
- [V1] add seed for training and fix gradient checkpointing by @jiaqiw09 in #10211
- fix(vllm): support mixed multimodal payloads by @phiott in #10225
- [misc] fix constants by @hiyouga in #10232
- Add Trackio Integration for LlamaFactory by @ParagEkbote in #10165
- [model] support Qwen3.5 all series models by @isLinXu in #10237
- fix: qwen3.5 projector path by @LittleYanlin in #10242
- fix: get ray head ip by @SnowCharmQ in #10252
- [V1] Support meta loading for full and free by @jiaqiw09 in #10236
- fix: Fix compatibility issue with HuggingFace Dataset Column when sav… by @pyxnpyx in #10254
- docs: fix Python version requirement from 3.10 to >=3.11.0 by @ll0v0ll in #10259
- fix: convert filter() to list in read_cloud_json to fix broken empty-check by @jnMetaCode in #10260
- [mca] support qwen3.5 by @Kuangdd01 in #10265
- fix(mm): fallback to audio_processor when feature_extractor is missing by @xxddccaa in #10267
- update npu docker by @frozenleaves in #10268
- fix(template): correct gpt_oss format_assistant by @RuijieH in #10269
- fix: make position_id_per_seconds configurable for Qwen2OmniPlugin by @LincolnBurrows2017 in #10281
- fix: unused keys in ray example by @SnowCharmQ in #10290
- [v1] add qwen3 templates and fix rendering plugin. by @xvxuopop in #10212
- fix: handle empty content list in system message by @LincolnBurrows2017 in #10291
- fix(MiniCPMVPlugin): fix IndexError in process_messages when training with video by @xxddccaa in #10276
- feat(data): add SGSC zero-hallucination B2B dataset (NOO-Protocol) by @robertglools in #10284
- [fix] fit neat_packing & mrope model packing by @Kuangdd01 in #10283
- chore: mca workflow compatible with qwen-vl series by @Kuangdd01 in #10303
- [liger_kernel] support Qwen3.5. by @wyt2000 in #10313
- fix: mimo-v2 tool call by @isLinXu in #10315
- [v1] add callbacks by @jiaqiw09 in #10255
- ci: add nginx cache config for Ascend NPU CI environment by @Goalina in #10323
- [V1]add init on rank0 for fsdp2 by @jiaqiw09 in #10264
- [v1] support ulysses cp for fsdp2 by @sunyi0505 in #10262
- [feat] support LlamaFactory SFT training by HyperParallel FSDP2 backend by @Cui-yshoho in #10289
- fix moe by @frozenleaves in #10334
- fix: qwen3vl timest...
v0.9.4: Goodbye 2025
Farewell to 2025. Thank you to all contributors and supporters. We will continue to deliver an easy and efficient LLM fine-tuning framework to the community in 2026. Stay tuned.
Breaking
- Repository name updated: LLaMA-Factory → LlamaFactory
- Python 3.9–3.10 have been deprecated; LlamaFactory now requires Python 3.11–3.13
- Migrated from pip to uv; use `uv pip install llamafactory`
- The official LlamaFactory blog is now live: https://blog.llamafactory.net/en/
New features
- 🔥 Support Orthogonal Fine-Tuning (OFT) by @zqiu24 in #8623
- 🔥 Support Semantic Initialization for newly added tokens by @ximinng in #9267
- 🔥 Support Megatron-LM training via MCoreAdapter by @Kuangdd01 in #9237
- 🔥 Support KTransformers backend by @JimmyPeilinLi in #9400
- Support MPO algorithm by @Kuangdd01 in #8930
- Support FP8 training by @penfever in #8960
- Support Transformers v5 by @tangefly in #9569
- Support reasoning and plaintext in function call message by @tangefly in #9610
- Support DeepSpeed AutoTP by @sunyi0505 in #9602
- Support efficient NPU fused kernels by @frozenleaves in #9520
- Support TRL 0.24 by @UsernameFull in #9617
Models
- Falcon H1 by @dhiaEddineRhaiem in #8403
- Kimi-VL and GLM-4.5V by @Kuangdd01 in #8462
- Gemma3n by @Kuangdd01 in #8509
- Granite4 by @Tuyohai in #8680
- Qwen3-2507 by @hiyouga in #8750
- MiniCPM-V 4.0 by @ZMXJJ in #8813
- Intern-S1-mini by @hhaAndroid in #8976
- Seed-OSS by @Kuangdd01 in #8992
- MiniCPM-V 4.5 by @tc-mb in #9022
- InternVL-3.5 by @Kuangdd01 in #9028
- ERNIE-4.5-Text and ERNIE-4.5-VL by @isLinXu in #9165
- Ling-V2 by @wangsff in #9188
- Qwen3-VL and Qwen3-Omni by @xvxuopop and @Kuangdd01 in #9196
- Hunyuan-mt by @wyfdgg in #9284
- GLM-4.6V by @isLinXu in #9586
- Ministral 3 by @tangefly in #9582
- VibeThinker by @isLinXu in #9616
- MiMo-V2-Flash by @isLinXu in #9637
- MiniMax-M1 and MiniMax-M2 by @isLinXu in #9680
Thanks to teams collaborating with LlamaFactory in 2025
- NPU Team: @jiaqiw09 @frozenleaves @xvxuopop @UsernameFull @codemayq
- KTransformers Team: @JimmyPeilinLi @poryfly @mrhaoxx
- ROLL Team
And to individuals who made significant contributions
Full Changelog: v0.9.3...v0.9.4
v0.9.3: Llama4, Gemma3, Qwen3, InternVL3, Qwen2.5-Omni
We will attend the AWS Summit Shanghai 2025 on June 20th! See you in Shanghai 👋
New features
- 🔥 InternVL2.5/InternVL3 model by @Kuangdd01 in #7258
- 🔥 Qwen2.5-Omni model by @Kuangdd01 in #7537
- 🔥 Llama 4 and Gemma 3 multimodal model by @hiyouga in #7273 and #7611
- 🔥 Official GPU docker image by @yzoaim in #8181
- 🔥 SGLang inference by @Qiaolin-Yu and @jhinpan in #7278
- GLM-4-0414 and GLM-Z1 model by @zRzRzRzRzRzRzR in #7695
- Kimi-VL model by @Kuangdd01 in #7719
- Qwen3 model by @hiyouga in #7885
- MiMo and MiMo-VL model by @Kuangdd01 in #7946 #8249
- SmolLM/SmolLM2 model by @akshatsehgal in #8050 #8220
- MiniCPM4 model by @LDLINGLINGLING in #8314
- Mistral-Small-3.1 model by @Kuangdd01 in #8335
- Add `scripts/eval_bleu_rouge.py` by @SnowFox4004 in #7419
- Add Muon optimizer by @tianshijing in #7749
- Support video/audio inference with vLLM by @hiyouga in #7566
- Support S3/GCS cloud data by @erictang000 in #7567
- Support vLLM-ascend by @leo-pony in #7739
- Support OmegaConf by @hiyouga in #7793
- Support early-stopping by @hiyouga in #7797
- Add `enable_thinking` argument for reasoning models by @hiyouga in #7928
- PyTorch-elastic and fault-tolerant launch by @hubutui in #8286
- Length Desensitization DPO (LD-DPO) by @amangup in #8362
New models
- Base models
- SmolLM/SmolLM2 (135M/360M/1.7B) 📄
- Qwen3 Base (0.6B/1.7B/4B/8B/14B/30B) 📄
- Gemma 3 (1B/4B/12B/27B) 📄🖼️
- MedGemma (4B) 📄🩺
- MiMo Base (7B) 📄
- Seed-Coder Base (8B) 📄⌨️
- Mistral-Small-3.1 Base (24B) 📄🖼️
- GLM-4-0414 Base (32B) 📄
- Llama 4 (109B/492B) 📄🖼️
- Instruct/Chat models
- SmolLM/SmolLM2 Instruct (135M/360M/1.7B) 📄🤖
- MiniCPM4 (0.5B/8B) 📄🤖
- Qwen3 (0.6B/1.7B/4B/8B/14B/32B/30B/235B) 📄🤖🧠
- Gemma 3 Instruct (1B/4B/12B/27B) 📄🤖🖼️
- InternVL2.5/3 Instruct/MPO (1B/2B/8B/14B/38B/78B) 📄🤖🖼️
- Qwen2.5-Omni (3B/7B) 📄🤖🖼️🔈
- MedGemma Instruct (4B/27B) 📄🤖🩺
- MiMo SFT/RL (7B) 📄🤖
- MiMo-VL SFT/RL (7B) 📄🤖🖼️
- Hunyuan Instruct (7B) 📄🤖
- Seed-Coder Instruct/Reasoning (8B) 📄🤖🧠⌨️
- GLM-4-0414/GLM-Z1 Instruct (9B/32B) 📄🤖🧠
- DeepSeek-R1-0528 (8B/671B) 📄🤖🧠
- Kimi-VL Instruct/Thinking (17B) 📄🤖🧠🖼️
- Mistral-Small-3.1 Instruct (24B) 📄🤖🖼️
- Qwen2.5-VL Instruct (32B) 📄🤖🖼️
- Llama 4 Instruct (109B/492B) 📄🤖🖼️
New datasets
- Preference datasets
- COIG-P (zh) 📄
Bug fix
- Fix add new tokens by @flashJd in #7253
- Fix ultrachat_200k dataset by @felladrin in #7259
- Add efficient 4D attention mask for neat packing by @BlackWingedKing in #7272
- Fix WSD lr scheduler by @x22x22 in #7304
- Fix position ids in neat packing by @BlackWingedKing in #7318
- Fix proxy setting in webui by @taoharry in #7332
- Improve entrypoint by @ENg-122 in #7345
- Fix ray destroy process group by @erictang000 in #7395
- Fix SGLang dependencies by @guoquan in #7432
- Upgrade docker package version by @rumichi2210 in #7442
- Update liger kernel for qwen2.5-vl by @xiaosu-zhu in #7453
- Fix lora on quant models by @GuoCoder in #7456
- Enable liger kernel for gemma3 by @kennylam777 in #7462
- Enable liger kernel for paligemma by @eljandoubi in #7466
- Add Swanlab lark notification by @Xu-pixel in #7481
- Fix gemma3 use cache attribute by @ysjprojects in #7500
- Fix pixtral plugin by @Kuangdd01 in #7505
- Fix KTO mismatch pair strategy by @himalalps in #7509
- Support `dataset_shards` by @aliencaocao in #7530
- Fix qwen2.5omni plugin by @Kuangdd01 in #7573 #7578 #7883
- Fix ppo trainer by @gechengze in #7576
- Fix workflow by @Shawn-Tao in #7635
- Support qwen2.5omni audio+video2text by @Kuangdd01 in #7638
- Upgrade deps for SGLang by @adarshxs in #7639
- Allow ray env setting by @erictang000 in #7647
- Fix CUDA warning on intel xpus by @jilongW in #7655
- Fix liger kernel patch by @danny980521 in #7660
- Fix rocm dockerfile by @fluidnumerics-joe in #7725
- Fix qwen2vl with neat packing by @GeoffreyChen777 in #7754
- Fix a constant by @AlphaBladez in #7765
- Fix autogptq for Gemma by @ddddng in #7786
- Fix internvl models by @Kuangdd01 in #7801 #7803 #7817 #8129
- Fix DeepSpeed ZeRO3 on moe models by @hiyouga in #7826 #7879
- Fix gradient checkpoint func for vit by @hiyouga in #7830
- Support S3 ray storage by @erictang000 in #7854
- Fix Kimi-VL attention by @Kuangdd01 in #7867
- Fix minicpm-o vllm inference by @hiyouga in #7870
- Unfreeze multimodal projector in freeze training by @zhaop-l in #7872
- Fix Qwen2.5-omni plugin by @hiyouga in #7875 #7962
- Add warp support link by @ericdachen in #7887
- Replace eos token for base model by @hiyouga in #7911
- Add `eval_on_each_dataset` arg by @hiyouga in #7912
- Fix qwen3 loss by @hiyouga in #7923 #8109
- Add repetition_penalty to api by @wangzhanxd in #7958
- Add graphgen to readme by @tpoisonooo in #7974
- Support video params in vllm batch infer by @Kuangdd01 in #7992
- Fix tool formatter by @yunhao-tech in #8000
- Fix kimi vl plugin by @hiyouga in #8015
- Support batch preprocess in vllm batch infer by @Shawn-Tao in #8051
- Support loading remote folder by @erictang000 in #8078
- Fix video utils import by @Kuangdd01 in #8077
- Fix SGLang LoRA inference by @Kiko-RWan in #8067
- Fix cli by @Wangbiao2 in #8095
- Fix pretrain workflow by @SunnyHaze in #8099
- Fix rope args for yarn by @piamo in #8101
- Add no build isolation in installing by @hiyouga in #8103
- Switch to GPTQModel and deprecate AutoGPTQ by @hiyouga in #8108
- Support llama3 parallel function call by @hiyouga in #8124
- Add `data_shared_file_system` by @hiyouga in #8179
- Fix load remote files by @youngwookim in #8183
- Fix dataset info by @Muqi1029 in #8197
- Fix qwen2.5 omni merge script by @Kuangdd01 in #8227 #8293
- Add unittest for VLM save load by @Kuangdd01 in #8248
- Add tag in swanlab by @Zeyi-Lin in #8258
- Support input video frames by @Kuangdd01 in #8264
- Fix empty template by @hiyouga in #8312
- Support full-finetuning with unsloth by @Remorax in #8325
- Add awesome work by @MING-ZCH in #8333
- Release v0.9.3 by @hiyouga in #8386
- Fix qwen2vl position ids by @hiyouga in #8387
- Fix vlm utils by @hiyouga in #8388
- Fix #3802 #4443 #5548 #6236 #6322 #6432 #6708 #6739 #6881 #6919 #7080 #7105 #7119 #7225 #7267 #7327 #7389 #7416 #7427 #7428 #7443 #7447 #7454 #7490 #7501 #7502 #7513 #7520 #7541 #7545 #7552 #7563 #7598 #7600 #7613 #7636 #7678 #7680 #7687 #7688 #7730 #7743 #7772 #7791 #7800 #7816 #7829 #7845 #7865 #7874 #7889 #7905 #7906 #7907 #7909 #7916 #7918 #7919 #7939 #7953 #7965 #7990 #8008 #8056 #8061 #8066 #8069 #8087 #8091 #8092 #8096 #8097 #8111 #8119 #8147 #8166 #8169 #8174 #8182 #8189 #8223 #8241 #8247 #8253 #8294 #8309 #8324 #8326 #8332
Full Changelog: v0.9.2...v0.9.3
v0.9.2: MiniCPM-o, SwanLab, APOLLO
We will attend the vLLM Beijing Meetup on Mar 16th! See you in Beijing 👋
New features
- 🔥 APOLLO optimizer by @zhuhanqing in #6617
- 🔥 SwanLab experiment tracker by @Zeyi-Lin in #6401
- 🔥 Ray Trainer by @erictang000 in #6542
- Batch inference with vLLM TP by @JieShenAI in #6190
- QLoRA on Ascend NPU by @codemayq in #6601
- Yarn and Llama3 rope scaling by @hiyouga in #6693
- Support `uv run` by @erictang000 in #6907
- Ollama modelfile auto-generation by @codemayq in #4686
- Mistral tool prompt by @AlongWY in #5473
- Llama3 and Qwen2 tool prompt by @hiyouga in #6367 and #6369
New models
- Base models
- GPT2 (0.1B/0.4B/0.8B/1.5B) 📄
- Granite 3.0-3.1 (1B/2B/3B/8B) 📄
- PaliGemma2 (3B/10B/28B) 📄🖼️
- Moonlight (16B) 📄
- DeepSeek V2-V2.5 Base (236B) 📄
- DeepSeek V3 Base (671B) 📄
- Instruct/Chat models
- Granite 3.0-3.1 (1B/2B/3B/8B) by @Tuyohai in #5922 📄🤖
- DeepSeek R1 (1.5B/7B/8B/14B/32B/70B/671B) by @Qwtdgh in #6767 📄🤖
- TeleChat2 (3B/7B/12B/35B/115B) by @ge-xing in #6313 📄🤖
- Qwen2.5-VL (3B/7B/72B) by @hiyouga in #6779 📄🤖🖼️
- PaliGemma2-mix (3B/10B/28B) by @Kuangdd01 in #7060 📄🤖🖼️
- Qwen2 Audio (7B) by @BUAADreamer in #6701 📄🤖🔈
- MiniCPM-V/MiniCPM-o (8B) by @BUAADreamer in #6598 and #6631 📄🤖🖼️🔈
- InternLM3-Instruct (8B) by @hhaAndroid in #6640 📄🤖
- Marco-o1 (8B) 📄🤖
- Skywork-o1 (8B) 📄🤖
- Phi-4 (14B) 📄🤖
- Moonlight Instruct (16B) 📄
- Mistral Small (24B) 📄🤖
- QwQ (32B) 📄🤖
- Llama-3.3-Instruct (70B) 📄🤖
- QvQ (72B) 📄🤖🖼️
- DeepSeek V2-V2.5 (236B) 📄🤖
- DeepSeek V3 (671B) 📄🤖
New datasets
- Supervised fine-tuning datasets
- OpenO1 (en) 📄
- Open Thoughts (en) 📄
- Open-R1-Math (en) 📄
- Chinese-DeepSeek-R1-Distill (zh) 📄
Changes
- Refactor VLMs register by @hiyouga in #6600
- Refactor mm plugin by @hiyouga in #6895
- Refactor template by @hiyouga in #6896
- Refactor data pipeline by @hiyouga in #6901
- Update vlm arguments by @hiyouga in #6976
- We have cleaned large files from the git history using BFG Repo-Cleaner; a backup of the original repo is available.
Bug fix
- Add `trust_remote_code` option by @yafshar in #5819
- Fix mllama config by @hiyouga in #6137 and #6140
- Fix mllama pad by @hiyouga in #6151 and #6874
- Pin tokenizers version by @hiyouga in #6157
- Fix tokenized data loading by @village-way in #6160
- Show hostname in webui by @hykilpikonna in #6170
- Fix VLMs zero3 training by @hiyouga in #6233
- Add `skip_special_tokens` by @hiyouga in #6363
- Support non-reentrant-gc by @hiyouga in #6364
- Add `disable_shuffling` option by @hiyouga in #6388
- Fix gen kwargs by @hiyouga in #6395
- Enable module run by @youkaichao in #6457
- Fix eval loss value by @hiyouga in #6465
- Fix paligemma inference by @hiyouga in #6483
- Add deepseek v3 template by @piamo in #5507
- Add http proxy argument in dockerfile by @shibingli in #6462
- Fix trainer generate by @hiyouga in #6512
- Fix pixtral DPO training by @hiyouga in #6547
- Fix ray args by @stephen-nju in #6564
- Fix minicpm template by @BUAADreamer in #6620
- Fix stop tokens for visual detection by @hiyouga in #6624
- Pin vllm version by @hiyouga in #6629
- Fix mllama any image by @hiyouga in #6637 and #7053
- Fix tokenizer max length by @xiaosu-zhu in #6632
- Fix webui locale by @steveepreston in #6653
- Fix MiniCPM-o DPO training by @BUAADreamer in #6657
- Fix Qwen2 MoE training by @hiyouga in #6684
- Upgrade to gradio 5 by @hiyouga in #6688
- Support Japanese local file by @engchina in #6698
- Fix DPO loss by @yinpu in #6722
- Webui thinking mode by @hiyouga in #6778
- Upgrade to transformers 4.48 by @hiyouga in #6628
- Fix ci by @hiyouga in #6787
- Fix instructions about installing fa2 on win platform in readme by @neavo in #6788
- Fix minicpmv plugin by @BUAADreamer in #6801, #6890, #6946 and #6998
- Fix qwen2 tool prompt by @yueqis in #6796
- Fix llama pro by @hiyouga in #6814
- Allow thought in function call by @yueqis in #6797
- Add `ALLOW_EXTRA_ARGS` by @hiyouga in #6831
- Fix Qwen2vl plugin by @hiyouga in #6855
- Upgrade vllm to 0.7.2 by @hiyouga in #6857
- Fix unit test for tool using by @hiyouga in #6865
- Skip broken data in sharegpt converter by @JJJYmmm in #6879
- Fix qwen2.5 plugin for video by @JJJYmmm in #6868
- Parsing chat template from tokenizer by @hiyouga in #6905 (experimental)
- Fix mllama KTO training by @marko1616 in #6904
- Fix grad checkpointing by @hiyouga in #6916 and #6931
- Fix ollama template by @hiyouga in #6902
- Fix ray example by @erictang000 in #6906
- Improve error handling for media by @noahc1510 in #6128
- Support split on each dataset by @SrWYG in #5522
- Fix gen kwargs in training by @aliencaocao in #5451
- Liger kernel for qwen2.5vl by @hiyouga in #6930
- Fix lora target modules by @hiyouga in #6944
- Add `ray_storage_path` by @erictang000 in #6920
- Fix trainer.predict by @hiyouga in #6972
- Add min resolution control by @hiyouga in #6975
- Upgrade transformers to 4.49 by @hiyouga in #6982
- Add seed in vllm batch predict by @JieShenAI in #7058
- Fix pyproject.toml by @hiyouga in #7067
- Upgrade CANN images by @leo-pony in #7061
- Display swanlab link by @Zeyi-Lin in #7089
- Fix hf engine by @hiyouga in #7120
- Add bailing chat template by @oldstree in #7117
- Use bicubic resampler instead of nearest by @hiyouga in #7143
- Fix Qwen2Audio plugin by @lsrami in #7166
- Destroy process group by @hiyouga in #7174
- Fix swanlab callback by @Zeyi-Lin in #7176
- Fix paligemma plugin by @hiyouga in #7181
- Escape html tag in webui by @hiyouga in #7190
- Upgrade vllm to 0.7.3 by @hiyouga in #7183 and #7193
- Fix parser by @hiyouga in #7204
- Fix function formatter by @zhangch-ss in #7201
- Fix deepspeed config by @hiyouga in #7205
- Fix dataloader by @hiyouga in #7207
- Fix export tokenizer by @hiyouga in #7230
- Update arguments by @hiyouga in #7231
- Add `swanlab_logdir` by @Zeyi-Lin in #7219
- Fix vllm batch prediction by @hiyouga in #7235
- Avoid exit after saving tokenized data by @hiyouga in #7244
- Support commit in env by @hiyouga in #7247
- Release v0.9.2 by @hiyouga in #7242
- Fix #1204 #3306 #3462 #5121 #5270 #5404 #5444 #5472 #5518 #5616 #5712 #5714 #5756 #5944 #5986 #6020 #6056 #6092 #6136 #6139 #6149 #6165 #6213 #6287 #6320 #6345 #6345 #6346 #6348 #6358 #6362 #6391 #6415 #6439 #6448 #6452 #6482 #6499 #6543 #6546 #6551 #6552 #6610 #6612 #6636 #6639 #6662 #6669 #6738 #6772 #6776 #6780 #6782 #6793 #6806 #6812 #6819 #6826 #6833 #6839 #6850 #6854 #6860 #6878 #6885 #6889 #6937 #6948 #6952 #6960 #6966 #6973 #6981 #7036 #7064 #7072 #7116 #7125 #7130 #7171 #7173 #7180 #7182 #7184 #7192 #7198 #7213 #7234 #7243
Full Changelog: v0.9.1...v0.9.2
v0.9.1: Many Vision Models, Qwen2.5 Coder, Gradient Fix
New features
- 🔥Support Llama-3.2 and Llama-3.2-Vision by @marko1616 in #5547 and #5555
- 🔥Support LLaVA-NeXT, LLaVA-NeXT-Video and Video-LLaVA by @BUAADreamer in #5574
- 🔥Support Pixtral model by @Kuangdd01 in #5581
- Support EXAONE3.0 by @shing100 in #5585
- Support Index-series models by @Cuiyn in #5910
- Support Liger-Kernel for Qwen2-VL by @aliencaocao in #5438
- Support download models from ModelHub by @huniu20 in #5642
- Fix abnormal loss values in transformers 4.46 by @hiyouga in #5852 #5871
- Support multi-image inference by @hiyouga in #5895
- Support calculating effective tokens for SFT and DPO by @wtmlon in #6078
Note: you can now install transformers>=4.46.0,<=4.46.1 to enable the gradient accumulation fix.
New models
- Base models
- Qwen2.5 (0.5B/1.5B/3B/7B/14B/32B/72B) 📄
- Qwen2.5-Coder (0.5B/1.5B/3B/7B/14B/32B) 📄🖥️
- Llama-3.2 (1B/3B) 📄
- OpenCoder (1.5B/8B) 📄🖥️
- Index (1.9B) 📄
- Instruct/Chat models
- Qwen2.5-Instruct (0.5B/1.5B/3B/7B/14B/32B/72B) 📄🤖
- Qwen2.5-Coder-Instruct (0.5B/1.5B/3B/7B/14B/32B) 📄🤖🖥️
- Llama-3.2-Instruct (1B/3B) 📄🤖
- OpenCoder-Instruct (1.5B/8B) 📄🤖🖥️
- Index-Chat (1.9B) 📄🤖
- LLaVA-NeXT (7B/8B/13B/34B/72B/110B) 📄🤖🖼️
- LLaVA-NeXT-Video (7B/34B) 📄🤖🖼️
- Video-LLaVA (7B) 📄🤖🖼️
- Pixtral (12B) 📄🤖🖼️
- EXAONE-3.0-Instruct (8B) 📄🤖
Security fix
- Fix CVE-2024-52803 by @superboy-zjc in aa6a174
Bug fix
- Update version of rocm docker by @HardAndHeavy in #5427
- Fix Phi-3-small template by @menibrief in #5475
- Fix function call dataset process function by @whybeyoung in #5483
- Add docker args by @StrangeBytesDev in #5533
- Fix logger by @chengchengpei in #5546
- Fix Gemma2 flash attention warning by @amrear in #5580
- Update setup by @johnnynunez in #5615 #5665
- Add project by @NLPJCL in #5801
- Fix saving Qwen2-VL processor by @hiyouga in #5857
- Support change base image in dockerfile by @sd3ntato in #5880
- Fix template replace behaviour by @hiyouga in #5907
- Add `image_dir` argument by @hiyouga in #5909
- Add rank0 logger by @hiyouga in #5912
- Fix DPO metrics by @hiyouga in #5913 #6052
- Update datasets version by @hiyouga in #5926
- Fix chat engines by @hiyouga in #5927
- Fix vllm 0.6.3 by @hiyouga in #5970
- Fix extra args in llamaboard by @hiyouga in #5971
- Fix vllm input args by @JJJJerry in #5973
- Add `vllm_config` args by @hiyouga in #5982 #5990
- Add shm_size in docker compose config by @XYZliang in #6010
- Fix tyro version by @hiyouga in #6065
- Fix ci by @hiyouga in #6120
- Fix Qwen2-VL inference on vLLM by @hiyouga in #6123 #6126
- Release v0.9.1 by @hiyouga in #6124
- Fix #3881 #4712 #5411 #5542 #5549 #5611 #5668 #5705 #5747 #5749 #5768 #5796 #5797 #5883 #5904 #5966 #5988 #6050 #6061
Full Changelog: v0.9.0...v0.9.1
v0.9.0: Qwen2-VL, Liger-Kernel, Adam-mini
Congratulations on 30,000 stars 🎉 Follow us at X (twitter)
New features
- 🔥Support fine-tuning Qwen2-VL model on multi-image datasets by @simonJJJ in #5290
- 🔥Support time&memory-efficient Liger-Kernel via the `enable_liger_kernel` argument by @hiyouga
- 🔥Support memory-efficient Adam-mini optimizer via the `use_adam_mini` argument by @relic-yuexi in #5095
- Support fine-tuning Qwen2-VL model on video datasets by @hiyouga in #5365 and @BUAADreamer in #4136 (needs patch huggingface/transformers#33307)
- Support fine-tuning vision language models (VLMs) using RLHF/DPO/ORPO/SimPO approaches by @hiyouga
- Support Unsloth's asynchronous activation offloading method via the `use_unsloth_gc` argument
- Support vLLM 0.6.0 version
- Support MFU calculation by @yzoaim in #5388
New models
- Base models
- Qwen2-Math (1.5B/7B/72B) 📄🔢
- Yi-Coder (1.5B/9B) 📄🖥️
- InternLM2.5 (1.8B/7B/20B) 📄
- Gemma-2-2B 📄
- Meta-Llama-3.1 (8B/70B) 📄
- Instruct/Chat models
- MiniCPM/MiniCPM3 (1B/2B/4B) by @LDLINGLINGLING in #4996 #5372 📄🤖
- Qwen2-Math-Instruct (1.5B/7B/72B) 📄🤖🔢
- Yi-Coder-Chat (1.5B/9B) 📄🤖🖥️
- InternLM2.5-Chat (1.8B/7B/20B) 📄🤖
- Qwen2-VL-Instruct (2B/7B) 📄🤖🖼️
- Gemma-2-2B-it by @codemayq in #5037 📄🤖
- Meta-Llama-3.1-Instruct (8B/70B) 📄🤖
- Mistral-Nemo-Instruct (12B) 📄🤖
New datasets
- Supervised fine-tuning datasets
- Magpie-ultra-v0.1 (en) 📄
- Pokemon-gpt4o-captions (en&zh) 📄🖼️
- Preference datasets
- RLHF-V (en) 📄🖼️
- VLFeedback (en) 📄🖼️
Changes
- Due to compatibility considerations, fine-tuning vision language models (VLMs) requires `transformers>=4.35.0.dev0`; try `pip install git+https://github.com/huggingface/transformers.git` to install it.
- `visual_inputs` has been deprecated; you no longer need to specify this argument.
- LlamaFactory now adopts lazy loading for multimodal inputs; see #5346 for details. Please use `preprocessing_batch_size` to restrict the batch size in dataset pre-processing (supported by @naem1023 in #5323).
- LlamaFactory now supports `lmf` (equivalent to `llamafactory-cli`) as a shortcut command.
Bug fix
- Fix LlamaBoard export by @liuwwang in #4950
- Add ROCm dockerfiles by @HardAndHeavy in #4970
- Fix deepseek template by @piamo in #4892
- Fix pissa savecallback by @codemayq in #4995
- Add Korean display language in LlamaBoard by @Eruly in #5010
- Fix deepseekcoder template by @relic-yuexi in #5072
- Fix examples by @codemayq in #5109
- Fix `mask_history` truncate from last by @YeQiuO in #5115
- Fix jinja template by @YeQiuO in #5156
- Fix PPO optimizer and lr scheduler by @liu-zichen in #5163
- Add SailorLLM template by @chenhuiyu in #5185
- Fix XPU device count by @Zxilly in #5188
- Fix bf16 check in NPU by @Ricardo-L-C in #5193
- Update NPU docker image by @MengqingCao in #5230
- Fix image input api by @marko1616 in #5237
- Add liger-kernel link by @ByronHsu in #5317
- Fix #4684 #4696 #4917 #4925 #4928 #4944 #4959 #4992 #5035 #5048 #5060 #5092 #5228 #5252 #5292 #5295 #5305 #5307 #5308 #5324 #5331 #5334 #5338 #5344 #5366 #5384
v0.8.3: Neat Packing, Split Evaluation
New features
- 🔥Support contamination-free packing via the `neat_packing` argument by @chuan298 in #4224
- 🔥Support split evaluation via the `eval_dataset` argument by @codemayq in #4691
- 🔥Support HQQ/EETQ quantization via the `quantization_method` argument by @hiyouga
- 🔥Support ZeRO-3 when using BAdam by @Ledzy in #4352
- Support training on the last turn via the `mask_history` argument by @aofengdaxia in #4878
- Add NPU Dockerfile by @MengqingCao in #4355
- Support building FlashAttention2 in Dockerfile by @hzhaoy in #4461
- Support `batch_eval_metrics` at evaluation by @hiyouga
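As a rough illustration of what contamination-free packing involves, the sketch below builds a block-diagonal causal attention mask and per-example position ids for several examples packed into one sequence. It assumes "contamination-free" means that tokens of one packed example must not attend to tokens of another. The function names `build_packed_attention_mask` and `build_position_ids` are hypothetical and are not LlamaFactory APIs; the project's own implementation may differ.

```python
from typing import List


def build_packed_attention_mask(seq_lens: List[int]) -> List[List[int]]:
    """Return a (total x total) 0/1 mask for examples packed back to back.

    mask[q][k] == 1 means query position q may attend to key position k.
    Attention is causal and restricted to positions of the same example.
    """
    # Assign every packed position the index of the example it came from.
    segment_ids: List[int] = []
    for example_idx, length in enumerate(seq_lens):
        segment_ids.extend([example_idx] * length)

    total = len(segment_ids)
    mask = [[0] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: keys at or before the query only
            if segment_ids[q] == segment_ids[k]:
                mask[q][k] = 1
    return mask


def build_position_ids(seq_lens: List[int]) -> List[int]:
    """Restart position ids at 0 for every packed example."""
    return [pos for length in seq_lens for pos in range(length)]


if __name__ == "__main__":
    # Two examples of lengths 2 and 1 packed into one sequence of length 3.
    for row in build_packed_attention_mask([2, 1]):
        print(row)
    print(build_position_ids([2, 1]))
```

The third row of the mask has a single 1 on the diagonal: the second example sees only itself, even though it sits after the first example in the packed sequence.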
New models
- Base models
- InternLM2.5-7B 📄
- Gemma2 (9B/27B) 📄
- Instruct/Chat models
Changes
- Fix DPO cutoff len and deprecate `reserved_label_len` argument
- Improve loss function for reward modeling
Bug fix
- Fix numpy version by @MengqingCao in #4382
- Improve cli by @kno10 in #4409
- Add `tool_format` parameter to control prompt by @mMrBun in #4417
- Automatically label npu issue by @MengqingCao in #4445
- Fix flash_attn args by @stceum in #4446
- Fix docker-compose path by @MengqingCao in #4544
- Fix torch-npu dependency by @hashstone in #4561
- Fix deepspeed + pissa by @hzhaoy in #4580
- Improve cli by @injet-zhou in #4590
- Add project by @wzh1994 in #4662
- Fix docstring by @hzhaoy in #4673
- Fix Windows command preview in WebUI by @marko1616 in #4700
- Fix vllm 0.5.1 by @T-Atlas in #4706
- Fix save value head model callback by @yzoaim in #4746
- Fix CUDA Dockerfile by @hzhaoy in #4781
- Fix examples by @codemayq in #4804
- Fix evaluation data split by @codemayq in #4821
- Fix CI by @codemayq in #4822
- Fix #2290 #3974 #4113 #4379 #4398 #4402 #4410 #4419 #4432 #4456 #4458 #4549 #4556 #4579 #4592 #4609 #4617 #4674 #4677 #4683 #4684 #4699 #4705 #4731 #4742 #4779 #4780 #4786 #4792 #4820 #4826
v0.8.2: PiSSA, Parallel Functions
New features
- Support GLM-4 tools and parallel function calling by @mMrBun in #4173
- Support PiSSA fine-tuning by @hiyouga in #4307
New models
- Base models
- DeepSeek-Coder-V2 (16B MoE/236B MoE) 📄
- Instruct/Chat models
- MiniCPM-2B 📄🤖
- DeepSeek-Coder-V2-Instruct (16B MoE/236B MoE) 📄🤖
New datasets
- Supervised fine-tuning datasets
- Neo-sft (zh)
- Magpie-Pro-300K-Filtered (en) by @EliMCosta in #4309
- WebInstruct (en) by @EliMCosta in #4309
Bug fix
- Fix DPO+ZeRO3 problem by @hiyouga
- Add MANIFEST.in by @iamthebot in #4191
- Fix eos_token in llama3 pretrain by @dignfei in #4204
- Fix vllm version by @kimdwkimdw and @hzhaoy in #4234 and #4246
- Fix Dockerfile by @EliMCosta in #4314
- Fix pandas version by @zzxzz12345 in #4334
- Fix #3162 #3196 #3778 #4198 #4209 #4221 #4227 #4238 #4242 #4271 #4292 #4295 #4326 #4346 #4357 #4362
v0.8.1: Patch release
v0.8.0: GLM-4, Qwen2, PaliGemma, KTO, SimPO
Stronger LlamaBoard 💪😀
- Support single-node distributed training in Web UI
- Add dropdown menu for easily resuming from checkpoints and picking saved configurations by @hiyouga and @hzhaoy in #4053
- Support selecting checkpoints of full/freeze tuning
- Add throughput metrics to LlamaBoard by @injet-zhou in #4066
- Faster UI loading
New features
- Add KTO algorithm by @enji-zhou in #3785
- Add SimPO algorithm by @hiyouga
- Support passing `max_lora_rank` to the vLLM backend by @jue-jue-zi in #3794
- Support preference datasets in sharegpt format and remove big files from git repo by @hiyouga in #3799
- Support setting system messages in CLI inference by @ycjcl868 in #3812
- Add `num_samples` option in `dataset_info.json` by @seanzhang-zhichen in #3829
- Add NPU docker image by @dongdongqiang2018 in #3876
- Improve NPU document by @MengqingCao in #3930
- Support SFT packing with greedy knapsack algorithm by @AlongWY in #4009
- Add `llamafactory-cli env` for bug report
- Support image input in the API mode
- Support random initialization via the `train_from_scratch` argument
- Initialize CI
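The SFT packing item above mentions a greedy knapsack algorithm. The following is a minimal sketch of one greedy strategy (longest-first, filling each bin before opening the next), assuming the goal is to group example lengths into bins that do not exceed a cutoff length. The function name `greedy_knapsack` and its signature are illustrative assumptions; this is not the code from #4009, which may use a different greedy rule.

```python
from typing import List


def greedy_knapsack(lengths: List[int], capacity: int) -> List[List[int]]:
    """Greedily pack sequence lengths into bins of at most `capacity` tokens.

    Each pass walks the remaining lengths from longest to shortest and adds
    every length that still fits into the current bin.
    """
    if any(length > capacity for length in lengths):
        raise ValueError("every sequence must fit into a single bin")

    remaining = sorted(lengths, reverse=True)  # longest first
    bins: List[List[int]] = []
    while remaining:
        current: List[int] = []
        free = capacity
        leftover: List[int] = []
        for length in remaining:
            if length <= free:
                current.append(length)
                free -= length
            else:
                leftover.append(length)  # try again in a later bin
        bins.append(current)
        remaining = leftover
    return bins


if __name__ == "__main__":
    print(greedy_knapsack([5, 3, 2, 7, 1], capacity=8))
```

With the lengths above and a capacity of 8, the sketch produces three bins, two of which are completely full, so fewer padding tokens are needed than with one example per sequence.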
New models
- Base models
- Qwen2 (0.5B/1.5B/7B/72B/MoE) 📄
- PaliGemma-3B (pt/mix) 📄🖼️
- GLM-4-9B 📄
- Falcon-11B 📄
- DeepSeek-V2-Lite (16B) 📄
- Instruct/Chat models
New datasets
- Pre-training datasets
- FineWeb (en)
- FineWeb-Edu (en)
- Supervised fine-tuning datasets
- Ruozhiba-GPT4 (zh)
- STEM-Instruction (zh)
- Preference datasets
- Argilla-KTO-mix-15K (en)
- UltraFeedback (en)
Bug fix
- Fix RLHF for multimodal finetuning
- Fix LoRA target in multimodal finetuning by @BUAADreamer in #3835
- Fix `yi` template by @Yimi81 in #3925
- Fix abort issue in LlamaBoard by @injet-zhou in #3987
- Pass `scheduler_specific_kwargs` to `get_scheduler` by @Uminosachi in #4006
- Fix hyperparameters helps by @xu-song in #4007
- Update issue template by @statelesshz in #4011
- Fix vllm dtype parameter
- Fix exporting hyperparameters by @MengqingCao in #4080
- Fix DeepSpeed ZeRO3 in PPO trainer
- Fix #3108 #3387 #3646 #3717 #3764 #3769 #3803 #3807 #3818 #3837 #3847 #3853 #3873 #3900 #3931 #3965 #3971 #3978 #3992 #4005 #4012 #4013 #4022 #4033 #4043 #4061 #4075 #4077 #4079 #4085 #4090 #4120 #4132 #4137 #4139