Releases: hiyouga/LlamaFactory

v0.9.5: Qwen3.5/3.6, Gemma 4, Transformers v5

@hiyouga hiyouga released this 30 May 15:57
7af9095

Added initial support for Qwen3.5/Qwen3.6/Gemma 4 models and compatibility with Transformers v5.

What's Changed

v0.9.4: Goodbye 2025

@hiyouga hiyouga released this 31 Dec 15:00
95ac3f2

Farewell to 2025. Thank you to all contributors and supporters. We will continue to deliver an easy and efficient LLM fine-tuning framework to the community in 2026. Stay tuned.

Breaking

  • Repository name updated: LLaMA-Factory → LlamaFactory
  • Support for Python 3.9–3.10 has been dropped; LlamaFactory now requires Python 3.11–3.13
  • Migrated from pip to uv; use uv pip install llamafactory
  • The official LlamaFactory blog is now live: https://blog.llamafactory.net/en/
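Because v0.9.4 only supports Python 3.11–3.13, a quick interpreter check before installing can save a failed environment setup. This is a minimal standard-library sketch; the helper name is hypothetical and the bounds are simply the ones stated in the note above.

```python
import sys

# Bounds taken from the v0.9.4 note: Python 3.11 through 3.13.
MIN_VERSION = (3, 11)
MAX_EXCLUSIVE = (3, 14)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version lies in the 3.11–3.13 range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


# Check the running interpreter.
print("supported" if is_supported_python() else "unsupported")
```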

New features

Models

Thanks to the teams that collaborated with LlamaFactory in 2025

And to the individuals who made significant contributions

Full Changelog: v0.9.3...v0.9.4

v0.9.3: Llama4, Gemma3, Qwen3, InternVL3, Qwen2.5-Omni

@hiyouga hiyouga released this 16 Jun 17:21

We will attend the AWS Summit Shanghai 2025 on June 20th! See you in Shanghai 👋

New features

New models

  • Base models
    • SmolLM/SmolLM2 (135M/360M/1.7B) 📄
    • Qwen3 Base (0.6B/1.7B/4B/8B/14B/30B) 📄
    • Gemma 3 (1B/4B/12B/27B) 📄🖼️
    • MedGemma (4B) 📄🩺
    • MiMo Base (7B) 📄
    • Seed-Coder Base (8B) 📄⌨️
    • Mistral-Small-3.1 Base (24B) 📄🖼️
    • GLM-4-0414 Base (32B) 📄
    • Llama 4 (109B/492B) 📄🖼️
  • Instruct/Chat models
    • SmolLM/SmolLM2 Instruct (135M/360M/1.7B) 📄🤖
    • MiniCPM4 (0.5B/8B) 📄🤖
    • Qwen3 (0.6B/1.7B/4B/8B/14B/32B/30B/235B) 📄🤖🧠
    • Gemma 3 Instruct (1B/4B/12B/27B) 📄🤖🖼️
    • InternVL2.5/3 Instruct/MPO (1B/2B/8B/14B/38B/78B) 📄🤖🖼️
    • Qwen2.5-Omni (3B/7B) 📄🤖🖼️🔈
    • MedGemma Instruct (4B/27B) 📄🤖🩺
    • MiMo SFT/RL (7B) 📄🤖
    • MiMo-VL SFT/RL (7B) 📄🤖🖼️
    • Hunyuan Instruct (7B) 📄🤖
    • Seed-Coder Instruct/Reasoning (8B) 📄🤖🧠⌨️
    • GLM-4-0414/GLM-Z1 Instruct (9B/32B) 📄🤖🧠
    • DeepSeek-R1-0528 (8B/671B) 📄🤖🧠
    • Kimi-VL Instruct/Thinking (17B) 📄🤖🧠🖼️
    • Mistral-Small-3.1 Instruct (24B) 📄🤖🖼️
    • Qwen2.5-VL Instruct (32B) 📄🤖🖼️
    • Llama 4 Instruct (109B/492B) 📄🤖🖼️

New datasets

  • Preference datasets
    • COIG-P (zh) 📄

Bug fix

Full Changelog: v0.9.2...v0.9.3

v0.9.2: MiniCPM-o, SwanLab, APOLLO

@hiyouga hiyouga released this 11 Mar 13:47

We will attend the vLLM Beijing Meetup on Mar 16th! See you in Beijing 👋

New features

New models

  • Base models
    • GPT2 (0.1B/0.4B/0.8B/1.5B) 📄
    • Granite 3.0-3.1 (1B/2B/3B/8B) 📄
    • PaliGemma2 (3B/10B/28B) 📄🖼️
    • Moonlight (16B) 📄
    • DeepSeek V2-V2.5 Base (236B) 📄
    • DeepSeek V3 Base (671B) 📄
  • Instruct/Chat models
    • Granite 3.0-3.1 (1B/2B/3B/8B) by @Tuyohai in #5922 📄🤖
    • DeepSeek R1 (1.5B/7B/8B/14B/32B/70B/671B) by @Qwtdgh in #6767 📄🤖
    • TeleChat2 (3B/7B/12B/35B/115B) by @ge-xing in #6313 📄🤖
    • Qwen2.5-VL (3B/7B/72B) by @hiyouga in #6779 📄🤖🖼️
    • PaliGemma2-mix (3B/10B/28B) by @Kuangdd01 in #7060 📄🤖🖼️
    • Qwen2 Audio (7B) by @BUAADreamer in #6701 📄🤖🔈
    • MiniCPM-V/MiniCPM-o (8B) by @BUAADreamer in #6598 and #6631 📄🤖🖼️🔈
    • InternLM3-Instruct (8B) by @hhaAndroid in #6640 📄🤖
    • Marco-o1 (8B) 📄🤖
    • Skywork-o1 (8B) 📄🤖
    • Phi-4 (14B) 📄🤖
    • Moonlight Instruct (16B) 📄
    • Mistral Small (24B) 📄🤖
    • QwQ (32B) 📄🤖
    • Llama-3.3-Instruct (70B) 📄🤖
    • QvQ (72B) 📄🤖🖼️
    • DeepSeek V2-V2.5 (236B) 📄🤖
    • DeepSeek V3 (671B) 📄🤖

New datasets

  • Supervised fine-tuning datasets
    • OpenO1 (en) 📄
    • Open Thoughts (en) 📄
    • Open-R1-Math (en) 📄
    • Chinese-DeepSeek-R1-Distill (zh) 📄

Changes

Bug fix

Full Changelog: v0.9.1...v0.9.2

v0.9.1: Many Vision Models, Qwen2.5 Coder, Gradient Fix

@hiyouga hiyouga released this 24 Nov 17:17

New features

Note: you can now install transformers>=4.46.0,<=4.46.1 to enable the gradient accumulation fix.
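For context, the gradient accumulation issue widely reported at the time came from averaging per-micro-batch mean losses, which over-weights tokens that sit in short micro-batches. The pure-Python sketch below is an illustration, not transformers' code, and its function names are hypothetical: it contrasts that scheme with dividing the summed token losses by the total token count across all accumulated micro-batches, which reproduces the full-batch loss.

```python
from typing import List


def naive_accumulated_loss(micro_batches: List[List[float]]) -> float:
    # Mean of per-micro-batch mean losses: every micro-batch gets equal weight,
    # regardless of how many tokens it contains.
    means = [sum(tokens) / len(tokens) for tokens in micro_batches]
    return sum(means) / len(means)


def token_normalized_loss(micro_batches: List[List[float]]) -> float:
    # Sum all token losses, then divide once by the total token count
    # across every accumulated micro-batch.
    total_loss = sum(sum(tokens) for tokens in micro_batches)
    total_tokens = sum(len(tokens) for tokens in micro_batches)
    return total_loss / total_tokens


# Per-token losses for two micro-batches of different lengths.
micro_batches = [[1.0, 1.0, 1.0, 1.0], [3.0]]
full_batch = [loss for mb in micro_batches for loss in mb]
print(naive_accumulated_loss(micro_batches))  # 2.0
print(token_normalized_loss(micro_batches))  # 1.4
print(sum(full_batch) / len(full_batch))  # 1.4
```

With one 4-token micro-batch and one 1-token micro-batch, the naive average gives 2.0, while token-level normalization matches the 1.4 that a single full batch would produce.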

New models

  • Base models
    • Qwen2.5 (0.5B/1.5B/3B/7B/14B/32B/72B) 📄
    • Qwen2.5-Coder (0.5B/1.5B/3B/7B/14B/32B) 📄🖥️
    • Llama-3.2 (1B/3B) 📄
    • OpenCoder (1.5B/8B) 📄🖥️
    • Index (1.9B) 📄
  • Instruct/Chat models
    • Qwen2.5-Instruct (0.5B/1.5B/3B/7B/14B/32B/72B) 📄🤖
    • Qwen2.5-Coder-Instruct (0.5B/1.5B/3B/7B/14B/32B) 📄🤖🖥️
    • Llama-3.2-Instruct (1B/3B) 📄🤖
    • OpenCoder-Instruct (1.5B/8B) 📄🤖🖥️
    • Index-Chat (1.9B) 📄🤖
    • LLaVA-NeXT (7B/8B/13B/34B/72B/110B) 📄🤖🖼️
    • LLaVA-NeXT-Video (7B/34B) 📄🤖🖼️
    • Video-LLaVA (7B) 📄🤖🖼️
    • Pixtral (12B) 📄🤖🖼️
    • EXAONE-3.0-Instruct (8B) 📄🤖

Security fix

Bug fix

Full Changelog: v0.9.0...v0.9.1

v0.9.0: Qwen2-VL, Liger-Kernel, Adam-mini

@hiyouga hiyouga released this 08 Sep 17:14

Congratulations on 30,000 stars 🎉 Follow us on X (Twitter)

New features

New models

  • Base models
    • Qwen2-Math (1.5B/7B/72B) 📄🔢
    • Yi-Coder (1.5B/9B) 📄🖥️
    • InternLM2.5 (1.8B/7B/20B) 📄
    • Gemma-2-2B 📄
    • Meta-Llama-3.1 (8B/70B) 📄
  • Instruct/Chat models
    • MiniCPM/MiniCPM3 (1B/2B/4B) by @LDLINGLINGLING in #4996 #5372 📄🤖
    • Qwen2-Math-Instruct (1.5B/7B/72B) 📄🤖🔢
    • Yi-Coder-Chat (1.5B/9B) 📄🤖🖥️
    • InternLM2.5-Chat (1.8B/7B/20B) 📄🤖
    • Qwen2-VL-Instruct (2B/7B) 📄🤖🖼️
    • Gemma-2-2B-it by @codemayq in #5037 📄🤖
    • Meta-Llama-3.1-Instruct (8B/70B) 📄🤖
    • Mistral-Nemo-Instruct (12B) 📄🤖

New datasets

  • Supervised fine-tuning datasets
    • Magpie-ultra-v0.1 (en) 📄
    • Pokemon-gpt4o-captions (en&zh) 📄🖼️
  • Preference datasets
    • RLHF-V (en) 📄🖼️
    • VLFeedback (en) 📄🖼️

Changes

  • For compatibility reasons, fine-tuning vision language models (VLMs) requires transformers>=4.45.0.dev0; try pip install git+https://github.com/huggingface/transformers.git to install it.
  • visual_inputs has been deprecated; you no longer need to specify this argument.
  • LlamaFactory now adopts lazy loading for multimodal inputs; see #5346 for details. Please use preprocessing_batch_size to restrict the batch size in dataset pre-processing (supported by @naem1023 in #5323).
  • LlamaFactory now supports lmf (equivalent to llamafactory-cli) as a shortcut command.

Bug fix

v0.8.3: Neat Packing, Split Evaluation

@hiyouga hiyouga released this 18 Jul 18:00

New features

New models

  • Base models
    • InternLM2.5-7B 📄
    • Gemma2 (9B/27B) 📄
  • Instruct/Chat models
    • TeleChat-1B-Chat by @hzhaoy in #4651 📄🤖
    • InternLM2.5-7B-Chat 📄🤖
    • CodeGeeX4-9B-Chat 📄🤖
    • Gemma2-it (9B/27B) 📄🤖

Changes

  • Fix the DPO cutoff length and deprecate the reserved_label_len argument
  • Improve loss function for reward modeling
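The note does not describe what changed in the reward-modeling loss. For reference, reward models are commonly trained with a pairwise (Bradley–Terry style) objective, -log σ(r_chosen - r_rejected), averaged over preference pairs. The sketch below is that generic objective in plain Python, not LlamaFactory's implementation; the helper names and example scores are hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for large |x|.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_reward_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs."""
    if len(chosen) != len(rejected) or len(chosen) == 0:
        raise ValueError("expected two non-empty score lists of equal length")
    losses = [-log_sigmoid(c - r) for c, r in zip(chosen, rejected)]
    return sum(losses) / len(losses)


# Scalar rewards from a hypothetical reward model for three preference pairs.
print(pairwise_reward_loss([1.5, 0.2, 2.0], [0.5, 0.4, -1.0]))
```

The loss shrinks as the reward margin between the chosen and rejected response grows, and equals log 2 when the model cannot tell them apart.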

Bug fix

v0.8.2: PiSSA, Parallel Functions

@hiyouga hiyouga released this 19 Jun 13:06

New features

New models

  • Base models
    • DeepSeek-Coder-V2 (16B MoE/236B MoE) 📄
  • Instruct/Chat models
    • MiniCPM-2B 📄🤖
    • DeepSeek-Coder-V2-Instruct (16B MoE/236B MoE) 📄🤖

New datasets

Bug fix

v0.8.1: Patch release

@hiyouga hiyouga released this 10 Jun 16:50
  • Fix #2666: Unsloth+DoRA
  • Fix #4145: The PyTorch version of the docker image does not match the vLLM requirement
  • Fix #4160: A problem in the LongLoRA implementation, with help from @f-q23
  • Fix #4167: An installation problem on Windows, by @yzoaim

v0.8.0: GLM-4, Qwen2, PaliGemma, KTO, SimPO

@hiyouga hiyouga released this 07 Jun 22:26

Stronger LlamaBoard 💪😀

  • Support single-node distributed training in the Web UI
  • Add a dropdown menu for easily resuming from checkpoints and picking saved configurations by @hiyouga and @hzhaoy in #4053
  • Support selecting checkpoints from full/freeze tuning
  • Add throughput metrics to LlamaBoard by @injet-zhou in #4066
  • Faster UI loading

New features

  • Add KTO algorithm by @enji-zhou in #3785
  • Add SimPO algorithm by @hiyouga
  • Support passing max_lora_rank to the vLLM backend by @jue-jue-zi in #3794
  • Support preference datasets in ShareGPT format and remove large files from the git repo by @hiyouga in #3799
  • Support setting system messages in CLI inference by @ycjcl868 in #3812
  • Add num_samples option to dataset_info.json by @seanzhang-zhichen in #3829
  • Add NPU docker image by @dongdongqiang2018 in #3876
  • Improve NPU document by @MengqingCao in #3930
  • Support SFT packing with greedy knapsack algorithm by @AlongWY in #4009
  • Add llamafactory-cli env command for bug reports
  • Support image input in API mode
  • Support random initialization via the train_from_scratch argument
  • Initialize CI
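SFT packing concatenates several short examples into one sequence with a fixed token budget (for example, the cutoff length) so less compute is spent on padding. Below is a minimal sketch of one greedy knapsack variant, assuming each bin is filled by repeatedly taking the longest remaining example that still fits. It illustrates the idea only; it is not the code merged in #4009, and greedy_knapsack is a hypothetical name.

```python
import bisect
from typing import List


def greedy_knapsack(lengths: List[int], capacity: int) -> List[List[int]]:
    """Pack sequence lengths into bins holding at most `capacity` tokens each.

    Each bin is filled greedily: take the longest remaining sequence that
    still fits, repeat until nothing else fits, then start a new bin.
    """
    if any(length > capacity for length in lengths):
        raise ValueError("every sequence must fit within the capacity")
    remaining = sorted(lengths)  # ascending order so bisect can locate the best fit
    knapsacks: List[List[int]] = []
    while remaining:
        space = capacity
        current: List[int] = []
        while True:
            # Index of the largest remaining length that is <= the free space.
            idx = bisect.bisect_right(remaining, space) - 1
            if idx < 0:
                break
            length = remaining.pop(idx)
            current.append(length)
            space -= length
        knapsacks.append(current)
    return knapsacks


print(greedy_knapsack([100, 300, 200, 400, 50], 512))  # [[400, 100], [300, 200], [50]]
```

Five examples fit into three packed sequences of at most 512 tokens instead of five padded ones.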

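SimPO, as described in its paper, drops the reference model used by DPO and instead uses the length-normalized log-probability of each response as an implicit reward, with a target margin γ: loss = -log σ(β·log π(y_w|x)/|y_w| - β·log π(y_l|x)/|y_l| - γ). Below is a minimal plain-Python sketch of that objective for a single preference pair, taking summed log-probabilities and token counts as inputs. It is not LlamaFactory's implementation, and the function names and example numbers are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for large |x|.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def simpo_loss(
    chosen_logp_sum: float,
    chosen_len: int,
    rejected_logp_sum: float,
    rejected_len: int,
    beta: float,
    gamma: float,
) -> float:
    """SimPO loss for one pair: -log sigmoid(beta*avg_logp_w - beta*avg_logp_l - gamma)."""
    if chosen_len <= 0 or rejected_len <= 0:
        raise ValueError("response lengths must be positive")
    # Implicit rewards: length-normalized log-likelihoods scaled by beta (no reference model).
    chosen_reward = beta * chosen_logp_sum / chosen_len
    rejected_reward = beta * rejected_logp_sum / rejected_len
    return -log_sigmoid(chosen_reward - rejected_reward - gamma)


# Illustrative values: average log-prob -1.0 (chosen) vs -3.0 (rejected).
print(simpo_loss(-10.0, 10, -30.0, 10, beta=2.0, gamma=0.5))
```

Dividing by response length keeps the reward from favoring longer outputs, and γ requires the chosen response to win by a margin rather than merely tie.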
New models

  • Base models
    • Qwen2 (0.5B/1.5B/7B/72B/MoE) 📄
    • PaliGemma-3B (pt/mix) 📄🖼️
    • GLM-4-9B 📄
    • Falcon-11B 📄
    • DeepSeek-V2-Lite (16B) 📄
  • Instruct/Chat models
    • Qwen2-Instruct (0.5B/1.5B/7B/72B/MoE) 📄🤖
    • Mistral-7B-Instruct-v0.3 📄🤖
    • Phi-3-small-8k-instruct (7B) 📄🤖
    • Aya-23 (8B/35B) 📄🤖
    • OpenChat-3.6-8B 📄🤖
    • GLM-4-9B-Chat 📄🤖
    • TeleChat-12B-Chat by @hzhaoy in #3958 📄🤖
    • Phi-3-medium-8k-instruct (14B) 📄🤖
    • DeepSeek-V2-Lite-Chat (16B) 📄🤖
    • Codestral-22B-v0.1 📄🤖

New datasets

  • Pre-training datasets
    • FineWeb (en)
    • FineWeb-Edu (en)
  • Supervised fine-tuning datasets
    • Ruozhiba-GPT4 (zh)
    • STEM-Instruction (zh)
  • Preference datasets
    • Argilla-KTO-mix-15K (en)
    • UltraFeedback (en)

Bug fix