12 - Fine-Tuning Generation Models

The two most common methods for fine-tuning text generation models are supervised fine-tuning (SFT) and preference tuning.

The Three LLM Training Steps: Pretraining, Supervised Fine-Tuning, and Preference Tuning

  1. Pretraining: Train on one or more massive text datasets with the goal of predicting the next token, so the model learns linguistic and semantic representations of the text. This step is self-supervised and produces a base model (also called a foundation model).

  2. Supervised fine-tuning (fine-tuning 1): Adapt the base model to follow instructions. The goal is still to predict the next token, but now conditioned on the user's input. This step is typically used to go from a base generative model to an instruction (or chat) model (see the sketch after this list).

  3. Preference tuning (fine-tuning 2): Improve the output quality of the model and align it more closely with AI safety guidelines and human preferences.
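A minimal sketch of the supervised fine-tuning objective: next-token prediction where the loss is only computed on the response tokens, not on the user's prompt. It assumes a Hugging Face-style causal language model; the model name, the prompt/response pair, and the instruction template are illustrative placeholders, not the book's exact setup.

```python
# SFT sketch: copy the inputs as labels, then mask the prompt positions with
# -100 so the cross-entropy loss only scores the response tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any causal base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "### Instruction:\nSummarize: The cat sat on the mat.\n### Response:\n"
response = "A cat was sitting on a mat."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + response + tokenizer.eos_token,
                     return_tensors="pt").input_ids

# Labels are a copy of the inputs; prompt positions are set to -100, which the
# loss ignores. (Real pipelines mask via the chat template's token offsets;
# this boundary handling is simplified for illustration.)
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

outputs = model(input_ids=full_ids, labels=labels)
print("SFT loss on the response tokens:", outputs.loss.item())
```

Preference tuning then builds on such an instruction-tuned model, using pairs of preferred and rejected responses rather than a single target completion.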

Parameter-Efficient Fine-Tuning (PEFT)

During fine-tuning, updating ALL parameters of a model has the greatest potential to improve performance, but it is costly, slow to train, and requires significant storage. Parameter-efficient fine-tuning (PEFT) instead fine-tunes pretrained models by updating only a small number of parameters, at much higher computational efficiency.
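To make that cost concrete, here is a rough back-of-the-envelope estimate, assuming fp32 training with the Adam optimizer (about 4 bytes of weights, 4 bytes of gradients, and 8 bytes of optimizer state per trainable parameter); the 7B model size and the PEFT parameter count are illustrative assumptions, not measurements, and activation memory is ignored.

```python
# Rough memory estimate for full fine-tuning vs. PEFT (fp32 + Adam).
GIB = 1024 ** 3

total_params = 7_000_000_000     # hypothetical 7B-parameter model
peft_params = 20_000_000         # hypothetical small set of adapter/LoRA weights

bytes_per_trainable = 4 + 4 + 8  # weights + gradients + two Adam moments
bytes_frozen = 4                 # frozen weights still have to be held in memory

full_ft = total_params * bytes_per_trainable
peft = total_params * bytes_frozen + peft_params * bytes_per_trainable

print(f"Full fine-tuning: ~{full_ft / GIB:.0f} GiB of training state")
print(f"PEFT:             ~{peft / GIB:.0f} GiB of training state")
```

Under these assumptions, full fine-tuning needs on the order of 100 GiB of training state, while PEFT stays close to the cost of simply holding the frozen model.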

  1. Adapters: A core component of many PEFT-based techniques, they add a small number of weights at specific places in the network that can be fine-tuned efficiently while the majority of the model's weights remain frozen.

  2. Low-Rank Adaptation (LoRA): A technique that only requires updating a small set of parameters: instead of changing a full weight matrix, it trains a pair of small low-rank matrices whose product approximates the weight update and is added on top of the frozen base weights (see the sketch below).
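A minimal sketch of the LoRA idea in plain PyTorch: the pretrained weight is frozen and only a low-rank update B·A is trained. The layer sizes, rank, and scaling factor are illustrative choices, not values from the book.

```python
# LoRA-style linear layer: frozen pretrained weights plus a trainable
# low-rank correction of rank r (r much smaller than the layer dimensions).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)       # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r                      # common LoRA scaling convention

    def forward(self, x):
        # Frozen path plus the trainable low-rank update.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} / total: {total:,}")  # tiny fraction of the layer
```

Because `lora_B` starts at zero, the layer initially behaves exactly like the pretrained one, and only the small A/B matrices need to be stored as the fine-tuned result.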
