11 - Fine-Tuning Representation Models for Classification

Supervised Classification

In this section, we take a supervised approach in which both the representation model and the classification head are updated during training. For example, for a task-specific model, we can fine-tune both the pretrained representation model and the classification head as a single architecture, as shown below.

trainable representation models
Fig. A task-specific model architecture with a pretrained representation model (BERT) and an additional classification head for the specific task (both are trainable).
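A minimal sketch of this setup with the Hugging Face transformers Trainer is shown below. The dataset (rotten_tomatoes) and hyperparameters are illustrative assumptions; the key point is that the classification head and the BERT body are trained together.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

# Assumed example dataset: binary sentiment classification.
dataset = load_dataset("rotten_tomatoes")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

# AutoModelForSequenceClassification places a randomly initialized
# classification head on top of the pretrained BERT body; by default,
# all parameters (body and head) are trainable.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

args = TrainingArguments(
    output_dir="bert-finetuned",       # illustrative output path
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,               # enables dynamic padding per batch
)
trainer.train()
```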

Few-Shot Classification

Few-shot classification is a supervised-classification technique in which a classifier learns target labels from only a few labeled examples. It is useful when we need to perform a classification task but do not have many labeled data points.

An example of an efficient framework for few-shot text classification is SetFit, which has the following three steps (a code sketch follows the figure below):

  1. Sampling training data by generating in-class (positive) and out-of-class (negative) pairs from the labeled examples
  2. Fine-tuning a pretrained embedding model on the sampled data, so that the embeddings become tuned to the classification task
  3. Training a classifier by adding a classification head on top of the fine-tuned embedding model and fitting it on the training data
SetFit Steps
Fig. The three main steps of SetFit (an efficient fine-tuning framework).
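A sketch of this workflow with the setfit library is shown below. The exact API differs slightly across library versions, and the dataset, embedding model, and number of samples per class are illustrative assumptions.

```python
from datasets import load_dataset
from setfit import SetFitModel, SetFitTrainer, sample_dataset

dataset = load_dataset("rotten_tomatoes")

# Step 1: keep only a few labeled examples per class (few-shot setup);
# SetFit then builds positive/negative pairs from them internally.
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=16)

# Steps 2 and 3: contrastively fine-tune the sentence-embedding model,
# then fit a classification head on top of the tuned embeddings.
model = SetFitModel.from_pretrained("sentence-transformers/all-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())

# Predict labels for new texts with the trained model.
preds = model(["a touching and well-acted film", "dull and predictable"])
```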

Continued Pretraining with Masked Language Modeling

Instead of adopting the two-step approach above (pretrain, then fine-tune), continued pretraining with masked language modeling adds a step in between: the pretrained model is further trained on data from our own domain before being fine-tuned for the task.
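A minimal sketch of this intermediate step with transformers is shown below. The in-domain corpus file, checkpoint names, and hyperparameters are assumptions; the resulting checkpoint would then be fine-tuned for classification as before.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    TrainingArguments,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

# Assume a plain-text file of in-domain documents, one example per line.
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# The collator randomly masks a fraction of tokens so the model
# continues learning to predict them from domain context.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(output_dir="bert-domain-mlm", num_train_epochs=1)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()

# Save the domain-adapted checkpoint; fine-tune it later for classification.
model.save_pretrained("bert-domain-mlm")
tokenizer.save_pretrained("bert-domain-mlm")
```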

Named-Entity Recognition

Named-entity recognition (NER) allows for the classification of individual tokens and/or words, including people and locations. It is helpful for de-identification and anonymization tasks when the data contains sensitive information.
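As a short illustration, the sketch below runs token-level classification with a pretrained NER pipeline from transformers. The checkpoint name is an assumption; any token-classification model would work similarly.

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",       # assumed off-the-shelf NER checkpoint
    aggregation_strategy="simple",     # merge word pieces into whole entities
)

text = "Sarah flew from Amsterdam to New York last week."
for entity in ner(text):
    # Each entity carries a label (e.g., PER, LOC), a confidence score, and
    # character offsets, which de-identification pipelines can act on.
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```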
