10 - Creating Text Embedding Models

By Qianqian, part of the Hands-on Large Language Models series

2025-10-05

Embedding Models

Embedding models convert textual input into numerical representations (embeddings) that can be used for further processing. One major technique for training, fine-tuning, and guiding embedding models is contrastive learning. The idea is that the best way to learn and model similarity and dissimilarity between documents is to feed a model examples of similar and dissimilar pairs, so that it learns to keep similar documents close together in vector space while pushing dissimilar ones further apart. Two things are needed to perform contrastive learning:

The data that constitutes similar/dissimilar pairs
How the model defines and optimizes similarity
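To make these two ingredients concrete, here is a minimal Python sketch under toy assumptions. Each "document" is a hypothetical ID (cat_doc, kitten_doc, car_doc, truck_doc) mapped directly to a 2-D vector rather than produced by an encoder. Similarity is measured by Euclidean distance, and training uses the classic margin-based contrastive loss with plain gradient descent. MARGIN, LEARNING_RATE, the starting vectors, and all helper names are illustrative assumptions. This loss is only one simple choice and not necessarily the one a given embedding library uses.

```python
# Minimal sketch of contrastive learning on toy, hand-made "embeddings".
# Assumptions (not from the source): documents are hypothetical IDs mapped to
# free 2-D vectors, distance is Euclidean, and the loss is the classic
# margin-based contrastive loss. A real model would produce the vectors with
# a trainable encoder instead of updating them directly.
import math

# 1) The data: (doc_a, doc_b, label) with label 1 = similar, 0 = dissimilar.
PAIRS = [
    ("cat_doc", "kitten_doc", 1),
    ("car_doc", "truck_doc", 1),
    ("cat_doc", "car_doc", 0),
    ("kitten_doc", "truck_doc", 0),
]

MARGIN = 2.0          # assumed: dissimilar pairs closer than this are penalized
LEARNING_RATE = 0.05  # assumed step size for plain gradient descent


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# 2) How similarity is defined and optimized: small distance = similar.
def pair_loss(u, v, label, margin=MARGIN):
    """Similar pairs pay d^2; dissimilar pairs pay max(0, margin - d)^2."""
    d = distance(u, v)
    if label == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2


def pair_grad(u, v, label, margin=MARGIN):
    """Gradient of pair_loss w.r.t. u; the gradient w.r.t. v is its negative."""
    diff = [a - b for a, b in zip(u, v)]
    if label == 1:
        return [2.0 * x for x in diff]
    d = distance(u, v)
    if d >= margin or d == 0.0:  # outside the margin (or degenerate): no push
        return [0.0] * len(u)
    coef = -2.0 * (margin - d) / d
    return [coef * x for x in diff]


def mean_loss(embeddings, pairs):
    """Average contrastive loss over all labelled pairs."""
    return sum(pair_loss(embeddings[a], embeddings[b], y) for a, b, y in pairs) / len(pairs)


def train(embeddings, pairs, lr=LEARNING_RATE, steps=200):
    """Gradient descent on the summed pair losses, updating vectors in place."""
    for _ in range(steps):
        grads = {doc: [0.0] * len(vec) for doc, vec in embeddings.items()}
        for a, b, label in pairs:
            g = pair_grad(embeddings[a], embeddings[b], label)
            grads[a] = [x + y for x, y in zip(grads[a], g)]
            grads[b] = [x - y for x, y in zip(grads[b], g)]
        for doc in embeddings:
            embeddings[doc] = [x - lr * y for x, y in zip(embeddings[doc], grads[doc])]
    return embeddings


# Hypothetical starting vectors: similar pairs start apart and dissimilar
# pairs start inside the margin, so every pair initially has a nonzero loss.
embeddings = {
    "cat_doc": [0.0, 0.0],
    "kitten_doc": [0.3, 0.1],
    "car_doc": [1.0, 0.2],
    "truck_doc": [1.2, -0.1],
}
loss_before = mean_loss(embeddings, PAIRS)
train(embeddings, PAIRS)
loss_after = mean_loss(embeddings, PAIRS)
print(f"mean loss: {loss_before:.3f} -> {loss_after:.6f}")
print("cat-kitten distance:", round(distance(embeddings["cat_doc"], embeddings["kitten_doc"]), 4))
print("cat-car distance:", round(distance(embeddings["cat_doc"], embeddings["car_doc"]), 4))
```

Similar pairs are always pulled together, since their loss is the squared distance. Dissimilar pairs are only pushed apart until they are MARGIN away, at which point they stop contributing. The margin keeps the objective bounded: a loss that simply rewarded distance for dissimilar pairs could be lowered indefinitely by spreading the vectors further apart.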