Embeddings Explained: How AI Converts Words to Numbers That Mean Something
Embeddings explained — how LLMs convert text, images, and code into vector representations that capture meaning, enable semantic search, and power recommendation systems.
The moment that made embeddings click for me was this: type "cat" and "feline" into a similarity checker, and they score 0.87. Type "cat" and "automobile" and they score 0.12. The model has never been told these words are related — it learned the relationships from billions of texts where they appear in similar contexts.
That's what embeddings are: a learned geometric representation of meaning. And once you understand them, you realize they're the foundation of semantic search, RAG systems, recommendation engines, anomaly detection, and nearly every modern AI application that handles text.
This guide explains how embeddings work, how to create and use them, and the practical differences between embedding models that matter for production systems.
The Core Idea: Meaning as Geometry
Traditional text processing treated words as arbitrary symbols. "cat" was just a string — no relationship to "feline" or "kitten." Search required exact matches.
Embeddings encode meaning geometrically. Words and sentences become points in a high-dimensional space, where:
- Similar meanings → nearby points
- Different meanings → distant points
- Relationships → consistent directions
import numpy as np

# Illustrative toy vectors with 4 dimensions (made-up values).
# Real embeddings typically have 768-3072 dimensions.
king = np.array([0.7, 0.1, 0.9, 0.2])
man = np.array([0.6, 0.2, 0.8, 0.1])
woman = np.array([0.5, 0.8, 0.7, 0.3])
queen = np.array([0.4, 0.9, 0.8, 0.4])

# The famous word analogy: king - man + woman ≈ queen
analogy = king - man + woman
similarity = np.dot(analogy, queen) / (np.linalg.norm(analogy) * np.linalg.norm(queen))
print(f"{similarity:.2f}")  # 0.98 for these toy values
This geometric property means you can do arithmetic with meaning. King − Man + Woman ≈ Queen. Paris − France + Germany ≈ Berlin. The spatial structure of the embedding space reflects conceptual structure.
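To make the analogy arithmetic concrete, here is a minimal sketch that answers "king is to man as ? is to woman" by nearest-neighbor lookup. The vocabulary, the hand-set 3-dimensional vectors, and the helper names `cosine` and `solve_analogy` are all hypothetical; real models learn their vectors, and the arithmetic only holds approximately.

```python
import numpy as np

# Hypothetical toy vocabulary: the three axes loosely stand for
# (royalty, maleness, femaleness). Real embeddings are learned, not hand-set.
toy_vectors = {
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man": np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "prince": np.array([0.8, 1.0, 0.0]),
}

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def solve_analogy(a, b, c, vectors):
    """Return the word closest to vectors[a] - vectors[b] + vectors[c], excluding the inputs."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

answer = solve_analogy("king", "man", "woman", toy_vectors)
print(answer)  # queen
```

Excluding the three input words from the candidates is the usual convention, since the target vector often stays closest to one of them.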
How Embeddings Are Created
Word2Vec (2013): Static Embeddings
from gensim.models import Word2Vec

# Tiny training corpus (for illustration only)
sentences = [
    ["machine", "learning", "is", "powerful"],
    ["deep", "learning", "uses", "neural", "networks"],
    ["python", "is", "used", "for", "machine", "learning"],
]

# Train Word2Vec
model = Word2Vec(
    sentences,
    vector_size=100,  # Embedding dimensions
    window=5,         # Context window
    min_count=1,      # Minimum word frequency
    workers=4,
)

# Access word vectors
machine_vec = model.wv["machine"]
print(f"Vector shape: {machine_vec.shape}")  # (100,)

# Find similar words: returns a list of (word, similarity) pairs.
# Scores from a corpus this small are essentially noise; train on a large corpus for meaningful neighbors.
similar = model.wv.most_similar("learning", topn=5)
print(similar)

# Limitation: "bank" has ONE embedding regardless of context
# "river bank" and "bank account" get the same vector
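The limitation noted at the end of the Word2Vec example comes from the fact that a static embedding is a lookup table keyed by the word alone. The sketch below shows this with a hypothetical 2-dimensional table (`static_table`, with made-up values) and a hypothetical helper `embed_tokens`; it is not gensim's API.

```python
import numpy as np

# Hypothetical 2-dimensional static embedding table (values are made up).
static_table = {
    "river": np.array([0.9, 0.1]),
    "bank": np.array([0.5, 0.5]),
    "account": np.array([0.1, 0.9]),
}

def embed_tokens(tokens, table):
    """Look up each token independently; the surrounding words are never consulted."""
    return [table[t] for t in tokens]

river_sentence = embed_tokens(["river", "bank"], static_table)
money_sentence = embed_tokens(["bank", "account"], static_table)

# "bank" maps to the identical vector in both sentences.
same_vector = bool(np.array_equal(river_sentence[1], money_sentence[0]))
print(same_vector)  # True
```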
BERT: Contextual Embeddings
import numpy as np
import torch
from numpy.linalg import norm
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def get_bert_embedding(text: str) -> np.ndarray:
    """Return one sentence-level vector by mean-pooling BERT's token vectors."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling of last hidden states (usually better than CLS for sentence similarity)
    token_embeddings = outputs.last_hidden_state
    attention_mask = inputs["attention_mask"]
    # Mask padding tokens so they do not affect the average
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    mean_pooled = torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return mean_pooled.numpy()[0]  # Shape: (768,)

def cosine_similarity(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))

# Same word, different context: the sentence vectors differ because every
# token vector (including the one for "bank") depends on its neighbors
bank_river = get_bert_embedding("She sat by the river bank")
bank_money = get_bert_embedding("I opened a new bank account")

# Expect a score below 1.0 but still fairly high: mean-pooled vectors from
# an untuned BERT tend to be similar for most sentence pairs
print(cosine_similarity(bank_river, bank_money))
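The masked mean pooling step in the BERT example is easier to verify on a tiny array. This is a NumPy-only sketch of the same computation; the function name `masked_mean_pool` and the sample values are illustrative assumptions, not part of the transformers library.

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0.

    token_embeddings: (batch, seq_len, hidden)
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(float)   # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # avoid division by zero
    return summed / counts

# One sequence with two real tokens and one padding token.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = masked_mean_pool(tokens, mask)
print(pooled)  # [[2. 3.]]
```

The padding vector `[100, 100]` has no effect on the result, which is the point of multiplying by the mask before averaging.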
Modern Embedding Models (2024-2025)
import numpy as np
from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

def embed_texts(texts: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    response = client.embeddings.create(
        model=model,
        input=texts,
        encoding_format="float",
    )
    return [item.embedding for item in response.data]

# Batch embedding (more efficient than one request per text)
texts = [
    "machine learning is a subset of AI",
    "deep learning uses neural networks",
    "I love cooking pasta",
    "the stock market crashed today",
]
embeddings = embed_texts(texts)
print(f"Embedding dimensions: {len(embeddings[0])}")  # 1536

# Semantic similarity matrix
def cosine_similarity_matrix(embeddings):
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return np.dot(normalized, normalized.T)

emb_array = np.array(embeddings)
sim_matrix = cosine_similarity_matrix(emb_array)
print("\nSimilarity Matrix:")
# Exact scores depend on the model; the related pair should score clearly higher
print(f"ML ↔ Deep Learning: {sim_matrix[0, 1]:.3f}")  # Related topics: higher
print(f"ML ↔ Cooking: {sim_matrix[0, 2]:.3f}")  # Unrelated topics: much lower
Semantic Search Pipeline
import numpy as np
from openai import OpenAI
client = OpenAI()
class SemanticSearch:
    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.documents = []
        self.embeddings = []

    def add_documents(self, documents: list[str]):
        """Add documents to the search index."""
        response = client.embeddings.create(
            model=self.model,
            input=documents,
        )
        new_embeddings = [item.embedding for item in response.data]
        self.documents.extend(documents)
        self.embeddings.extend(new_embeddings)
        print(f"Indexed {len(documents)} documents. Total: {len(self.documents)}")

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        """Find most similar documents to query."""
        query_response = client.embeddings.create(
            model=self.model,
            input=[query],
        )
        query_embedding = np.array(query_response.data[0].embedding)
        doc_embeddings = np.array(self.embeddings)
        # Cosine similarity between the query and every indexed document
        doc_norms = np.linalg.norm(doc_embeddings, axis=1)
        query_norm = np.linalg.norm(query_embedding)
        similarities = np.dot(doc_embeddings, query_embedding) / (doc_norms * query_norm)
        # Indices of the top_k highest scores, best first
        top_indices = np.argsort(similarities)[::-1][:top_k]
        return [
            {
                "document": self.documents[i],
                "similarity": float(similarities[i]),
                "rank": rank + 1,
            }
            for rank, i in enumerate(top_indices)
        ]

# Example usage
search = SemanticSearch()

# Index documents about AI topics
docs = [
    "Transformers use self-attention mechanisms to process sequences.",
    "BERT is a bidirectional encoder trained on masked language modeling.",
    "GPT models are autoregressive — they predict the next token.",
    "The vanishing gradient problem affects deep recurrent networks.",
    "Fine-tuning adapts pre-trained models to specific tasks.",
    "RAG combines retrieval with language model generation.",
    "Vector databases store embeddings for fast similarity search.",
]
search.add_documents(docs)

# Semantic search: finds related content without exact word matches
results = search.search("how do attention-based models work?", top_k=3)
for r in results:
    print(f"Rank {r['rank']} ({r['similarity']:.3f}): {r['document']}")

# Illustrative output (exact scores and ordering depend on the model):
# Rank 1 (0.847): Transformers use self-attention mechanisms to process sequences.
# Rank 2 (0.721): BERT is a bidirectional encoder trained on masked language modeling.
# Rank 3 (0.698): GPT models are autoregressive — they predict the next token.
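The search method above recomputes document norms on every query. One possible refinement, sketched here under the assumption that the index fits in memory, is to normalize the document vectors once so that each search is a single matrix-vector product. The names `build_index` and `top_k` and the 2-D vectors are hypothetical stand-ins for real embeddings.

```python
import numpy as np

def build_index(doc_embeddings):
    """L2-normalize rows once so later searches need only a dot product."""
    doc_embeddings = np.asarray(doc_embeddings, dtype=float)
    norms = np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    return doc_embeddings / np.clip(norms, 1e-12, None)

def top_k(index, query_embedding, k=3):
    """Return (doc_id, cosine_similarity) pairs, best first."""
    query = np.asarray(query_embedding, dtype=float)
    query = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ query
    k = min(k, len(scores))
    if k <= 0:
        return []
    # argpartition selects the k best without sorting everything;
    # only those k entries are then sorted.
    best = np.argpartition(-scores, k - 1)[:k]
    best = best[np.argsort(-scores[best])]
    return [(int(i), float(scores[i])) for i in best]

# Made-up 2-D vectors standing in for real document embeddings.
index = build_index([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
hits = top_k(index, [2.0, 0.1], k=2)
print([doc_id for doc_id, _ in hits])  # [0, 2]
```

For large collections a vector database or approximate nearest-neighbor library would replace the brute-force product, but the normalize-once idea carries over.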
Open-Source Embedding Models
from sentence_transformers import SentenceTransformer
import numpy as np

# BAAI/bge-large-en-v1.5: a strong open English model on MTEB (free, runs locally)
model = SentenceTransformer("BAAI/bge-large-en-v1.5")
sentences = [
    "The cat sat on the mat.",
    "A feline rested on the rug.",  # Semantically similar
    "Python is a programming language.",  # Unrelated
]

# BGE v1.5 models recommend prefixing short retrieval queries with this instruction;
# passages are encoded without any prefix
query = "Represent this sentence for searching relevant passages: animal resting on floor covering"
docs_for_embedding = sentences
query_embedding = model.encode(query, normalize_embeddings=True)
doc_embeddings = model.encode(docs_for_embedding, normalize_embeddings=True)

# Dot product gives cosine similarity when vectors are normalized
similarities = doc_embeddings @ query_embedding
for sent, sim in zip(sentences, similarities):
    print(f"Score {sim:.3f}: {sent}")

# Multilingual embeddings
multilingual_model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
mixed_languages = [
    "How do I cancel my subscription?",  # English
    "¿Cómo cancelo mi suscripción?",  # Spanish, same meaning
    "Comment annuler mon abonnement?",  # French, same meaning
    "What is the weather today?",  # Different topic
]
embeddings = multilingual_model.encode(mixed_languages, normalize_embeddings=True)
sim_matrix = embeddings @ embeddings.T

# Cross-lingual similarity: the translations should score far higher than the
# off-topic sentence (exact values depend on the model)
print(f"EN ↔ ES: {sim_matrix[0,1]:.3f}")  # High
print(f"EN ↔ FR: {sim_matrix[0,2]:.3f}")  # High
print(f"EN ↔ different topic: {sim_matrix[0,3]:.3f}")  # Low
Embedding Models Comparison
| Model | Dimensions | MTEB Score | Cost | Best For |
|---|---|---|---|---|
| OpenAI text-embedding-3-large | 3072 | 64.6 | $0.13/1M tokens | Best quality, OpenAI ecosystem |
| OpenAI text-embedding-3-small | 1536 | 62.3 | $0.02/1M tokens | Cost-efficient, good quality |
| Cohere embed-v3.0 | 1024 | 64.5 | $0.10/1M tokens | Multilingual, task-aware |
| BAAI/bge-large-en-v1.5 | 1024 | 64.2 | Free (local) | Best free English model |
| E5-large-v2 | 1024 | 62.2 | Free (local) | Good quality, open source |
| all-mpnet-base-v2 | 768 | 57.8 | Free (local) | Lightweight, fast |
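The cost and dimension columns translate directly into a budget estimate. The sketch below uses the per-token prices and dimensions from the table; the corpus size (1M chunks of 500 tokens), the float32 storage assumption, and the helper names are hypothetical, and API prices may have changed since the table was written.

```python
def embedding_cost_usd(total_tokens, price_per_million_usd):
    """API cost of embedding a corpus once."""
    return total_tokens / 1_000_000 * price_per_million_usd

def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for float32 vectors, ignoring any index overhead."""
    return num_vectors * dimensions * bytes_per_value

# Hypothetical corpus: 1M chunks averaging 500 tokens each.
num_chunks = 1_000_000
total_tokens = num_chunks * 500

small_cost = embedding_cost_usd(total_tokens, 0.02)  # text-embedding-3-small
large_cost = embedding_cost_usd(total_tokens, 0.13)  # text-embedding-3-large
small_gb = index_size_bytes(num_chunks, 1536) / 1e9
large_gb = index_size_bytes(num_chunks, 3072) / 1e9

print(f"small: ${small_cost:.2f}, {small_gb:.2f} GB")  # small: $10.00, 6.14 GB
print(f"large: ${large_cost:.2f}, {large_gb:.2f} GB")  # large: $65.00, 12.29 GB
```

At this scale the one-time API cost is small either way; the doubled storage and slower search of the larger vectors is often the more lasting cost.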
Practical Applications
Document Clustering
import numpy as np
import matplotlib.pyplot as plt
import umap  # Provided by the umap-learn package
from sklearn.cluster import KMeans

# Embed a corpus (embed_texts is the OpenAI helper defined earlier)
texts = [...]  # Your documents
embeddings = np.array(embed_texts(texts))

# Cluster by semantic content
n_clusters = 5
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
labels = kmeans.fit_predict(embeddings)

# Project to 2D with UMAP (often preferred over t-SNE for embeddings)
reducer = umap.UMAP(n_components=2, random_state=42)
reduced = reducer.fit_transform(embeddings)
plt.scatter(reduced[:, 0], reduced[:, 1], c=labels, cmap="tab10")
plt.title("Document Clusters")
plt.show()
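Cluster labels are just integers, so a common follow-up is to pick one representative document per cluster to see what each group is about. A minimal sketch, assuming you already have an embedding matrix and a label per row; `cluster_representatives` and the 2-D points are hypothetical and independent of scikit-learn.

```python
import numpy as np

def cluster_representatives(embeddings, labels):
    """Map each cluster label to the index of the document nearest its centroid."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    reps = {}
    for label in np.unique(labels):
        members = np.where(labels == label)[0]
        centroid = embeddings[members].mean(axis=0)
        distances = np.linalg.norm(embeddings[members] - centroid, axis=1)
        reps[int(label)] = int(members[np.argmin(distances)])
    return reps

# Made-up 2-D points with hand-assigned cluster labels.
points = [[0.0, 0.0], [1.0, 0.0], [0.4, 0.0], [10.0, 10.0], [10.0, 12.0], [10.0, 11.1]]
assigned = [0, 0, 0, 1, 1, 1]
reps = cluster_representatives(points, assigned)
print(reps)  # {0: 2, 1: 5}
```

Reading the representative text for each cluster is a quick sanity check on whether the chosen number of clusters is meaningful.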
Anomaly Detection
def find_outliers(texts: list[str], threshold: float = 0.3) -> list[int]:
    """Find documents that don't fit the main cluster."""
    embeddings = np.array(embed_texts(texts))
    centroid = embeddings.mean(axis=0)
    centroid_norm = centroid / np.linalg.norm(centroid)
    emb_norms = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarities = emb_norms @ centroid_norm
    return [i for i, sim in enumerate(similarities) if sim < threshold]
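A fixed threshold such as 0.3 depends heavily on the embedding model, since different models produce different similarity ranges. One alternative, sketched here as an assumption and not as the article's method, is to derive the cutoff from the corpus itself: flag documents whose centroid similarity falls more than a chosen number of standard deviations below the mean. The function name and sample vectors are hypothetical.

```python
import numpy as np

def find_outliers_adaptive(embeddings, num_std=2.0):
    """Flag rows whose similarity to the centroid is unusually low for this corpus."""
    embeddings = np.asarray(embeddings, dtype=float)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroid = unit.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    sims = unit @ centroid
    cutoff = sims.mean() - num_std * sims.std()
    return [int(i) for i in np.where(sims < cutoff)[0]]

# Nine made-up vectors pointing roughly the same way, plus one pointing elsewhere.
inliers = [[1.0, 0.01 * i] for i in range(9)]
sample = inliers + [[0.0, 1.0]]
outliers = find_outliers_adaptive(sample)
print(outliers)  # [9]
```

This takes precomputed embeddings instead of raw texts, so it works with any embedding model.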
Conclusion
Embeddings are a foundational data structure of modern AI: they are how neural networks represent meaning as numbers. Once you understand that meaning is encoded geometrically, the applications follow naturally: semantic search, clustering, anomaly detection, recommendation, and the retrieval step in most RAG systems.
The practical lesson: embedding model choice matters as much as algorithm choice. A strong embedding model with a simple retrieval algorithm often beats a sophisticated retrieval algorithm built on weak embeddings. Use the MTEB leaderboard as a starting point, then benchmark candidate models on your own domain before choosing.
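Benchmarking on your own domain can start with a small set of queries, each labeled with the document that should be retrieved. The sketch below computes recall@k from precomputed embeddings; the function name, the one-relevant-document-per-query labeling scheme, and the toy vectors are assumptions for illustration.

```python
import numpy as np

def recall_at_k(query_embeddings, doc_embeddings, relevant_doc_ids, k=5):
    """Fraction of queries whose labeled relevant document appears in the top k."""
    q = np.asarray(query_embeddings, dtype=float)
    d = np.asarray(doc_embeddings, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = q @ d.T                              # (num_queries, num_docs)
    top = np.argsort(-scores, axis=1)[:, :k]      # best k doc ids per query
    hits = [rel in row for rel, row in zip(relevant_doc_ids, top)]
    return sum(hits) / len(hits)

# Made-up 2-D vectors; in practice these come from the model under test.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vecs = [[1.0, 0.1], [0.1, 1.0]]
relevant = [0, 2]  # query 0 should retrieve doc 0, query 1 should retrieve doc 2

print(recall_at_k(query_vecs, doc_vecs, relevant, k=1))  # 0.5
print(recall_at_k(query_vecs, doc_vecs, relevant, k=2))  # 1.0
```

Running the same labeled set through each candidate model gives a like-for-like comparison that a general leaderboard cannot.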
For using embeddings in a complete retrieval system, see our RAG guide. For the underlying transformer architecture that creates these embeddings, see our transformer architecture guide.
Frequently Asked Questions
What are embeddings in AI?
Dense numerical vectors that represent the meaning of text. Semantically similar items have nearby vectors; unrelated items are far apart. Relationships are encoded geometrically, as in King − Man + Woman ≈ Queen. Modern embeddings typically have 768–3072 dimensions and are produced by transformers trained on massive text corpora.
How are embeddings created?
Text is passed through a transformer encoder (like BERT). The final layer's hidden states are pooled (mean pooling or the CLS token) to produce one fixed-length vector per text. Dedicated embedding models are then trained with contrastive learning on (query, relevant document) pairs, so that related texts get similar vectors and unrelated texts get dissimilar ones.
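As a rough illustration of contrastive training, here is one common formulation of the objective: an InfoNCE-style loss with in-batch negatives, where row i of the queries should match row i of the documents and every other row acts as a negative. This is a NumPy sketch of the loss value only; the temperature of 0.05 and the function name are assumptions, and specific models may use different losses.

```python
import numpy as np

def info_nce_loss(query_vecs, doc_vecs, temperature=0.05):
    """In-batch contrastive loss: row i of query_vecs should match row i of doc_vecs."""
    q = np.asarray(query_vecs, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature                      # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy on the diagonal

# Made-up vectors: aligned pairs give a low loss, swapped pairs a high one.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)  # True
```

Minimizing this loss during training pulls each query toward its paired document and pushes it away from the other documents in the batch.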
What is the difference between word2vec, BERT, and modern embedding models?
Word2Vec: static, one vector per word regardless of context. BERT: contextual, different vectors for the same word in different contexts. Modern embedding models (OpenAI, Cohere, BGE): trained specifically for semantic similarity, so they generally perform much better on retrieval tasks than pooled vectors from an off-the-shelf BERT.
How do I use embeddings for semantic search?
Embed all documents and store the vectors in a vector database. Embed each incoming query with the same model. Find the top-k nearest documents by cosine similarity and return them. This surfaces semantic matches even when there is no exact keyword overlap.
Which embedding model should I use?
OpenAI text-embedding-3-small for OpenAI ecosystem applications. BAAI/bge-large-en-v1.5 for free local use. Cohere embed-v3 for multilingual. Always test on your specific domain — general benchmarks (MTEB) don't always predict domain performance.
AiTechWorlds Team