Semantic Search Tutorial: Build Search That Understands Meaning, Not Just Keywords
Semantic search tutorial — build a search system that finds results by meaning using embeddings and vector databases, with Python implementation and production architecture.
The search box on most internal tools is embarrassingly bad. You search for "cancel subscription" and get zero results because the document says "terminate membership." Keyword search fails whenever users don't know the exact terminology.
Semantic search fixes this by understanding meaning, not matching characters. I rebuilt a customer support search system from keyword to semantic and watched the "no results" rate drop from 34% to 4%. The implementation took a week. The user experience improvement was immediate.
Here's how to build it.
The Architecture
Semantic Search System:

  Index Time (one-time):
    Documents → Embedding Model → Vectors
    Vectors → Vector Database (with metadata)

  Query Time (real-time):
    User Query → Embedding Model → Query Vector
    Query Vector → Vector Database → Top-K Similar Vectors
    → Return Documents + Similarity Scores

  Optional Improvements:
    → Hybrid: BM25 keyword + semantic fusion
    → Reranking: cross-encoder for better precision
    → Query expansion: multiple phrasings
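Query expansion is listed above but not implemented in the parts that follow, so here is a minimal sketch of one way it could work. It assumes a search function that returns dicts with "document" and "similarity" keys (the shape SemanticSearchEngine.search returns in Part 1). The names expand_and_search and fake_search, the hand-written phrasings, and the canned scores are all hypothetical; in practice the phrasings might come from an LLM or a synonym list.

```python
from typing import Callable

SearchFn = Callable[[str, int], list[dict]]


def expand_and_search(phrasings: list[str], search_fn: SearchFn, top_k: int = 5) -> list[dict]:
    """Run every phrasing through search_fn and keep each document's best score."""
    best: dict[str, float] = {}
    for phrasing in phrasings:
        for hit in search_fn(phrasing, top_k):
            doc, score = hit["document"], hit["similarity"]
            if score > best.get(doc, float("-inf")):
                best[doc] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)[:top_k]
    return [
        {"document": doc, "similarity": score, "rank": rank + 1}
        for rank, (doc, score) in enumerate(ranked)
    ]


# Stand-in search function with canned results, so the sketch runs without a model.
def fake_search(query: str, top_k: int) -> list[dict]:
    canned = {
        "end my membership": [{"document": "cancel", "similarity": 0.62}],
        "cancel my plan": [
            {"document": "cancel", "similarity": 0.81},
            {"document": "upgrade", "similarity": 0.40},
        ],
    }
    return canned.get(query, [])[:top_k]


merged = expand_and_search(["end my membership", "cancel my plan"], fake_search, top_k=2)
for hit in merged:
    print(hit["rank"], hit["document"], hit["similarity"])
```

Keeping the maximum score per document is only one merge rule; averaging or rank-based fusion are reasonable alternatives, and each extra phrasing costs one more embedding call and one more search.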
Part 1: Basic Semantic Search
# pip install openai numpy sentence-transformers
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class SemanticSearchEngine:
    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.documents = []
        self.embeddings = []
        self.metadata = []

    def embed(self, texts: list[str]) -> np.ndarray:
        """Embed a batch of texts; returns an array of shape (len(texts), dim)."""
        response = client.embeddings.create(model=self.model, input=texts)
        return np.array([item.embedding for item in response.data])

    def add_documents(self, documents: list[str], metadata: list[dict] | None = None):
        print(f"Embedding {len(documents)} documents...")
        new_embeddings = self.embed(documents)
        self.documents.extend(documents)
        self.embeddings.extend(new_embeddings)
        # One dict per document, so entries never share the same object
        self.metadata.extend(metadata or [{} for _ in documents])
        print(f"Total indexed: {len(self.documents)}")

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        if not self.documents:
            return []
        query_emb = self.embed([query])[0]
        doc_embs = np.array(self.embeddings)

        # Cosine similarity: normalize both sides, then take the dot product
        query_norm = query_emb / np.linalg.norm(query_emb)
        doc_norms = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
        similarities = doc_norms @ query_norm

        # Indices of the top_k highest similarities, best first
        top_indices = np.argsort(similarities)[::-1][:top_k]
        return [
            {
                "document": self.documents[i],
                "similarity": float(similarities[i]),
                "metadata": self.metadata[i],
                "rank": rank + 1
            }
            for rank, i in enumerate(top_indices)
        ]


# Example
search_engine = SemanticSearchEngine()
docs = [
    "How to cancel your subscription and get a refund.",
    "Troubleshooting network connectivity issues.",
    "Setting up two-factor authentication on your account.",
    "How to export your data before closing your account.",
    "Upgrading your plan to access premium features.",
    "Contacting customer support for billing inquiries.",
    "Password reset instructions for locked accounts.",
]
search_engine.add_documents(docs, metadata=[{"category": "support"} for _ in docs])

# Semantic matches: finds results even without exact keywords
queries = [
    "end my membership",  # → finds "cancel subscription"
    "wifi not working",   # → finds "network connectivity"
    "secure my login",    # → finds "two-factor authentication"
]
for query in queries:
    print(f"\nQuery: '{query}'")
    results = search_engine.search(query, top_k=2)
    for r in results:
        print(f"  {r['rank']}. [{r['similarity']:.3f}] {r['document']}")
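To make the cosine similarity step in search less abstract, here is a small dependency-free worked example on hand-picked 2-D vectors. The vectors are toy values chosen for illustration, not real embeddings, and cosine_similarity is a helper name introduced here. It shows that the score depends only on direction, not on vector length.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # longer vector, same direction
orthogonal = [0.0, 2.0]      # unrelated direction
opposite = [-1.0, 0.0]       # opposite direction

print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
print(cosine_similarity(query, opposite))        # -1.0
```

This is why the engine normalizes both the query and the document vectors first: once every vector has length 1, a plain dot product gives the same number.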
Part 2: Free Local Embeddings with Sentence-Transformers
import numpy as np
from sentence_transformers import SentenceTransformer

# BAAI/bge-large-en-v1.5: a free, open-weights English embedding model that has
# ranked well on the MTEB leaderboard (check current standings before committing)
model = SentenceTransformer("BAAI/bge-large-en-v1.5")


def semantic_search_local(
    query: str,
    corpus: list[str],
    top_k: int = 5
) -> list[dict]:
    # BGE models work better with a query prefix (queries only, not documents)
    prefixed_query = f"Represent this sentence for searching relevant passages: {query}"

    # Encode query and corpus.
    # Note: this re-encodes the whole corpus on every call, which is fine for a
    # demo but should be precomputed once for anything larger.
    query_emb = model.encode(prefixed_query, normalize_embeddings=True)
    corpus_embs = model.encode(corpus, normalize_embeddings=True, batch_size=32)

    # Cosine similarity (dot product since normalized)
    scores = corpus_embs @ query_emb

    # Top-k results
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [
        {
            "document": corpus[i],
            "score": float(scores[i]),
            "rank": rank + 1
        }
        for rank, i in enumerate(top_indices)
    ]


# `docs` is the example list defined in Part 1
results = semantic_search_local("how to terminate account", docs)
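Because semantic_search_local encodes the corpus inside the function, every query pays the full corpus encoding cost again. A minimal sketch of the usual fix, encoding once and reusing the vectors, is below. PrecomputedIndex, encode_fn, and toy_encode are hypothetical names introduced for this example, and the toy encoder just counts two hand-picked words so the sketch runs without downloading a model.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
EncodeFn = Callable[[list[str]], Sequence[Vector]]


class PrecomputedIndex:
    """Encode the corpus once at construction; each search only encodes the query."""

    def __init__(self, corpus: list[str], encode_fn: EncodeFn):
        self.corpus = corpus
        self.encode_fn = encode_fn
        self.corpus_vectors = encode_fn(corpus)  # the expensive step, done once

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        query_vec = self.encode_fn([query])[0]
        # Dot product; equals cosine similarity only if encode_fn returns unit vectors
        scores = [
            sum(q * d for q, d in zip(query_vec, doc_vec))
            for doc_vec in self.corpus_vectors
        ]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        return [
            {"document": self.corpus[i], "score": float(scores[i]), "rank": rank + 1}
            for rank, i in enumerate(order)
        ]


# Toy encoder (NOT a real embedding model): counts two hand-picked words.
calls: list[int] = []


def toy_encode(texts: list[str]) -> list[list[float]]:
    calls.append(len(texts))  # record batch sizes to show the corpus is encoded once
    return [
        [float(t.lower().count("account")), float(t.lower().count("network"))]
        for t in texts
    ]


index = PrecomputedIndex(["Close your account", "Network issues"], toy_encode)
first = index.search("delete account", top_k=1)
second = index.search("network down", top_k=1)
print(first[0]["document"], "|", second[0]["document"], "|", calls)
```

With the real model, encode_fn could be something like lambda texts: model.encode(texts, normalize_embeddings=True), with the BGE query prefix added to the query string before calling search.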
Part 3: Hybrid Search (Semantic + BM25)
# pip install rank-bm25
import re

import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer


def min_max(scores: np.ndarray) -> np.ndarray:
    """Scale scores to [0, 1]. All-equal scores carry no ranking signal, so return zeros."""
    lo, hi = scores.min(), scores.max()
    if hi > lo:
        return (scores - lo) / (hi - lo)
    return np.zeros_like(scores, dtype=float)


class HybridSearchEngine:
    def __init__(self, semantic_weight: float = 0.6):
        """
        semantic_weight: 0.0 = pure keyword, 1.0 = pure semantic
        0.6 is a reasonable starting point; tune it on your own queries
        """
        self.semantic_weight = semantic_weight
        self.bm25_weight = 1 - semantic_weight
        self.documents = []
        # Semantic components
        self.embed_model = SentenceTransformer("BAAI/bge-large-en-v1.5")
        self.doc_embeddings = None
        # BM25 components
        self.bm25 = None

    def tokenize(self, text: str) -> list[str]:
        """Simple tokenizer for BM25: lowercase, strip punctuation, split on whitespace."""
        return re.sub(r'[^a-z0-9\s]', '', text.lower()).split()

    def index(self, documents: list[str]):
        self.documents = documents
        # Build semantic index
        self.doc_embeddings = self.embed_model.encode(
            documents, normalize_embeddings=True, batch_size=32
        )
        # Build BM25 index
        tokenized = [self.tokenize(doc) for doc in documents]
        self.bm25 = BM25Okapi(tokenized)
        print(f"Indexed {len(documents)} documents")

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        # Semantic scores, normalized to [0, 1]
        query_prefix = f"Represent this sentence for searching relevant passages: {query}"
        query_emb = self.embed_model.encode(query_prefix, normalize_embeddings=True)
        semantic_normalized = min_max(self.doc_embeddings @ query_emb)

        # BM25 keyword scores, normalized to [0, 1]
        tokenized_query = self.tokenize(query)
        bm25_normalized = min_max(np.array(self.bm25.get_scores(tokenized_query)))

        # Combine scores as a weighted sum
        combined = (
            self.semantic_weight * semantic_normalized +
            self.bm25_weight * bm25_normalized
        )
        top_indices = np.argsort(combined)[::-1][:top_k]
        return [
            {
                "document": self.documents[i],
                "combined_score": float(combined[i]),
                "semantic_score": float(semantic_normalized[i]),
                "bm25_score": float(bm25_normalized[i]),
                "rank": rank + 1
            }
            for rank, i in enumerate(top_indices)
        ]


hybrid = HybridSearchEngine(semantic_weight=0.6)
hybrid.index(docs)

# Test: hybrid finds both semantic matches AND exact keyword matches
results = hybrid.search("cancel account", top_k=3)
for r in results:
    print(f"Rank {r['rank']}: Sem={r['semantic_score']:.2f}, BM25={r['bm25_score']:.2f}")
    print(f"  {r['document']}")
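The engine above fuses scores with a weighted sum after min-max normalization, which makes the result sensitive to outlier scores in either list. A commonly used alternative, which is not what this tutorial's code does, is reciprocal rank fusion (RRF): it ignores raw scores and combines only rank positions. The sketch below is a minimal version; the document IDs are made up, and k=60 is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from the two retrievers, best match first
semantic_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic_ranking, bm25_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Here doc_b wins because it ranks near the top of both lists, even though doc_a is first in the semantic list. RRF needs no score normalization, at the cost of losing the explicit semantic_weight knob.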
Part 4: Reranking for Better Precision
import numpy as np
from sentence_transformers import CrossEncoder, SentenceTransformer


class RerankedSearchEngine:
    def __init__(self):
        # Bi-encoder for fast initial retrieval
        self.bi_encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")
        # Cross-encoder for accurate reranking
        self.cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        self.documents = []
        self.doc_embeddings = None

    def index(self, documents: list[str]):
        self.documents = documents
        self.doc_embeddings = self.bi_encoder.encode(
            documents, normalize_embeddings=True
        )

    def search(self, query: str, top_k: int = 5, initial_k: int = 20) -> list[dict]:
        # Stage 1: Fast semantic retrieval (get more candidates than needed)
        query_emb = self.bi_encoder.encode(
            f"Represent this sentence for searching relevant passages: {query}",
            normalize_embeddings=True
        )
        scores = self.doc_embeddings @ query_emb
        # Candidates are ordered by bi-encoder score, so position == initial rank
        top_initial_indices = np.argsort(scores)[::-1][:initial_k]

        # Stage 2: Accurate cross-encoder reranking of (query, document) pairs
        candidates = [(query, self.documents[i]) for i in top_initial_indices]
        rerank_scores = self.cross_encoder.predict(candidates)

        # Positions into `candidates`, best rerank score first
        reranked_positions = np.argsort(rerank_scores)[::-1][:top_k]
        return [
            {
                "document": self.documents[top_initial_indices[pos]],
                "rerank_score": float(rerank_scores[pos]),
                "initial_rank": int(pos) + 1,
                "final_rank": final_rank + 1
            }
            for final_rank, pos in enumerate(reranked_positions)
        ]
Part 5: Production with Qdrant
# pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer


class ProductionSemanticSearch:
    def __init__(self):
        self.qdrant = QdrantClient(host="localhost", port=6333)
        self.embed_model = SentenceTransformer("BAAI/bge-large-en-v1.5")
        self.collection = "search_index"

    def create_collection(self, dimension: int = 1024):
        # bge-large-en-v1.5 produces 1024-dimensional vectors.
        # recreate_collection() is deprecated in recent qdrant-client releases,
        # so drop and create explicitly. This deletes any existing index.
        if self.qdrant.collection_exists(self.collection):
            self.qdrant.delete_collection(self.collection)
        self.qdrant.create_collection(
            collection_name=self.collection,
            vectors_config=VectorParams(size=dimension, distance=Distance.COSINE)
        )

    def index_documents(self, documents: list[dict]):
        embeddings = self.embed_model.encode(
            [d["text"] for d in documents],
            normalize_embeddings=True,
            batch_size=32,
            show_progress_bar=True
        )
        points = [
            PointStruct(
                id=i,
                vector=emb.tolist(),
                # Keep "text" in the payload so search() can return the document itself
                payload=dict(doc)
            )
            for i, (doc, emb) in enumerate(zip(documents, embeddings))
        ]
        self.qdrant.upsert(collection_name=self.collection, points=points)

    def search(self, query: str, top_k: int = 10) -> list[dict]:
        query_emb = self.embed_model.encode(
            f"Represent this sentence for searching relevant passages: {query}",
            normalize_embeddings=True
        )
        # query_points() replaces the deprecated search() in qdrant-client 1.10+;
        # on older clients, use search(query_vector=...) instead.
        response = self.qdrant.query_points(
            collection_name=self.collection,
            query=query_emb.tolist(),
            limit=top_k,
            with_payload=True
        )
        return [{"score": p.score, **p.payload} for p in response.points]
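Two details of index_documents are worth tightening before real use: it assigns id=i from the position in the list, so a second call with different documents overwrites points 0..n-1, and it sends every point in a single upsert. The sketch below shows one possible approach using only the standard library. stable_point_id, batched, the "kb/article-N" keys, and the batch size are all hypothetical choices for illustration; it relies on Qdrant accepting UUID strings as point IDs.

```python
import uuid
from typing import Iterator


def stable_point_id(doc_key: str) -> str:
    """Deterministic UUID string derived from a document key (hypothetical scheme)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, doc_key))


def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Made-up document keys; in practice use a URL, file path, or database primary key
doc_keys = [f"kb/article-{n}" for n in range(5)]
ids = [stable_point_id(key) for key in doc_keys]
batches = list(batched(ids, batch_size=2))

print(ids[0] == stable_point_id("kb/article-0"))  # same key, same ID
print([len(batch) for batch in batches])
```

With IDs derived from a stable key, re-indexing an edited document updates its existing point instead of creating a duplicate, and each batch can be passed to its own upsert call.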
Conclusion
Semantic search transforms user experience in any search-heavy application. The implementation path is clear: start with simple bi-encoder search, add hybrid BM25 fusion for better recall, add reranking for precision, and scale with Qdrant or Pinecone when document volumes grow.
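As a rough rule of thumb, BAAI/bge-large-en-v1.5 (free) with hybrid search gets you about 90% of what OpenAI's embedding API provides, with no per-request fee. You still pay for the compute to run the model, so measure both options on your own queries before committing.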
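One simple way to check a comparison like this on your own data is hit rate at k: the fraction of test queries that return at least one relevant document in the top k. The sketch below is a minimal version; hit_rate_at_k is a name introduced here, and the result lists and relevance labels are invented toy data, not measurements from the system described in this article.

```python
def hit_rate_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    if not runs:
        return 0.0
    hits = sum(1 for ranked, relevant in runs if relevant.intersection(ranked[:k]))
    return hits / len(runs)


# Toy judgments: (ranked result IDs, IDs a human marked relevant) per query
keyword_runs = [(["d3", "d7"], {"d1"}), (["d2", "d5"], {"d2"})]
semantic_runs = [(["d1", "d3"], {"d1"}), (["d5", "d2"], {"d2"})]

print(hit_rate_at_k(keyword_runs, k=2))   # 0.5
print(hit_rate_at_k(semantic_runs, k=2))  # 1.0
```

Even a few dozen labeled queries taken from real search logs are usually enough to see whether a change to the model, the weights, or the reranker helps.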
For the vector database that stores these embeddings at scale, see our vector database guide. For building a complete RAG system on top of this search layer, see our RAG system tutorial.
Further Reading
- Build an AI Chatbot with Python: Complete Guide from Scratch to Deployment
- Vector Database Guide: Pinecone, Weaviate, Chroma, and pgvector Compared
- Streamlit Tutorial: Build and Deploy AI Apps with Python in Minutes
- Hugging Face Transformers Tutorial: Complete Guide to Using Pretrained Models
- AI API Cost Management: How to Cut LLM Costs by 80% Without Losing Quality
- Python Web Scraping Guide 2026 — BeautifulSoup, Requests & Playwright
- Computer Vision Tutorial: Build an Image Classifier from Scratch
- How Large Language Models Work: A Clear Technical Explanation