
Semantic Search Tutorial: Build Search That Understands Meaning, Not Just Keywords

Semantic search tutorial — build a search system that finds results by meaning using embeddings and vector databases, with Python implementation and production architecture.

AiTechWorlds Team
May 27, 2026 7 min read

The search box on most internal tools is embarrassingly bad. You search for "cancel subscription" and get zero results because the document says "terminate membership." Keyword search fails whenever users don't know the exact terminology.

Semantic search fixes this by understanding meaning, not matching characters. I rebuilt a customer support search system from keyword to semantic and watched the "no results" rate drop from 34% to 4%. The implementation took a week. The user experience improvement was immediate.

Here's how to build it.


The Architecture

Semantic Search System:

Index Time (one-time):
  Documents → Embedding Model → Vectors
  Vectors → Vector Database (with metadata)

Query Time (real-time):
  User Query → Embedding Model → Query Vector
  Query Vector → Vector Database → Top-K Similar Vectors
  → Return Documents + Similarity Scores

Optional Improvements:
  → Hybrid: BM25 keyword + semantic fusion
  → Reranking: cross-encoder for better precision
  → Query expansion: multiple phrasings

Part 1: Basic Semantic Search with OpenAI Embeddings

# pip install openai numpy sentence-transformers qdrant-client

import numpy as np
from openai import OpenAI

client = OpenAI()

class SemanticSearchEngine:
    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.documents = []
        self.embeddings = []
        self.metadata = []
    
    def embed(self, texts: list[str]) -> np.ndarray:
        response = client.embeddings.create(model=self.model, input=texts)
        return np.array([item.embedding for item in response.data])
    
    def add_documents(self, documents: list[str], metadata: list[dict] | None = None):
        print(f"Embedding {len(documents)} documents...")
        new_embeddings = self.embed(documents)
        
        self.documents.extend(documents)
        self.embeddings.extend(new_embeddings)
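        # Build a fresh dict per document; [{}] * n would share one dict across all entries
        self.metadata.extend(metadata if metadata is not None else [{} for _ in documents])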
        
        print(f"Total indexed: {len(self.documents)}")
    
    def search(self, query: str, top_k: int = 5) -> list[dict]:
        query_emb = self.embed([query])[0]
        doc_embs = np.array(self.embeddings)
        
        # Cosine similarity
        query_norm = query_emb / np.linalg.norm(query_emb)
        doc_norms = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
        similarities = doc_norms @ query_norm
        
        top_indices = np.argsort(similarities)[::-1][:top_k]
        
        return [
            {
                "document": self.documents[i],
                "similarity": float(similarities[i]),
                "metadata": self.metadata[i],
                "rank": rank + 1
            }
            for rank, i in enumerate(top_indices)
        ]

# Example
search_engine = SemanticSearchEngine()

docs = [
    "How to cancel your subscription and get a refund.",
    "Troubleshooting network connectivity issues.",
    "Setting up two-factor authentication on your account.",
    "How to export your data before closing your account.",
    "Upgrading your plan to access premium features.",
    "Contacting customer support for billing inquiries.",
    "Password reset instructions for locked accounts.",
]

search_engine.add_documents(docs, metadata=[{"category": "support"} for _ in docs])

# Semantic matches: finds results even without exact keywords
queries = [
    "end my membership",           # → finds "cancel subscription"
    "wifi not working",            # → finds "network connectivity"
    "secure my login",             # → finds "two-factor authentication"
]

for query in queries:
    print(f"\nQuery: '{query}'")
    results = search_engine.search(query, top_k=2)
    for r in results:
        print(f"  {r['rank']}. [{r['similarity']:.3f}] {r['document']}")
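
To make the cosine-similarity step inside `search` concrete, here is a small dependency-free sketch on hand-made 3-dimensional vectors. The vectors and the helper names `cosine` and `top_k` are illustrative assumptions, not real embeddings and not part of the class above.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2):
    """Return (index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings", made up for illustration
toy_docs = [
    [0.9, 0.1, 0.0],  # doc 0: points almost the same way as the query
    [0.0, 1.0, 0.0],  # doc 1: orthogonal to the query
    [0.7, 0.0, 0.7],  # doc 2: halfway between
]
toy_query = [1.0, 0.0, 0.0]

ranking = top_k(toy_query, toy_docs, k=2)
for index, similarity in ranking:
    print(f"doc {index}: {similarity:.3f}")
```

Because cosine similarity ignores vector length, only the direction of each vector matters. That is why the class above can normalize once and then use a plain dot product.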

Part 2: Free Local Embeddings with Sentence-Transformers

import numpy as np
from sentence_transformers import SentenceTransformer

# BAAI/bge-large-en-v1.5: a strong open-weights English model on MTEB, free to run locally
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

def semantic_search_local(
    query: str,
    corpus: list[str],
    top_k: int = 5
) -> list[dict]:
    # BGE models work better with a query prefix
    prefixed_query = f"Represent this sentence for searching relevant passages: {query}"
    
    # Encode query and corpus
    query_emb = model.encode(prefixed_query, normalize_embeddings=True)
    corpus_embs = model.encode(corpus, normalize_embeddings=True, batch_size=32)
    
    # Cosine similarity (dot product since normalized)
    scores = corpus_embs @ query_emb
    
    # Top-k results
    top_indices = np.argsort(scores)[::-1][:top_k]
    
    return [
        {
            "document": corpus[i],
            "score": float(scores[i]),
            "rank": rank + 1
        }
        for rank, i in enumerate(top_indices)
    ]

results = semantic_search_local("how to terminate account", docs)
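
`semantic_search_local` re-encodes the whole corpus on every call, which gets slow as the corpus grows. One possible fix is to encode the corpus once and reuse the vectors. The sketch below assumes an `encode` callable that maps a list of strings to a list of equal-length, normalized vectors. `CachedCorpus` and `toy_encode` are hypothetical names, and the toy encoder stands in for `model.encode` only so the example runs without downloading a model.

```python
import math
from typing import Callable

Vector = list[float]

class CachedCorpus:
    """Encode the corpus once at construction time, then reuse the vectors."""

    def __init__(self, corpus: list[str], encode: Callable[[list[str]], list[Vector]]):
        self.corpus = corpus
        self.encode = encode
        self.vectors = encode(corpus)  # the expensive step, done exactly once

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        query_vec = self.encode([query])[0]  # only the query is encoded per call
        # Dot product equals cosine similarity because vectors are normalized
        scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in self.vectors]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [
            {"document": self.corpus[i], "score": scores[i], "rank": rank + 1}
            for rank, i in enumerate(order[:top_k])
        ]

# Stand-in encoder (assumption): normalized counts of a few hand-picked words
VOCAB = ["cancel", "account", "password", "network"]
encode_calls = []  # records how many texts each call encoded

def toy_encode(texts: list[str]) -> list[Vector]:
    encode_calls.append(len(texts))
    vectors = []
    for text in texts:
        words = text.lower().split()
        counts = [float(words.count(term)) for term in VOCAB]
        length = math.sqrt(sum(c * c for c in counts)) or 1.0
        vectors.append([c / length for c in counts])
    return vectors

index = CachedCorpus(
    ["cancel your account", "reset your password", "fix network issues"],
    toy_encode,
)
first = index.search("cancel account", top_k=1)
second = index.search("password", top_k=1)
print(encode_calls)  # one corpus-sized call, then one single-text call per query
```

With a real model you would pass a wrapper around `model.encode(..., normalize_embeddings=True)` and add the BGE query prefix before encoding the query.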

Part 3: Hybrid Search (Semantic + BM25)

from rank_bm25 import BM25Okapi  # pip install rank-bm25
import re

class HybridSearchEngine:
    def __init__(self, semantic_weight: float = 0.6):
        """
        semantic_weight: 0.0 = pure keyword, 1.0 = pure semantic
        0.6 is a good starting point for most use cases
        """
        self.semantic_weight = semantic_weight
        self.bm25_weight = 1 - semantic_weight
        self.documents = []
        
        # Semantic components
        self.embed_model = SentenceTransformer("BAAI/bge-large-en-v1.5")
        self.doc_embeddings = None
        
        # BM25 components
        self.bm25 = None
    
    def tokenize(self, text: str) -> list[str]:
        """Simple tokenizer for BM25."""
        return re.sub(r'[^a-z0-9\s]', '', text.lower()).split()
    
    def index(self, documents: list[str]):
        self.documents = documents
        
        # Build semantic index
        self.doc_embeddings = self.embed_model.encode(
            documents, normalize_embeddings=True, batch_size=32
        )
        
        # Build BM25 index
        tokenized = [self.tokenize(doc) for doc in documents]
        self.bm25 = BM25Okapi(tokenized)
        
        print(f"Indexed {len(documents)} documents")
    
    def search(self, query: str, top_k: int = 5) -> list[dict]:
        n = len(self.documents)
        
        # Semantic scores
        query_prefix = f"Represent this sentence for searching relevant passages: {query}"
        query_emb = self.embed_model.encode(query_prefix, normalize_embeddings=True)
        semantic_scores = self.doc_embeddings @ query_emb
        
        # Normalize to [0, 1]
        semantic_min, semantic_max = semantic_scores.min(), semantic_scores.max()
        if semantic_max > semantic_min:
            semantic_normalized = (semantic_scores - semantic_min) / (semantic_max - semantic_min)
        else:
            # All scores equal: no signal to rank on, so contribute nothing
            semantic_normalized = np.zeros_like(semantic_scores)
        
        # BM25 keyword scores
        tokenized_query = self.tokenize(query)
        bm25_scores = np.array(self.bm25.get_scores(tokenized_query))
        
        # Normalize BM25
        bm25_min, bm25_max = bm25_scores.min(), bm25_scores.max()
        if bm25_max > bm25_min:
            bm25_normalized = (bm25_scores - bm25_min) / (bm25_max - bm25_min)
        else:
            # All scores equal (e.g. no query term matched): contribute nothing
            bm25_normalized = np.zeros_like(bm25_scores)
        
        # Combine scores
        combined = (
            self.semantic_weight * semantic_normalized +
            self.bm25_weight * bm25_normalized
        )
        
        top_indices = np.argsort(combined)[::-1][:top_k]
        
        return [
            {
                "document": self.documents[i],
                "combined_score": float(combined[i]),
                "semantic_score": float(semantic_normalized[i]),
                "bm25_score": float(bm25_normalized[i]),
                "rank": rank + 1
            }
            for rank, i in enumerate(top_indices)
        ]

hybrid = HybridSearchEngine(semantic_weight=0.6)
hybrid.index(docs)

# Test: hybrid finds both semantic matches AND exact keyword matches
results = hybrid.search("cancel account", top_k=3)
for r in results:
    print(f"Rank {r['rank']}: Sem={r['semantic_score']:.2f}, BM25={r['bm25_score']:.2f}")
    print(f"  {r['document']}")
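
Min-max weighting is one way to fuse the two rankings. A common alternative is reciprocal rank fusion (RRF), which combines rank positions rather than raw scores and so avoids normalizing scores that live on different scales. This is a swapped-in technique, not what `HybridSearchEngine` does. The function name `rrf_fuse` and the constant `k=60` (a conventional default) are assumptions.

```python
def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document receives 1 / (k + rank) from every list it appears in
    (rank starts at 1), and the contributions are summed.
    """
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

# Document ids ordered best-first by each retriever (made-up example data)
semantic_ranking = [3, 0, 5]
bm25_ranking = [0, 6, 3]

fused = rrf_fuse([semantic_ranking, bm25_ranking])
for doc_id, score in fused:
    print(f"doc {doc_id}: {score:.5f}")
```

Documents that appear near the top of both lists rise to the top of the fused list, while a document found by only one retriever still gets partial credit.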

Part 4: Reranking for Better Precision

from sentence_transformers import CrossEncoder

class RerankedSearchEngine:
    def __init__(self):
        # Bi-encoder for fast initial retrieval
        self.bi_encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")
        # Cross-encoder for accurate reranking
        self.cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        self.documents = []
        self.doc_embeddings = None
    
    def index(self, documents: list[str]):
        self.documents = documents
        self.doc_embeddings = self.bi_encoder.encode(
            documents, normalize_embeddings=True
        )
    
    def search(self, query: str, top_k: int = 5, initial_k: int = 20) -> list[dict]:
        # Stage 1: Fast semantic retrieval (get more candidates than needed)
        query_emb = self.bi_encoder.encode(
            f"Represent this sentence for searching relevant passages: {query}",
            normalize_embeddings=True
        )
        scores = self.doc_embeddings @ query_emb
        top_initial_indices = np.argsort(scores)[::-1][:initial_k]
        
        # Stage 2: Accurate cross-encoder reranking
        candidates = [(query, self.documents[i]) for i in top_initial_indices]
        rerank_scores = self.cross_encoder.predict(candidates)
        
        # Sort candidates by rerank score (indices point into the candidate list)
        sorted_indices = np.argsort(rerank_scores)[::-1][:top_k]
        
        return [
            {
                "document": self.documents[top_initial_indices[i]],
                "rerank_score": float(rerank_scores[i]),
                # Candidates are in bi-encoder order, so i is the initial rank position
                "initial_rank": int(i) + 1,
                "final_rank": final_rank + 1
            }
            for final_rank, i in enumerate(sorted_indices)
        ]
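
The rank bookkeeping in a two-stage search is easy to get wrong, so here is a dependency-free sketch of the same retrieve-then-rerank flow. `cheap_score` and `precise_score` are hypothetical stand-ins for the bi-encoder and cross-encoder, and the toy scoring functions exist only so the example runs without models. The point is how `initial_rank` and `final_rank` are tracked.

```python
from typing import Callable

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score

def retrieve_then_rerank(
    query: str,
    documents: list[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    top_k: int = 2,
    initial_k: int = 3,
) -> list[dict]:
    # Stage 1: rank all documents with the cheap scorer, keep initial_k candidates
    by_cheap = sorted(
        range(len(documents)),
        key=lambda i: cheap_score(query, documents[i]),
        reverse=True,
    )[:initial_k]

    # Stage 2: score only the candidates with the expensive scorer
    rescored = [
        (initial_pos, doc_idx, precise_score(query, documents[doc_idx]))
        for initial_pos, doc_idx in enumerate(by_cheap)
    ]
    rescored.sort(key=lambda item: item[2], reverse=True)

    return [
        {
            "document": documents[doc_idx],
            "rerank_score": score,
            "initial_rank": initial_pos + 1,
            "final_rank": final_pos + 1,
        }
        for final_pos, (initial_pos, doc_idx, score) in enumerate(rescored[:top_k])
    ]

def word_overlap(query: str, doc: str) -> float:
    """Cheap stand-in: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

def phrase_bonus(query: str, doc: str) -> float:
    """Precise stand-in: overlap plus a bonus when the query appears verbatim."""
    return word_overlap(query, doc) + (2.0 if query.lower() in doc.lower() else 0.0)

toy_docs = [
    "reset password and reset email",
    "how to reset your password",
    "password reset instructions",
    "network troubleshooting",
]
hits = retrieve_then_rerank("password reset", toy_docs, word_overlap, phrase_bonus)
for hit in hits:
    print(hit["final_rank"], hit["initial_rank"], hit["document"])
```

In this toy run the document that was third after the cheap stage moves to first after reranking, which is the kind of reordering the cross-encoder stage is meant to produce.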

Part 5: Production with Qdrant

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

class ProductionSemanticSearch:
    def __init__(self):
        self.qdrant = QdrantClient(host="localhost", port=6333)
        self.embed_model = SentenceTransformer("BAAI/bge-large-en-v1.5")
        self.collection = "search_index"
    
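    def create_collection(self, dimension: int = 1024):
        # 1024 is the output dimension of bge-large-en-v1.5.
        # Warning: recreate_collection deletes any existing collection with this name.
        # Recent qdrant-client releases deprecate it; there, check collection_exists()
        # and call create_collection() instead.
        self.qdrant.recreate_collection(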
            collection_name=self.collection,
            vectors_config=VectorParams(size=dimension, distance=Distance.COSINE)
        )
    
    def index_documents(self, documents: list[dict]):
        embeddings = self.embed_model.encode(
            [d["text"] for d in documents],
            normalize_embeddings=True,
            batch_size=32,
            show_progress_bar=True
        )
        
        points = [
            PointStruct(
                id=i,
                vector=emb.tolist(),
                payload=dict(doc)  # keep "text" in the payload so search results can return it
            )
            for i, (doc, emb) in enumerate(zip(documents, embeddings))
        ]
        
        self.qdrant.upsert(collection_name=self.collection, points=points)
    
    def search(self, query: str, top_k: int = 10) -> list[dict]:
        query_emb = self.embed_model.encode(
            f"Represent this sentence for searching relevant passages: {query}",
            normalize_embeddings=True
        )
        
        # Note: newer qdrant-client versions deprecate search() in favor of query_points()
        results = self.qdrant.search(
            collection_name=self.collection,
            query_vector=query_emb.tolist(),
            limit=top_k,
            with_payload=True
        )
        
        return [{"score": r.score, **r.payload} for r in results]
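
`index_documents` uses the list position as the point ID, so a second call would overwrite points 0 to n-1 from the first call. One option is to derive a stable ID from the document itself. Qdrant accepts unsigned integers and UUIDs as point IDs. The namespace value and the `stable_point_id` helper below are assumptions for illustration.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it never changes
SEARCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "search-index.example.com")

def stable_point_id(text: str) -> str:
    """Derive a deterministic UUID string from the document text.

    The same text always maps to the same ID, so re-indexing a document
    updates its point instead of creating a duplicate.
    """
    return str(uuid.uuid5(SEARCH_NAMESPACE, text))

id_a = stable_point_id("How to cancel your subscription and get a refund.")
id_b = stable_point_id("How to cancel your subscription and get a refund.")
id_c = stable_point_id("Troubleshooting network connectivity issues.")

print(id_a == id_b)  # same text, same ID
print(id_a == id_c)  # different text, different ID
```

In `index_documents`, passing `id=stable_point_id(doc["text"])` in place of `id=i` would apply this. If documents already carry a primary key from another system, hashing that key instead of the text is usually the better choice, since edits to the text would otherwise change the ID.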

Conclusion

Semantic search transforms user experience in any search-heavy application. The implementation path is clear: start with simple bi-encoder search, add hybrid BM25 fusion for better recall, add reranking for precision, and scale with Qdrant or Pinecone when document volumes grow.

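For many use cases, BAAI/bge-large-en-v1.5 (free to run locally) combined with hybrid search gets close to the quality of OpenAI's embedding API with no per-request fee, though you still pay for the compute to run the model. Measure both on your own queries before committing.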

For the vector database that stores these embeddings at scale, see our vector database guide. For building a complete RAG system on top of this search layer, see our RAG system tutorial.


Frequently Asked Questions

What is semantic search?

Finds documents by meaning, not exact keyword match. Uses embeddings — neural representations of text — to measure semantic similarity. Handles synonyms and paraphrases that keyword search misses. "Cancel account" finds "terminate subscription."

What embedding model should I use?

OpenAI text-embedding-3-small for paid cloud. BAAI/bge-large-en-v1.5 for free local use. Use the same model for documents and queries — never mix models. Some models (BGE) need query prefixes.

How do I improve semantic search relevance?

Better embedding model, hybrid search (add BM25), reranking (cross-encoder), better chunking, query expansion. Hybrid search alone typically improves recall by 10-20%.
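
Query expansion from the list above can be sketched without committing to any particular search backend: run several phrasings of the query and keep each document's best score. The `search_fn` signature, the `expand_and_search` helper, and the hand-written phrasings are assumptions. In practice the variants might come from a synonym list or an LLM.

```python
from typing import Callable

SearchFn = Callable[[str], list[tuple[str, float]]]  # query -> [(document, score)]

def expand_and_search(
    phrasings: list[str], search_fn: SearchFn, top_k: int = 3
) -> list[tuple[str, float]]:
    """Search with every phrasing and keep the highest score seen per document."""
    best: dict[str, float] = {}
    for phrasing in phrasings:
        for document, score in search_fn(phrasing):
            if score > best.get(document, float("-inf")):
                best[document] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Stand-in backend (assumption): canned results keyed by query text
CANNED = {
    "end my membership": [("cancel subscription", 0.62), ("export data", 0.40)],
    "cancel my plan": [("cancel subscription", 0.81), ("upgrade plan", 0.55)],
}

def fake_search(query: str) -> list[tuple[str, float]]:
    return CANNED.get(query, [])

merged = expand_and_search(["end my membership", "cancel my plan"], fake_search)
for document, score in merged:
    print(f"{score:.2f}  {document}")
```

Taking the maximum score per document is one simple merge rule. Summing scores or fusing by rank are alternatives, and each extra phrasing adds one more search call to the latency budget.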

What is the difference between bi-encoder and cross-encoder models?

Bi-encoder: embeds query and document separately, fast, used for initial retrieval. Cross-encoder: processes query+document together, much more accurate but slow. Two-stage: retrieve the top 50 with the bi-encoder, rerank those 50 with the cross-encoder, and return the top 10.

How do I handle a multilingual corpus?

Use multilingual embedding models: paraphrase-multilingual-mpnet-base-v2 or intfloat/multilingual-e5-large. These create a shared multilingual embedding space — English queries find French/German/Japanese results. Or translate queries first for higher accuracy.

What is the difference between keyword search and semantic search?

Keyword search finds documents containing the exact query words. Semantic search finds documents with the same meaning, even when the wording differs. Example: a keyword search for 'car maintenance' won't find 'vehicle upkeep tips'; semantic search will. Semantic search uses embeddings (neural network representations of text meaning) to measure conceptual similarity. It embeds the query and all documents into the same vector space, then returns the documents whose embeddings are closest to the query embedding. This handles synonyms, paraphrases, and concept-level similarity that keyword search misses.
