What embedding model should I use for semantic search?

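For most applications, OpenAI's text-embedding-3-small (1536 dimensions, $0.02 per 1M tokens at the time of writing) is a solid default: good quality, fast, and convenient if you already use the OpenAI ecosystem. For free, local embeddings, BAAI/bge-large-en-v1.5 is a strong choice that ranked near the top of the MTEB leaderboard when it was released. For multilingual content, use intfloat/multilingual-e5-large or paraphrase-multilingual-mpnet-base-v2. Critical rule: always use the same embedding model for documents and queries. Vectors from different models live in unrelated spaces, so mixing models breaks semantic search completely. Also important: some models expect a task prefix. The BGE v1.5 English models recommend prefixing queries (but not documents) with 'Represent this sentence for searching relevant passages: ', and the E5 models expect 'query: ' and 'passage: ' prefixes.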
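One way to enforce the same-model rule is to record the model name next to the stored vectors and reject queries embedded with anything else. The sketch below is a minimal illustration under that assumption; `VectorIndex` and `check_query_model` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    """Hypothetical container that remembers which model produced its vectors."""
    model_name: str
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, vector: list[float]) -> None:
        # All vectors from one model share the same dimensionality.
        if self.vectors and len(vector) != len(self.vectors[0]):
            raise ValueError("dimension mismatch: was this vector made by another model?")
        self.vectors.append(vector)

    def check_query_model(self, query_model: str) -> None:
        # Embeddings from different models live in unrelated vector spaces,
        # so similarity scores between them are meaningless.
        if query_model != self.model_name:
            raise ValueError(
                f"index built with {self.model_name!r}, query embedded with {query_model!r}"
            )


index = VectorIndex(model_name="text-embedding-3-small")
index.add([0.1, 0.2, 0.3])
index.check_query_model("text-embedding-3-small")  # same model: no error
```

A dimension check alone is not enough, since two different models can produce vectors of the same size, which is why the model name is stored explicitly.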

How do I improve semantic search relevance?

Common improvements, roughly in order: 1) Try a better embedding model. Quality varies significantly between models; MTEB leaderboard rankings are a useful starting point, but confirm the choice on your own queries. 2) Add hybrid search: combine semantic search with BM25 keyword search. Hybrid often outperforms either alone. 3) Add reranking: cross-encoder models rerank the top results more accurately than bi-encoder similarity. 4) Improve chunking: better chunk boundaries preserve semantic context. 5) Add query expansion: generate alternative phrasings, retrieve for each, and merge the results. 6) Fine-tune embeddings on domain data, which can bring significant gains in specialized domains (legal, medical, technical).
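The "merge results" step of query expansion needs some way to combine several ranked lists. One common option is reciprocal rank fusion (RRF), sketched below. RRF is a technique chosen here for illustration, not something this tutorial's code uses elsewhere; the document IDs are made up, and `k=60` is only a conventional default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) for every list it appears in,
    so documents that rank well across many phrasings rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical retrieval results for three phrasings of the same question.
results_per_phrasing = [
    ["doc_cancel", "doc_export", "doc_billing"],
    ["doc_cancel", "doc_billing"],
    ["doc_export", "doc_cancel"],
]

merged = reciprocal_rank_fusion(results_per_phrasing)
for doc_id, score in merged:
    print(f"{doc_id}: {score:.4f}")
```

Because RRF only looks at ranks, it can also merge a BM25 ranking with a semantic ranking without normalizing their raw scores first.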

What is the difference between bi-encoder and cross-encoder models?

Bi-encoder: encodes query and document separately, computes similarity of embeddings. Fast — documents can be pre-embedded. Used in vector databases. Good for initial retrieval. Cross-encoder: takes (query, document) as a pair and computes relevance score jointly. Much more accurate because it sees both texts together. But slow — must process each (query, document) pair at inference time. Can't pre-compute. Used for reranking: first retrieve 50-100 candidates with bi-encoder (fast), then rerank with cross-encoder (slow but accurate) to get top-10 results. This two-stage approach combines the speed of bi-encoders with the accuracy of cross-encoders.

How do I handle a multilingual corpus in semantic search?

Use a multilingual embedding model: paraphrase-multilingual-mpnet-base-v2 (50+ languages), intfloat/multilingual-e5-large (about 100 languages), or Cohere's embed-multilingual-v3.0. These map text from different languages into a shared vector space, so a query in English can find results in French, German, or Japanese with the same meaning. Store each document's language as metadata so you can filter when needed. For the highest-quality cross-lingual search, translate queries into the corpus's primary language first (for example with the DeepL or Google Translate API), then run a monolingual search. This sidesteps the quality limits of cross-lingual embeddings at the cost of extra translation API calls.
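Filtering on stored language metadata can be as simple as the sketch below. It assumes results shaped like the dictionaries returned by the `SemanticSearchEngine.search` method in Part 1, with a `lang` key in the metadata; the `lang` key, the sample documents, and the similarity values are all made up for illustration.

```python
def filter_by_language(results: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only results whose metadata 'lang' is allowed, then renumber ranks."""
    kept = [r for r in results if r.get("metadata", {}).get("lang") in allowed]
    return [{**r, "rank": rank} for rank, r in enumerate(kept, start=1)]


# Hypothetical cross-lingual results for the query "cancel subscription".
results = [
    {"document": "Comment résilier votre abonnement", "similarity": 0.81,
     "metadata": {"lang": "fr"}, "rank": 1},
    {"document": "How to cancel your subscription", "similarity": 0.79,
     "metadata": {"lang": "en"}, "rank": 2},
    {"document": "Abonnement kündigen", "similarity": 0.77,
     "metadata": {"lang": "de"}, "rank": 3},
]

english_and_german = filter_by_language(results, {"en", "de"})
for r in english_and_german:
    print(r["rank"], r["metadata"]["lang"], r["document"])
```

Filtering after retrieval can leave fewer than `top_k` results, so it helps to retrieve extra candidates first, or to push the filter into the vector database when it supports metadata filters.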


Semantic Search Tutorial: Build Search That Understands Meaning, Not Just Keywords

⚡ Quick Answer

Semantic search tutorial — build a search system that finds results by meaning using embeddings and vector databases, with Python implementation and production architecture.

AiTechWorlds Team May 27, 2026 7 min read


The search box on most internal tools is embarrassingly bad. You search for "cancel subscription" and get zero results because the document says "terminate membership." Keyword search fails whenever users don't know the exact terminology.

Semantic search fixes this by understanding meaning, not matching characters. I rebuilt a customer support search system from keyword to semantic and watched the "no results" rate drop from 34% to 4%. The implementation took a week. The user experience improvement was immediate.

Here's how to build it.

The Architecture

Semantic Search System:

Index Time (one-time):
  Documents → Embedding Model → Vectors
  Vectors → Vector Database (with metadata)

Query Time (real-time):
  User Query → Embedding Model → Query Vector
  Query Vector → Vector Database → Top-K Similar Vectors
  → Return Documents + Similarity Scores

Optional Improvements:
  → Hybrid: BM25 keyword + semantic fusion
  → Reranking: cross-encoder for better precision
  → Query expansion: multiple phrasings

Part 1: Basic Semantic Search

# pip install openai numpy sentence-transformers rank-bm25 qdrant-client

import numpy as np
from openai import OpenAI

client = OpenAI()

class SemanticSearchEngine:
    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.documents = []
        self.embeddings = []
        self.metadata = []
    
    def embed(self, texts: list[str]) -> np.ndarray:
        response = client.embeddings.create(model=self.model, input=texts)
        return np.array([item.embedding for item in response.data])
    
    def add_documents(self, documents: list[str], metadata: list[dict] | None = None):
        print(f"Embedding {len(documents)} documents...")
        new_embeddings = self.embed(documents)
        
        self.documents.extend(documents)
        self.embeddings.extend(new_embeddings)
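        # Give each document its own dict; [{}] * n would share one dict object
        self.metadata.extend(metadata if metadata is not None else [{} for _ in documents])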
        
        print(f"Total indexed: {len(self.documents)}")
    
    def search(self, query: str, top_k: int = 5) -> list[dict]:
        query_emb = self.embed([query])[0]
        doc_embs = np.array(self.embeddings)
        
        # Cosine similarity
        query_norm = query_emb / np.linalg.norm(query_emb)
        doc_norms = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
        similarities = doc_norms @ query_norm
        
        top_indices = np.argsort(similarities)[::-1][:top_k]
        
        return [
            {
                "document": self.documents[i],
                "similarity": float(similarities[i]),
                "metadata": self.metadata[i],
                "rank": rank + 1
            }
            for rank, i in enumerate(top_indices)
        ]

# Example
search_engine = SemanticSearchEngine()

docs = [
    "How to cancel your subscription and get a refund.",
    "Troubleshooting network connectivity issues.",
    "Setting up two-factor authentication on your account.",
    "How to export your data before closing your account.",
    "Upgrading your plan to access premium features.",
    "Contacting customer support for billing inquiries.",
    "Password reset instructions for locked accounts.",
]

search_engine.add_documents(docs, metadata=[{"category": "support"} for _ in docs])

# Semantic matches: finds results even without exact keywords
queries = [
    "end my membership",           # → finds "cancel subscription"
    "wifi not working",            # → finds "network connectivity"
    "secure my login",             # → finds "two-factor authentication"
]

for query in queries:
    print(f"\nQuery: '{query}'")
    results = search_engine.search(query, top_k=2)
    for r in results:
        print(f"  {r['rank']}. [{r['similarity']:.3f}] {r['document']}")

Part 2: Free Local Embeddings with Sentence-Transformers

import numpy as np
from sentence_transformers import SentenceTransformer

# BAAI/bge-large-en-v1.5: a strong open model on MTEB, free to run locally
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

def semantic_search_local(
    query: str,
    corpus: list[str],
    top_k: int = 5
) -> list[dict]:
    # BGE models work better with a query prefix
    prefixed_query = f"Represent this sentence for searching relevant passages: {query}"
    
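    # Encode query and corpus.
    # Note: this re-encodes the whole corpus on every call, which is fine for a demo;
    # for real use, encode the corpus once and reuse the embeddings.
    query_emb = model.encode(prefixed_query, normalize_embeddings=True)
    corpus_embs = model.encode(corpus, normalize_embeddings=True, batch_size=32)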
    
    # Cosine similarity (dot product since normalized)
    scores = corpus_embs @ query_emb
    
    # Top-k results
    top_indices = np.argsort(scores)[::-1][:top_k]
    
    return [
        {
            "document": corpus[i],
            "score": float(scores[i]),
            "rank": rank + 1
        }
        for rank, i in enumerate(top_indices)
    ]

results = semantic_search_local("how to terminate account", docs)

Part 3: Hybrid Search (Semantic + BM25)

import re

import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25
from sentence_transformers import SentenceTransformer

class HybridSearchEngine:
    def __init__(self, semantic_weight: float = 0.6):
        """
        semantic_weight: 0.0 = pure keyword, 1.0 = pure semantic
        0.6 is a good starting point for most use cases
        """
        self.semantic_weight = semantic_weight
        self.bm25_weight = 1 - semantic_weight
        self.documents = []
        
        # Semantic components
        self.embed_model = SentenceTransformer("BAAI/bge-large-en-v1.5")
        self.doc_embeddings = None
        
        # BM25 components
        self.bm25 = None
    
    def tokenize(self, text: str) -> list[str]:
        """Simple tokenizer for BM25."""
        return re.sub(r'[^a-z0-9\s]', '', text.lower()).split()
    
    def index(self, documents: list[str]):
        self.documents = documents
        
        # Build semantic index
        self.doc_embeddings = self.embed_model.encode(
            documents, normalize_embeddings=True, batch_size=32
        )
        
        # Build BM25 index
        tokenized = [self.tokenize(doc) for doc in documents]
        self.bm25 = BM25Okapi(tokenized)
        
        print(f"Indexed {len(documents)} documents")
    
    def search(self, query: str, top_k: int = 5) -> list[dict]:
        # Semantic scores
        query_prefix = f"Represent this sentence for searching relevant passages: {query}"
        query_emb = self.embed_model.encode(query_prefix, normalize_embeddings=True)
        semantic_scores = self.doc_embeddings @ query_emb
        
        # Normalize to [0, 1]
        semantic_min, semantic_max = semantic_scores.min(), semantic_scores.max()
        if semantic_max > semantic_min:
            semantic_normalized = (semantic_scores - semantic_min) / (semantic_max - semantic_min)
        else:
            semantic_normalized = semantic_scores
        
        # BM25 keyword scores
        tokenized_query = self.tokenize(query)
        bm25_scores = np.array(self.bm25.get_scores(tokenized_query))
        
        # Normalize BM25
        bm25_min, bm25_max = bm25_scores.min(), bm25_scores.max()
        if bm25_max > bm25_min:
            bm25_normalized = (bm25_scores - bm25_min) / (bm25_max - bm25_min)
        else:
            bm25_normalized = bm25_scores
        
        # Combine scores
        combined = (
            self.semantic_weight * semantic_normalized +
            self.bm25_weight * bm25_normalized
        )
        
        top_indices = np.argsort(combined)[::-1][:top_k]
        
        return [
            {
                "document": self.documents[i],
                "combined_score": float(combined[i]),
                "semantic_score": float(semantic_normalized[i]),
                "bm25_score": float(bm25_normalized[i]),
                "rank": rank + 1
            }
            for rank, i in enumerate(top_indices)
        ]

hybrid = HybridSearchEngine(semantic_weight=0.6)
hybrid.index(docs)

# Test: hybrid finds both semantic matches AND exact keyword matches
results = hybrid.search("cancel account", top_k=3)
for r in results:
    print(f"Rank {r['rank']}: Sem={r['semantic_score']:.2f}, BM25={r['bm25_score']:.2f}")
    print(f"  {r['document']}")
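To see how `semantic_weight` changes the outcome, the same min-max normalization and weighted sum can be worked through on a few made-up scores. This is a pure-Python sketch of the arithmetic, not the class above: the raw scores are invented, and unlike the class it returns zeros when all scores are equal.

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1]; return zeros if all scores are equal."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(semantic: list[float], bm25: list[float], semantic_weight: float) -> list[float]:
    """Weighted sum of min-max normalized semantic and BM25 scores."""
    sem_n, bm_n = min_max(semantic), min_max(bm25)
    return [semantic_weight * s + (1 - semantic_weight) * b for s, b in zip(sem_n, bm_n)]


# Made-up raw scores for three documents.
semantic = [0.80, 0.60, 0.40]  # document 0 is the best semantic match
bm25 = [0.0, 1.0, 4.0]         # document 2 is the best keyword match

for weight in (0.6, 0.3):
    combined = fuse(semantic, bm25, weight)
    best = max(range(len(combined)), key=combined.__getitem__)
    print(f"semantic_weight={weight}: best document = {best}")
```

With these numbers the semantic match wins at a weight of 0.6 and the keyword match wins at 0.3, which is why the weight is worth tuning against real queries rather than left at a default.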

Part 4: Reranking for Better Precision

from sentence_transformers import CrossEncoder

class RerankedSearchEngine:
    def __init__(self):
        # Bi-encoder for fast initial retrieval
        self.bi_encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")
        # Cross-encoder for accurate reranking
        self.cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        self.documents = []
        self.doc_embeddings = None
    
    def index(self, documents: list[str]):
        self.documents = documents
        self.doc_embeddings = self.bi_encoder.encode(
            documents, normalize_embeddings=True
        )
    
    def search(self, query: str, top_k: int = 5, initial_k: int = 20) -> list[dict]:
        # Stage 1: Fast semantic retrieval (get more candidates than needed)
        query_emb = self.bi_encoder.encode(
            f"Represent this sentence for searching relevant passages: {query}",
            normalize_embeddings=True
        )
        scores = self.doc_embeddings @ query_emb
        top_initial_indices = np.argsort(scores)[::-1][:initial_k]
        
        # Stage 2: Accurate cross-encoder reranking
        candidates = [(query, self.documents[i]) for i in top_initial_indices]
        rerank_scores = self.cross_encoder.predict(candidates)
        
        # Sort candidates by rerank score (positions index into `candidates`)
        sorted_positions = np.argsort(rerank_scores)[::-1][:top_k]
        
        return [
            {
                "document": self.documents[top_initial_indices[pos]],
                "rerank_score": float(rerank_scores[pos]),
                # Position in the bi-encoder ranking before reranking
                "initial_rank": int(pos) + 1,
                "final_rank": final_rank + 1
            }
            for final_rank, pos in enumerate(sorted_positions)
        ]

Part 5: Production with Qdrant

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

class ProductionSemanticSearch:
    def __init__(self):
        self.qdrant = QdrantClient(host="localhost", port=6333)
        self.embed_model = SentenceTransformer("BAAI/bge-large-en-v1.5")
        self.collection = "search_index"
    
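    def create_collection(self, dimension: int = 1024):
        # Drops and recreates the collection, so any existing index is lost.
        # recreate_collection() is deprecated in recent qdrant-client releases;
        # collection_exists() + create_collection() is the suggested replacement.
        if self.qdrant.collection_exists(self.collection):
            self.qdrant.delete_collection(self.collection)
        self.qdrant.create_collection(
            collection_name=self.collection,
            vectors_config=VectorParams(size=dimension, distance=Distance.COSINE)
        )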
    
    def index_documents(self, documents: list[dict]):
        embeddings = self.embed_model.encode(
            [d["text"] for d in documents],
            normalize_embeddings=True,
            batch_size=32,
            show_progress_bar=True
        )
        
        points = [
            PointStruct(
                id=i,
                vector=emb.tolist(),
                # Store the text too so search results can display it
                payload=dict(doc)
            )
            for i, (doc, emb) in enumerate(zip(documents, embeddings))
        ]
        
        self.qdrant.upsert(collection_name=self.collection, points=points)
    
    def search(self, query: str, top_k: int = 10) -> list[dict]:
        query_emb = self.embed_model.encode(
            f"Represent this sentence for searching relevant passages: {query}",
            normalize_embeddings=True
        )
        
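        # query_points() replaces the deprecated search() in recent qdrant-client
        # releases; on older clients, use search(query_vector=...) instead.
        response = self.qdrant.query_points(
            collection_name=self.collection,
            query=query_emb.tolist(),
            limit=top_k,
            with_payload=True
        )
        
        return [{"score": p.score, **(p.payload or {})} for p in response.points]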
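The indexer above uses each document's position in the batch as its point ID, so indexing a second batch would overwrite the first. Qdrant accepts either unsigned integers or UUIDs as point IDs, so one option is to derive a deterministic UUID from a key that is unique per document. The sketch below assumes such a key exists; the namespace URL and the `kb/articles/...` keys are hypothetical.

```python
import uuid

# Hypothetical namespace for this search index; any fixed UUID works.
INDEX_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/search_index")


def stable_point_id(source_key: str) -> str:
    """Derive a deterministic UUID string from a document's unique key."""
    return str(uuid.uuid5(INDEX_NAMESPACE, source_key))


first = stable_point_id("kb/articles/cancel-subscription")
again = stable_point_id("kb/articles/cancel-subscription")
other = stable_point_id("kb/articles/reset-password")

print(first == again)  # the same key always maps to the same ID
print(first == other)  # different keys map to different IDs
```

With IDs like these, re-indexing an updated document replaces its old point on upsert instead of creating a duplicate or clobbering an unrelated one.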

Conclusion

Semantic search transforms user experience in any search-heavy application. The implementation path is clear: start with simple bi-encoder search, add hybrid BM25 fusion for better recall, add reranking for precision, and scale with Qdrant or Pinecone when document volumes grow.

For many use cases, BAAI/bge-large-en-v1.5 (free) with hybrid search can get close to the quality of OpenAI's embedding API with no per-token cost, though you pay for your own compute. Measure both on your own queries before committing.

For the vector database that stores these embeddings at scale, see our vector database guide. For building a complete RAG system on top of this search layer, see our RAG system tutorial.

Frequently Asked Questions

What is the difference between keyword search and semantic search?

Keyword search finds documents containing the exact query words. Semantic search finds documents with the same meaning, even when the words differ. Example: a keyword search for 'car maintenance' won't find 'vehicle upkeep tips'; semantic search will. Semantic search uses embeddings (neural network representations of text meaning) to measure conceptual similarity. It embeds the query and all documents into the same vector space, then returns the documents whose embeddings are closest to the query embedding. This handles synonyms, paraphrases, and concept-level similarity that keyword search misses.
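The contrast can be shown with a toy example. The sketch below counts shared words (a crude stand-in for keyword matching) and computes cosine similarity on hand-written three-dimensional vectors. The vectors are invented to illustrate the idea and are NOT output from a real embedding model.

```python
import math


def keyword_overlap(query: str, document: str) -> int:
    """Count the distinct lowercase words that query and document share."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = "car maintenance"
documents = ["vehicle upkeep tips", "chocolate cake recipe"]

# Hand-written toy vectors, NOT real embeddings.
toy_embeddings = {
    "car maintenance": [0.9, 0.8, 0.1],
    "vehicle upkeep tips": [0.8, 0.9, 0.2],
    "chocolate cake recipe": [0.1, 0.2, 0.9],
}

for doc in documents:
    overlap = keyword_overlap(query, doc)
    similarity = cosine_similarity(toy_embeddings[query], toy_embeddings[doc])
    print(f"{doc!r}: shared words = {overlap}, cosine = {similarity:.2f}")
```

Neither document shares a word with the query, so keyword matching cannot tell them apart, while the vector comparison clearly prefers the document about vehicles.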


