18 min · Lesson 11 of 23
Vector Databases & RAG

Pinecone & Chroma: Vector DB Setup

Pinecone and Chroma: Vector Databases for Production Agents

Choosing a vector database is one of the more consequential infrastructure decisions in agent development. Chroma is a strong fit for local development and small-scale production, while Pinecone is a managed service aimed at larger production workloads. This lesson covers both: when to use each and how to set them up correctly.

The Vector Database Decision

Use Chroma when:

  • Local development and testing
  • < 100K vectors (scales well at this level)
  • You want zero external dependencies
  • Open-source, self-hosted deployment
  • Metadata filtering is important

Use Pinecone when:

  • > 100K vectors
  • You need serverless, managed infrastructure (no ops)
  • High-traffic production applications
  • Low-latency requirements at scale
  • You don't want to manage infrastructure

Also worth knowing:

  • pgvector — PostgreSQL extension, great if you're already on PostgreSQL
  • Weaviate — Open-source, good for hybrid search (keyword + semantic)
  • Qdrant — High-performance open-source, Docker-friendly

Chroma: Deep Dive

Basic Setup and Configuration

import chromadb
# pip install langchain-chroma (the langchain_community Chroma import is deprecated)
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Persistent client (saves to disk)
client = chromadb.PersistentClient(path="./chroma_db")

# LangChain wrapper
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(
    client=client,
    collection_name="my_documents",
    embedding_function=embeddings
)

Adding Documents Incrementally

You don't have to re-build the entire database when adding new documents:

from langchain_core.documents import Document

# Add new documents (won't duplicate if you use IDs)
new_docs = [
    Document(
        page_content="New policy update: vacation days increased to 20 per year.",
        metadata={"source": "hr_update_2024.pdf", "date": "2024-11", "department": "hr"}
    )
]

# Add with explicit IDs to enable idempotent updates
doc_ids = vectorstore.add_documents(new_docs, ids=["hr_update_2024_p1"])

# Update an existing document (delete + re-add)
vectorstore.delete(ids=["hr_update_2024_p1"])
vectorstore.add_documents(new_docs, ids=["hr_update_2024_p1"])
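
Idempotent updates only work if the same chunk always maps to the same ID. A minimal sketch of one way to derive stable IDs follows; the helper name make_chunk_id and the "source|index" key format are illustrative assumptions, not part of Chroma or LangChain.

```python
import hashlib


def make_chunk_id(source: str, chunk_index: int) -> str:
    """Derive a stable ID from the source file name and the chunk position.

    The ID does not depend on the chunk text, so re-ingesting an edited
    file overwrites the old chunk instead of adding a second copy.
    """
    key = f"{source}|{chunk_index}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:32]


# Hypothetical chunks from one file
chunks = [
    "Vacation days increased to 20 per year.",
    "Carry-over is capped at 5 days.",
]
ids = [make_chunk_id("hr_update_2024.pdf", i) for i in range(len(chunks))]
print(ids)
```

The resulting list can be passed as ids=ids to vectorstore.add_documents. If a file shrinks and produces fewer chunks than before, the leftover IDs still need an explicit delete.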

Chroma Collection Management

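# List all collections
# Note: some Chroma releases (0.6.x) return plain names here instead of Collection objects
for col in client.list_collections():
    name = col if isinstance(col, str) else col.name
    print(f"{name}: {client.get_collection(name).count()} vectors")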

# Get collection stats
collection = client.get_collection("my_documents")
print(f"Total vectors: {collection.count()}")

# Delete a collection
client.delete_collection("old_documents")

# Delete only the vectors whose metadata matches a filter (here: one source file)
collection.delete(where={"source": "outdated_file.pdf"})

Chroma for Multi-Tenant Applications

Each user or organization gets their own collection, stored here in a separate on-disk directory per user:

def get_user_vectorstore(user_id: str) -> Chroma:
    """Get or create a vector store for a specific user."""
    return Chroma(
        client=chromadb.PersistentClient(path=f"./user_stores/{user_id}"),
        collection_name=f"docs_{user_id}",
        embedding_function=embeddings
    )
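
The function above interpolates user_id directly into a filesystem path and a collection name. If user IDs come from outside your system, a value such as "../../etc" could escape the storage directory, and Chroma has historically restricted collection names to a limited character set and length (the exact rules vary by version). One possible guard, sketched under those assumptions, is to hash the ID first. The names tenant_key and tenant_locations are hypothetical.

```python
import hashlib


def tenant_key(user_id: str) -> str:
    """Map an arbitrary user ID to a short lowercase-hex key."""
    if not user_id:
        raise ValueError("user_id must be non-empty")
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:32]


def tenant_locations(user_id: str) -> tuple[str, str]:
    """Return (storage path, collection name) for one tenant."""
    key = tenant_key(user_id)
    return f"./user_stores/{key}", f"docs_{key}"


# A hostile-looking ID still yields a harmless path and name
path, name = tenant_locations("../../etc/passwd")
print(path)
print(name)
```

The trade-off is that the directory names are no longer human-readable, so keep a separate mapping from user ID to key if you need to inspect stores by hand.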

Pinecone: Deep Dive

Setup

pip install pinecone langchain-pinecone  # the SDK was published as pinecone-client in older releases

import os
from pinecone import Pinecone, ServerlessSpec

# Initialize
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create an index (one-time setup)
if "my-agent-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-agent-index",
        dimension=1536,  # Must match embedding model dimensions (text-embedding-3-small = 1536)
        metric="cosine", # "cosine" for most NLP tasks
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )

# Connect to index
index = pc.Index("my-agent-index")
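
Because the index dimension is fixed at creation time, it can help to check it against the embedding model before calling create_index. The sketch below hard-codes the default output sizes of three OpenAI embedding models; the table and the function name check_index_dimension are illustrative, and the text-embedding-3 models can also be asked for shorter vectors through their dimensions parameter, in which case the table would not apply.

```python
# Default output sizes for some OpenAI embedding models
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_index_dimension(model: str, index_dimension: int) -> None:
    """Raise if the index dimension does not match the model's default size."""
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise KeyError(f"unknown embedding model: {model}")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index expects {index_dimension}"
        )


check_index_dimension("text-embedding-3-small", 1536)  # passes silently
```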

LangChain Integration

from langchain_pinecone import PineconeVectorStore

# Create from documents (`chunks` is your list of already-split Document objects)
vectorstore = PineconeVectorStore.from_documents(
    documents=chunks,
    embedding=embeddings,
    index_name="my-agent-index"
)

# Connect to existing index
vectorstore = PineconeVectorStore.from_existing_index(
    index_name="my-agent-index",
    embedding=embeddings
)

# Standard retrieval
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
docs = retriever.invoke("What is the refund policy?")

Pinecone Namespaces (Multi-Tenancy)

Pinecone namespaces let you partition a single index for multiple customers/tenants without managing separate indexes:

# Bind the vector store to one customer's namespace; reads and writes through it stay in that namespace
vectorstore = PineconeVectorStore.from_existing_index(
    index_name="my-agent-index",
    embedding=embeddings,
    namespace=f"customer_{customer_id}"  # Isolated data per customer
)

# Search only within that customer's data
docs = vectorstore.similarity_search("refund policy", k=4)
# Only returns results from customer_{customer_id} namespace

Pinecone Metadata Filtering

# Filter by metadata during search
docs = vectorstore.similarity_search(
    "pricing for enterprise tier",
    k=4,
    filter={
        "department": "sales",
        "year": {"$gte": 2023},
        "document_type": {"$in": ["contract", "proposal"]}
    }
)
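
To make the filter semantics concrete, here is a toy pure-Python matcher for the three forms used above: a bare value (shorthand for equality), $gte, and $in. Top-level fields are combined with AND. This is only an illustration of how the filter reads, not Pinecone's implementation, and it covers only these operators.

```python
from typing import Any


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in flt."""
    for field, condition in flt.items():
        value = metadata.get(field)
        # A bare value is shorthand for {"$eq": value}
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        for op, operand in condition.items():
            if op == "$eq":
                ok = value == operand
            elif op == "$gte":
                ok = value is not None and value >= operand
            elif op == "$in":
                ok = value in operand
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


flt = {
    "department": "sales",
    "year": {"$gte": 2023},
    "document_type": {"$in": ["contract", "proposal"]},
}

# Hypothetical metadata records
records = [
    {"department": "sales", "year": 2024, "document_type": "contract"},
    {"department": "sales", "year": 2022, "document_type": "contract"},
    {"department": "hr", "year": 2024, "document_type": "proposal"},
]
print([matches(r, flt) for r in records])
```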

Batching for Large Ingestion Jobs

When ingesting tens of thousands of documents, batch carefully to avoid rate limits:

import time
from tqdm import tqdm

def batch_ingest(documents: list, vectorstore, batch_size: int = 100):
    """Ingest documents in batches with progress tracking."""
    total_batches = (len(documents) + batch_size - 1) // batch_size
    
    for i in tqdm(range(0, len(documents), batch_size), total=total_batches):
        batch = documents[i:i + batch_size]
        
        try:
            vectorstore.add_documents(batch)
        except Exception as e:
            print(f"Batch {i//batch_size + 1} failed: {e}")
            time.sleep(5)  # Wait before retry
            vectorstore.add_documents(batch)  # Retry once
        
        # Rate limiting for embedding API
        time.sleep(0.5)
    
    print(f"Ingested {len(documents)} documents")

# Use it
batch_ingest(all_chunks, vectorstore, batch_size=50)
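
The single fixed-delay retry above is a reasonable start. Two refinements are common: exponential backoff across several attempts, and explicit IDs so that a retry after a partial failure overwrites rather than duplicates (this assumes the store upserts by ID, which is worth verifying for your setup). The sketch below uses a hypothetical add_with_retry helper and an in-memory stand-in for the vector store so that it runs without any external service.

```python
import time
from typing import Callable, Sequence


def add_with_retry(
    add_batch: Callable[[Sequence, Sequence[str]], None],
    batch: Sequence,
    ids: Sequence[str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call add_batch(batch, ids), doubling the wait after each failure.

    Returns the number of attempts used; re-raises the last error if
    every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            add_batch(batch, ids)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Demo: a stand-in store that fails twice, then succeeds
stored = {}
state = {"failures_left": 2}


def flaky_add(batch, ids):
    if state["failures_left"] > 0:
        state["failures_left"] -= 1
        raise RuntimeError("simulated rate limit")
    stored.update(zip(ids, batch))  # keyed by ID, so repeats overwrite


delays = []  # record the waits instead of actually sleeping
attempts = add_with_retry(
    flaky_add, ["doc a", "doc b"], ["id-a", "id-b"], sleep=delays.append
)
print(attempts, delays, stored)
```

With a real store, add_batch could be lambda docs, ids: vectorstore.add_documents(docs, ids=ids).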

Evaluating Retrieval Quality

Bad retrieval leads to bad RAG output regardless of how good your LLM is. Evaluate your retriever:

def evaluate_retriever(retriever, test_cases: list[dict]) -> dict:
    """
    Test cases format: [{"query": "...", "expected_source": "..."}]
    """
    correct = 0
    
    for case in test_cases:
        docs = retriever.invoke(case["query"])
        sources = [doc.metadata.get("source", "") for doc in docs]
        
        if case["expected_source"] in sources:
            correct += 1
        else:
            print(f"MISS: '{case['query']}'")
            print(f"Expected: {case['expected_source']}")
            print(f"Got: {sources}\n")
    
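    # Fraction of queries whose expected source appears in the results (a hit rate, not precision)
    hit_rate = correct / len(test_cases) if test_cases else 0.0
    print(f"Retrieval hit rate: {hit_rate:.1%} ({correct}/{len(test_cases)})")
    return {"hit_rate": hit_rate}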

test_cases = [
    {"query": "How many vacation days do employees get?", "expected_source": "hr_policy.pdf"},
    {"query": "What is the software return policy?", "expected_source": "returns_policy.pdf"},
]

evaluate_retriever(retriever, test_cases)
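
The check above only records whether the expected source appears anywhere in the results. If ranking position matters to you, mean reciprocal rank (MRR) is a common companion metric: each query scores 1/rank of the expected source, or 0 on a miss. The sketch below is one way to compute both; it uses a hypothetical stub in place of a real retriever, and the function name hit_rate_and_mrr is illustrative.

```python
from typing import Callable


def hit_rate_and_mrr(
    retrieve_sources: Callable[[str], list[str]],
    test_cases: list[dict],
) -> dict:
    """Compute hit rate and mean reciprocal rank over the test cases."""
    if not test_cases:
        raise ValueError("test_cases must not be empty")
    hits = 0
    reciprocal_ranks = []
    for case in test_cases:
        sources = retrieve_sources(case["query"])
        expected = case["expected_source"]
        if expected in sources:
            hits += 1
            rank = sources.index(expected) + 1  # 1-based position
            reciprocal_ranks.append(1.0 / rank)
        else:
            reciprocal_ranks.append(0.0)
    n = len(test_cases)
    return {"hit_rate": hits / n, "mrr": sum(reciprocal_ranks) / n}


# Stub retriever: canned, ordered results per query
canned = {
    "vacation days": ["handbook.pdf", "hr_policy.pdf"],
    "return policy": ["returns_policy.pdf"],
    "office hours": ["handbook.pdf"],
}
cases = [
    {"query": "vacation days", "expected_source": "hr_policy.pdf"},
    {"query": "return policy", "expected_source": "returns_policy.pdf"},
    {"query": "office hours", "expected_source": "facilities.pdf"},
]
metrics = hit_rate_and_mrr(lambda q: canned[q], cases)
print(metrics)
```

With a LangChain retriever, retrieve_sources could be lambda q: [d.metadata.get("source", "") for d in retriever.invoke(q)].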

Common Retrieval Improvements

If retrieval accuracy is poor:

  1. Smaller chunk size: try 500 instead of 1000 characters
  2. More overlap: increase chunk_overlap to 300-400 characters for 1000-character chunks, and scale it down if you also shrink the chunks (overlap must stay below chunk_size)
  3. Better metadata: add section titles, document type, and date as metadata
  4. Hypothetical document embeddings (HyDE): generate a hypothetical answer, embed it, and search with that vector
  5. Re-ranking: use a cross-encoder to re-rank retrieval results
  6. MMR search: reduce near-duplicate chunks with Maximum Marginal Relevance
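
To show what MMR in item 6 does, here is a small pure-Python sketch of the greedy selection. Each step picks the candidate with the best trade-off between relevance to the query and similarity to what is already selected. The vectors are made up for illustration, and lambda_mult follows the usual convention where 1.0 means pure relevance and lower values favor diversity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximum Marginal Relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        # Penalize similarity to the closest already-selected document
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0]
docs = [
    [1.0, 0.3],   # most relevant
    [1.0, 0.25],  # near-duplicate of the first
    [0.2, 1.0],   # less relevant, but covers a different direction
]
print(mmr(query, docs, k=2, lambda_mult=0.5))
print(mmr(query, docs, k=2, lambda_mult=1.0))
```

In LangChain, the same idea is available without custom code through vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5}).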

Next lesson: Building a RAG agent — combining retrieval with agent reasoning.
