Pinecone and Chroma: Vector Databases for Production Agents
Choosing the right vector database is one of the most consequential infrastructure decisions in agent development. Chroma is a strong default for local development and small-scale production. Pinecone is a managed service suited to larger production workloads. This lesson covers both: when to use each and how to implement them correctly.
The Vector Database Decision
Use Chroma when:
- Local development and testing
- < 100K vectors (scales well at this level)
- You want zero external dependencies
- Open-source, self-hosted deployment
- Metadata filtering is important
Use Pinecone when:
- > 100K vectors
- You need serverless, managed infrastructure (no ops)
- High-traffic production applications
- Low-latency requirements at scale
- You don't want to manage infrastructure
Also worth knowing:
- pgvector — PostgreSQL extension, great if you're already on PostgreSQL
- Weaviate — Open-source, good for hybrid search (keyword + semantic)
- Qdrant — High-performance open-source, Docker-friendly
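The rules of thumb above can be written down as a small helper. This is only a sketch: the function name `suggest_vector_db` and its parameters are hypothetical, and the 100K threshold is this lesson's guideline rather than a hard limit of either product.

```python
def suggest_vector_db(expected_vectors: int, want_managed: bool = False) -> str:
    """Return "chroma" or "pinecone" using this lesson's rule of thumb.

    expected_vectors: rough number of vectors you expect to store.
    want_managed: True if you prefer a hosted service with no ops work.
    """
    if expected_vectors < 0:
        raise ValueError("expected_vectors must be non-negative")
    # Managed infrastructure or a large corpus points to Pinecone.
    if want_managed or expected_vectors > 100_000:
        return "pinecone"
    # Otherwise a local, self-hosted Chroma store is the simpler start.
    return "chroma"


print(suggest_vector_db(20_000))
print(suggest_vector_db(20_000, want_managed=True))
print(suggest_vector_db(2_000_000))
```

Treat the result as a starting point. Latency targets, traffic, and existing infrastructure (for example, already running PostgreSQL) can change the answer.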
Chroma: Deep Dive
Basic Setup and Configuration
import chromadb
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
# Persistent client (saves to disk)
client = chromadb.PersistentClient(path="./chroma_db")
# LangChain wrapper
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(
    client=client,
    collection_name="my_documents",
    embedding_function=embeddings
)
Adding Documents Incrementally
You don't have to rebuild the entire database when adding new documents:
from langchain_core.documents import Document
# Add new documents (won't duplicate if you use IDs)
new_docs = [
    Document(
        page_content="New policy update: vacation days increased to 20 per year.",
        metadata={"source": "hr_update_2024.pdf", "date": "2024-11", "department": "hr"}
    )
]
# Add with explicit IDs to enable idempotent updates
doc_ids = vectorstore.add_documents(new_docs, ids=["hr_update_2024_p1"])
# Update an existing document (delete + re-add)
vectorstore.delete(ids=["hr_update_2024_p1"])
vectorstore.add_documents(new_docs, ids=["hr_update_2024_p1"])
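Explicit IDs like the one above only help if the same chunk always gets the same ID. One possible way to generate them is sketched below, using only the standard library. The helper names (`make_chunk_id`, `content_fingerprint`, `needs_update`) are hypothetical and not part of Chroma or LangChain, and the ID scheme assumes each source file has a unique file name.

```python
import hashlib
import re
from pathlib import PurePosixPath


def make_chunk_id(source: str, chunk_index: int) -> str:
    """Build a stable ID from the source file name and the chunk position."""
    stem = PurePosixPath(source).stem  # "hr_update_2024.pdf" -> "hr_update_2024"
    slug = re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_").lower()
    return f"{slug}_p{chunk_index}"


def content_fingerprint(text: str) -> str:
    """Hash the chunk text so changed content can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_update(chunk_id: str, text: str, seen: dict[str, str]) -> bool:
    """Return True if this chunk is new or its text changed since last time.

    `seen` maps chunk IDs to the fingerprint stored on the previous run.
    """
    fingerprint = content_fingerprint(text)
    if seen.get(chunk_id) == fingerprint:
        return False
    seen[chunk_id] = fingerprint
    return True


seen: dict[str, str] = {}
chunk_id = make_chunk_id("hr_update_2024.pdf", 1)
print(chunk_id, needs_update(chunk_id, "Vacation days: 20 per year.", seen))
```

Because the ID does not depend on the text, re-adding an edited chunk overwrites the old entry instead of creating a second one. The fingerprint lets you skip re-embedding chunks that have not changed.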
Chroma Collection Management
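# List all collections
# (depending on the chromadb version, list_collections() returns
# Collection objects or plain name strings, so handle both)
for col in client.list_collections():
    name = col if isinstance(col, str) else col.name
    print(f"{name}: {client.get_collection(name).count()} vectors")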
# Get collection stats
collection = client.get_collection("my_documents")
print(f"Total vectors: {collection.count()}")
# Delete a collection
client.delete_collection("old_documents")
# Delete only the vectors whose metadata matches a filter
# (here: everything that came from one outdated file)
collection.delete(where={"source": "outdated_file.pdf"})
Chroma for Multi-Tenant Applications
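Each user or organization gets their own collection, stored here in its own on-disk directory:
def get_user_vectorstore(user_id: str) -> Chroma:
    """Get or create a vector store for a specific user."""
    # user_id ends up in a filesystem path and a collection name,
    # so validate or normalize it before calling this function.
    return Chroma(
        client=chromadb.PersistentClient(path=f"./user_stores/{user_id}"),
        collection_name=f"docs_{user_id}",
        embedding_function=embeddings
    )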
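Because the user ID above ends up in both a directory path and a collection name, raw IDs such as email addresses or strings containing `../` can cause trouble. A minimal sketch of one way to normalize them is shown below. `tenant_collection_name` is a hypothetical helper, and hashing is just one option; it assumes you do not need to read the user ID back out of the name.

```python
import hashlib


def tenant_collection_name(user_id: str, prefix: str = "docs") -> str:
    """Map an arbitrary user ID to a short, filesystem-safe name.

    The output contains only lowercase hex digits, the prefix, and an
    underscore, so it carries no path separators or dots.
    """
    if not user_id:
        raise ValueError("user_id must be a non-empty string")
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:32]
    return f"{prefix}_{digest}"


print(tenant_collection_name("alice@example.com"))
print(tenant_collection_name("../../etc/passwd"))
```

The same user ID always maps to the same name, so it can be used for both the store directory and the collection name. Keep a separate lookup table if you need to go from a name back to the user.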
Pinecone: Deep Dive
Setup
pip install pinecone langchain-pinecone  # the SDK was previously published as pinecone-client
import os
from pinecone import Pinecone, ServerlessSpec
# Initialize
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# Create an index (one-time setup)
if "my-agent-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-agent-index",
        dimension=1536,   # Must match embedding model dimensions (text-embedding-3-small = 1536)
        metric="cosine",  # "cosine" for most NLP tasks
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )
# Connect to index
index = pc.Index("my-agent-index")
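A mismatch between the index dimension and the embedding model is a common setup error, and it usually only shows up once you try to write vectors. The sketch below checks the two values before the index is created. The lookup table and the function `check_index_dimension` are illustrative; the values are the default output sizes of these OpenAI models, and you should extend the table for any other model you use.

```python
# Default output dimensions for some OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_index_dimension(model: str, index_dimension: int) -> None:
    """Raise early if the index dimension does not fit the embedding model."""
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise KeyError(f"Unknown embedding model: {model!r}; add it to the table")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index is configured for {index_dimension}"
        )


check_index_dimension("text-embedding-3-small", 1536)  # passes silently
```

Switching embedding models later means creating a new index and re-embedding the documents, since the dimension of an existing index is fixed.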
LangChain Integration
from langchain_pinecone import PineconeVectorStore
# Create from documents
# (`chunks` is assumed to be the list of split Documents from your loading step)
vectorstore = PineconeVectorStore.from_documents(
    documents=chunks,
    embedding=embeddings,
    index_name="my-agent-index"
)
# Connect to existing index
vectorstore = PineconeVectorStore.from_existing_index(
    index_name="my-agent-index",
    embedding=embeddings
)
# Standard retrieval
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
docs = retriever.invoke("What is the refund policy?")
Pinecone Namespaces (Multi-Tenancy)
Pinecone namespaces let you partition a single index for multiple customers/tenants without managing separate indexes:
# Store documents in a customer-specific namespace
vectorstore = PineconeVectorStore.from_existing_index(
    index_name="my-agent-index",
    embedding=embeddings,
    namespace=f"customer_{customer_id}"  # Isolated data per customer
)
# Search only within that customer's data
docs = vectorstore.similarity_search("refund policy", k=4)
# Only returns results from customer_{customer_id} namespace
Pinecone Metadata Filtering
# Filter by metadata during search
docs = vectorstore.similarity_search(
    "pricing for enterprise tier",
    k=4,
    filter={
        "department": "sales",
        "year": {"$gte": 2023},
        "document_type": {"$in": ["contract", "proposal"]}
    }
)
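To make the filter above easier to read, here is a plain-Python illustration of how such a filter is evaluated against one document's metadata: every key must match, a bare value means equality, and operator dictionaries such as `$gte` and `$in` express comparisons. This is not Pinecone's implementation. The function `matches` is hypothetical and covers only the operators used in this lesson.

```python
from typing import Any


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every condition in `flt`."""
    for field, condition in flt.items():
        value = metadata.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$gte":
                    if value is None or value < operand:
                        return False
                elif op == "$in":
                    if value not in operand:
                        return False
                elif op == "$eq":
                    if value != operand:
                        return False
                else:
                    raise ValueError(f"Unsupported operator in this sketch: {op}")
        elif value != condition:
            # A bare value is shorthand for an equality check.
            return False
    return True


flt = {
    "department": "sales",
    "year": {"$gte": 2023},
    "document_type": {"$in": ["contract", "proposal"]},
}
print(matches({"department": "sales", "year": 2024, "document_type": "contract"}, flt))
print(matches({"department": "sales", "year": 2022, "document_type": "contract"}, flt))
```

The first call prints `True` and the second prints `False`, because 2022 fails the `$gte` condition. Documents that lack a filtered field never match, which is a good reason to keep metadata keys consistent at ingestion time.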
Batching for Large Ingestion Jobs
When ingesting tens of thousands of documents, batch carefully to avoid rate limits:
import time
from tqdm import tqdm
def batch_ingest(documents: list, vectorstore, batch_size: int = 100):
    """Ingest documents in batches with progress tracking."""
    total_batches = (len(documents) + batch_size - 1) // batch_size
    for i in tqdm(range(0, len(documents), batch_size), total=total_batches):
        batch = documents[i:i + batch_size]
        try:
            vectorstore.add_documents(batch)
        except Exception as e:
            print(f"Batch {i//batch_size + 1} failed: {e}")
            time.sleep(5)  # Wait before retry
            vectorstore.add_documents(batch)  # Retry once
        # Rate limiting for embedding API
        time.sleep(0.5)
    print(f"Ingested {len(documents)} documents")
# Use it
batch_ingest(all_chunks, vectorstore, batch_size=50)
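The function above retries a failed batch once after a fixed wait. For longer ingestion jobs, a common alternative is exponential backoff: wait a little after the first failure and longer after each further one. The sketch below is a generic helper built on the standard library. `retry_with_backoff` and its parameters are hypothetical names, and the default delays are placeholders to tune against your actual rate limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` until it succeeds or `max_attempts` is reached.

    The wait doubles after each failure (1s, 2s, 4s, ...), is capped at
    `max_delay`, and gets up to 10% random jitter so that parallel
    workers do not all retry at the same moment.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

Inside the batch loop this could be used as `retry_with_backoff(lambda: vectorstore.add_documents(batch))`. If a batch can fail halfway through, pass explicit IDs to `add_documents` so that a retry overwrites the vectors that were already written instead of duplicating them.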
Evaluating Retrieval Quality
Bad retrieval leads to bad RAG output regardless of how good your LLM is. Evaluate your retriever:
def evaluate_retriever(retriever, test_cases: list[dict]) -> dict:
    """
    Test cases format: [{"query": "...", "expected_source": "..."}]
    Returns the hit rate: the share of queries whose expected source
    appears anywhere in the retrieved documents.
    """
    if not test_cases:
        return {"hit_rate": 0.0}
    correct = 0
    for case in test_cases:
        docs = retriever.invoke(case["query"])
        sources = [doc.metadata.get("source", "") for doc in docs]
        if case["expected_source"] in sources:
            correct += 1
        else:
            print(f"MISS: '{case['query']}'")
            print(f"Expected: {case['expected_source']}")
            print(f"Got: {sources}\n")
    hit_rate = correct / len(test_cases)
    print(f"Retrieval hit rate: {hit_rate:.1%} ({correct}/{len(test_cases)})")
    return {"hit_rate": hit_rate}
test_cases = [
    {"query": "How many vacation days do employees get?", "expected_source": "hr_policy.pdf"},
    {"query": "What is the software return policy?", "expected_source": "returns_policy.pdf"},
]
evaluate_retriever(retriever, test_cases)
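The check above only records whether the expected source appears anywhere in the results. It does not tell you whether it was ranked first or last. Mean reciprocal rank (MRR) is a standard complement that rewards putting the right document near the top. The sketch below works on plain lists of source names, so it can be fed from the `sources` list built in the loop above. The function names are illustrative.

```python
def reciprocal_rank(retrieved_sources: list[str], expected_source: str) -> float:
    """Return 1/rank of the first correct result, or 0.0 if it is missing."""
    for rank, source in enumerate(retrieved_sources, start=1):
        if source == expected_source:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results: list[tuple[list[str], str]]) -> float:
    """Average the reciprocal rank over (retrieved_sources, expected_source) pairs."""
    if not results:
        return 0.0
    total = sum(reciprocal_rank(sources, expected) for sources, expected in results)
    return total / len(results)


results = [
    (["hr_policy.pdf", "returns_policy.pdf"], "hr_policy.pdf"),       # rank 1 -> 1.0
    (["hr_policy.pdf", "returns_policy.pdf"], "returns_policy.pdf"),  # rank 2 -> 0.5
    (["hr_policy.pdf"], "security_policy.pdf"),                       # miss   -> 0.0
]
print(f"MRR: {mean_reciprocal_rank(results):.2f}")  # MRR: 0.50
```

An MRR close to 1.0 means the expected source is usually the top result. A high hit rate combined with a low MRR suggests the right chunks are found but ranked too low, which is where re-ranking tends to help.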
Common Retrieval Improvements
If retrieval accuracy is poor:
- Smaller chunk size: try 500 instead of 1000 characters
- More overlap: increase chunk_overlap, for example to 300-400 characters with 1000-character chunks, and keep it well below the chunk size
- Better metadata: add section titles, document type, and date as metadata
- Hypothetical document embeddings (HyDE): generate a hypothetical answer, embed it, and search with that vector
- Re-ranking: use a cross-encoder to re-rank retrieval results
- MMR search: reduce near-duplicate chunks with Maximum Marginal Relevance
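Of the techniques listed, MMR is the easiest to show in a few lines. It picks results one at a time, scoring each candidate by its similarity to the query minus its similarity to what has already been selected. The sketch below is a plain-Python version on toy 2-dimensional vectors. The function names and the `lambda_mult` parameter name are choices made for this example, and real embeddings would come from your embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr(query_vec: list[float], doc_vecs: list[list[float]],
        k: int = 3, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of up to k documents chosen by Maximum Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: list[int] = []

    def score(i: int) -> float:
        # Penalize similarity to the closest already-selected document.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr(query, docs, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr(query, docs, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=0.5` the near-duplicate second document is skipped in favor of the more distinct third one. LangChain retrievers expose the same idea through `vectorstore.as_retriever(search_type="mmr", ...)`, so you rarely need to implement it yourself.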
Next lesson: Building a RAG agent — combining retrieval with agent reasoning.