How to Use LangChain with Redis (Cache and Vector Store)
Use LangChain with Redis for low-latency AI responses. Covers RedisCache, RedisSemanticCache, RedisVL vector search, and a Redis vs alternatives comparison.
Speed matters in AI applications. Users abandon chatbot responses that take more than two or three seconds. Every repeated LLM call that could be served from cache is wasted money and added latency. Redis — with its sub-millisecond read times and built-in vector search — is one of the best tools in the LangChain ecosystem for both caching LLM responses and powering low-latency retrieval.
This guide covers four Redis integration patterns in LangChain: exact-match caching, semantic caching, vector store operations, and direct index control with RedisVL. It then combines semantic caching and vector retrieval in a full RAG pipeline. You will also get a comparison table showing when Redis makes more sense than Memcached or DynamoDB for these workloads.
For context on how these patterns fit into a broader architecture, see the RAG system tutorial and the vector database guide.
Installation and Setup
pip install langchain langchain-openai langchain-community redis redisvl
You also need Redis Stack running locally or a Redis Cloud instance with RediSearch enabled:
# Docker — easiest way to get Redis Stack locally
docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
Verify the connection:
import redis
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
print(client.ping()) # True
print(client.info("server")["redis_version"]) # e.g., "7.2.0"
Pattern 1: Exact-Match Caching with RedisCache
The simplest form of caching stores prompt/response pairs and returns the cached response when an identical prompt is received again.
from langchain_community.cache import RedisCache
from langchain_core.globals import set_llm_cache
from langchain_openai import ChatOpenAI
import redis
# Set up the Redis cache globally
redis_client = redis.Redis(host="localhost", port=6379)
set_llm_cache(RedisCache(redis_=redis_client))
# All LLM calls now check the cache first
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
import time
# First call — goes to OpenAI
start = time.time()
response1 = llm.invoke("What is the boiling point of water?")
print(f"First call: {time.time() - start:.2f}s")
print(response1.content)
# Second call — served from Redis cache
start = time.time()
response2 = llm.invoke("What is the boiling point of water?")
print(f"Cached call: {time.time() - start:.4f}s")
# e.g. Cached call: 0.0012s (exact timing depends on your setup, but expect milliseconds rather than seconds)
You can set a TTL on cached responses to prevent stale data:
from langchain_community.cache import RedisCache
from langchain_core.globals import set_llm_cache
# Cache entries expire after 1 hour
set_llm_cache(
    RedisCache(
        redis_=redis_client,
        ttl=3600,  # seconds
    )
)
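To make the exact-match and TTL behaviour concrete, here is a minimal in-memory sketch of the idea. It is not LangChain's implementation: `TinyExactCache`, its method names, and the SHA-256 key scheme are hypothetical stand-ins, and a plain dict replaces Redis.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TinyExactCache:
    """Illustrative exact-match cache with per-entry expiry (hypothetical, not LangChain's code)."""

    def __init__(self, ttl: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl        # seconds until an entry expires; None means never
        self.clock = clock    # injectable so expiry can be shown without sleeping
        self._store: Dict[str, Tuple[str, Optional[float]]] = {}

    @staticmethod
    def _key(prompt: str, model: str) -> str:
        # Any change to the prompt or the model name produces a different key.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def lookup(self, prompt: str, model: str) -> Optional[str]:
        key = self._key(prompt, model)
        entry = self._store.get(key)
        if entry is None:
            return None
        response, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return response

    def update(self, prompt: str, model: str, response: str) -> None:
        expires_at = None if self.ttl is None else self.clock() + self.ttl
        self._store[self._key(prompt, model)] = (response, expires_at)


now = [0.0]  # fake clock for the demo
cache = TinyExactCache(ttl=3600, clock=lambda: now[0])
cache.update("What is the boiling point of water?", "gpt-4o-mini",
             "100 degrees Celsius at sea level.")

hit = cache.lookup("What is the boiling point of water?", "gpt-4o-mini")
miss = cache.lookup("what is the boiling point of water?", "gpt-4o-mini")  # casing differs
now[0] = 3601.0  # jump past the one-hour TTL
expired = cache.lookup("What is the boiling point of water?", "gpt-4o-mini")
print(hit, miss, expired)
```

The lowercase variant misses because the key is derived from the exact prompt text, which is the gap semantic caching is meant to close.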
Pattern 2: Semantic Caching with RedisSemanticCache
Exact-match caching misses a huge category of reusable responses: semantically equivalent questions that are worded differently. RedisSemanticCache solves this by embedding the incoming prompt and searching for a close enough match in the cache.
from langchain_community.cache import RedisSemanticCache
from langchain_core.globals import set_llm_cache
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
set_llm_cache(
    RedisSemanticCache(
        redis_url="redis://localhost:6379",
        embedding=embeddings,
        score_threshold=0.2,  # lower = stricter matching; 0.2 is a reasonable starting point for factual questions
    )
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# First call — embeds the prompt and stores the result
response1 = llm.invoke("What is the capital of France?")
print(response1.content) # "The capital of France is Paris."
# Semantically equivalent prompt — hits the cache
response2 = llm.invoke("Tell me the capital city of France.")
print(response2.content) # "The capital of France is Paris." — from cache!
# Completely different topic — cache miss, goes to OpenAI
response3 = llm.invoke("Who wrote Hamlet?")
print(response3.content)
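The score_threshold parameter controls how similar two prompts need to be to count as a cache hit. A threshold of 0.2 means the cosine distance between the two embeddings must be 0.2 or less. Lower values are stricter; values that are too high produce false positives, where a prompt receives an answer cached for a different question. For customer support applications where questions are highly repetitive, a threshold of 0.15–0.25 is a sensible starting range and can give a cache hit rate of 30–50%, but measure this on your own traffic because the rate depends on how repetitive the questions are.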
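The threshold check described above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic: `cosine_distance` and `is_cache_hit` are hypothetical helper names, and the 2-D vectors are toy stand-ins for real embeddings.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0.0 for the same direction, 1.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_cache_hit(query_vec: Sequence[float], cached_vec: Sequence[float],
                 score_threshold: float = 0.2) -> bool:
    """A hit requires the distance to be at or below the threshold."""
    return cosine_distance(query_vec, cached_vec) <= score_threshold


cached = [1.0, 0.0]       # embedding of the prompt already in the cache (toy values)
rephrased = [0.9, 0.1]    # nearly the same direction
loosely_related = [0.6, 0.8]
unrelated = [0.0, 1.0]

print(is_cache_hit(rephrased, cached))                             # close enough at 0.2
print(is_cache_hit(loosely_related, cached))                       # distance is about 0.4
print(is_cache_hit(loosely_related, cached, score_threshold=0.5))  # looser threshold accepts it
print(is_cache_hit(unrelated, cached))
```

Raising the threshold from 0.2 to 0.5 turns the loosely related vector into a hit, which is exactly how an overly generous threshold starts returning answers to the wrong question.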
Pattern 3: Redis as a LangChain Vector Store
Redis with the RediSearch module functions as a full-featured vector store for RAG systems. It supports filtered search, full-text search combined with vector search, and TTL-based document expiry.
from langchain_community.vectorstores import Redis
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create the vector store and add documents
texts = [
    "Redis is an in-memory data store with sub-millisecond latency.",
    "RediSearch adds full-text search and vector similarity search to Redis.",
    "Redis Stack bundles Redis with RediSearch, RedisJSON, and other modules.",
    "Redis Cloud offers managed Redis with high availability and automatic scaling.",
]
metadatas = [
    {"source": "redis_docs", "topic": "overview"},
    {"source": "redis_docs", "topic": "search"},
    {"source": "redis_docs", "topic": "stack"},
    {"source": "redis_docs", "topic": "cloud"},
]
vector_store = Redis.from_texts(
    texts=texts,
    embedding=embeddings,
    metadatas=metadatas,
    redis_url="redis://localhost:6379",
    index_name="langchain_docs"
)
# Simple similarity search
results = vector_store.similarity_search("What modules does Redis Stack include?", k=2)
for doc in results:
    print(f"[{doc.metadata['topic']}] {doc.page_content}")
Filtered Vector Search
One of Redis's key advantages over simpler vector stores is support for filtered search — you can combine metadata filters with vector similarity in a single query:
from langchain_community.vectorstores.redis.filters import RedisText
# Search only within documents whose topic is "search"
results = vector_store.similarity_search(
    query="vector search capabilities",
    k=3,
    filter=RedisText("topic") == "search",
)
for doc in results:
    print(doc.page_content)
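Conceptually, a filtered search narrows the candidate set by metadata and ranks only the survivors by similarity. The brute-force sketch below shows that idea in plain Python; Redis does the equivalent inside the engine in one query, and the `filtered_search` helper, the sample documents, and the toy 2-D vectors are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# (text, metadata, toy 2-D embedding); real embeddings have far more dimensions
Doc = Tuple[str, Dict[str, str], Sequence[float]]

DOCS: List[Doc] = [
    ("RediSearch adds vector similarity search.", {"topic": "search"}, [0.9, 0.1]),
    ("Redis Stack bundles several modules.", {"topic": "stack"}, [0.8, 0.2]),
    ("Redis Cloud is a managed service.", {"topic": "cloud"}, [0.1, 0.9]),
]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def filtered_search(docs: List[Doc], query_vec: Sequence[float],
                    topic: str, k: int = 3) -> List[str]:
    """Keep only documents whose metadata matches, then rank the survivors by similarity."""
    candidates = [d for d in docs if d[1].get("topic") == topic]
    ranked = sorted(candidates, key=lambda d: dot(d[2], query_vec), reverse=True)
    return [text for text, _meta, _vec in ranked[:k]]


query = [1.0, 0.0]
best_unfiltered = max(DOCS, key=lambda d: dot(d[2], query))[0]
stack_only = filtered_search(DOCS, query, topic="stack")
print(best_unfiltered)
print(stack_only)
```

Without the filter the "search" document ranks first; with the filter it is excluded before ranking, so only the "stack" document can be returned.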
Loading an Existing Index
If you have already created a Redis index and want to reconnect to it:
# Reconnect to an existing index
vector_store = Redis.from_existing_index(
    embedding=embeddings,
    redis_url="redis://localhost:6379",
    index_name="langchain_docs",
    # schema is a required argument: pass the schema the index was created with,
    # e.g. a file saved earlier with vector_store.write_schema("redis_schema.yaml")
    schema="redis_schema.yaml",
)
# Use as a retriever
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)
Pattern 4: RedisVL for Advanced Index Control
When you need fine-grained control over your Redis vector index — custom field types, hybrid search scoring, or batch upserts — use RedisVL directly alongside LangChain:
from redisvl.index import SearchIndex
from redisvl.schema import IndexSchema
from redisvl.query import VectorQuery
from langchain_openai import OpenAIEmbeddings
import numpy as np
# Define the index schema
schema = IndexSchema.from_dict({
    "index": {
        "name": "products",
        "prefix": "product"
    },
    "fields": [
        {"name": "name", "type": "text"},
        {"name": "category", "type": "tag"},
        {"name": "price", "type": "numeric"},
        {
            "name": "description_embedding",
            "type": "vector",
            "attrs": {
                "algorithm": "HNSW",
                "dims": 1536,
                "distance_metric": "COSINE",
                "datatype": "FLOAT32"
            }
        }
    ]
})
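# Create the index (recent RedisVL releases take the URL in the constructor;
# older releases used index.connect("redis://localhost:6379") instead)
index = SearchIndex(schema, redis_url="redis://localhost:6379")
index.create(overwrite=True)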
# Generate embeddings and insert documents
embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")
products = [
    {"name": "Widget Pro", "category": "tools", "price": 29.99,
     "description": "Professional-grade widget for precision work."},
    {"name": "Budget Widget", "category": "tools", "price": 9.99,
     "description": "Entry-level widget for casual users."},
    {"name": "Gadget Plus", "category": "electronics", "price": 49.99,
     "description": "Enhanced electronic gadget with smart features."},
]
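# Embed all descriptions in one request, then load the records as a single batch
vectors = embeddings_model.embed_documents([p["description"] for p in products])
index.load([
    {
        **product,
        "description_embedding": np.array(vector, dtype=np.float32).tobytes(),
    }
    for product, vector in zip(products, vectors)
])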
# Hybrid search: vector similarity + category filter
query_text = "precision professional tool"
query_embedding = embeddings_model.embed_query(query_text)
query = VectorQuery(
    vector=query_embedding,
    vector_field_name="description_embedding",
    return_fields=["name", "category", "price"],
    filter_expression="@category:{tools}",
    num_results=3
)
results = index.query(query)
for result in results:
    print(f"{result['name']} ({result['category']}) - ${result['price']}")
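The vector field in the schema above is declared as FLOAT32, so each embedding is stored as raw bytes, four per dimension. The sketch below shows that byte layout with only the standard library; `pack_float32` and `unpack_float32` are hypothetical helper names, and the little-endian format is an assumption that matches what NumPy's `tobytes()` produces on typical x86 and ARM machines.

```python
import struct
from typing import List, Sequence


def pack_float32(vector: Sequence[float]) -> bytes:
    """Serialize a vector as little-endian 32-bit floats (4 bytes per dimension)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_float32(blob: bytes) -> List[float]:
    """Inverse of pack_float32."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


vector = [0.25, -1.5, 3.0]  # values chosen to be exactly representable in float32
blob = pack_float32(vector)

print(len(blob))              # 12 bytes: 3 dimensions x 4 bytes
print(unpack_float32(blob))   # [0.25, -1.5, 3.0]
print(1536 * 4)               # bytes needed for one 1536-dimension embedding
```

The byte length must equal `dims * 4` for a FLOAT32 field; a mismatch usually means the embedding model and the schema disagree on dimensions.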
Building a Complete RAG Pipeline with Redis
Here is a production-grade setup that uses Redis for both semantic caching and vector retrieval:
from langchain_community.cache import RedisSemanticCache
from langchain_community.vectorstores import Redis
from langchain_core.globals import set_llm_cache
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
REDIS_URL = "redis://localhost:6379"
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Enable semantic caching globally
set_llm_cache(
    RedisSemanticCache(
        redis_url=REDIS_URL,
        embedding=embeddings,
        score_threshold=0.2,
    )
)
# Connect to the Redis vector store
vector_store = Redis.from_existing_index(
    embedding=embeddings,
    redis_url=REDIS_URL,
    index_name="knowledge_base",
    schema="redis_schema.yaml",  # required: the schema the index was created with
)
retriever = vector_store.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("""
You are a helpful assistant. Answer based on the context below.
Context:
{context}
Question: {question}
Answer:
""")
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)
# First call: retrieves from the Redis vector store, then calls the chat model
answer1 = rag_chain.invoke("How does Redis vector search work?")
print(answer1)
# Similar follow-up: retrieval and query embedding still run, but the chat-model call
# can be served from the semantic cache if the final prompt (context + question) is close enough
answer2 = rag_chain.invoke("Explain how Redis does vector similarity search.")
print(answer2)  # On a cache hit this is the same answer, without a second chat completion
For a deeper understanding of how this retrieval chain connects to agent architectures, see Build AI agent with LangChain and the semantic search tutorial.
Redis vs Memcached vs DynamoDB for AI Caching
| Feature | Redis | Memcached | DynamoDB |
|---|---|---|---|
| Vector search | Yes (RediSearch) | No | No |
| Semantic cache | Yes (native) | No | No |
| Read latency | Sub-millisecond | Sub-millisecond | Single-digit ms |
| Write throughput | Very high | Very high | High (provisioned) |
| TTL support | Yes | Yes | Yes |
| Persistence | Optional (RDB/AOF) | None | Yes (always) |
| Max value size | 512 MB | 1 MB (default) | 400 KB per item |
| Horizontal scale | Redis Cluster | Yes | Automatic |
| Managed cloud | Redis Cloud | ElastiCache | AWS native |
| Cost model | Memory-based | Memory-based | Request-based |
Redis is the clear choice when you need vector search alongside caching. Memcached remains a reasonable fit for pure key/value caching of small values, but it offers neither vector search nor persistence. DynamoDB makes sense when you need persistence guarantees and are already deep in the AWS ecosystem.
Configuring Redis for Production
A few configuration settings make a significant difference in production:
import redis
# Production connection with connection pooling and retry logic
pool = redis.ConnectionPool(
    host="your-redis-host",
    port=6379,
    password="your-password",
    max_connections=50,
    socket_timeout=5,
    socket_connect_timeout=5,
    retry_on_timeout=True
)
client = redis.Redis(connection_pool=pool)
# For TLS (Redis Cloud / ElastiCache)
client = redis.Redis(
    host="your-redis-host",
    port=6380,
    password="your-password",
    ssl=True,
    ssl_cert_reqs="required",
    ssl_ca_certs="/path/to/ca-cert.pem"
)
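In a deployed service you would normally keep the host and password out of the source code. One possible approach, sketched below, is to build the connection arguments from environment variables. The helper `redis_kwargs_from_env` and the variable names `REDIS_HOST`, `REDIS_PORT`, `REDIS_PASSWORD`, `REDIS_TLS`, and `REDIS_CA_CERTS` are assumptions for this example, not a convention of redis-py.

```python
import os
from typing import Any, Dict, Mapping


def redis_kwargs_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build keyword arguments for redis.Redis(...) from environment variables."""
    kwargs: Dict[str, Any] = {
        "host": env.get("REDIS_HOST", "localhost"),
        "port": int(env.get("REDIS_PORT", "6379")),
        "socket_timeout": 5,
        "socket_connect_timeout": 5,
        "retry_on_timeout": True,
    }
    password = env.get("REDIS_PASSWORD")
    if password:
        kwargs["password"] = password
    # Enable TLS only when explicitly requested.
    if env.get("REDIS_TLS", "").lower() in {"1", "true", "yes"}:
        kwargs["ssl"] = True
        kwargs["ssl_cert_reqs"] = "required"
        ca_certs = env.get("REDIS_CA_CERTS")
        if ca_certs:
            kwargs["ssl_ca_certs"] = ca_certs
    return kwargs


# Demo with an explicit mapping instead of the real environment
example = redis_kwargs_from_env(
    {"REDIS_HOST": "cache.internal", "REDIS_PORT": "6380", "REDIS_TLS": "true"}
)
print(example)
```

With the redis package installed, the result can be passed straight through as `redis.Redis(**redis_kwargs_from_env())`.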
Monitoring Cache Performance
Track cache hit rates to measure the value Redis is delivering:
import redis
class CacheMonitor:
    def __init__(self, redis_client: redis.Redis):
        self.client = redis_client
    def get_stats(self) -> dict:
        # keyspace_hits / keyspace_misses are server-wide counters since the last
        # restart, so they include every key lookup, not only LLM cache lookups
        info = self.client.info("stats")
        hits = info["keyspace_hits"]
        misses = info["keyspace_misses"]
        return {
            "keyspace_hits": hits,
            "keyspace_misses": misses,
            "hit_rate": hits / max(hits + misses, 1),
            "used_memory_human": self.client.info("memory")["used_memory_human"],
        }
monitor = CacheMonitor(redis_client)
# Check stats periodically
stats = monitor.get_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")
print(f"Memory used: {stats['used_memory_human']}")
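Because the INFO counters are cumulative since server start, a lifetime ratio can hide recent changes. A small sketch of computing the hit rate over a window between two snapshots follows; `windowed_hit_rate` is a hypothetical helper and the snapshot numbers are made up for the example.

```python
from typing import Mapping


def windowed_hit_rate(before: Mapping[str, int], after: Mapping[str, int]) -> float:
    """Hit rate for the interval between two INFO 'stats' snapshots."""
    hits = after["keyspace_hits"] - before["keyspace_hits"]
    misses = after["keyspace_misses"] - before["keyspace_misses"]
    total = hits + misses
    return hits / total if total > 0 else 0.0


# Two snapshots taken some time apart (illustrative numbers)
before = {"keyspace_hits": 1_000, "keyspace_misses": 4_000}
after = {"keyspace_hits": 1_600, "keyspace_misses": 4_400}

lifetime = after["keyspace_hits"] / (after["keyspace_hits"] + after["keyspace_misses"])
recent = windowed_hit_rate(before, after)
print(f"Lifetime hit rate: {lifetime:.1%}")  # 26.7%
print(f"Recent hit rate: {recent:.1%}")      # 60.0%
```

Here the cache looks poor over its lifetime but is performing well in the most recent window, which is the number that matters after a configuration change.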
A well-configured semantic cache for a customer support chatbot can reach a 30–60% hit rate, which reduces chat-model calls by roughly the same proportion; each lookup still needs one embedding call, so the saving in total cost is slightly smaller. For more on integrating this into a deployed system, see Deploy AI model to production.
Common Pitfalls
Setting score_threshold too low for semantic cache. A threshold near 0.0 means only nearly identical prompts match. You lose most of the value of semantic caching. Start at 0.2 and adjust based on your false-positive rate.
Not using TTLs on cached LLM responses. Facts change. A cached response about stock prices or news events becomes wrong quickly. Set appropriate TTLs based on how time-sensitive your content is.
Running Redis without persistence for vector stores. If Redis restarts with persistence disabled, your documents and vector index are gone. Enable AOF with appendonly yes in your Redis configuration, or rely on RDB snapshots, and when running Redis Stack in Docker mount a volume for /data so the persisted files survive container removal.
Forgetting to set decode_responses=False for vector data. When storing binary embeddings directly in Redis, make sure your connection is not decoding binary values as UTF-8 strings. Use separate clients: one with decode_responses=True for string data, one without for binary vector data.
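The last pitfall is easy to reproduce without a Redis server. In this minimal sketch, the two-float vector is an arbitrary example; it shows what a client configured with decode_responses=True effectively attempts on a binary vector field.

```python
import struct

# 8 raw bytes, laid out the way a FLOAT32 vector field stores two dimensions
blob = struct.pack("<2f", -1.0, 0.5)

try:
    blob.decode("utf-8")   # what decoding every response as UTF-8 amounts to
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False     # raw float bytes are generally not valid UTF-8

print(decoded_ok)  # False
```

Reading vector fields through a client that leaves responses as bytes avoids this failure.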
Frequently Asked Questions
What is the difference between RedisCache and RedisSemanticCache in LangChain? RedisCache performs exact-match caching — it returns a cached response only when the input prompt is byte-for-byte identical to a previous one. RedisSemanticCache uses vector similarity to match semantically equivalent prompts, so "What is the capital of France?" and "Tell me the capital city of France" can both hit the same cached answer.
How does RedisVL differ from using Redis as a LangChain VectorStore? The LangChain Redis VectorStore integration is a high-level wrapper that manages index creation and similarity search through the LangChain interface. RedisVL (Redis Vector Library) is a lower-level Python library that gives you more control over index schemas, hybrid search (text + vector), and batch operations — useful when you need fine-grained control or high-throughput indexing.
Is Redis suitable for production RAG systems? Yes. Redis with the RediSearch module (built into Redis Stack and Redis Cloud) supports filtered vector search, full-text search, and sub-millisecond query latency at scale. Teams have deployed RAG systems with millions of documents on Redis Cloud. The main consideration is that Redis is an in-memory store, so large corpora require appropriately sized instances.
AiTechWorlds Team