
How to Use LangChain with Redis (Cache and Vector Store)

⚡ Quick Answer

Use LangChain with Redis for low-latency AI responses. Covers RedisCache, RedisSemanticCache, RedisVL vector search, and a Redis vs alternatives comparison.

AiTechWorlds Team May 31, 2026 10 min read

#LangChain #Redis #caching #vector store #low-latency


Speed matters in AI applications. Users abandon chatbot responses that take more than two or three seconds. Every repeated LLM call that could be served from cache is wasted money and added latency. Redis — with its sub-millisecond read times and built-in vector search — is one of the best tools in the LangChain ecosystem for both caching LLM responses and powering low-latency retrieval.

This guide covers four Redis integration patterns in LangChain: exact-match caching, semantic caching, Redis as a vector store, and direct index control with RedisVL. It then combines semantic caching and vector retrieval in a complete RAG pipeline. You will also get a comparison table showing when Redis makes more sense than Memcached or DynamoDB for these workloads.

For context on how these patterns fit into a broader architecture, see the RAG system tutorial and the vector database guide.

Installation and Setup

pip install langchain langchain-openai langchain-community redis redisvl

You also need Redis Stack running locally or a Redis Cloud instance with RediSearch enabled:

# Docker — easiest way to get Redis Stack locally
docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest

Verify the connection:

import redis

client = redis.Redis(host="localhost", port=6379, decode_responses=True)
print(client.ping())  # True
print(client.info("server")["redis_version"])  # e.g., "7.2.0"

Pattern 1: Exact-Match Caching with RedisCache

The simplest form of caching stores prompt/response pairs and returns the cached response when an identical prompt is received again.

import time

import redis
from langchain_community.cache import RedisCache
from langchain_core.globals import set_llm_cache
from langchain_openai import ChatOpenAI

# Set up the Redis cache globally
redis_client = redis.Redis(host="localhost", port=6379)
set_llm_cache(RedisCache(redis_=redis_client))

# All LLM calls now check the cache first
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# First call: goes to OpenAI
start = time.perf_counter()
response1 = llm.invoke("What is the boiling point of water?")
print(f"First call: {time.perf_counter() - start:.2f}s")
print(response1.content)

# Second call: identical prompt, served from the Redis cache
start = time.perf_counter()
response2 = llm.invoke("What is the boiling point of water?")
print(f"Cached call: {time.perf_counter() - start:.4f}s")
# e.g. Cached call: 0.0012s, roughly three orders of magnitude faster than the API round trip

You can set a TTL on cached responses to prevent stale data:

from langchain_community.cache import RedisCache
from langchain_core.globals import set_llm_cache

# Cache entries expire after 1 hour
set_llm_cache(RedisCache(
    redis_=redis_client,
    ttl=3600
))
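To make the mechanics concrete, here is a minimal in-memory sketch of exact-match caching with a TTL. It does not touch Redis or LangChain. The class name `ExactMatchCache`, the injectable clock, and the sample answer text are all hypothetical, chosen only to mirror the idea that an entry is keyed on the exact prompt plus the model settings and disappears once its TTL has passed.

```python
import hashlib
import time


class ExactMatchCache:
    """Toy in-memory stand-in for an exact-match LLM cache with TTL (hypothetical)."""

    def __init__(self, ttl_seconds=None, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock      # injectable so expiry can be observed without waiting
        self._store = {}        # key -> (expires_at or None, response)

    @staticmethod
    def _key(prompt, llm_string):
        # The key depends on the exact prompt text and the model settings.
        raw = f"{llm_string}\x00{prompt}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def lookup(self, prompt, llm_string):
        entry = self._store.get(self._key(prompt, llm_string))
        if entry is None:
            return None
        expires_at, response = entry
        if expires_at is not None and self.clock() >= expires_at:
            return None         # expired: treat as a miss
        return response

    def update(self, prompt, llm_string, response):
        expires_at = None if self.ttl_seconds is None else self.clock() + self.ttl_seconds
        self._store[self._key(prompt, llm_string)] = (expires_at, response)


# A fake clock makes the one-hour TTL observable immediately.
now = [0.0]
cache = ExactMatchCache(ttl_seconds=3600, clock=lambda: now[0])
settings = "gpt-4o-mini|temperature=0"
cache.update("What is the boiling point of water?", settings, "100 °C at sea level")

hit = cache.lookup("What is the boiling point of water?", settings)
miss = cache.lookup("what is the boiling point of water?", settings)  # different casing
now[0] = 3601.0
expired = cache.lookup("What is the boiling point of water?", settings)
```

The lowercase variant misses because the key is a hash of the exact text, which is the gap that semantic caching is meant to close.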

Pattern 2: Semantic Caching with RedisSemanticCache

Exact-match caching misses a huge category of reusable responses: semantically equivalent questions that are worded differently. RedisSemanticCache solves this by embedding the incoming prompt and searching for a close enough match in the cache.

from langchain_community.cache import RedisSemanticCache
from langchain_core.globals import set_llm_cache
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

set_llm_cache(RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=embeddings,
    score_threshold=0.2  # lower = stricter matching; 0.2 works well for factual questions
))

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# First call: embeds the prompt and stores the result
response1 = llm.invoke("What is the capital of France?")
print(response1.content)  # e.g. "The capital of France is Paris."

# Semantically equivalent prompt: expected to hit the cache at this threshold
response2 = llm.invoke("Tell me the capital city of France.")
print(response2.content)  # same text as response1 when served from cache

# Completely different topic: cache miss, goes to OpenAI
response3 = llm.invoke("Who wrote Hamlet?")
print(response3.content)

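The score_threshold parameter controls how similar two prompts need to be to count as a cache hit. A threshold of 0.2 means the cosine distance between the two embeddings must be 0.2 or less. Lower values are stricter; higher values raise the hit rate but also the risk of returning an answer that belongs to a different question. For customer support applications where questions are highly repetitive, a threshold of 0.15–0.25 is a reasonable starting range, and hit rates of 30–50% are plausible there, but measure on your own traffic because the result depends on the embedding model and on how varied the questions are.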
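The threshold rule can be sketched in a few lines of plain Python. The 3-dimensional vectors below are made up and stand in for real embeddings, and `is_cache_hit` is a hypothetical helper rather than a LangChain function; the point is only to show how a cosine distance is compared against score_threshold.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_cache_hit(query_vec, cached_vec, score_threshold=0.2):
    # A hit requires the distance to be at or below the threshold.
    return cosine_distance(query_vec, cached_vec) <= score_threshold


# Made-up vectors standing in for prompt embeddings.
cached = [1.0, 0.0, 0.0]
paraphrase = [0.9, 0.1, 0.0]   # nearly the same direction as the cached prompt
unrelated = [0.0, 1.0, 0.0]    # orthogonal to the cached prompt

d_paraphrase = cosine_distance(cached, paraphrase)
d_unrelated = cosine_distance(cached, unrelated)

paraphrase_hits = is_cache_hit(cached, paraphrase)
unrelated_hits = is_cache_hit(cached, unrelated)
too_strict = is_cache_hit(cached, paraphrase, score_threshold=0.001)
```

With the default threshold the paraphrase counts as a hit and the unrelated prompt does not, while the last call shows how a threshold close to zero rejects even a close paraphrase.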

Pattern 3: Redis as a LangChain Vector Store

Redis with the RediSearch module functions as a full-featured vector store for RAG systems. It supports filtered search, full-text search combined with vector search, and TTL-based document expiry.

from langchain_community.vectorstores import Redis
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create the vector store and add documents
texts = [
    "Redis is an in-memory data store with sub-millisecond latency.",
    "RediSearch adds full-text search and vector similarity search to Redis.",
    "Redis Stack bundles Redis with RediSearch, RedisJSON, and other modules.",
    "Redis Cloud offers managed Redis with high availability and automatic scaling.",
]

metadatas = [
    {"source": "redis_docs", "topic": "overview"},
    {"source": "redis_docs", "topic": "search"},
    {"source": "redis_docs", "topic": "stack"},
    {"source": "redis_docs", "topic": "cloud"},
]

vector_store = Redis.from_texts(
    texts=texts,
    embedding=embeddings,
    metadatas=metadatas,
    redis_url="redis://localhost:6379",
    index_name="langchain_docs"
)

# Simple similarity search
results = vector_store.similarity_search("What modules does Redis Stack include?", k=2)
for doc in results:
    print(f"[{doc.metadata['topic']}] {doc.page_content}")

Filtered Vector Search

One of Redis's key advantages over simpler vector stores is support for filtered search — you can combine metadata filters with vector similarity in a single query:

from langchain_community.vectorstores.redis.filters import RedisText

# Search only within documents whose "topic" metadata is "search"
results = vector_store.similarity_search(
    query="vector search capabilities",
    k=3,
    filter=RedisText("topic") == "search"
)

for doc in results:
    print(doc.page_content)
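Conceptually, a filtered search narrows the candidates by metadata and then ranks what is left by vector similarity. The sketch below shows those semantics in plain Python with made-up 2-dimensional vectors; `filtered_search` is a hypothetical helper, and Redis performs the equivalent work inside the search engine rather than in application code.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy vectors stand in for real embeddings (values invented for illustration).
docs = [
    {"topic": "overview", "vec": [1.0, 0.0]},
    {"topic": "search", "vec": [0.6, 0.8]},
    {"topic": "stack", "vec": [0.8, 0.6]},
    {"topic": "cloud", "vec": [0.0, 1.0]},
]


def filtered_search(query_vec, docs, allowed_topics, k):
    # Step 1: the metadata filter narrows the candidate set.
    candidates = [d for d in docs if d["topic"] in allowed_topics]
    # Step 2: only the survivors are ranked by vector similarity.
    candidates.sort(key=lambda d: cosine_similarity(query_vec, d["vec"]), reverse=True)
    return candidates[:k]


query = [0.5, 0.9]
only_search = filtered_search(query, docs, {"search"}, k=3)
search_or_stack = filtered_search(query, docs, {"search", "stack"}, k=3)
```

Note that k is an upper bound: when the filter leaves fewer than k documents, fewer results come back.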

Loading an Existing Index

If you have already created a Redis index and want to reconnect to it:

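# Reconnect to an existing index
# from_existing_index needs the index schema; one way to produce the file is
# vector_store.write_schema("redis_schema.yaml") right after creating the index
vector_store = Redis.from_existing_index(
    embedding=embeddings,
    redis_url="redis://localhost:6379",
    index_name="langchain_docs",
    schema="redis_schema.yaml"  # schema file describing the indexed fields
)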

# Use as a retriever
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)

Pattern 4: RedisVL for Advanced Index Control

When you need fine-grained control over your Redis vector index — custom field types, hybrid search scoring, or batch upserts — use RedisVL directly alongside LangChain:

from redisvl.index import SearchIndex
from redisvl.schema import IndexSchema
from redisvl.query import VectorQuery
from langchain_openai import OpenAIEmbeddings
import numpy as np

# Define the index schema
schema = IndexSchema.from_dict({
    "index": {
        "name": "products",
        "prefix": "product"
    },
    "fields": [
        {"name": "name", "type": "text"},
        {"name": "category", "type": "tag"},
        {"name": "price", "type": "numeric"},
        {
            "name": "description_embedding",
            "type": "vector",
            "attrs": {
                "algorithm": "HNSW",
                "dims": 1536,
                "distance_metric": "COSINE",
                "datatype": "FLOAT32"
            }
        }
    ]
})

# Create the index (passing redis_url to the constructor replaces the older connect() call)
index = SearchIndex(schema, redis_url="redis://localhost:6379")
index.create(overwrite=True)

# Generate embeddings and insert documents
embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")

products = [
    {"name": "Widget Pro", "category": "tools", "price": 29.99,
     "description": "Professional-grade widget for precision work."},
    {"name": "Budget Widget", "category": "tools", "price": 9.99,
     "description": "Entry-level widget for casual users."},
    {"name": "Gadget Plus", "category": "electronics", "price": 49.99,
     "description": "Enhanced electronic gadget with smart features."},
]

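# Embed all descriptions in one request, then load the records as a single batch
vectors = embeddings_model.embed_documents([p["description"] for p in products])
index.load([
    {**product, "description_embedding": np.array(vector, dtype=np.float32).tobytes()}
    for product, vector in zip(products, vectors)
])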

# Hybrid search: vector similarity + category filter
query_text = "precision professional tool"
query_embedding = embeddings_model.embed_query(query_text)

query = VectorQuery(
    vector=query_embedding,
    vector_field_name="description_embedding",
    return_fields=["name", "category", "price"],
    filter_expression="@category:{tools}",
    num_results=3
)

results = index.query(query)
for result in results:
    print(f"{result['name']} ({result['category']}) - ${result['price']}")
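The schema above declares a FLOAT32 vector with 1536 dimensions, so each stored embedding has to be a byte string of exactly 1536 × 4 bytes. The sketch below illustrates that packing with the standard library only. The helper names are hypothetical, and it assumes little-endian byte order, which is what NumPy's tobytes() yields on common x86 and ARM machines.

```python
import struct


def pack_float32(vector):
    """Pack a list of floats into little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_float32(blob):
    """Inverse of pack_float32: 4 bytes per value."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


# Values chosen to be exactly representable in float32, so the round trip is lossless.
embedding = [0.25, -0.5, 1.0]
blob = pack_float32(embedding)
restored = unpack_float32(blob)

# Size of one stored vector for the 1536-dimension schema above.
dims = 1536
bytes_per_vector = len(pack_float32([0.0] * dims))
```

If the byte length of a record does not match dims × 4, the vector does not fit the declared field, so this is a quick check to run before loading data.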

Building a Complete RAG Pipeline with Redis

Here is an end-to-end setup that uses Redis for both semantic caching and vector retrieval:

from langchain_community.cache import RedisSemanticCache
from langchain_community.vectorstores import Redis
from langchain_core.globals import set_llm_cache
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

REDIS_URL = "redis://localhost:6379"

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Enable semantic caching globally
set_llm_cache(RedisSemanticCache(
    redis_url=REDIS_URL,
    embedding=embeddings,
    score_threshold=0.2
))

# Connect to the Redis vector store
vector_store = Redis.from_existing_index(
    embedding=embeddings,
    redis_url=REDIS_URL,
    index_name="knowledge_base",
    schema="redis_schema.yaml"  # assumed: schema file saved when this index was created
)

retriever = vector_store.as_retriever(search_kwargs={"k": 4})

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_template("""
You are a helpful assistant. Answer based on the context below.

Context:
{context}

Question: {question}

Answer:
""")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)

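# First call: retrieves from the Redis vector store, then calls the chat model
answer1 = rag_chain.invoke("How does Redis vector search work?")
print(answer1)

# Similar follow-up: retrieval still runs, and the cache is keyed on the full
# formatted prompt (context plus question), so this is a semantic cache hit
# only if that prompt lands within score_threshold of the first one
answer2 = rag_chain.invoke("Explain how Redis does vector similarity search.")
print(answer2)  # on a hit, the chat model call is skipped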

For a deeper understanding of how this retrieval chain connects to agent architectures, see Build AI agent with LangChain and the semantic search tutorial.

Redis vs Memcached vs DynamoDB for AI Caching

Feature	Redis	Memcached	DynamoDB
Vector search	Yes (RediSearch)	No	No (DAX is a cache, not a vector index)
Semantic cache	Yes (via LangChain or RedisVL)	No	No
Read latency	Sub-millisecond	Sub-millisecond	Single-digit ms
Write throughput	Very high	Very high	High (provisioned or on-demand)
TTL support	Yes	Yes	Yes
Persistence	Optional (RDB/AOF)	None	Yes (always)
Max value size	512 MB	1 MB (default)	400 KB per item
Horizontal scale	Redis Cluster	Client-side sharding	Automatic
Managed cloud	Redis Cloud	ElastiCache	AWS native
Cost model	Memory-based	Memory-based	Request-based

Redis is the clear choice when you need vector search alongside caching. Memcached remains a reasonable fit for pure key/value caching of small values, where its multithreaded design keeps throughput high, but it offers neither vector search nor persistence. DynamoDB makes sense when you need persistence guarantees and are already deep in the AWS ecosystem.

Configuring Redis for Production

A few configuration settings make a significant difference in production:

import redis

# Production connection with connection pooling and retry logic
pool = redis.ConnectionPool(
    host="your-redis-host",
    port=6379,
    password="your-password",
    max_connections=50,
    socket_timeout=5,
    socket_connect_timeout=5,
    retry_on_timeout=True
)

client = redis.Redis(connection_pool=pool)

# For TLS (Redis Cloud / ElastiCache)
client = redis.Redis(
    host="your-redis-host",
    port=6380,
    password="your-password",
    ssl=True,
    ssl_cert_reqs="required",
    ssl_ca_certs="/path/to/ca-cert.pem"
)
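The retry_on_timeout flag covers timeouts at the client level. For application-level retries around a cache call, a generic exponential backoff loop is one option. The sketch below is not redis-py code: `call_with_retry` and `flaky_ping` are hypothetical, and Python's built-in ConnectionError stands in for whatever exception your client raises.

```python
import time


def call_with_retry(fn, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call fn(); on ConnectionError wait base_delay * 2**n (capped) and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(min(base_delay * 2 ** attempt, max_delay))


# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"count": 0}


def flaky_ping():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated dropped connection")
    return "PONG"


waits = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky_ping, sleep=waits.append)
```

Passing the sleep function as a parameter keeps the example fast to run and makes the delay schedule easy to inspect. In a real service you would likely add jitter so that many clients do not retry in lockstep.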

Monitoring Cache Performance

Track cache hit rates to measure the value Redis is delivering:

import redis

class CacheMonitor:
    """Reads server-wide keyspace statistics from Redis."""

    def __init__(self, redis_client: redis.Redis):
        self.client = redis_client

    def get_stats(self) -> dict:
        # keyspace_hits and keyspace_misses count every key lookup on the server,
        # not only LLM cache lookups, so a dedicated Redis instance for the cache
        # gives a cleaner number
        info = self.client.info("stats")
        hits = info["keyspace_hits"]
        misses = info["keyspace_misses"]
        return {
            "keyspace_hits": hits,
            "keyspace_misses": misses,
            "hit_rate": hits / max(hits + misses, 1),  # max() avoids division by zero
            "used_memory_human": self.client.info("memory")["used_memory_human"]
        }

redis_client = redis.Redis(host="localhost", port=6379)
monitor = CacheMonitor(redis_client)

# Check stats periodically
stats = monitor.get_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")
print(f"Memory used: {stats['used_memory_human']}")

As a rough target, a well-configured semantic cache for a customer support chatbot can reach a 30–60% hit rate. LLM spend falls roughly in proportion, minus the cost of embedding each incoming prompt for the lookup. For more on integrating this into a deployed system, see Deploy AI model to production.
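The relationship between hit rate and LLM spend is simple arithmetic. The sketch below uses invented numbers (10,000 requests, a flat cost of 0.002 per call, 400 hits out of 1,000 lookups) purely for illustration; the function names are hypothetical, and embedding and Redis costs are deliberately left out.

```python
def hit_rate(hits, misses):
    """Fraction of lookups served from cache; 0.0 when there is no traffic yet."""
    total = hits + misses
    return hits / total if total else 0.0


def estimated_llm_spend(requests, cost_per_call, rate):
    """Spend on LLM calls after caching; ignores embedding and Redis costs."""
    return requests * (1.0 - rate) * cost_per_call


# Invented figures, for illustration only.
rate = hit_rate(hits=400, misses=600)
baseline = estimated_llm_spend(10_000, 0.002, 0.0)
with_cache = estimated_llm_spend(10_000, 0.002, rate)
savings_fraction = 1.0 - with_cache / baseline
```

Under these assumptions the saving equals the hit rate, which is why the hit rate is the number worth tracking first.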

Common Pitfalls

Setting score_threshold too low for semantic cache. A threshold near 0.0 means only nearly identical prompts match. You lose most of the value of semantic caching. Start at 0.2 and adjust based on your false-positive rate.

Not using TTLs on cached LLM responses. Facts change. A cached response about stock prices or news events becomes wrong quickly. Set appropriate TTLs based on how time-sensitive your content is.

Running Redis without persistence for vector stores. If Redis restarts without RDB or AOF persistence enabled, your vector index is gone. Enable persistence with appendonly yes in your Redis configuration, and if you run Redis Stack in Docker, mount a volume for the data directory so the files survive when the container is removed.

Forgetting to set decode_responses=False for vector data. When storing binary embeddings directly in Redis, make sure your connection is not decoding binary values as UTF-8 strings. Use separate clients: one with decode_responses=True for string data, one without for binary vector data.
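The reason for the separate clients is easy to reproduce without Redis: packed float32 values are arbitrary bytes and are generally not valid UTF-8. The sketch below is a small standard-library illustration with made-up values, not a redis-py call.

```python
import struct

# Two float32 values packed the way a binary embedding would be stored.
vector_blob = struct.pack("<2f", 1.0, 0.5)

try:
    vector_blob.decode("utf-8")
    vector_decodes = True
except UnicodeDecodeError:
    vector_decodes = False

# Ordinary string data survives the same round trip without trouble.
text_blob = "The capital of France is Paris.".encode("utf-8")
text_decodes = text_blob.decode("utf-8") == "The capital of France is Paris."
```

A client created with decode_responses=True attempts this kind of decoding on every reply, which is why vector reads belong on a client that returns raw bytes.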

Frequently Asked Questions

What is the difference between RedisCache and RedisSemanticCache in LangChain? RedisCache performs exact-match caching — it returns a cached response only when the input prompt is byte-for-byte identical to a previous one. RedisSemanticCache uses vector similarity to match semantically equivalent prompts, so "What is the capital of France?" and "Tell me the capital city of France" can both hit the same cached answer.

How does RedisVL differ from using Redis as a LangChain VectorStore? The LangChain Redis VectorStore integration is a high-level wrapper that manages index creation and similarity search through the LangChain interface. RedisVL (Redis Vector Library) is a lower-level Python library that gives you more control over index schemas, hybrid search (text + vector), and batch operations — useful when you need fine-grained control or high-throughput indexing.

Is Redis suitable for production RAG systems? Yes. Redis with the RediSearch module (built into Redis Stack and Redis Cloud) supports filtered vector search, full-text search, and low-latency queries at scale. Teams have deployed RAG systems with millions of documents on Redis Cloud. The main consideration is that Redis is an in-memory store, so large corpora require appropriately sized instances.
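A quick back-of-the-envelope calculation helps with sizing. The sketch below estimates only the raw vector payload for one million documents at 1536 float32 dimensions, the embedding size used in the RedisVL schema earlier. The function name is hypothetical, and the figure excludes index overhead, metadata, and the document text itself, so treat it as a lower bound.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Memory for the vector payload alone (float32 = 4 bytes per value)."""
    return num_vectors * dims * bytes_per_value


size_bytes = raw_vector_bytes(1_000_000, 1536)
size_gib = size_bytes / 2**30

print(f"{size_bytes:,} bytes = {size_gib:.2f} GiB")
```

Real instances need headroom beyond this number, since the HNSW graph and the stored fields also live in memory.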

