Memory: Conversation History & Summarization
LangChain Memory: Giving Agents Context Across Turns
Memory is what makes an agent a conversation partner rather than a one-shot query system. Without memory, the agent forgets each turn as soon as it finishes, and can lose context even between tool calls. This lesson covers the memory types LangChain provides and how to choose between them.
The Memory Problem
LLMs are stateless by default. Each API call has no knowledge of previous calls. Memory is the mechanism that persists context across turns:
# Without memory:
agent.invoke("My name is Alice") → "Hello Alice!"
agent.invoke("What's my name?") → "I don't know your name." ❌
# With memory:
agent.invoke("My name is Alice") → "Hello Alice!"
agent.invoke("What's my name?") → "Your name is Alice." ✓
Memory is implemented by injecting prior context into each new request — the LLM doesn't actually "remember" anything, it just has more context in each prompt.
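To make the injection idea concrete, here is a minimal sketch in plain Python. It is not LangChain code: `build_prompt` and the `Human:`/`AI:` prompt layout are hypothetical names chosen for illustration, and the model reply is hard-coded.

```python
from typing import List, Tuple

def build_prompt(history: List[Tuple[str, str]], user_input: str) -> str:
    """Prepend every prior (role, text) turn to the new user message."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"Human: {user_input}")
    lines.append("AI:")
    return "\n".join(lines)

history: List[Tuple[str, str]] = []

# Turn 1: there is nothing to inject yet.
prompt_1 = build_prompt(history, "My name is Alice")
history.append(("Human", "My name is Alice"))
history.append(("AI", "Hello Alice!"))  # stand-in for the model's reply

# Turn 2: the earlier exchange travels inside the prompt itself.
prompt_2 = build_prompt(history, "What's my name?")
print(prompt_2)
```

The second prompt contains the first exchange word for word, which is the only reason a stateless model could answer the question.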
Memory Types
1. Conversation Buffer Memory (Simple, Unlimited)
Stores every message. Simple, no information loss, but gets expensive as conversations grow:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",  # Must match the MessagesPlaceholder variable name
    return_messages=True,  # Return Message objects, not strings
)
# Manual usage
memory.save_context(
    {"input": "My name is Alice"},
    {"output": "Hello Alice! How can I help you?"},
)
# Later
print(memory.load_memory_variables({}))
# Output (abridged):
# {'chat_history': [HumanMessage(content='My name is Alice'), AIMessage(content='Hello Alice! How can I help you?')]}
Integration with chains:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
conversation = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=ConversationBufferMemory(),
)
# Each call automatically reads and updates memory
print(conversation.predict(input="Hi, my name is Alice."))
print(conversation.predict(input="What's my name?"))  # Remembers: Alice
Limitation: Every turn adds more tokens. A 100-turn conversation might use 50,000 tokens in context just for history.
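The arithmetic behind that figure can be sketched as follows. The 500 tokens per turn is an assumed average (one user message plus one reply), not a measured value; real conversations vary widely.

```python
TOKENS_PER_TURN = 500  # assumed average for one user message plus one reply

def history_tokens(turns_so_far: int) -> int:
    """Tokens of stored history sent along with the next request."""
    return turns_so_far * TOKENS_PER_TURN

def cumulative_history_tokens(total_turns: int) -> int:
    """History tokens summed over every request in the conversation."""
    return sum(history_tokens(n) for n in range(total_turns))

print(history_tokens(100))             # 50000 tokens of history after 100 turns
print(cumulative_history_tokens(100))  # 2475000 history tokens sent across all 100 requests
```

Because the full history is resent on every call, the total cost grows roughly with the square of the number of turns, not linearly.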
2. Conversation Buffer Window Memory
Keeps only the last N turns:
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(
    k=5,  # Keep last 5 exchanges
    memory_key="chat_history",
    return_messages=True,
)
When to use: When recent context matters more than older context (most chatbot scenarios). A window of roughly 3-10 turns is a common starting point; tune it against your own prompts and token budget.
Limitation: Information mentioned early in the conversation is lost.
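The trade-off is easy to see in a toy version. The `WindowMemory` class below is a hypothetical stand-in written in plain Python, not LangChain's implementation; it only mirrors the "keep the last k exchanges" behavior.

```python
from collections import deque

class WindowMemory:
    """Toy sliding-window memory; not LangChain's implementation."""

    def __init__(self, k: int) -> None:
        # A bounded deque drops the oldest entry automatically when full.
        self.turns = deque(maxlen=k)

    def save_context(self, user: str, ai: str) -> None:
        self.turns.append((user, ai))

    def load(self) -> list:
        return list(self.turns)

window = WindowMemory(k=2)
window.save_context("My name is Alice", "Hello Alice!")
window.save_context("I live in Paris", "Paris is lovely.")
window.save_context("I like tea", "Noted.")

remembered = [user for user, _ in window.load()]
print(remembered)  # ['I live in Paris', 'I like tea']
```

With `k=2`, the third exchange pushes the name out of the window, so a later "What's my name?" could not be answered from memory.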
3. Conversation Summary Memory
Maintains a running summary of the conversation instead of storing turns verbatim:
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI
memory = ConversationSummaryMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),  # Small model is fine for summarization
    memory_key="chat_history",
    return_messages=False,  # Returns summary as string, not messages
)
# After each exchange, the LLM folds the new turn into the running summary
# No turns are passed to the prompt verbatim, so fine detail can be lost
# Much lower token cost than buffer memory for long conversations
Best for: Long conversations where you need to track overall context without full history.
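A rough sketch of the running-summary mechanism, assuming a summarizer callable in place of a real LLM. `fake_summarize` is a hypothetical stand-in that simply concatenates and truncates; a real summarizer would compress meaning, not characters.

```python
from typing import Callable

def fake_summarize(previous_summary: str, new_lines: str) -> str:
    """Hypothetical stand-in for an LLM call: joins the text and truncates it."""
    combined = (previous_summary + " " + new_lines).strip()
    return combined[:200]

class RunningSummary:
    """Keeps one bounded summary string instead of the full transcript."""

    def __init__(self, summarize: Callable[[str, str], str]) -> None:
        self.summarize = summarize
        self.summary = ""

    def save_context(self, user: str, ai: str) -> None:
        # Fold the newest exchange into the existing summary.
        new_lines = f"Human: {user} AI: {ai}"
        self.summary = self.summarize(self.summary, new_lines)

    def load(self) -> str:
        return self.summary

summary_memory = RunningSummary(fake_summarize)
summary_memory.save_context("My name is Alice", "Hello Alice!")
for i in range(50):
    summary_memory.save_context(f"Question {i}", f"Answer {i}")

print(len(summary_memory.load()))
```

However many turns are saved, the text injected into the prompt stays at or below the summarizer's cap, which is what keeps token cost bounded.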
4. Conversation Summary Buffer Memory (Best of Both)
Keeps recent turns verbatim, summarizes older turns — the most balanced approach:
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI
memory = ConversationSummaryBufferMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    max_token_limit=2000,  # When buffer exceeds this, summarize oldest messages
    memory_key="chat_history",
    return_messages=True,
)
When to use: Production chatbots and agents where conversations can run long. The best default choice for most applications.
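The eviction logic can be sketched in plain Python. Everything here is illustrative: `count_tokens` is a crude word count, `fake_summarize` is a hypothetical stand-in for the LLM call, and `SummaryBuffer` is not the LangChain class.

```python
from typing import List

def count_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())

def fake_summarize(summary: str, dropped: List[str]) -> str:
    """Hypothetical stand-in for an LLM call: keeps a short prefix of each message."""
    notes = "; ".join(message[:20] for message in dropped)
    return f"{summary}; {notes}".strip("; ")

class SummaryBuffer:
    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.buffer: List[str] = []

    def save(self, message: str) -> None:
        self.buffer.append(message)
        dropped: List[str] = []
        # Evict the oldest messages until the verbatim buffer fits the budget.
        while self.buffer and sum(map(count_tokens, self.buffer)) > self.max_token_limit:
            dropped.append(self.buffer.pop(0))
        if dropped:
            self.summary = fake_summarize(self.summary, dropped)

sb = SummaryBuffer(max_token_limit=8)
sb.save("My name is Alice")
sb.save("I am on the Enterprise plan")
sb.save("What is my plan")

print(sb.summary)
print(sb.buffer)
```

Recent messages stay word for word in `buffer`, while anything pushed out survives only in condensed form in `summary`.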
5. Entity Memory
Tracks facts about specific entities (people, places, concepts) mentioned in conversation:
from langchain.memory import ConversationEntityMemory
from langchain_openai import ChatOpenAI
memory = ConversationEntityMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    chat_history_key="chat_history",  # Entity memory names this chat_history_key, not memory_key
)
# As conversation happens, it builds a knowledge store about entities:
# "Alice": "Customer since 2022, Enterprise plan, interested in API features"
# "Project Titan": "Internal codename for Q4 product launch"
# Facts about entities found in recent turns are injected into each prompt
When to use: Customer service agents, personal assistants, anywhere facts about specific entities need to persist.
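As a rough illustration of the idea, the sketch below stores facts per entity and looks them up by name. It is a simplification: `EntityStore` is a hypothetical class, and where LangChain uses an LLM to extract and summarize entities, this version relies on manual `add_fact` calls and substring matching.

```python
from collections import defaultdict
from typing import Dict, List

class EntityStore:
    """Toy entity store; the real memory extracts entities with an LLM."""

    def __init__(self) -> None:
        self.facts: Dict[str, List[str]] = defaultdict(list)

    def add_fact(self, entity: str, fact: str) -> None:
        # Skip exact duplicates so repeated mentions do not bloat the prompt.
        if fact not in self.facts[entity]:
            self.facts[entity].append(fact)

    def context_for(self, text: str) -> str:
        """Return stored facts for every known entity mentioned in the text."""
        lines = [
            f"{entity}: {', '.join(facts)}"
            for entity, facts in self.facts.items()
            if entity.lower() in text.lower()
        ]
        return "\n".join(lines)

store = EntityStore()
store.add_fact("Alice", "Customer since 2022")
store.add_fact("Alice", "Enterprise plan")
store.add_fact("Project Titan", "Internal codename for Q4 product launch")

context = store.context_for("Can Alice get API access?")
print(context)  # Alice: Customer since 2022, Enterprise plan
```

Only facts about entities named in the current message are returned, so the prompt carries relevant background without the whole store.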
Persistent Memory (Across Sessions)
All the memory types above are in-memory — they reset when the process restarts. For production, persist to a database.
Redis for Session Memory
import os
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain.memory import ConversationBufferMemory
def get_session_memory(session_id: str) -> ConversationBufferMemory:
    """Get or create memory for a specific conversation session."""
    message_history = RedisChatMessageHistory(
        url=os.environ["REDIS_URL"],
        session_id=session_id,  # Unique per user conversation
    )
    return ConversationBufferMemory(
        chat_memory=message_history,
        memory_key="chat_history",
        return_messages=True,
    )
# Each user gets their own persistent memory
user_memory = get_session_memory(session_id="user_123_session_456")
PostgreSQL / SQLite Message History
import os
from langchain_community.chat_message_histories import SQLChatMessageHistory
# SQLite (simple, development)
history = SQLChatMessageHistory(
    session_id="user_123",
    connection_string="sqlite:///chat_history.db",
)
# PostgreSQL (production); DATABASE_URL must hold a SQLAlchemy-style connection URL
history = SQLChatMessageHistory(
    session_id="user_123",
    connection_string=os.environ["DATABASE_URL"],
)
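To show what a database-backed history does underneath, here is a minimal sketch using only the standard-library `sqlite3` module. `SqliteHistory`, the `messages` table, and its columns are hypothetical names for illustration; this is not how `SQLChatMessageHistory` is implemented.

```python
import sqlite3
from typing import List, Tuple

class SqliteHistory:
    """Minimal per-session message log; a sketch, not SQLChatMessageHistory."""

    def __init__(self, conn: sqlite3.Connection, session_id: str) -> None:
        self.conn = conn
        self.session_id = session_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def add_message(self, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (self.session_id, role, content),
        )
        self.conn.commit()

    def messages(self) -> List[Tuple[str, str]]:
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (self.session_id,),
        )
        return list(rows)

    def clear(self) -> None:
        self.conn.execute(
            "DELETE FROM messages WHERE session_id = ?", (self.session_id,)
        )
        self.conn.commit()

conn = sqlite3.connect(":memory:")  # use a file path instead to survive restarts
alice = SqliteHistory(conn, "user_123")
bob = SqliteHistory(conn, "user_456")
alice.add_message("human", "My name is Alice")
bob.add_message("human", "My name is Bob")
print(alice.messages())
```

Every query filters on `session_id`, which is what keeps one user's conversation from leaking into another's, and `clear()` removes only the rows for that session.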
Memory with Agents (LangGraph Pattern)
Modern LangGraph agents handle memory through state:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
# MemorySaver keeps state in memory (development)
# Use PostgresSaver for production
memory = MemorySaver()
def build_agent_with_memory(llm, tools):
    """Build a ReAct agent whose state is checkpointed per thread."""
    return create_react_agent(
        llm,
        tools,
        checkpointer=memory,  # Enables persistence
    )
# llm and tools are assumed to be defined elsewhere (a chat model and a list of tools)
agent = build_agent_with_memory(llm, tools)
# Use thread_id to maintain session state
config = {"configurable": {"thread_id": "user_123_conv_456"}}
# First turn
agent.invoke({"messages": [("human", "My order number is ORD-789")]}, config=config)
# Second turn: the checkpointed state for this thread still holds the order number
agent.invoke({"messages": [("human", "What's the status of my order?")]}, config=config)
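The role of `thread_id` can be sketched without LangGraph at all. `ToyCheckpointer` and this `invoke` function are hypothetical stand-ins that only mimic the load, append, save cycle; a real checkpointer stores full graph state, not just a message list.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]

class ToyCheckpointer:
    """Keeps one message list per thread_id; a stand-in for a real checkpointer."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[Message]] = {}

    def load(self, thread_id: str) -> List[Message]:
        return list(self._threads.get(thread_id, []))

    def save(self, thread_id: str, messages: List[Message]) -> None:
        self._threads[thread_id] = list(messages)

def invoke(checkpointer: ToyCheckpointer, new_messages: List[Message], config: dict) -> List[Message]:
    """Load prior state for the thread, append the new messages, and save."""
    thread_id = config["configurable"]["thread_id"]
    state = checkpointer.load(thread_id) + new_messages
    checkpointer.save(thread_id, state)
    return state

saver = ToyCheckpointer()
config_a = {"configurable": {"thread_id": "user_123_conv_456"}}
config_b = {"configurable": {"thread_id": "user_999_conv_001"}}

invoke(saver, [("human", "My order number is ORD-789")], config_a)
state_a = invoke(saver, [("human", "What's the status of my order?")], config_a)
state_b = invoke(saver, [("human", "Hello")], config_b)

print(len(state_a), len(state_b))  # 2 1
```

Calls that share a `thread_id` see each other's messages, while a different `thread_id` starts from an empty state.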
Memory Strategy Guide
| Scenario | Recommended Memory |
|---|---|
| Short single-session chat | ConversationBufferMemory |
| Production chatbot | ConversationSummaryBufferMemory + Redis |
| Customer service (track customer facts) | Entity Memory + DB persistence |
| Long document Q&A sessions | Summary Memory (keeps total tokens bounded) |
| Agent needing multi-session continuity | LangGraph with PostgresSaver |
| Simple prototyping | ConversationBufferWindowMemory (k=5) |
Clearing and Managing Memory
# Clear all memory for a session
memory.clear()
# Check what's in memory
print(memory.load_memory_variables({}))
# For a database-backed history (Redis or SQL)
history.clear()  # Deletes all messages for this session_id
Next lesson: Embeddings and semantic search — the technology that powers knowledge retrieval in agents.