LangChain Mastery

Memory: Conversation History & Summarization

LangChain Memory: Giving Agents Context Across Turns

Memory is what makes an agent a conversation partner rather than a one-shot query system. Without memory, the agent forgets each exchange as soon as it completes, so nothing carries over to the next turn or to a later session. This lesson covers the memory types LangChain provides and how to choose between them.

The Memory Problem

LLMs are stateless by default. Each API call has no knowledge of previous calls. Memory is the mechanism that persists context across turns:

# Without memory:
agent.invoke("My name is Alice") → "Hello Alice!"
agent.invoke("What's my name?")  → "I don't know your name."  ❌

# With memory:
agent.invoke("My name is Alice") → "Hello Alice!"
agent.invoke("What's my name?")  → "Your name is Alice."  ✓

Memory is implemented by injecting prior context into each new request — the LLM doesn't actually "remember" anything, it just has more context in each prompt.
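To make the injection idea concrete, here is a minimal sketch in plain Python. It does not use LangChain; `fake_llm` and `chat` are hypothetical names invented for illustration, and `fake_llm` is a toy stand-in for a stateless model that can only see the text in the prompt it receives.

```python
Message = tuple[str, str]  # (role, content)


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a stateless model: it only sees the text in `prompt`."""
    marker = "My name is "
    if "What's my name?" in prompt:
        if marker in prompt:
            # The name is only recoverable if an earlier turn is in the prompt
            name = prompt.split(marker, 1)[1].split()[0].strip(".!,")
            return f"Your name is {name}."
        return "I don't know your name."
    return "Hello!"


def chat(history: list[Message], user_input: str) -> str:
    """Build the prompt from prior turns plus the new input, then record the turn."""
    lines = [f"{role}: {content}" for role, content in history]
    lines.append(f"human: {user_input}")
    reply = fake_llm("\n".join(lines))
    history.append(("human", user_input))
    history.append(("ai", reply))
    return reply
```

Calling `chat` twice with the same `history` list lets the second call see the first turn. Passing a fresh empty list each time reproduces the "without memory" behavior.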

Memory Types

1. Conversation Buffer Memory (Simple, Unlimited)

Stores every message. Simple, no information loss, but gets expensive as conversations grow:

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",  # Must match the MessagesPlaceholder variable name
    return_messages=True        # Return Message objects, not strings
)

# Manual usage
memory.save_context(
    {"input": "My name is Alice"},
    {"output": "Hello Alice! How can I help you?"}
)

# Later
print(memory.load_memory_variables({}))
# Output (abridged):
# {'chat_history': [HumanMessage(content='My name is Alice'), AIMessage(content='Hello Alice! How can I help you?')]}

Integration with chains:

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

conversation = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=ConversationBufferMemory()
)

# Each call automatically reads and updates memory
print(conversation.predict(input="Hi, my name is Alice."))
print(conversation.predict(input="What's my name?"))  # Remembers: Alice

Limitation: Every turn adds more tokens. A 100-turn conversation might use 50,000 tokens in context just for history.
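The 50,000-token figure follows from simple arithmetic if each turn (user message plus reply) averages about 500 tokens. That average is an assumption chosen for illustration, not a measured value. The sketch below also shows the cumulative cost, since every new call resends all earlier turns.

```python
def history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Tokens of history carried in the prompt after `turns` completed turns."""
    return turns * tokens_per_turn


def cumulative_history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total history tokens sent over the whole conversation.

    Call number n resends the n - 1 turns that came before it.
    """
    return sum(history_tokens(n - 1, tokens_per_turn) for n in range(1, turns + 1))


print(history_tokens(100, 500))             # 50000
print(cumulative_history_tokens(100, 500))  # 2475000
```

The per-call history grows linearly, but the total billed across the conversation grows quadratically, which is why the bounded memory types below exist.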

2. Conversation Buffer Window Memory

Keeps only the last N turns:

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(
    k=5,  # Keep last 5 exchanges
    memory_key="chat_history",
    return_messages=True
)

When to use: When recent context matters more than older context, which covers most chatbot scenarios. A window of roughly 3-10 turns is a common starting point; tune it against your token budget and how far back users tend to refer.

Limitation: Information mentioned early in the conversation is lost.
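The windowing behavior, including the loss of early information, can be sketched with a bounded deque. This is an illustration in plain Python rather than LangChain's implementation; `WindowMemory` is a hypothetical class name.

```python
from collections import deque


class WindowMemory:
    """Keeps only the last k (human, ai) exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen silently drops the oldest item when full
        self.turns: deque[tuple[str, str]] = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def load(self) -> list[tuple[str, str]]:
        return list(self.turns)


window = WindowMemory(k=2)
window.save("My name is Alice", "Hello Alice!")
window.save("I need help with billing", "Sure, what is the issue?")
window.save("I was charged twice", "Let me check that.")
print(window.load())  # The first exchange, with the name, is gone
```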

3. Conversation Summary Memory

Maintains a running summary of the conversation instead of storing turns verbatim:

from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

memory = ConversationSummaryMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),  # Small model is fine for summarization
    memory_key="chat_history",
    return_messages=False  # Returns summary as string, not messages
)

# After each turn, the LLM folds the new exchange into a running summary
# No messages are kept verbatim in the prompt, so fine detail can be lost
# Much lower token cost than buffer memory for long conversations

Best for: Long conversations where you need to track overall context without full history.
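The running-summary idea can be sketched without an LLM by passing in the summarizer as a function. Everything here is hypothetical: `RunningSummaryMemory` is an illustrative class, and `stub_summarize` is a trivial stand-in for the model call that would condense the text in a real setup.

```python
from typing import Callable

Summarizer = Callable[[str, str, str], str]  # (old_summary, human, ai) -> new_summary


def stub_summarize(summary: str, human: str, ai: str) -> str:
    """Stand-in for an LLM call; a real summarizer would condense, not append."""
    return f"{summary} The user said: {human}".strip()


class RunningSummaryMemory:
    """Stores one summary string and updates it after every turn."""

    def __init__(self, summarize: Summarizer) -> None:
        self.summarize = summarize
        self.summary = ""

    def save(self, human: str, ai: str) -> None:
        self.summary = self.summarize(self.summary, human, ai)

    def load(self) -> dict[str, str]:
        return {"chat_history": self.summary}


summary_memory = RunningSummaryMemory(stub_summarize)
summary_memory.save("My name is Alice.", "Hello Alice!")
summary_memory.save("I use the Enterprise plan.", "Noted.")
print(summary_memory.load()["chat_history"])
```

Note that the memory holds a single string, not a list of messages, which is why the example above sets return_messages=False.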

4. Conversation Summary Buffer Memory (Best of Both)

Keeps recent turns verbatim, summarizes older turns — the most balanced approach:

from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

memory = ConversationSummaryBufferMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    max_token_limit=2000,    # When buffer exceeds this, summarize oldest messages
    memory_key="chat_history",
    return_messages=True
)

When to use: Production chatbots and agents where conversations can run long. The best default choice for most applications.
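The pruning rule behind this memory type can be sketched as follows: when the verbatim buffer exceeds the token limit, the oldest messages are evicted and folded into the summary. This is a simplified illustration, not LangChain's code. `count_tokens` is a crude whitespace counter and `stub_summarize` is a stand-in for an LLM call; both are assumptions.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Crude approximation: one token per whitespace-separated word."""
    return len(text.split())


def stub_summarize(summary: str, evicted: list[str]) -> str:
    """Stand-in for an LLM call; a real summarizer would condense the text."""
    return " ".join([summary, *evicted]).strip()


class SummaryBufferMemory:
    def __init__(self, max_tokens: int, summarize: Callable[[str, list[str]], str]) -> None:
        self.max_tokens = max_tokens
        self.summarize = summarize
        self.summary = ""
        self.buffer: list[str] = []

    def save(self, message: str) -> None:
        self.buffer.append(message)
        evicted: list[str] = []
        # Evict oldest messages until the verbatim buffer fits the limit
        while self.buffer and sum(count_tokens(m) for m in self.buffer) > self.max_tokens:
            evicted.append(self.buffer.pop(0))
        if evicted:
            self.summary = self.summarize(self.summary, evicted)


buffer_memory = SummaryBufferMemory(max_tokens=6, summarize=stub_summarize)
for line in ["human: my name is Alice", "ai: hello Alice", "human: what plan am I on"]:
    buffer_memory.save(line)
```

After the three saves, only the most recent message remains verbatim and the two older ones live in `buffer_memory.summary`.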

5. Entity Memory

Tracks facts about specific entities (people, places, concepts) mentioned in conversation:

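from langchain.memory import ConversationEntityMemory
from langchain_openai import ChatOpenAI

memory = ConversationEntityMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    chat_history_key="chat_history"  # Entity memory uses chat_history_key (default "history"), not memory_key
)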

# As conversation happens, it builds a knowledge store about entities:
# "Alice": "Customer since 2022, Enterprise plan, interested in API features"
# "Project Titan": "Internal codename for Q4 product launch"

# Facts about entities mentioned in recent turns are injected into each prompt

When to use: Customer service agents, personal assistants, anywhere facts about specific entities need to persist.
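As a rough sketch of the underlying data structure, an entity store can be modeled as a mapping from entity name to a list of facts, with only the relevant entries pulled into the prompt. In LangChain an LLM extracts and summarizes the entities; here facts are added by hand, and `EntityStore` with its methods is a hypothetical illustration.

```python
from collections import defaultdict


class EntityStore:
    """Maps entity names to the facts known about them."""

    def __init__(self) -> None:
        self.facts: dict[str, list[str]] = defaultdict(list)

    def add_fact(self, entity: str, fact: str) -> None:
        if fact not in self.facts[entity]:
            self.facts[entity].append(fact)

    def context_for(self, text: str) -> str:
        """Return stored facts only for entities mentioned in `text`."""
        lowered = text.lower()
        lines = [
            f"{entity}: {'; '.join(facts)}"
            for entity, facts in self.facts.items()
            if entity.lower() in lowered
        ]
        return "\n".join(lines)


store = EntityStore()
store.add_fact("Alice", "Customer since 2022")
store.add_fact("Alice", "Enterprise plan")
store.add_fact("Project Titan", "Internal codename for Q4 product launch")
print(store.context_for("What plan is Alice on?"))
```

Only the facts about Alice are returned for that question, which keeps the injected context small even when the store holds many entities.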

Persistent Memory (Across Sessions)

All the memory types above are in-memory — they reset when the process restarts. For production, persist to a database.

Redis for Session Memory

import os

from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain.memory import ConversationBufferMemory

def get_session_memory(session_id: str) -> ConversationBufferMemory:
    """Get or create memory for a specific conversation session."""
    message_history = RedisChatMessageHistory(
        url=os.environ["REDIS_URL"],
        session_id=session_id  # Unique per user conversation
    )
    
    return ConversationBufferMemory(
        chat_memory=message_history,
        memory_key="chat_history",
        return_messages=True
    )

# Each user gets their own persistent memory
user_memory = get_session_memory(session_id="user_123_session_456")

PostgreSQL / SQLite Message History

import os

from langchain_community.chat_message_histories import SQLChatMessageHistory

# Note: newer langchain_community releases prefer connection= over connection_string=
# SQLite (simple, development)
history = SQLChatMessageHistory(
    session_id="user_123",
    connection_string="sqlite:///chat_history.db"
)

# PostgreSQL (production)
history = SQLChatMessageHistory(
    session_id="user_123",
    connection_string=os.environ["DATABASE_URL"]
)
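To show what a session-keyed message store does underneath, here is a minimal sketch using only the standard-library sqlite3 module. The table layout and the `SqliteHistory` class are assumptions made for illustration and do not reflect the schema that SQLChatMessageHistory actually creates.

```python
import sqlite3


class SqliteHistory:
    """Stores chat messages per session_id in a SQLite table."""

    def __init__(self, conn: sqlite3.Connection, session_id: str) -> None:
        self.conn = conn
        self.session_id = session_id
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
            )

    def add(self, role: str, content: str) -> None:
        with self.conn:  # Commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (self.session_id, role, content),
            )

    def messages(self) -> list[tuple[str, str]]:
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (self.session_id,),
        )
        return rows.fetchall()

    def clear(self) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM messages WHERE session_id = ?", (self.session_id,))


conn = sqlite3.connect(":memory:")  # Use a file path to survive restarts
alice = SqliteHistory(conn, "user_123")
bob = SqliteHistory(conn, "user_456")
alice.add("human", "My name is Alice")
bob.add("human", "My name is Bob")
```

Each session reads back only its own rows, and clearing one session leaves the others untouched.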

Memory with Agents (LangGraph Pattern)

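LangGraph agents handle memory through graph state rather than a separate memory object. A checkpointer saves that state per thread_id, so each conversation thread resumes where it left off. Recent LangChain releases deprecate the langchain.memory classes above in favor of this pattern: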

from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

# MemorySaver keeps state in memory (development)
# Use PostgresSaver for production
memory = MemorySaver()

def build_agent_with_memory(llm, tools):
    return create_react_agent(
        llm,
        tools,
        checkpointer=memory  # Enables persistence
    )

# llm and tools are assumed to be defined elsewhere (a chat model and a list of tools)
agent = build_agent_with_memory(llm, tools)

# Use thread_id to maintain session state
config = {"configurable": {"thread_id": "user_123_conv_456"}}

# First turn
agent.invoke({"messages": [("human", "My order number is ORD-789")]}, config=config)

# Second turn — agent remembers the order number
agent.invoke({"messages": [("human", "What's the status of my order?")]}, config=config)
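The role of thread_id can be illustrated with a toy checkpointer that stores message lists keyed by thread. This is a conceptual sketch in plain Python, not LangGraph's implementation; `ThreadCheckpointer` and `invoke` are hypothetical names, and no model is called.

```python
from collections import defaultdict

Message = tuple[str, str]  # (role, content)


class ThreadCheckpointer:
    """Toy stand-in for a checkpointer: one saved message list per thread_id."""

    def __init__(self) -> None:
        self._threads: dict[str, list[Message]] = defaultdict(list)

    def load(self, thread_id: str) -> list[Message]:
        return list(self._threads[thread_id])

    def save(self, thread_id: str, messages: list[Message]) -> None:
        self._threads[thread_id] = list(messages)


def invoke(checkpointer: ThreadCheckpointer, new_messages: list[Message], config: dict) -> dict:
    """Restore the thread's state, append the new messages, and save it back."""
    thread_id = config["configurable"]["thread_id"]
    state = checkpointer.load(thread_id) + list(new_messages)
    checkpointer.save(thread_id, state)
    return {"messages": state}


saver = ThreadCheckpointer()
config_a = {"configurable": {"thread_id": "user_123_conv_456"}}
config_b = {"configurable": {"thread_id": "user_999_conv_001"}}
invoke(saver, [("human", "My order number is ORD-789")], config_a)
result_a = invoke(saver, [("human", "What's the status of my order?")], config_a)
result_b = invoke(saver, [("human", "Hello")], config_b)
```

The second call on the first thread sees both messages, while the other thread starts from an empty state.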

Memory Strategy Guide

| Scenario | Recommended Memory |
| --- | --- |
| Short single-session chat | ConversationBufferMemory |
| Production chatbot | ConversationSummaryBufferMemory + Redis |
| Customer service (track customer facts) | Entity Memory + DB persistence |
| Long document Q&A sessions | Summary Memory (keeps total tokens bounded) |
| Agent needing multi-session continuity | LangGraph with PostgresSaver |
| Simple prototyping | ConversationBufferWindowMemory (k=5) |

Clearing and Managing Memory

# Clear all memory for a session
memory.clear()

# Check what's in memory
print(memory.load_memory_variables({}))

# For database-backed history (Redis or SQL)
history.clear()  # Deletes all messages for this session_id

Next lesson: Embeddings and semantic search — the technology that powers knowledge retrieval in agents.
