5 AutoGPT Long-Term Memory Strategies (Summarize, Forget)
Master AutoGPT long-term memory with 5 proven strategies: sliding window, summarization, selective retention, and more — avoid context explosion in autonomous agents.
Context explosion is the silent killer of long-running AutoGPT sessions. The agent starts a multi-hour task with perfect recall. An hour in, it's forgotten the original constraints. Two hours in, it's contradicting decisions it made earlier. Three hours in, it's consuming so many tokens that each step costs more than the last.
The root problem isn't that AutoGPT is forgetful — it's that the default memory behavior is to keep everything until the context window is full, then start losing things unpredictably from the beginning of the conversation. Deliberate memory management changes this from an accident into a choice.
Here are five strategies, from simplest to most sophisticated, with working code for each.
The Memory Problem in Concrete Terms
Before getting into solutions, understand the scale of the problem.
GPT-4o has a 128K token context window. That sounds large. In practice, an AutoGPT session with tool calls, results, and agent reasoning burns through it faster than you'd expect:
- Average tool call + result: ~500 tokens
- Average agent reasoning step: ~800 tokens
- System prompt + goal definition: ~500 tokens
A session with 100 steps, each pairing one tool call with one reasoning step, consumes roughly 100 × (500 + 800) + 500 ≈ 130,500 tokens, which is slightly over the limit. Most real tasks involve far more than 100 steps. Without memory management, you're guaranteed to hit the wall on anything non-trivial.
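The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the per-step figures are the rough averages listed above, and the function names are made up for illustration.

```python
# Rough per-step averages from the list above (estimates, not measurements).
TOOL_CALL_TOKENS = 500
REASONING_TOKENS = 800
SYSTEM_PROMPT_TOKENS = 500
CONTEXT_LIMIT = 128_000  # GPT-4o context window


def tokens_after(steps: int) -> int:
    """Estimated tokens accumulated after `steps` steps with no memory management."""
    return SYSTEM_PROMPT_TOKENS + steps * (TOOL_CALL_TOKENS + REASONING_TOKENS)


def max_steps_within_limit(limit: int = CONTEXT_LIMIT) -> int:
    """Largest step count whose estimated total still fits in the window."""
    per_step = TOOL_CALL_TOKENS + REASONING_TOKENS
    return (limit - SYSTEM_PROMPT_TOKENS) // per_step


print(tokens_after(100))         # 130500
print(max_steps_within_limit())  # 98
```

Under these assumptions the window fills at step 98, before the 100-step mark.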
The ideal state is that the agent always knows: what its overall goal is, what it has already accomplished, what key facts it has discovered, and what decisions it has made. Everything else can be compressed or forgotten.
Strategy 1: Sliding Window Memory
The simplest approach. Keep only the N most recent messages in the active context. Discard everything older.
# memory/sliding_window.py
from typing import List, Dict
import json


class SlidingWindowMemory:
    def __init__(self, window_size: int = 20, preserve_system: bool = True):
        """
        window_size: Number of recent messages to keep in context
        preserve_system: Always keep system/goal messages regardless of age
        """
        self.window_size = window_size
        self.preserve_system = preserve_system
        self._messages: List[Dict] = []

    def add_message(self, role: str, content: str, message_type: str = "normal"):
        """Add a message to memory."""
        self._messages.append({
            "role": role,
            "content": content,
            "type": message_type,  # "system", "goal", "critical", "normal"
        })

    def get_context(self) -> List[Dict]:
        """Get current context window (role + content only, no internal metadata)."""
        if self.preserve_system:
            # Always include system/goal messages
            protected = [m for m in self._messages if m["type"] in ("system", "goal")]
            recent = [m for m in self._messages if m["type"] not in ("system", "goal")]
            recent = recent[-self.window_size:]
            all_msgs = protected + recent
        else:
            all_msgs = self._messages[-self.window_size:]
        # Strip internal type metadata before returning
        return [{"role": m["role"], "content": m["content"]} for m in all_msgs]

    def get_stats(self) -> dict:
        # Rough estimate: about 1.3 tokens per whitespace-separated word
        total_tokens_est = sum(len(m["content"].split()) * 1.3 for m in self._messages)
        context_tokens_est = sum(
            len(m["content"].split()) * 1.3 for m in self.get_context()
        )
        return {
            "total_messages": len(self._messages),
            "context_messages": len(self.get_context()),
            "estimated_total_tokens": int(total_tokens_est),
            "estimated_context_tokens": int(context_tokens_est),
            "messages_dropped": max(0, len(self._messages) - len(self.get_context())),
        }
When to use it: Short tasks with predictable step counts. When you don't care about the agent referencing early decisions. Development and prototyping.
Tradeoffs: Simple and zero-cost. But the agent may rediscover things it already found, contradict earlier decisions, or lose important context set up in early messages.
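To make the tradeoff concrete, here is a minimal, self-contained sketch of how an early constraint falls out of a sliding window. It is a simplified stand-in for the class above: it protects messages by `role` rather than by a `type` field, and the sample messages are invented for the demonstration.

```python
from typing import Dict, List


def sliding_window(messages: List[Dict], window_size: int) -> List[Dict]:
    """Keep system messages plus the `window_size` most recent other messages."""
    protected = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-window_size:] if window_size > 0 else []
    return protected + recent


history = [{"role": "system", "content": "Goal: summarize Q3 sales"}]
# An early constraint that was NOT marked as a protected message
history.append({"role": "user", "content": "Constraint: never email customers"})
for step in range(1, 31):
    history.append({"role": "assistant", "content": f"step {step} result"})

context = sliding_window(history, window_size=20)
constraint_kept = any("never email" in m["content"] for m in context)
print(len(context), constraint_kept)  # 21 False
```

After 30 steps the goal survives because it is protected, but the constraint has been silently dropped. Anything the agent must never forget has to be marked as protected when it is added.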
Strategy 2: Summarization Memory
Instead of dropping old messages, compress them into a rolling summary. The agent retains the gist of past work without the full token cost.
# memory/summarization_memory.py
from openai import OpenAI
from typing import List, Dict, Optional
import os


class SummarizationMemory:
    def __init__(
        self,
        window_size: int = 15,
        summarize_threshold: int = 30,
        summary_model: str = "gpt-4o-mini",
    ):
        self.window_size = window_size
        self.summarize_threshold = summarize_threshold
        self.summary_model = summary_model
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self._messages: List[Dict] = []
        self._summary: Optional[str] = None

    def add_message(self, role: str, content: str):
        self._messages.append({"role": role, "content": content})
        # Trigger summarization when buffer exceeds threshold
        if len(self._messages) > self.summarize_threshold:
            self._compress_old_messages()

    def _compress_old_messages(self):
        """Summarize messages outside the current window."""
        messages_to_compress = self._messages[:-self.window_size]
        # Prepare messages for summarization (each message truncated to 300 chars)
        text_to_summarize = "\n".join([
            f"{m['role'].upper()}: {m['content'][:300]}"
            for m in messages_to_compress
        ])
        # Include existing summary if present
        existing_context = ""
        if self._summary:
            existing_context = f"Previous summary:\n{self._summary}\n\n"
        prompt = f"""{existing_context}Summarize the following agent conversation history.
Focus on:
1. What tasks were completed
2. Key facts and data discovered
3. Decisions made and their rationale
4. Files or outputs created
5. Current progress toward the main goal
Be specific about numbers, names, and critical details.
Maximum 300 words.
Conversation to summarize:
{text_to_summarize}"""
        response = self.client.chat.completions.create(
            model=self.summary_model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=400,
        )
        self._summary = response.choices[0].message.content
        # Drop the old messages only after the summary call has succeeded,
        # so a failed API call does not lose history
        self._messages = self._messages[-self.window_size:]

    def get_context(self) -> List[Dict]:
        """Get current context including summary if available."""
        messages = []
        if self._summary:
            messages.append({
                "role": "system",
                "content": f"[Conversation History Summary]\n{self._summary}"
            })
        # Return the whole buffer: messages older than window_size that have not
        # been summarized yet would otherwise be missing from the context
        messages.extend(self._messages)
        return messages

    def get_summary(self) -> Optional[str]:
        return self._summary
When to use it: Tasks over 30-60 steps. Research workflows where the agent needs to remember findings from earlier steps. Any task where coherence across the full session matters.
Tradeoffs: Incurs additional LLM cost for summarization (use a cheap model like gpt-4o-mini to minimize this). Summaries lose nuance. Numbers and specific facts should be extracted before summarizing.
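One way to extract numbers and specific facts before summarizing is a cheap pre-pass that sets aside any sentence containing a digit or a URL and sends only the rest to the summarizer. The sketch below is a heuristic, not part of the class above; `split_for_summary` and the pattern are hypothetical and would need tuning for a real task.

```python
import re
from typing import List, Tuple

# Hypothetical heuristic: a sentence with a digit or a URL is "precise"
PRECISE = re.compile(r"\d|https?://")


def split_for_summary(messages: List[str]) -> Tuple[List[str], List[str]]:
    """Return (facts_to_keep_verbatim, sentences_to_summarize)."""
    facts: List[str] = []
    rest: List[str] = []
    for msg in messages:
        # Split on whitespace that follows sentence-ending punctuation
        for sentence in re.split(r"(?<=[.!?])\s+", msg.strip()):
            if not sentence:
                continue
            (facts if PRECISE.search(sentence) else rest).append(sentence)
    return facts, rest


facts, rest = split_for_summary([
    "Fetched pricing page. The Pro plan costs $49 per month.",
    "Decided to skip the legacy API. Docs are at https://example.com/docs.",
])
print(facts)
print(rest)
```

The verbatim facts can be stored alongside the summary so the summarizer is free to compress the narrative.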
Strategy 3: Selective Retention Memory
Not all memories are equal. This strategy explicitly categorizes messages and applies different retention policies to each category.
# memory/selective_memory.py
from typing import List, Dict, Optional
from enum import Enum
import json


class MemoryTier(str, Enum):
    PERMANENT = "permanent"  # Never dropped (goals, critical decisions)
    LONG_TERM = "long_term"  # Kept for full session (key findings, facts)
    WORKING = "working"      # Recent context only (last N steps)
    EPHEMERAL = "ephemeral"  # Drop after single use (tool call intermediates)


class SelectiveMemory:
    def __init__(self, working_window: int = 10):
        self.working_window = working_window
        self._permanent: List[Dict] = []
        self._long_term: List[Dict] = []
        self._working: List[Dict] = []

    def add(self, role: str, content: str, tier: MemoryTier = MemoryTier.WORKING):
        message = {"role": role, "content": content}
        if tier == MemoryTier.PERMANENT:
            self._permanent.append(message)
        elif tier == MemoryTier.LONG_TERM:
            self._long_term.append(message)
        elif tier == MemoryTier.WORKING:
            self._working.append(message)
        # EPHEMERAL messages are not stored at all

    def promote_to_long_term(self, content: str):
        """Promote a fact or finding to long-term memory."""
        self._long_term.append({
            "role": "system",
            "content": f"[Retained Fact] {content}"
        })

    def get_context(self) -> List[Dict]:
        """Assemble context from all tiers."""
        context = []
        context.extend(self._permanent)                       # Always present
        context.extend(self._long_term[-20:])                 # Up to 20 long-term items
        context.extend(self._working[-self.working_window:])  # Recent working memory
        return context

    def extract_and_retain_facts(self, text: str, llm_client, model: str = "gpt-4o-mini"):
        """Use LLM to extract important facts from a message and retain them."""
        extraction_prompt = f"""Extract the most important facts from this text that should be
remembered for future reference. Focus on: specific numbers, names, URLs, file paths,
decisions made, constraints identified.
Return a JSON array of strings. Each string should be a single, specific fact.
Maximum 5 facts. Return [] if nothing important.
Text: {text[:1000]}"""
        response = llm_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": extraction_prompt}],
            max_tokens=200,
        )
        try:
            facts = json.loads(response.choices[0].message.content)
        except (json.JSONDecodeError, TypeError):
            return  # Reply was empty or not valid JSON; retain nothing
        if not isinstance(facts, list):
            return  # Valid JSON, but not the array that was requested
        for fact in facts[:5]:
            self.promote_to_long_term(str(fact))
When to use it: Complex research tasks where some outputs are critical facts and others are just process steps. Any task where the agent needs to reliably reference specific data points discovered early on.
Tradeoffs: Requires classifying messages at creation time. The automatic fact extraction adds cost. Works best when you can predict what types of information matter.
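Classifying at creation time does not have to involve an LLM. Below is a minimal rule-based sketch; the keyword lists, the 2,000-character cutoff, and the `classify` function are illustrative assumptions rather than anything prescribed by the class above. The enum is redeclared so the snippet runs on its own.

```python
from enum import Enum


class Tier(str, Enum):
    PERMANENT = "permanent"
    LONG_TERM = "long_term"
    WORKING = "working"
    EPHEMERAL = "ephemeral"


# Hypothetical markers; adjust to the vocabulary your agent actually uses
PERMANENT_PREFIXES = ("goal:", "constraint:")
LONG_TERM_MARKERS = ("decided", "found that", "result:")
LARGE_TOOL_OUTPUT_CHARS = 2000


def classify(role: str, content: str) -> Tier:
    """Assign a retention tier to a message using simple keyword rules."""
    text = content.lower()
    if role == "system" or text.startswith(PERMANENT_PREFIXES):
        return Tier.PERMANENT
    if role == "tool" and len(content) > LARGE_TOOL_OUTPUT_CHARS:
        return Tier.EPHEMERAL  # Bulky raw output: use once, then drop
    if any(marker in text for marker in LONG_TERM_MARKERS):
        return Tier.LONG_TERM
    return Tier.WORKING


print(classify("user", "Constraint: budget is 500 USD").value)    # permanent
print(classify("assistant", "Decided to use SQLite").value)       # long_term
print(classify("assistant", "Thinking about next step").value)    # working
```

Rules like these are cheap and predictable but brittle. A reasonable middle ground is to use them as a first pass and reserve LLM-based extraction for messages the rules leave in the working tier.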
Strategy 4: External Vector Memory
Store all past agent outputs in a vector database. At each step, retrieve only the most relevant past context rather than keeping everything in the active window.
# memory/vector_memory.py
import os
import json
from typing import List, Dict, Optional
from openai import OpenAI
import numpy as np


class VectorMemory:
    """In-memory vector store (replace with Pinecone/Chroma for production)."""

    def __init__(self, embedding_model: str = "text-embedding-3-small"):
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.embedding_model = embedding_model
        self.store: List[Dict] = []

    def _embed(self, text: str) -> List[float]:
        response = self.client.embeddings.create(
            input=text[:8000],
            model=self.embedding_model,
        )
        return response.data[0].embedding

    def _cosine_similarity(self, a: List[float], b: List[float]) -> float:
        a, b = np.array(a), np.array(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def store_memory(self, content: str, metadata: Optional[dict] = None):
        """Store a piece of memory with its embedding."""
        embedding = self._embed(content)
        self.store.append({
            "content": content,
            "embedding": embedding,
            "metadata": metadata or {},
        })

    def retrieve_relevant(
        self,
        query: str,
        top_k: int = 5,
        threshold: float = 0.7
    ) -> List[str]:
        """Retrieve memories most relevant to the current query."""
        if not self.store:
            return []
        query_embedding = self._embed(query)
        # Score all memories
        scored = []
        for item in self.store:
            score = self._cosine_similarity(query_embedding, item["embedding"])
            if score >= threshold:
                scored.append((score, item["content"]))
        # Return top_k by score
        scored.sort(reverse=True, key=lambda x: x[0])
        return [content for _, content in scored[:top_k]]

    def build_memory_context(self, current_query: str) -> str:
        """Build a memory context string for injection into the prompt."""
        relevant = self.retrieve_relevant(current_query)
        if not relevant:
            return ""
        context_parts = ["[Relevant Memory Retrieval]"]
        for i, memory in enumerate(relevant, 1):
            context_parts.append(f"{i}. {memory[:200]}")
        return "\n".join(context_parts)
For production, replace this with a proper vector database. The vector database guide covers Pinecone, Chroma, and Weaviate in detail — all three work well for this pattern.
When to use it: Long-running projects spanning multiple sessions. Agents that need to reference a large knowledge base. Research tasks where earlier findings are selectively relevant to later steps.
Tradeoffs: Higher cost — every store and retrieve operation calls the embedding API. Adds latency. Requires external infrastructure for production. But enables truly unlimited long-term memory.
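The score, filter, and rank logic can be tried offline, without any embedding API calls, by swapping in a toy embedding. The sketch below uses a bag-of-words `Counter` as a stand-in for a real embedding model, so its similarity values and the 0.15 threshold mean something only for this toy and should not be carried over to a real model.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0  # Guard against empty vectors


def retrieve(query: str, memories: List[str], top_k: int = 2, threshold: float = 0.15) -> List[str]:
    """Score every memory, drop those below the threshold, return the top_k."""
    q = embed(query)
    scored = [(cosine(q, embed(m)), m) for m in memories]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]


memories = [
    "battery cost fell to 100 dollars per kwh",
    "solid state battery prototypes reached 500 cycles",
    "the weather in paris was rainy",
]
print(retrieve("battery cost per kwh", memories))
print(retrieve("battery cost per kwh", memories, threshold=0.5))
```

The first call returns the two battery memories, best match first. Raising the threshold to 0.5 keeps only the closest one. The threshold is the main tuning knob: set it too high and relevant memories are never retrieved, too low and the context fills with noise.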
Strategy 5: Hierarchical Memory
The most sophisticated approach. Memories are organized in tiers: immediate context, session summary, project history. Each tier feeds the one above it through progressive summarization.
# memory/hierarchical_memory.py
from openai import OpenAI
from typing import List, Dict, Optional
import os
import json
from datetime import datetime, timezone


class HierarchicalMemory:
    def __init__(
        self,
        immediate_window: int = 10,
        session_summary_interval: int = 25,
        llm_model: str = "gpt-4o-mini",
    ):
        self.immediate_window = immediate_window
        self.session_summary_interval = session_summary_interval
        self.llm_model = llm_model
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        # Three memory tiers
        self.immediate: List[Dict] = []         # Last N messages
        self.session_summaries: List[str] = []  # Periodic session summaries
        self.project_memory: Dict = {           # Cross-session persistent state
            "goal": "",
            "key_decisions": [],
            "discovered_facts": [],
            "completed_tasks": [],
            "created_files": [],
        }

    def set_goal(self, goal: str):
        """Set the project goal, which is always kept in permanent memory."""
        self.project_memory["goal"] = goal

    def add_message(self, role: str, content: str):
        """Add a message and trigger compression if needed."""
        self.immediate.append({"role": role, "content": content})
        # Check if we need to create a session summary
        if len(self.immediate) >= self.session_summary_interval:
            self._create_session_summary()

    def _create_session_summary(self):
        """Compress current immediate memory into a session summary."""
        messages_text = "\n".join([
            f"{m['role']}: {m['content'][:250]}"
            for m in self.immediate[:-self.immediate_window]
        ])
        if not messages_text:
            return
        prompt = f"""Create a concise summary of these agent actions.
Update the project state with any new information.
Current project state:
Goal: {self.project_memory['goal']}
Known facts: {json.dumps(self.project_memory['discovered_facts'][-5:])}
Completed tasks: {json.dumps(self.project_memory['completed_tasks'][-5:])}
New messages to summarize:
{messages_text}
Return JSON with:
{{
"summary": "2-3 sentence narrative of what happened",
"new_facts": ["fact1", "fact2"],
"new_completed_tasks": ["task1"],
"new_decisions": ["decision1"],
"new_files": ["file_path1"]
}}"""
        try:
            response = self.client.chat.completions.create(
                model=self.llm_model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=400,
                response_format={"type": "json_object"},
            )
            result = json.loads(response.choices[0].message.content)
            # Update project memory
            self.session_summaries.append(result.get("summary", ""))
            self.project_memory["discovered_facts"].extend(result.get("new_facts", []))
            self.project_memory["completed_tasks"].extend(result.get("new_completed_tasks", []))
            self.project_memory["key_decisions"].extend(result.get("new_decisions", []))
            self.project_memory["created_files"].extend(result.get("new_files", []))
            # Keep only recent immediate messages
            self.immediate = self.immediate[-self.immediate_window:]
        except Exception as e:
            print(f"Warning: Memory compression failed: {e}")
            # Fall back to simple truncation
            self.immediate = self.immediate[-self.immediate_window:]

    def get_context(self) -> List[Dict]:
        """Assemble full context from all memory tiers."""
        context = []
        # Project goal (always first)
        if self.project_memory["goal"]:
            context.append({
                "role": "system",
                "content": f"Project Goal: {self.project_memory['goal']}"
            })
        # Project state
        state_parts = []
        if self.project_memory["discovered_facts"]:
            facts = self.project_memory["discovered_facts"][-10:]
            state_parts.append(f"Known facts: {'; '.join(facts)}")
        if self.project_memory["completed_tasks"]:
            tasks = self.project_memory["completed_tasks"][-5:]
            state_parts.append(f"Completed: {'; '.join(tasks)}")
        if self.project_memory["created_files"]:
            files = self.project_memory["created_files"][-5:]
            state_parts.append(f"Created files: {', '.join(files)}")
        if state_parts:
            context.append({
                "role": "system",
                "content": "[Project State]\n" + "\n".join(state_parts)
            })
        # Recent session summaries (last 3)
        if self.session_summaries:
            recent_summaries = self.session_summaries[-3:]
            context.append({
                "role": "system",
                "content": "[Recent History]\n" + "\n".join(recent_summaries)
            })
        # Immediate context
        context.extend(self.immediate)
        return context

    def save_to_disk(self, path: str):
        """Persist project memory across sessions."""
        state = {
            "project_memory": self.project_memory,
            "session_summaries": self.session_summaries,
            # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
            "saved_at": datetime.now(timezone.utc).isoformat(),
        }
        with open(path, "w") as f:
            json.dump(state, f, indent=2)

    def load_from_disk(self, path: str):
        """Resume project memory from a previous session."""
        with open(path) as f:
            state = json.load(f)
        self.project_memory = state["project_memory"]
        self.session_summaries = state["session_summaries"]
        print(f"Loaded memory from session saved at {state['saved_at']}")
Strategy Comparison
| Strategy | Added Token Cost | Coherence | Complexity | Best For |
|---|---|---|---|---|
| Sliding Window | Zero | Low — loses early context | Minimal | Short tasks, prototyping |
| Summarization | Low | Medium — retains gist | Low | Medium tasks, 30-100 steps |
| Selective Retention | Low | High for tagged content | Medium | Tasks with known critical facts |
| Vector Memory | Medium | High — semantic retrieval | High | Multi-session, large knowledge base |
| Hierarchical | Medium | Highest — structured state | High | Complex long-running projects |
According to the AI agent memory and planning research, agents using hierarchical memory strategies outperform sliding window approaches by 34% on task completion for sessions exceeding 50 steps — the summarization overhead is more than offset by the reduction in contradictory actions.
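The comparison can be condensed into a rule-of-thumb selector. The sketch below is just one reading of the table: the function name, its parameters, and the exact cutoffs (30 and 100 steps) are illustrative assumptions, not hard rules.

```python
def choose_strategy(
    expected_steps: int,
    multi_session: bool = False,
    large_knowledge_base: bool = False,
    known_critical_facts: bool = False,
) -> str:
    """Pick a memory strategy from rough task characteristics (heuristic only)."""
    if large_knowledge_base:
        return "vector"               # Semantic retrieval over a large store
    if multi_session:
        return "hierarchical"         # Needs state that survives restarts
    if expected_steps <= 30:
        return "sliding_window"       # Short task: simplest option is enough
    if known_critical_facts:
        return "selective_retention"  # You can tag what matters up front
    if expected_steps <= 100:
        return "summarization"
    return "hierarchical"             # Long single-session project


print(choose_strategy(15))                              # sliding_window
print(choose_strategy(80))                              # summarization
print(choose_strategy(80, known_critical_facts=True))   # selective_retention
print(choose_strategy(400))                             # hierarchical
```

The strategies also combine: a hierarchical memory can use vector retrieval for its project tier, for example.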
Implementing Memory in an AutoGen Agent
import autogen  # Not used directly here; wire the helpers below into your agent loop
from memory.hierarchical_memory import HierarchicalMemory

memory = HierarchicalMemory(immediate_window=10, session_summary_interval=25)
memory.set_goal("Research and write a comprehensive report on battery technology advances in 2026")

# If resuming a session:
# memory.load_from_disk("project_memory.json")

# Inject memory context into each agent interaction
def get_messages_with_memory(user_message: str) -> list:
    memory.add_message("user", user_message)
    # get_context() already ends with the message just added,
    # so appending it again would duplicate the user turn
    return memory.get_context()

# After each agent response:
def store_agent_response(response: str):
    memory.add_message("assistant", response)
    memory.save_to_disk("project_memory.json")  # Auto-save
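Saving after every response means a crash in the middle of a write could leave a truncated `project_memory.json`. One common safeguard is to write to a temporary file and then rename it over the target. The helper below is a hypothetical addition, not part of `HierarchicalMemory`, and is sketched under the assumption that the state is plain JSON-serializable data.

```python
import json
import os
import tempfile


def atomic_save_json(state: dict, path: str) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
        os.replace(tmp_path, path)  # Replaces the target in one step
    except BaseException:
        # Clean up the partial temp file; the previous save stays intact
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage inside a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "project_memory.json")
    atomic_save_json({"goal": "draft report"}, target)
    with open(target) as f:
        print(json.load(f)["goal"])  # draft report
```

The temp file is created in the same directory as the target so the rename never has to cross filesystems.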
The right memory strategy depends on task length and complexity. For anything under 20-30 steps, sliding window is fine. For research projects spanning hours or multiple sessions, hierarchical memory is worth the additional setup. The same pattern carries over to building an AI agent with LangChain, whose conversation memory classes implement very similar approaches with a different API.
The goal in every case is the same: the agent should always know where it is, where it came from, and where it's going — even when the raw conversation history is too long to fit in context.
Frequently Asked Questions
Why does AutoGPT lose important information during long tasks?
AutoGPT, like all LLM-based agents, has a fixed context window. When a session exceeds this limit, older content is truncated or compressed. Without deliberate memory management, important early context (goals, constraints, partial results) gets pushed out by newer, less important content.
What is the difference between AutoGPT's working memory and long-term memory?
Working memory is the active context window: what the agent is currently reasoning over. Long-term memory is persistent storage outside the context window, typically a vector database or file system. The agent retrieves from long-term memory as needed, treating it like a searchable external knowledge base.
Does summarization lose important details?
Yes. Summarization always involves information loss. The key is controlling what gets summarized and what gets preserved verbatim. Critical facts (names, numbers, specific decisions) should be extracted and stored explicitly before summarization. The summary captures flow and context; the extracted facts preserve precision.
AiTechWorlds Team