How do agents plan multi-step tasks?

Common agent planning approaches:

ReAct (plan on the fly): no upfront plan; the agent reasons about each step based on the current context. Simple, but it can lose track of the overall goal.

Plan-and-Execute: first generate a complete task plan, then execute each step. Better for long tasks; the risk is that the plan becomes invalid partway through.

Tree of Thoughts: generate multiple candidate plan branches, evaluate each, and pick the best path. Better quality but more expensive.

LLM-as-planner with tools: use a separate 'planning' call to generate a structured task list (JSON), then execute each task. Provides structure while allowing flexibility.

For production, use a hybrid approach: generate an initial plan, but allow replanning at key checkpoints if earlier steps produce unexpected results.
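The hybrid approach can be sketched without any model calls. This is a minimal illustration, not a definitive implementation: `make_plan`, `run_step`, and `needs_replan` are hypothetical stand-ins for what would normally be LLM or tool calls, and `max_replans` is an assumed safety cap.

```python
from typing import Callable

def run_with_replanning(
    goal: str,
    make_plan: Callable[[str, list[str]], list[str]],
    run_step: Callable[[str], str],
    needs_replan: Callable[[str, str], bool],
    max_replans: int = 2,
) -> list[tuple[str, str]]:
    """Execute a plan step by step, replanning when a result invalidates the next step."""
    history: list[tuple[str, str]] = []  # (step, result) pairs completed so far
    remaining = make_plan(goal, [])      # initial plan, no results yet
    replans = 0
    check_next = False                   # only check after a step has produced a new result
    while remaining:
        step = remaining.pop(0)
        # Checkpoint: does the latest result still fit the next planned step?
        if check_next and replans < max_replans and needs_replan(history[-1][1], step):
            replans += 1
            remaining = make_plan(goal, [result for _, result in history])
            check_next = False           # the new plan already accounts for this result
            continue
        history.append((step, run_step(step)))
        check_next = True
    return history


# Hypothetical stand-ins for the planner, executor, and checker
def make_plan(goal: str, results: list[str]) -> list[str]:
    if any("API is down" in r for r in results):
        return ["use cached data", "write report"]
    return ["fetch data from API", "analyze data", "write report"]

def run_step(step: str) -> str:
    return "error: API is down" if step == "fetch data from API" else f"done: {step}"

def needs_replan(last_result: str, next_step: str) -> bool:
    return last_result.startswith("error")

history = run_with_replanning("summarize sales", make_plan, run_step, needs_replan)
for step, result in history:
    print(f"{step} -> {result}")
```

The failed fetch triggers one replan, and the remaining steps come from the new plan rather than the original one. The cap on replans keeps a flaky checker from looping indefinitely.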

What is working memory for AI agents?

Working memory is the agent's active context window: everything in the current prompt, including system instructions, conversation history, tool results, and retrieved memories. It's the agent's 'desk', the material it is actively working with right now.

Managing it is critical for four reasons: context windows have limits (even 128K tokens fill up); older information gets pushed out as new information is added; cost scales with context size; and attention quality degrades over very long contexts (the 'lost in the middle' problem).

Strategies: prioritize recent and relevant content, summarize completed sub-tasks, compress old tool results into summaries, and extract key facts to persistent storage before removing them from context.

How does an agent maintain state across long tasks?

Long-task state management requires explicit persistence:

Task state object: structured data (JSON/TypedDict) tracking the current step, completed steps, intermediate results, errors encountered, and remaining tasks.

Checkpoint saves: save state to a database after each step. If the agent crashes or needs to restart, it can resume from the last checkpoint.

Scratchpad pattern: maintain a 'notes' field in state where the agent explicitly records key findings and decisions. The LLM reads the scratchpad instead of re-reading all previous tool results.

Summary compression: after N steps, summarize completed work into a compact format and discard the verbose original logs. This keeps context manageable over 20+ step tasks.
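The task state object, checkpoint saves, and scratchpad can be sketched together using only the standard library. This is one possible shape, assuming SQLite as the database; the `TaskState` fields, the `checkpoints` table, and the `task-42` ID are illustrative names, not part of any framework.

```python
import json
import sqlite3
from typing import Optional, TypedDict

class TaskState(TypedDict):
    current_step: int
    completed_steps: list[str]
    results: dict[str, str]
    errors: list[str]
    remaining_tasks: list[str]
    scratchpad: list[str]  # short notes the agent reads instead of raw tool output

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the checkpoint store. Use a file path for real persistence."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(task_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn

def save_checkpoint(conn: sqlite3.Connection, task_id: str, state: TaskState) -> None:
    """Overwrite the stored state for this task with the latest snapshot."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (task_id, state) VALUES (?, ?)",
        (task_id, json.dumps(state)),
    )
    conn.commit()

def load_checkpoint(conn: sqlite3.Connection, task_id: str) -> Optional[TaskState]:
    """Return the last saved state, or None if the task has no checkpoint."""
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE task_id = ?", (task_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

conn = open_store()
state: TaskState = {
    "current_step": 0,
    "completed_steps": [],
    "results": {},
    "errors": [],
    "remaining_tasks": ["search sources", "compare findings", "write summary"],
    "scratchpad": [],
}

# Simulate finishing the first step, then checkpoint
step = state["remaining_tasks"].pop(0)
state["completed_steps"].append(step)
state["results"][step] = "found 3 sources"
state["scratchpad"].append("Source B contradicts source A on pricing")
state["current_step"] += 1
save_checkpoint(conn, "task-42", state)

# After a crash or restart, resume from the saved state
resumed = load_checkpoint(conn, "task-42")
print(resumed["current_step"], resumed["remaining_tasks"])
```

Serializing the whole state as one JSON blob keeps the schema trivial. A production store might split fields into columns to query them, which is a design choice this sketch leaves open.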

What is the difference between short-term and long-term agent memory?

Short-term memory: in-context (current session, current conversation). Fast to access, limited to the context window, and gone when the session ends.

Long-term memory: persisted outside the model. Requires explicit storage (a database) and retrieval. Types: episodic (what happened in past conversations), semantic (known facts about users and the domain), and procedural (how to handle specific situations).

Retrieval: embedding-based (semantic similarity to the current query), recency-based (the most recent N entries), or explicit key-value lookup.

The challenge is deciding what to store in long-term memory and what to retrieve into context at any given moment. Too much retrieval adds irrelevant noise; too little and the agent forgets important things.
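One way to balance embedding-based and recency-based retrieval is to blend the two into a single score and drop anything below a cutoff. The sketch below uses toy two-dimensional embeddings; the half-life, the 0.3 recency weight, and the 0.4 cutoff are assumed values for illustration, not recommendations from the article.

```python
import math
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    text: str
    embedding: list[float]
    age_hours: float

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score(
    entry: MemoryEntry,
    query_embedding: list[float],
    half_life_hours: float = 72.0,  # assumed: recency halves every 3 days
    recency_weight: float = 0.3,    # assumed: similarity matters more than recency
) -> float:
    similarity = cosine_similarity(entry.embedding, query_embedding)
    recency = 0.5 ** (entry.age_hours / half_life_hours)  # 1.0 now, 0.5 after one half-life
    return (1 - recency_weight) * similarity + recency_weight * recency

def retrieve(
    entries: list[MemoryEntry],
    query_embedding: list[float],
    top_k: int = 2,
    min_score: float = 0.4,  # assumed cutoff to keep irrelevant noise out of context
) -> list[MemoryEntry]:
    scored = [(score(e, query_embedding), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for s, entry in scored[:top_k] if s >= min_score]

memories = [
    MemoryEntry("User prefers Python", [1.0, 0.0], age_hours=720),
    MemoryEntry("User asked about the weather", [0.0, 1.0], age_hours=1),
    MemoryEntry("User is building a FastAPI service", [0.8, 0.6], age_hours=24),
]

results = retrieve(memories, query_embedding=[1.0, 0.0])
for entry in results:
    print(entry.text)
```

The very recent but unrelated weather entry is filtered out, while the month-old preference still ranks because it matches the query closely.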

AI Agent Memory and Planning: How Agents Remember and Reason About Long Tasks

⚡ Quick Answer

AI agent memory and planning explained — how agents store context across sessions, plan multi-step tasks, and use working memory, episodic memory, and semantic memory effectively.

AiTechWorlds Team May 27, 2026 8 min read

An agent that forgets everything between sessions is a chatbot with extra steps. An agent that can't maintain coherent state across a 20-step task is an expensive random walk.

Memory and planning are what separate toy agents from genuinely useful ones. I've rebuilt the memory system in production agents three times — each time learning what actually matters. This guide covers what works.

The Four Types of Agent Memory

┌──────────────────────────────────────────────────────┐
│              Agent Memory Architecture               │
├──────────────────────────────────────────────────────┤
│                                                      │
│  WORKING MEMORY (Context Window)                     │
│  ─────────────────────────────                       │
│  Current messages, tool results, active state        │
│  Capacity: 128K-200K tokens                          │
│  Persistence: None (lost when session ends)          │
│                                                      │
│  EPISODIC MEMORY (What happened)                     │
│  ────────────────────────────────                    │
│  Past conversations, completed tasks, interactions   │
│  Capacity: Unlimited (database)                      │
│  Persistence: Permanent                              │
│  Retrieval: Semantic search or recency               │
│                                                      │
│  SEMANTIC MEMORY (What I know)                       │
│  ──────────────────────────────                      │
│  Facts, user preferences, domain knowledge           │
│  Capacity: Unlimited (vector database)               │
│  Persistence: Permanent (with updates)               │
│  Retrieval: Embedding similarity                     │
│                                                      │
│  PROCEDURAL MEMORY (How to do things)                │
│  ────────────────────────────────────                │
│  System prompt, retrieved how-to documents           │
│  Persistence: Static or dynamically retrieved        │
└──────────────────────────────────────────────────────┘

Part 1: Working Memory Management

import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-4o")

class WorkingMemory:
    """Manages the active context window for an agent."""
    
    def __init__(self, max_tokens: int = 100000, reserved_output: int = 4000):
        self.messages = []
        self.max_tokens = max_tokens - reserved_output
        self.system_prompt = ""
        self._system_tokens = 0
    
    def set_system_prompt(self, prompt: str):
        self.system_prompt = prompt
        self._system_tokens = len(enc.encode(prompt))
    
    def count_tokens(self, messages: list) -> int:
        total = self._system_tokens
        for msg in messages:
            if isinstance(msg.get("content"), str):
                total += len(enc.encode(msg["content"])) + 4
        return total
    
    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._maybe_compress()
    
    def _maybe_compress(self):
        """Compress messages when approaching context limit."""
        placeholder = "[Tool result summarized"
        while self.count_tokens(self.messages) > self.max_tokens:
            if len(self.messages) <= 4:  # Keep at least 2 exchanges
                break
            
            # Strategy: replace the oldest verbose tool result with a short
            # placeholder, leaving the 4 most recent messages untouched
            for i, msg in enumerate(self.messages[:-4]):
                content = msg.get("content")
                # Skip results that are already compressed; otherwise the same
                # message is matched on every pass and the loop never ends
                if (
                    msg["role"] == "tool"
                    and isinstance(content, str)
                    and not content.startswith(placeholder)
                ):
                    # Keep any other keys on the message and swap only the content
                    self.messages[i] = {
                        **msg,
                        "content": f"{placeholder}: {len(content)} chars]"
                    }
                    break
            else:
                # No tool results left to compress: drop the oldest message
                self.messages.pop(0)
    
    def get_context(self) -> list:
        return [{"role": "system", "content": self.system_prompt}] + self.messages
    
    def summarize_and_reset(self) -> str:
        """Summarize current context and reset working memory."""
        if not self.messages:
            return ""
        
        # Ask LLM to summarize the conversation
        summary_response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Summarize the key facts, decisions, and outcomes from this agent interaction in 200 words or less."},
                *self.messages
            ]
        )
        
        summary = summary_response.choices[0].message.content
        
        # Reset with summary as context
        self.messages = [
            {
                "role": "system",
                "content": f"Previous context summary: {summary}"
            }
        ]
        
        return summary

Part 2: Episodic Memory with Semantic Search

import json
from datetime import datetime
import chromadb
from openai import OpenAI

client = OpenAI()

class EpisodicMemory:
    """Store and retrieve past agent interactions."""
    
    def __init__(self, collection_name: str = "agent_episodes"):
        self.chroma = chromadb.PersistentClient(path="./agent_memory")
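        # Use cosine distance so the 0.5 threshold and `1 - distance` relevance
        # in retrieve_relevant() behave as intended
        self.collection = self.chroma.get_or_create_collection(
            collection_name, metadata={"hnsw:space": "cosine"}
        )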
    
    def _embed(self, text: str) -> list[float]:
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=[text]
        )
        return response.data[0].embedding
    
    def store_episode(
        self,
        session_id: str,
        user_message: str,
        agent_response: str,
        tools_used: list[str] | None = None,
        outcome: str = "success"
    ):
        """Store a completed interaction."""
        
        episode_text = f"User: {user_message}\nAgent: {agent_response}"
        
        self.collection.upsert(
            ids=[f"{session_id}_{datetime.now().timestamp()}"],
            embeddings=[self._embed(episode_text)],
            documents=[episode_text],
            metadatas=[{
                "session_id": session_id,
                "timestamp": datetime.now().isoformat(),
                "tools_used": json.dumps(tools_used or []),
                "outcome": outcome,
                "user_message_short": user_message[:100]
            }]
        )
    
    def retrieve_relevant(self, current_query: str, top_k: int = 3) -> list[dict]:
        """Find past interactions similar to current query."""
        
        query_emb = self._embed(current_query)
        
        results = self.collection.query(
            query_embeddings=[query_emb],
            n_results=top_k,
            include=["documents", "metadatas", "distances"]
        )
        
        episodes = []
        for doc, meta, distance in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0]
        ):
            if distance < 0.5:  # Only return highly relevant episodes
                episodes.append({
                    "episode": doc[:500],  # Truncate for context
                    "timestamp": meta["timestamp"],
                    "outcome": meta["outcome"],
                    "relevance": 1 - distance
                })
        
        return episodes
    
    def format_for_context(self, episodes: list[dict]) -> str:
        """Format retrieved episodes for injection into agent context."""
        if not episodes:
            return ""
        
        parts = ["Relevant past interactions:"]
        for ep in episodes:
            date = ep["timestamp"][:10]
            parts.append(f"\n[{date}, {ep['outcome']}] {ep['episode'][:200]}")
        
        return "\n".join(parts)
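The `distance < 0.5` filter and the `1 - distance` relevance score in `retrieve_relevant` only read as a similarity if the collection reports cosine distance. Chroma collections have historically used squared L2 distance unless configured otherwise, so it is worth checking which metric your version applies. The sketch below, in plain Python with made-up vectors, shows how the two metrics relate for unit-length embeddings: squared L2 distance is exactly twice the cosine distance, so the same threshold means something different under each.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; ranges from 0 (same direction) to 2 (opposite)."""
    a, b = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))

def squared_l2_distance(a: list[float], b: list[float]) -> float:
    """Sum of squared differences, with no square root taken."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy unit-length "embeddings" (illustrative values only)
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 3.0])

cos_d = cosine_distance(a, b)
l2_sq = squared_l2_distance(a, b)

print(f"cosine distance:     {cos_d:.4f}")
print(f"squared L2 distance: {l2_sq:.4f}")
print(f"ratio:               {l2_sq / cos_d:.4f}")
```

Under squared L2, a cutoff of 0.5 corresponds to a cosine distance of 0.25, which is a stricter filter than the code comment suggests, and `1 - distance` can go negative for dissimilar episodes.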

Part 3: Planning Patterns

Plan-and-Execute

from pydantic import BaseModel
from typing import List

class TaskPlan(BaseModel):
    goal: str
    tasks: List[str]
    success_criteria: str

def generate_plan(goal: str) -> TaskPlan:
    """Generate a structured task plan before execution."""
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """Create a step-by-step plan to accomplish the goal.
                Return JSON with:
                - goal: restated goal
                - tasks: list of specific, actionable tasks (5-10 steps)
                - success_criteria: how to know the goal is accomplished
                
                Tasks should be concrete actions, not vague steps."""
            },
            {"role": "user", "content": f"Goal: {goal}"}
        ],
        response_format={"type": "json_object"}
    )
    
    plan_data = json.loads(response.choices[0].message.content)
    return TaskPlan(**plan_data)

class PlanExecuteAgent:
    def __init__(self, tools: list, model: str = "gpt-4o-mini"):
        self.tools = tools
        self.model = model
        self.working_memory = WorkingMemory()
        self.episodic_memory = EpisodicMemory()
        self.current_plan: TaskPlan | None = None
        self.completed_tasks: list[str] = []
        self.task_results: list[str] = []
    
    def run(self, goal: str) -> str:
        # 1. Generate plan
        self.current_plan = generate_plan(goal)
        print(f"Plan created: {len(self.current_plan.tasks)} tasks")
        
        # 2. Retrieve relevant memories
        past_episodes = self.episodic_memory.retrieve_relevant(goal)
        memory_context = self.episodic_memory.format_for_context(past_episodes)
        
        # 3. Set up working memory
        self.working_memory.set_system_prompt(f"""You are executing a plan to: {goal}

Plan:
{chr(10).join(f'{i+1}. {task}' for i, task in enumerate(self.current_plan.tasks))}

Success criteria: {self.current_plan.success_criteria}

{memory_context}

Complete each task in order. Mark tasks as complete when done.""")
        
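        # 4. Execute tasks; if a result invalidates the plan, replan and continue
        #    with the new task list (capped so the agent cannot replan forever)
        remaining = list(self.current_plan.tasks)
        replans, max_replans = 0, 2
        just_replanned = False
        while remaining:
            task = remaining.pop(0)
            
            # Check if re-planning is needed
            if not just_replanned and replans < max_replans and self._should_replan(task):
                print("Replanning based on intermediate results...")
                self.current_plan = generate_plan(f"{goal} (previously attempted, now adjusting based on: {self.task_results[-1][:200]})")
                remaining = list(self.current_plan.tasks)
                replans += 1
                just_replanned = True
                continue
            just_replanned = False
            
            step = len(self.completed_tasks)
            print(f"\nExecuting task {step + 1}: {task}")
            result = self._execute_task(task, step)
            self.completed_tasks.append(task)
            self.task_results.append(result)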
        
        # 5. Generate final output
        final_response = self._synthesize_results(goal)
        
        # 6. Store in episodic memory
        self.episodic_memory.store_episode(
            session_id="session_001",
            user_message=goal,
            agent_response=final_response[:500],
            tools_used=[t.__name__ for t in self.tools],
            outcome="success"
        )
        
        return final_response
    
    def _should_replan(self, next_task: str) -> bool:
        """Check if earlier results suggest the plan should change."""
        if not self.task_results:
            return False
        
        last_result = self.task_results[-1]
        check_response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Given this result: '{last_result[:200]}'\n"
                           f"Does the next planned task still make sense: '{next_task}'?\n"
                           f"Answer only: yes or no"
            }]
        )
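        # Match a leading "no" only; a substring test would also fire on
        # answers such as "Yes, nothing needs to change"
        answer = check_response.choices[0].message.content or ""
        return answer.strip().lower().startswith("no")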
    
    def _execute_task(self, task: str, step_index: int) -> str:
        """Execute a single task."""
        context = f"Completed {step_index} of {len(self.current_plan.tasks)} tasks."
        if self.task_results:
            context += f"\nLast result: {self.task_results[-1][:300]}"
        
        self.working_memory.add_message("user", f"{context}\n\nNow execute: {task}")
        
        response = client.chat.completions.create(
            model=self.model,
            messages=self.working_memory.get_context()
        )
        
        result = response.choices[0].message.content
        self.working_memory.add_message("assistant", result)
        
        return result
    
    def _synthesize_results(self, goal: str) -> str:
        results_text = "\n\n".join([
            f"Task: {task}\nResult: {result[:300]}"
            for task, result in zip(self.completed_tasks, self.task_results)
        ])
        
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": "Synthesize the task results into a coherent final answer."},
                {"role": "user", "content": f"Goal: {goal}\n\nTask Results:\n{results_text}"}
            ]
        )
        
        return response.choices[0].message.content

Part 4: Semantic Memory for User Preferences

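import hashlib

class SemanticMemory:
    """Store learned facts about users and domain."""
    
    def __init__(self):
        self.chroma = chromadb.PersistentClient(path="./agent_memory")
        self.facts = self.chroma.get_or_create_collection("semantic_facts")
    
    def _embed(self, text: str) -> list[float]:
        # Same embedding call as EpisodicMemory._embed
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=[text]
        )
        return response.data[0].embedding
    
    def store_fact(self, fact: str, category: str, subject: str = "user"):
        embedding = self._embed(fact)
        # Built-in hash() is salted per process, so use hashlib to get an ID
        # that stays the same across runs and lets upsert deduplicate facts
        fact_hash = hashlib.sha256(fact.encode("utf-8")).hexdigest()[:16]
        doc_id = f"{subject}_{category}_{fact_hash}"
        
        self.facts.upsert(
            ids=[doc_id],
            embeddings=[embedding],
            documents=[fact],
            # One metadata dict per document, passed as a list
            metadatas=[{
                "category": category,
                "subject": subject,
                "stored_at": datetime.now().isoformat()
            }]
        )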
    
    def recall(self, query: str, top_k: int = 5) -> list[str]:
        query_emb = self._embed(query)
        results = self.facts.query(
            query_embeddings=[query_emb],
            n_results=top_k
        )
        return results["documents"][0] if results["documents"] else []
    
    def extract_and_store_facts(self, conversation: str):
        """Use LLM to extract memorable facts from conversation."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": """Extract facts worth remembering from this conversation.
                    Return JSON: {"facts": [{"fact": "...", "category": "preference/project/personal/technical"}]}
                    Only include facts that would be useful in future conversations.
                    Return empty list if nothing worth remembering."""
                },
                {"role": "user", "content": conversation}
            ],
            response_format={"type": "json_object"}
        )
        
        data = json.loads(response.choices[0].message.content)
        for item in data.get("facts", []):
            self.store_fact(item["fact"], item["category"])
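`extract_and_store_facts` indexes `item["fact"]` and `item["category"]` directly, so one malformed item in the model's JSON would raise and abort the whole batch. A small validation step before storing is one way to guard against that. The sketch below is an assumption about how such a check might look: `parse_facts` and the `uncategorized` fallback are hypothetical names, and the allowed categories are taken from the extraction prompt above.

```python
import json

# Categories named in the extraction prompt
ALLOWED_CATEGORIES = {"preference", "project", "personal", "technical"}

def parse_facts(raw: str) -> list[tuple[str, str]]:
    """Return (fact, category) pairs from model output, skipping malformed items."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    items = data.get("facts", [])
    if not isinstance(items, list):
        return []

    parsed: list[tuple[str, str]] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        fact = item.get("fact")
        if not isinstance(fact, str) or not fact.strip():
            continue  # no usable fact text
        category = item.get("category")
        if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
            category = "uncategorized"  # hypothetical fallback bucket
        parsed.append((fact.strip(), category))
    return parsed

raw_output = json.dumps({
    "facts": [
        {"fact": "Prefers type hints", "category": "preference"},
        {"fact": "", "category": "personal"},
        {"fact": "Uses PostgreSQL", "category": "database"},
        "not a dict",
    ]
})

facts = parse_facts(raw_output)
for fact, category in facts:
    print(f"{category}: {fact}")
```

Each validated pair could then be passed to `store_fact(fact, category)`. Whether to keep or drop facts with unknown categories is a judgment call; this sketch keeps them under a fallback label.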

Conclusion

Memory and planning are what separate agents that can handle complex, multi-session tasks from those that can only respond to single prompts. The working memory management, episodic retrieval, and structured planning patterns here are the foundation of production agent systems.

The key insight: context is the most precious resource in an agent. Every token not used wisely is a token that could hold a more relevant fact, a more useful tool result, or a better plan.

For the graph-based framework that makes these patterns composable, see our LangGraph tutorial. For building specialized research agents with these memory systems, see our AI research agent guide.

Frequently Asked Questions

What are the types of AI agent memory?

AI agents have four memory types, analogous to human memory:

Sensory/working memory: the current context window (messages and tool results in the active prompt). Limited to the model's context window (128K-200K tokens).

Episodic memory: records of past interactions stored in a database, retrieved and injected into context when relevant. Short-term episodic memory covers the last N conversations; long-term episodic memory uses semantic search over all past interactions.

Semantic memory: general facts and knowledge. For agents, this is often a vector database of domain knowledge, user preferences, or learned facts.

Procedural memory: how to do tasks. Encoded in the system prompt (instructions) or retrieved as 'how-to' documents via RAG.

Agent Development

AI Agent Memory and Planning: How Agents Remember and Reason About Long Tasks

⚡ Quick Answer

AI agent memory and planning explained — how agents store context across sessions, plan multi-step tasks, and use working memory, episodic memory, and semantic memory effectively.

AiTechWorlds Team May 27, 2026 8 min read

#ai-agent-memory-planning #agent-memory-systems #agent-planning #agent-development

📚Part of the Agent Development guide — explore all Agent Development articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

AI Agent Memory and Planning: How Agents Remember and Reason About Long Tasks

An agent that forgets everything between sessions is a chatbot with extra steps. An agent that can't maintain coherent state across a 20-step task is an expensive random walk.

The Four Types of Agent Memory

┌──────────────────────────────────────────────────────┐
│              Agent Memory Architecture               │
├──────────────────────────────────────────────────────┤
│                                                      │
│  WORKING MEMORY (Context Window)                     │
│  ─────────────────────────────                       │
│  Current messages, tool results, active state        │
│  Capacity: 128K-200K tokens                          │
│  Persistence: None (lost when session ends)          │
│                                                      │
│  EPISODIC MEMORY (What happened)                     │
│  ────────────────────────────────                    │
│  Past conversations, completed tasks, interactions   │
│  Capacity: Unlimited (database)                      │
│  Persistence: Permanent                              │
│  Retrieval: Semantic search or recency               │
│                                                      │
│  SEMANTIC MEMORY (What I know)                       │
│  ──────────────────────────────                      │
│  Facts, user preferences, domain knowledge          │
│  Capacity: Unlimited (vector database)               │
│  Persistence: Permanent (with updates)               │
│  Retrieval: Embedding similarity                     │
│                                                      │
│  PROCEDURAL MEMORY (How to do things)                │
│  ────────────────────────────────────                │
│  System prompt, retrieved how-to documents           │
│  Persistence: Static or dynamically retrieved        │
└──────────────────────────────────────────────────────┘

Part 1: Working Memory Management

import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-4o")

class WorkingMemory:
    """Manages the active context window for an agent."""
    
    def __init__(self, max_tokens: int = 100000, reserved_output: int = 4000):
        self.messages = []
        self.max_tokens = max_tokens - reserved_output
        self.system_prompt = ""
        self._system_tokens = 0
    
    def set_system_prompt(self, prompt: str):
        self.system_prompt = prompt
        self._system_tokens = len(enc.encode(prompt))
    
    def count_tokens(self, messages: list) -> int:
        total = self._system_tokens
        for msg in messages:
            if isinstance(msg.get("content"), str):
                total += len(enc.encode(msg["content"])) + 4
        return total
    
    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._maybe_compress()
    
    def _maybe_compress(self):
        """Compress messages when approaching context limit."""
        while self.count_tokens(self.messages) > self.max_tokens:
            if len(self.messages) <= 4:  # Keep at least 2 exchanges
                break
            
            # Strategy: summarize the oldest non-critical messages
            # Simple: just remove oldest tool result (often verbose)
            for i, msg in enumerate(self.messages):
                if msg["role"] == "tool" and i < len(self.messages) - 4:
                    # Replace with compressed version
                    original = msg["content"]
                    self.messages[i] = {
                        "role": "tool",
                        "content": f"[Tool result summarized — {len(original)} chars]"
                    }
                    break
            else:
                # If no tool results to compress, remove oldest message pair
                if len(self.messages) > 4:
                    self.messages.pop(0)
    
    def get_context(self) -> list:
        return [{"role": "system", "content": self.system_prompt}] + self.messages
    
    def summarize_and_reset(self) -> str:
        """Summarize current context and reset working memory."""
        if not self.messages:
            return ""
        
        # Ask LLM to summarize the conversation
        summary_response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Summarize the key facts, decisions, and outcomes from this agent interaction in 200 words or less."},
                *self.messages
            ]
        )
        
        summary = summary_response.choices[0].message.content
        
        # Reset with summary as context
        self.messages = [
            {
                "role": "system",
                "content": f"Previous context summary: {summary}"
            }
        ]
        
        return summary

Part 2: Episodic Memory with Semantic Search

import json
from datetime import datetime
import chromadb
from openai import OpenAI

client = OpenAI()

class EpisodicMemory:
    """Store and retrieve past agent interactions."""
    
    def __init__(self, collection_name: str = "agent_episodes"):
        self.chroma = chromadb.PersistentClient(path="./agent_memory")
        self.collection = self.chroma.get_or_create_collection(collection_name)
    
    def _embed(self, text: str) -> list[float]:
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=[text]
        )
        return response.data[0].embedding
    
    def store_episode(
        self,
        session_id: str,
        user_message: str,
        agent_response: str,
        tools_used: list[str] | None = None,
        outcome: str = "success"
    ):
        """Store a completed interaction."""
        
        episode_text = f"User: {user_message}\nAgent: {agent_response}"
        
        self.collection.upsert(
            ids=[f"{session_id}_{datetime.now().timestamp()}"],
            embeddings=[self._embed(episode_text)],
            documents=[episode_text],
            metadatas=[{
                "session_id": session_id,
                "timestamp": datetime.now().isoformat(),
                "tools_used": json.dumps(tools_used or []),
                "outcome": outcome,
                "user_message_short": user_message[:100]
            }]
        )
    
    def retrieve_relevant(self, current_query: str, top_k: int = 3) -> list[dict]:
        """Find past interactions similar to current query."""
        
        query_emb = self._embed(current_query)
        
        results = self.collection.query(
            query_embeddings=[query_emb],
            n_results=top_k,
            include=["documents", "metadatas", "distances"]
        )
        
        episodes = []
        for doc, meta, distance in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0]
        ):
            if distance < 0.5:  # Only return highly relevant episodes
                episodes.append({
                    "episode": doc[:500],  # Truncate for context
                    "timestamp": meta["timestamp"],
                    "outcome": meta["outcome"],
                    "relevance": 1 - distance
                })
        
        return episodes
    
    def format_for_context(self, episodes: list[dict]) -> str:
        """Format retrieved episodes for injection into agent context."""
        if not episodes:
            return ""
        
        parts = ["Relevant past interactions:"]
        for ep in episodes:
            date = ep["timestamp"][:10]
            parts.append(f"\n[{date}, {ep['outcome']}] {ep['episode'][:200]}")
        
        return "\n".join(parts)

Part 3: Planning Patterns

Plan-and-Execute

from pydantic import BaseModel
from typing import List

class TaskPlan(BaseModel):
    goal: str
    tasks: List[str]
    success_criteria: str

def generate_plan(goal: str) -> TaskPlan:
    """Generate a structured task plan before execution."""
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """Create a step-by-step plan to accomplish the goal.
                Return JSON with:
                - goal: restated goal
                - tasks: list of specific, actionable tasks (5-10 steps)
                - success_criteria: how to know the goal is accomplished
                
                Tasks should be concrete actions, not vague steps."""
            },
            {"role": "user", "content": f"Goal: {goal}"}
        ],
        response_format={"type": "json_object"}
    )
    
    plan_data = json.loads(response.choices[0].message.content)
    return TaskPlan(**plan_data)

class PlanExecuteAgent:
    def __init__(self, tools: list, model: str = "gpt-4o-mini"):
        self.tools = tools
        self.model = model
        self.working_memory = WorkingMemory()
        self.episodic_memory = EpisodicMemory()
        self.current_plan: TaskPlan | None = None
        self.completed_tasks: list[str] = []
        self.task_results: list[str] = []
    
    def run(self, goal: str) -> str:
        # 1. Generate plan
        self.current_plan = generate_plan(goal)
        print(f"Plan created: {len(self.current_plan.tasks)} tasks")
        
        # 2. Retrieve relevant memories
        past_episodes = self.episodic_memory.retrieve_relevant(goal)
        memory_context = self.episodic_memory.format_for_context(past_episodes)
        
        # 3. Set up working memory
        self.working_memory.set_system_prompt(f"""You are executing a plan to: {goal}

Plan:
{chr(10).join(f'{i+1}. {task}' for i, task in enumerate(self.current_plan.tasks))}

Success criteria: {self.current_plan.success_criteria}

{memory_context}

Complete each task in order. Mark tasks as complete when done.""")
        
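        # 4. Execute tasks (a while loop lets a replan swap in new tasks)
        i = 0
        replans = 0
        checked = 0  # Number of results already checked for replanning
        while i < len(self.current_plan.tasks):
            task = self.current_plan.tasks[i]
            
            # Check each new result once; the cap of 2 replans is an arbitrary limit
            if replans < 2 and len(self.task_results) > checked:
                checked = len(self.task_results)
                if self._should_replan(task):
                    print("Replanning based on intermediate results...")
                    self.current_plan = generate_plan(f"{goal} (previously attempted, now adjusting based on: {self.task_results[-1][:200]})")
                    replans += 1
                    i = 0  # Continue with the first task of the new plan
                    continue
            
            print(f"\nExecuting task {i+1}: {task}")
            result = self._execute_task(task, i)
            self.completed_tasks.append(task)
            self.task_results.append(result)
            i += 1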
        
        # 5. Generate final output
        final_response = self._synthesize_results(goal)
        
        # 6. Store in episodic memory (session_id, tools_used, and outcome are placeholder values here)
        self.episodic_memory.store_episode(
            session_id="session_001",
            user_message=goal,
            agent_response=final_response[:500],
            tools_used=[t.__name__ for t in self.tools],
            outcome="success"
        )
        
        return final_response
    
    def _should_replan(self, next_task: str) -> bool:
        """Check if earlier results suggest the plan should change."""
        if not self.task_results:
            return False
        
        last_result = self.task_results[-1]
        check_response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Given this result: '{last_result[:200]}'\n"
                           f"Does the next planned task still make sense: '{next_task}'?\n"
                           f"Answer only: yes or no"
            }]
        )
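        # Match a leading "no" only; a substring test would also fire on "know" or "not sure"
        return check_response.choices[0].message.content.strip().lower().startswith("no")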
    
    def _execute_task(self, task: str, step_index: int) -> str:
        """Execute a single task."""
        context = f"Completed {step_index} of {len(self.current_plan.tasks)} tasks."
        if self.task_results:
            context += f"\nLast result: {self.task_results[-1][:300]}"
        
        self.working_memory.add_message("user", f"{context}\n\nNow execute: {task}")
        
        response = client.chat.completions.create(
            model=self.model,
            messages=self.working_memory.get_context()
        )
        
        result = response.choices[0].message.content
        self.working_memory.add_message("assistant", result)
        
        return result
    
    def _synthesize_results(self, goal: str) -> str:
        results_text = "\n\n".join([
            f"Task: {task}\nResult: {result[:300]}"
            for task, result in zip(self.completed_tasks, self.task_results)
        ])
        
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": "Synthesize the task results into a coherent final answer."},
                {"role": "user", "content": f"Goal: {goal}\n\nTask Results:\n{results_text}"}
            ]
        )
        
        return response.choices[0].message.content
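
The replanning control flow is easier to reason about when it is separated from the LLM calls. The sketch below is one possible shape for it, not the agent's actual implementation: `run_with_replanning`, `is_negative`, and the three `fake_*` stubs are hypothetical names, and the cap of two replans is an arbitrary assumption. Each result is checked once, and a failed check swaps in a new plan that is then executed from its first task.

```python
from typing import Callable


def is_negative(answer: str) -> bool:
    """Treat a reply as 'no' only when its first word is 'no'."""
    words = answer.strip().lower().split()
    return bool(words) and words[0].strip(".,!") == "no"


def run_with_replanning(
    goal: str,
    plan_fn: Callable[[str], list[str]],
    execute_fn: Callable[[str], str],
    still_valid_fn: Callable[[str, str], bool],
    max_replans: int = 2,  # Assumed cap to prevent endless replanning
) -> list[tuple[str, str]]:
    """Run a plan step by step, replacing it when a validity check fails."""
    tasks = plan_fn(goal)
    history: list[tuple[str, str]] = []
    replans = 0
    checked = 0  # Number of results already checked
    i = 0
    while i < len(tasks):
        task = tasks[i]
        if replans < max_replans and len(history) > checked:
            checked = len(history)
            last_result = history[-1][1]
            if not still_valid_fn(last_result, task):
                # Feed the unexpected result back to the planner, then restart
                tasks = plan_fn(f"{goal} (adjusting for: {last_result[:200]})")
                replans += 1
                i = 0
                continue
        history.append((task, execute_fn(task)))
        i += 1
    return history


# Stubs standing in for the planner, the executor, and the yes/no check.
def fake_plan(goal: str) -> list[str]:
    if "adjusting" in goal:
        return ["use cached data", "write report"]
    return ["fetch data", "write report"]


def fake_execute(task: str) -> str:
    return "ERROR: API down" if task == "fetch data" else f"done: {task}"


def fake_check(last_result: str, next_task: str) -> bool:
    return not last_result.startswith("ERROR")


history = run_with_replanning("summarize sales", fake_plan, fake_execute, fake_check)
for task, result in history:
    print(f"{task} -> {result}")
```

Passing the planner, executor, and check in as callables keeps the loop testable without network access. `is_negative` shows a stricter way to read a yes/no reply than a substring test.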

Part 4: Semantic Memory for User Preferences

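import hashlib

class SemanticMemory:
    """Store learned facts about users and domain."""
    
    # Note: self._embed is assumed to be the same embedding helper that
    # EpisodicMemory uses; it is not defined in this excerpt.
    
    def __init__(self):
        self.chroma = chromadb.PersistentClient(path="./agent_memory")
        self.facts = self.chroma.get_or_create_collection("semantic_facts")
    
    def store_fact(self, fact: str, category: str, subject: str = "user"):
        embedding = self._embed(fact)
        # Built-in hash() is salted per process for strings, so use a content
        # digest to keep IDs stable and let upsert overwrite duplicates.
        fact_digest = hashlib.sha256(fact.encode("utf-8")).hexdigest()[:16]
        doc_id = f"{subject}_{category}_{fact_digest}"
        
        self.facts.upsert(
            ids=[doc_id],
            embeddings=[embedding],
            documents=[fact],
            metadatas=[{  # One metadata dict per document, matching store_episode
                "category": category,
                "subject": subject,
                "stored_at": datetime.now().isoformat()
            }]
        )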
    
    def recall(self, query: str, top_k: int = 5) -> list[str]:
        query_emb = self._embed(query)
        results = self.facts.query(
            query_embeddings=[query_emb],
            n_results=top_k
        )
        return results["documents"][0] if results["documents"] else []
    
    def extract_and_store_facts(self, conversation: str):
        """Use LLM to extract memorable facts from conversation."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": """Extract facts worth remembering from this conversation.
                    Return JSON: {"facts": [{"fact": "...", "category": "preference/project/personal/technical"}]}
                    Only include facts that would be useful in future conversations.
                    Return empty list if nothing worth remembering."""
                },
                {"role": "user", "content": conversation}
            ],
            response_format={"type": "json_object"}
        )
        
        data = json.loads(response.choices[0].message.content)
        for item in data.get("facts", []):
            self.store_fact(item["fact"], item["category"])
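
Two details decide whether semantic memory stays clean over time: IDs must be deterministic so that re-storing a fact overwrites it, and the extractor's JSON must be parsed defensively. The following is a minimal sketch under stated assumptions: `stable_fact_id`, `parse_facts`, and `InMemoryFactStore` are hypothetical helpers, the dict-backed store stands in for the vector collection, and the whitespace and case normalization is an added choice rather than part of the class above.

```python
import hashlib
import json


def stable_fact_id(fact: str, category: str, subject: str = "user") -> str:
    """Build an ID that is identical across processes for the same fact."""
    normalized = " ".join(fact.lower().split())  # Assumed normalization
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"{subject}_{category}_{digest}"


def parse_facts(raw_json: str) -> list[dict]:
    """Parse extractor output, skipping malformed items instead of raising."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    facts = []
    for item in data.get("facts", []):
        if isinstance(item, dict) and isinstance(item.get("fact"), str):
            facts.append({
                "fact": item["fact"],
                "category": item.get("category", "uncategorized"),
            })
    return facts


class InMemoryFactStore:
    """Dict-backed stand-in for the collection; upsert overwrites by ID."""

    def __init__(self):
        self.records: dict[str, dict] = {}

    def upsert(self, fact: str, category: str, subject: str = "user") -> str:
        doc_id = stable_fact_id(fact, category, subject)
        self.records[doc_id] = {"fact": fact, "category": category, "subject": subject}
        return doc_id


# Hypothetical extractor output: one duplicate and one malformed item.
raw = json.dumps({
    "facts": [
        {"fact": "Prefers Python", "category": "preference"},
        {"note": "missing fact key"},
        {"fact": "prefers   python", "category": "preference"},
    ]
})

store = InMemoryFactStore()
for item in parse_facts(raw):
    store.upsert(item["fact"], item["category"])

print(len(store.records))
```

The two spellings of the same preference collapse into a single record, and the malformed item is dropped rather than raising a `KeyError`.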

Conclusion

The key insight: context is the most precious resource in an agent. Every token spent on stale or irrelevant content is a token that could have held a more relevant fact, a more useful tool result, or a better plan.

For the graph-based framework that makes these patterns composable, see our LangGraph tutorial. For building specialized research agents with these memory systems, see our AI research agent guide.

AiTechWorlds Team
