Memory & Context Management

Why Memory Matters

The Problem: LLMs are stateless -- each API call starts fresh with no memory of previous interactions. Without memory, agents cannot maintain context across conversations.

The Solution: Memory systems give agents the ability to retain conversation history, store important facts, and recall relevant past experiences -- making them more effective over time.

Real Impact: Memory-enabled agents can handle multi-session workflows, personalize responses, and avoid repeating mistakes.

Real-World Analogy

Think of agent memory like human memory systems:

  • Short-Term = Your working memory during a conversation
  • Long-Term = Facts and knowledge stored for later recall
  • Episodic = Memories of specific past experiences and outcomes
  • Context Window = How much you can hold in mind at once
  • Summarization = Creating mental summaries of long events

Memory Types at a Glance

Conversation Buffer

Store the full conversation history. Simple but grows linearly and can exceed context limits.
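A full conversation buffer can be sketched in a few lines. This is a minimal illustration (the class name `BufferMemory` is hypothetical, not from any particular framework):

```python
class BufferMemory:
    """Keep the entire conversation history; simple but unbounded."""

    def __init__(self):
        self.messages = []

    def add(self, role, content):
        # Append every turn; nothing is ever evicted
        self.messages.append({"role": role, "content": content})

    def get_context(self):
        # Return a copy so callers cannot mutate the stored history
        return list(self.messages)
```

Because nothing is ever dropped, the returned context grows with every turn until it exceeds the model's context limit.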

Sliding Window

Keep only the last N messages. Prevents context overflow but loses early context.

Summary Memory

Periodically summarize older messages. Preserves key information while saving tokens.

Vector Store Memory

Embed and store all interactions. Retrieve relevant memories via semantic search.

Short-Term Memory

sliding_window.py
class SlidingWindowMemory:
    def __init__(self, system_prompt="You are a helpful assistant.", max_messages=20):
        self.system_prompt = system_prompt
        self.messages = []
        self.max_messages = max_messages

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Drop the oldest messages once the window is full
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def get_context(self):
        # The system prompt always survives trimming; only chat turns are evicted
        context = [{"role": "system", "content": self.system_prompt}]
        context.extend(self.messages)
        return context

Long-Term Memory

Memory Architecture

(Diagram) The agent's short-term memory (conversation buffer) and working memory (current task context) stay in bidirectional sync with a long-term store built on a vector DB plus summaries.
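The sync between the two tiers can be sketched as follows. This is a toy illustration (the class name `AgentMemory` is hypothetical, and a naive substring match stands in for real vector search):

```python
class AgentMemory:
    """Short-term window plus a long-term store, synced on every turn."""

    def __init__(self, window=6):
        self.recent = []      # short-term: last `window` messages
        self.long_term = []   # long-term store: every message ever seen
        self.window = window

    def add(self, role, content):
        msg = {"role": role, "content": content}
        self.recent = (self.recent + [msg])[-self.window:]
        self.long_term.append(msg)  # sync: every turn also lands in long-term

    def build_context(self, query):
        # Pull long-term memories relevant to the current query back into
        # context (substring match here; a real system would use embeddings)
        recalled = [m for m in self.long_term
                    if query.lower() in m["content"].lower()
                    and m not in self.recent]
        return recalled + self.recent
```

The key idea is the direction of flow: every turn is written to both tiers, and retrieval pulls old material from the long-term store back into the short-term context when it becomes relevant again.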

Episodic Memory

vector_memory.py
from chromadb import Client

class VectorMemory:
    def __init__(self):
        self.client = Client()
        self.collection = self.client.get_or_create_collection("agent_memory")

    def store(self, text, metadata=None):
        # Chroma embeds the document automatically; ids must be unique
        self.collection.add(
            documents=[text],
            metadatas=[metadata or {}],
            ids=[f"mem_{self.collection.count()}"]
        )

    def recall(self, query, n_results=5):
        # Semantic search: returns the stored texts most similar to the query
        results = self.collection.query(
            query_texts=[query], n_results=n_results
        )
        return results["documents"][0]

Context Window Management

Strategy          How It Works                      Trade-off
Full History      Keep all messages                 Hits token limits quickly
Sliding Window    Keep last N messages              Loses early context
Summarization     Summarize old messages            Lossy, extra LLM cost
RAG Retrieval     Embed and search past messages    Setup complexity
Hybrid            Window + summary + RAG            Most effective but complex
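Whatever strategy you pick, the final context must fit a token budget. A common trick is to walk the history backwards, keeping the newest messages that fit. This sketch (the function name `fit_to_budget` is hypothetical) estimates tokens as `len(text) // 4`, a rough heuristic standing in for a real tokenizer:

```python
def fit_to_budget(messages, max_tokens=4000):
    """Keep the most recent messages that fit an approximate token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # newest first
        # ~4 characters per token, plus a small per-message overhead
        cost = len(msg["content"]) // 4 + 4
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

For production use, swap the heuristic for the model's actual tokenizer so the estimate matches what the API will count.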

Quick Reference

Memory Type    Storage             Best For
Buffer         In-memory list      Short conversations
Window         Fixed-size list     Long conversations
Summary        Compressed text     Multi-session agents
Vector         Vector database     Knowledge-heavy agents
Entity         Structured store    Relationship tracking
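Entity memory, the one type in the table without an example above, keeps structured facts keyed by entity rather than raw messages. A minimal sketch (the class name `EntityMemory` is hypothetical):

```python
from collections import defaultdict

class EntityMemory:
    """Track facts about named entities in a structured store."""

    def __init__(self):
        self.entities = defaultdict(list)  # entity name -> list of facts

    def remember(self, entity, fact):
        self.entities[entity].append(fact)

    def lookup(self, entity):
        # Returns [] for unknown entities rather than raising
        return self.entities.get(entity, [])
```

Because facts are keyed by entity, the agent can answer "what do I know about Alice?" in one lookup instead of searching the whole conversation history.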