Memory & Context Management

Why Memory Matters

The Problem: LLMs are stateless -- each API call starts fresh with no memory of previous interactions. Without memory, agents cannot maintain context across conversations.

The Solution: Memory systems give agents the ability to retain conversation history, store important facts, and recall relevant past experiences -- making them more effective over time.

Real Impact: Memory-enabled agents can handle multi-session workflows, personalize responses, and avoid repeating mistakes.

Real-World Analogy

Think of agent memory like human memory systems:

  • Short-Term = Your working memory during a conversation
  • Long-Term = Facts and knowledge stored for later recall
  • Episodic = Memories of specific past experiences and outcomes
  • Context Window = How much you can hold in mind at once
  • Summarization = Creating mental summaries of long events

Memory Types at a Glance

Conversation Buffer

Store the full conversation history. Simple but grows linearly and can exceed context limits.
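A full conversation buffer can be sketched in a few lines. This is a minimal illustration (the class name `BufferMemory` is hypothetical, not from any particular framework):

```python
class BufferMemory:
    """Keep the entire conversation history; simple but unbounded."""

    def __init__(self):
        self.messages = []

    def add(self, role, content):
        # Append every turn; nothing is ever evicted
        self.messages.append({"role": role, "content": content})

    def get_context(self):
        # Return a copy so callers cannot mutate the stored history
        return list(self.messages)
```

Because nothing is ever dropped, the returned context grows with every turn until it exceeds the model's context limit.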

Sliding Window

Keep only the last N messages. Prevents context overflow but loses early context.

Summary Memory

Periodically summarize older messages. Preserves key information while saving tokens.

Vector Store Memory

Embed and store all interactions. Retrieve relevant memories via semantic search.

Short-Term Memory

sliding_window.py
class SlidingWindowMemory:
    def __init__(self, system_prompt="You are a helpful assistant.", max_messages=20):
        self.system_prompt = system_prompt
        self.messages = []
        self.max_messages = max_messages

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Drop the oldest messages once the window is full
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def get_context(self):
        # The system prompt always survives trimming; only chat turns are evicted
        context = [{"role": "system", "content": self.system_prompt}]
        context.extend(self.messages)
        return context

Long-Term Memory

Memory Architecture

(Diagram) The agent's short-term memory (conversation buffer) and working memory (current task context) stay in bidirectional sync with a long-term store built on a vector DB plus summaries.
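The sync between the two tiers can be sketched as follows. This is a toy illustration (the class name `AgentMemory` is hypothetical, and a naive substring match stands in for real vector search):

```python
class AgentMemory:
    """Short-term window plus a long-term store, synced on every turn."""

    def __init__(self, window=6):
        self.recent = []      # short-term: last `window` messages
        self.long_term = []   # long-term store: every message ever seen
        self.window = window

    def add(self, role, content):
        msg = {"role": role, "content": content}
        self.recent = (self.recent + [msg])[-self.window:]
        self.long_term.append(msg)  # sync: every turn also lands in long-term

    def build_context(self, query):
        # Pull long-term memories relevant to the current query back into
        # context (substring match here; a real system would use embeddings)
        recalled = [m for m in self.long_term
                    if query.lower() in m["content"].lower()
                    and m not in self.recent]
        return recalled + self.recent
```

The key idea is the direction of flow: every turn is written to both tiers, and retrieval pulls old material from the long-term store back into the short-term context when it becomes relevant again.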

Episodic Memory

vector_memory.py
from chromadb import Client

class VectorMemory:
    def __init__(self):
        self.client = Client()
        self.collection = self.client.get_or_create_collection("agent_memory")

    def store(self, text, metadata=None):
        # Chroma embeds the document automatically; ids must be unique
        self.collection.add(
            documents=[text],
            metadatas=[metadata or {}],
            ids=[f"mem_{self.collection.count()}"]
        )

    def recall(self, query, n_results=5):
        # Semantic search: returns the stored texts most similar to the query
        results = self.collection.query(
            query_texts=[query], n_results=n_results
        )
        return results["documents"][0]

Context Window Management

Strategy          How It Works                      Trade-off
Full History      Keep all messages                 Hits token limits quickly
Sliding Window    Keep last N messages              Loses early context
Summarization     Summarize old messages            Lossy, extra LLM cost
RAG Retrieval     Embed and search past messages    Setup complexity
Hybrid            Window + summary + RAG            Most effective but complex
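Whatever strategy you pick, the final context must fit a token budget. A common trick is to walk the history backwards, keeping the newest messages that fit. This sketch (the function name `fit_to_budget` is hypothetical) estimates tokens as `len(text) // 4`, a rough heuristic standing in for a real tokenizer:

```python
def fit_to_budget(messages, max_tokens=4000):
    """Keep the most recent messages that fit an approximate token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # newest first
        # ~4 characters per token, plus a small per-message overhead
        cost = len(msg["content"]) // 4 + 4
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

For production use, swap the heuristic for the model's actual tokenizer so the estimate matches what the API will count.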

Quick Reference

Memory Type    Storage             Best For
Buffer         In-memory list      Short conversations
Window         Fixed-size list     Long conversations
Summary        Compressed text     Multi-session agents
Vector         Vector database     Knowledge-heavy agents
Entity         Structured store    Relationship tracking
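Entity memory, the one type in the table without an example above, keeps structured facts keyed by entity rather than raw messages. A minimal sketch (the class name `EntityMemory` is hypothetical):

```python
from collections import defaultdict

class EntityMemory:
    """Track facts about named entities in a structured store."""

    def __init__(self):
        self.entities = defaultdict(list)  # entity name -> list of facts

    def remember(self, entity, fact):
        self.entities[entity].append(fact)

    def lookup(self, entity):
        # Returns [] for unknown entities rather than raising
        return self.entities.get(entity, [])
```

Because facts are keyed by entity, the agent can answer "what do I know about Alice?" in one lookup instead of searching the whole conversation history.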