Memory Systems

Conversation Memory and Buffers

Memory is what transforms a stateless LLM into an agent that can maintain context across a conversation. Without memory, every message is a fresh start — the agent can't say "as I mentioned earlier" or build on previous exchanges. This lesson covers the spectrum of memory types, from simple buffers to sophisticated summarization strategies.

Why Memory Is Non-Trivial

The naive approach to conversation memory is to include the entire conversation history in every prompt. This works until:

  • The conversation exceeds the context window (often within a few dozen exchanges, depending on message length and model)
  • The cost of passing thousands of tokens per message becomes prohibitive
  • Early irrelevant messages pollute the model's attention for current queries

Memory management is about deciding what to keep, what to compress, and what to discard.
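The cost problem compounds quietly: when the full history is re-sent on every turn, the total tokens processed grow quadratically with conversation length. A rough back-of-envelope sketch, assuming ~4 characters per token (a common rule of thumb, not an exact tokenizer):

```python
# Estimate cumulative tokens processed when the full history
# is re-sent on every turn. The ~4 chars/token ratio is a rough
# approximation, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def cumulative_prompt_tokens(messages: list[str]) -> int:
    """Sum of prompt sizes across all turns, re-sending history each time."""
    total = 0
    history_tokens = 0
    for msg in messages:
        history_tokens += estimate_tokens(msg)
        total += history_tokens  # each turn pays for the whole history so far
    return total

turns = ["x" * 400] * 50  # 50 messages of ~100 tokens each
print(cumulative_prompt_tokens(turns))  # 127500 — O(n^2) growth
```

Fifty modest messages already cost over 127k cumulative prompt tokens; doubling the conversation roughly quadruples the bill.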

Memory Type 1: ConversationBufferMemory

Keep the full conversation history. Simple, lossless, but doesn't scale:

from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Full buffer — keeps everything
# Note: ConversationChain's default prompt expects the memory key
# "history", so don't override memory_key when using it
memory = ConversationBufferMemory()

chain = ConversationChain(llm=llm, memory=memory, verbose=True)

# Each exchange is remembered
response1 = chain.invoke({"input": "My name is Alice and I'm building a chatbot."})
response2 = chain.invoke({"input": "What am I building?"})
print(response2["response"])  # "You're building a chatbot."

# Access memory directly
print(memory.chat_memory.messages)  # Full conversation history
print(f"Messages in memory: {len(memory.chat_memory.messages)}")

Memory Type 2: ConversationBufferWindowMemory

Keep only the last K exchanges — a sliding window:

from langchain.memory import ConversationBufferWindowMemory

# Keep only last 5 exchanges (10 messages: 5 human + 5 AI)
window_memory = ConversationBufferWindowMemory(
    k=5,
    return_messages=True,
    memory_key="chat_history"
)

# After 5 exchanges, oldest messages are dropped
# Pros: Bounded memory usage
# Cons: Loses early context (may forget the user's name stated in turn 1)
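The windowing idea is simple enough to sketch without LangChain: keep a bounded deque of (role, content) pairs and let old entries fall off the front. A minimal illustration of the concept (not LangChain's actual implementation):

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the last k exchanges (2k messages: one human + one AI each)."""

    def __init__(self, k: int = 5):
        # deque with maxlen silently discards the oldest items when full
        self.messages: deque[tuple[str, str]] = deque(maxlen=2 * k)

    def add_exchange(self, human: str, ai: str) -> None:
        self.messages.append(("human", human))
        self.messages.append(("ai", ai))

memory = SlidingWindowMemory(k=2)
memory.add_exchange("My name is Alice.", "Nice to meet you, Alice!")
memory.add_exchange("I like Python.", "Great choice.")
memory.add_exchange("What's my name?", "...")  # first exchange is now gone
print(len(memory.messages))  # 4 — bounded at 2*k
print(memory.messages[0])    # ('human', 'I like Python.')
```

The failure mode is visible in the example: by the third exchange, the turn where the user stated their name has already been evicted.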

Memory Type 3: ConversationSummaryMemory

Periodically summarize old conversations to compress them:

from langchain.memory import ConversationSummaryMemory, ConversationSummaryBufferMemory

# Maintains a running summary of the entire conversation
summary_memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    return_messages=True,
)

# ConversationSummaryBufferMemory: keep recent messages verbatim,
# summarize older ones — best of both worlds
summary_buffer_memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=2000,   # Summarize when buffer exceeds this
    memory_key="chat_history",
    return_messages=True,
)
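The summary-buffer strategy can also be sketched independently of LangChain: keep recent messages verbatim, and when the estimated token count exceeds a limit, fold the oldest messages into a running summary. In this sketch the summarizer is a plain string-joining stub standing in for the LLM call:

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough ~4 chars/token approximation

class SummaryBufferMemory:
    """Recent messages verbatim; older ones compressed into a summary."""

    def __init__(self, max_tokens: int = 200, summarize=None):
        self.max_tokens = max_tokens
        self.summary = ""
        self.buffer: list[str] = []
        # In production, summarize would call an LLM; here it's a stub
        self.summarize = summarize or (
            lambda old, prev: (prev + " | " if prev else "") + " ".join(old)
        )

    def add(self, message: str) -> None:
        self.buffer.append(message)
        # Evict oldest messages into the summary until under budget
        while (sum(estimate_tokens(m) for m in self.buffer) > self.max_tokens
               and len(self.buffer) > 1):
            oldest = self.buffer.pop(0)
            self.summary = self.summarize([oldest], self.summary)

memory = SummaryBufferMemory(max_tokens=50)
for i in range(10):
    memory.add(f"message {i}: " + "x" * 100)
print(len(memory.buffer))    # only the most recent message still fits verbatim
print(bool(memory.summary))  # True — older messages live in the summary
```

The design choice mirrors ConversationSummaryBufferMemory: recency gets fidelity, history gets compression, and the token budget bounds both.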

Building a Custom Memory Store

For production systems, you want conversation history persisted in a database (not just in-memory Python objects):

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
import json

class RedisChatHistory(BaseChatMessageHistory):
    """
    Redis-backed conversation history.
    
    Stores messages as JSON in a Redis list with TTL for automatic expiry.
    Suitable for production multi-user deployments.
    """
    
    def __init__(self, session_id: str, redis_client, ttl_seconds: int = 86400):
        self.session_id = session_id
        self.redis = redis_client
        self.ttl = ttl_seconds
        self.key = f"chat_history:{session_id}"

    @property
    def messages(self) -> list[BaseMessage]:
        raw_messages = self.redis.lrange(self.key, 0, -1)
        result = []
        for raw in raw_messages:
            data = json.loads(raw)
            if data["type"] == "human":
                result.append(HumanMessage(content=data["content"]))
            elif data["type"] == "ai":
                result.append(AIMessage(content=data["content"]))
        return result

    def add_message(self, message: BaseMessage) -> None:
        data = json.dumps({
            "type": "human" if isinstance(message, HumanMessage) else "ai",
            "content": message.content,
        })
        self.redis.rpush(self.key, data)
        self.redis.expire(self.key, self.ttl)

    def clear(self) -> None:
        self.redis.delete(self.key)
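To exercise this storage pattern without a live Redis server, a stand-in client exposing the same lrange/rpush/expire/delete surface is enough. FakeRedis below is a hypothetical test double written for this sketch (not the real fakeredis library); the JSON round-trip mirrors what RedisChatHistory does:

```python
import json

class FakeRedis:
    """In-memory test double for the subset of redis-py used above."""

    def __init__(self):
        self.store: dict[str, list[bytes]] = {}

    def rpush(self, key, value):
        self.store.setdefault(key, []).append(value.encode())

    def lrange(self, key, start, end):
        items = self.store.get(key, [])
        return items if end == -1 else items[start:end + 1]

    def expire(self, key, ttl):
        pass  # TTL is a no-op in this test double

    def delete(self, key):
        self.store.pop(key, None)

# Round-trip: serialize messages the same way RedisChatHistory does
client = FakeRedis()
key = "chat_history:user_alice"
for msg_type, content in [("human", "My name is Alice."), ("ai", "Hi Alice!")]:
    client.rpush(key, json.dumps({"type": msg_type, "content": content}))
    client.expire(key, 86400)

restored = [json.loads(raw) for raw in client.lrange(key, 0, -1)]
print(restored[0]["content"])  # My name is Alice.
```

The same double can back unit tests for RedisChatHistory itself, since the class only touches those four client methods.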


# Using session-aware chat with persistent storage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Remember details the user shares about themselves."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

chain = prompt | llm

# Wrap chain with history management
# history_store maps session_id -> history (RedisChatHistory, PostgresChatMessageHistory, etc.)
history_store: dict = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in history_store:
        # In production: RedisChatHistory(session_id, redis_client)
        from langchain_core.chat_history import InMemoryChatMessageHistory
        history_store[session_id] = InMemoryChatMessageHistory()
    return history_store[session_id]

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

# Each user gets isolated conversation history
result1 = chain_with_history.invoke(
    {"input": "I'm working on a Python data pipeline."},
    config={"configurable": {"session_id": "user_alice"}}
)
result2 = chain_with_history.invoke(
    {"input": "What am I working on?"},
    config={"configurable": {"session_id": "user_alice"}}
)
print(result2.content)  # Remembers Alice's Python pipeline

Choosing the Right Memory Type

Scenario → Recommended memory

  • Short, focused conversations → ConversationBufferMemory
  • Long conversations where recent context matters → ConversationBufferWindowMemory (k=10-20)
  • Long conversations where full history is needed → ConversationSummaryBufferMemory
  • Multi-session, persistent users → Database-backed history (Redis/Postgres)
  • Knowledge-intensive agents → Vector store memory (next lesson)

Memory is the state management layer of your agent. Invest in getting it right early — retrofitting memory architecture into a working system is painful.