Memory Systems

Conversation Memory and Buffers

Memory is what transforms a stateless LLM into an agent that can maintain context across a conversation. Without memory, every message is a fresh start — the agent can't say "as I mentioned earlier" or build on previous exchanges. This lesson covers the spectrum of memory types, from simple buffers to sophisticated summarization strategies.

Why Memory Is Non-Trivial

The naive approach to conversation memory is to include the entire conversation history in every prompt. This works until:

  • The conversation exceeds the context window (often within a few dozen exchanges, depending on message length and model)
  • The cost of passing thousands of tokens per message becomes prohibitive
  • Early irrelevant messages pollute the model's attention for current queries

Memory management is about deciding what to keep, what to compress, and what to discard.
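The cost problem compounds quietly: when the full history is re-sent on every turn, the total tokens processed grow quadratically with conversation length. A rough back-of-envelope sketch, assuming ~4 characters per token (a common rule of thumb, not an exact tokenizer):

```python
# Estimate cumulative tokens processed when the full history
# is re-sent on every turn. The ~4 chars/token ratio is a rough
# approximation, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def cumulative_prompt_tokens(messages: list[str]) -> int:
    """Sum of prompt sizes across all turns, re-sending history each time."""
    total = 0
    history_tokens = 0
    for msg in messages:
        history_tokens += estimate_tokens(msg)
        total += history_tokens  # each turn pays for the whole history so far
    return total

turns = ["x" * 400] * 50  # 50 messages of ~100 tokens each
print(cumulative_prompt_tokens(turns))  # 127500 — O(n^2) growth
```

Fifty modest messages already cost over 127k cumulative prompt tokens; doubling the conversation roughly quadruples the bill.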

Memory Type 1: ConversationBufferMemory

Keep the full conversation history. Simple, lossless, but doesn't scale:

from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Full buffer — keeps everything
# Note: ConversationChain's default prompt expects the memory key
# "history", so don't override memory_key when using it
memory = ConversationBufferMemory()

chain = ConversationChain(llm=llm, memory=memory, verbose=True)

# Each exchange is remembered
response1 = chain.invoke({"input": "My name is Alice and I'm building a chatbot."})
response2 = chain.invoke({"input": "What am I building?"})
print(response2["response"])  # "You're building a chatbot."

# Access memory directly
print(memory.chat_memory.messages)  # Full conversation history
print(f"Messages in memory: {len(memory.chat_memory.messages)}")

Memory Type 2: ConversationBufferWindowMemory

Keep only the last K exchanges — a sliding window:

from langchain.memory import ConversationBufferWindowMemory

# Keep only last 5 exchanges (10 messages: 5 human + 5 AI)
window_memory = ConversationBufferWindowMemory(
    k=5,
    return_messages=True,
    memory_key="chat_history"
)

# After 5 exchanges, oldest messages are dropped
# Pros: Bounded memory usage
# Cons: Loses early context (may forget the user's name stated in turn 1)
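The windowing idea is simple enough to sketch without LangChain: keep a bounded deque of (role, content) pairs and let old entries fall off the front. A minimal illustration of the concept (not LangChain's actual implementation):

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the last k exchanges (2k messages: one human + one AI each)."""

    def __init__(self, k: int = 5):
        # deque with maxlen silently discards the oldest items when full
        self.messages: deque[tuple[str, str]] = deque(maxlen=2 * k)

    def add_exchange(self, human: str, ai: str) -> None:
        self.messages.append(("human", human))
        self.messages.append(("ai", ai))

memory = SlidingWindowMemory(k=2)
memory.add_exchange("My name is Alice.", "Nice to meet you, Alice!")
memory.add_exchange("I like Python.", "Great choice.")
memory.add_exchange("What's my name?", "...")  # first exchange is now gone
print(len(memory.messages))  # 4 — bounded at 2*k
print(memory.messages[0])    # ('human', 'I like Python.')
```

The failure mode is visible in the example: by the third exchange, the turn where the user stated their name has already been evicted.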

Memory Type 3: ConversationSummaryMemory

Periodically summarize old conversations to compress them:

from langchain.memory import ConversationSummaryMemory, ConversationSummaryBufferMemory

# Maintains a running summary of the entire conversation
summary_memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    return_messages=True,
)

# ConversationSummaryBufferMemory: keep recent messages verbatim,
# summarize older ones — best of both worlds
summary_buffer_memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=2000,   # Summarize when buffer exceeds this
    memory_key="chat_history",
    return_messages=True,
)
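The summary-buffer strategy can also be sketched independently of LangChain: keep recent messages verbatim, and when the estimated token count exceeds a limit, fold the oldest messages into a running summary. In this sketch the summarizer is a plain string-joining stub standing in for the LLM call:

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough ~4 chars/token approximation

class SummaryBufferMemory:
    """Recent messages verbatim; older ones compressed into a summary."""

    def __init__(self, max_tokens: int = 200, summarize=None):
        self.max_tokens = max_tokens
        self.summary = ""
        self.buffer: list[str] = []
        # In production, summarize would call an LLM; here it's a stub
        self.summarize = summarize or (
            lambda old, prev: (prev + " | " if prev else "") + " ".join(old)
        )

    def add(self, message: str) -> None:
        self.buffer.append(message)
        # Evict oldest messages into the summary until under budget
        while (sum(estimate_tokens(m) for m in self.buffer) > self.max_tokens
               and len(self.buffer) > 1):
            oldest = self.buffer.pop(0)
            self.summary = self.summarize([oldest], self.summary)

memory = SummaryBufferMemory(max_tokens=50)
for i in range(10):
    memory.add(f"message {i}: " + "x" * 100)
print(len(memory.buffer))    # only the most recent message still fits verbatim
print(bool(memory.summary))  # True — older messages live in the summary
```

The design choice mirrors ConversationSummaryBufferMemory: recency gets fidelity, history gets compression, and the token budget bounds both.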

Building a Custom Memory Store

For production systems, you want conversation history persisted in a database (not just in-memory Python objects):

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
import json

class RedisChatHistory(BaseChatMessageHistory):
    """
    Redis-backed conversation history.
    
    Stores messages as JSON in a Redis list with TTL for automatic expiry.
    Suitable for production multi-user deployments.
    """
    
    def __init__(self, session_id: str, redis_client, ttl_seconds: int = 86400):
        self.session_id = session_id
        self.redis = redis_client
        self.ttl = ttl_seconds
        self.key = f"chat_history:{session_id}"

    @property
    def messages(self) -> list[BaseMessage]:
        raw_messages = self.redis.lrange(self.key, 0, -1)
        result = []
        for raw in raw_messages:
            data = json.loads(raw)
            if data["type"] == "human":
                result.append(HumanMessage(content=data["content"]))
            elif data["type"] == "ai":
                result.append(AIMessage(content=data["content"]))
        return result

    def add_message(self, message: BaseMessage) -> None:
        data = json.dumps({
            "type": "human" if isinstance(message, HumanMessage) else "ai",
            "content": message.content,
        })
        self.redis.rpush(self.key, data)
        self.redis.expire(self.key, self.ttl)

    def clear(self) -> None:
        self.redis.delete(self.key)
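To exercise this storage pattern without a live Redis server, a stand-in client exposing the same lrange/rpush/expire/delete surface is enough. FakeRedis below is a hypothetical test double written for this sketch (not the real fakeredis library); the JSON round-trip mirrors what RedisChatHistory does:

```python
import json

class FakeRedis:
    """In-memory test double for the subset of redis-py used above."""

    def __init__(self):
        self.store: dict[str, list[bytes]] = {}

    def rpush(self, key, value):
        self.store.setdefault(key, []).append(value.encode())

    def lrange(self, key, start, end):
        items = self.store.get(key, [])
        return items if end == -1 else items[start:end + 1]

    def expire(self, key, ttl):
        pass  # TTL is a no-op in this test double

    def delete(self, key):
        self.store.pop(key, None)

# Round-trip: serialize messages the same way RedisChatHistory does
client = FakeRedis()
key = "chat_history:user_alice"
for msg_type, content in [("human", "My name is Alice."), ("ai", "Hi Alice!")]:
    client.rpush(key, json.dumps({"type": msg_type, "content": content}))
    client.expire(key, 86400)

restored = [json.loads(raw) for raw in client.lrange(key, 0, -1)]
print(restored[0]["content"])  # My name is Alice.
```

The same double can back unit tests for RedisChatHistory itself, since the class only touches those four client methods.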


# Using session-aware chat with persistent storage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Remember details the user shares about themselves."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

chain = prompt | llm

# Wrap chain with history management
# history_store maps session_id -> history (RedisChatHistory, PostgresChatMessageHistory, etc.)
history_store: dict = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in history_store:
        # In production: RedisChatHistory(session_id, redis_client)
        from langchain_core.chat_history import InMemoryChatMessageHistory
        history_store[session_id] = InMemoryChatMessageHistory()
    return history_store[session_id]

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

# Each user gets isolated conversation history
result1 = chain_with_history.invoke(
    {"input": "I'm working on a Python data pipeline."},
    config={"configurable": {"session_id": "user_alice"}}
)
result2 = chain_with_history.invoke(
    {"input": "What am I working on?"},
    config={"configurable": {"session_id": "user_alice"}}
)
print(result2.content)  # Remembers Alice's Python pipeline

Choosing the Right Memory Type

Scenario → Recommended memory

  • Short, focused conversations → ConversationBufferMemory
  • Long conversations where recent context matters → ConversationBufferWindowMemory (k=10-20)
  • Long conversations where full history is needed → ConversationSummaryBufferMemory
  • Multi-session, persistent users → Database-backed history (Redis/Postgres)
  • Knowledge-intensive agents → Vector store memory (next lesson)

Memory is the state management layer of your agent. Invest in getting it right early — retrofitting memory architecture into a working system is painful.