Integrating Tools and Memory
The real power of an agent emerges when tools and memory work together. A tool-using agent without memory repeats itself, re-fetches data it already has, and treats every task as if it's the first. A memory-equipped tool-using agent builds knowledge over time, retrieves relevant context before acting, and avoids redundant work. This lesson shows how to wire tools and memory into a coherent agent system.
The Integration Architecture
User request
│
▼
Memory Recall: What do we know relevant to this request?
│ (entity facts, past episodes, retrieved docs)
▼
Context Construction: Combine memory + request into prompt
│
▼
Agent Reasoning: Plan actions based on available tools + context
│
▼
Tool Execution: Call tools, collect results
│
▼
Memory Update: Store new facts, update state, record episode
│
▼
Response Generation: Formulate answer grounded in context + results
Memory-Aware Tool Selection
The agent should use memory to decide which tools to call and whether to call them at all:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.tools import tool
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
# Low temperature so tool selection and answers are as reproducible as possible.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Memory stores
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Persistent vector store backing the agent's long-term knowledge base;
# persist_directory keeps stored documents across process restarts.
vector_store = Chroma(
    collection_name="agent_knowledge",
    embedding_function=embeddings,
    persist_directory="./.chroma_db",
)
@tool
def search_knowledge_base(query: str) -> str:
    """Search the agent's knowledge base for information on a topic.
    Use this BEFORE calling external APIs or searching the web —
    the knowledge base may already have the answer cached.
    Returns relevant text passages with their sources.
    """
    hits = vector_store.similarity_search(query, k=4)
    if not hits:
        return "No relevant information found in the knowledge base for this query."
    # One "[source]: content" passage per hit, blank-line separated.
    passages = [
        f"[{hit.metadata.get('source', 'cached knowledge')}]: {hit.page_content}"
        for hit in hits
    ]
    return "\n\n".join(passages)
@tool
def store_in_knowledge_base(content: str, topic: str, source: str = "agent") -> str:
    """Store new information in the knowledge base for future retrieval.
    Use after fetching useful information from external sources to avoid
    redundant API calls in future sessions.
    Args:
        content: The information to store
        topic: A brief label describing what this is about
        source: Where the information came from
    """
    from langchain_core.documents import Document

    # Topic and source travel along as metadata so searches can cite them.
    vector_store.add_documents(
        [Document(page_content=content, metadata={"topic": topic, "source": source})]
    )
    return f"Stored in knowledge base under topic: '{topic}'"
@tool
def fetch_from_web(url: str) -> str:
    """Fetch content from a URL.
    Use only when the knowledge base doesn't have the information needed.
    After fetching, consider storing important information in the knowledge base.
    """
    import httpx

    try:
        resp = httpx.get(url, timeout=15.0, follow_redirects=True)
        resp.raise_for_status()
        # In production: use html2text or similar for HTML content
        return resp.text[:3000]  # Limit to first 3000 chars
    except Exception as e:
        # Best-effort tool: report the failure as text so the agent can react.
        return f"Failed to fetch {url}: {str(e)}"
# Tool belt handed to the agent: memory-first lookup, explicit storage, web fallback.
tools = [search_knowledge_base, store_in_knowledge_base, fetch_from_web]
Building the Memory-Integrated Agent
import json
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
# Session storage: one conversation history per session id, held in process memory.
session_histories: dict[str, InMemoryChatMessageHistory] = {}


def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    """Return the chat history for *session_id*, creating it on first use."""
    try:
        return session_histories[session_id]
    except KeyError:
        history = InMemoryChatMessageHistory()
        session_histories[session_id] = history
        return history
# Agent prompt with memory placeholders.
# The system message encodes the memory policy (knowledge base first, store
# after fetching, reuse conversation context); {tool_names} is filled in below
# via prompt.partial().
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a knowledgeable assistant with access to a knowledge base and the web.
Your memory strategy:
1. ALWAYS check the knowledge base first before fetching from external sources
2. After fetching useful information from the web, store it in the knowledge base
3. Reference past conversation context when relevant to avoid repeating questions
4. Be explicit when you're recalling stored information vs. generating from training
Available tools: {tool_names}"""),
    MessagesPlaceholder("chat_history"),  # injected by RunnableWithMessageHistory
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),  # intermediate tool calls and results
])
# Create base agent
agent = create_tool_calling_agent(
    llm=llm,
    tools=tools,
    prompt=prompt.partial(tool_names=", ".join(t.name for t in tools)),
)
# Wrap in executor
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=8,  # hard cap on reason/act loops per request
    handle_parsing_errors=True,  # tolerate malformed LLM tool-call output
)
# Add conversation history so multi-turn context is replayed into chat_history.
agent_with_history = RunnableWithMessageHistory(
    executor,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)
# Run a multi-turn conversation
session_id = "user_alice_session_1"
# Turn 1: nothing cached yet, so the agent is expected to fetch externally and
# then store what it learns (per the system prompt's memory strategy).
result1 = agent_with_history.invoke(
    {"input": "What are the best practices for Kafka consumer group configuration?"},
    config={"configurable": {"session_id": session_id}}
)
print(result1["output"])
# Second question — agent should use cached knowledge if it stored it
result2 = agent_with_history.invoke(
    {"input": "What about partition assignment strategies? Which is best for my use case?"},
    config={"configurable": {"session_id": session_id}}
)
print(result2["output"])
Memory Update Hooks
Track what the agent learns and updates across sessions:
from datetime import datetime, timezone
class AgentMemoryLogger:
    """Log all memory operations for observability and debugging.

    Maintains an append-only, in-process list of memory events (retrievals
    and storage operations) that can be summarized per session.
    """

    def __init__(self) -> None:
        # Append-only event log; each entry is a dict tagged with a "type".
        self.log: list[dict] = []

    @staticmethod
    def _now_iso() -> str:
        """Return a timezone-aware UTC timestamp in ISO-8601 form.

        datetime.utcnow() is deprecated since Python 3.12 and returns a
        naive datetime; datetime.now(timezone.utc) is the supported form.
        """
        return datetime.now(timezone.utc).isoformat()

    def record_retrieval(self, query: str, results: list, session_id: str) -> None:
        """Record a knowledge-base retrieval and how many results it returned."""
        self.log.append({
            "type": "retrieval",
            "timestamp": self._now_iso(),
            "session_id": session_id,
            "query": query,
            "result_count": len(results),
        })

    def record_storage(self, content_preview: str, topic: str, session_id: str) -> None:
        """Record a storage operation; only the first 100 chars of content are kept."""
        self.log.append({
            "type": "storage",
            "timestamp": self._now_iso(),
            "session_id": session_id,
            "topic": topic,
            "content_preview": content_preview[:100],
        })

    def get_session_summary(self, session_id: str) -> dict:
        """Summarize memory activity for one session: counts plus the raw operations."""
        session_ops = [op for op in self.log if op.get("session_id") == session_id]
        return {
            "total_retrievals": sum(1 for op in session_ops if op["type"] == "retrieval"),
            "total_storage_ops": sum(1 for op in session_ops if op["type"] == "storage"),
            "operations": session_ops,
        }
The combination of tools and memory turns a stateless LLM into a learning, adaptive agent — one that becomes more effective with each interaction as its knowledge base grows.