Core Capabilities of Modern Agents

Modern AI agents derive their power from a set of core capabilities that, combined, let them tackle surprisingly complex tasks. Understanding these capabilities, and their current limitations, is essential for designing agents that work reliably in practice.

1. Natural Language Understanding and Generation

The foundation of every LLM-based agent is its ability to understand ambiguous, open-ended instructions and produce coherent, contextually appropriate responses. This means:

Interpreting vague goals ("make this code better") and resolving them into actionable steps
Understanding tool output that may be noisy, partial, or in unexpected formats
Generating structured outputs (JSON, code, SQL) as well as prose explanations
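Because downstream code acts on structured output, it is safer to parse and validate it before use than to trust it blindly. The sketch below assumes the model was asked to reply with a JSON object containing an `action` string and an `arguments` object; that contract, `parse_agent_reply`, and `InvalidAgentOutput` are hypothetical names for illustration, not part of any particular API.

```python
import json
from typing import Any


class InvalidAgentOutput(ValueError):
    """Raised when the model's reply does not match the requested JSON shape."""


def parse_agent_reply(raw: str) -> dict[str, Any]:
    """Parse and validate a reply shaped like {"action": str, "arguments": {...}}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidAgentOutput(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidAgentOutput("reply must be a JSON object")
    if not isinstance(data.get("action"), str):
        raise InvalidAgentOutput("'action' must be a string")
    # Missing arguments default to an empty object; anything else must be an object.
    data.setdefault("arguments", {})
    if not isinstance(data["arguments"], dict):
        raise InvalidAgentOutput("'arguments' must be a JSON object")
    return data


reply = '{"action": "search_web", "arguments": {"query": "LLM agents"}}'
print(parse_agent_reply(reply)["arguments"]["query"])  # LLM agents
```

When validation fails, one option is to send the error message back to the model and ask for a corrected reply rather than acting on a partial result.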

2. Tool Use (Function Calling)

Tools extend an agent beyond pure text generation. Common categories include:

Tool Category	Examples
Information retrieval	Web search, database query, document retrieval
Computation	Code interpreter, calculator, data analysis
Communication	Email, Slack, webhook calls
File operations	Read/write files, create spreadsheets
Agent spawning	Delegating subtasks to sub-agents

# Example: defining a tool in the JSON-schema format used by OpenAI-style function calling
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information on any topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query to execute"
                    }
                },
                "required": ["query"]
            }
        }
    }
]
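Declaring a tool is only half the loop: in OpenAI-style function calling the model does not run the tool itself. It returns a tool call carrying the function name and a JSON-encoded arguments string, and your code executes it and sends the result back as a `tool` role message. The sketch below represents that shape as plain dicts rather than SDK objects; `search_web` is a hypothetical stub, and `TOOL_REGISTRY` and `run_tool_call` are illustrative names.

```python
import json
from typing import Any, Callable


def search_web(query: str) -> str:
    # Hypothetical stub: a real implementation would call a search API.
    return f"(stub results for {query!r})"


# Maps tool names, as declared in the schema, to the Python callables that implement them.
TOOL_REGISTRY: dict[str, Callable[..., str]] = {"search_web": search_web}


def run_tool_call(tool_call: dict[str, Any]) -> dict[str, str]:
    """Execute one tool call and build the message that goes back to the model."""
    name = tool_call["function"]["name"]
    func = TOOL_REGISTRY.get(name)
    if func is None:
        content = f"Error: unknown tool {name!r}"
    else:
        try:
            # Arguments arrive as a JSON-encoded string, not a dict.
            args = json.loads(tool_call["function"]["arguments"])
            content = func(**args)
        except (json.JSONDecodeError, TypeError) as exc:
            content = f"Error: bad arguments for {name!r}: {exc}"
    # Errors are returned as content so the model can see them and adjust.
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": content}


call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "search_web", "arguments": '{"query": "agent benchmarks"}'},
}
print(run_tool_call(call)["content"])  # (stub results for 'agent benchmarks')
```

Returning errors as message content, instead of raising, keeps the loop running and gives the model a chance to correct a bad call.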

3. Multi-Step Planning

Rather than solving a problem in one shot, agents can break complex tasks into steps, execute them in sequence (or in parallel), and adapt the plan based on intermediate results. This is what allows an agent to handle tasks like:

"Research competitor pricing, analyze the data, and draft a pricing strategy memo"
"Find all failing tests in this repository, diagnose the root causes, and fix them"
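The plan, execute, adapt cycle can be sketched as a loop over a queue of steps that hands control back to a planner when a step fails. This is a minimal sketch under stated assumptions: `Step`, `execute_plan`, and the `replan` callback are hypothetical names, and in a real agent both the initial plan and the revised plan would come from model calls rather than hand-written functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    run: Callable[[dict], object]  # receives shared state, returns a result


def execute_plan(
    steps: list[Step],
    replan: Callable[[Step, Exception, dict], list[Step]],
    max_replans: int = 2,
) -> dict:
    """Run steps in order; on failure, ask `replan` for a revised remaining plan."""
    state: dict = {"results": []}
    queue = list(steps)
    replans = 0
    while queue:
        step = queue.pop(0)
        try:
            result = step.run(state)
        except Exception as exc:
            if replans >= max_replans:
                raise RuntimeError(f"giving up at step {step.description!r}") from exc
            replans += 1
            # Discard the rest of the old plan and continue with the revised one.
            queue = replan(step, exc, state)
            continue
        # Intermediate results stay in shared state so later steps can use them.
        state["results"].append((step.description, result))
    return state


# Usage with stand-in steps (a real agent would generate these with a model).
def fetch_live(state: dict) -> str:
    raise TimeoutError("pricing API timed out")


def fetch_cached(state: dict) -> str:
    return "cached pricing table"


def summarize(state: dict) -> str:
    return f"memo based on {len(state['results'])} prior result(s)"


def replan(failed: Step, exc: Exception, state: dict) -> list[Step]:
    return [Step("fetch pricing (cached)", fetch_cached), Step("summarize", summarize)]


final = execute_plan([Step("fetch pricing", fetch_live), Step("summarize", summarize)], replan)
print([desc for desc, _ in final["results"]])  # ['fetch pricing (cached)', 'summarize']
```

Capping the number of replans keeps a persistently failing task from looping forever.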

4. Memory Management

Agents maintain different types of memory:

Working memory: The current conversation and task context (in the context window)
Episodic memory: Records of past interactions (stored externally, retrieved as needed)
Semantic memory: A knowledge base the agent can query (vector store or database)

Managing what goes into context and what gets stored externally is one of the key engineering challenges in agent design.
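One simple way to split working memory from external memory is to cap the working window and archive whatever falls out of it, then retrieve relevant archived items when building the next prompt. The sketch below is a toy under loud assumptions: `AgentMemory` is a hypothetical class, the budget counts messages rather than tokens, and word-overlap scoring stands in for the embedding similarity a real vector store would use.

```python
from collections import deque


class AgentMemory:
    """Bounded working memory plus a naive external archive (hypothetical design)."""

    def __init__(self, max_working: int = 4):
        self.max_working = max_working
        self.working: deque[str] = deque()  # stands in for the context window
        self.archive: list[str] = []        # stands in for an external episodic store

    def add(self, message: str) -> None:
        self.working.append(message)
        # Evict the oldest messages to the archive once the budget is exceeded.
        while len(self.working) > self.max_working:
            self.archive.append(self.working.popleft())

    def recall(self, query: str, k: int = 1) -> list[str]:
        # Word overlap as a crude stand-in for vector similarity search.
        q = set(query.lower().split())
        scored = [(len(q & set(m.lower().split())), m) for m in self.archive]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [m for _, m in scored[:k]]

    def build_context(self, query: str, k: int = 1) -> list[str]:
        # Retrieved archive items first, then the recent working window.
        return self.recall(query, k) + list(self.working)


mem = AgentMemory(max_working=2)
for msg in ["user wants a pricing memo", "competitor A charges $10",
            "competitor B charges $12", "draft outline ready"]:
    mem.add(msg)
print(mem.build_context("what does competitor A charge"))
# ['competitor A charges $10', 'competitor B charges $12', 'draft outline ready']
```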

5. Self-Monitoring and Error Recovery

Capable agents don't just barrel forward when something goes wrong. They:

Detect when a tool call returns an error or unexpected result
Reason about why it went wrong
Try an alternative approach
Know when to escalate to a human rather than keep trying
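The recovery behaviour in this list can be approximated in ordinary control flow: retry an approach a bounded number of times, fall back to an alternative, and escalate with the collected error trail when everything fails. In an LLM agent the "reason about why" step would typically mean feeding that trail back to the model; here it is only recorded. `run_with_recovery` and `NeedsHuman` are hypothetical names, and the sketch omits the backoff delay a production retry loop would usually add.

```python
from typing import Callable, Sequence


class NeedsHuman(Exception):
    """Raised when the agent should stop and hand off to a person."""


def run_with_recovery(
    approaches: Sequence[tuple[str, Callable[[], str]]],
    retries_per_approach: int = 2,
) -> str:
    errors: list[str] = []
    for name, approach in approaches:
        for attempt in range(1, retries_per_approach + 1):
            try:
                result = approach()
            except Exception as exc:
                # Detect: record what went wrong so it can be reasoned about later.
                errors.append(f"{name} attempt {attempt}: {exc}")
                continue
            # An empty result counts as "unexpected", not just raised exceptions.
            if not result:
                errors.append(f"{name} attempt {attempt}: empty result")
                continue
            return result
    # Every approach failed: escalate instead of retrying indefinitely.
    raise NeedsHuman("; ".join(errors))


def live_search() -> str:
    raise TimeoutError("search API timed out")


def cached_index() -> str:
    return "answer from cached index"


print(run_with_recovery([("live search", live_search), ("cached index", cached_index)]))
# answer from cached index
```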

Current Limitations

Being honest about limitations is as important as understanding capabilities:

Context window limits: On very long tasks, early instructions and intermediate results may be truncated or summarized away as the context window fills, so agents can lose track of them
Hallucination: LLMs can confidently generate incorrect information, especially about factual details
Tool reliability: Agents are only as reliable as their tools; a flaky API makes for a flaky agent
Cascading errors: Mistakes early in a multi-step task can compound into larger failures later

Understanding these limitations shapes how you design safeguards: human-in-the-loop checkpoints for high-stakes actions, output validation before acting on agent results, and conservative max-iteration limits.
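The three safeguards named above can be combined in a single control loop. This is a minimal sketch, assuming a hypothetical `next_action` callback that stands in for the model and an `approve` callback that stands in for a human reviewer; `run_agent`, `HIGH_STAKES`, and the action names are illustrative only, and executed actions are merely logged rather than dispatched to real tools.

```python
from typing import Callable

HIGH_STAKES = {"send_email", "delete_file"}  # hypothetical high-stakes action names


def run_agent(
    next_action: Callable[[list], object],
    approve: Callable[[dict], bool],
    max_iterations: int = 10,
) -> list:
    history: list = []
    # Conservative iteration cap: the loop cannot run forever.
    for _ in range(max_iterations):
        action = next_action(history)
        # Output validation: reject malformed actions before acting on them.
        if not isinstance(action, dict) or "name" not in action:
            history.append({"error": "malformed action", "raw": action})
            continue
        if action["name"] == "finish":
            return history
        # Human-in-the-loop checkpoint for high-stakes actions.
        if action["name"] in HIGH_STAKES and not approve(action):
            history.append({"skipped": action["name"], "reason": "not approved"})
            continue
        # A real agent would dispatch the tool here; this sketch only logs it.
        history.append({"executed": action["name"]})
    raise RuntimeError(f"stopped after {max_iterations} iterations without finishing")


script = iter([{"name": "search_web"}, "garbage", {"name": "send_email"}, {"name": "finish"}])
log = run_agent(lambda history: next(script), approve=lambda action: False)
print(log)
```

Malformed output and declined actions are written into the history rather than silently dropped, so the model can see them on its next turn.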