What Are AI Agents?

Core Capabilities of Modern Agents

Modern AI agents derive their power from a set of core capabilities that, when combined, let them tackle complex, multi-step tasks. Understanding these capabilities, along with their current limitations, is essential for designing agents that work reliably in practice.

1. Natural Language Understanding and Generation

The foundation of every LLM-based agent is its ability to understand ambiguous, open-ended instructions and produce coherent, contextually appropriate responses. This means:

  • Interpreting vague goals ("make this code better") and resolving them into actionable steps
  • Understanding tool output that may be noisy, partial, or in unexpected formats
  • Generating structured outputs (JSON, code, SQL) as well as prose explanations
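
Structured output is only useful if the calling code checks it before acting on it. The following is a minimal sketch of that check, assuming the model was asked to reply with a JSON object; the field names `action` and `arguments` and the helper `parse_structured_output` are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical schema for this sketch: field name -> expected Python type.
REQUIRED_FIELDS = {"action": str, "arguments": dict}


def parse_structured_output(raw: str) -> dict:
    """Parse model text as JSON and check the fields this sketch expects."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} is missing or not a {expected_type.__name__}")
    return data


good = parse_structured_output(
    '{"action": "search_web", "arguments": {"query": "llm agents"}}'
)
print(good["action"])  # search_web
```

Raising a `ValueError` with a readable message gives the surrounding loop something concrete to feed back to the model when it asks for a corrected reply.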

2. Tool Use (Function Calling)

Tools extend an agent beyond pure text generation. Common categories include:

Tool Category         | Examples
----------------------|-----------------------------------------------
Information retrieval | Web search, database query, document retrieval
Computation           | Code interpreter, calculator, data analysis
Communication         | Email, Slack, webhook calls
File operations       | Read/write files, create spreadsheets
Agent spawning        | Delegating subtasks to sub-agents

# Example: defining a tool for the agent to use
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information on any topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query to execute"
                    }
                },
                "required": ["query"]
            }
        }
    }
]
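
A tool definition only describes the tool; the application still has to run the call the model requests. The sketch below shows one way that dispatch step might look, assuming the model returns a tool name plus its arguments encoded as a JSON string. The `search_web` body, `TOOL_REGISTRY`, and `dispatch_tool_call` are hypothetical stand-ins, not part of any particular SDK.

```python
import json


def search_web(query: str) -> str:
    """Stand-in for a real search backend (hypothetical; returns canned text)."""
    return f"results for: {query}"


# Maps tool names, as declared in the tool schema, to Python callables.
TOOL_REGISTRY = {"search_web": search_web}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run one requested tool call and return its result as text for the model."""
    if name not in TOOL_REGISTRY:
        return f"error: unknown tool {name!r}"
    try:
        arguments = json.loads(arguments_json)
        return TOOL_REGISTRY[name](**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        # Bad JSON or wrong argument names: report back instead of crashing.
        return f"error: invalid arguments for {name!r}: {exc}"


print(dispatch_tool_call("search_web", '{"query": "agent frameworks"}'))
# results for: agent frameworks
```

Returning errors as text rather than raising lets the model see what went wrong and issue a corrected call on its next turn.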

3. Multi-Step Planning

Rather than solving a problem in one shot, agents can break complex tasks into steps, execute them in sequence (or in parallel), and adapt the plan based on intermediate results. This is what allows an agent to handle tasks like:

  • "Research competitor pricing, analyze the data, and draft a pricing strategy memo"
  • "Find all failing tests in this repository, diagnose the root causes, and fix them"
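
The plan-execute-adapt cycle can be sketched as a small loop. This is an illustrative outline only: in a real agent, `execute` and `replan` would be backed by model and tool calls, whereas here they are hypothetical hard-coded stand-ins so the control flow is easy to follow.

```python
from collections import deque


def run_plan(initial_steps, execute, replan):
    """Execute steps in order, letting `replan` revise the remaining steps after each result."""
    pending = deque(initial_steps)
    history = []  # (step, result) pairs, in execution order
    while pending:
        step = pending.popleft()
        result = execute(step)
        history.append((step, result))
        pending = deque(replan(list(pending), step, result))
    return history


# Hypothetical stand-ins for model-driven execution and replanning.
def execute(step):
    return "2 failing tests" if step == "run test suite" else "ok"


def replan(remaining, step, result):
    # If the test run reveals failures, add a diagnosis step before the rest.
    if step == "run test suite" and "failing" in result:
        return ["diagnose failures"] + remaining
    return remaining


history = run_plan(["run test suite", "write summary"], execute, replan)
print([step for step, _ in history])
# ['run test suite', 'diagnose failures', 'write summary']
```

The plan that finishes is not the plan that started: the diagnosis step exists only because an intermediate result called for it.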

4. Memory Management

Agents maintain different types of memory:

  • Working memory: The current conversation and task context (in the context window)
  • Episodic memory: Records of past interactions (stored externally, retrieved as needed)
  • Semantic memory: A knowledge base the agent can query (vector store or database)

Managing what goes into context and what gets stored externally is one of the key engineering challenges in agent design.
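
To make the three memory types concrete, here is a deliberately simplified sketch. It assumes a character count as a crude proxy for a token budget and uses keyword matching where a real system would likely use embeddings and a vector store; the `AgentMemory` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    max_working_chars: int = 200                   # crude stand-in for a token budget
    working: list = field(default_factory=list)    # messages kept in context
    episodic: list = field(default_factory=list)   # evicted messages, stored externally
    semantic: dict = field(default_factory=dict)   # topic -> fact (stand-in for a vector store)

    def add_message(self, message: str) -> None:
        """Append to working memory, evicting the oldest messages when over budget."""
        self.working.append(message)
        while (
            sum(len(m) for m in self.working) > self.max_working_chars
            and len(self.working) > 1
        ):
            self.episodic.append(self.working.pop(0))

    def recall(self, keyword: str) -> list:
        """Retrieve evicted messages and stored facts that mention the keyword."""
        keyword = keyword.lower()
        hits = [m for m in self.episodic if keyword in m.lower()]
        hits += [fact for topic, fact in self.semantic.items() if keyword in topic.lower()]
        return hits


memory = AgentMemory(max_working_chars=40)
memory.semantic["pricing"] = "Plans start at $10/month."
memory.add_message("User asked about pricing tiers.")
memory.add_message("Agent ran a web search for competitors.")
print(memory.working)            # only the most recent message still fits
print(memory.recall("pricing"))  # the evicted message plus the stored fact
```

The eviction rule here is oldest-first, which is the simplest possible policy; summarizing evicted messages before storing them is a common alternative.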

5. Self-Monitoring and Error Recovery

Capable agents don't just barrel forward when something goes wrong. They:

  • Detect when a tool call returns an error or unexpected result
  • Reason about why it went wrong
  • Try an alternative approach
  • Know when to escalate to a human rather than keep trying
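
The detect, retry, fall back, and escalate sequence can be sketched as follows. This is a simplified illustration: it records each error rather than reasoning about it, which a real agent would delegate to the model. `run_with_recovery`, `EscalateToHuman`, and both tool functions are hypothetical names.

```python
class EscalateToHuman(Exception):
    """Raised when the agent runs out of approaches and should hand off."""


def run_with_recovery(task, strategies, max_attempts_per_strategy=2):
    """Try each strategy in turn; escalate once every option is exhausted."""
    errors = []
    for strategy in strategies:
        for attempt in range(1, max_attempts_per_strategy + 1):
            try:
                return strategy(task)
            except Exception as exc:  # broad on purpose: any tool failure counts
                errors.append(f"{strategy.__name__} attempt {attempt}: {exc}")
    raise EscalateToHuman("; ".join(errors))


# Hypothetical tools: the primary one always fails, the fallback works.
def primary_api(task):
    raise TimeoutError("upstream timed out")


def cached_lookup(task):
    return f"cached answer for {task}"


print(run_with_recovery("exchange rate", [primary_api, cached_lookup]))
# cached answer for exchange rate
```

Bounding the attempts per strategy matters as much as the fallback itself: without it, a persistent failure turns into an endless retry loop instead of an escalation.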

Current Limitations

Being honest about limitations is as important as understanding capabilities:

  • Context window limits: Agents can lose track of early task context on very long tasks
  • Hallucination: LLMs can confidently generate incorrect information, especially about factual details
  • Tool reliability: Agents are only as reliable as their tools; a flaky API makes for a flaky agent
  • Cascading errors: Mistakes early in a multi-step task can compound into larger failures later

Understanding these limitations shapes how you design safeguards: human-in-the-loop checkpoints for high-stakes actions, output validation before acting on agent results, and conservative max-iteration limits.
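
The three safeguards named above can be combined in a single agent loop. The sketch below is one possible arrangement, not a prescribed design: `next_action` stands in for the model's decision, `approve` for a human reviewer, and the action names in `HIGH_STAKES_ACTIONS` are hypothetical.

```python
HIGH_STAKES_ACTIONS = {"send_email", "delete_file"}  # hypothetical action names


def run_agent(next_action, approve, max_iterations=5):
    """Drive an agent loop with an iteration cap, validation, and approval checks."""
    log = []
    for _ in range(max_iterations):
        action = next_action(log)
        # Output validation: reject anything that is not a well-formed action.
        if not isinstance(action, dict) or "name" not in action:
            log.append("rejected: malformed action")
            continue
        if action["name"] == "finish":
            log.append("finished")
            return log
        # Human-in-the-loop checkpoint for actions that are hard to undo.
        if action["name"] in HIGH_STAKES_ACTIONS and not approve(action):
            log.append(f"blocked: {action['name']}")
            continue
        log.append(f"executed: {action['name']}")
    log.append("stopped: iteration limit reached")
    return log


# Scripted stand-in for the model's decisions.
script = iter([{"name": "search_web"}, "garbage", {"name": "send_email"}, {"name": "finish"}])
log = run_agent(lambda history: next(script), approve=lambda action: False)
print(log)
# ['executed: search_web', 'rejected: malformed action', 'blocked: send_email', 'finished']
```

Each safeguard writes to the log rather than failing silently, so a reviewer can later see which actions were rejected, blocked, or cut off by the limit.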