Tool Error Handling and Validation
Tools fail: APIs go down, databases return unexpected results, users provide invalid inputs. How your agent handles these failures determines whether it degrades gracefully or crashes outright. This lesson covers an error handling strategy for robust tool-using agents: returning errors as strings, validating inputs, making retries safe, and centralizing the handling.
Why Tool Errors Are Special
When a regular Python function raises an exception, you handle it with try/except. When a tool in an agent raises an exception, the consequences cascade:
- The agent loop may crash entirely, losing all progress
- The exception traceback may be sent to the model as raw text, confusing it
- The agent may retry the same failing call repeatedly
Proper tool error handling returns errors as informative strings — not exceptions — so the model can reason about what went wrong and try an alternative approach.
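To make the difference concrete, here is a minimal sketch of an agent loop with no LLM involved. The names `lookup_raises`, `lookup_safe`, and `run_agent_steps` are hypothetical, and the loop is only a stand-in for whatever framework actually drives the agent.

```python
from typing import Callable


def lookup_raises(city: str) -> str:
    """Hypothetical tool that lets its failure escape as an exception."""
    raise ConnectionError(f"weather service unreachable for {city!r}")


def lookup_safe(city: str) -> str:
    """Same tool, but the failure comes back as an informative string."""
    try:
        return lookup_raises(city)
    except ConnectionError as exc:
        return f"ERROR: {exc}. Try a different data source."


def run_agent_steps(tool: Callable[[str], str], inputs: list[str]) -> list[str]:
    """Stand-in for an agent loop: call the tool once per step and
    collect the observations that would be shown to the model."""
    observations = []
    for value in inputs:
        observations.append(tool(value))
    return observations


# The safe tool keeps the loop alive: every step yields an observation.
observations = run_agent_steps(lookup_safe, ["Paris", "Oslo"])

# The raising tool aborts the loop on the first step and loses all progress.
try:
    run_agent_steps(lookup_raises, ["Paris", "Oslo"])
    loop_crashed = False
except ConnectionError:
    loop_crashed = True
```

With the string-returning version, the model sees two `ERROR:` observations it can react to; with the raising version, the loop never gets past the first step.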
The Error Return Pattern
from langchain_core.tools import tool
import httpx
import json

@tool
def call_external_api(endpoint: str, payload: dict) -> str:
    """Call an external REST API endpoint.

    Args:
        endpoint: Full URL of the API endpoint
        payload: JSON payload to send as POST body

    Returns:
        JSON response string on success, or error description on failure.
    """
    # Never raise exceptions: always return strings (success or error)

    # Input validation
    if not endpoint.startswith(("http://", "https://")):
        return "ERROR: endpoint must start with http:// or https://"
    try:
        # Approximate size check on the serialized payload (characters, not bytes)
        payload_size = len(json.dumps(payload))
    except (TypeError, ValueError) as e:
        return f"ERROR: payload is not JSON-serializable: {e}"
    if payload_size > 10_000:
        return "ERROR: payload too large (>10KB). Split into smaller requests."

    try:
        with httpx.Client(timeout=30.0) as client:
            response = client.post(endpoint, json=payload)

        # Treat every 2xx as success; a POST often answers 201 or 204
        if 200 <= response.status_code < 300:
            return response.text
        elif response.status_code == 400:
            return f"ERROR: Bad request. API returned: {response.text[:300]}"
        elif response.status_code == 401:
            return "ERROR: Authentication failed. API key may be invalid or expired."
        elif response.status_code == 404:
            return f"ERROR: Endpoint not found: {endpoint}"
        elif response.status_code == 429:
            retry_after = response.headers.get("Retry-After", "60")
            return f"ERROR: Rate limited. Try again after {retry_after} seconds."
        elif response.status_code >= 500:
            return f"ERROR: Server error ({response.status_code}). The API service may be down. Try again later."
        else:
            return f"ERROR: Unexpected status {response.status_code}: {response.text[:200]}"
    except httpx.TimeoutException:
        return "ERROR: Request timed out after 30 seconds. The endpoint may be slow or unreachable."
    except httpx.ConnectError:
        return f"ERROR: Could not connect to {endpoint}. Check if the URL is correct and the service is running."
    except Exception as e:
        # Last resort: never let unknown errors crash the agent
        return f"ERROR: Unexpected failure: {type(e).__name__}: {str(e)}"
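One way to keep the status-code mapping testable is to pull it into a pure function that needs no network access. The sketch below is an illustration under stated assumptions: `describe_status` is a hypothetical helper name, it treats any 2xx status as success, and its messages are shortened versions of the ones in the tool above.

```python
from typing import Optional


def describe_status(status_code: int, body: str = "", retry_after: Optional[str] = None) -> str:
    """Map an HTTP status code to the string a tool would return.

    Hypothetical helper: success returns the body unchanged, anything
    else returns a message prefixed with "ERROR:".
    """
    if 200 <= status_code < 300:
        return body
    if status_code == 400:
        return f"ERROR: Bad request. API returned: {body[:300]}"
    if status_code == 401:
        return "ERROR: Authentication failed. API key may be invalid or expired."
    if status_code == 404:
        return "ERROR: Endpoint not found."
    if status_code == 429:
        # Fall back to 60 seconds when the server sends no Retry-After header
        return f"ERROR: Rate limited. Try again after {retry_after or '60'} seconds."
    if status_code >= 500:
        return f"ERROR: Server error ({status_code}). Try again later."
    return f"ERROR: Unexpected status {status_code}: {body[:200]}"
```

Because the function takes plain values instead of a response object, each branch can be checked with a one-line assertion, and the tool itself shrinks to the request plus a single call to the helper.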
Validation Before Execution
Validate tool inputs before performing the actual operation, especially for destructive actions:
import re

from langchain_core.tools import tool
from pydantic import BaseModel, field_validator, ValidationError

class EmailSendInput(BaseModel):
    to_addresses: list[str]
    subject: str
    body: str

    @field_validator("to_addresses")
    @classmethod
    def validate_emails(cls, addresses: list[str]) -> list[str]:
        # Simple syntactic check; it does not prove the mailbox exists
        email_pattern = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
        invalid = [addr for addr in addresses if not email_pattern.fullmatch(addr)]
        if invalid:
            raise ValueError(f"Invalid email addresses: {invalid}")
        if len(addresses) > 50:
            raise ValueError("Cannot send to more than 50 recipients at once")
        return addresses

    @field_validator("subject")
    @classmethod
    def validate_subject(cls, subject: str) -> str:
        if not subject.strip():
            raise ValueError("Subject cannot be empty")
        if len(subject) > 200:
            raise ValueError(f"Subject too long ({len(subject)} chars). Maximum 200 characters.")
        return subject.strip()

@tool
def send_email(to_addresses: list[str], subject: str, body: str) -> str:
    """Send an email to one or more recipients.

    Args:
        to_addresses: List of recipient email addresses (max 50)
        subject: Email subject line (max 200 characters)
        body: Email body in plain text or HTML

    Returns:
        Confirmation message on success, error description on failure.
    """
    try:
        validated = EmailSendInput(
            to_addresses=to_addresses,
            subject=subject,
            body=body,
        )
    except ValidationError as e:
        # Return all validation errors as one clear string
        errors = "; ".join(err["msg"] for err in e.errors())
        return f"ERROR: Invalid input: {errors}"

    # Proceed with validated data
    try:
        # email_service.send(validated.to_addresses, validated.subject, validated.body)
        return f"SUCCESS: Email sent to {len(validated.to_addresses)} recipient(s). Subject: '{validated.subject}'"
    except Exception as e:
        return f"ERROR: Email sending failed: {str(e)}"
Idempotency and Safe Retries
Some tools should be idempotent: calling them several times with the same arguments should have the same effect as calling them once, with no additional side effects. This matters for agent reliability because agents may retry tool calls:
import hashlib
import json

from langchain_core.tools import tool

@tool
def create_database_record(table: str, data: dict) -> str:
    """Create a new record in the database.

    This operation is idempotent: passing the same data multiple times will not
    create duplicate records; it returns the existing record's ID if it already exists.

    Args:
        table: Target table name
        data: Record data as key-value pairs
    """
    try:
        # Generate a deterministic ID based on the content.
        # json.dumps can raise TypeError on non-serializable values, so it stays inside the try.
        content_hash = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()[:16]

        # Check if record with this hash already exists
        # (validate `table` against an allowlist before interpolating it into SQL)
        # existing = db.query(f"SELECT id FROM {table} WHERE content_hash = ?", [content_hash])
        # if existing:
        #     return f"Record already exists with ID: {existing[0]['id']} (idempotent)"

        # Create new record with content hash for deduplication
        # record_id = db.insert(table, {**data, "content_hash": content_hash})
        record_id = f"rec_{content_hash}"
        return json.dumps({"success": True, "record_id": record_id, "created": True})
    except Exception as e:
        return f"ERROR: Failed to create record: {str(e)}"
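Once a tool is idempotent, a bounded retry with backoff becomes safe to apply around it. The following is a minimal sketch, not a definitive implementation: `call_with_retries` and `TRANSIENT_MARKERS` are hypothetical names, and the assumption that transient errors can be recognized by substrings of the error message is specific to this example.

```python
import time
from typing import Callable

# Assumption for this sketch: these substrings mark an error as transient.
TRANSIENT_MARKERS = ("timed out", "Rate limited", "Server error")


def call_with_retries(
    call: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry a string-returning tool call while it reports a transient error.

    Only safe for idempotent tools, since the call may run several times.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result = ""
    for attempt in range(max_attempts):
        result = call()
        is_error = result.startswith("ERROR:")
        is_transient = any(marker in result for marker in TRANSIENT_MARKERS)
        if not (is_error and is_transient):
            return result  # success, or a permanent error reported as-is
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    return f"{result} (gave up after {max_attempts} attempts)"


# Usage: a fake tool that fails twice with transient errors, then succeeds.
responses = iter(["ERROR: Server error (503).", "ERROR: Request timed out.", '{"success": true}'])
delays: list[float] = []
outcome = call_with_retries(lambda: next(responses), sleep=delays.append)
```

The cap on attempts addresses the failure mode where an agent repeats the same failing call indefinitely, and passing `sleep` as a parameter lets tests run without real delays.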
Global Tool Error Handler in LangChain
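For consistent error handling across all tools in an agent, configure it in two places. `handle_parsing_errors` is an `AgentExecutor` option that covers malformed model output. `handle_tool_error` is set on each tool rather than on the executor, and it applies only to `ToolException`; other exception types still propagate, which is one more reason to catch them inside the tool and return a string:
import logging

from langchain.agents import AgentExecutor
from langchain_core.tools import BaseTool, ToolException

logger = logging.getLogger(__name__)

def format_tool_error(error: ToolException) -> str:
    """Turn a ToolException into an observation the agent can reason about."""
    # Log the error for monitoring
    logger.error("Tool raised ToolException: %s", error)
    return f"Tool execution failed: {type(error).__name__}: {error}"

def build_safe_executor(agent, tools: list[BaseTool]) -> AgentExecutor:
    """Create an AgentExecutor with centralized tool error handling."""
    for t in tools:
        # Feed ToolException back as an observation instead of crashing the loop
        t.handle_tool_error = format_tool_error
    return AgentExecutor(agent=agent, tools=tools, handle_parsing_errors=True)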
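A library-independent complement is a decorator applied to each tool function, which catches any exception type in one place. This is a minimal sketch, assuming tools are plain functions that return strings; `safe_tool` and `divide` are hypothetical names and not part of LangChain.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger("tools")


def safe_tool(func: Callable[..., str]) -> Callable[..., str]:
    """Wrap a tool function so any exception becomes an error string."""

    @functools.wraps(func)  # keep the name and docstring the model sees
    def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Log the full traceback for operators; give the model a short summary.
            logger.error("Tool %s raised an exception", func.__name__, exc_info=True)
            return f"ERROR: {func.__name__} failed: {type(exc).__name__}: {exc}"

    return wrapper


@safe_tool
def divide(a: float, b: float) -> str:
    """Divide a by b and return the result as text."""
    return str(a / b)


ok_result = divide(6, 3)
error_result = divide(1, 0)  # returns an "ERROR: ..." string instead of raising
```

The traceback goes to the log, where it is useful for monitoring, while the model receives only a one-line summary it can reason about. In principle the decorator could be stacked beneath `@tool`, since `functools.wraps` preserves the function metadata, though that combination is an untested assumption here.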
Robust error handling is what makes the difference between an agent that's impressive in demos and one that's reliable in production. Every tool interaction is an opportunity for failure — design for it explicitly.