# Integrating the Anthropic API

Anthropic's Claude models (Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku) are known for strong reasoning, long context windows (200K tokens), and safety training based on Constitutional AI. The Anthropic API has a clean design with a few important differences from OpenAI's that are worth understanding.
## Installation and Setup

```bash
pip install anthropic python-dotenv
```

```python
# Client setup
from dotenv import load_dotenv
import anthropic

load_dotenv()

# Reads ANTHROPIC_API_KEY from the environment automatically
client = anthropic.Anthropic()
```
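If you prefer to fail fast with a clear message when the key is missing, a small guard can run before constructing the client. A minimal sketch (`require_api_key` is a hypothetical helper of my own, not part of the SDK):

```python
import os

def require_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear error."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; add it to your environment or .env file"
        )
    return key
```

The SDK also accepts the key explicitly, so you can combine the two: `client = anthropic.Anthropic(api_key=require_api_key())`.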
## Key Difference: System Prompt Is a Top-Level Parameter

In OpenAI's API, the system message is just another item in the messages array with `role: "system"`. In Anthropic's API, the system prompt is a separate top-level parameter, not part of the `messages` array:

```python
# OpenAI style (system as a message)
openai_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Anthropic style (system as a separate parameter)
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful assistant.",  # <-- top-level parameter
    messages=[
        {"role": "user", "content": "Hello!"}  # No system role here
    ],
)

print(response.content[0].text)
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
```
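When porting OpenAI-style code, it helps to split the OpenAI message list into the two pieces Anthropic expects. A minimal sketch (the function name `split_system` is my own, not part of either SDK):

```python
def split_system(openai_messages):
    """Split an OpenAI-style message list into (system, messages) for Anthropic.

    Concatenates any system messages into one system string and returns the
    remaining messages unchanged.
    """
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]
    rest = [m for m in openai_messages if m["role"] != "system"]
    return ("\n\n".join(system_parts) or None, rest)
```

Usage: `system, messages = split_system(openai_messages)`, then pass both to `client.messages.create(..., system=system, messages=messages)` (omitting `system` when it is `None`).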
## Response Structure

Anthropic responses differ structurally from OpenAI:

```python
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=[{"role": "user", "content": "What is a neural network?"}],
)

# Anthropic response structure
print(response.id)               # msg_01...
print(response.model)            # claude-3-5-sonnet-20241022
print(response.stop_reason)      # "end_turn" | "max_tokens" | "stop_sequence"
print(response.content[0].type)  # "text"
print(response.content[0].text)  # The actual response text
print(response.usage.input_tokens)
print(response.usage.output_tokens)
```
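Because `content` is a list of typed blocks rather than a single string, a small accessor that concatenates only the text blocks is a common convenience. A sketch (`response_text` is a hypothetical helper; it also accepts plain dicts so it can be unit-tested without a live response object):

```python
def response_text(blocks):
    """Concatenate the text from all text-type content blocks."""
    parts = []
    for block in blocks:
        if isinstance(block, dict):  # e.g. test fixtures or raw JSON
            if block.get("type") == "text":
                parts.append(block["text"])
        elif getattr(block, "type", None) == "text":  # SDK objects
            parts.append(block.text)
    return "".join(parts)
```

In application code you would call it as `response_text(response.content)`.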
## Streaming

```python
# Synchronous streaming
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain the concept of entropy in information theory."}],
) as stream:
    for text_chunk in stream.text_stream:
        print(text_chunk, end="", flush=True)

    # Get the final message after streaming completes
    final_message = stream.get_final_message()

print(f"\nTotal input tokens: {final_message.usage.input_tokens}")
```
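Streaming handlers like the loop above are easier to unit-test if the chunk consumer is separated from the network call; it can then be fed either `stream.text_stream` or a plain list of strings. A sketch (`consume_stream` is my own name, not an SDK function):

```python
def consume_stream(chunks, sink=print):
    """Forward each text chunk to sink as it arrives and return the full text."""
    collected = []
    for chunk in chunks:
        sink(chunk)
        collected.append(chunk)
    return "".join(collected)
```

Used with the stream above: `full_text = consume_stream(stream.text_stream, sink=lambda c: print(c, end="", flush=True))`.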
## Extended Thinking (Claude 3.7+)

Claude 3.7 Sonnet supports extended thinking: the model shows its reasoning process before answering.

```python
# Extended thinking for complex reasoning tasks
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # Max tokens for thinking
    },
    messages=[{
        "role": "user",
        "content": "Prove that there are infinitely many prime numbers.",
    }],
)

# Response has two content blocks: thinking + text
for block in response.content:
    if block.type == "thinking":
        print("=== Claude's Reasoning ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== Final Answer ===")
        print(block.text)
```
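Note that the thinking budget counts toward `max_tokens`, so `budget_tokens` must be smaller than `max_tokens`. A tiny helper (my own sketch, reflecting my reading of the documented constraints, including the 1024-token minimum budget) that sizes `max_tokens` from the budget plus the room you want for the visible answer:

```python
def thinking_params(budget_tokens: int, answer_tokens: int) -> dict:
    """Build thinking-related request parameters.

    budget_tokens: tokens reserved for the thinking block
    answer_tokens: tokens left over for the visible answer
    """
    if budget_tokens < 1024:
        raise ValueError("thinking budgets below 1024 tokens are not accepted")
    return {
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }
```

The example above corresponds to `client.messages.create(model=..., messages=..., **thinking_params(10000, 6000))`.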
## Prompt Caching (Cost Reduction)

Anthropic's prompt caching dramatically reduces costs when reusing large system prompts:

```python
# Cache a large system prompt (minimum 1024 tokens to be cacheable)
large_system_prompt = """You are an expert legal analyst specializing in contract law...
[...thousands of tokens of legal context...]"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": large_system_prompt,
            "cache_control": {"type": "ephemeral"},  # Cache this block
        }
    ],
    messages=[{"role": "user", "content": "Review this contract clause..."}],
)

# First call: full input token cost
# Subsequent calls with the same system prompt: cache reads at ~10% cost
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
```
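To see the savings concretely, you can estimate input cost from the usage counters. A sketch under assumed pricing multipliers (cache writes billed at 1.25x the base input rate and cache reads at 0.10x; `input_cost` is my own helper, and you should check current pricing before relying on these numbers):

```python
def input_cost(base_rate_per_token: float, usage: dict) -> float:
    """Estimate input-side cost for one call from usage token counts.

    usage: dict with cache_creation_input_tokens, cache_read_input_tokens,
    and input_tokens (the uncached remainder). Missing keys count as zero.
    Assumes cache writes bill at 1.25x base and cache reads at 0.10x base.
    """
    return (
        usage.get("cache_creation_input_tokens", 0) * base_rate_per_token * 1.25
        + usage.get("cache_read_input_tokens", 0) * base_rate_per_token * 0.10
        + usage.get("input_tokens", 0) * base_rate_per_token
    )
```

Comparing a first (cache-writing) call against later (cache-reading) calls with the same large system prompt shows why caching pays off once the prompt is reused even a few times.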
## Model Selection Guide

| Model | Best For | Speed | Cost |
|---|---|---|---|
| `claude-3-5-haiku-20241022` | High-volume, latency-sensitive | Fastest | Cheapest |
| `claude-3-5-sonnet-20241022` | Balance of capability and cost | Fast | Medium |
| `claude-3-opus-20240229` | Maximum capability (complex tasks) | Slower | Most expensive |
| `claude-3-7-sonnet-20250219` | Extended thinking, hard reasoning | Variable | Medium-High |

For most agent use cases, `claude-3-5-sonnet-20241022` offers the best capability-to-cost ratio. Reserve Opus for tasks requiring the highest reasoning quality where cost is not the primary constraint.
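One way to keep this guidance in code rather than scattered string literals is a small lookup keyed by task profile. A sketch (the profile names and the default choice are my own; the model IDs are the ones from the table above):

```python
MODEL_GUIDE = {
    "high_volume": "claude-3-5-haiku-20241022",      # latency-sensitive, cheap
    "general": "claude-3-5-sonnet-20241022",         # capability/cost balance
    "max_capability": "claude-3-opus-20240229",      # hardest tasks, highest cost
    "hard_reasoning": "claude-3-7-sonnet-20250219",  # extended thinking
}

def pick_model(task_profile: str) -> str:
    """Map a task profile to a model ID, defaulting to the balanced choice."""
    return MODEL_GUIDE.get(task_profile, MODEL_GUIDE["general"])
```

Centralizing the mapping also means a model upgrade is a one-line change instead of a repo-wide search.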