M06: Multi-Tool Orchestration
In M05 you gave Claude one tool at a time. Now you'll orchestrate multiple tools together — running them in parallel, chaining their outputs, and handling failures gracefully. This is where agents become truly powerful.
Learning Objectives
- Explain the difference between parallel and sequential tool calls, and when to use each
- Implement the agentic loop that processes multiple tool_use blocks and chains results
- Engineer effective tool descriptions that help Claude select the right tool
- Build a ToolRegistry that dynamically adds and removes tools based on context
- Handle partial failures, retries, and circuit breakers in multi-tool workflows
Parallel Tool Calls — When and Why
BEFORE: Imagine a kitchen where only one sous chef handles all prep work — chopping onions, then dicing tomatoes, then mincing garlic, one task after another while the head chef waits.
PAIN: Dinner service grinds to a halt because three 10-minute tasks take 30 minutes total, and every dish that depends on those ingredients sits idle the entire time.
MAPPING: Parallel tool calls solve this exactly the way a smart head chef would: assign each task to a different sous chef so all three run simultaneously. A parallel tool call is when Claude requests multiple tool calls in a single response because the calls are independent; the client executes all tools concurrently and returns all results in one message, reducing round trips and wall-clock time. The total time drops to the slowest single task (10 minutes), not the sum of all three (30 minutes). In agent terms, Claude emits multiple tool_use blocks in one response, your code runs them concurrently, and the wall-clock time roughly equals that of the slowest tool, not the total.
Claude returns multiple tool_use content blocks in a single assistant message when the calls are independent. A tool_use block is a content block in Claude's response with type "tool_use". It contains the tool's name, a unique ID, and an input object with arguments. Your code reads this block, executes the function, and returns the result. When multiple tool_use blocks appear in one response, the calls are independent and can run in parallel, so your client code executes all of those tools concurrently. Remember from M05: your code runs the tools, not Claude. Once all tools finish, you bundle all the tool_result blocks into a single user message and send them back to Claude in the next turn. A tool_result carries the output of one tool execution and must reference the tool_use_id from Claude's request. The payoff? One API round trip handles N tools, instead of N separate round trips.
So what does parallel execution actually look like in an API response? When Claude decides it needs multiple independent tools, it returns multiple tool_use blocks in a single response. Here's an example:
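As an illustrative sketch only (the IDs, queries, and text are made-up placeholders rather than real API output), the content of such a response could look like the following when written as Python data, together with the single user message that carries every result back:

```python
# Hypothetical content array of one assistant response that requests
# three independent searches. IDs and inputs are placeholders.
assistant_content = [
    {"type": "text", "text": "I'll search three sources at once."},
    {"type": "tool_use", "id": "toolu_01", "name": "web_search",
     "input": {"query": "AI agents news"}},
    {"type": "tool_use", "id": "toolu_02", "name": "web_search",
     "input": {"query": "AI agents research papers"}},
    {"type": "tool_use", "id": "toolu_03", "name": "web_search",
     "input": {"query": "AI agents industry adoption"}},
]

# Your code picks out the tool_use blocks...
tool_uses = [b for b in assistant_content if b["type"] == "tool_use"]

# ...and answers with ONE user message holding one tool_result per block.
# The content strings stand in for real tool output.
results_message = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": b["id"],  # must match the id Claude sent
            "content": f"(results for {b['input']['query']})",
        }
        for b in tool_uses
    ],
}
```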
Three tool_use blocks, each with its own id. Your code sees all three, fires them off concurrently (using ThreadPoolExecutor in Python or Promise.all in Node.js), and returns all three tool_result blocks in a single user message. One round trip, three tools executed.
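To make the wall-clock claim concrete, here is a minimal timing sketch. The `slow_tool` function and its sleep durations are hypothetical stand-ins for real network calls; the point is only that concurrent execution takes about as long as the slowest call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_tool(name: str, seconds: float) -> str:
    """Hypothetical tool: sleeps to simulate network latency."""
    time.sleep(seconds)
    return f"{name} done"

# Three independent calls taking 0.1s, 0.2s, and 0.3s (0.6s if run in sequence)
calls = [("search_a", 0.1), ("search_b", 0.2), ("search_c", 0.3)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(calls)) as pool:
    # pool.map preserves input order, so results line up with calls
    results = list(pool.map(lambda c: slow_tool(*c), calls))
elapsed = time.perf_counter() - start

# elapsed should land near the slowest call (about 0.3s), not the 0.6s sum
print(results, round(elapsed, 2))
```

Threads suit this case because the tools spend their time waiting on I/O; CPU-bound tools would need a different approach.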
Sequential Tool Chains — Output Feeds Input
BEFORE: Imagine trying to build a car by dumping all the raw materials — steel, rubber, glass — into a room and hoping a finished vehicle appears. Without a defined sequence, nothing fits together.
PAIN: You can't install a windshield before the frame is welded, and you can't paint the body before it's assembled. Doing steps out of order wastes materials and produces a broken result.
MAPPING: A sequential tool chain works like an assembly line — Station A (search) produces URLs, Station B (fetch_page) takes those URLs and produces page text, Station C (summarize) takes that text and produces bullet points. Each stage transforms the data and passes it forward. Skipping a station or running them out of order means the next stage gets the wrong input and the whole pipeline breaks.
In a sequential chain, your code returns a tool_result, and Claude processes it and responds with another tool_use. This back-and-forth is managed by the agentic loop, which keeps running until stop_reason becomes "end_turn", meaning Claude has finished its work and is ready to present a final answer. The agentic loop is the while loop in your code that repeatedly sends tool results back to Claude until stop_reason is "end_turn" instead of "tool_use"; it is the same pattern you built in M05, now running for multiple iterations. stop_reason is a field in Claude's API response indicating why generation stopped: "end_turn" means Claude finished normally, while "tool_use" means Claude wants to call a tool and is waiting for your code to execute it and return the result.
Here's what a sequential chain looks like in practice. Notice how each step's output becomes the next step's input — data transforms at each stage:
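The sketch below walks through the three stations with mock functions. The function bodies, the example URL, and the page text are all hypothetical; in a real agent, Claude decides each next step after reading the previous tool_result.

```python
# Mock stand-ins for the three stations; real tools would call actual services.
def search(query: str) -> list[str]:
    return [f"https://example.com/{query.replace(' ', '-')}"]

def fetch_page(url: str) -> str:
    return f"Page text from {url}. It has several sentences. And a conclusion."

def summarize(text: str) -> list[str]:
    # Naive "summary": one bullet per sentence
    return [s.strip(" .") for s in text.split(". ") if s.strip(" .")]

# Round trip 1: Claude requests search; your code runs it and returns URLs.
urls = search("ai agents")

# Round trip 2: Claude reads the URLs and requests fetch_page on one of them.
text = fetch_page(urls[0])

# Round trip 3: Claude reads the page text and requests summarize.
bullets = summarize(text)

print(bullets)
```

Each assignment depends on the one before it, which is why these calls cannot be parallelized.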
Each round trip is a full API call. Claude sees the previous result, decides what to do next, and issues another tool_use. This continues until the chain is complete and Claude returns its final text response.
Tool Selection — How Claude Picks the Right Tool
BEFORE: Imagine walking into a hardware store where every tool is in an unmarked cardboard box — no labels, no descriptions, just numbered bins. You need to drive a small screw into a circuit board, but you have no idea which bin has the right screwdriver.
PAIN: You end up grabbing random boxes, trying tools that don't fit, stripping the screw, and wasting an hour on a two-minute job. Worse, you might use a power drill and destroy the delicate board entirely.
MAPPING: This is exactly what happens when Claude gets tool definitions with vague descriptions like "searches stuff". Claude reads the name and description of each tool to decide which one fits the user's request. A label like "Phillips-head screwdriver, size #2, for small electronics" maps directly to a good tool description: "Search the web using a query string. Returns top 5 results. Use for current events or factual questions." The clearer the label, the more accurate Claude's selection.
fetch_page next than send_email.
Tool Description Engineering
Bad: "name": "search", "description": "searches stuff". Too vague: Claude doesn't know what it searches, when to use it, or what it returns.
Good: "name": "web_search", "description": "Search the web using a query string. Returns top 5 results with title, URL, and snippet. Use for current events, factual questions, or when the user asks to look something up online."
As a rule of thumb, keep 4–5 tools per agent. Tool selection accuracy tends to degrade once an agent has more than 5 or 6 tools. Anti-pattern: one agent with 18+ tools. Instead, distribute tools across specialized subagents.
"More tools = more capable agent" — This is the most counterintuitive misconception in multi-tool orchestration. In practice, tool selection accuracy degrades noticeably once you pass 5–6 tools. Each additional tool adds more descriptions for Claude to evaluate, more chances for ambiguity between similar tools, and more input tokens per request. An agent with 4 focused tools will outperform an agent with 18 scattered ones almost every time.
"Claude automatically parallelizes independent tools" — Claude CAN return multiple tool_use blocks in one response, and it often does when the calls are clearly independent. But you still need to write the parallel execution logic in YOUR code (ThreadPoolExecutor, Promise.all). If your code processes tool_use blocks sequentially even when Claude sends multiple, you lose the speedup. Parallelism requires effort on both sides.
"Sequential is always worse than parallel" — Not at all. When Tool B needs the output of Tool A (e.g., fetch a page from a URL that search returned), they MUST run sequentially. Forcing parallel execution on dependent tools produces incorrect results — Tool B would run with no input. The right approach is to parallelize independent tools and chain dependent ones.
"Dynamic registration is premature optimization" — For a 3-tool agent, yes. For a production agent with 15–20 tools, it's essential. Each tool definition consumes 200–500 input tokens. Sending 20 tools with every request means 4,000–10,000 extra tokens per call — that's real cost at scale. And the accuracy benefit of fewer tools is arguably more important than the token savings.
"Claude always picks the right tool" — Claude is remarkably good at tool selection, but it's not infallible. Ambiguous descriptions, overlapping tool capabilities, and misleading parameter names all cause misselection. This is why tool description engineering matters — it's the single highest-leverage thing you can do to improve agent reliability.
Dynamic Tool Registration
BEFORE: Imagine a surgeon walking into the operating room and finding every instrument the hospital owns laid out on the tray — orthopedic saws, dental drills, eye surgery lasers, and the cardiac tools they actually need. Hundreds of instruments, all within reach.
PAIN: The surgeon wastes time scanning past irrelevant tools, risks grabbing the wrong instrument under pressure, and the tray is so cluttered that the correct scalpel is buried under equipment meant for a completely different specialty.
MAPPING: Dynamic tool registration is like a surgical nurse who curates the tray — only cardiac instruments are laid out for a heart surgery. In agent terms, instead of sending all 20 tools with every API call, you filter the tools array based on the current task context. Fewer tools means Claude scans less, picks more accurately, and you burn fewer input tokens on irrelevant definitions.
The tools array you pass gets serialized into input tokens: Claude reads every tool's name, description, and parameter schema before deciding which to use. This is called token overhead. Because the definitions are sent with every API call, sending 20 tools when only 3 are relevant wastes tokens on every request. Dynamic tool registration means you build a different tools array for each request based on what the user actually needs. Why bother? Three reasons: (1) Cost: fewer tools means fewer input tokens burned on every call. (2) Accuracy: Claude picks better when it has 4 focused options instead of 20 scattered ones. (3) Security via least privilege, the principle of giving each request access only to the tools it actually needs: a regular user's research query should never see an admin-only tool like delete_user, even if it exists in your system.
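A minimal sketch of the least-privilege and cost points, assuming a hypothetical `TOOL_ROLES` mapping and placeholder tool definitions. The 200–500 tokens-per-tool range is the rough figure used in this module, not a measured value.

```python
# Hypothetical catalogue: tool name -> roles allowed to see it
TOOL_ROLES = {
    "web_search": {"user", "admin"},
    "fetch_page": {"user", "admin"},
    "summarize_text": {"user", "admin"},
    "delete_user": {"admin"},
}

def tools_for_role(all_tools: list[dict], role: str) -> list[dict]:
    """Return only the tool definitions this role is allowed to see."""
    return [t for t in all_tools if role in TOOL_ROLES.get(t["name"], set())]

def overhead_range(n_tools: int, low: int = 200, high: int = 500) -> tuple[int, int]:
    """Rough input-token overhead for n_tools, using an assumed per-tool range."""
    return n_tools * low, n_tools * high

# Placeholder definitions; real ones would have full descriptions and schemas
all_tools = [
    {"name": name, "description": "...", "input_schema": {"type": "object"}}
    for name in TOOL_ROLES
]

user_tools = tools_for_role(all_tools, "user")  # delete_user is filtered out
print([t["name"] for t in user_tools])
print(overhead_range(20), overhead_range(len(user_tools)))
```

Filtering before the API call means the restricted tool never reaches the model at all, which is stronger than asking the model not to use it.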
Handling Errors in Multi-Tool Workflows
BEFORE: Imagine a relay race where the team has no backup plan — four runners, one baton, and if anyone trips, the entire team is disqualified. No substitutes, no recovery protocol.
PAIN: In the real race, the second runner twists an ankle at the handoff. Without a plan, the baton hits the ground, the team freezes, and they forfeit a race they were winning. One failure cascades into total failure.
MAPPING: Multi-tool workflows face the same risk — if fetch_page returns a 404, does your entire agent crash? Error handling gives you the backup plan: return is_error: true so Claude can reason about alternatives (like switching to web_search), implement retries with exponential backoff for transient failures, and add circuit breakers that disable a tool after repeated failures so the agent doesn't waste time on a broken endpoint.
When a tool fails, return a tool_result with is_error: true and a descriptive message. is_error is a boolean flag in the tool_result that tells Claude the tool execution failed, so it does not treat error text as a successful result. This tells Claude "this tool failed, here's why", and Claude can then decide to try an alternative tool, ask the user for help, or work with partial results. For tools that fail repeatedly (e.g., an API endpoint that's down), use a circuit breaker: a counter that tracks consecutive failures and temporarily disables the tool after N in a row, so the agent stops wasting tokens and time retrying a broken endpoint.
That relay race analogy maps directly to real API messages. When a tool fails, you don't just drop the baton — you hand Claude a structured report explaining what went wrong, so it can pick a new runner. Here's exactly what that looks like in practice — the JSON you'd send back to Claude when a tool fails versus when it succeeds with no results:
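As a sketch, here are the two tool_result payloads written as Python dicts. The tool_use_id values and the error text are hypothetical placeholders.

```python
import json

# Case 1: the tool crashed. is_error tells Claude the call itself failed.
failed_result = {
    "type": "tool_result",
    "tool_use_id": "toolu_01",  # placeholder; copy the id from Claude's tool_use block
    "content": json.dumps({"error": "404 Not Found", "tool": "fetch_page"}),
    "is_error": True,
}

# Case 2: the tool ran successfully and found nothing. No is_error flag.
empty_result = {
    "type": "tool_result",
    "tool_use_id": "toolu_02",  # placeholder
    "content": json.dumps({"results": []}),
}
```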
Claude makes very different decisions based on which one you return. The first says "the endpoint is broken, try something else." The second says "nothing matches your query." Confusing the two — returning an empty result when the tool actually crashed — leads Claude to conclude there's genuinely no data, which is a silent, hard-to-debug failure.
Error Handling Strategies
- Per-tool try/catch: Wrap each tool in error handling; return descriptive messages as tool_result
- Let Claude adapt: Return is_error: true; Claude can often find alternatives on its own
- Tool-level retries: Exponential backoff for transient failures (timeouts, rate limits)
- Circuit breakers: After N consecutive failures, disable the tool and notify the user
- Graceful degradation: Return partial results; some data is better than no data
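The tool-level retry strategy from the list above can be sketched as follows. This is a minimal illustration, not a production helper: the name `call_with_retries`, the choice of which exceptions count as transient, and the injectable `sleep` parameter are all assumptions made for the example. The defaults give up to 3 retries with waits of 1s, 2s, and 4s.

```python
import time

def call_with_retries(func, *args, max_retries: int = 3, base_delay: float = 1.0,
                      transient=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call func(*args), retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func(*args)
        except transient:
            if attempt == max_retries:
                raise  # out of retries: let the caller report is_error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s with the defaults

# Usage: a hypothetical tool that times out twice, then succeeds
attempts = {"n": 0}

def flaky_fetch(url: str) -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("timed out")
    return f"content of {url}"

waits = []  # record the delays instead of actually sleeping
page = call_with_retries(flaky_fetch, "https://example.com/1", sleep=waits.append)
```

Exceptions outside the `transient` tuple propagate immediately, so a permanent failure such as a bad argument is not retried.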
"If a tool fails, just return an empty result" — This is one of the most dangerous mistakes in agent development. An empty result ({"results": []}) tells Claude "I checked and found nothing." An error (is_error: true) tells Claude "I couldn't even check." Claude makes completely different decisions based on which one you return. The first leads to "there's no data on this topic." The second leads to "let me try a different approach." Confusing them creates silent, hard-to-debug failures.
"Retrying a failed tool 100 times will eventually work" — Brute-force retries burn tokens and time. If an endpoint is down, it's down. Use exponential backoff (wait 1s, then 2s, then 4s) with a maximum of 3 retries for transient failures like timeouts. For persistent failures, use a circuit breaker instead.
"Claude can recover from any error automatically" — Claude is good at adapting when you give it structured error information. But it can only work with the tools you've provided. If the only way to get data is through a broken tool and no alternative exists, Claude will inform the user rather than magically producing correct information. Always design your tool set with fallback options where possible.
Circuit Breakers in Depth
A circuit breaker is borrowed from electrical engineering — when too much current flows through a wire, the breaker trips and cuts the circuit to prevent a fire. In agent code, it works the same way: you track consecutive failures for each tool, and after a threshold (typically 3–5 failures in a row), you temporarily disable that tool. This prevents the agent from wasting tokens and time retrying a broken endpoint over and over.
Here's how it works internally. You maintain a counter per tool (e.g., {"fetch_page": 0, "web_search": 0}). Every time a tool succeeds, its counter resets to zero. Every time it fails, the counter increments. When the counter hits your threshold, the circuit "opens" — meaning that tool is removed from the tools array sent to Claude on the next API call. Claude never even sees it as an option, so it naturally picks alternatives. After a cooldown period (say 60 seconds), you can "half-open" the circuit by adding the tool back for one test call to see if the endpoint has recovered.
How does this differ from simple retries? Retries happen within a single tool call — you might try the same HTTP request 3 times with exponential backoff before giving up. Circuit breakers operate across tool calls — they track a pattern of repeated failures over time and make a system-level decision to stop using that tool entirely. You'd typically use both together: retry transient failures within a call, and circuit-break persistent failures across calls.
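A minimal sketch of the counter-based breaker described in this section. The class name, method names, and the injectable `clock` are illustrative assumptions; the threshold and cooldown defaults follow the example values in the text.

```python
import time

class CircuitBreaker:
    """Tracks consecutive failures per tool and hides tools whose circuit is open."""

    def __init__(self, threshold: int = 3, cooldown: float = 60.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures: dict[str, int] = {}
        self.opened_at: dict[str, float] = {}

    def record(self, tool: str, ok: bool) -> None:
        """Update the counter after each tool call."""
        if ok:
            self.failures[tool] = 0
            self.opened_at.pop(tool, None)  # success closes the circuit
            return
        self.failures[tool] = self.failures.get(tool, 0) + 1
        if self.failures[tool] >= self.threshold:
            self.opened_at[tool] = self.clock()  # open (or re-open) the circuit

    def is_available(self, tool: str) -> bool:
        opened = self.opened_at.get(tool)
        if opened is None:
            return True
        # Half-open: once the cooldown has passed, allow a test call
        return self.clock() - opened >= self.cooldown

    def filter_tools(self, tools: list[dict]) -> list[dict]:
        """Build the tools array for the next API call, minus disabled tools."""
        return [t for t in tools if self.is_available(t["name"])]

# Usage with placeholder tool definitions
breaker = CircuitBreaker(threshold=3, cooldown=60.0)
tool_defs = [{"name": "fetch_page"}, {"name": "web_search"}]
for _ in range(3):
    breaker.record("fetch_page", ok=False)
active = breaker.filter_tools(tool_defs)  # fetch_page is hidden for the cooldown
```

If the half-open test call fails, `record` re-opens the circuit with a fresh timestamp, so the tool is hidden for another full cooldown.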
Code Walkthrough: Research Assistant Agent
This agent demonstrates all orchestration patterns: parallel search, sequential fetch-and-summarize, dynamic tool registration, and error recovery.
Step 1: Define the Tools
Each tool definition has a name, a description, and an input_schema specifying arguments. Remember from the Tool Selection section: Claude reads these schemas to decide which tools to call and what arguments to pass. It never executes them directly. The interesting part is the description field: notice how each one explains not just what the tool does, but when to use it (e.g., "Use after web_search to get full content from a result URL"). Vague descriptions like "does stuff" leave Claude guessing, and a guessing agent is an unreliable agent.
# pip install "anthropic>=0.30.0"  (quoted so the shell does not treat > as a redirect)
import anthropic
import json
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY env var
tools = [
    {
        "name": "web_search",
        "description": (
            "Search the web for current information. Returns top 3 "
            "results with title, URL, and snippet. Use for recent "
            "events, factual questions, or general research."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "fetch_page",
        "description": (
            "Fetch the full text content of a web page by URL. "
            "Returns page text (max 5000 chars). Use after "
            "web_search to get full content from a result URL."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Full URL to fetch"}
            },
            "required": ["url"]
        }
    },
    {
        "name": "summarize_text",
        "description": (
            "Summarize long text into key points (3-5 bullets). "
            "Use after fetch_page to condense page content."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Text to summarize"},
                "max_points": {
                    "type": "integer",
                    "description": "Max bullet points (default 5)"
                }
            },
            "required": ["text"]
        }
    },
    {
        "name": "format_citation",
        "description": (
            "Format a source as an academic citation. Use after "
            "summaries are ready to create proper references."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Article title"},
                "url": {"type": "string", "description": "Source URL"},
                "accessed_date": {"type": "string", "description": "e.g. '2025-01-15'"}
            },
            "required": ["title", "url"]
        }
    },
    {
        "name": "save_to_file",
        "description": "Save content to a local file. Returns file path.",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string", "description": "Output filename"},
                "content": {"type": "string", "description": "Content to save"}
            },
            "required": ["filename", "content"]
        }
    }
]
// npm install @anthropic-ai/sdk@^0.30.0
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic(); // reads ANTHROPIC_API_KEY env var
const tools = [
  {
    name: "web_search",
    description:
      "Search the web for current information. Returns top 3 " +
      "results with title, URL, and snippet. Use for recent " +
      "events, factual questions, or general research.",
    input_schema: {
      type: "object",
      properties: {
        query: { type: "string", description: "Search query" }
      },
      required: ["query"]
    }
  },
  {
    name: "fetch_page",
    description:
      "Fetch the full text content of a web page by URL. " +
      "Returns page text (max 5000 chars). Use after " +
      "web_search to get full content from a result URL.",
    input_schema: {
      type: "object",
      properties: {
        url: { type: "string", description: "Full URL to fetch" }
      },
      required: ["url"]
    }
  },
  {
    name: "summarize_text",
    description:
      "Summarize long text into key points (3-5 bullets). " +
      "Use after fetch_page to condense page content.",
    input_schema: {
      type: "object",
      properties: {
        text: { type: "string", description: "Text to summarize" },
        max_points: { type: "integer", description: "Max bullet points (default 5)" }
      },
      required: ["text"]
    }
  },
  {
    name: "format_citation",
    description:
      "Format a source as an academic citation. Use after " +
      "summaries are ready to create proper references.",
    input_schema: {
      type: "object",
      properties: {
        title: { type: "string", description: "Article title" },
        url: { type: "string", description: "Source URL" },
        accessed_date: { type: "string", description: "e.g. '2025-01-15'" }
      },
      required: ["title", "url"]
    }
  },
  {
    name: "save_to_file",
    description: "Save content to a local file. Returns file path.",
    input_schema: {
      type: "object",
      properties: {
        filename: { type: "string", description: "Output filename" },
        content: { type: "string", description: "Content to save" }
      },
      required: ["filename", "content"]
    }
  }
];
Step 2: Implement Tools with Error Handling
Every tool call goes through a single execute_tool dispatcher function. Instead of having your agentic loop know the internals of every tool, it just calls execute_tool(name, inputs) and gets back a result. This keeps the loop clean: it doesn't care whether it's calling a web scraper or a database; it just passes a name and inputs and gets JSON back. Adding a new tool later means writing one function and adding one entry to the dictionary. No changes to the loop.
Here's the dilemma with error handling: if a tool throws an exception, should you crash the whole agent? Absolutely not. The dispatcher wraps every call in try/catch and returns errors as structured JSON with is_error: true. Claude reads that structured error and can reason about alternatives — "OK, the page fetch failed, let me try a web search instead." But this only works if you give Claude parseable information. A raw Python stack trace doesn't help it reason. A clean {"error": "404 Not Found", "tool": "fetch_page"} does.
# Mock implementations (replace with real APIs in production)
def web_search(query: str) -> dict:
    time.sleep(0.2)  # Simulate latency
    return {"results": [
        {"title": f"Result 1: {query}", "url": "https://example.com/1",
         "snippet": f"Overview of {query}..."},
        {"title": f"Result 2: {query}", "url": "https://example.com/2",
         "snippet": f"Developments in {query}..."},
        {"title": f"Result 3: {query}", "url": "https://broken.example.com/404",
         "snippet": f"Deep dive into {query}..."},
    ]}

def fetch_page(url: str) -> dict:
    time.sleep(0.3)
    if "broken" in url or "404" in url:
        raise ConnectionError(f"404 Not Found: {url}")
    return {"content": f"Full page content from {url}. " * 20}

def summarize_text(text: str, max_points: int = 5) -> dict:
    return {"summary": [f"Key point {i+1}" for i in range(min(max_points, 5))]}

def format_citation(title: str, url: str, accessed_date: str = None) -> dict:
    date = accessed_date or "2025-01-15"
    return {"citation": f'"{title}." Available at: {url}. Accessed: {date}.'}

def save_to_file(filename: str, content: str) -> dict:
    return {"status": "saved", "path": f"/output/{filename}", "bytes": len(content)}

# Dispatcher with per-tool error handling
tool_functions = {
    "web_search": web_search, "fetch_page": fetch_page,
    "summarize_text": summarize_text, "format_citation": format_citation,
    "save_to_file": save_to_file,
}

def execute_tool(name: str, inputs: dict) -> tuple[str, bool]:
    """Execute a tool, returning (result_json, is_error)."""
    func = tool_functions.get(name)
    if not func:
        return json.dumps({"error": f"Unknown tool: {name}"}), True
    try:
        result = func(**inputs)
        return json.dumps(result), False
    except Exception as e:
        return json.dumps({"error": str(e)}), True
// Mock implementations
async function webSearch(query) {
  await new Promise(r => setTimeout(r, 200));
  return { results: [
    { title: `Result 1: ${query}`, url: "https://example.com/1",
      snippet: `Overview of ${query}...` },
    { title: `Result 2: ${query}`, url: "https://example.com/2",
      snippet: `Developments in ${query}...` },
    { title: `Result 3: ${query}`, url: "https://broken.example.com/404",
      snippet: `Deep dive into ${query}...` },
  ]};
}

async function fetchPage(url) {
  await new Promise(r => setTimeout(r, 300));
  if (url.includes("broken") || url.includes("404"))
    throw new Error(`404 Not Found: ${url}`);
  return { content: `Full page content from ${url}. `.repeat(20) };
}

function summarizeText(text, maxPoints = 5) {
  return { summary: Array.from({ length: Math.min(maxPoints, 5) },
    (_, i) => `Key point ${i + 1}`) };
}

function formatCitation(title, url, accessedDate) {
  const date = accessedDate || "2025-01-15";
  return { citation: `"${title}." Available at: ${url}. Accessed: ${date}.` };
}

function saveToFile(filename, content) {
  return { status: "saved", path: `/output/${filename}`, bytes: content.length };
}

const toolFunctions = {
  web_search: (i) => webSearch(i.query),
  fetch_page: (i) => fetchPage(i.url),
  summarize_text: (i) => summarizeText(i.text, i.max_points),
  format_citation: (i) => formatCitation(i.title, i.url, i.accessed_date),
  save_to_file: (i) => saveToFile(i.filename, i.content),
};

async function executeTool(name, inputs) {
  const func = toolFunctions[name];
  if (!func)
    return { result: JSON.stringify({ error: `Unknown tool: ${name}` }), isError: true };
  try {
    const result = await func(inputs);
    return { result: JSON.stringify(result), isError: false };
  } catch (e) {
    return { result: JSON.stringify({ error: e.message }), isError: true };
  }
}
Step 3: The Agentic Loop with Parallel Execution
The agentic loop collects Claude's tool_use blocks, executes them, and feeds results back. It keeps going until Claude says it's done (stop_reason: "end_turn"). The clever part is the branching logic: when Claude returns multiple tool_use blocks, the code runs them in parallel using ThreadPoolExecutor (Python) or Promise.all (Node.js). When there's only one, it runs sequentially. Either way, results go back in one message. The critical safety measure you'll notice is the max_iterations limit: without it, a confused model could loop indefinitely, burning tokens and money. One subtle gotcha: you must append response.content (the full content array, including tool_use blocks) to messages, not just the text. Claude needs to see its own tool requests in the conversation history to make sense of the results you're sending back.
def run_research_agent(question: str, available_tools=None) -> str:
    """Run the agentic loop with parallel tool execution."""
    active_tools = available_tools or tools
    messages = [{"role": "user", "content": question}]
    max_iterations = 10  # Safety limit

    for iteration in range(max_iterations):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=4096,
                system="You are a research assistant. Search multiple "
                       "sources in parallel when possible.",
                tools=active_tools,
                messages=messages,
            )
        except anthropic.APIError as e:
            # Only HTTP status errors carry status_code; connection errors do not
            status = getattr(e, "status_code", "n/a")
            return f"API error: {status} - {e.message}"

        # Collect tool_use blocks
        tool_uses = [b for b in response.content if b.type == "tool_use"]

        if response.stop_reason == "end_turn" or not tool_uses:
            # Claude is done: extract final text
            return "\n".join(
                b.text for b in response.content if b.type == "text"
            )

        # Execute tools: parallel when multiple requested
        if len(tool_uses) > 1:
            # PARALLEL: use ThreadPoolExecutor
            tool_results = []
            with ThreadPoolExecutor(max_workers=len(tool_uses)) as pool:
                futures = {
                    pool.submit(execute_tool, tu.name, tu.input): tu.id
                    for tu in tool_uses
                }
                # Results arrive in completion order; tool_use_id ties each
                # result back to its request
                for future in as_completed(futures):
                    tid = futures[future]
                    result_json, is_err = future.result()
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": tid,
                        "content": result_json,
                        **({"is_error": True} if is_err else {}),
                    })
        else:
            # SEQUENTIAL: single tool
            tu = tool_uses[0]
            result_json, is_err = execute_tool(tu.name, tu.input)
            tool_results = [{
                "type": "tool_result",
                "tool_use_id": tu.id,
                "content": result_json,
                **({"is_error": True} if is_err else {}),
            }]

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached."

# Run it
answer = run_research_agent(
    "Research the latest developments in AI agents. "
    "Search multiple sources and summarize findings."
)
print(answer)
async function runResearchAgent(question, availableTools) {
  const activeTools = availableTools || tools;
  const messages = [{ role: "user", content: question }];
  const maxIterations = 10;

  for (let i = 0; i < maxIterations; i++) {
    let response;
    try {
      response = await client.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 4096,
        system: "You are a research assistant. Search multiple " +
          "sources in parallel when possible.",
        tools: activeTools,
        messages,
      });
    } catch (e) {
      return `API error: ${e.status} - ${e.message}`;
    }

    const toolUses = response.content.filter(b => b.type === "tool_use");

    if (response.stop_reason === "end_turn" || toolUses.length === 0) {
      return response.content
        .filter(b => b.type === "text")
        .map(b => b.text)
        .join("\n");
    }

    let toolResults;
    if (toolUses.length > 1) {
      // PARALLEL: Promise.all
      toolResults = await Promise.all(
        toolUses.map(async (tu) => {
          const { result, isError } = await executeTool(tu.name, tu.input);
          return {
            type: "tool_result",
            tool_use_id: tu.id,
            content: result,
            ...(isError ? { is_error: true } : {}),
          };
        })
      );
    } else {
      // SEQUENTIAL: single tool
      const tu = toolUses[0];
      const { result, isError } = await executeTool(tu.name, tu.input);
      toolResults = [{
        type: "tool_result",
        tool_use_id: tu.id,
        content: result,
        ...(isError ? { is_error: true } : {}),
      }];
    }

    messages.push({ role: "assistant", content: response.content });
    messages.push({ role: "user", content: toolResults });
  }
  return "Max iterations reached.";
}

const answer = await runResearchAgent(
  "Research the latest developments in AI agents. " +
  "Search multiple sources and summarize findings."
);
console.log(answer);
When Claude returns multiple tool_use blocks, your code runs them concurrently using ThreadPoolExecutor (Python) or Promise.all (Node.js). When Claude returns a single tool call, it runs sequentially. Either way, results go back to Claude in one message, and the loop repeats until Claude says it's done (stop_reason: "end_turn"). This is the same pattern powering production agents; the only difference in real systems is that the tool implementations call actual APIs instead of mocks.
Step 4: Dynamic Tool Registry
The ToolRegistry class stores tools with category tags and filters them on demand. The payoff is simple: instead of sending all 20 tools with every API call, you call get_tools_for_context(tags=["research"]) and get back only the 3 tools relevant to the current phase. This is especially powerful in multi-phase workflows: you'd use research tools during the search phase, then swap to citation tools once summaries are ready. One small trap to watch for: tag names are case-sensitive. If you tag one tool "research" and another "Research", the filter won't match both. Pick a convention (lowercase recommended) and stick with it.
class ToolRegistry:
    """Manages tools and filters them by context."""

    def __init__(self):
        self._tools: dict[str, dict] = {}
        self._tags: dict[str, set[str]] = {}

    def register(self, tool: dict, tags: list[str] = None):
        name = tool["name"]
        self._tools[name] = tool
        self._tags[name] = set(tags or [])

    def unregister(self, name: str):
        self._tools.pop(name, None)
        self._tags.pop(name, None)

    def get_tools_for_context(
        self, tags: list[str] = None, names: list[str] = None
    ) -> list[dict]:
        if names:
            return [self._tools[n] for n in names if n in self._tools]
        if tags:
            tag_set = set(tags)
            return [
                self._tools[n] for n, t in self._tags.items()
                if t & tag_set
            ]
        return list(self._tools.values())

# Usage
registry = ToolRegistry()
registry.register(tools[0], tags=["research", "search"])
registry.register(tools[1], tags=["research", "fetch"])
registry.register(tools[2], tags=["research", "analysis"])
registry.register(tools[3], tags=["citation"])
registry.register(tools[4], tags=["output"])

# Phase 1: only research tools
research_tools = registry.get_tools_for_context(tags=["research"])
# => [web_search, fetch_page, summarize_text]

# Phase 2: add citation tools after summaries are ready
cite_tools = registry.get_tools_for_context(
    names=["format_citation", "save_to_file"]
)
class ToolRegistry {
  constructor() {
    this._tools = new Map();
    this._tags = new Map();
  }

  register(tool, tags = []) {
    this._tools.set(tool.name, tool);
    this._tags.set(tool.name, new Set(tags));
  }

  unregister(name) {
    this._tools.delete(name);
    this._tags.delete(name);
  }

  getToolsForContext({ tags, names } = {}) {
    // Explicit names take precedence over tags
    if (names)
      return names.filter(n => this._tools.has(n)).map(n => this._tools.get(n));
    if (tags) {
      const tagSet = new Set(tags);
      const result = [];
      for (const [name, toolTags] of this._tags) {
        for (const t of tagSet) {
          if (toolTags.has(t)) { result.push(this._tools.get(name)); break; }
        }
      }
      return result;
    }
    return [...this._tools.values()];
  }
}

// Usage
const registry = new ToolRegistry();
registry.register(tools[0], ["research", "search"]);
registry.register(tools[1], ["research", "fetch"]);
registry.register(tools[2], ["research", "analysis"]);
registry.register(tools[3], ["citation"]);
registry.register(tools[4], ["output"]);

const researchTools = registry.getToolsForContext({ tags: ["research"] });
const citeTools = registry.getToolsForContext({
  names: ["format_citation", "save_to_file"]
});
This is a ToolRegistry that tags tools by category ("research", "citation", "output") and filters them on demand. In a real workflow, you'd call get_tools_for_context(tags=["research"]) during the search phase to give Claude only 3 tools, then switch to names=["format_citation", "save_to_file"] when summaries are ready. This is how production agents keep tool sets lean: each phase of a multi-step workflow sees only the tools it needs, improving both accuracy and token efficiency.
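The case-sensitivity trap mentioned in this step can also be closed off in code rather than by convention alone. The sketch below is one possible variant, not part of the lab file: a hypothetical NormalizingToolRegistry that lowercases and trims tags both when registering and when filtering. The class name, the _normalize helper, and the stripped-down tool dicts are assumptions introduced for illustration.

```python
class NormalizingToolRegistry:
    """Hypothetical registry variant: tags are compared case-insensitively."""

    def __init__(self) -> None:
        self._tools: dict[str, dict] = {}
        self._tags: dict[str, set[str]] = {}

    @staticmethod
    def _normalize(tags) -> set[str]:
        # Lowercase and trim so "Research", "research " and "RESEARCH" all match.
        return {t.strip().lower() for t in (tags or [])}

    def register(self, tool: dict, tags=None) -> None:
        self._tools[tool["name"]] = tool
        self._tags[tool["name"]] = self._normalize(tags)

    def get_tools_for_context(self, tags=None) -> list[dict]:
        if not tags:
            return list(self._tools.values())
        wanted = self._normalize(tags)
        # A tool is included when it shares at least one tag with the request.
        return [self._tools[n] for n, t in self._tags.items() if t & wanted]


# Minimal stand-in schemas; real ones also carry description and input_schema.
registry = NormalizingToolRegistry()
registry.register({"name": "web_search"}, tags=["Research", "search"])
registry.register({"name": "fetch_page"}, tags=["research ", "fetch"])
registry.register({"name": "format_citation"}, tags=["Citation"])

# Each phase sees only its own tools, regardless of how the tags were spelled.
research_names = [t["name"] for t in registry.get_tools_for_context(tags=["RESEARCH"])]
citation_names = [t["name"] for t in registry.get_tools_for_context(tags=["citation"])]
print(research_names)   # ['web_search', 'fetch_page']
print(citation_names)   # ['format_citation']
```

Normalizing at the boundary costs almost nothing and removes a whole class of silent empty-result bugs. The trade-off is that tags which differ only by case can no longer be used as distinct categories.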
Hands-On Exercise
What You'll Build
A multi-tool research agent that searches multiple sources in parallel, fetches pages sequentially, handles errors gracefully with is_error: true, and uses a ToolRegistry to filter tools by context.
Time Estimate: 30–45 minutes
Prerequisites: Python 3.10+ (or Node.js 18+), an Anthropic API key, and completion of M05 (Function Calling Fundamentals).
Files You'll Create: multi_tool_agent.py (or multi_tool_agent.mjs for Node.js) — a single file containing tool schemas, mock implementations, a tool dispatcher, a ToolRegistry, and the agentic loop with parallel execution.
Environment Setup
# Python
mkdir multi-tool-lab && cd multi-tool-lab
python -m venv venv && source venv/bin/activate # Windows: venv\Scripts\activate
pip install "anthropic>=0.30.0" # quoted so the shell does not treat >= as a redirect
export ANTHROPIC_API_KEY=your-key-here # Windows: set ANTHROPIC_API_KEY=your-key-here
# Node.js
mkdir multi-tool-lab && cd multi-tool-lab
npm init -y && npm install @anthropic-ai/sdk
export ANTHROPIC_API_KEY=your-key-here # Windows: set ANTHROPIC_API_KEY=your-key-here
Step 1: Define Tools, Mock Implementations & ToolRegistry
What: This step sets up everything the agent needs: 5 tool schemas that Claude will see, mock functions behind each schema, an execute_tool dispatcher with error handling, and a ToolRegistry for filtering tools by context.
Why: Separating tool definitions (what Claude sees) from tool implementations (what your code runs) is a fundamental pattern. The ToolRegistry adds the ability to dynamically filter which tools are sent to Claude based on context. We're putting it all in one file for simplicity.
Create a new file called multi_tool_agent.py and add the following:
import anthropic
import json
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY
# ── Tool Schemas (what Claude sees) ──────────────────────────
tools = [
    {
        "name": "web_search",
        "description": (
            "Search the web for current information. Returns top 3 "
            "results with title, URL, and snippet. Use for recent "
            "events, factual questions, or general research."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "fetch_page",
        "description": (
            "Fetch the full text content of a web page by URL. "
            "Returns page text (max 5000 chars). Use after "
            "web_search to get full content from a result URL."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Full URL to fetch"}
            },
            "required": ["url"]
        }
    },
    {
        "name": "summarize_text",
        "description": (
            "Summarize long text into key points (3-5 bullets). "
            "Use after fetch_page to condense page content."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Text to summarize"},
                "max_points": {
                    "type": "integer",
                    "description": "Max bullet points (default 5)"
                }
            },
            "required": ["text"]
        }
    },
    {
        "name": "format_citation",
        "description": (
            "Format a source as an academic citation. Use after "
            "summaries are ready to create proper references."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Article title"},
                "url": {"type": "string", "description": "Source URL"},
                "accessed_date": {"type": "string", "description": "e.g. '2025-01-15'"}
            },
            "required": ["title", "url"]
        }
    },
    {
        "name": "save_to_file",
        "description": "Save content to a local file. Returns file path.",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string", "description": "Output filename"},
                "content": {"type": "string", "description": "Content to save"}
            },
            "required": ["filename", "content"]
        }
    }
]
# ── Mock Implementations ─────────────────────────────────────
def web_search(query: str) -> dict:
    time.sleep(0.2)
    return {"results": [
        {"title": f"Result 1: {query}", "url": "https://example.com/1",
         "snippet": f"Overview of {query}..."},
        {"title": f"Result 2: {query}", "url": "https://example.com/2",
         "snippet": f"Developments in {query}..."},
        {"title": f"Result 3: {query}", "url": "https://broken.example.com/404",
         "snippet": f"Deep dive into {query}..."},
    ]}

def fetch_page(url: str) -> dict:
    time.sleep(0.3)
    if "broken" in url or "404" in url:
        raise ConnectionError(f"404 Not Found: {url}")
    return {"content": f"Full page content from {url}. " * 20}

def summarize_text(text: str, max_points: int = 5) -> dict:
    return {"summary": [f"Key point {i+1}" for i in range(min(max_points, 5))]}

def format_citation(title: str, url: str, accessed_date: str = None) -> dict:
    date = accessed_date or "2025-01-15"
    return {"citation": f'"{title}." Available at: {url}. Accessed: {date}.'}

def save_to_file(filename: str, content: str) -> dict:
    return {"status": "saved", "path": f"/output/{filename}", "bytes": len(content)}

# ── Dispatcher with Error Handling ───────────────────────────
tool_functions = {
    "web_search": web_search, "fetch_page": fetch_page,
    "summarize_text": summarize_text, "format_citation": format_citation,
    "save_to_file": save_to_file,
}

def execute_tool(name: str, inputs: dict) -> tuple[str, bool]:
    """Execute a tool, returning (result_json, is_error)."""
    func = tool_functions.get(name)
    if not func:
        return json.dumps({"error": f"Unknown tool: {name}"}), True
    try:
        result = func(**inputs)
        return json.dumps(result), False
    except Exception as e:
        return json.dumps({"error": str(e), "tool": name}), True
# ── ToolRegistry ─────────────────────────────────────────────
class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}
        self._tags: dict[str, set[str]] = {}

    def register(self, tool: dict, tags: list[str] = None):
        self._tools[tool["name"]] = tool
        self._tags[tool["name"]] = set(tags or [])

    def unregister(self, name: str):
        self._tools.pop(name, None)
        self._tags.pop(name, None)

    def get_tools_for_context(self, tags: list[str] = None, names: list[str] = None) -> list[dict]:
        if names:
            return [self._tools[n] for n in names if n in self._tools]
        if tags:
            tag_set = set(tags)
            return [self._tools[n] for n, t in self._tags.items() if t & tag_set]
        return list(self._tools.values())

# Register tools with tags
registry = ToolRegistry()
registry.register(tools[0], tags=["research", "search"])
registry.register(tools[1], tags=["research", "fetch"])
registry.register(tools[2], tags=["research", "analysis"])
registry.register(tools[3], tags=["citation"])
registry.register(tools[4], tags=["output"])

print("✓ Tools, dispatcher, and registry ready.")
print(f" All tools: {[t['name'] for t in registry.get_tools_for_context()]}")
print(f" Research only: {[t['name'] for t in registry.get_tools_for_context(tags=['research'])]}")
Run it:
- ModuleNotFoundError: No module named 'anthropic' → Run pip install anthropic
- TypeError: 'type' object is not subscriptable → You need Python 3.9+ for the built-in generic annotations dict[str, dict], list[dict], set[str], and tuple[str, bool]. On older versions, add from __future__ import annotations at the top of the file, or use typing.Dict, typing.List, typing.Set, typing.Tuple.
- Tag filter returns nothing unexpectedly → Tag names are case-sensitive. "Research" and "research" are different. The registry uses set intersection (t & tag_set), so if your tag list and registered tags don't match exactly, you'll get an empty result.
Step 2: Add the Agentic Loop with Parallel Execution
What: This step adds the orchestration engine — the loop that sends messages to Claude, executes tool calls (in parallel when multiple are returned), feeds results back, and repeats until Claude is done.
Why: Without this loop, you can only make one-shot API calls. The agentic loop is what turns a chatbot into an agent — it lets Claude chain together multiple tools across multiple iterations to complete complex tasks. This step uses the tools and dispatcher from Step 1.
Add the following to the bottom of multi_tool_agent.py (after the registry setup):
# ── Agentic Loop with Parallel Execution ─────────────────────
def run_agent(question: str, tool_tags: list[str] = None, verbose: bool = True) -> str:
    """Run the multi-tool agent. Optionally filter tools by tag."""
    if tool_tags:
        active_tools = registry.get_tools_for_context(tags=tool_tags)
    else:
        active_tools = registry.get_tools_for_context()

    if verbose:
        print(f"\n{'='*60}")
        print(f"Question: {question}")
        print(f"Active tools: {[t['name'] for t in active_tools]}")
        print(f"{'='*60}")

    messages = [{"role": "user", "content": question}]
    max_iterations = 10

    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system=(
                "You are a research assistant. When asked to compare or "
                "research multiple topics, search for each one in parallel. "
                "When asked to fetch and summarize a page, do it sequentially."
            ),
            tools=active_tools,
            messages=messages,
        )

        tool_uses = [b for b in response.content if b.type == "tool_use"]

        if response.stop_reason == "end_turn" or not tool_uses:
            final_text = "\n".join(
                b.text for b in response.content if b.type == "text"
            )
            if verbose:
                print(f"\n✓ Agent finished in {iteration + 1} iteration(s)")
            return final_text

        # Show what Claude requested
        if verbose:
            mode = "PARALLEL" if len(tool_uses) > 1 else "SEQUENTIAL"
            print(f"\n Iteration {iteration + 1} [{mode}]:")
            for tu in tool_uses:
                print(f" → {tu.name}({json.dumps(tu.input)[:80]}...)")

        # Execute tools: in parallel when Claude requested more than one
        if len(tool_uses) > 1:
            tool_results = []
            with ThreadPoolExecutor(max_workers=len(tool_uses)) as pool:
                # Map each future back to the tool_use id it answers
                futures = {
                    pool.submit(execute_tool, tu.name, tu.input): tu.id
                    for tu in tool_uses
                }
                for future in as_completed(futures):
                    tid = futures[future]
                    result_json, is_err = future.result()
                    if verbose and is_err:
                        print(f" ✗ Error for {tid}: {result_json[:60]}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": tid,
                        "content": result_json,
                        **({"is_error": True} if is_err else {}),
                    })
        else:
            tu = tool_uses[0]
            result_json, is_err = execute_tool(tu.name, tu.input)
            if verbose and is_err:
                print(f" ✗ Error: {result_json[:60]}")
            tool_results = [{
                "type": "tool_result",
                "tool_use_id": tu.id,
                "content": result_json,
                **({"is_error": True} if is_err else {}),
            }]

        # Echo Claude's turn, then return every result in a single user message
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached."
# ── Test Scenarios ───────────────────────────────────────────
if __name__ == "__main__":
    # Test 1: Parallel search (Claude should call web_search multiple times)
    print("\n" + "▶ TEST 1: PARALLEL SEARCH ".ljust(60, "─"))
    result1 = run_agent(
        "Search for information about these 3 topics: AI agents, "
        "prompt engineering, and tool use patterns.",
        tool_tags=["research"]
    )
    print(f"\nResult preview: {result1[:200]}...")

    # Test 2: Sequential chain (search → fetch → summarize)
    print("\n" + "▶ TEST 2: SEQUENTIAL CHAIN ".ljust(60, "─"))
    result2 = run_agent(
        "Search for 'Claude AI tool use', then fetch the first "
        "result page and summarize its content.",
        tool_tags=["research"]
    )
    print(f"\nResult preview: {result2[:200]}...")

    # Test 3: Error recovery (fetch_page will 404 on broken URL)
    print("\n" + "▶ TEST 3: ERROR RECOVERY ".ljust(60, "─"))
    result3 = run_agent(
        "Fetch and summarize this page: https://broken.example.com/404",
        tool_tags=["research"]
    )
    print(f"\nResult preview: {result3[:200]}...")

    # Test 4: Dynamic tool filtering (citation tools only)
    print("\n" + "▶ TEST 4: DYNAMIC TOOL FILTERING ".ljust(60, "─"))
    result4 = run_agent(
        "Format a citation for an article titled 'Multi-Tool AI Agents' "
        "from https://example.com/agents, accessed today.",
        tool_tags=["citation"]
    )
    print(f"\nResult preview: {result4[:200]}...")
Run the full agent:
Look for these key behaviors in your output:
- Test 1: Should show [PARALLEL] with 3 web_search calls in one iteration
- Test 2: Should show [SEQUENTIAL] across 3–4 iterations, each building on the previous result
- Test 3: Should show ✗ Error followed by Claude adapting (trying a different approach or informing the user)
- Test 4: Should show Active tools: ['format_citation'], which is only 1 tool instead of 5
- Agent runs forever / hits max iterations → The max_iterations = 10 safety limit will stop it. If Claude keeps calling tools without converging, make the prompt more specific.
- AuthenticationError → Check that your ANTHROPIC_API_KEY is set correctly. Run echo $ANTHROPIC_API_KEY to verify.
- Test 1 shows SEQUENTIAL instead of PARALLEL → Claude doesn't always parallelize. Try rephrasing: "Search for these 3 topics simultaneously: ..." The system prompt also encourages parallel behavior.
- APIError: 529 (overloaded) → Wait 30 seconds and try again. Consider running tests one at a time by commenting out the others.
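The "wait and try again" advice for overload errors can be automated. The following is a minimal sketch of retry with exponential backoff, under stated assumptions: OverloadedError is a hypothetical stand-in exception (not an SDK class), and call_with_retries is a helper invented for this example. In the lab you would wrap the API call in a function and decide which real exceptions count as transient.

```python
import random
import time


class OverloadedError(Exception):
    """Hypothetical stand-in for a transient 'overloaded' failure."""


def call_with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on OverloadedError wait base_delay * 2**(attempt-1) plus jitter, then retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except OverloadedError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            delay += random.uniform(0, delay * 0.1)  # small jitter to avoid retry bursts
            sleep(delay)


# Demo: a fake call that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OverloadedError("529 overloaded")
    return "ok"

delays = []  # record delays instead of actually sleeping
result = call_with_retries(flaky_call, sleep=delays.append)
print(result, attempts["n"], len(delays))
```

Passing sleep as a parameter keeps the demo instant and makes the helper easy to test. Only errors you know to be transient should be retried this way; an authentication failure, for example, will not fix itself.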
Verify Everything Works
Run the complete file end-to-end. All 4 tests should complete without crashing, demonstrating parallel execution, sequential chaining, error recovery, and dynamic tool filtering:
If all 4 tests complete and you see the ✓ Agent finished message for each, you've successfully built a multi-tool orchestration agent with parallel execution, error handling, and dynamic tool registration.
You've built a production-pattern multi-tool agent! You can extend this by swapping mock implementations for real APIs (e.g., use the requests library in fetch_page), adding a circuit breaker counter that disables tools after 3 consecutive failures, or implementing execution timing to compare parallel vs sequential wall-clock times.
- Add execution timing to each tool call and print a trace waterfall showing parallel vs sequential sections
- Implement a circuit breaker class that disables a tool after 3 consecutive failures
- Add a cost tracker that estimates token usage per iteration based on message length
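As a starting point for the circuit breaker extension, here is one possible shape, offered as a sketch and not a reference solution. The CircuitBreaker class, its method names, and the threshold default are all assumptions made for illustration; the tool dicts are stripped down to just a name.

```python
class CircuitBreaker:
    """Hypothetical helper: disables a tool after N consecutive failures."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._failures: dict[str, int] = {}  # tool name -> consecutive failure count

    def record(self, tool_name: str, is_error: bool) -> None:
        # A success resets the streak; a failure extends it.
        if is_error:
            self._failures[tool_name] = self._failures.get(tool_name, 0) + 1
        else:
            self._failures[tool_name] = 0

    def is_open(self, tool_name: str) -> bool:
        """True when the tool has failed `threshold` times in a row."""
        return self._failures.get(tool_name, 0) >= self.threshold

    def filter_tools(self, tools: list[dict]) -> list[dict]:
        """Drop tools whose breaker is open before sending the list to the model."""
        return [t for t in tools if not self.is_open(t["name"])]


# Demo with minimal stand-in tool dicts.
breaker = CircuitBreaker(threshold=3)
demo_tools = [{"name": "web_search"}, {"name": "fetch_page"}]

for _ in range(3):
    breaker.record("fetch_page", is_error=True)   # three failures in a row
breaker.record("web_search", is_error=True)
breaker.record("web_search", is_error=False)      # success resets the streak

active = [t["name"] for t in breaker.filter_tools(demo_tools)]
print(active)  # ['web_search']
```

In the agentic loop, record would be called with the is_error flag that execute_tool returns, and filter_tools would be applied to the active tool list before each API call. A fuller version might also reopen a tool after a cooldown period.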
Knowledge Check
Q1: Given three tool calls where B needs A's result, but C is independent of both, what's the optimal execution strategy?
Q2: Rank these tool descriptions from LEAST to MOST effective: (1) "queries data" | (2) "Run a SQL query against the users DB. Returns matching rows. Use when asking about user accounts." | (3) "database tool"
Q3: A tool fails with a network timeout. What's the BEST way to report this to Claude? (Recall from M05: Claude doesn't execute tools — you do.)
Q4: You have 20 tools averaging 400 input tokens each. Filtering to 5 per request saves approximately how many tokens?
Q5: In a 3-tool sequential chain (search → fetch → summarize), how many API round trips before Claude's final text response?
Q6: Claude returns 3 tool_use blocks in one response. How should you return the results? (Recall from M05: the tool_use_id links each result to its request.)
Module Summary
Key Concepts Recap
- Parallel tool calls: Multiple independent tools in one response. Execute concurrently, return all results in one message.
- Sequential chains: Output of one feeds the next. Each step is a full API round trip via the agentic loop.
- Tool selection: Description quality directly determines accuracy. Include what, when, and returns.
- Dynamic registration: Filter tools by context to save tokens, improve accuracy, enforce least privilege.
- Error handling: Return is_error: true. Let Claude adapt. Use circuit breakers for persistent failures.
Next: M07 — Model Context Protocol (MCP)
You've been defining tools manually in code. MCP standardizes how tools are discovered, described, and invoked across any client and server. You'll connect to external MCP servers and expose your own tools as MCP services.