
M05: Function Calling Fundamentals

This is the pivotal moment in the course. Claude goes from chatbot to agent. For the first time, Claude will do things — not just generate text. You'll build your first tool-using agent.

Learning Objectives

Explain how function calling works: Claude proposes, your code executes
Define tools with JSON Schema input parameters and clear descriptions
Implement the complete tool use loop: send → detect tool_use → execute → return result
Handle tool errors gracefully without crashing the agent loop
Build a multi-tool agent that chooses the right tool based on the user's question

What Is Tool Use / Function Calling?

Everyday Analogy

BEFORE: Imagine a super-smart assistant who can write beautifully, reason through complex problems, and hold a conversation — but is locked in a windowless room with no phone, no internet, and can't see the clock. Every answer about the outside world is a guess based on what they already know.

PAIN: When someone asks "What's the weather in Tokyo right now?", the assistant has no choice but to make something up or say "I don't know." They can't check a weather API, query a database, or look anything up. Their intelligence is trapped — brilliant but blind to the real world.

MAPPING: Function calling hands the assistant a set of labeled buttons — get_weather(), calculate(), ask_time() — each one connected to the outside world. Now when someone asks about Tokyo's weather, the assistant presses the get_weather button with "Tokyo" as input, reads the result that comes back, and gives a grounded, accurate answer. That's function calling: Claude gains the ability to reach outside its context window and interact with real systems, while you stay in control of what those buttons actually do.

Technical Definition

When you call the Messages API, you can include a tools parameter — an array of tool definitions that describe what functions Claude is allowed to request.

Each definition has three parts. First, a name — this is how your code identifies which function was called. Second, a description — the plain-English explanation Claude reads to decide when this tool is useful. Third, an input_schema — a JSON Schema object that specifies what arguments the function accepts, their types, and which ones are required. Of these three, the description matters most — it's what Claude uses to decide which tool fits the user's request.

Here's the key idea: Claude never runs your code. When Claude decides it needs a tool, it ends its turn with stop_reason "tool_use" and returns a tool_use content block (sometimes preceded by a short text block explaining what it is about to do). In plain terms, the tool_use block is a structured JSON object that says "I want to call get_weather with {"city": "Tokyo"}." Your application code reads that request, runs the actual function on your server, and sends the output back as a tool_result block inside a user message. Claude then uses that result to write its final answer.

Why this design? Because it keeps you in control. Claude can suggest calling any tool you've defined, but your code is the gatekeeper — you validate inputs, enforce rate limits, check permissions, and decide whether to actually execute. This separation of "deciding what to do" (Claude) from "actually doing it" (your code) is what makes tool use safe for production systems.

Why It Matters

Without tool use, Claude is limited to what it learned during training, which has a knowledge cutoff date and no access to your private data. With tool use, Claude can query your production database, check a live inventory API, send an email, or trigger a deployment. A customer support agent with 5 tools (order lookup, refund processor, FAQ search, ticket creator, escalation handler) can resolve a large share of routine tickets on its own. That's the difference between a chatbot that says "I'd recommend checking your order status" and an agent that says "Your order #4829 shipped yesterday via FedEx and arrives Thursday."

So what does this look like in practice? Here's the actual JSON that Claude returns when it wants to call a tool:

{
  "type": "tool_use",
  "id": "toolu_01A09q90qw90lq917835lq9",
  "name": "get_weather",
  "input": { "city": "Tokyo" }
}

That's it — a content block inside Claude's response with type: "tool_use". Let's break it down field by field. The id is a unique identifier you'll need to reference when sending results back (think of it as a receipt number for this specific tool call). The name tells you which function to call. The input object contains the arguments Claude wants to pass, matching the JSON Schema you defined.

And here's what the tool_result looks like when you send data back to Claude:

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": "{\"temp\": 22, \"condition\": \"sunny\", \"humidity\": 65}"
    }
  ]
}

Notice the tool_use_id matches the id from Claude's request above — that's how Claude knows which tool call this result belongs to. The result goes inside a "user" message because, from Claude's perspective, you (the application) are providing new information. No magic — just structured data flowing back and forth.
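To make the id pairing concrete, here is a minimal Python sketch that takes a tool_use block (as a plain dict shaped like the JSON above) and wraps a tool's output into the matching tool_result message. The helper name build_tool_result_message is hypothetical; it is not part of the Anthropic SDK.

```python
import json


def build_tool_result_message(tool_use_block: dict, output: dict) -> dict:
    """Wrap a tool's output in a user message that answers one tool_use block."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                # Must echo the id from Claude's request so Claude can pair them up
                "tool_use_id": tool_use_block["id"],
                # Structured output is serialized to a string, as in the example above
                "content": json.dumps(output),
            }
        ],
    }


tool_use = {
    "type": "tool_use",
    "id": "toolu_01A09q90qw90lq917835lq9",
    "name": "get_weather",
    "input": {"city": "Tokyo"},
}
message = build_tool_result_message(tool_use, {"temp": 22, "condition": "sunny", "humidity": 65})
print(message["content"][0]["content"])  # → {"temp": 22, "condition": "sunny", "humidity": 65}
```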

Security: Critical Insight

Claude does NOT run tools. Claude ASKS to run tools. You execute them. This is critical for security: you control what code runs, with what permissions, and in what environment. Never let Claude construct arbitrary code or commands to execute.

⚠️ Common Misconceptions

"Claude executes the tool, right?" — No. This is the single most important misconception to clear up. Claude never executes anything. It returns a JSON object that says "I'd like to call this function with these arguments." Your code reads that request, decides whether to honor it, runs the actual function, and sends back the result. Claude is the decision-maker; your application is the executor.

"Tools are like ChatGPT plugins?" — Not quite. Plugins run on someone else's infrastructure with a fixed interface you can't customize. With Claude's function calling, YOU define the tools, YOU write the implementation, and YOU host the execution. There's no plugin marketplace or approval process — any function you can write is a potential tool.

"More tools = more capable agent?" Often the opposite. Tool selection accuracy tends to degrade as the tool list grows, especially when tools overlap in purpose, so a small set of well-described tools usually beats a long menu. Each additional tool means more descriptions for Claude to evaluate, more chances for ambiguity, and more input tokens on every request. Module 6 covers strategies for managing many tools effectively.

"The tool description is just a label — Claude understands from the name" — Names help, but Claude heavily relies on the description to decide when and how to use a tool. A tool named search with description "Search" gives Claude almost nothing to work with. Does it search the web? A database? Files? The description is where you tell Claude what data source the tool hits, what format the output comes in, and when this tool is appropriate versus other options.

"If the tool fails, the whole agent crashes" — Only if you let it. Best practice is to catch exceptions and return the error as the tool_result content. Claude can then tell the user what went wrong or try an alternative approach. The agent loop keeps running — only an unhandled exception in YOUR code kills it.

The Tool Use Loop, Step by Step

1. User sends a message: "What's the weather in Tokyo?" The journey begins here.
2. Claude receives the message plus your tool definitions (tools: [get_weather, calculate]).
3. Claude returns a tool_use block, get_weather({city: "Tokyo"}), with stop_reason: "tool_use".
4. YOUR CODE executes the function: result: {temp: 22, condition: "sunny"}. You control execution!
5. You send a tool_result back to Claude (tool_result: {temp: 22, ...}), including the tool_use_id.
6. Claude formulates the final response, "It's 22°C and sunny in Tokyo!", with stop_reason: "end_turn".

Diagram: The Tool Use Loop — Circular Flow

Defining Tools

A tool definition is a JSON object you pass to the API that tells Claude "here's a function you can ask me to call." It has three parts: name, description, and input_schema.

Under the hood, Claude processes tool definitions as part of the system context. When a user sends a message, Claude reads all available tool descriptions, evaluates which one (if any) matches the user's intent, and either calls the best match or responds directly without tools. This is fundamentally different from keyword matching — Claude understands semantics. A user saying "how hot is it in Tokyo?" will match a tool described as "Get current weather for a city" even though the word "weather" never appears in the query.

If you've used APIs before, think of tool definitions as similar to API documentation, but instead of a human developer reading the docs, Claude reads them. The input_schema uses JSON Schema, a standard for describing the shape of JSON data. You may have seen it in OpenAPI or Swagger specs. Here's one key difference from M04: in structured output, the schema described what came OUT of Claude. Here, the schema describes what goes INTO your function. And of the three parts (name, description, schema), the description is by far the most important: names help, but Claude leans on descriptions to decide which tool fits.

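Claude's Tool Selection

User: "What's the weather in Tokyo?"

Claude compares the request against every tool description:

get_weather: Get current weather for a city
calculate: Evaluate a math expression
get_time: Get time in a timezone
search_db: Search a database by query

Only get_weather's description fits the request, so that is the tool Claude calls.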

Common Mistake: Lazy Descriptions

Here's a trap almost every beginner falls into: you name your tool search, set the description to "Search", and wonder why Claude picks the wrong tool. The problem is that Claude relies heavily on the description to decide when and how to use a tool. A one-word description gives Claude almost nothing to work with. Does it search the web? A database? Files on disk?

A great description includes four things: (1) what the tool does, (2) what data source it queries, (3) what format the output comes in, and (4) when this tool is appropriate versus other options. Spend more time writing descriptions than writing the tool implementation — descriptions are the #1 lever for tool selection accuracy.
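Here is the same tool defined both ways, as a Python sketch. The tool name search_help_articles, the help-center data source, the return fields, and the routing hint in the good description are all hypothetical; swap in whatever your tool actually does.

```python
# A lazy definition: Claude can't tell what "search" hits or when to prefer it
lazy_tool = {
    "name": "search",
    "description": "Search",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# A descriptive definition (hypothetical tool) covering the four elements:
# (1) what it does, (2) data source, (3) output format, (4) when to use it
good_tool = {
    "name": "search_help_articles",
    "description": (
        "Search the company's internal help-center articles by keyword. "  # (1) + (2)
        "Returns up to 5 matches as a JSON list of {title, url, snippet}. "  # (3)
        "Use for how-to and policy questions; do NOT use for order status, "  # (4)
        "which needs a separate order lookup tool."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keywords from the user's question, e.g. 'reset password'",
            }
        },
        "required": ["query"],
    },
}
```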

🎓 Cert Tip — Domain 2.1

Tool descriptions are critical for selection accuracy. Include: what the tool does, input format with examples, expected output format, and edge cases. Poor descriptions = Claude picks the wrong tool.

Domain Connection: UCC Pipeline

In the UCC filing pipeline (our capstone domain), you'd define tools like: search_filings(debtor_name) to query the Gold layer in BigQuery, check_entity_risk(entity_id) to pull risk profiles, and get_amendment_history(filing_id) to trace changes over time. Each tool maps to a specific data source, and the descriptions would tell Claude exactly when to use each one — e.g., "Search UCC filings by debtor name. Returns filing number, secured party, collateral description, and filing date. Use when the user asks about a company's liens or secured debts."
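As a sketch, the search_filings tool from the paragraph above could be declared like this in Python. The description text comes from that paragraph; the schema details (a single required debtor_name string and its example value) are assumptions, and the BigQuery query behind the tool is not shown.

```python
search_filings_tool = {
    "name": "search_filings",
    "description": (
        "Search UCC filings by debtor name. Returns filing number, secured party, "
        "collateral description, and filing date. Use when the user asks about a "
        "company's liens or secured debts."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "debtor_name": {
                "type": "string",
                # Hypothetical example value for illustration only
                "description": "Legal name of the debtor, e.g. 'Acme Manufacturing LLC'",
            }
        },
        "required": ["debtor_name"],
    },
}
```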

Example Tool Definition

Diagram: Tool Definition Anatomy

{
  "name": "get_weather",
  "description": "Get the current weather for a specific city. Returns temperature in Celsius, conditions, and humidity.",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "City name, e.g. 'Tokyo' or 'New York'"
      }
    },
    "required": ["city"]
  }
}

The Tool Use Loop in Code

The tool use loop is the core pattern that turns Claude from a chatbot into an agent. Here's the idea in plain English: you send Claude a message, Claude thinks about it, and sometimes Claude decides it needs more information before answering. When that happens, Claude pauses and says "I'd like to call this tool." Your code runs the tool, sends back the result, and Claude picks up where it left off. This back-and-forth continues until Claude has everything it needs to give a final answer.

In code, this translates to a while loop. Each time Claude responds, you check a field called stop_reason. If stop_reason is "tool_use", Claude is waiting for a tool result — so you execute the tool, append the result to the conversation, and call the API again. If stop_reason is "end_turn", Claude is finished — you extract the text and return it to the user.

This is different from a simple request/response pattern. In a normal API call, you send one message and get one reply. With the tool use loop, a single user question might trigger 2, 3, or even 10 API round-trips as Claude calls different tools to gather information. Each round-trip adds to the conversation history, so Claude always has the full context of what it's already tried. The loop only exits when Claude decides it has enough information — signaled by "end_turn".

The While Loop Pattern

while stop_reason == "tool_use": (the core agent pattern)

1. Extract the tool name and arguments: get_weather({city: "Tokyo"})
2. Execute the tool function: weather_api.fetch("Tokyo")
3. Append the tool_result to messages: {temp: 22, condition: "sunny"}
4. Send the messages to Claude again. If stop_reason is still "tool_use", Claude has requested another tool, so loop back to step 1.

⚠️ Common Misconceptions: The Loop

"The loop runs forever if Claude keeps calling tools" — In theory, yes. In practice, Claude almost always converges within 2-5 iterations. But you should always add a safety cap (e.g., max_iterations = 10) so a misbehaving prompt can't burn through your API budget. Module 12 covers this in depth.

"I need to parse Claude's text to know when it's done" — Never do this. The stop_reason field is the only reliable signal. Don't scan Claude's response for phrases like "I'm finished" or "here's your answer" — those are unreliable and fragile.

"Each loop iteration is a new conversation" — No. Each iteration adds to the SAME messages array. Claude sees the full history — including every tool call and result from previous iterations — which is how it knows what it's already tried and what information it still needs.

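The three points above (stop_reason as the only exit signal, a hard iteration cap, and one growing messages list) fit in a few lines. This is an offline sketch: the scripted fake model is a hypothetical stand-in that replays responses shaped like Messages API responses (as plain dicts), so you can exercise the loop without an API key. In real code you would call client.messages.create instead.

```python
import json

MAX_ITERATIONS = 10  # Safety cap on tool-use round trips


def run_loop(call_model, tools_impl: dict, user_message: str):
    """Drive the tool loop; returns (final_text, messages)."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(MAX_ITERATIONS):
        response = call_model(messages)
        if response["stop_reason"] != "tool_use":
            # Done: decided by stop_reason, never by the wording of Claude's text
            text = "".join(b["text"] for b in response["content"] if b["type"] == "text")
            return text, messages
        results = []
        for block in response["content"]:
            if block["type"] == "tool_use":
                output = tools_impl[block["name"]](**block["input"])
                results.append({"type": "tool_result", "tool_use_id": block["id"],
                                "content": json.dumps(output)})
        # Same list every iteration: Claude sees every earlier call and result
        messages.append({"role": "assistant", "content": response["content"]})
        messages.append({"role": "user", "content": results})
    return "Stopped: hit MAX_ITERATIONS", messages


# Scripted responses standing in for the real API (hypothetical test double)
script = iter([
    {"stop_reason": "tool_use", "content": [
        {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {"city": "Tokyo"}}]},
    {"stop_reason": "end_turn", "content": [
        {"type": "text", "text": "It's 22°C and sunny in Tokyo!"}]},
])


def fake_model(messages):
    return next(script)


text, history = run_loop(fake_model,
                         {"get_weather": lambda city: {"temp": 22, "condition": "sunny"}},
                         "What's the weather in Tokyo?")
print(text)          # It's 22°C and sunny in Tokyo!
print(len(history))  # 3: user question, assistant tool_use, user tool_result
```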
🎓 Cert Tip — Domain 2.2

Never silently return empty results for access failures. {isError: false, results: []} means "nothing found." {isError: true, errorCategory: "access_denied"} means "couldn't even check." Claude makes catastrophically different decisions based on which one you return.
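A minimal Python sketch of the difference, assuming a hypothetical lookup_orders tool backed by an in-memory dict and a simple permission flag. The result field names follow the tip above; the data, customer ids, and the has_access parameter are invented for illustration.

```python
ORDERS = {"cust_42": [{"order_id": "4829", "status": "shipped"}]}  # Hypothetical data


def lookup_orders(customer_id: str, has_access: bool) -> dict:
    """Distinguish 'checked and found nothing' from 'could not check at all'."""
    if not has_access:
        # Couldn't even look: say so explicitly instead of returning []
        return {"isError": True, "errorCategory": "access_denied",
                "context": f"No permission to read orders for {customer_id}"}
    # Looked successfully: an empty list here genuinely means "no orders"
    return {"isError": False, "results": ORDERS.get(customer_id, [])}


print(lookup_orders("cust_99", has_access=True))   # nothing found
print(lookup_orders("cust_42", has_access=False))  # access failure
```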

🚀 Looking Ahead: Pre-Built Tools (Computer Use, Bash, Text Editor)

Everything in this module covered custom tools — functions you define and execute. Anthropic also ships a small set of pre-built tools that the API knows how to use natively, with no schema for you to write:

Computer Use (computer_20250124) — Claude takes screenshots, clicks, types, and scrolls a real desktop. Used for browser automation, form filling, QA testing, and accessibility flows that have no API.
Bash (bash_20250124) — Claude runs shell commands. Common in dev-environment agents (build, test, grep, install).
Text Editor (text_editor_20250429) — Claude reads, writes, and edits files with an Anthropic-defined schema. Used in coding agents and Claude Code itself.

These work like the custom tools you just wrote (same tool_use / tool_result loop), but the schema is fixed by Anthropic, and you provide the execution sandbox: typically a container or VM with a virtual display, a shell, and a workspace directory. M24 covers Computer Use in depth, including the sandboxing requirements that keep these tools safe to deploy.

Error Handling & Edge Cases

When tools fail — and they will — the worst thing you can do is crash the agent loop. Instead, the pattern is straightforward: catch the error, describe it in plain language, and send that description back as the tool_result. Claude then has enough information to inform the user, try a different approach, or gracefully give up.

Here's how this works internally. When your code catches an exception inside a tool function, you have two choices. You could let the exception propagate, which kills the while loop and likely returns an ugly stack trace to the user. Or you could wrap the call in a try/except, format the error into a JSON object like {"error": "City not found: Tokyoo"}, and return it as the tool_result. Claude reads that result, realizes something went wrong, and responds accordingly — often saying "I couldn't find weather data for 'Tokyoo'. Did you mean Tokyo?" That's a much better user experience.
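Here is that try/except pattern as a Python sketch. fetch_weather is a hypothetical tool that raises on an unknown city; the execute_tool wrapper (also hypothetical) turns any exception into a tool_result whose content is a readable error. It also sets is_error, an optional boolean on tool_result blocks in the Messages API that marks the content as an error.

```python
import json


def fetch_weather(city: str) -> dict:
    """Hypothetical tool: raises for cities it doesn't know."""
    known = {"Tokyo": {"temp": 22, "condition": "sunny"}}
    if city not in known:
        raise LookupError(f"City not found: {city}")
    return known[city]


def execute_tool(tool_use_id: str, func, args: dict) -> dict:
    """Run a tool and always return a tool_result block, never an exception."""
    try:
        content = json.dumps(func(**args))
        failed = False
    except Exception as exc:
        # Report the failure as data Claude can reason about, not a stack trace
        content = json.dumps({"error": str(exc)})
        failed = True
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "content": content, "is_error": failed}


print(execute_tool("toolu_01", fetch_weather, {"city": "Tokyoo"}))
```

Missing or unexpected arguments surface as a TypeError, which the same except branch catches, so a malformed request from Claude also comes back as data.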

This approach differs from traditional error handling because the "caller" (Claude) isn't your code — it's a language model that can reason about failures. In normal programming, you handle errors with if/else branches. With tool use, you report errors as data, and Claude decides what to do. That means your error messages need to be descriptive enough for Claude to make a good decision.

There are four main failure modes to handle:

Tool timeout: Set a time limit (e.g., 10 seconds) for each tool call. If it exceeds the limit, return a timeout message as the tool_result so Claude can tell the user to try again.
Tool failure: Return a human-readable error description, not a raw stack trace. Claude needs to understand what went wrong, not debug your Python code.
Invalid arguments: Validate inputs before executing. If Claude sends "city": 123 instead of a string, return a helpful error message explaining what format you expected.
Non-existent tool: This is rare (Claude usually sticks to defined tools), but handle it with a fallback that returns {"error": "Unknown tool: tool_name"}. This prevents a crash if Claude hallucinates a tool name.
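One way to cover all four failure modes in a single Python dispatcher, as a sketch: it checks the tool name, validates required and string-typed arguments against the tool's input_schema (a hand-rolled check, not full JSON Schema validation), runs the tool in a worker thread with a time limit, and converts any other exception into an error message. The dispatch and validate_args helpers, the slow_lookup tool, and the 10-second default are illustrative. Note that a timed-out thread keeps running in the background; Python cannot forcibly stop it.

```python
import concurrent.futures
import json
import time

_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def validate_args(schema: dict, args: dict):
    """Return an error string if required args are missing or strings are mistyped, else None."""
    for key in schema.get("required", []):
        if key not in args:
            return f"Missing required argument '{key}'"
    for key, spec in schema.get("properties", {}).items():
        if key in args and spec.get("type") == "string" and not isinstance(args[key], str):
            return f"Argument '{key}' must be a string, got {type(args[key]).__name__}"
    return None


def dispatch(name: str, args: dict, registry: dict, timeout_s: float = 10.0) -> str:
    """Return a JSON string for the tool_result content; never raises."""
    if name not in registry:                      # Non-existent tool
        return json.dumps({"error": f"Unknown tool: {name}"})
    func, schema = registry[name]
    problem = validate_args(schema, args)         # Invalid arguments
    if problem:
        return json.dumps({"error": problem})
    future = _EXECUTOR.submit(func, **args)
    try:
        return json.dumps(future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:       # Tool timeout
        return json.dumps({"error": f"{name} timed out after {timeout_s}s; try again"})
    except Exception as exc:                      # Tool failure
        return json.dumps({"error": f"{name} failed: {exc}"})


def slow_lookup(city: str) -> dict:
    time.sleep(0.5)  # Simulates a slow upstream API
    return {"city": city}


CITY_SCHEMA = {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
REGISTRY = {"slow_lookup": (slow_lookup, CITY_SCHEMA)}

print(dispatch("slow_lookup", {"city": 123}, REGISTRY))                    # invalid arguments
print(dispatch("slow_lookup", {"city": "Tokyo"}, REGISTRY, timeout_s=0.1))  # timeout
print(dispatch("get_stock_price", {}, REGISTRY))                           # unknown tool
```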

Security Warning

NEVER let Claude construct arbitrary code or shell commands to execute. Tools are YOUR pre-defined functions with controlled inputs. This is the security model that makes agent tool use safe.

🎓 Cert Tip — Domain 2.2

Return structured errors from tools: {isError: true, errorCategory: "auth_failure", isRetryable: true, context: "Token expired"}. Anti-pattern: generic "Operation failed" — Claude can't decide to retry, try alternatives, or escalate.
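A sketch of how a tool wrapper might map exceptions onto the structured fields from this tip. TokenExpiredError is a hypothetical exception class, and the category names other than auth_failure, as well as the exception-to-category mapping, are assumptions for illustration; adapt them to the errors your real backends raise.

```python
class TokenExpiredError(Exception):
    """Hypothetical error a backend client might raise when its auth token expires."""


def structured_error(exc: Exception) -> dict:
    """Translate an exception into fields Claude can act on: retry, try another tool, or escalate."""
    if isinstance(exc, TokenExpiredError):
        category, retryable = "auth_failure", True        # Refresh the token and retry
    elif isinstance(exc, PermissionError):
        category, retryable = "access_denied", False      # Retrying won't help
    elif isinstance(exc, (TimeoutError, ConnectionError)):
        category, retryable = "transient", True           # Worth another attempt
    elif isinstance(exc, ValueError):
        category, retryable = "invalid_input", False      # Claude should fix the arguments
    else:
        category, retryable = "internal_error", False
    return {"isError": True, "errorCategory": category,
            "isRetryable": retryable, "context": str(exc)}


print(structured_error(TokenExpiredError("Token expired")))
# → {'isError': True, 'errorCategory': 'auth_failure', 'isRetryable': True, 'context': 'Token expired'}
```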

Code Walkthrough: Multi-Tool Agent

Conceptual Bridge: You've seen the tool use loop as a diagram — user sends message, Claude returns a tool_use block, your code executes, you send the result back, and Claude responds. Now it's time to translate that six-step loop into real, runnable code. The code below implements everything you just learned: tool definitions with JSON Schema, the while loop that keeps running until stop_reason is "end_turn", a dispatcher that routes tool requests to actual functions, and error handling at every step. Read through the chunk annotations before each section to understand what each part does and why.

Complete Tool-Using Agent

This agent has 5 logical sections. Read the annotations to understand each chunk before looking at the code.

Chunk 1 — Setup & Tool Definitions
Let's start with the foundation. The first thing the code does is initialize the Anthropic client and define two tools: get_weather and calculate. Each tool gets a name, a description, and a JSON Schema specifying what arguments it accepts. The interesting part is the description field — this is literally what Claude reads to decide which tool fits the user's request. Think of it as a menu item description at a restaurant: vague descriptions like "Search" are like labeling a dish "Food" — nobody knows what they're getting, and Claude will pick the wrong tool.

Chunk 2 — Mock Tool Implementations
Next up, the actual functions that run when Claude requests a tool. Right now these are mocks — get_weather returns hardcoded data for a few cities, and calculate evaluates math expressions. In production, you'd replace these with real API calls or database queries. The point of using mocks is that they let you develop and test the entire agent loop without worrying about API keys, network errors, or rate limits. Here's the dilemma with calculate: we need to evaluate a math expression, but Python's eval() can execute arbitrary code if you're not careful. That's why the function validates that the input contains only math characters (0-9 + - * / . ( )) before evaluating. Never skip this step — unsanitized eval is a textbook security vulnerability.

Chunk 3 — The Agent Loop (while True)
This is the heart of the agent — the while True loop that keeps the conversation going until Claude is done. Here's how each iteration works: send the current messages to Claude, check the stop_reason in the response. If it's "tool_use", Claude wants to call a tool, so you execute it and loop again. If it's "end_turn", Claude is finished and you return the final text. The key gotcha that trips up most beginners: you must append both Claude's assistant message AND the tool_result user message to the conversation history before the next API call. Forget either one and the API returns an error — Claude needs to see the full conversation to know where it left off.

Chunk 4 — Tool Dispatch & Error Handling
Inside the loop, the code needs to figure out which function to actually call. That's what the dispatcher dictionary (tool_functions) does — it maps tool names to Python functions, so "get_weather" routes to get_weather(). The clever bit is what happens when Claude requests a tool that doesn't exist in the dictionary: instead of crashing with a KeyError, the code returns an error message as the tool_result. Claude reads that error and can inform the user or try something else. This is a recurring pattern you'll see in every agent module going forward: errors are data, not crashes.

Chunk 5 — Test Cases
Finally, the code runs three test queries that exercise every path through the agent. The first ("What's the weather in Tokyo?") requires the weather tool. The second ("What's 15% of 850?") requires the calculator. The third ("What's the capital of France?") needs no tool at all — Claude knows the answer from training data and responds directly. Testing all three paths is important because it proves your agent handles tool selection, tool execution, AND the no-tool case correctly. If any of these three fails, you've got a bug in your loop logic.

# pip install "anthropic>=0.39.0"
import anthropic
import json

client = anthropic.Anthropic()

# Define tools
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Returns temp (Celsius), condition, humidity.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Tokyo'"}
            },
            "required": ["city"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use for any math computation.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression, e.g. '15 * 3 + 7'"}
            },
            "required": ["expression"]
        }
    }
]

# Mock tool implementations
def get_weather(city: str) -> dict:
    """Mock weather API."""
    data = {"Tokyo": {"temp": 22, "condition": "sunny", "humidity": 65},
            "London": {"temp": 14, "condition": "cloudy", "humidity": 80},
            "New York": {"temp": 28, "condition": "partly cloudy", "humidity": 70}}
    return data.get(city, {"error": f"No data for {city}"})

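def calculate(expression: str) -> dict:
    """Math evaluator restricted to digits, operators, and parentheses."""
    try:
        # Only allow math characters; letters, underscores, and quotes are rejected
        allowed = set("0123456789+-*/.() ")
        if not all(c in allowed for c in expression):
            return {"error": "Invalid characters in expression"}
        # Empty builtins as a second layer of defense. '**' can still build huge
        # numbers, so production code should use a real expression parser instead.
        result = eval(expression, {"__builtins__": {}}, {})
        return {"result": result}
    except Exception as e:
        return {"error": str(e)}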

# Tool dispatcher
tool_functions = {"get_weather": get_weather, "calculate": calculate}

MAX_ITERATIONS = 10  # Safety cap so a runaway tool loop can't burn through your API budget

def run_agent(user_message: str) -> str:
    """Run the agent loop until Claude produces a final response."""
    messages = [{"role": "user", "content": user_message}]
    iterations = 0

    while True:
        iterations += 1
        if iterations > MAX_ITERATIONS:
            return "Stopped: too many tool-use iterations."

        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                tools=tools,
                messages=messages,
            )
        except anthropic.APIError as e:
            # Connection errors carry no status code, so fall back gracefully
            return f"API error: {getattr(e, 'status_code', 'n/a')} - {e.message}"

        # Check if Claude wants to use a tool
        if response.stop_reason == "tool_use":
            # Process each tool_use block (Claude may request several in one turn)
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    func = tool_functions.get(block.name)
                    if func is None:
                        result = {"error": f"Unknown tool: {block.name}"}
                    else:
                        try:
                            result = func(**block.input)
                        except Exception as e:
                            # Bad arguments or a crash inside the tool become data, not a dead loop
                            result = {"error": f"{block.name} failed: {e}"}

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result),
                    })

            # Add Claude's response and tool results to conversation
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

        else:
            # Any other stop_reason ("end_turn", "max_tokens", ...) ends the loop
            text = "".join(b.text for b in response.content if b.type == "text")
            return text or "No text response from Claude."

# Test it!
print(run_agent("What's the weather in Tokyo?"))
print()
print(run_agent("What's 15% of 850?"))
print()
print(run_agent("What's the capital of France?"))  # No tool needed

// npm install @anthropic-ai/sdk
// Save as agent.mjs (or set "type": "module" in package.json): top-level await needs an ES module
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const tools = [
  {
    name: 'get_weather',
    description: 'Get current weather for a city. Returns temp (Celsius), condition, humidity.',
    input_schema: {
      type: 'object',
      properties: {
        city: { type: 'string', description: "City name, e.g. 'Tokyo'" }
      },
      required: ['city']
    }
  },
  {
    name: 'calculate',
    description: 'Evaluate a mathematical expression. Use for any math computation.',
    input_schema: {
      type: 'object',
      properties: {
        expression: { type: 'string', description: "Math expression, e.g. '15 * 3 + 7'" }
      },
      required: ['expression']
    }
  }
];

// Mock tool implementations: each takes the tool's input object directly
function getWeather({ city }) {
  const data = {
    Tokyo: { temp: 22, condition: 'sunny', humidity: 65 },
    London: { temp: 14, condition: 'cloudy', humidity: 80 },
    'New York': { temp: 28, condition: 'partly cloudy', humidity: 70 },
  };
  return data[city] || { error: `No data for ${city}` };
}

function calculate({ expression }) {
  try {
    // Only digits, operators, dots, parentheses, and spaces get through
    const allowed = /^[0-9+\-*/.() ]+$/;
    if (!allowed.test(expression)) return { error: 'Invalid characters' };
    const result = Function(`"use strict"; return (${expression})`)();
    return { result };
  } catch (e) { return { error: e.message }; }
}

const toolFunctions = { get_weather: getWeather, calculate };

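const MAX_ITERATIONS = 10; // Safety cap on tool-use round trips

async function runAgent(userMessage) {
  const messages = [{ role: 'user', content: userMessage }];
  let iterations = 0;

  while (true) {
    if (++iterations > MAX_ITERATIONS) return 'Stopped: too many tool-use iterations.';

    let response;
    try {
      response = await client.messages.create({
        model: 'claude-sonnet-4-6',
        max_tokens: 1024,
        tools,
        messages,
      });
    } catch (error) {
      if (error instanceof Anthropic.APIError) {
        return `API error: ${error.status} - ${error.message}`;
      }
      throw error;
    }

    if (response.stop_reason === 'tool_use') {
      const toolResults = [];
      for (const block of response.content) {
        if (block.type === 'tool_use') {
          // Own-property check so names like "constructor" don't hit Object.prototype
          const func = Object.hasOwn(toolFunctions, block.name) ? toolFunctions[block.name] : null;
          let result;
          if (!func) {
            result = { error: `Unknown tool: ${block.name}` };
          } else {
            try {
              // Pass the input object as-is rather than relying on key order
              result = func(block.input);
            } catch (e) {
              result = { error: `${block.name} failed: ${e.message}` };
            }
          }

          toolResults.push({
            type: 'tool_result',
            tool_use_id: block.id,
            content: JSON.stringify(result),
          });
        }
      }
      messages.push({ role: 'assistant', content: response.content });
      messages.push({ role: 'user', content: toolResults });
    } else {
      // Any other stop_reason ("end_turn", "max_tokens", ...) ends the loop
      const textBlock = response.content.find(b => b.type === 'text');
      return textBlock?.text || 'No text response from Claude.';
    }
  }
}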

console.log(await runAgent("What's the weather in Tokyo?"));
console.log();
console.log(await runAgent("What's 15% of 850?"));
console.log();
console.log(await runAgent("What's the capital of France?"));

Expected Output (Claude's wording will vary from run to run):

The weather in Tokyo is currently 22°C and sunny with 65% humidity.

15% of 850 is 127.5.

The capital of France is Paris.

What Just Happened? You just built a complete tool-using AI agent. Here's what the code does end-to-end: it defines two tools with JSON Schema descriptions, implements mock functions for each, and runs a while True loop that keeps calling the Messages API until Claude stops requesting tools. For the first query ("weather in Tokyo"), Claude recognized it needed get_weather, returned a tool_use block, your code executed the mock function, sent back the result, and Claude composed a natural-language answer from the data. For the second query, Claude picked the calculate tool instead. For the third query ("capital of France"), Claude knew the answer from its training data and responded directly without calling any tool — the stop_reason was "end_turn" on the first response, so the loop exited immediately. This is the foundational agent pattern that every module from here forward builds upon.
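The walkthrough's calculate tool relies on a character whitelist plus eval. If you'd rather not call eval at all, one common alternative (a swapped-in technique, not what this module's code uses) is to parse the expression with Python's ast module and evaluate only arithmetic nodes. This is a sketch; safe_eval is a hypothetical helper, and the exponent limit is an arbitrary guard, not a complete defense against expensive expressions.

```python
import ast
import operator

_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
MAX_EXPONENT = 100  # Arbitrary cap that blocks the worst '**' blowups such as 9**9**9


def safe_eval(expression: str):
    """Evaluate +, -, *, /, %, ** and parentheses on numbers; returns an int or float."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
                raise ValueError("Exponent too large")
            return _BIN_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        # Names, calls, attributes, subscripts, etc. all end up here
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


print(safe_eval("850 * 15 / 100"))  # 127.5
```

Because function calls are rejected at the syntax-tree level, input like __import__('os') raises ValueError instead of running; wrap safe_eval in the same try/except as calculate so that error flows back to Claude as a tool_result.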

Hands-On Exercise

Conceptual Bridge: You've seen how tool definitions work, walked through the while loop pattern, and studied a complete multi-tool agent. Now it's time to build one yourself from scratch. The exercise below follows the same pattern — define tools, implement mock functions, wire up the loop — but adds a third tool (get_time) so you practice the full flow with more variety.

What You'll Build

A 3-tool agent with get_weather, calculate, and get_time tools that handles tool calls, no-tool questions, and error cases — all with the while stop_reason == "tool_use" loop pattern.

Time estimate: 25–35 minutes • Prerequisites: M01-M04 labs complete • Files you'll create: tool_agent.py (or tool_agent.mjs)

Environment Setup

mkdir m05-lab && cd m05-lab
python -m venv venv && source venv/bin/activate   # Windows: venv\Scripts\activate
pip install "anthropic>=0.39.0"
export ANTHROPIC_API_KEY="your-key-here"           # Windows: set ANTHROPIC_API_KEY=your-key-here

mkdir m05-lab && cd m05-lab
npm init -y && npm install @anthropic-ai/sdk
export ANTHROPIC_API_KEY="your-key-here"           # Windows: set ANTHROPIC_API_KEY=your-key-here

Step 1: Define Three Tools with Mock Implementations

This step creates the tool definitions (what Claude sees) and mock implementations (what your code runs). Using mock data means you can test the full loop pattern without needing real API keys for weather services or time APIs. The mock implementations return realistic but fixed data.

Create a new file called tool_agent.py (or tool_agent.mjs):

import anthropic
import json

client = anthropic.Anthropic()

# --- Tool definitions (what Claude sees) ---
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Returns temperature (Celsius), condition, and humidity.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Tokyo' or 'New York'"}
            },
            "required": ["city"],
        },
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Supports +, -, *, /, %, **, and parentheses.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression, e.g. '(15 * 7) + 23'"}
            },
            "required": ["expression"],
        },
    },
    {
        "name": "get_time",
        "description": "Get the current time in a specific timezone. Returns time in HH:MM format.",
        "input_schema": {
            "type": "object",
            "properties": {
                "timezone": {"type": "string", "description": "Timezone, e.g. 'US/Eastern', 'Asia/Tokyo', 'Europe/London'"}
            },
            "required": ["timezone"],
        },
    },
]

# --- Mock implementations (what your code runs) ---
MOCK_WEATHER = {
    "tokyo": {"temp": 22, "condition": "sunny", "humidity": 45},
    "london": {"temp": 14, "condition": "cloudy", "humidity": 72},
    "new york": {"temp": 28, "condition": "partly cloudy", "humidity": 60},
}

def run_tool(name: str, args: dict) -> str:
    """Execute a tool and return the result as a JSON string."""
    if name == "get_weather":
        city = args.get("city", "").lower()
        data = MOCK_WEATHER.get(city)
        if data:
            return json.dumps(data)
        return json.dumps({"error": f"City '{args.get('city')}' not found. Available: Tokyo, London, New York"})

    elif name == "calculate":
        expr = args.get("expression", "")
        try:
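            # Character whitelist: digits, + - * / % . ( ) and spaces.
            # This rules out names and function calls, but very large powers
            # such as 9**9**9 can still take a long time to compute.
            allowed = set("0123456789+-*/%.() ")
            if not all(c in allowed for c in expr):
                return json.dumps({"error": f"Invalid characters in expression: {expr}"})
            result = eval(expr, {"__builtins__": {}}, {})  # evaluate with no builtins available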
            return json.dumps({"result": result})
        except Exception as e:
            return json.dumps({"error": f"Calculation failed: {str(e)}"})

    elif name == "get_time":
        tz = args.get("timezone", "")
        # Mock: return a fixed time
        times = {"us/eastern": "14:30", "asia/tokyo": "03:30", "europe/london": "19:30"}
        time_str = times.get(tz.lower(), "12:00")
        return json.dumps({"timezone": tz, "time": time_str})

    return json.dumps({"error": f"Unknown tool: {name}"})

print(f"Defined {len(TOOLS)} tools: {[t['name'] for t in TOOLS]}")
print(f"Test: {run_tool('get_weather', {'city': 'Tokyo'})}")

Node.js version (tool_agent.mjs):

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const TOOLS = [
  {
    name: 'get_weather',
    description: 'Get current weather for a city. Returns temperature (Celsius), condition, and humidity.',
    input_schema: {
      type: 'object',
      properties: { city: { type: 'string', description: "City name, e.g. 'Tokyo'" } },
      required: ['city'],
    },
  },
  {
    name: 'calculate',
    description: 'Evaluate a mathematical expression. Supports +, -, *, /, %, **, and parentheses.',
    input_schema: {
      type: 'object',
      properties: { expression: { type: 'string', description: "Math expression, e.g. '(15 * 7) + 23'" } },
      required: ['expression'],
    },
  },
  {
    name: 'get_time',
    description: 'Get the current time in a specific timezone.',
    input_schema: {
      type: 'object',
      properties: { timezone: { type: 'string', description: "e.g. 'US/Eastern', 'Asia/Tokyo'" } },
      required: ['timezone'],
    },
  },
];

const MOCK_WEATHER = {
  tokyo: { temp: 22, condition: 'sunny', humidity: 45 },
  london: { temp: 14, condition: 'cloudy', humidity: 72 },
  'new york': { temp: 28, condition: 'partly cloudy', humidity: 60 },
};

function runTool(name, args) {
  if (name === 'get_weather') {
    const data = MOCK_WEATHER[args.city?.toLowerCase()];
    return data ? JSON.stringify(data) : JSON.stringify({ error: `City '${args.city}' not found` });
  }
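  if (name === 'calculate') {
    const expr = String(args.expression ?? '');
    // Whitelist digits, arithmetic operators, parentheses, and spaces so no
    // identifiers or function calls can reach the Function constructor
    if (!/^[0-9+\-*\/%.() ]*$/.test(expr)) {
      return JSON.stringify({ error: `Invalid characters in expression: ${expr}` });
    }
    try {
      const result = Function(`"use strict"; return (${expr})`)();
      return JSON.stringify({ result });
    } catch (e) { return JSON.stringify({ error: `Calculation failed: ${e.message}` }); }
  }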
  if (name === 'get_time') {
    const times = { 'us/eastern': '14:30', 'asia/tokyo': '03:30', 'europe/london': '19:30' };
    return JSON.stringify({ timezone: args.timezone, time: times[args.timezone?.toLowerCase()] || '12:00' });
  }
  return JSON.stringify({ error: `Unknown tool: ${name}` });
}

console.log(`Defined ${TOOLS.length} tools: ${TOOLS.map(t => t.name).join(', ')}`);
console.log(`Test: ${runTool('get_weather', { city: 'Tokyo' })}`);

Run it: python tool_agent.py (or node tool_agent.mjs)

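Expected Output (Python shown; the Node.js version prints the same data with slightly different list and JSON formatting):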

Defined 3 tools: ['get_weather', 'calculate', 'get_time']
Test: {"temp": 22, "condition": "sunny", "humidity": 45}

✅ Checkpoint: If you see 3 tools listed and the weather JSON for Tokyo, Step 1 is working. Your tools are defined and the mock implementations return valid JSON.
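The get_time mock returns fixed strings. If you later want real clock times, here is a minimal sketch of a replacement for that branch. It uses Python's standard zoneinfo module (Python 3.9+) and assumes the IANA time zone database is available. On Windows you may need to install the tzdata package. The function name get_time_real is hypothetical. Unknown zones produce an error result rather than an exception, which matches how the lab treats other tool failures.

```python
import json
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def get_time_real(timezone: str) -> str:
    """Return the current time in an IANA timezone as a JSON string (HH:MM)."""
    try:
        tz = ZoneInfo(timezone)
    except (ZoneInfoNotFoundError, ValueError) as e:
        # Unknown or malformed zone names become an error result, not a crash
        return json.dumps({"error": f"Unknown timezone '{timezone}': {e}"})
    now = datetime.now(tz)
    return json.dumps({"timezone": timezone, "time": now.strftime("%H:%M")})


print(get_time_real("Asia/Tokyo"))
print(get_time_real("Mars/Olympus_Mons"))
```

To try it, call get_time_real(args.get("timezone", "")) from the get_time branch of run_tool.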

Troubleshooting Step 1

ModuleNotFoundError: No module named 'anthropic' — Run pip install "anthropic>=0.39.0" (the quotes stop the shell from treating >= as an output redirect). Make sure your virtual environment is activated.
SyntaxError in the TOOLS list — Check for missing commas between tool definitions. Each tool dictionary must be separated from the next one by a comma.
Output shows {"error": "City 'Tokyo' not found"} — The mock lookup uses .lower(). Make sure the MOCK_WEATHER keys are lowercase: "tokyo", not "Tokyo".
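The character whitelist in calculate blocks names and function calls. However, eval still accepts inputs such as 9**9**9**9, which can tie up the process. One common alternative is to parse the expression with Python's ast module and evaluate only arithmetic nodes. The sketch below is a swapped-in technique, not part of the lab's reference solution. The names safe_calculate and MAX_EXPONENT, and the exponent cap of 100, are illustrative choices.

```python
import ast
import json
import operator

# Map AST operator node types to the arithmetic they perform
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
MAX_EXPONENT = 100  # illustrative cap so huge powers are rejected


def _eval_node(node):
    """Recursively evaluate numbers, binary ops, and unary +/-; reject everything else."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError(f"Exponent {right} exceeds limit of {MAX_EXPONENT}")
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Evaluate arithmetic only; return the result or an error as JSON."""
    try:
        tree = ast.parse(expression, mode="eval")
        return json.dumps({"result": _eval_node(tree.body)})
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError) as e:
        return json.dumps({"error": f"Calculation failed: {e}"})


print(safe_calculate("(15 * 7) + 23"))     # {"result": 128}
print(safe_calculate("__import__('os')"))  # {"error": "Calculation failed: Unsupported expression element: Call"}
```

Because only whitelisted node types are evaluated, a function call never runs. It is rejected as an unsupported Call node before anything executes.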

Step 2: Build the Agent Loop and Test Single-Tool Calls

Now let's wire up the while stop_reason == "tool_use" loop and test it with questions that each require exactly one tool. This is the core agent pattern you'll use for the rest of the course. This uses the TOOLS and run_tool() from Step 1.

Add the following to the end of tool_agent.py (or tool_agent.mjs for the Node.js version):

def agent_chat(user_message: str) -> str:
    """Run the full agent loop: send message, handle tool calls, return final answer."""
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )

        # If Claude is done, return the text
        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return "(no text response)"

        # If Claude wants to use a tool, execute it
        if response.stop_reason == "tool_use":
            # Append Claude's full response (including tool_use blocks)
            messages.append({"role": "assistant", "content": response.content})

            # Process each tool_use block
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  🔧 Tool call: {block.name}({json.dumps(block.input)})")
                    result = run_tool(block.name, block.input)
                    print(f"  📦 Result: {result[:80]}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })

            # Send all results back
            messages.append({"role": "user", "content": tool_results})
        else:
            return f"(unexpected stop_reason: {response.stop_reason})"

# Test with single-tool questions
test_questions = [
    "What's the weather like in Tokyo?",
    "What is (15 * 7) + 23?",
    "What time is it in London?",
    "What's the capital of France?",  # No tool needed!
]

for q in test_questions:
    print(f"\n{'='*50}")
    print(f"User: {q}")
    answer = agent_chat(q)
    print(f"Agent: {answer[:150]}")

Node.js version (append to tool_agent.mjs):

async function agentChat(userMessage) {
  const messages = [{ role: 'user', content: userMessage }];

  while (true) {
    const response = await client.messages.create({
      model: 'claude-sonnet-4-6',
      max_tokens: 1024,
      tools: TOOLS,
      messages,
    });

    if (response.stop_reason === 'end_turn') {
      for (const block of response.content) {
        if (block.type === 'text') return block.text;
      }
      return '(no text response)';
    }

    if (response.stop_reason === 'tool_use') {
      messages.push({ role: 'assistant', content: response.content });
      const toolResults = [];
      for (const block of response.content) {
        if (block.type === 'tool_use') {
          console.log(`  🔧 Tool call: ${block.name}(${JSON.stringify(block.input)})`);
          const result = runTool(block.name, block.input);
          console.log(`  📦 Result: ${result.slice(0, 80)}`);
          toolResults.push({ type: 'tool_result', tool_use_id: block.id, content: result });
        }
      }
      messages.push({ role: 'user', content: toolResults });
    } else {
      return `(unexpected stop_reason: ${response.stop_reason})`;
    }
  }
}

const testQuestions = [
  "What's the weather like in Tokyo?",
  "What is (15 * 7) + 23?",
  "What time is it in London?",
  "What's the capital of France?",
];

for (const q of testQuestions) {
  console.log(`\n${'='.repeat(50)}`);
  console.log(`User: ${q}`);
  const answer = await agentChat(q);
  console.log(`Agent: ${answer.slice(0, 150)}`);
}

Run it: python tool_agent.py (or node tool_agent.mjs)

Expected Output (Claude's exact wording will vary between runs):

Defined 3 tools: ['get_weather', 'calculate', 'get_time']
Test: {"temp": 22, "condition": "sunny", "humidity": 45}

==================================================
User: What's the weather like in Tokyo?
  🔧 Tool call: get_weather({"city": "Tokyo"})
  📦 Result: {"temp": 22, "condition": "sunny", "humidity": 45}
Agent: It's currently 22°C and sunny in Tokyo with 45% humidity.

==================================================
User: What is (15 * 7) + 23?
  🔧 Tool call: calculate({"expression": "(15 * 7) + 23"})
  📦 Result: {"result": 128}
Agent: (15 × 7) + 23 = 128

==================================================
User: What time is it in London?
  🔧 Tool call: get_time({"timezone": "Europe/London"})
  📦 Result: {"timezone": "Europe/London", "time": "19:30"}
Agent: It's currently 19:30 in London.

==================================================
User: What's the capital of France?
Agent: The capital of France is Paris.

✅ Checkpoint: You should see 3 tool calls (weather, calculate, time) and 1 direct answer (the France question). The key observation: Claude chose not to use a tool for the France question. Claude decides when a tool is relevant, and the loop simply handles whichever path it takes.
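To see why the loop needs both appends, it helps to look at the messages list after one tool round. The sketch below builds that list by hand with plain dictionaries. The toolu_example01 id and the text block are made-up placeholders, not real API output.

```python
import json

# Turn 1: the user's question
messages = [{"role": "user", "content": "What's the weather like in Tokyo?"}]

# Turn 2: Claude's reply, appended unchanged (stop_reason was "tool_use")
messages.append({
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Let me check the weather."},
        {"type": "tool_use", "id": "toolu_example01", "name": "get_weather",
         "input": {"city": "Tokyo"}},
    ],
})

# Turn 3: your code's result goes back in a *user* message, keyed by tool_use_id
messages.append({
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example01",
         "content": json.dumps({"temp": 22, "condition": "sunny", "humidity": 45})},
    ],
})

# Roles alternate user -> assistant -> user before the next API call
print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```

The next call to client.messages.create sends this whole list, so Claude sees its own tool request alongside the result it asked for.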

Troubleshooting

Claude uses a tool for the France question — Occasionally Claude may call one of the tools anyway. If this happens consistently, check that your tool descriptions don't overlap with general knowledge questions. Each description should state specifically what the tool does.
AttributeError: 'TextBlock' object has no attribute 'input' — You're iterating over content blocks without checking block.type == "tool_use" first. Text blocks don't have an input field.
Infinite loop or an API error about missing tool results — Check that you append both Claude's response and the tool results to messages before the next call. Every tool_use block needs a matching tool_result.
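Every loop iteration calls the API, so it can be useful to exercise the loop offline. The sketch below swaps in a hypothetical FakeClient that replays scripted responses. The responses are built with types.SimpleNamespace to mimic the SDK's attribute access. The sketch also adds a max_turns guard, so a misbehaving loop stops instead of spinning forever. FakeClient, agent_chat_with_limit, mock_tool, and max_turns are illustrative names, not part of the Anthropic SDK. The fake client ignores the model name it receives.

```python
import json
from types import SimpleNamespace as NS


class FakeClient:
    """Hypothetical stand-in for anthropic.Anthropic(): replays canned responses in order."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = []  # snapshot of the messages list sent on each call
        self.messages = NS(create=self._create)  # mimics client.messages.create(...)

    def _create(self, **kwargs):
        self.calls.append(list(kwargs["messages"]))
        return self._responses.pop(0)


def agent_chat_with_limit(client, tools, run_tool, user_message, max_turns=5):
    """Same loop as agent_chat, but gives up after max_turns API calls."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-6", max_tokens=1024, tools=tools, messages=messages
        )
        if response.stop_reason == "end_turn":
            texts = [b.text for b in response.content if b.type == "text"]
            return "".join(texts) or "(no text response)"
        if response.stop_reason != "tool_use":
            return f"(unexpected stop_reason: {response.stop_reason})"
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]})
    return f"(stopped after {max_turns} turns without a final answer)"


def mock_tool(name, args):
    # Ignores its inputs and returns fixed weather data
    return json.dumps({"temp": 22, "condition": "sunny"})


scripted = [
    NS(stop_reason="tool_use", content=[
        NS(type="tool_use", id="toolu_fake1", name="get_weather", input={"city": "Tokyo"})]),
    NS(stop_reason="end_turn", content=[NS(type="text", text="22°C and sunny in Tokyo.")]),
]
fake = FakeClient(scripted)
print(agent_chat_with_limit(fake, [], mock_tool, "Weather in Tokyo?"))  # 22°C and sunny in Tokyo.
print(len(fake.calls))  # 2
```

Inspecting fake.calls shows exactly what your loop would have sent to the API on each turn. This is a quick way to catch a missing append.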

Verify Everything Works

Run the complete file to execute both steps:

python tool_agent.py    # or: node tool_agent.mjs

🎉 Congratulations! You've built a working tool-using agent with the while stop_reason == "tool_use" loop. This is the foundational pattern for every agent in this course. In M06, you'll extend it to chain tools, run them in parallel, and register new tools at runtime.

Stretch Goals (Optional)

Test error handling: ask about a city not in the mock data (e.g., "What's the weather on Mars?") and verify that the error flows back through Claude gracefully
Add a search_database(query) tool with mock data and test a question that requires chaining two tools
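For the second stretch goal, one possible starting point is a mock search_database tool whose description says exactly what data it covers and when to use it. The SEARCH_DB_TOOL schema, the MOCK_DB records, and the search_database function below are hypothetical examples. Adapt them to your own data.

```python
import json

# Hypothetical tool definition to append to TOOLS
SEARCH_DB_TOOL = {
    "name": "search_database",
    "description": (
        "Search the company employee directory by name. Returns matching "
        "employees with their office city. Use this before get_weather when "
        "the user asks about the weather where an employee works."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Employee name or part of it, e.g. 'Alice'"}
        },
        "required": ["query"],
    },
}

# Hypothetical mock data
MOCK_DB = [
    {"name": "Alice Chen", "office_city": "Tokyo"},
    {"name": "Bob Smith", "office_city": "London"},
]


def search_database(query: str) -> str:
    """Case-insensitive substring match on employee names; returns JSON."""
    q = query.strip().lower()
    matches = [row for row in MOCK_DB if q and q in row["name"].lower()]
    if not matches:
        return json.dumps({"error": f"No employees match '{query}'"})
    return json.dumps({"matches": matches})


print(search_database("alice"))
```

To wire it in, append SEARCH_DB_TOOL to TOOLS and add an elif name == "search_database" branch to run_tool that returns search_database(args.get("query", "")). Then ask something like "What's the weather where Alice works?" Claude will typically call search_database first and then get_weather with the city it found, though the exact sequence is up to the model.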

Knowledge Check

Test your understanding of function calling and the tool use loop.

Q1: Who executes the tool — Claude or your code?

A. Claude executes the tool directly on Anthropic's servers

B. Your code executes the tool — Claude only returns a request to call it

C. The Anthropic API automatically dispatches the tool call

D. The tool executes itself when Claude references it

Answer: B. Claude never executes the tools you define. It returns a tool_use content block with the tool name and arguments. Your code reads this, runs the function, and sends back the result. This is critical for security and control.

Q2: What field in Claude's response tells you it wants to use a tool?

A. response.action == "call_function"

B. response.tool_request being non-null

C. response.stop_reason == "tool_use"

D. response.content[0].type == "function_call"

Answer: C. When stop_reason is "tool_use", Claude is pausing to wait for a tool result. You'll find the tool details in the content blocks where type == "tool_use".

Q3: What's wrong with this tool definition? (Recall from M04: tool input_schema uses JSON Schema — the same format you used for structured output.)

{"name": "search", "description": "Search", "input_schema": {"type": "object", "properties": {"q": {"type": "string"}}}}

A. The description is too vague — Claude won't know when or how to use this tool

B. The property name "q" is too short and will cause a syntax error

C. It's missing a "required" field, which is mandatory

D. Nothing is wrong — this is a valid minimal definition

Answer: A. "Search" tells Claude nothing about what to search, what data source it hits, what it returns, or when to use it instead of other tools. Clear, detailed descriptions are the biggest single factor in how reliably Claude selects and uses the right tool. The JSON Schema structure itself (from M04) is technically valid, and "required" is optional, so the description is the real problem.
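For comparison, here is one way the vague search tool could be rewritten. The data source (a product catalog), the parameter names, and the 1–20 limit are hypothetical. The point is that the description says what is searched, what comes back, and when to use the tool. The lint_tool check at the end is an illustrative heuristic, not an API feature.

```python
search_tool = {
    "name": "search_products",
    "description": (
        "Search the store's product catalog by keyword. Returns up to "
        "max_results products with name, price (USD), and stock status. "
        "Use this for questions about products the store sells; do not use "
        "it for general knowledge questions."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to match, e.g. 'wireless headphones'"},
            "max_results": {"type": "integer", "description": "Maximum products to return (1-20)",
                            "minimum": 1, "maximum": 20},
        },
        "required": ["query"],
    },
}


def lint_tool(tool: dict, min_desc_words: int = 10) -> list[str]:
    """Return warnings for a tool definition (illustrative heuristics only)."""
    warnings = []
    if len(tool.get("description", "").split()) < min_desc_words:
        warnings.append("description is very short")
    props = tool.get("input_schema", {}).get("properties", {})
    for name, spec in props.items():
        if "description" not in spec:
            warnings.append(f"property '{name}' has no description")
    return warnings


vague = {"name": "search", "description": "Search",
         "input_schema": {"type": "object", "properties": {"q": {"type": "string"}}}}
print(lint_tool(vague))        # ['description is very short', "property 'q' has no description"]
print(lint_tool(search_tool))  # []
```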

Q4: What should you do when a tool call raises an exception?

A. Crash the agent and return the stack trace to the user

B. Silently ignore the error and skip the tool result

C. Retry the same tool call indefinitely until it succeeds

D. Return the error as a tool_result so Claude can adapt its response

Answer: D. Send the error message (not the stack trace) as the tool_result. Claude can then inform the user about the issue or try a different approach. This makes agents resilient without hiding failures.
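One way to apply this in the loop is to wrap each tool call so that an exception becomes a short error message instead of a crash. The sketch below also sets is_error to true on the tool_result block, a field the Messages API accepts for flagging failed tool calls. The names execute_tool_safely and flaky_tool are hypothetical.

```python
import json


def execute_tool_safely(tool_use_id: str, name: str, args: dict, run_tool) -> dict:
    """Run a tool and always return a tool_result block, even if the tool raises."""
    try:
        content = run_tool(name, args)
        return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
    except Exception as e:  # report the error type and message, not the traceback
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{name} failed: {type(e).__name__}: {e}",
            "is_error": True,
        }


def flaky_tool(name: str, args: dict) -> str:
    # Hypothetical tool that raises KeyError when "text" is missing
    return json.dumps({"echo": args["text"]})


print(execute_tool_safely("toolu_ok", "echo", {"text": "hi"}, flaky_tool))
print(execute_tool_safely("toolu_bad", "echo", {}, flaky_tool))
```

In agent_chat, you could append execute_tool_safely(block.id, block.name, block.input, run_tool) in place of the hand-built tool_result dictionary.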

Q5: Fill in the blank to send a tool result back to Claude:

{ "role": "user", "content": [{ "type": "______", "tool_use_id": "toolu_abc123", "content": "{\"temp\": 22}" }] }

A. function_result

B. tool_result

C. tool_response

D. tool_output

Answer: B. The type is tool_result. It must include the tool_use_id from Claude's original tool_use block so Claude knows which tool call this result corresponds to.
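When Claude requests several tools in one response, all of their tool_result blocks go back together in a single user message, each carrying the matching tool_use_id. The helper below is a hypothetical sketch that works on plain dictionaries. It builds that message and raises an error if any requested id has no result.

```python
def build_tool_results_message(assistant_content: list[dict], results: dict[str, str]) -> dict:
    """Build one user message answering every tool_use block in assistant_content.

    results maps a tool_use id to the result string your code produced for it.
    """
    tool_use_ids = [b["id"] for b in assistant_content if b["type"] == "tool_use"]
    missing = [i for i in tool_use_ids if i not in results]
    if missing:
        raise ValueError(f"No result for tool_use id(s): {missing}")
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": i, "content": results[i]}
            for i in tool_use_ids  # keep the order in which Claude asked
        ],
    }


assistant_content = [
    {"type": "text", "text": "Checking both cities."},
    {"type": "tool_use", "id": "toolu_a", "name": "get_weather", "input": {"city": "Tokyo"}},
    {"type": "tool_use", "id": "toolu_b", "name": "get_weather", "input": {"city": "London"}},
]
msg = build_tool_results_message(assistant_content, {"toolu_a": '{"temp": 22}', "toolu_b": '{"temp": 14}'})
print([block["tool_use_id"] for block in msg["content"]])  # ['toolu_a', 'toolu_b']
```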

Q6: Which of these is a security risk when implementing tool use?

A. Defining tools with detailed descriptions

B. Returning error messages as tool_result content

C. Letting Claude construct and execute arbitrary shell commands as a "tool"

D. Using tool_choice to force a specific tool

Answer: C. Never let Claude construct arbitrary commands. Tools should be pre-defined functions with validated inputs. Giving Claude a "run any command" tool is a critical security vulnerability.
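A simple way to keep tools pre-defined is an explicit registry. Claude can only name a tool, and your code looks that name up in a fixed dictionary of functions, rejecting anything else. TOOL_REGISTRY and dispatch are hypothetical names, and get_weather here is a trimmed copy of the lab's mock. The point is that no string from Claude is ever executed as a command.

```python
import json
from typing import Callable

MOCK_WEATHER = {"tokyo": {"temp": 22, "condition": "sunny", "humidity": 45}}


def get_weather(args: dict) -> str:
    city = str(args.get("city", "")).strip().lower()
    data = MOCK_WEATHER.get(city)
    return json.dumps(data if data else {"error": f"City '{args.get('city')}' not found"})


# The only functions Claude can reach; nothing here runs a shell or eval
TOOL_REGISTRY: dict[str, Callable[[dict], str]] = {
    "get_weather": get_weather,
}


def dispatch(name: str, args: dict) -> str:
    """Look up a tool by name; unknown names and malformed input become error results."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"Tool '{name}' is not allowed"})
    if not isinstance(args, dict):
        return json.dumps({"error": "Tool input must be an object"})
    return func(args)


print(dispatch("get_weather", {"city": "Tokyo"}))
print(dispatch("run_shell", {"command": "ls"}))
```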

Module Summary

Key Takeaways

Claude proposes, you execute — Claude returns tool_use blocks with arguments; your code runs the actual function. This is the security model.
Descriptions matter most — Claude picks tools based on descriptions. Invest time in making them clear, specific, and complete.
The while loop pattern — keep sending messages until stop_reason is "end_turn". Claude may call multiple tools before finishing.
Return errors as results — when tools fail, send the error as a tool_result so Claude can adapt. Don't crash.
This is the chatbot-to-agent moment — Claude can now interact with external systems. Every module from here builds on this foundation.

Next Module Preview: M06 — Multi-Tool Orchestration

Now that Claude can call one tool, what happens when it needs multiple tools working together? In Module 6, you'll build agents that chain tools, run them in parallel, and dynamically register new tools at runtime.
