M04: Structured Output & Parsing

Agents don't just generate text — they produce data that your code must parse, validate, and act on. This module teaches you how to get reliable, structured output from Claude and what to do when it isn't perfect.

Learning Objectives

  • Explain why agents require structured output instead of free-form text
  • Use Claude's tool use feature to guarantee structured JSON responses
  • Validate API responses with Pydantic (Python) and Zod (TypeScript) schemas
  • Implement retry logic with error-aware re-prompting for parse failures
  • Build a complete data extraction pipeline with schema validation and error recovery

Why Agents Need Structured Responses

Everyday Analogy

BEFORE: Imagine asking a coworker for customer data and they reply with a rambling paragraph: "Oh yeah, John works over at Acme, I think his email is something like john@acme.com, and he might be a PM?" You'd have to read it, guess which parts are data, and manually copy them out every single time.

PAIN: Now imagine your code has to do this hundreds of times per minute. Regex breaks on edge cases, sentence structures vary wildly, and one missed field can crash your entire pipeline at 2 AM.

MAPPING: Structured data is like getting a filled-in spreadsheet instead of a paragraph: every field has a label, every value has a type, and your code can grab exactly what it needs with zero guesswork. (Structured data: data organized in a predictable format with labeled fields and defined types, such as JSON objects, XML documents, or database rows. Unlike free-form text, it can be reliably parsed by code.)

What this actually looks like: Here's the difference between unstructured and structured output from the same prompt. The first block is what you'd get from a plain text response. The second is what tool use gives you:

# Unstructured (free-form text — hard to parse reliably):
"Jane Smith is the VP of Engineering at TechCorp. Her email is
 jane.smith@techcorp.io and you can reach her at (555) 123-4567."

# Structured (tool_use response — code-ready):
{
  "name": "Jane Smith",
  "email": "jane.smith@techcorp.io",
  "phone": "(555) 123-4567",
  "company": "TechCorp",
  "role": "VP of Engineering"
}

Technical Definition

Here's the core problem: agents don't just generate text for humans to read; they produce data that code must consume. Your agent's output gets parsed by a JSON deserializer, inserted into a database, or sent to an API. All of those downstream steps expect predictable data shapes with labeled fields and typed values.

JSON (JavaScript Object Notation) is the standard format for this: a lightweight, human-readable data format built from key-value pairs and arrays, and the most common structured output format for LLM agents and API communication. It bridges the gap between natural language (what Claude generates) and programmatic consumption (what your code needs). Without structured output, agents can't reliably extract fields from responses, route decisions based on Claude's answer, or feed results into APIs or databases. Every step becomes fragile string parsing that breaks on edge cases.
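
A minimal sketch of that contrast in Python, reusing the Jane Smith example. The variable names are illustrative. It only demonstrates that a JSON parser accepts the structured form and rejects the prose.

```python
import json

free_form = (
    "Jane Smith is the VP of Engineering at TechCorp. Her email is "
    "jane.smith@techcorp.io and you can reach her at (555) 123-4567."
)
structured = (
    '{"name": "Jane Smith", "email": "jane.smith@techcorp.io", '
    '"role": "VP of Engineering"}'
)

# Structured output: one parse call, then plain key lookups.
record = json.loads(structured)
email = record["email"]

# Free-form output: the same call fails because prose is not JSON.
try:
    json.loads(free_form)
    parsed_free_form = True
except json.JSONDecodeError:
    parsed_free_form = False
```

Everything downstream of `record` can rely on labeled fields, while the free-form string would need custom text parsing for every field.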

Animation: From Text to Structured Data (Unstructured → Structured → Application)
Success path: "John is a PM at Acme Corp, email john@acme.com" → {"name":"John", "role":"PM", "company":"Acme", "email":"john@acme.com"} → DB insert, API call, and UI all succeed.
Failure path: "John works at Acme and his email is..." → JSON.parse() throws SyntaxError → pipeline crashes.

Why It Matters

Structured output is the contract between the AI and the rest of your system. In a real-world pipeline processing 10,000 requests/day, even a 2% parse failure rate means 200 broken requests daily, each one a potential customer-facing error, a lost database write, or a silent data corruption. Teams that switch from prompt-based JSON extraction to tool-use structured output typically see failure rates drop from 5-15% to under 0.5%. In M05 (Function Calling), you'll use tool definitions to make Claude produce structured output automatically. In M16–M17, you'll add guardrails that validate this output before it reaches production systems.

Tool Use as Structured Output

Everyday Analogy

BEFORE: Without tool use, getting structured output from Claude was like shouting your order across a noisy room: you'd write a prompt saying "please return JSON with these fields" and hope the model understood. (Tool use: a Claude API feature where you define functions, called tools, with JSON Schema parameters. Claude returns structured tool_use content blocks specifying which tool to call and with what arguments. This is Claude's most reliable structured output mechanism.)

PAIN: The model might add markdown formatting around the JSON, forget required fields, use the wrong types, or slip in a conversational sentence before the data. You'd need fragile regex and try-catch blocks just to extract the output.

MAPPING: Tool use is like handing Claude a restaurant order form with pre-printed fields: name of dish, quantity, special instructions. Claude can only fill in the blanks on the form — it physically cannot return free-form text when forced to use a tool. The form (JSON Schema) guarantees the structure.

What this actually looks like in the API response: When Claude uses a tool, you don't get a plain text string. You get a structured content block with type: "tool_use":

# Claude's actual response when forced to use a tool:
{
  "role": "assistant",
  "content": [
    {
      "type": "tool_use",           # ← not "text"!
      "id": "toolu_01A2B3C4D5",
      "name": "extract_contact",    # ← which tool it "called"
      "input": {                    # ← structured data, guaranteed valid JSON
        "name": "Jane Smith",
        "email": "jane.smith@techcorp.io",
        "phone": "(555) 123-4567",
        "company": "TechCorp",
        "role": "VP of Engineering"
      }
    }
  ],
  "stop_reason": "tool_use"        # ← stopped because it wants to call a tool
}

Technical Definition

Here's how tool use works for structured output, step by step.

First, you define a tool in your API request's tools array. Each tool has a name, a description, and a JSON Schema that describes its expected parameters: field names, types, and which fields are required. (JSON Schema is a standard for describing the structure of JSON data, including field names, types, required fields, enums, nested objects, and validation rules.)

Second, Claude reads the tool definition and returns a tool_use content block: a structured object in the response's content array with type "tool_use", the tool name, and an input object whose populated arguments match the tool's JSON Schema.

Third, you can force Claude to use a specific tool with the tool_choice parameter. Set it to {"type": "tool", "name": "..."} to force a specific tool, or {"type": "any"} to require some tool. This guarantees you'll get a structured tool_use response every time. The key insight: you don't have to actually execute the tool. You can define a "tool" purely as a structured output mechanism: Claude fills in the form, and you just read the values. This works because Claude is specifically trained to produce valid tool_use blocks.
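
The three steps can be sketched without touching the network. In the block below, `request` holds only the tool-related parts of a request body, `response` is a hard-coded dict that mirrors the raw JSON shown earlier, and `read_tool_input` is a hypothetical helper, not an SDK function. The Python SDK exposes the same fields as attributes (`block.type`, `block.input`) rather than dict keys.

```python
from typing import Any, Dict

# Steps 1 and 3: define one tool and force Claude to use it.
request: Dict[str, Any] = {
    "tools": [{
        "name": "extract_contact",
        "description": "Extract structured contact information from text.",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
            },
            "required": ["name", "email"],
        },
    }],
    "tool_choice": {"type": "tool", "name": "extract_contact"},
}

# Step 2: a response shaped like the example above (not fetched from the API).
response: Dict[str, Any] = {
    "role": "assistant",
    "content": [{
        "type": "tool_use",
        "id": "toolu_01A2B3C4D5",
        "name": "extract_contact",
        "input": {"name": "Jane Smith", "email": "jane.smith@techcorp.io"},
    }],
    "stop_reason": "tool_use",
}

def read_tool_input(response: Dict[str, Any], tool_name: str) -> Dict[str, Any]:
    """Return the input of the first tool_use block for tool_name."""
    for block in response["content"]:
        if block["type"] == "tool_use" and block["name"] == tool_name:
            return block["input"]
    raise ValueError(f"No tool_use block for {tool_name!r} in response")

# The "tool" is never executed; its arguments are the structured output.
contact = read_tool_input(response, "extract_contact")
```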

Animation: Tool Use Loop (User Request → Tool Menu → Fill Params → Execute → Result → Response)

Diagram: Tool Use Flow, Full Cycle. A user message ("Extract contact") goes to the API. Claude reads the tool definitions, picks extract_contact, and returns a tool_use block (name: "extract_contact", input: {name, email}, id: "toolu_01A2B3") with stop_reason: "tool_use". Your code runs extract_contact() and sends back a tool_result. Claude then receives the result and produces the final response with stop_reason: "end_turn". Key insight: you don't have to actually execute the tool. For structured output, define a "tool" purely as a schema, let Claude fill the form, and read the values. Setting tool_choice: {"type": "tool", "name": "extract_contact"} forces structured output.

Why It Matters

Tool use is not just for calling external functions. It is Claude's most reliable mechanism for producing structured output, even when you never execute the "tool." In benchmarks, tool-use extraction achieves 99%+ schema-valid output compared to ~85-92% for prompt-only JSON extraction. For a healthcare agent processing 500 insurance claims/hour, that difference means roughly 35-70 fewer manual interventions per hour (40-75 failures at 85-92% validity versus about 5 at 99%). This pattern is the foundation of M05 (Function Calling), where you'll build agents that actually execute tools to interact with external systems.

Validation, Stop Sequences & Schema Checking

Everyday Analogy

BEFORE: Without validation layers, you'd send Claude a prompt, get back some JSON-ish text, cross your fingers, and call JSON.parse(). Sometimes it worked; sometimes Claude added a friendly "Here's the data:" prefix or a trailing explanation that broke parsing entirely.

PAIN: In production, this meant ~5-15% of responses would fail to parse, triggering silent data loss or crashes that only surfaced hours later in downstream systems. Debugging was a nightmare because failures were intermittent and format-dependent.

MAPPING: Think of stop sequences as a film director yelling "Cut!" at exactly the right moment: they halt generation where the JSON ends. Schema validation is the quality inspector who checks every frame after the cut. Together, they form a multi-layer safety net. (Stop sequences: strings you specify in the API request that cause Claude to stop generating when it reaches them.)

What this looks like in practice: Without a stop sequence, Claude might return {"name": "Jane"} Hope that helps!. With "stop_sequences": ["}"] in your API request, generation halts at the closing brace, so there is no trailing text to strip. Two caveats apply. First, the matched stop sequence is reported in the response's stop_sequence field rather than included in the returned text, so you receive {"name": "Jane" and must re-append the } before parsing. Second, a bare } matches the first closing brace it meets, which truncates nested objects, so this trick suits flat, single-object output.
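
A minimal sketch of client-side handling for this, assuming the response text may arrive without the matched stop sequence. `parse_stopped_json` is a hypothetical helper. It takes the text plus the `stop_reason` and `stop_sequence` values from the response, tries the text as-is first, and only then restores the stop sequence.

```python
import json
from typing import Any, Dict, Optional

def parse_stopped_json(text: str, stop_reason: str,
                       stop_sequence: Optional[str]) -> Dict[str, Any]:
    """Parse JSON text that may have been cut off at a stop sequence."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Only attempt a repair when generation really halted on a stop sequence.
        if stop_reason != "stop_sequence" or stop_sequence is None:
            raise
    # Restore the sequence that ended generation and parse again.
    return json.loads(text + stop_sequence)

flat = parse_stopped_json('{"name": "Jane"', "stop_sequence", "}")

# A nested object is cut at its first closing brace, so one brace is not enough.
try:
    parse_stopped_json('{"person": {"name": "Jane"', "stop_sequence", "}")
    nested_ok = True
except json.JSONDecodeError:
    nested_ok = False
```

The nested case fails on purpose: it shows why a bare closing brace is a poor stop sequence for anything but flat objects.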

Technical Definition

Reliable structured output uses a defense-in-depth strategy: you stack multiple safety layers so that if one fails, the next catches the problem. Here are the three layers and what each one actually does:

Layer 1 (format constraints): This is where you use tool use or careful prompt engineering to make Claude emit valid JSON in the first place. Think of it as building the output in the right shape from the start.
Layer 2 (stop sequences): These are strings (like } or ]) that you tell the API to watch for. When Claude reaches one of them, it immediately stops producing text, which prevents it from appending conversational text after your JSON object. You set them in the API request as "stop_sequences": ["}"]. The response's stop_reason will be "stop_sequence" instead of "end_turn", so your code can tell exactly why generation stopped. The matched sequence is returned in the stop_sequence field, not in the text, so re-append it before parsing, and remember that } also matches the first closing brace of a nested object. Note: stop sequences are most useful for prompt-based JSON extraction. When you use tool use (Layer 1), you don't need them because tool_use blocks are already bounded.
Layer 3 (schema validation): Even if the JSON is syntactically valid, it might have the wrong fields or types. Schema validation checks that a data structure matches an expected format by asking three questions: Does every required field exist? Does every value have the correct type and fall within allowed ranges? Did any unexpected data sneak in? It is implemented with Pydantic (Python) or Zod (TypeScript). When all three layers are active, you approach 100% valid structured output.
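
The parse and schema-check layers can be sketched with the standard library alone, as a stand-in for what Pydantic or Zod do. `check_output`, `REQUIRED`, and `OPTIONAL` are illustrative names, and the field list is borrowed from the contact example.

```python
import json
from typing import List, Tuple

REQUIRED = {"name": str, "email": str}
OPTIONAL = {"phone": str, "company": str, "role": str}

def check_output(raw: str) -> Tuple[str, List[str]]:
    """Return ("valid" or "invalid", error messages) for one raw model output."""
    # Syntax layer: is this JSON at all?
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return "invalid", [f"syntax: {exc.msg}"]
    if not isinstance(data, dict):
        return "invalid", ["top level: expected a JSON object"]

    # Schema layer: required fields, types, unexpected fields.
    errors: List[str] = []
    for field, expected in REQUIRED.items():
        if field not in data:
            errors.append(f"{field}: missing required field")
        elif not isinstance(data[field], expected):
            got = type(data[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {got}")
    for field, expected in OPTIONAL.items():
        if field in data and not isinstance(data[field], expected):
            got = type(data[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {got}")
    for field in data:
        if field not in REQUIRED and field not in OPTIONAL:
            errors.append(f"{field}: unexpected field")
    return ("valid" if not errors else "invalid"), errors
```

The error strings name the field and the mismatch, which is the level of detail a retry prompt needs.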

Diagram: Schema Validation Pipeline. Raw LLM output (a JSON string) is parsed with json.loads() as a syntax check, then passed to a Pydantic or Zod validator that checks types, required fields, and constraints. Valid data, such as {name: "Jane", email: "j@..."}, continues downstream. Invalid data raises a ValidationError, such as "email: expected str, got None", whose details feed the retry prompt. Defense in depth: JSON parse (syntax) → schema validation (semantics) → retry (recovery).

Schema Validation with Pydantic & Zod

Pydantic (Python) and Zod (TypeScript) are schema validation libraries. Pydantic validates data using type annotations: you define a class with typed fields, and it validates, parses, and transforms incoming data automatically. Zod is its TypeScript-first counterpart: you define schemas with z.object(), and Zod validates data at runtime with detailed error messages for each invalid field. In plain English: you define the exact shape your data must have (which fields, what types, which are required), and the library automatically checks every response against that shape. If something doesn't match, it tells you exactly what's wrong.

Under the hood, these libraries work by defining a model class (Pydantic) or schema object (Zod) with typed fields. When you pass Claude's output through the model, the library checks each field: Is name a string? Is price a number? Is email present (since it's required)? If any check fails, Pydantic raises a ValidationError (Zod throws a ZodError) carrying a list of field-level errors: which field failed, what type was expected, and what was received. You get not just "invalid data" but "field 'email' expected str, got None." These specific error messages are exactly what you'll feed back into retry prompts.

How does this differ from just calling JSON.parse() or json.loads()? Those only check that the JSON is syntactically valid: matching braces, correct commas. Schema validation goes further and verifies that the data is semantically correct for your application. Valid JSON like {"name": 123} would pass JSON.parse() but fail Pydantic because name should be a string, not a number.

The tool's parameters are specified in the input_schema field of the tool definition, a JSON Schema object describing the expected parameters: their names, types, descriptions, and which are required. Claude uses this schema to generate valid arguments. Here's a time-saving trick: you can auto-generate this schema from your Pydantic model using ContactInfo.model_json_schema(). That way, you define the shape once in Pydantic, and the tool definition stays in sync automatically.
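
Both points can be seen in a few lines, assuming Pydantic v2 is installed. The model below is a trimmed version of the contact schema, and `field_errors` is an illustrative variable that collects where each error occurred and its error type.

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class ContactInfo(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None

# One source of truth: the tool's input_schema can be generated from the model.
schema = ContactInfo.model_json_schema()

# Valid JSON with the wrong type passes json.loads() but fails the model.
try:
    ContactInfo(**{"name": 123, "email": "jane.smith@techcorp.io"})
    field_errors = []
except ValidationError as exc:
    # Each entry records which field failed ("loc") and why ("type", "msg").
    field_errors = [(err["loc"], err["type"]) for err in exc.errors()]
```

The generated schema also carries extra keys such as `title`, so it is worth inspecting once before passing it to the API.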

🎓 Cert Tip — Domain 4.3

tool_use guarantees STRUCTURE (valid JSON matching schema) but NOT semantic correctness. Values inside the JSON may still be wrong. Always add business rule validation after tool_use extraction.
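
A business-rule layer might look like the sketch below. The specific rules are assumptions chosen for illustration: a loose email pattern, a check that the email actually appears in the source text, and a small set of placeholder phone numbers. Real rules depend on your domain.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule data; adjust to your own domain.
PLACEHOLDER_PHONES = {"(555) 000-0000", "000-000-0000"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def business_rule_errors(contact: Dict[str, Optional[str]],
                         source_text: str) -> List[str]:
    """Checks that run after schema validation has already passed."""
    errors: List[str] = []
    email = contact.get("email") or ""
    if not EMAIL_RE.match(email):
        errors.append("email: not a plausible address")
    elif email.lower() not in source_text.lower():
        # Well-formed, but the source never mentions it.
        errors.append("email: not found in source text (possible hallucination)")
    if contact.get("phone") in PLACEHOLDER_PHONES:
        errors.append("phone: placeholder value")
    return errors

source = "Jane Smith, VP of Engineering | jane.smith@techcorp.io | (555) 123-4567"
```

An empty list means the extracted values passed every rule, while each message names the field so it can be fed into a retry prompt.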

⚠️ Common Misconceptions

"Tool use guarantees the data is correct." — No. Tool use guarantees structure (valid JSON matching the schema), not semantic correctness. If you ask Claude to extract a phone number and it hallucinates "(555) 000-0000," the JSON will be perfectly valid but the data is wrong. Always add business rule validation on top of schema validation.

"I can just ask Claude to 'respond in JSON' instead of using tool use." — You can, but it's significantly less reliable. Prompt-only JSON extraction fails 5–15% of the time (markdown wrappers, trailing text, missing commas). Tool use fails under 0.5% because Claude is specifically trained for the tool_use format. For anything beyond a quick prototype, use tool use.

"Schema validation catches all bad output." — Schema validation catches type errors (string where number expected) and missing fields, but not logical errors. A schema can confirm age is an integer, but not that 350 is an unreasonable value for a human's age. You need separate business logic validation for semantic checks.

"If parsing fails, just retry the same prompt." — Blind retries have a low success rate because Claude will likely make the same mistake again. The effective approach is to include the specific error message in the retry prompt so Claude can see what went wrong and self-correct.

"Structured output is only for data extraction." — It's also essential for decision routing (Claude returns {"action": "escalate", "reason": "..."}), tool selection (returning which tool to call with what parameters), and any scenario where downstream code needs to branch on Claude's response. If your code does an if on Claude's output, you need structured output.

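The decision-routing case from the last misconception can be sketched as follows. The action names and return strings are hypothetical. The point is that the branch runs on a validated field rather than on free text.

```python
from typing import Dict

# Hypothetical set of actions the downstream code knows how to handle.
ALLOWED_ACTIONS = {"answer", "escalate", "refund"}

def route(decision: Dict[str, str]) -> str:
    """Branch on a structured decision such as {"action": "escalate", "reason": "..."}."""
    action = decision.get("action")
    if action not in ALLOWED_ACTIONS:
        # Reject anything outside the contract instead of guessing.
        raise ValueError(f"unknown action: {action!r}")
    if action == "escalate":
        return f"ticket opened: {decision.get('reason', 'no reason given')}"
    if action == "refund":
        return "refund workflow started"
    return "reply sent"
```

In a tool definition, the same constraint could be expressed as an enum on the action field, with this check kept as a second line of defense.
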
Error Recovery: When Parsing Fails

Everyday Analogy

BEFORE: Early LLM integrations treated parsing failures as fatal errors — if the JSON was malformed, the request simply failed and the user saw a generic "Something went wrong" message.

PAIN: This meant a single missing comma in Claude's output could waste an entire API call ($0.01-0.05 in tokens), leave a customer-facing request unanswered, and require manual intervention to unblock the pipeline.

MAPPING: Error recovery is like a GPS recalculating your route — when you miss a turn (a parse failure), the system doesn't pull over and shut off the engine. It immediately recomputes, telling you exactly which turn you missed and offering a corrected path to the same destination.

What the retry prompt actually looks like: "Extract contact info from: ...\n\nPrevious attempt failed with: ValidationError — field 'email' expected str, got None\nPlease fix the output to match the required schema exactly." — Claude reads the error, sees it missed the email, and self-corrects on the next attempt.

Technical Definition

No matter how good your schema and prompts are, parsing failures will happen in production. The question is not "if" but "how does your system respond?" Here are five strategies, ordered from most common to last resort:

1. Retry with error feedback: Send the same request again, but append the specific error message (e.g., "field 'email' was null, expected string") to the prompt. Claude reads the error and self-corrects — this fixes ~90% of failures on the first retry.
2. Fallback to simpler format: If the complex schema keeps failing, ask for a simpler one (fewer fields, no nested objects). A partial answer is better than no answer.
3. Partial parsing: Extract whichever fields did validate and flag the rest as incomplete. Useful when some data is better than none.
4. Cascading validators: Try multiple schemas in order — strict first, then progressively looser. This handles cases where Claude returns valid data in a slightly different shape.
5. Human-in-the-loop escalation: After all automated strategies fail, route to a human reviewer. This is your safety net for truly ambiguous inputs.
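
Strategies 3 and 4 above can be sketched with plain dictionaries. The schema tiers and helper names are illustrative assumptions, and a real pipeline would likely use Pydantic or Zod models for each tier instead of field lists.

```python
from typing import Any, Dict, List, Optional, Tuple

def missing_fields(data: Dict[str, Any], fields: List[str]) -> List[str]:
    """Fields that are absent or not strings."""
    return [f for f in fields if not isinstance(data.get(f), str)]

# Strategy 4: schemas ordered from strict to loose.
SCHEMAS: List[Tuple[str, List[str]]] = [
    ("strict", ["name", "email", "phone", "company", "role"]),
    ("standard", ["name", "email"]),
    ("minimal", ["name"]),
]

def cascade(data: Dict[str, Any]) -> Optional[str]:
    """Return the label of the first (strictest) schema the data satisfies."""
    for label, fields in SCHEMAS:
        if not missing_fields(data, fields):
            return label
    return None  # nothing matched: escalate to a human reviewer

# Strategy 3: keep what validated, flag the rest.
def partial_parse(data: Dict[str, Any],
                  fields: List[str]) -> Tuple[Dict[str, Any], List[str]]:
    missing = missing_fields(data, fields)
    valid = {f: data[f] for f in fields if f not in missing}
    return valid, missing

sample = {"name": "Jane Smith", "email": None, "company": "TechCorp"}
```

With `sample`, the cascade settles on the loosest tier, and partial parsing keeps the name and company while flagging the email as incomplete.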

Across all strategies, always use exponential backoff (a retry strategy where the wait time doubles after each failure, for example 1s, 2s, 4s, which avoids overwhelming the API during outages and gives transient errors time to resolve) and a max retry count (typically 3) to prevent infinite loops and runaway API costs.
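
A minimal sketch of both safeguards, assuming a built-in `ConnectionError` stands in for a transient API failure. `backoff_delays` and `call_with_retries` are hypothetical helpers, and the `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import time
from typing import Callable, List

def backoff_delays(max_attempts: int = 3, base: float = 1.0,
                   cap: float = 30.0) -> List[float]:
    """Doubling waits (base, 2*base, 4*base, ...) capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(max_attempts)]

def call_with_retries(operation: Callable[[], str], max_attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt, delay in enumerate(backoff_delays(max_attempts), start=1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the failure
            sleep(delay)  # wait before the next attempt
    raise ValueError("max_attempts must be at least 1")
```

The cap keeps a long retry budget from producing multi-minute waits, and re-raising on the final attempt avoids a pointless sleep before giving up.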
Conceptual Bridge: You've seen the validation layer that catches bad output. But catching an error is only half the battle — what happens next? The following animation shows the full retry flow: how the system detects a failure, feeds the error back to Claude, and progressively escalates if the problem persists.
Animation: Error Recovery with Retry
FAIL Attempt 1: Missing "email" field → Retry with error in prompt
FAIL Attempt 2: Invalid email format → Retry with stricter instruction
PASS Attempt 3: All fields valid → Return validated result
ALT If 3 failures: Circuit breaker triggers → Fallback to simpler schema or human review

Why It Matters

The best agents aren't the ones that never fail; they're the ones that recover gracefully. In production systems, retry-with-error-feedback resolves ~90% of validation failures on the first retry, and ~98% within three attempts. Without retry logic, a B2B order-tracking agent processing 2,000 orders/day would accumulate 100+ unresolved failures daily (at a 5% base failure rate), each requiring manual investigation. With proper retry and fallback, that drops to under 5 per day. This pattern is a foundation you'll reuse in M17 (Output Guardrails) and M18 (Evaluation & Testing).

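
The arithmetic behind those figures, worked through with the numbers quoted in the paragraph. Integer percentages are used to keep the results exact.

```python
orders_per_day = 2000
base_failure_pct = 5            # 5% of orders fail validation initially
resolved_first_retry_pct = 90   # ~90% fixed by the first retry
resolved_within_three_pct = 98  # ~98% fixed within three attempts

base_failures = orders_per_day * base_failure_pct // 100
left_after_one_retry = base_failures * (100 - resolved_first_retry_pct) // 100
left_after_three = base_failures * (100 - resolved_within_three_pct) // 100

print(base_failures, left_after_one_retry, left_after_three)  # 100 10 2
```

Two remaining failures per day is consistent with the "under 5 per day" figure, and those are the cases that fall through to a simpler schema or human review.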
🎓 Cert Tip — Domain 4.4

When a validation-retry fails, append SPECIFIC error details to the prompt: which field, what was wrong, expected vs actual. Anti-pattern: generic "there were errors, please try again."
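
One way to produce that level of detail is sketched below. The error dictionaries with `field`, `expected`, and `actual` keys are an assumed intermediate format, not something Pydantic or Zod emit directly. You would map their error objects into it.

```python
from typing import Dict, List

def format_feedback(field_errors: List[Dict[str, str]]) -> str:
    """Turn field-level errors into specific, per-field retry feedback."""
    lines = ["The previous attempt failed validation:"]
    for err in field_errors:
        lines.append(
            f"- field '{err['field']}': expected {err['expected']}, got {err['actual']}"
        )
    lines.append("Return the same data with only these fields corrected.")
    return "\n".join(lines)

def build_retry_prompt(task: str, field_errors: List[Dict[str, str]]) -> str:
    """Original task first, then the specific feedback."""
    return f"{task}\n\n{format_feedback(field_errors)}"

errors = [{"field": "email", "expected": "str", "actual": "None"}]
prompt = build_retry_prompt("Extract contact info from: ...", errors)
```

Contrast this with the anti-pattern: a generic "please try again" gives Claude nothing to act on, while the per-field lines point at exactly what to change.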

Code Walkthrough: Data Extraction Pipeline

Conceptual Bridge: You now understand why structured output matters and the three mechanisms for achieving it (tool use, stop sequences, schema validation). Let's put them to work in code. The following walkthrough builds a complete data extraction pipeline, starting with tool use for structured output, then layering on Pydantic/Zod validation and retry logic. Stop sequences don't appear in it because forced tool use already bounds the output.

Approach 1: Tool Use for Structured Output

Let's start with the data model. The code below defines a ContactInfo schema in two places: a Pydantic model (your validation blueprint) and a matching tool definition (what Claude sees). Having both gives you a two-layer guarantee — Claude's tool_use ensures the right structure, and Pydantic ensures the right types and values. One important gotcha: these two definitions must stay in sync. If you add a field to one, add it to the other. In production, you'd use ContactInfo.model_json_schema() to auto-generate the tool schema from Pydantic, eliminating the sync problem entirely.

The interesting part is the extract_contact() function. It sends text to Claude with tool_choice={"type": "tool", "name": "extract_contact"} — this is the critical line that forces Claude to return a structured tool_use block instead of free-form text. The function then loops through the response content blocks to find the one with type == "tool_use" and validates its input field through Pydantic. Here's the important nuance: even with forced tool use, the values inside the JSON might be wrong (Claude could hallucinate an email, for example). Tool use guarantees the structure is valid, not that the content is correct.

Finally, notice how the error handling separates ValidationError from APIError. These are fundamentally different problems: a validation error means Claude returned the wrong data shape (retry with a better prompt), while an API error means the network or rate limit failed (retry with backoff). Catching them separately lets you respond appropriately to each. Never catch a bare Exception — you'll mask bugs in your own code.

# pip install "anthropic>=0.40.0" "pydantic>=2.0"
import anthropic
import json
from pydantic import BaseModel, ValidationError
from typing import Optional

client = anthropic.Anthropic()

# Define the schema as both Pydantic model and tool definition
class ContactInfo(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    company: Optional[str] = None
    role: Optional[str] = None

# Tool definition matches the Pydantic schema
extract_contact_tool = {
    "name": "extract_contact",
    "description": "Extract structured contact information from text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Full name of the person"},
            "email": {"type": "string", "description": "Email address"},
            "phone": {"type": "string", "description": "Phone number, if mentioned"},
            "company": {"type": "string", "description": "Company name, if mentioned"},
            "role": {"type": "string", "description": "Job title or role, if mentioned"},
        },
        "required": ["name", "email"],
    },
}

def extract_contact(text: str) -> ContactInfo:
    """Extract contact info using tool use + Pydantic validation."""
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=[extract_contact_tool],
            tool_choice={"type": "tool", "name": "extract_contact"},
            messages=[{
                "role": "user",
                "content": f"Extract the contact information from this text:\n\n{text}"
            }]
        )

        # Claude returns a tool_use content block
        for block in response.content:
            if block.type == "tool_use":
                # Validate with Pydantic
                contact = ContactInfo(**block.input)
                return contact

        raise ValueError("No tool_use block in response")

    except ValidationError as e:
        print(f"Validation failed: {e}")
        raise
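    except anthropic.APIStatusError as e:
        # HTTP-level failure (4xx/5xx): only status errors carry status_code
        print(f"API error: {e.status_code} - {e.message}")
        raise
    except anthropic.APIError as e:
        # Connection problems and other SDK errors have no status code
        print(f"API error: {e.message}")
        raise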

# Usage
text = """
Best regards,
Jane Smith, VP of Engineering
TechCorp Inc. | jane.smith@techcorp.io | (555) 123-4567
"""

contact = extract_contact(text)
print(f"Name:    {contact.name}")
print(f"Email:   {contact.email}")
print(f"Phone:   {contact.phone}")
print(f"Company: {contact.company}")
print(f"Role:    {contact.role}")

// TypeScript equivalent of the Python example above
// npm install "@anthropic-ai/sdk@^0.40.0" zod
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';

const client = new Anthropic();

// Define the schema with Zod
const ContactInfo = z.object({
  name: z.string(),
  email: z.string().email(),
  phone: z.string().optional(),
  company: z.string().optional(),
  role: z.string().optional(),
});

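// Typing the tool as Anthropic.Tool keeps `type: 'object'` a literal for the compiler
const extractContactTool: Anthropic.Tool = {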
  name: 'extract_contact',
  description: 'Extract structured contact information from text.',
  input_schema: {
    type: 'object',
    properties: {
      name: { type: 'string', description: 'Full name of the person' },
      email: { type: 'string', description: 'Email address' },
      phone: { type: 'string', description: 'Phone number, if mentioned' },
      company: { type: 'string', description: 'Company name, if mentioned' },
      role: { type: 'string', description: 'Job title or role, if mentioned' },
    },
    required: ['name', 'email'],
  },
};

async function extractContact(text: string) {
  try {
    const response = await client.messages.create({
      model: 'claude-sonnet-4-6',
      max_tokens: 1024,
      tools: [extractContactTool],
      tool_choice: { type: 'tool', name: 'extract_contact' },
      messages: [{
        role: 'user',
        content: `Extract the contact information from this text:\n\n${text}`
      }]
    });

    for (const block of response.content) {
      if (block.type === 'tool_use') {
        // Validate with Zod
        const contact = ContactInfo.parse(block.input);
        return contact;
      }
    }
    throw new Error('No tool_use block in response');
  } catch (error) {
    if (error instanceof z.ZodError) {
      console.error('Validation failed:', error.issues);
    } else if (error instanceof Anthropic.APIError) {
      console.error(`API error: ${error.status} - ${error.message}`);
    }
    throw error;
  }
}

const text = `Best regards,
Jane Smith, VP of Engineering
TechCorp Inc. | jane.smith@techcorp.io | (555) 123-4567`;

const contact = await extractContact(text);
console.log(`Name:    ${contact.name}`);
console.log(`Email:   ${contact.email}`);
console.log(`Phone:   ${contact.phone}`);
console.log(`Company: ${contact.company}`);
console.log(`Role:    ${contact.role}`);

Expected Output:

Name:    Jane Smith
Email:   jane.smith@techcorp.io
Phone:   (555) 123-4567
Company: TechCorp Inc.
Role:    VP of Engineering

What Just Happened?

You sent unstructured text (an email signature) to Claude with a forced tool call. Claude parsed the natural language and returned a structured tool_use block with typed fields. Pydantic then validated those fields against your schema. The result: five clean, typed fields extracted from a messy paragraph, with no regex, no string splitting, and no guessing. If any field had the wrong type or was missing, Pydantic would have raised a ValidationError with the exact field name and expected type.

Adding Retry with Error Feedback

Now for the part that makes this production-ready: automatic error recovery. The code below wraps the extraction in a retry loop that runs up to max_retries times. Here's the clever part: on each retry, it appends the specific validation error to the prompt. So instead of blindly asking Claude to try again, you're saying "the email field was null, but it's required — please fix that." Claude reads the error and self-corrects. This approach resolves ~90% of failures on the first retry.

The other key detail is the time.sleep(2 ** attempt) after each failure. This is exponential backoff: 2 seconds, then 4, then 8. Without it, rapid retries during an API outage just make the problem worse — you'd trigger rate limiting on top of the original issue. And always set a max retry count. Without one, a consistently failing input (imagine someone passes in gibberish text) will loop forever, burning tokens and money with no hope of success.

import anthropic
import time
from pydantic import BaseModel, ValidationError
from typing import Optional

client = anthropic.Anthropic()

class ContactInfo(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    company: Optional[str] = None
    role: Optional[str] = None

def extract_with_retry(text: str, max_retries: int = 3) -> ContactInfo:
    """Extract contact info with retry on validation failure."""
    last_error = None

    for attempt in range(1, max_retries + 1):
        prompt = f"Extract contact information from this text:\n\n{text}"

        # On retry, include the previous error
        if last_error:
            prompt += f"\n\nPrevious attempt failed with: {last_error}"
            prompt += "\nPlease fix the output to match the required schema exactly."

        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                tools=[{
                    "name": "extract_contact",
                    "description": "Extract contact info. ALL fields must be valid.",
                    "input_schema": ContactInfo.model_json_schema(),
                }],
                tool_choice={"type": "tool", "name": "extract_contact"},
                messages=[{"role": "user", "content": prompt}],
            )

            for block in response.content:
                if block.type == "tool_use":
                    contact = ContactInfo(**block.input)
                    print(f"Attempt {attempt}: Success!")
                    return contact

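            # No tool_use block came back: record it so the next prompt says so
            last_error = "No tool_use block in response"
            print(f"Attempt {attempt}: {last_error}")

        except ValidationError as e:
            last_error = str(e)
            print(f"Attempt {attempt}: Validation error - {last_error}")
            time.sleep(2 ** attempt)  # Exponential backoff: 2s, 4s, 8s

        except anthropic.APIError as e:
            # Transport or rate-limit failure: back off, and keep any earlier
            # validation error so the next prompt still carries it
            print(f"Attempt {attempt}: API error - {e.message}")
            time.sleep(2 ** attempt)

    raise RuntimeError(f"Failed after {max_retries} attempts. Last error: {last_error}")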

// TypeScript equivalent of the Python example above
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';

const client = new Anthropic();

const ContactInfo = z.object({
  name: z.string(),
  email: z.string().email(),
  phone: z.string().optional(),
  company: z.string().optional(),
  role: z.string().optional(),
});

async function extractWithRetry(text: string, maxRetries = 3) {
  // Holds the most recent validation error so the next prompt can include it
  let lastError: string | null = null;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    let prompt = `Extract contact information from this text:\n\n${text}`;

    if (lastError) {
      prompt += `\n\nPrevious attempt failed with: ${lastError}`;
      prompt += '\nPlease fix the output to match the required schema exactly.';
    }

    try {
      const response = await client.messages.create({
        model: 'claude-sonnet-4-6',
        max_tokens: 1024,
        tools: [{
          name: 'extract_contact',
          description: 'Extract contact info. ALL fields must be valid.',
          input_schema: {
            type: 'object',
            properties: {
              name: { type: 'string' },
              email: { type: 'string' },
              phone: { type: 'string' },
              company: { type: 'string' },
              role: { type: 'string' },
            },
            required: ['name', 'email'],
          },
        }],
        tool_choice: { type: 'tool', name: 'extract_contact' },
        messages: [{ role: 'user', content: prompt }],
      });

      for (const block of response.content) {
        if (block.type === 'tool_use') {
          const contact = ContactInfo.parse(block.input);
          console.log(`Attempt ${attempt}: Success!`);
          return contact;
        }
      }
    } catch (error) {
      if (error instanceof z.ZodError) {
        lastError = error.issues.map(i => `${i.path}: ${i.message}`).join(', ');
        console.log(`Attempt ${attempt}: Validation error - ${lastError}`);
      } else if (error instanceof Anthropic.APIError) {
        console.log(`Attempt ${attempt}: API error - ${error.message}`);
      } else { throw error; }
      await new Promise(r => setTimeout(r, 2 ** attempt * 1000));
    }
  }
  throw new Error(`Failed after ${maxRetries} attempts. Last error: ${lastError}`);
}
What Just Happened? You built a complete extraction pipeline with automatic error recovery. On each attempt, the function (1) builds a prompt that includes any previous validation error, (2) calls Claude with forced tool use, (3) validates the response with Pydantic/Zod, and (4) either returns the validated result or captures the error for the next attempt. With exponential backoff between attempts and a hard cap of 3 attempts, this pattern handles both transient API issues and genuine validation problems without looping forever: it either returns a validated object or raises one explicit error.
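The retry-with-feedback loop described above can be exercised without calling the API at all. The following is a minimal sketch, not the module's implementation: `retry_with_feedback` and `fake_extract` are hypothetical names, the sleep function is injected so a test can skip the real waits, and a plain `ValueError` stands in for a schema validation failure.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_feedback(
    attempt_fn: Callable[[Optional[str]], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt_fn(last_error) until it succeeds or max_retries is reached.

    attempt_fn receives the previous error message (None on the first try)
    and should raise ValueError when its output fails validation.
    """
    last_error: Optional[str] = None
    for attempt in range(1, max_retries + 1):
        try:
            return attempt_fn(last_error)
        except ValueError as exc:
            last_error = str(exc)
            if attempt < max_retries:
                sleep(2 ** attempt)  # exponential backoff between attempts
    raise RuntimeError(f"Failed after {max_retries} attempts. Last error: {last_error}")


# Stand-in for the Claude call: fails once, then "fixes" its output
# once it is shown the error from the previous attempt.
def fake_extract(last_error: Optional[str]) -> dict:
    if last_error is None:
        raise ValueError("email: field required")
    return {"name": "Jane Smith", "email": "jane@acme.com"}


waits: list = []
result = retry_with_feedback(fake_extract, sleep=waits.append)
print(result, waits)  # waits == [2]: one 2-second pause before the second attempt
```

Passing `sleep=waits.append` records the requested delays instead of pausing, which keeps the loop fast to test while leaving production behavior unchanged.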

Hands-On Exercise

What You'll Build

A complete contact extraction pipeline that extracts with forced tool use, validates output with Pydantic/Zod, and recovers from failures automatically. You'll test it against 5 realistic email signatures.

Time estimate: 25–35 minutes • Prerequisites: M01-M03 labs complete (API key set, SDK installed) • Files you'll create: extractor.py (or extractor.mjs)

Environment Setup

# Python
pip install "anthropic>=0.40.0" "pydantic>=2.0"
export ANTHROPIC_API_KEY="your-key-here"

# Node.js
npm install "@anthropic-ai/sdk@^0.40.0" zod
export ANTHROPIC_API_KEY="your-key-here"

Step 1: Define the Schema and Test Data

Before extracting anything, you need a data model and test cases. This step defines the ContactInfo schema and 5 realistic email signatures that range from easy to tricky. Having a fixed test set gives you an objective success rate to measure in Step 2.

Create a new file called extractor.py (or extractor.mjs):

import anthropic
import json
from pydantic import BaseModel, ValidationError
from typing import Optional

client = anthropic.Anthropic()

class ContactInfo(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    company: Optional[str] = None
    role: Optional[str] = None

# 5 test email signatures — easy to hard
TEST_SIGNATURES = [
    "Best, Jane Smith | jane@acme.com | Acme Corp",
    "John Doe, Senior Engineer at MegaTech\njohn.doe@megatech.io | (555) 234-5678",
    "Cheers,\nDr. Maria García-López, Head of Research\nBioGen International\nmgarcia@biogen.int",
    "— Alex K. | Product @ StartupXYZ | alex@startupxyz.co | they/them",
    "Thanks!\nRobert \"Bob\" Williams III\nChief Financial Officer\nGlobal Finance Partners LLC\nrwilliams@gfp.com\n+1 (212) 555-0199",
]

print(f"Schema: {json.dumps(ContactInfo.model_json_schema(), indent=2)}")
print(f"\nTest signatures: {len(TEST_SIGNATURES)}")
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';

const client = new Anthropic();

const ContactInfo = z.object({
  name: z.string(),
  email: z.string().email(),
  phone: z.string().optional(),
  company: z.string().optional(),
  role: z.string().optional(),
});

const TEST_SIGNATURES = [
  "Best, Jane Smith | jane@acme.com | Acme Corp",
  "John Doe, Senior Engineer at MegaTech\njohn.doe@megatech.io | (555) 234-5678",
  "Cheers,\nDr. Maria García-López, Head of Research\nBioGen International\nmgarcia@biogen.int",
  "— Alex K. | Product @ StartupXYZ | alex@startupxyz.co | they/them",
  "Thanks!\nRobert \"Bob\" Williams III\nChief Financial Officer\nGlobal Finance Partners LLC\nrwilliams@gfp.com\n+1 (212) 555-0199",
];

console.log(`Test signatures: ${TEST_SIGNATURES.length}`);

Run it: python extractor.py (or node extractor.mjs)

Expected Output (Python):
Schema: {
  "properties": {
    "name": { "title": "Name", "type": "string" },
    "email": { "title": "Email", "type": "string" },
    ...
  },
  "required": ["name", "email"],
  "title": "ContactInfo",
  "type": "object"
}

Test signatures: 5
✅ Checkpoint: If you see the JSON schema printed and "Test signatures: 5", Step 1 is working. The schema should show 5 properties with name and email as required. If you don't see the schema, make sure you're using Pydantic v2 (not v1).

Step 2: Extract with Tool Use + Validation

Now let's build the extraction function and see how it handles the full range of signatures. The interesting question isn't whether it works on the clean ones (Sig 1 is trivial); it's whether it can handle Dr. Maria García-López's accented, hyphenated name and Robert "Bob" Williams III's quoted nickname. These are the cases where regex-based extraction falls apart and tool use shines.

This function combines the tool definition, forced tool_choice, and Pydantic validation from the code walkthrough into a single reusable function. It uses the ContactInfo model and TEST_SIGNATURES from Step 1.

Add the following to extractor.py:

extract_tool = {
    "name": "extract_contact",
    "description": "Extract structured contact information from an email signature.",
    "input_schema": ContactInfo.model_json_schema(),
}

def extract_contact(text: str) -> ContactInfo:
    """Extract contact info using forced tool use + Pydantic validation."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[extract_tool],
        tool_choice={"type": "tool", "name": "extract_contact"},
        messages=[{"role": "user", "content": f"Extract contact info:\n\n{text}"}],
    )
    for block in response.content:
        if block.type == "tool_use":
            return ContactInfo(**block.input)
    raise ValueError("No tool_use block in response")

# Run against all 5 test signatures
successes = 0
for i, sig in enumerate(TEST_SIGNATURES, 1):
    try:
        contact = extract_contact(sig)
        print(f"✓ Sig {i}: {contact.name} <{contact.email}> @ {contact.company or 'N/A'}")
        successes += 1
    except (ValidationError, ValueError) as e:
        print(f"✗ Sig {i}: FAILED — {e}")
    except anthropic.APIError as e:
        print(f"✗ Sig {i}: API error — {e.message}")

print(f"\nResults: {successes}/{len(TEST_SIGNATURES)} extracted successfully")
const extractTool = {
  name: 'extract_contact',
  description: 'Extract structured contact information from an email signature.',
  input_schema: {
    type: 'object',
    properties: {
      name: { type: 'string', description: 'Full name' },
      email: { type: 'string', description: 'Email address' },
      phone: { type: 'string', description: 'Phone number if present' },
      company: { type: 'string', description: 'Company name if present' },
      role: { type: 'string', description: 'Job title if present' },
    },
    required: ['name', 'email'],
  },
};

async function extractContact(text) {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    tools: [extractTool],
    tool_choice: { type: 'tool', name: 'extract_contact' },
    messages: [{ role: 'user', content: `Extract contact info:\n\n${text}` }],
  });
  for (const block of response.content) {
    if (block.type === 'tool_use') return ContactInfo.parse(block.input);
  }
  throw new Error('No tool_use block');
}

let successes = 0;
for (let i = 0; i < TEST_SIGNATURES.length; i++) {
  try {
    const contact = await extractContact(TEST_SIGNATURES[i]);
    console.log(`✓ Sig ${i+1}: ${contact.name} <${contact.email}> @ ${contact.company || 'N/A'}`);
    successes++;
  } catch (error) {
    console.log(`✗ Sig ${i+1}: FAILED — ${error.message?.slice(0, 80)}`);
  }
}
console.log(`\nResults: ${successes}/${TEST_SIGNATURES.length} extracted successfully`);

Run it: python extractor.py (or node extractor.mjs)

Expected Output:
✓ Sig 1: Jane Smith <jane@acme.com> @ Acme Corp
✓ Sig 2: John Doe <john.doe@megatech.io> @ MegaTech
✓ Sig 3: Dr. Maria García-López <mgarcia@biogen.int> @ BioGen International
✓ Sig 4: Alex K. <alex@startupxyz.co> @ StartupXYZ
✓ Sig 5: Robert "Bob" Williams III <rwilliams@gfp.com> @ Global Finance Partners LLC

Results: 5/5 extracted successfully
✅ Checkpoint: If you see 5/5 (or at least 4/5) extracted successfully, Step 2 is working. Tool use handles tricky cases like hyphenated names, nicknames in quotes, and informal formatting that would break regex-based extraction.
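To see why the checkpoint singles out regex-based extraction, here is a small illustrative sketch. The pattern `NAIVE_NAME` and the helper `naive_name` are hypothetical and deliberately simplistic (two capitalized ASCII words in a row); a real regex pipeline would be more elaborate, but it tends to hit the same kinds of edge cases.

```python
import re

# Naive pattern: two capitalized ASCII words in a row, e.g. "Jane Smith"
NAIVE_NAME = re.compile(r"\b([A-Z][a-z]+ [A-Z][a-z]+)\b")


def naive_name(signature: str):
    """Return the first 'Firstname Lastname' match, or None if nothing matches."""
    match = NAIVE_NAME.search(signature)
    return match.group(1) if match else None


easy = "Best, Jane Smith | jane@acme.com | Acme Corp"
accented = (
    "Cheers,\nDr. Maria García-López, Head of Research\n"
    "BioGen International\nmgarcia@biogen.int"
)
nickname = (
    'Thanks!\nRobert "Bob" Williams III\nChief Financial Officer\n'
    "Global Finance Partners LLC\nrwilliams@gfp.com\n+1 (212) 555-0199"
)

for label, sig in [("easy", easy), ("accented", accented), ("nickname", nickname)]:
    print(f"{label}: {naive_name(sig)!r}")
```

The pattern finds "Jane Smith" in the easy signature, but it cannot return the full accented, hyphenated name, and on the nickname signature it skips past the quoted name and picks up part of the job title instead. Each such case would need its own special-case rule, which is the maintenance burden that a schema plus tool use avoids.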

Step 3: Add Retry with Error Feedback

Even with tool use, validation can occasionally fail. Maybe Claude returns an empty string for a required field, or interprets "j (at) co (dot) com" too literally. This step wraps the extraction in a retry loop that feeds specific error messages back to Claude. Here's the key idea: instead of blindly retrying, we tell Claude exactly what went wrong so it can fix the specific problem. This step builds on extract_contact() from Step 2.

Add the following to extractor.py:

import time

def extract_with_retry(text: str, max_retries: int = 3) -> ContactInfo:
    """Extract with automatic retry on validation failure."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        prompt = f"Extract contact info:\n\n{text}"
        if last_error:
            prompt += f"\n\nPrevious attempt failed: {last_error}\nFix the output."
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                tools=[extract_tool],
                tool_choice={"type": "tool", "name": "extract_contact"},
                messages=[{"role": "user", "content": prompt}],
            )
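            for block in response.content:
                if block.type == "tool_use":
                    return ContactInfo(**block.input)
            # No tool_use block came back: record it so the final error is informative
            last_error = "No tool_use block in response"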
        except ValidationError as e:
            last_error = str(e)
            print(f"  Attempt {attempt}: Validation error, retrying...")
            time.sleep(2 ** attempt)
        except anthropic.APIError as e:
            print(f"  Attempt {attempt}: API error — {e.message}")
            time.sleep(2 ** attempt)
    raise RuntimeError(f"Failed after {max_retries} attempts: {last_error}")

# Test retry with a deliberately tricky signature
tricky = "Contact: J. at some-company, email is j (at) co (dot) com, phone TBD"
try:
    result = extract_with_retry(tricky)
    print(f"Extracted: {result.name} <{result.email}>")
except RuntimeError as e:
    print(f"Gave up: {e}")
async function extractWithRetry(text, maxRetries = 3) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    let prompt = `Extract contact info:\n\n${text}`;
    if (lastError) prompt += `\n\nPrevious attempt failed: ${lastError}\nFix the output.`;
    try {
      const response = await client.messages.create({
        model: 'claude-sonnet-4-6',
        max_tokens: 1024,
        tools: [extractTool],
        tool_choice: { type: 'tool', name: 'extract_contact' },
        messages: [{ role: 'user', content: prompt }],
      });
      for (const block of response.content) {
        if (block.type === 'tool_use') return ContactInfo.parse(block.input);
      }
    } catch (error) {
      if (error instanceof z.ZodError) {
        lastError = error.issues.map(i => `${i.path}: ${i.message}`).join(', ');
        console.log(`  Attempt ${attempt}: Validation error, retrying...`);
      } else { throw error; }
      await new Promise(r => setTimeout(r, 2 ** attempt * 1000));
    }
  }
  throw new Error(`Failed after ${maxRetries} attempts: ${lastError}`);
}

const tricky = "Contact: J. at some-company, email is j (at) co (dot) com, phone TBD";
try {
  const result = await extractWithRetry(tricky);
  console.log(`Extracted: ${result.name} <${result.email}>`);
} catch (error) {
  console.log(`Gave up: ${error.message}`);
}

Run it: python extractor.py (or node extractor.mjs)

Expected Output (one of these):
  Attempt 1: Validation error, retrying...
Extracted: J. <j@co.com>

# OR, if it succeeds first try:
Extracted: J. <j@co.com>
✅ Checkpoint: The tricky signature may succeed on the first try or require a retry. Either way, you should see either "Extracted: ..." or "Gave up: ..." — no unhandled crashes. If you see retry messages, that's the error-feedback mechanism working correctly.
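One detail worth noticing: the Python ContactInfo model declares email as a plain str, while the Zod schema uses .email(). As written, the Python version accepts any string as an email, so a value like "j (at) co (dot) com" would pass validation and the retry path might never run. The sketch below shows one way to close that gap. `StrictContactInfo`, `EMAIL_RE`, and `summarize_errors` are hypothetical names, and the regex is a deliberately loose check, not a full email validator.

```python
import re
from typing import Optional

from pydantic import BaseModel, ValidationError, field_validator

# Loose shape check (something@something.tld); an assumption, not RFC-compliant
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class StrictContactInfo(BaseModel):
    """Hypothetical stricter variant of ContactInfo with an email check."""

    name: str
    email: str
    phone: Optional[str] = None
    company: Optional[str] = None
    role: Optional[str] = None

    @field_validator("email")
    @classmethod
    def email_must_look_valid(cls, value: str) -> str:
        if not EMAIL_RE.match(value):
            raise ValueError("must look like user@domain.tld")
        return value


def summarize_errors(error: ValidationError) -> str:
    """Collapse a ValidationError into 'field: message' pairs for the retry prompt."""
    return ", ".join(
        f"{'.'.join(str(part) for part in issue['loc'])}: {issue['msg']}"
        for issue in error.errors()
    )


try:
    StrictContactInfo(name="J.", email="j (at) co (dot) com")
except ValidationError as exc:
    print(summarize_errors(exc))
```

`summarize_errors` mirrors what the JavaScript version does with error.issues: it produces a short "field: message" string, which is usually easier for the model to act on than the full multi-line str(e).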
Troubleshooting
  • ModuleNotFoundError: No module named 'pydantic'. Run pip install "pydantic>=2.0" (keep the quotes; without them the shell treats > as an output redirect). Pydantic v2 is required for model_json_schema().
  • Cannot find module 'zod'. Run npm install zod.
  • All 5 extractions fail. Check that your API key is set. Run echo $ANTHROPIC_API_KEY to verify (Linux/Mac) or echo %ANTHROPIC_API_KEY% (Windows).
  • Retry loop takes too long. Exponential backoff means waits of 2s, 4s, 8s. If testing, reduce max_retries to 2 or remove the time.sleep() temporarily.
  • RuntimeError: Failed after 3 attempts. This is expected for very ambiguous input. The retry limit did its job by stopping the loop. Try a cleaner test signature to confirm the pipeline works.
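The wait times quoted in the troubleshooting list follow directly from time.sleep(2 ** attempt). As a quick sanity check, the sketch below computes the schedule; `backoff_delays` is a hypothetical helper, and the `cap` parameter is an added assumption that is not part of the exercise code.

```python
def backoff_delays(max_retries: int = 3, base: float = 2.0, cap: float = 30.0) -> list:
    """Seconds to wait after attempts 1..max_retries, mirroring time.sleep(2 ** attempt).

    cap is an assumed upper bound so that long retry chains do not wait indefinitely.
    """
    return [min(cap, base ** attempt) for attempt in range(1, max_retries + 1)]


print(backoff_delays())       # [2.0, 4.0, 8.0]
print(sum(backoff_delays()))  # 14.0
```

Because the exercise code sleeps after every failed attempt, including the last one, a run that fails all three attempts spends about 14 seconds waiting in addition to the API calls themselves.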

Verify Everything Works

Run the complete file to execute all steps in sequence:

python extractor.py    # or: node extractor.mjs
🎉 Congratulations! You've built a complete structured output pipeline with tool-use extraction, schema validation, and automatic error recovery. This pattern — define schema, force tool use, validate, retry with feedback — is the foundation for every data extraction agent in this course, and you'll extend it with real tool execution in M05.

Stretch Goals (Optional)

  • Build a generic extract(text, schema) function that accepts any Pydantic/Zod schema
  • Add a confidence score to each extracted field based on the source text

Multi-Modal Input: Vision and PDF

Agents don't only consume text. Claude can process images (JPEG, PNG, GIF, or WebP data sent as base64-encoded strings or URLs, which lets the model "see" and reason about visual content) and PDFs (Portable Document Format files that Claude reads natively, extracting text, tables, and layout information without external OCR) directly. This unlocks powerful agent use cases like reading scanned documents, extracting data from photographs, and analyzing charts or diagrams.

Sending Images via Base64

To send an image to Claude, you encode it as a base64 string and include it in the message's content array with type: "image" and the appropriate media_type (e.g., image/jpeg, image/png). This lets agents process photos of receipts, screenshots of dashboards, scanned filings, or any visual input without needing a separate OCR pipeline.
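As a rough illustration of the shape described above, the sketch below builds an image content block and wraps it in a user message. `image_block` and `image_message` are hypothetical helper names, and the bytes used here are a placeholder rather than a real image.

```python
import base64

SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(image_bytes: bytes, media_type: str) -> dict:
    """Build a base64 image content block for the messages API."""
    if media_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported image type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(image_bytes).decode("ascii"),
        },
    }


def image_message(image_bytes: bytes, media_type: str, instruction: str) -> dict:
    """A user message whose content array holds the image followed by a text instruction."""
    return {
        "role": "user",
        "content": [
            image_block(image_bytes, media_type),
            {"type": "text", "text": instruction},
        ],
    }


fake_png = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a real image
message = image_message(fake_png, "image/png", "Extract the total and the date from this receipt.")
print(message["content"][0]["source"]["media_type"])  # image/png
```

The resulting dictionary can be passed in the messages list of client.messages.create(...), alongside the same tools and tool_choice arguments used in the exercise.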

Vision Use Cases for Agents

  • Scanned document extraction — read a scanned UCC filing or invoice and extract structured fields like filing numbers, debtor names, and dates
  • Photo-based data capture — extract product details from a photo of a shipping label or whiteboard notes from a meeting
  • Chart and diagram analysis — interpret a bar chart image and return the underlying data as JSON

PDF Processing

Claude supports native document understanding for PDFs, so you can send multi-page documents directly without external OCR tools. This is especially valuable for agents that need to process contracts, medical records, or government filings. Combine PDF input with tool use to extract structured data automatically.

# Pseudocode: an analyze_document tool for a multi-modal agent

TOOL analyze_document(document_base64, media_type):
    """Send a scanned document to Claude with vision and
    return structured data extracted from it."""

    SEND message to Claude:
        content = [
            # use type: "document" instead of "image" when media_type is a PDF
            { type: "image", source: { type: "base64",
              media_type: media_type, data: document_base64 } },
            { type: "text", text: "Extract: filing number, debtor name,
              secured party, filing date." }
        ]
        tools = [ structured_filing_schema ]

    PARSE tool_use response
    VALIDATE against schema
    RETURN structured filing record
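For the PDF case, the pseudocode above might translate into a request like the one sketched here. This only builds the request arguments and makes no API call. `FILING_TOOL`, `document_block`, and `build_filing_request` are hypothetical names, and the schema fields are assumptions derived from the field list in the pseudocode.

```python
import base64


def document_block(pdf_bytes: bytes) -> dict:
    """Build a base64 PDF document content block."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(pdf_bytes).decode("ascii"),
        },
    }


# Hypothetical tool schema covering the fields named in the pseudocode
FILING_TOOL = {
    "name": "record_filing",
    "description": "Record the structured fields of a filing.",
    "input_schema": {
        "type": "object",
        "properties": {
            "filing_number": {"type": "string"},
            "debtor_name": {"type": "string"},
            "secured_party": {"type": "string"},
            "filing_date": {"type": "string"},
        },
        "required": ["filing_number", "debtor_name"],
    },
}


def build_filing_request(pdf_bytes: bytes, model: str = "claude-sonnet-4-6") -> dict:
    """Return keyword arguments suitable for client.messages.create(**request)."""
    return {
        "model": model,
        "max_tokens": 1024,
        "tools": [FILING_TOOL],
        "tool_choice": {"type": "tool", "name": FILING_TOOL["name"]},
        "messages": [{
            "role": "user",
            "content": [
                document_block(pdf_bytes),
                {"type": "text", "text": "Extract the filing number, debtor name, "
                                         "secured party, and filing date."},
            ],
        }],
    }


request = build_filing_request(b"%PDF-1.4 placeholder")
print(request["messages"][0]["content"][0]["type"])  # document
```

Keeping request construction separate from the API call makes it easy to inspect or unit test the payload, and the response can then go through the same tool_use parsing and schema validation used for contact extraction.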
Why It Matters: Over 60% of business documents still arrive as scanned images or PDFs. An agent that can only read clean text misses the majority of real-world input. Multi-modal capabilities let your agent handle the messy reality of documents without bolting on separate OCR services or manual data entry steps.

Knowledge Check

Test your understanding of structured output, validation, and error recovery.

Q1: What is the most reliable way to get structured JSON output from Claude?

A. Ask Claude nicely to "please respond in JSON format"
B. Set temperature to 0 for deterministic output
C. Use tool use with a JSON Schema definition (Claude is specifically trained for this format)
D. Post-process the text response with regex to extract JSON
Answer: C. Tool use is Claude's most reliable structured output mechanism. Claude is specifically trained to produce valid tool_use content blocks with properly typed parameters. It's far more reliable than prompt engineering alone.

Q2: Given this Pydantic model, which JSON response will FAIL validation?

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
A. {"name": "Widget", "price": 9.99, "in_stock": true}
B. {"name": "Widget", "price": "nine dollars", "in_stock": true}
C. {"name": "Widget", "price": 9, "in_stock": false}
D. {"name": "Widget", "price": 0.0, "in_stock": true}
Answer: B. "nine dollars" is a string that cannot be parsed as a number, but the schema expects a float for the price field, so Pydantic raises a ValidationError. Option C works because Pydantic coerces int 9 to float 9.0.

Q3: What should you include in a retry prompt after a validation failure?

A. Just repeat the original prompt unchanged
B. The word "IMPORTANT" in all caps to make Claude try harder
C. A completely different prompt asking the same question
D. The original prompt plus the specific validation error message so Claude can self-correct
Answer: D. Including the exact error message (e.g., "field 'email' expected str, got None") lets Claude see what went wrong and fix it. This is far more effective than repeating the same prompt or using vague emphasis.

Q4: What does tool_choice: {"type": "tool", "name": "extract_contact"} do?

A. Forces Claude to use that specific tool, guaranteeing a structured tool_use response
B. Suggests the tool but Claude can still respond with plain text
C. Automatically executes the tool on the server side
D. Adds the tool to Claude's training data for better results
Answer: A. Setting tool_choice to a specific tool forces Claude to use it, guaranteeing you'll get a structured tool_use content block. This is the key to reliable structured output via tool use.

Q5: Why should retry logic use exponential backoff?

A. It makes Claude think harder by giving it more time
B. It prevents overwhelming the API during outages and gives transient errors time to resolve
C. It reduces token usage by spacing out requests
D. It's required by the Anthropic API terms of service
Answer: B. Exponential backoff (1s, 2s, 4s, 8s...) prevents your retries from making an API overload worse, and gives transient issues time to clear. It's a standard reliability pattern you'll see again in M17 (Circuit Breakers).

Q6: Fill in the blank to force Claude to use a specific tool:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,  # required by the Messages API
    tools=[my_tool],
    ______={"type": "tool", "name": "my_tool"},
    messages=[...]
)
A. tool_select
B. force_tool
C. tool_choice
D. required_tool
Answer: C. The parameter is tool_choice. Setting it to {"type": "tool", "name": "my_tool"} forces Claude to use that specific tool, guaranteeing structured output.

Module Summary

Key Takeaways

  • Structured output is the contract between the AI and your system — without it, downstream code breaks unpredictably.
  • Tool use is the most reliable method — Claude is specifically trained to produce valid tool_use content blocks with typed parameters.
  • Defense in depth — combine format constraints, stop sequences, and schema validation for near-100% reliability.
  • Pydantic & Zod turn "it usually works" into "it always works or fails explicitly" with typed, validated objects.
  • Retry with error feedback — include the validation error in the retry prompt so Claude can self-correct. Always use exponential backoff.

Next Module Preview: M05 — Function Calling

Now that you can get reliable structured output, you're ready for the pivotal moment in the course: giving Claude the ability to do things, not just generate text. In Module 5, you'll build your first tool-using agent — the moment Claude goes from chatbot to agent.