CC15: Claude Agent SDK | Claude Code Mastery

Learning Objectives

Understand what the Claude Agent SDK is and how it relates to Claude Code, the Anthropic SDK, and MCP.
Decide when to reach for the Agent SDK vs the Claude Code CLI vs the raw Anthropic Messages API.
Set up the Agent SDK in Python and TypeScript.
Wrap an existing REST API as custom tools the agent can call (via an in-process MCP server).
Run a complete agent loop — tools, context, and turn limits — against a real backend.

Why the Agent SDK?

You've spent fifteen modules at the terminal driving Claude Code interactively. That works for software engineering tasks where a human is in the loop. But most "agent" use cases are programs that nobody is interactively typing into — a GitHub Action that reviews PRs, a Slack bot that triages support tickets, a nightly job that audits a codebase, a customer-facing chat surface that handles UCC filing questions.

For those, you want Claude Code's capabilities — tool use, file system access, MCP servers, subagents, hooks, sessions — without the interactive terminal. That's the Claude Agent SDK.

What it actually is

The Agent SDK is the same engine as Claude Code, packaged as a library. pip install claude-agent-sdk (Python) or npm install @anthropic-ai/claude-agent-sdk (TypeScript). You configure an agent with options (system prompt, allowed tools, MCP servers, max turns), call query() with a prompt, and stream messages back. Tool use, file edits, subprocess calls, MCP — all the same primitives you've been using interactively, now under your code's control.

SDK vs Claude Code CLI vs Anthropic Messages API

Three different surfaces. Pick by what you're building.

Surface	Use when	Example
Claude Code CLI	A human is at the terminal pair-programming. Interactive sessions, slash commands, file diff review.	`claude` at your terminal — everything CC0–CC14.
Claude Agent SDK	A program needs Claude Code's capabilities (tools, file system, MCP, subagents) without the terminal. Background jobs, scheduled tasks, custom apps with agentic logic.	A nightly compliance bot that scans a Postgres warehouse for new UCC filings and posts to Slack.
Anthropic Messages API	You want Claude's intelligence but you're managing the agent loop yourself — custom tool execution, custom retry logic, custom token streaming. Maximum control, more code.	A real-time chat UI where you've already got your own tool-call orchestration and just want Claude as the model.

The Agent SDK sits between the CLI (most opinionated, easiest) and the Messages API (most flexible, most code). For most "I want an agent that does X" problems, the SDK is the right starting point.

Core Concepts

The agent loop, done for you

The Messages API gives you one round-trip: send messages, get a response. If the response wants to call a tool, you write the code that runs the tool, packs the result into the next message, and re-sends. The Agent SDK runs that loop for you: you call query() (or use a ClaudeSDKClient), it returns a stream of messages (assistant text, tool calls, tool results, thinking) and stops when Claude is done or hits max_turns.

Tools come from three places

Built-in: Read, Write, Edit, Bash, Grep, Glob, Task, WebFetch, WebSearch, NotebookEdit — the same tools Claude Code has. Toggle individually via allowed_tools / disallowed_tools.
MCP servers: same Model Context Protocol you used in CC9. Define a small server (in-process via create_sdk_mcp_server, or external stdio/HTTP/SSE), then point the SDK at it via mcp_servers.
Subagents: .claude/agents/*.md files from CC6, OR programmatically via the agents config option. The SDK delegates via the Task tool, same as the CLI.

The control surfaces you've already learned

system_prompt, allowed_tools, permission_mode, max_turns, mcp_servers, cwd — same vocabulary as Claude Code. If you understood the CLI flags from CC4 (permissions) and CC14 (headless mode), you already understand the SDK options.

SDK Features — What You Can Configure

Everything below lives on the ClaudeAgentOptions object (Python) or the options field of query() (TypeScript). This is the SDK's full configuration surface. Most agents use 3–5 of these; the rest are there when you need them.

Two entry points

API	Use when	Looks like
`query(prompt, options)`	One-shot or single-turn streaming runs (CI jobs, scripts, batch processing)	Async iterator over messages; stops when Claude is done
`ClaudeSDKClient(options)`	Multi-turn conversations, persistent sessions, follow-up questions	Class with `query()` + `receive_response()`; survives across calls

Built-in tools

Tool	What it does	Notes
`Read`	Read a file from disk	Used in Lab 1
`Write`	Create / overwrite a file	Gated by `permission_mode`
`Edit`	String-replace inside a file	Most common write surface
`Bash`	Execute a shell command	Highest-blast-radius tool — restrict via deny rules
`Grep`	Ripgrep across the repo	Fast, structured
`Glob`	File-name pattern matching	Used in Lab 1
`Task`	Delegate to a subagent	The handoff mechanism for `agents`
`WebFetch` / `WebSearch`	Pull a URL, search the web	Off by default in some configs
`NotebookEdit`	Edit Jupyter `.ipynb` cells	Specialized; rarely needed outside data work

Restrict via allowed_tools=[...] (whitelist) or disallowed_tools=[...] (deny specific ones). MCP tools follow the naming pattern mcp__<serverName>__<toolName>.

Custom tools (two routes)

Route	When
In-process MCP via `@tool` decorator + `create_sdk_mcp_server`	Tools written in the same language as your driver. Lab 2 uses this.
External MCP (stdio, HTTP, SSE)	Tools provided by another team / language / vendor. Same protocol the Postgres MCP from CC9 uses.

Permissions & runtime gating

permission_mode: "default" (prompt for risky calls) / "acceptEdits" (auto-approve file edits) / "plan" (read-only planning) / "bypassPermissions" (no prompts — CI mode).
can_use_tool callback: a Python/TS function the SDK invokes per tool call. Return {"behavior": "allow"} or {"behavior": "deny", "message": "..."}. Use this for runtime decisions that depend on tool args (e.g., deny Bash calls that touch ~/.ssh).

Hooks (lifecycle events)

Same model as CC7's CLI hooks, but the handlers are Python/TS callables instead of shell commands. Configure via the hooks dict:

Event	Fires when	Common uses
`PreToolUse`	Before a tool call dispatches	Block dangerous calls; rewrite arguments; log
`PostToolUse`	After a tool call returns	Format files, run validators, audit
`UserPromptSubmit`	When a user prompt is received	Inject context, enforce input filters
`Stop` / `SubagentStop`	When the (sub)agent finishes	Persist results, notify, cleanup
`PreCompact`	Before context compaction	Save important context to memory

Sessions & conversation

Multi-turn: use ClaudeSDKClient; each .query() continues the same conversation.
Resume: pass resume=<session_id> in options to pick up a previous session by id.
Continue: continue_conversation=True resumes the most recent session in the working directory.

System prompt & settings sources

system_prompt: replace the default agent system prompt entirely.
append_system_prompt: keep the default and add your own instructions on top.
setting_sources: load CLAUDE.md, .claude/settings.json, slash commands, and subagents from disk — the same files CC3, CC4, CC5, CC6 created. Lets you build an SDK driver that respects everything the team has already configured for the CLI.

Knobs you'll reach for often

Option	Purpose
`max_turns`	Cap on agent loop iterations — safety against runaway tool calls
`model`	Pick Opus / Sonnet / Haiku for cost/latency trade-offs
`cwd`	Working directory the agent's tools operate from
`env`	Environment variables passed to spawned subprocesses (Bash, MCP servers)
`max_thinking_tokens`	Budget for extended thinking (reasoning models)
`add_dirs`	Extra directories the agent's file tools may access (beyond `cwd`)

Output: what you receive in the message stream

Each iteration of the async stream yields one of:

SystemMessage — session metadata at start
AssistantMessage — Claude's response, containing TextBlock / ToolUseBlock / ThinkingBlock
UserMessage — tool-result messages the SDK feeds back to Claude (visible to you for logging)
ResultMessage — final summary with cost, duration, success/failure

What the labs exercise

Lab 1 uses: query(), built-in Read + Glob, system_prompt, allowed_tools, max_turns, permission_mode, async message iteration with TextBlock + ToolUseBlock.

Lab 2 adds: in-process MCP server with @tool + create_sdk_mcp_server, the mcp_servers option, MCP tool naming (mcp__ucc__lookup_filings), and the bridge from SDK driver to a Claude Code subagent definition.

What's not exercised in the labs but listed here for completeness: can_use_tool runtime gating, hooks, multi-turn ClaudeSDKClient, session resume, settings sources. These are the right next features to reach for once you have a working agent.

Debugging & Observability — What's My Agent Actually Doing?

An agent loop has more moving parts than a normal program: Claude picks a tool, the SDK dispatches it, the tool returns a result, Claude reads the result, picks another tool or answers. When something looks wrong — the agent loops forever, picks the wrong tool, ignores instructions, or burns way too many tokens — you need visibility into every step of the loop. The SDK gives you that visibility for free; you just have to print it.

The four message types in the stream

Every iteration of the SDK's async stream yields exactly one of these. Inspecting them is the foundation of debugging any agent.

Message type	What it tells you	Inside
`SystemMessage`	Session metadata at start — session id, model, tool inventory, working dir	One per session. Useful for confirming setup loaded right.
`AssistantMessage`	Claude's turn: text it's saying, tools it's calling, thinking it's doing	`TextBlock`, `ToolUseBlock`, `ThinkingBlock`
`UserMessage`	Tool results the SDK fed back to Claude after a tool call	`ToolResultBlock`
`ResultMessage`	Final summary at end — cost, duration, turn count, success flag	`duration_ms`, `num_turns`, `total_cost_usd`, `is_error`, `session_id`

An instrumented agent loop you can copy

Drop this in place of the simple print(block.text) from Lab 1 and you'll see everything the agent does — what tools it calls, with what args, what came back, how much it cost. This is the single most useful 30 lines for debugging any SDK program.

Python Instrumented Loop

debug_agent.py

"""Run any agent with full message-stream visibility."""
import asyncio
import json
from claude_agent_sdk import (
    query, ClaudeAgentOptions,
    SystemMessage, AssistantMessage, UserMessage, ResultMessage,
    TextBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock,
)


async def run_agent(prompt: str, options: ClaudeAgentOptions) -> None:
    print(f"\n>>> PROMPT: {prompt}\n")

    async for msg in query(prompt=prompt, options=options):

        if isinstance(msg, SystemMessage):
            print(f"[system] session={msg.data.get('session_id')[:8]}... "
                  f"model={msg.data.get('model')} "
                  f"tools={len(msg.data.get('tools', []))}")

        elif isinstance(msg, AssistantMessage):
            for block in msg.content:
                if isinstance(block, TextBlock):
                    print(f"[claude] {block.text}")
                elif isinstance(block, ToolUseBlock):
                    args = json.dumps(block.input, indent=None)[:120]
                    print(f"[tool-call] {block.name}({args})")
                elif isinstance(block, ThinkingBlock):
                    print(f"[thinking] {block.thinking[:200]}...")

        elif isinstance(msg, UserMessage):
            for block in msg.content:
                if isinstance(block, ToolResultBlock):
                    body = str(block.content)[:200]
                    status = "ERROR" if block.is_error else "ok"
                    print(f"[tool-result {status}] {body}")

        elif isinstance(msg, ResultMessage):
            print(f"\n[done] turns={msg.num_turns} "
                  f"duration={msg.duration_ms}ms "
                  f"cost=${msg.total_cost_usd:.4f} "
                  f"error={msg.is_error}")


if __name__ == "__main__":
    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Glob"],
        max_turns=5,
        permission_mode="bypassPermissions",
    )
    asyncio.run(run_agent("Summarize files in this dir.", options))

Sample output when you run it:

Output Trace

stdout

>>> PROMPT: Summarize files in this dir.

[system] session=a3f2c8d1... model=claude-opus-4-7 tools=14
[tool-call] Glob({"pattern": "*"})
[tool-result ok] README.md
deps.txt
first_agent.py
[tool-call] Read({"file_path": "/Users/me/cc-labs/first-agent/README.md"})
[tool-result ok] # My first agent project
[tool-call] Read({"file_path": "/Users/me/cc-labs/first-agent/deps.txt"})
[tool-result ok] claude-agent-sdk
[claude] This directory contains a starter project using the Claude Agent SDK.
The README labels it "My first agent project" and deps.txt declares one
dependency: claude-agent-sdk.

[done] turns=4 duration=3287ms cost=$0.0093 error=false

Now you know exactly what happened: 1 Glob to discover, 2 Reads to inspect, 1 final answer, 4 turns total, ~3 seconds, under a cent.

The five debugging scenarios you'll hit

Symptom	What to inspect	Common fix
Agent loops forever	Tool-call args repeat with the same parameters; `num_turns` climbs without progress.	Set or lower `max_turns`. Then look at the loop — usually the tool result doesn't have what Claude wanted, and Claude keeps retrying. Improve the tool's error messages, or revise the system prompt to tell Claude what to do when the tool returns a specific failure.
Agent calls the wrong tool	Tool description and tool name. Claude picks tools by description.	Rewrite the tool's description to explicitly say what it's for AND what it's NOT for. Example: `"Search filings by state. Use this for state-level queries; do NOT use for debtor-name searches — use search_debtor for that."`
Agent ignores your system prompt	Whether you used `system_prompt` (replaces) vs `append_system_prompt` (adds to default).	Use `append_system_prompt` if you want Claude Code's default agent behavior plus your additions; `system_prompt` if you want a clean slate. Mixing them up is the #1 cause of "Claude isn't following my rules."
Tool fails silently	Look for `UserMessage` with `ToolResultBlock.is_error == True`.	Wrap your tool body in try/except and return `{"content": [{"type": "text", "text": f"error: {e}"}], "is_error": True}`. Claude reads the error and can decide whether to retry or give up.
Cost too high	`ResultMessage.total_cost_usd` — track per session; aggregate across runs.	Set `model="claude-haiku-4-5-20251001"` for cheap iterations during development, swap to Sonnet/Opus for prod. Consider whether some subagent calls can use Haiku.

Persist traces for after-the-fact debugging

For agents running in CI or scheduled jobs, you can't watch stdout live. Write each message to a JSON Lines file so you can replay the trace later:

Python JSONL Trace Logger

trace_to_file.py

import json, time
from pathlib import Path
from claude_agent_sdk import query, ClaudeAgentOptions

trace_file = Path(f"trace-{int(time.time())}.jsonl")

async def traced(prompt, options):
    with trace_file.open("w") as f:
        async for msg in query(prompt=prompt, options=options):
            # SDK message types are dataclasses with .__dict__-friendly contents.
            entry = {"ts": time.time(), "type": type(msg).__name__,
                     "raw": repr(msg)[:2000]}
            f.write(json.dumps(entry) + "\n")
            f.flush()  # crash-safe
    print(f"trace written to {trace_file}")

The flush() call matters — if your agent crashes mid-loop, you still have everything up to the failure on disk.

Hooks for pre-emptive debugging

The features section listed PreToolUse and PostToolUse hooks. They're not just for production policy — they're great for debugging. A PreToolUse hook that logs every tool's arguments to a file gives you a permanent audit trail without changing your agent code:

Python Hook for Debugging

debug_hook.py

from claude_agent_sdk import HookMatcher, ClaudeAgentOptions

async def log_tool_call(input_data, tool_use_id, context):
    print(f"[hook] {input_data['tool_name']} -> {input_data['tool_input']}")
    return {}  # empty dict = allow, no modifications

options = ClaudeAgentOptions(
    hooks={
        "PreToolUse": [HookMatcher(matcher="*", hooks=[log_tool_call])],
    },
)

You now have three ways to see what your agent is doing: instrumented stream loop (synchronous, in-process), JSONL trace (async, persistent), and hooks (declarative, intercept-style). Use whichever fits your situation.

Reach for the inspector subagent

For complex multi-step debugging, delegate the trace analysis to Claude itself. Once you've persisted a JSONL trace, you can ask: "@trace-inspector load trace-1234.jsonl and tell me where the loop stalled." — using a Claude Code subagent like the one from CC6 but pointed at trace files instead of source code. Same pattern: a small agent with a tight tool surface (Read + Grep), reasoning over your agent's recorded behavior.

One thing not to do

Don't print message content in production logs without redacting. Tool inputs may contain user PII; tool results often contain DB rows. The same security rules from CC3's CLAUDE.md (never log raw SSN/EIN) apply to your trace files. Use Pii.mask() before writing, or write traces to a separate restricted log stream.

Lab 1 — Your First Agent (5 minutes, no MCP)

Before we wire in custom tools and a backend, build the smallest possible agent: install the SDK, write fewer than 20 lines, run it, see Claude pick up a built-in tool and answer a question about your filesystem. This is the "hello world" of the Agent SDK — everything that follows is configuration on top of this same shape.

Step 1 — Make a working directory and set your API key

mkdir -p ~/cc-labs/first-agent && cd ~/cc-labs/first-agent

export ANTHROPIC_API_KEY=sk-ant-...

Windows PowerShell: $env:ANTHROPIC_API_KEY = "sk-ant-..."

Step 2 — Install the Agent SDK

Pick Python or TypeScript — both flows work for the rest of this lab.

pip install claude-agent-sdk

Or:

npm init -y && npm install @anthropic-ai/claude-agent-sdk && npm install -D tsx typescript

Step 3 — Drop a couple of files for Claude to look at

So the agent has something concrete to read:

echo "# My first agent project" > README.md && echo "claude-agent-sdk" > deps.txt

Step 4 — Write the agent

How it works

query() is the SDK's one-shot agent runner: pass a prompt + options, get back an async stream of messages. The SDK runs the full agent loop internally — if Claude wants to call a tool (here: Read on a file), the SDK executes the tool, feeds the result back, and continues the conversation until Claude is done or you hit max_turns. allowed_tools=["Read"] means this agent can read files but cannot Bash, Edit, or Write — least privilege from the start.

Python:

Python Minimal Agent

~/cc-labs/first-agent/first_agent.py

"""Your first Claude Agent SDK program."""
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AssistantMessage, TextBlock


async def main() -> None:
    options = ClaudeAgentOptions(
        system_prompt="You are a concise project explainer.",
        allowed_tools=["Read", "Glob"],
        max_turns=4,
        permission_mode="bypassPermissions",
    )

    prompt = "Read the files in this directory and summarize the project in 2 sentences."

    async for msg in query(prompt=prompt, options=options):
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if isinstance(block, TextBlock):
                    print(block.text, end="", flush=True)
    print()


if __name__ == "__main__":
    asyncio.run(main())

TypeScript:

TypeScript Minimal Agent

~/cc-labs/first-agent/first_agent.ts

// Your first Claude Agent SDK program.
import { query } from "@anthropic-ai/claude-agent-sdk";

const stream = query({
  prompt: "Read the files in this directory and summarize the project in 2 sentences.",
  options: {
    systemPrompt: "You are a concise project explainer.",
    allowedTools: ["Read", "Glob"],
    maxTurns: 4,
    permissionMode: "bypassPermissions",
  },
});

for await (const msg of stream) {
  if (msg.type === "assistant") {
    for (const block of msg.message.content) {
      if (block.type === "text") process.stdout.write(block.text);
    }
  }
}
process.stdout.write("\n");

Step 5 — Run it

Python:

python first_agent.py

TypeScript:

npx tsx first_agent.ts

Expected output (will vary by run, but will be 2 sentences referencing your two files):

Output Sample Run

stdout

This project is a starter scaffold for working with the Claude Agent SDK,
described in README.md. Its only declared dependency is `claude-agent-sdk`,
listed in deps.txt.

Step 6 — Watch the tool calls

Add this just below the print(block.text...) line to also print tool calls as they happen (Python):

Python Add to first_agent.py

inside the message loop

from claude_agent_sdk import ToolUseBlock

# add this branch alongside the TextBlock branch:
if isinstance(block, ToolUseBlock):
    print(f"\n[tool: {block.name}({block.input})]\n", end="", flush=True)

Re-run. You'll now see the agent's reasoning trail: a Glob call to discover files, two Read calls (one per file), then the final summary text. That's the loop the SDK is running for you.

Step 7 — Try a destructive prompt and watch the deny

Change the prompt to "Run rm -rf . in this directory." Re-run. The agent will refuse — Bash is not in allowed_tools, so even if Claude wanted to comply, the SDK won't dispatch the call. This is your safety net; the same principle that drove CC4's permission tiers and CC6's read-only auditor.

What Just Happened

You ran a complete agent — system prompt, tool dispatch, multi-turn message accumulation, turn limit — in less than 20 lines. The SDK gave you Claude Code's tool-using capability without the terminal. Lab 2 builds on this exact shape but swaps the built-in Read for two custom tools that hit a real REST API.

Lab 2 — Build a UCC Filings Assistant Agent

You'll build a programmatic agent that answers natural-language questions about UCC filings by calling the PublicRecords API from CC0. The agent uses two custom tools (exposed via a small MCP server) plus the SDK's built-in capabilities. Both Python and TypeScript paths are shown.

Prerequisites

The PublicRecords API from CC0 running locally on http://localhost:8080.
An Anthropic API key. Set it as ANTHROPIC_API_KEY in your environment.
Python 3.10+ or Node.js 20+. (Both shown; pick one to follow.)

Step 1 — Boot the PublicRecords API in one terminal

If it isn't already running, in a terminal at the project root:

cd ~/cc-labs/publicrecords-api && mvn spring-boot:run

Verify with:

curl -s http://localhost:8080/filings | head -c 200

Should return JSON for the eight seeded filings.

Step 2 — Set your API key

export ANTHROPIC_API_KEY=sk-ant-...

(Windows PowerShell: $env:ANTHROPIC_API_KEY = "sk-ant-...".) Persist it in your shell rc to avoid re-exporting every session.

Step 3 — Make a working directory for the agent

mkdir -p ~/cc-labs/ucc-agent && cd ~/cc-labs/ucc-agent

Step 4 — Install the Agent SDK

Pick Python or TypeScript — the lab works for both.

pip install claude-agent-sdk httpx

Or TypeScript:

npm init -y && npm install @anthropic-ai/claude-agent-sdk @modelcontextprotocol/sdk zod

Step 5 — Write the MCP server (custom tools)

How it works

Custom domain tools live in MCP servers — the same protocol you used in CC9. The Agent SDK loads MCP servers as a config option; once loaded, their tools are addressable as mcp__<serverName>__<toolName>. Here we expose two: lookup_filings (with optional state filter) and get_filing (by id). The "server" is a 40-line script that wraps http://localhost:8080.

Python version:

Python MCP Server

~/cc-labs/ucc-agent/ucc_mcp.py

"""UCC Filings MCP server — wraps the PublicRecords API as agent tools."""
import asyncio
import httpx
from claude_agent_sdk import create_sdk_mcp_server, tool

API = "http://localhost:8080"


@tool(
    "lookup_filings",
    "Search UCC filings by 2-letter US state code. Returns up to 50 matches.",
    {"state": str},
)
async def lookup_filings(args: dict) -> dict:
    state = args["state"].upper()
    async with httpx.AsyncClient(timeout=10) as client:
        r = await client.get(f"{API}/filings", params={"state": state})
        r.raise_for_status()
        rows = r.json()
    summary = [
        f"#{f['id']} | {f['state']} | {f['debtorName']} -> {f['securedParty']}"
        for f in rows[:50]
    ]
    return {"content": [{"type": "text", "text": "\n".join(summary) or "no matches"}]}


@tool(
    "get_filing",
    "Fetch the full record of a single UCC filing by numeric id.",
    {"filing_id": int},
)
async def get_filing(args: dict) -> dict:
    async with httpx.AsyncClient(timeout=10) as client:
        r = await client.get(f"{API}/filings/{args['filing_id']}")
        if r.status_code == 404:
            return {"content": [{"type": "text", "text": "filing not found"}]}
        r.raise_for_status()
        f = r.json()
    return {"content": [{"type": "text", "text": str(f)}]}


ucc_server = create_sdk_mcp_server(
    name="ucc",
    version="0.1.0",
    tools=[lookup_filings, get_filing],
)

TypeScript version:

TypeScript MCP Server

~/cc-labs/ucc-agent/ucc_mcp.ts

// UCC Filings MCP server — wraps the PublicRecords API as agent tools.
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

const API = "http://localhost:8080";

export const uccServer = createSdkMcpServer({
  name: "ucc",
  version: "0.1.0",
  tools: [
    tool(
      "lookup_filings",
      "Search UCC filings by 2-letter US state code. Returns up to 50 matches.",
      { state: z.string().length(2) },
      async ({ state }) => {
        const r = await fetch(`${API}/filings?state=${state.toUpperCase()}`);
        if (!r.ok) throw new Error(`API ${r.status}`);
        const rows = (await r.json()) as Array<Record<string, unknown>>;
        const summary = rows
          .slice(0, 50)
          .map((f) => `#${f.id} | ${f.state} | ${f.debtorName} -> ${f.securedParty}`)
          .join("\n");
        return { content: [{ type: "text", text: summary || "no matches" }] };
      },
    ),
    tool(
      "get_filing",
      "Fetch the full record of a single UCC filing by numeric id.",
      { filing_id: z.number().int().positive() },
      async ({ filing_id }) => {
        const r = await fetch(`${API}/filings/${filing_id}`);
        if (r.status === 404) return { content: [{ type: "text", text: "filing not found" }] };
        if (!r.ok) throw new Error(`API ${r.status}`);
        const f = await r.json();
        return { content: [{ type: "text", text: JSON.stringify(f, null, 2) }] };
      },
    ),
  ],
});

Step 6 — Write the agent driver

How it works

The driver is the program that runs the agent loop. ClaudeAgentOptions wires up the MCP server (so its two tools are available to Claude), restricts the tool surface via allowed_tools (no Bash, no Edit — this agent only queries the API), and caps runaway loops with max_turns. The system prompt teaches the agent what UCC means and what tools to reach for.

Python version:

Python Agent Driver

~/cc-labs/ucc-agent/ucc_agent.py

"""UCC Filings Assistant — agent driver that uses the ucc MCP server."""
import asyncio
import sys
from claude_agent_sdk import query, ClaudeAgentOptions, AssistantMessage, TextBlock
from ucc_mcp import ucc_server

SYSTEM = """You are a UCC Filings Assistant for US public records.

Definitions:
- UCC-1 filing: a public lien notice; a "secured party" (lender) holding
  a security interest in collateral belonging to a "debtor" (borrower).

You have two tools:
- mcp__ucc__lookup_filings(state) — list matches for a US state
- mcp__ucc__get_filing(filing_id) — full record for one filing

Be concise. Always cite filing ids in your answer. If the user asks for a
state by full name, infer the 2-letter code (Texas -> TX).
"""


async def main(prompt: str) -> None:
    options = ClaudeAgentOptions(
        system_prompt=SYSTEM,
        mcp_servers={"ucc": ucc_server},
        allowed_tools=["mcp__ucc__lookup_filings", "mcp__ucc__get_filing"],
        max_turns=6,
        permission_mode="bypassPermissions",
    )

    async for msg in query(prompt=prompt, options=options):
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if isinstance(block, TextBlock):
                    print(block.text, end="", flush=True)
    print()


if __name__ == "__main__":
    user_q = sys.argv[1] if len(sys.argv) > 1 else "Find me Texas filings."
    asyncio.run(main(user_q))

TypeScript version:

TypeScript Agent Driver

~/cc-labs/ucc-agent/ucc_agent.ts

// UCC Filings Assistant — agent driver that uses the ucc MCP server.
import { query } from "@anthropic-ai/claude-agent-sdk";
import { uccServer } from "./ucc_mcp";

const SYSTEM = `You are a UCC Filings Assistant for US public records.

Definitions:
- UCC-1 filing: a public lien notice; a "secured party" (lender) holding
  a security interest in collateral belonging to a "debtor" (borrower).

You have two tools:
- mcp__ucc__lookup_filings(state) — list matches for a US state
- mcp__ucc__get_filing(filing_id) — full record for one filing

Be concise. Always cite filing ids in your answer. If the user asks for a
state by full name, infer the 2-letter code (Texas -> TX).`;

const prompt = process.argv[2] ?? "Find me Texas filings.";

const stream = query({
  prompt,
  options: {
    systemPrompt: SYSTEM,
    mcpServers: { ucc: uccServer },
    allowedTools: ["mcp__ucc__lookup_filings", "mcp__ucc__get_filing"],
    maxTurns: 6,
    permissionMode: "bypassPermissions",
  },
});

for await (const msg of stream) {
  if (msg.type === "assistant") {
    for (const block of msg.message.content) {
      if (block.type === "text") process.stdout.write(block.text);
    }
  }
}
process.stdout.write("\n");

Step 7 — Run the agent

Python:

python ucc_agent.py "Find Texas filings and tell me which secured party shows up most"

TypeScript (using tsx for one-shot execution):

npx tsx ucc_agent.ts "Find Texas filings and tell me which secured party shows up most"

Expected output (shape will vary by Claude run):

Output Sample Run

stdout

Based on the Texas filings:

#1 — Lone Star Holdings LLC -> First National Bank
#2 — Pecos River Logistics Inc. -> Wells Fargo Equipment Finance

Each Texas filing has a different secured party — no single lender appears
more than once in the Texas subset. The two distinct secured parties are
First National Bank and Wells Fargo Equipment Finance.

What Just Happened

You ran a complete agent loop in production code, no terminal. The SDK:

Loaded your MCP server and registered lookup_filings + get_filing
Sent the prompt + system prompt + tool descriptions to Claude
Watched Claude pick lookup_filings with state="TX", called your tool, fed the result back
Watched Claude reason over the result and produce a final answer
Streamed the assistant text to your stdout

That's everything you'd otherwise hand-roll on top of the Messages API — tool dispatch, message accumulation, turn limits — collapsed into ~30 lines of driver code.

Step 8 — Stretch: connect this same agent to Claude Code as a subagent

The MCP server you just wrote is reusable. Add it to your .mcp.json from CC9, drop a subagent definition that uses these tools, and the same logic now runs inside Claude Code — same code, two delivery channels.

Markdown Subagent

.claude/agents/ucc-assistant.md

---
name: ucc-assistant
description: Answers natural-language questions about UCC public records
  by querying the local PublicRecords API. Use when the user asks about
  filings, debtors, secured parties, or collateral.
tools: mcp__ucc__lookup_filings, mcp__ucc__get_filing
model: haiku
---

You are a UCC Filings Assistant. You have two tools:
- `mcp__ucc__lookup_filings(state)` — list filings for a US state code
- `mcp__ucc__get_filing(filing_id)` — full record for one filing

Be concise. Always cite filing ids in your answer.

Now in Claude Code: "Use the ucc-assistant subagent to find me California filings." Same MCP server, same tools, same logic — routed through the CLI surface instead of your standalone Python program.

Stretch — Add a debtor-name search tool

The FilingRepository from CC0 only supports findByState. Add a derived query List<Filing> findByDebtorNameContainingIgnoreCase(String q), expose it as a new POST /search endpoint, and add a third tool search_debtor in your MCP server. Then ask the agent: "Find any UCC filing for a logistics company."

Lab complete — what you built

A standalone Python or TypeScript program that runs Claude as a tool-using agent against the PublicRecords API — no terminal, no human in the loop. The same MCP server can be reused as a Claude Code subagent for interactive use. You've now seen all three Claude surfaces in one project: Claude Code CLI (CC0–CC14), Agent SDK (this module), and the underlying Messages API (the layer the SDK is built on).

Lab 3 — Build a Visual Debugging UI for Your Agent Runs

The Debug & Observe section earlier in this module showed how to print messages, tool calls, and costs to stdout, and how to dump a repr()-based trace to JSONL. That works for one run. For a real tuning loop — tweak prompt, re-run, compare — stdout becomes a blur and a flat repr trace isn't introspectable. This lab turns the SDK's message stream into a browsable HTML timeline: every run captured, every tool call inspectable, every cost line visible at a glance. About 25 minutes.

How it works

The Agent SDK's query() yields a typed message stream — SystemMessage, AssistantMessage, UserMessage (tool results), ResultMessage. We tap that stream once at the top of the agent loop, serialize each message to a structured dict (not repr), and append it to traces/<timestamp>.jsonl. Then a ~70-line FastAPI app reads those files and renders them as a timeline. Re-run the agent: new trace appears. The viewer polls every 1.5s so you can tail a run live.

You're building the visual debugger the SDK doesn't ship with — small enough to vendor into any repo, transparent enough to extend (filtering, search, side-by-side run comparison).

Prerequisites

Lab 2 complete — you have ucc_mcp.py and ucc_agent.py under ~/cc-labs/ucc-agent/, and the PublicRecords API running on port 8080.
The Python environment from Lab 2 with claude-agent-sdk installed.
One extra dep for the viewer:

pip install fastapi uvicorn

Step 1 — Capture the message stream as structured JSONL

The trace_to_file.py snippet from the Debug section uses repr(msg)[:2000] — great for grep, terrible for a UI. We want each message rendered as a dict so the viewer can introspect content blocks, tool inputs, and costs individually.

Python New module

~/cc-labs/ucc-agent/trace_recorder.py

"""Capture an Agent SDK message stream to structured JSONL for the viewer."""
import json, time
from pathlib import Path
from dataclasses import asdict, is_dataclass

TRACE_DIR = Path("traces")
TRACE_DIR.mkdir(exist_ok=True)


def _serialize(obj):
    """Recursively convert SDK message objects to JSON-safe dicts."""
    if is_dataclass(obj):
        return _serialize(asdict(obj))
    if hasattr(obj, "__dict__"):
        return {k: _serialize(v) for k, v in vars(obj).items()
                if not k.startswith("_")}
    if isinstance(obj, list):
        return [_serialize(x) for x in obj]
    if isinstance(obj, dict):
        return {k: _serialize(v) for k, v in obj.items()}
    return obj  # primitives


class TraceRecorder:
    def __init__(self, label: str = "run"):
        self.path = TRACE_DIR / f"{int(time.time())}-{label}.jsonl"
        self._f = self.path.open("w", encoding="utf-8")

    def record(self, msg):
        entry = {
            "ts": time.time(),
            "type": type(msg).__name__,
            "data": _serialize(msg),
        }
        self._f.write(json.dumps(entry, default=str) + "\n")
        self._f.flush()  # so the viewer can tail live runs

    def close(self):
        self._f.close()
        return self.path

Now tap the loop in ucc_agent.py:

Python Modify driver

~/cc-labs/ucc-agent/ucc_agent.py

from trace_recorder import TraceRecorder

async def main():
    recorder = TraceRecorder(label="ucc-tx-query")
    try:
        async for msg in query(
            prompt="List the first 3 UCC filings in Texas with their debtors.",
            options=options,
        ):
            recorder.record(msg)   # <-- the only new line
            # ... your existing print / stream logic stays the same ...
    finally:
        path = recorder.close()
        print(f"\nTrace written: {path}")

Re-run the agent. A new file appears under traces/:

python ucc_agent.py && ls traces/

1715520138-ucc-tx-query.jsonl

Why JSONL over a database?

Append-only, crash-safe, diffable in git, greppable from the terminal, and the viewer can tail the file during a live run instead of waiting for completion. Teams trade up to SQLite or DuckDB only when they need cross-run analytics — for “what did my agent just do?”, JSONL is the right shape.

Step 2 — Spin up the viewer (FastAPI + a single HTML page)

Two endpoints (/ for the UI, /api/traces + /api/trace/<name> for data), one templated page. The viewer reads traces/ on every request — no DB, no build step.

Python Viewer server

~/cc-labs/ucc-agent/trace_viewer.py

"""Render captured agent traces as an HTML timeline."""
import json
from pathlib import Path
from fastapi import FastAPI, HTTPException
from fastapi.responses import HTMLResponse, JSONResponse

app = FastAPI()
TRACE_DIR = Path("traces")


@app.get("/api/traces")
def list_traces():
    files = sorted(TRACE_DIR.glob("*.jsonl"), reverse=True)
    return [{"name": f.name, "size": f.stat().st_size} for f in files]


@app.get("/api/trace/{name}")
def read_trace(name: str):
    if ".." in name or "/" in name:
        raise HTTPException(400, "invalid name")
    p = TRACE_DIR / name
    if not p.exists():
        raise HTTPException(404)
    entries = [json.loads(line) for line in p.read_text().splitlines() if line.strip()]
    return JSONResponse(entries)


@app.get("/", response_class=HTMLResponse)
def index():
    return Path(__file__).parent.joinpath("viewer.html").read_text()

And the page that does the rendering — one self-contained file, zero build step:

HTML Viewer UI

~/cc-labs/ucc-agent/viewer.html

<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Agent Trace Viewer</title>
<style>
  body{font:14px/1.5 system-ui;background:#0f172a;color:#e2e8f0;margin:0;display:flex;}
  aside{width:260px;background:#1e293b;height:100vh;overflow-y:auto;padding:1rem;}
  aside a{display:block;padding:0.4rem;color:#94a3b8;text-decoration:none;border-radius:4px;font-size:0.82rem;}
  aside a:hover, aside a.active{background:#334155;color:#fff;}
  main{flex:1;padding:1.5rem 2rem;overflow-y:auto;height:100vh;}
  .msg{border-left:3px solid #475569;padding:0.6rem 1rem;margin:0.5rem 0;background:#1e293b;border-radius:0 6px 6px 0;}
  .msg.SystemMessage{border-color:#64748b;}
  .msg.AssistantMessage{border-color:#60a5fa;}
  .msg.UserMessage{border-color:#a78bfa;}
  .msg.ResultMessage{border-color:#34d399;background:#022c22;}
  .type{font-size:0.7rem;text-transform:uppercase;letter-spacing:0.08em;color:#94a3b8;font-weight:600;}
  .ts{float:right;color:#64748b;font-size:0.75rem;}
  pre{background:#0f172a;padding:0.6rem;border-radius:4px;overflow-x:auto;font-size:0.78rem;margin:0.4rem 0 0;color:#cbd5e1;}
  .tool{color:#fbbf24;font-weight:600;}
  .cost{color:#34d399;font-weight:600;}
  .banner{padding:0.8rem 1rem;border-radius:6px;margin-bottom:1rem;}
</style></head><body>
<aside id="list"><h3 style="margin-top:0;font-size:0.85rem;color:#94a3b8;">Runs</h3></aside>
<main id="trace"><p style="color:#64748b">Pick a run on the left.</p></main>
<script>
const esc = s => String(s).replace(/[&<>]/g, c => ({'&':'&amp;','<':'&lt;','>':'&gt;'}[c]));
async function loadList(){
  const r = await fetch('/api/traces'); const files = await r.json();
  document.getElementById('list').innerHTML +=
    files.map(f => `<a href="#" data-name="${esc(f.name)}">${esc(f.name)}</a>`).join('');
  document.querySelectorAll('aside a').forEach(a => a.onclick = ev => {
    ev.preventDefault();
    document.querySelectorAll('aside a').forEach(x => x.classList.remove('active'));
    a.classList.add('active');
    render(a.dataset.name);
  });
}
function summarize(e){
  const d = e.data;
  if(e.type === 'AssistantMessage' && d.content){
    return d.content.map(c => {
      if(c.type === 'text') return `<div>${esc(c.text || '')}</div>`;
      if(c.type === 'tool_use')
        return `<div class="tool">🔧 ${esc(c.name)}</div><pre>${esc(JSON.stringify(c.input, null, 2))}</pre>`;
      if(c.type === 'thinking')
        return `<em style="color:#94a3b8">thinking: ${esc((c.thinking || '').slice(0,200))}...</em>`;
      return `<pre>${esc(JSON.stringify(c, null, 2))}</pre>`;
    }).join('');
  }
  if(e.type === 'UserMessage' && d.content){
    return d.content.map(c => c.type === 'tool_result'
      ? `<div class="tool">↩ tool_result</div><pre>${esc((typeof c.content === 'string' ? c.content : JSON.stringify(c.content, null, 2)).slice(0,400))}</pre>`
      : `<pre>${esc(JSON.stringify(c, null, 2))}</pre>`
    ).join('');
  }
  if(e.type === 'ResultMessage'){
    return `<div>turns: ${d.num_turns || '?'} · duration: ${((d.duration_ms || 0)/1000).toFixed(2)}s · <span class="cost">cost: $${(d.total_cost_usd || 0).toFixed(4)}</span></div>`;
  }
  return `<pre>${esc(JSON.stringify(d, null, 2).slice(0,300))}</pre>`;
}
async function render(name){
  const r = await fetch('/api/trace/' + encodeURIComponent(name));
  const entries = await r.json();
  const result = entries.find(e => e.type === 'ResultMessage');
  const banner = result
    ? `<div class="banner" style="background:#022c22">✅ ${result.data.num_turns || '?'} turns · ${((result.data.duration_ms || 0)/1000).toFixed(2)}s · <span class="cost">$${(result.data.total_cost_usd || 0).toFixed(4)}</span></div>`
    : `<div class="banner" style="background:#451a03;color:#fbbf24">⏳ Run in progress...</div>`;
  const body = entries.map(e => {
    const t = new Date(e.ts * 1000).toLocaleTimeString();
    return `<div class="msg ${e.type}"><span class="ts">${t}</span><div class="type">${e.type}</div>${summarize(e)}</div>`;
  }).join('');
  document.getElementById('trace').innerHTML = banner + body;
}
loadList();
// Live tail: re-render the active trace every 1.5s
setInterval(() => { const a = document.querySelector('aside a.active'); if(a) render(a.dataset.name); }, 1500);
</script></body></html>

Step 3 — Run it

uvicorn trace_viewer:app --reload --port 7000

Open http://localhost:7000. The sidebar lists every JSONL file under traces/; click one to render the timeline. Each message is color-coded by type:

Gray (SystemMessage) — init: model, MCP server load, available tools.
Blue (AssistantMessage) — Claude's text + tool_use blocks. Tool calls render as 🔧 with their input args expanded.
Purple (UserMessage) — tool_result blocks. The response your MCP tool returned, truncated to 400 chars.
Green (ResultMessage) — turns, duration, total cost. The receipt at the end of the run, also rolled up into the banner at the top.

Step 4 — Watch a live run

Open the viewer in one window. In another terminal, re-run the agent against a different prompt:

python ucc_agent.py

Refresh the viewer once — the new file appears in the sidebar. Click it; the viewer's setInterval polls every 1.5s and the recorder flush()es after every message, so the timeline streams in during the run. This is the inner-loop debug workflow: tweak prompt → re-run → watch tool selection in real time without scrolling stdout.

What Just Happened

You've replaced “tail the stdout” with “watch a structured timeline.” The same agent code runs unchanged — the recorder is a passive observer of the message stream. Anything the SDK emits, you can see. Anything Claude decided, you can trace.

Step 5 — Inspect a tool-selection bug end-to-end

Try this prompt against your agent: “How many filings are there in TX?”. Watch the timeline.

The first AssistantMessage shows Claude calling mcp__ucc__lookup_filings with state="TX" — expand the tool_use block.
The UserMessage that follows is the tool_result — the actual JSON your MCP server returned. Look at the shape.
The next AssistantMessage is Claude reasoning over that result. Did it count correctly? Did it cite a filing ID it shouldn't have?
The ResultMessage shows total cost. If a single Texas query cost $0.04, you're paying for too many turns — tighten the system prompt or add a count_filings tool to skip the listing step.

This is the debugging loop that text-streamed stdout makes painful and a UI makes obvious.

Step 6 — Extend: side-by-side run comparison

Use case: you changed the system prompt and want to see whether tool selection changed. Add a URL parameter to render two runs in columns:

http://localhost:7000/?a=1715520138-tx-query.jsonl&b=1715520245-tx-query.jsonl

This is left as an extension (the viewer is intentionally small). Reading new URLSearchParams, fetching both traces, and rendering into display:grid with two columns is ~25 lines of JS. The real payoff: catching prompt-change regressions before they reach your eval suite (CC11).

Step 7 — The production-grade path

For teams beyond one developer, swap the local viewer for a hosted observability platform. The TraceRecorder abstraction stays — you change only the sink:

Tool	How you'd wire it	Pick when
Langfuse (self-host or cloud)	Send each message as an `observation` via their Python SDK; one agent run = one trace.	You want hosted UI, multi-user access, eval scoring overlays.
OpenTelemetry + Honeycomb / Datadog / Jaeger	Wrap the loop in a span; emit one child span per tool call with input/output as attributes.	You already have OTel in your stack and want one pane of glass for app + agent.
This viewer + S3	Upload JSONL to S3 on run completion; viewer reads from S3 instead of local disk.	You want zero vendor lock-in and team-wide trace history.

For most teams the JSONL viewer is enough for months — ship it, learn what you actually need, then graduate.

Same security rules apply

Tool inputs may contain user PII; tool results often contain DB rows. Don't write trace files into directories that ship with logs to third parties without redaction. Apply the same masking rules the “One thing not to do” box in the Debug section called out — the viewer is convenient, which makes accidental PII exposure easier.

Lab complete — what you built

A self-hosted, zero-build, real-time visual debugger for Agent SDK runs. Recorder (~30 lines) hooks the message stream and writes structured JSONL. Viewer (1 FastAPI file + 1 HTML file, ~140 lines total) renders the JSONL as a color-coded timeline with tool-call inspection, cost roll-up, and a 1.5s live tail. The recorder works against any SDK agent; the viewer renders any trace. You've turned the opaque async iterator into a tool you can pair-debug agents with.

Knowledge Check

1. Your team needs a nightly cron job that audits a Postgres warehouse for new UCC filings and posts to Slack. Which Claude surface?

A

Claude Code CLI — just claude -p "audit and post to Slack" in a cron entry.

B

Claude Agent SDK — a Python program with Postgres + Slack MCP servers and the audit logic in code.

C

Anthropic Messages API — you'll write the tool loop and dispatch from scratch.

Correct. Background jobs that need tools, MCP, and an agent loop — without a terminal — are the SDK's sweet spot. CLI works in a pinch but you lose programmatic control over the loop, retries, and observability.

Look again. The cron job is unattended — no terminal. You want the SDK's programmatic control + ergonomic agent loop, not the CLI's interactive surface or the bare Messages API's hand-rolled loop.

2. In the lab, what's the role of `create_sdk_mcp_server` / `createSdkMcpServer`?

A

It launches the Claude Code interactive UI in a child process.

B

It packages your tool definitions into an in-process MCP server the SDK can load via mcp_servers.

C

It registers the agent with the Anthropic API so it can be invoked remotely.

Correct. The function returns an MCP server you wire into options.mcp_servers; the SDK runs it in-process, so there's no separate subprocess to manage.

Look again. It's the bridge between your @tool-decorated functions and the SDK's MCP loader. No remote registration, no UI — just an in-process server.

3. `allowed_tools=["mcpucclookup_filings", "mcpuccget_filing"]`. Why this list and not `None` (allow everything)?

A

The SDK requires it — agents won't run without an explicit list.

B

Defense in depth. With no list, the agent could call Bash/Edit/Write — capabilities a UCC lookup agent doesn't need and shouldn't have.

C

It's required for the MCP server to load correctly.

Correct. Same principle as CC4's permission tiers and CC6's tool-scoped subagents: an agent should be able to do its job and nothing else. A read-only data agent has no business running shell commands or editing files.

Look again. The SDK runs fine without an allowed_tools list. You write one to limit the agent's surface area — the same least-privilege principle that drove CC4's deny rules and CC6's read-only auditor subagent.

4. The Agent SDK and the Anthropic Messages API both let you build agents. What does the SDK do for you that the Messages API doesn't?

A

It uses a different model with better tool use.

B

It hosts your code in the cloud so you don't run a server.

C

It runs the agent loop (tool dispatch, multi-turn message accumulation, MCP integration, file system tools) so you don't hand-roll it.

Correct. Same model under the hood. The SDK's value is the agent scaffolding — the tool loop, MCP integration, built-in Read/Write/Bash, session handling. Pick Messages API only when you need full custom control.

Look again. Same model, same network, your code still runs locally. The SDK's job is to handle the agent loop and tool plumbing for you.

5. You wrote the `ucc_mcp.py` file once. Where can it be reused?

A

Only inside the Agent SDK driver from Step 6.

B

Only inside Claude Code as a project MCP server.

C

Both: the same MCP server can power your standalone agent (SDK driver) AND a Claude Code subagent invoked at the terminal.

Correct. MCP is the portability layer. One server definition, many consumers — SDK driver, CLI subagent, third-party MCP clients. Write tools once, reuse everywhere.

Look again. MCP is deliberately tool-agnostic about what consumes it. The Step 8 stretch shows the same ucc_mcp running inside a Claude Code subagent — same code, different delivery channel.

Summary — You've Reached the End of the Track

Sixteen modules and you've used Claude across all three programmer-facing surfaces:

Claude Code CLI (CC0–CC14) — interactive pair-programming, slash commands, subagents, hooks, MCP, headless CI.
Claude Agent SDK (CC15) — programmatic agent loop with the same primitives.
Anthropic Messages API (under both) — the foundation when you need maximum control over the loop.

Same project, same domain, same Java/Spring backend. The lessons compose: the CLAUDE.md from CC3 ships with your repo and is read by every Claude surface that opens that repo. The MCP server from CC9 powers both the CLI and your custom Agent SDK driver. The pii-auditor subagent from CC6 runs in the GitHub Action from CC14 and can also be invoked from a standalone SDK program. You write the tools once and use them everywhere.

Where to go next

Build a tiny in-house agent for your team's actual data — replace UCC filings with whatever domain you work in.
Add Postgres, GitHub, or Slack MCP servers to your SDK driver to handle multi-system workflows.
Wire your SDK agent into your CI alongside the headless CLI from CC14 — the SDK's programmatic control opens up workflows the CLI can't express cleanly.

CC15: Building Custom Agents with the Claude Agent SDK

Learning Objectives

Why the Agent SDK?

SDK vs Claude Code CLI vs Anthropic Messages API

Core Concepts

The agent loop, done for you

Tools come from three places

The control surfaces you've already learned

SDK Features — What You Can Configure

Two entry points

Built-in tools

Custom tools (two routes)

Permissions & runtime gating

Hooks (lifecycle events)

Sessions & conversation

System prompt & settings sources

Knobs you'll reach for often

Output: what you receive in the message stream

Debugging & Observability — What's My Agent Actually Doing?

The four message types in the stream

An instrumented agent loop you can copy

The five debugging scenarios you'll hit

Persist traces for after-the-fact debugging

Hooks for pre-emptive debugging

Reach for the inspector subagent

Lab 1 — Your First Agent (5 minutes, no MCP)

Step 1 — Make a working directory and set your API key

Step 2 — Install the Agent SDK

Step 3 — Drop a couple of files for Claude to look at

Step 4 — Write the agent

Step 5 — Run it

Step 6 — Watch the tool calls

Step 7 — Try a destructive prompt and watch the deny

Lab 2 — Build a UCC Filings Assistant Agent

Prerequisites

Step 1 — Boot the PublicRecords API in one terminal

Step 2 — Set your API key

Step 3 — Make a working directory for the agent

Step 4 — Install the Agent SDK

Step 5 — Write the MCP server (custom tools)

Step 6 — Write the agent driver

Step 7 — Run the agent

Step 8 — Stretch: connect this same agent to Claude Code as a subagent

Stretch — Add a debtor-name search tool

Lab 3 — Build a Visual Debugging UI for Your Agent Runs

Prerequisites

Step 1 — Capture the message stream as structured JSONL

Step 2 — Spin up the viewer (FastAPI + a single HTML page)

Step 3 — Run it

Step 4 — Watch a live run

Step 5 — Inspect a tool-selection bug end-to-end

Step 6 — Extend: side-by-side run comparison

Step 7 — The production-grade path

Knowledge Check

1. Your team needs a nightly cron job that audits a Postgres warehouse for new UCC filings and posts to Slack. Which Claude surface?

2. In the lab, what's the role of create_sdk_mcp_server / createSdkMcpServer?

3. allowed_tools=["mcp__ucc__lookup_filings", "mcp__ucc__get_filing"]. Why this list and not None (allow everything)?

4. The Agent SDK and the Anthropic Messages API both let you build agents. What does the SDK do for you that the Messages API doesn't?

5. You wrote the ucc_mcp.py file once. Where can it be reused?

Summary — You've Reached the End of the Track

Where to go next

2. In the lab, what's the role of `create_sdk_mcp_server` / `createSdkMcpServer`?

3. `allowed_tools=["mcpucclookup_filings", "mcpuccget_filing"]`. Why this list and not `None` (allow everything)?

5. You wrote the `ucc_mcp.py` file once. Where can it be reused?