CC15: Building Custom Agents with the Claude Agent SDK
The CC track has been about using Claude Code at the terminal. This final module flips it: you'll build a programmatic agent of your own using the Claude Agent SDK — the same engine that powers Claude Code, exposed as a Python and TypeScript library. By the end you'll have a UCC Filings Assistant agent that wraps the PublicRecords API from CC0 with custom tools.
Learning Objectives
- Understand what the Claude Agent SDK is and how it relates to Claude Code, the Anthropic SDK, and MCP.
- Decide when to reach for the Agent SDK vs the Claude Code CLI vs the raw Anthropic Messages API.
- Set up the Agent SDK in Python and TypeScript.
- Wrap an existing REST API as custom tools the agent can call (via an in-process MCP server).
- Run a complete agent loop — tools, context, and turn limits — against a real backend.
Why the Agent SDK?
You've spent fifteen modules at the terminal driving Claude Code interactively. That works for software engineering tasks where a human is in the loop. But most "agent" use cases are programs that nobody is interactively typing into — a GitHub Action that reviews PRs, a Slack bot that triages support tickets, a nightly job that audits a codebase, a customer-facing chat surface that handles UCC filing questions.
For those, you want Claude Code's capabilities — tool use, file system access, MCP servers, subagents, hooks, sessions — without the interactive terminal. That's the Claude Agent SDK.
The Agent SDK is the same engine as Claude Code, packaged as a library. pip install claude-agent-sdk (Python) or npm install @anthropic-ai/claude-agent-sdk (TypeScript). You configure an agent with options (system prompt, allowed tools, MCP servers, max turns), call query() with a prompt, and stream messages back. Tool use, file edits, subprocess calls, MCP — all the same primitives you've been using interactively, now under your code's control.
SDK vs Claude Code CLI vs Anthropic Messages API
Three different surfaces. Pick by what you're building.
| Surface | Use when | Example |
|---|---|---|
| Claude Code CLI | A human is at the terminal pair-programming. Interactive sessions, slash commands, file diff review. | claude at your terminal — everything CC0–CC14. |
| Claude Agent SDK | A program needs Claude Code's capabilities (tools, file system, MCP, subagents) without the terminal. Background jobs, scheduled tasks, custom apps with agentic logic. | A nightly compliance bot that scans a Postgres warehouse for new UCC filings and posts to Slack. |
| Anthropic Messages API | You want Claude's intelligence but you're managing the agent loop yourself — custom tool execution, custom retry logic, custom token streaming. Maximum control, more code. | A real-time chat UI where you've already got your own tool-call orchestration and just want Claude as the model. |
The Agent SDK sits between the CLI (most opinionated, easiest) and the Messages API (most flexible, most code). For most "I want an agent that does X" problems, the SDK is the right starting point.
Core Concepts
The agent loop, done for you
The Messages API gives you one round-trip: send messages, get a response. If the response wants to call a tool, you write the code that runs the tool, packs the result into the next message, and re-sends. The Agent SDK runs that loop for you: you call query() (or use a ClaudeSDKClient), it returns a stream of messages (assistant text, tool calls, tool results, thinking) and stops when Claude is done or hits max_turns.
Tools come from three places
- Built-in: Read, Write, Edit, Bash, Grep, Glob, Task, WebFetch, WebSearch, NotebookEdit — the same tools Claude Code has. Toggle individually via
allowed_tools/disallowed_tools. - MCP servers: same Model Context Protocol you used in CC9. Define a small server (in-process via
create_sdk_mcp_server, or external stdio/HTTP/SSE), then point the SDK at it viamcp_servers. - Subagents:
.claude/agents/*.mdfiles from CC6, OR programmatically via theagentsconfig option. The SDK delegates via theTasktool, same as the CLI.
The control surfaces you've already learned
system_prompt, allowed_tools, permission_mode, max_turns, mcp_servers, cwd — same vocabulary as Claude Code. If you understood the CLI flags from CC4 (permissions) and CC14 (headless mode), you already understand the SDK options.
SDK Features — What You Can Configure
Everything below lives on the ClaudeAgentOptions object (Python) or the options field of query() (TypeScript). This is the SDK's full configuration surface. Most agents use 3–5 of these; the rest are there when you need them.
Two entry points
| API | Use when | Looks like |
|---|---|---|
query(prompt, options) | One-shot or single-turn streaming runs (CI jobs, scripts, batch processing) | Async iterator over messages; stops when Claude is done |
ClaudeSDKClient(options) | Multi-turn conversations, persistent sessions, follow-up questions | Class with query() + receive_response(); survives across calls |
Built-in tools
| Tool | What it does | Notes |
|---|---|---|
Read | Read a file from disk | Used in Lab 1 |
Write | Create / overwrite a file | Gated by permission_mode |
Edit | String-replace inside a file | Most common write surface |
Bash | Execute a shell command | Highest-blast-radius tool — restrict via deny rules |
Grep | Ripgrep across the repo | Fast, structured |
Glob | File-name pattern matching | Used in Lab 1 |
Task | Delegate to a subagent | The handoff mechanism for agents |
WebFetch / WebSearch | Pull a URL, search the web | Off by default in some configs |
NotebookEdit | Edit Jupyter .ipynb cells | Specialized; rarely needed outside data work |
Restrict via allowed_tools=[...] (whitelist) or disallowed_tools=[...] (deny specific ones). MCP tools follow the naming pattern mcp__<serverName>__<toolName>.
Custom tools (two routes)
| Route | When |
|---|---|
In-process MCP via @tool decorator + create_sdk_mcp_server | Tools written in the same language as your driver. Lab 2 uses this. |
| External MCP (stdio, HTTP, SSE) | Tools provided by another team / language / vendor. Same protocol the Postgres MCP from CC9 uses. |
Permissions & runtime gating
permission_mode:"default"(prompt for risky calls) /"acceptEdits"(auto-approve file edits) /"plan"(read-only planning) /"bypassPermissions"(no prompts — CI mode).can_use_toolcallback: a Python/TS function the SDK invokes per tool call. Return{"behavior": "allow"}or{"behavior": "deny", "message": "..."}. Use this for runtime decisions that depend on tool args (e.g., denyBashcalls that touch~/.ssh).
Hooks (lifecycle events)
Same model as CC7's CLI hooks, but the handlers are Python/TS callables instead of shell commands. Configure via the hooks dict:
| Event | Fires when | Common uses |
|---|---|---|
PreToolUse | Before a tool call dispatches | Block dangerous calls; rewrite arguments; log |
PostToolUse | After a tool call returns | Format files, run validators, audit |
UserPromptSubmit | When a user prompt is received | Inject context, enforce input filters |
Stop / SubagentStop | When the (sub)agent finishes | Persist results, notify, cleanup |
PreCompact | Before context compaction | Save important context to memory |
Sessions & conversation
- Multi-turn: use
ClaudeSDKClient; each.query()continues the same conversation. - Resume: pass
resume=<session_id>in options to pick up a previous session by id. - Continue:
continue_conversation=Trueresumes the most recent session in the working directory.
System prompt & settings sources
system_prompt: replace the default agent system prompt entirely.append_system_prompt: keep the default and add your own instructions on top.setting_sources: loadCLAUDE.md,.claude/settings.json, slash commands, and subagents from disk — the same files CC3, CC4, CC5, CC6 created. Lets you build an SDK driver that respects everything the team has already configured for the CLI.
Knobs you'll reach for often
| Option | Purpose |
|---|---|
max_turns | Cap on agent loop iterations — safety against runaway tool calls |
model | Pick Opus / Sonnet / Haiku for cost/latency trade-offs |
cwd | Working directory the agent's tools operate from |
env | Environment variables passed to spawned subprocesses (Bash, MCP servers) |
max_thinking_tokens | Budget for extended thinking (reasoning models) |
add_dirs | Extra directories the agent's file tools may access (beyond cwd) |
Output: what you receive in the message stream
Each iteration of the async stream yields one of:
SystemMessage— session metadata at startAssistantMessage— Claude's response, containingTextBlock/ToolUseBlock/ThinkingBlockUserMessage— tool-result messages the SDK feeds back to Claude (visible to you for logging)ResultMessage— final summary with cost, duration, success/failure
Lab 1 uses: query(), built-in Read + Glob, system_prompt, allowed_tools, max_turns, permission_mode, async message iteration with TextBlock + ToolUseBlock.
Lab 2 adds: in-process MCP server with @tool + create_sdk_mcp_server, the mcp_servers option, MCP tool naming (mcp__ucc__lookup_filings), and the bridge from SDK driver to a Claude Code subagent definition.
What's not exercised in the labs but listed here for completeness: can_use_tool runtime gating, hooks, multi-turn ClaudeSDKClient, session resume, settings sources. These are the right next features to reach for once you have a working agent.
Debugging & Observability — What's My Agent Actually Doing?
An agent loop has more moving parts than a normal program: Claude picks a tool, the SDK dispatches it, the tool returns a result, Claude reads the result, picks another tool or answers. When something looks wrong — the agent loops forever, picks the wrong tool, ignores instructions, or burns way too many tokens — you need visibility into every step of the loop. The SDK gives you that visibility for free; you just have to print it.
The four message types in the stream
Every iteration of the SDK's async stream yields exactly one of these. Inspecting them is the foundation of debugging any agent.
| Message type | What it tells you | Inside |
|---|---|---|
SystemMessage | Session metadata at start — session id, model, tool inventory, working dir | One per session. Useful for confirming setup loaded right. |
AssistantMessage | Claude's turn: text it's saying, tools it's calling, thinking it's doing | TextBlock, ToolUseBlock, ThinkingBlock |
UserMessage | Tool results the SDK fed back to Claude after a tool call | ToolResultBlock |
ResultMessage | Final summary at end — cost, duration, turn count, success flag | duration_ms, num_turns, total_cost_usd, is_error, session_id |
An instrumented agent loop you can copy
Drop this in place of the simple print(block.text) from Lab 1 and you'll see everything the agent does — what tools it calls, with what args, what came back, how much it cost. This is the single most useful 30 lines for debugging any SDK program.
"""Run any agent with full message-stream visibility."""
import asyncio
import json
from claude_agent_sdk import (
query, ClaudeAgentOptions,
SystemMessage, AssistantMessage, UserMessage, ResultMessage,
TextBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock,
)
async def run_agent(prompt: str, options: ClaudeAgentOptions) -> None:
print(f"\n>>> PROMPT: {prompt}\n")
async for msg in query(prompt=prompt, options=options):
if isinstance(msg, SystemMessage):
print(f"[system] session={msg.data.get('session_id')[:8]}... "
f"model={msg.data.get('model')} "
f"tools={len(msg.data.get('tools', []))}")
elif isinstance(msg, AssistantMessage):
for block in msg.content:
if isinstance(block, TextBlock):
print(f"[claude] {block.text}")
elif isinstance(block, ToolUseBlock):
args = json.dumps(block.input, indent=None)[:120]
print(f"[tool-call] {block.name}({args})")
elif isinstance(block, ThinkingBlock):
print(f"[thinking] {block.thinking[:200]}...")
elif isinstance(msg, UserMessage):
for block in msg.content:
if isinstance(block, ToolResultBlock):
body = str(block.content)[:200]
status = "ERROR" if block.is_error else "ok"
print(f"[tool-result {status}] {body}")
elif isinstance(msg, ResultMessage):
print(f"\n[done] turns={msg.num_turns} "
f"duration={msg.duration_ms}ms "
f"cost=${msg.total_cost_usd:.4f} "
f"error={msg.is_error}")
if __name__ == "__main__":
options = ClaudeAgentOptions(
allowed_tools=["Read", "Glob"],
max_turns=5,
permission_mode="bypassPermissions",
)
asyncio.run(run_agent("Summarize files in this dir.", options))
Sample output when you run it:
>>> PROMPT: Summarize files in this dir.
[system] session=a3f2c8d1... model=claude-opus-4-7 tools=14
[tool-call] Glob({"pattern": "*"})
[tool-result ok] README.md
deps.txt
first_agent.py
[tool-call] Read({"file_path": "/Users/me/cc-labs/first-agent/README.md"})
[tool-result ok] # My first agent project
[tool-call] Read({"file_path": "/Users/me/cc-labs/first-agent/deps.txt"})
[tool-result ok] claude-agent-sdk
[claude] This directory contains a starter project using the Claude Agent SDK.
The README labels it "My first agent project" and deps.txt declares one
dependency: claude-agent-sdk.
[done] turns=4 duration=3287ms cost=$0.0093 error=false
Now you know exactly what happened: 1 Glob to discover, 2 Reads to inspect, 1 final answer, 4 turns total, ~3 seconds, under a cent.
The five debugging scenarios you'll hit
| Symptom | What to inspect | Common fix |
|---|---|---|
| Agent loops forever | Tool-call args repeat with the same parameters; num_turns climbs without progress. |
Set or lower max_turns. Then look at the loop — usually the tool result doesn't have what Claude wanted, and Claude keeps retrying. Improve the tool's error messages, or revise the system prompt to tell Claude what to do when the tool returns a specific failure. |
| Agent calls the wrong tool | Tool description and tool name. Claude picks tools by description. | Rewrite the tool's description to explicitly say what it's for AND what it's NOT for. Example: "Search filings by state. Use this for state-level queries; do NOT use for debtor-name searches — use search_debtor for that." |
| Agent ignores your system prompt | Whether you used system_prompt (replaces) vs append_system_prompt (adds to default). |
Use append_system_prompt if you want Claude Code's default agent behavior plus your additions; system_prompt if you want a clean slate. Mixing them up is the #1 cause of "Claude isn't following my rules." |
| Tool fails silently | Look for UserMessage with ToolResultBlock.is_error == True. |
Wrap your tool body in try/except and return {"content": [{"type": "text", "text": f"error: {e}"}], "is_error": True}. Claude reads the error and can decide whether to retry or give up. |
| Cost too high | ResultMessage.total_cost_usd — track per session; aggregate across runs. |
Set model="claude-haiku-4-5-20251001" for cheap iterations during development, swap to Sonnet/Opus for prod. Consider whether some subagent calls can use Haiku. |
Persist traces for after-the-fact debugging
For agents running in CI or scheduled jobs, you can't watch stdout live. Write each message to a JSON Lines file so you can replay the trace later:
import json, time
from pathlib import Path
from claude_agent_sdk import query, ClaudeAgentOptions
trace_file = Path(f"trace-{int(time.time())}.jsonl")
async def traced(prompt, options):
with trace_file.open("w") as f:
async for msg in query(prompt=prompt, options=options):
# SDK message types are dataclasses with .__dict__-friendly contents.
entry = {"ts": time.time(), "type": type(msg).__name__,
"raw": repr(msg)[:2000]}
f.write(json.dumps(entry) + "\n")
f.flush() # crash-safe
print(f"trace written to {trace_file}")
The flush() call matters — if your agent crashes mid-loop, you still have everything up to the failure on disk.
Hooks for pre-emptive debugging
The features section listed PreToolUse and PostToolUse hooks. They're not just for production policy — they're great for debugging. A PreToolUse hook that logs every tool's arguments to a file gives you a permanent audit trail without changing your agent code:
from claude_agent_sdk import HookMatcher, ClaudeAgentOptions
async def log_tool_call(input_data, tool_use_id, context):
print(f"[hook] {input_data['tool_name']} -> {input_data['tool_input']}")
return {} # empty dict = allow, no modifications
options = ClaudeAgentOptions(
hooks={
"PreToolUse": [HookMatcher(matcher="*", hooks=[log_tool_call])],
},
)
You now have three ways to see what your agent is doing: instrumented stream loop (synchronous, in-process), JSONL trace (async, persistent), and hooks (declarative, intercept-style). Use whichever fits your situation.
Reach for the inspector subagent
For complex multi-step debugging, delegate the trace analysis to Claude itself. Once you've persisted a JSONL trace, you can ask: "@trace-inspector load trace-1234.jsonl and tell me where the loop stalled." — using a Claude Code subagent like the one from CC6 but pointed at trace files instead of source code. Same pattern: a small agent with a tight tool surface (Read + Grep), reasoning over your agent's recorded behavior.
Don't print message content in production logs without redacting. Tool inputs may contain user PII; tool results often contain DB rows. The same security rules from CC3's CLAUDE.md (never log raw SSN/EIN) apply to your trace files. Use Pii.mask() before writing, or write traces to a separate restricted log stream.
Lab 1 — Your First Agent (5 minutes, no MCP)
Before we wire in custom tools and a backend, build the smallest possible agent: install the SDK, write fewer than 20 lines, run it, see Claude pick up a built-in tool and answer a question about your filesystem. This is the "hello world" of the Agent SDK — everything that follows is configuration on top of this same shape.
Step 1 — Make a working directory and set your API key
mkdir -p ~/cc-labs/first-agent && cd ~/cc-labs/first-agentexport ANTHROPIC_API_KEY=sk-ant-...Windows PowerShell: $env:ANTHROPIC_API_KEY = "sk-ant-..."
Step 2 — Install the Agent SDK
Pick Python or TypeScript — both flows work for the rest of this lab.
pip install claude-agent-sdkOr:
npm init -y && npm install @anthropic-ai/claude-agent-sdk && npm install -D tsx typescriptStep 3 — Drop a couple of files for Claude to look at
So the agent has something concrete to read:
echo "# My first agent project" > README.md && echo "claude-agent-sdk" > deps.txtStep 4 — Write the agent
query() is the SDK's one-shot agent runner: pass a prompt + options, get back an async stream of messages. The SDK runs the full agent loop internally — if Claude wants to call a tool (here: Read on a file), the SDK executes the tool, feeds the result back, and continues the conversation until Claude is done or you hit max_turns. allowed_tools=["Read"] means this agent can read files but cannot Bash, Edit, or Write — least privilege from the start.
Python:
"""Your first Claude Agent SDK program."""
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AssistantMessage, TextBlock
async def main() -> None:
options = ClaudeAgentOptions(
system_prompt="You are a concise project explainer.",
allowed_tools=["Read", "Glob"],
max_turns=4,
permission_mode="bypassPermissions",
)
prompt = "Read the files in this directory and summarize the project in 2 sentences."
async for msg in query(prompt=prompt, options=options):
if isinstance(msg, AssistantMessage):
for block in msg.content:
if isinstance(block, TextBlock):
print(block.text, end="", flush=True)
print()
if __name__ == "__main__":
asyncio.run(main())
TypeScript:
// Your first Claude Agent SDK program.
import { query } from "@anthropic-ai/claude-agent-sdk";
const stream = query({
prompt: "Read the files in this directory and summarize the project in 2 sentences.",
options: {
systemPrompt: "You are a concise project explainer.",
allowedTools: ["Read", "Glob"],
maxTurns: 4,
permissionMode: "bypassPermissions",
},
});
for await (const msg of stream) {
if (msg.type === "assistant") {
for (const block of msg.message.content) {
if (block.type === "text") process.stdout.write(block.text);
}
}
}
process.stdout.write("\n");
Step 5 — Run it
Python:
python first_agent.pyTypeScript:
npx tsx first_agent.tsExpected output (will vary by run, but will be 2 sentences referencing your two files):
This project is a starter scaffold for working with the Claude Agent SDK,
described in README.md. Its only declared dependency is `claude-agent-sdk`,
listed in deps.txt.
Step 6 — Watch the tool calls
Add this just below the print(block.text...) line to also print tool calls as they happen (Python):
from claude_agent_sdk import ToolUseBlock
# add this branch alongside the TextBlock branch:
if isinstance(block, ToolUseBlock):
print(f"\n[tool: {block.name}({block.input})]\n", end="", flush=True)
Re-run. You'll now see the agent's reasoning trail: a Glob call to discover files, two Read calls (one per file), then the final summary text. That's the loop the SDK is running for you.
Step 7 — Try a destructive prompt and watch the deny
Change the prompt to "Run rm -rf . in this directory." Re-run. The agent will refuse — Bash is not in allowed_tools, so even if Claude wanted to comply, the SDK won't dispatch the call. This is your safety net; the same principle that drove CC4's permission tiers and CC6's read-only auditor.
You ran a complete agent — system prompt, tool dispatch, multi-turn message accumulation, turn limit — in less than 20 lines. The SDK gave you Claude Code's tool-using capability without the terminal. Lab 2 builds on this exact shape but swaps the built-in Read for two custom tools that hit a real REST API.
Lab 2 — Build a UCC Filings Assistant Agent
You'll build a programmatic agent that answers natural-language questions about UCC filings by calling the PublicRecords API from CC0. The agent uses two custom tools (exposed via a small MCP server) plus the SDK's built-in capabilities. Both Python and TypeScript paths are shown.
Prerequisites
- The PublicRecords API from CC0 running locally on
http://localhost:8080. - An Anthropic API key. Set it as
ANTHROPIC_API_KEYin your environment. - Python 3.10+ or Node.js 20+. (Both shown; pick one to follow.)
Step 1 — Boot the PublicRecords API in one terminal
If it isn't already running, in a terminal at the project root:
cd ~/cc-labs/publicrecords-api && mvn spring-boot:runVerify with:
curl -s http://localhost:8080/filings | head -c 200Should return JSON for the eight seeded filings.
Step 2 — Set your API key
export ANTHROPIC_API_KEY=sk-ant-...(Windows PowerShell: $env:ANTHROPIC_API_KEY = "sk-ant-...".) Persist it in your shell rc to avoid re-exporting every session.
Step 3 — Make a working directory for the agent
mkdir -p ~/cc-labs/ucc-agent && cd ~/cc-labs/ucc-agentStep 4 — Install the Agent SDK
Pick Python or TypeScript — the lab works for both.
pip install claude-agent-sdk httpxOr TypeScript:
npm init -y && npm install @anthropic-ai/claude-agent-sdk @modelcontextprotocol/sdk zodStep 5 — Write the MCP server (custom tools)
Custom domain tools live in MCP servers — the same protocol you used in CC9. The Agent SDK loads MCP servers as a config option; once loaded, their tools are addressable as mcp__<serverName>__<toolName>. Here we expose two: lookup_filings (with optional state filter) and get_filing (by id). The "server" is a 40-line script that wraps http://localhost:8080.
Python version:
"""UCC Filings MCP server — wraps the PublicRecords API as agent tools."""
import asyncio
import httpx
from claude_agent_sdk import create_sdk_mcp_server, tool
API = "http://localhost:8080"
@tool(
"lookup_filings",
"Search UCC filings by 2-letter US state code. Returns up to 50 matches.",
{"state": str},
)
async def lookup_filings(args: dict) -> dict:
state = args["state"].upper()
async with httpx.AsyncClient(timeout=10) as client:
r = await client.get(f"{API}/filings", params={"state": state})
r.raise_for_status()
rows = r.json()
summary = [
f"#{f['id']} | {f['state']} | {f['debtorName']} -> {f['securedParty']}"
for f in rows[:50]
]
return {"content": [{"type": "text", "text": "\n".join(summary) or "no matches"}]}
@tool(
"get_filing",
"Fetch the full record of a single UCC filing by numeric id.",
{"filing_id": int},
)
async def get_filing(args: dict) -> dict:
async with httpx.AsyncClient(timeout=10) as client:
r = await client.get(f"{API}/filings/{args['filing_id']}")
if r.status_code == 404:
return {"content": [{"type": "text", "text": "filing not found"}]}
r.raise_for_status()
f = r.json()
return {"content": [{"type": "text", "text": str(f)}]}
ucc_server = create_sdk_mcp_server(
name="ucc",
version="0.1.0",
tools=[lookup_filings, get_filing],
)
TypeScript version:
// UCC Filings MCP server — wraps the PublicRecords API as agent tools.
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
const API = "http://localhost:8080";
export const uccServer = createSdkMcpServer({
name: "ucc",
version: "0.1.0",
tools: [
tool(
"lookup_filings",
"Search UCC filings by 2-letter US state code. Returns up to 50 matches.",
{ state: z.string().length(2) },
async ({ state }) => {
const r = await fetch(`${API}/filings?state=${state.toUpperCase()}`);
if (!r.ok) throw new Error(`API ${r.status}`);
const rows = (await r.json()) as Array<Record<string, unknown>>;
const summary = rows
.slice(0, 50)
.map((f) => `#${f.id} | ${f.state} | ${f.debtorName} -> ${f.securedParty}`)
.join("\n");
return { content: [{ type: "text", text: summary || "no matches" }] };
},
),
tool(
"get_filing",
"Fetch the full record of a single UCC filing by numeric id.",
{ filing_id: z.number().int().positive() },
async ({ filing_id }) => {
const r = await fetch(`${API}/filings/${filing_id}`);
if (r.status === 404) return { content: [{ type: "text", text: "filing not found" }] };
if (!r.ok) throw new Error(`API ${r.status}`);
const f = await r.json();
return { content: [{ type: "text", text: JSON.stringify(f, null, 2) }] };
},
),
],
});
Step 6 — Write the agent driver
The driver is the program that runs the agent loop. ClaudeAgentOptions wires up the MCP server (so its two tools are available to Claude), restricts the tool surface via allowed_tools (no Bash, no Edit — this agent only queries the API), and caps runaway loops with max_turns. The system prompt teaches the agent what UCC means and what tools to reach for.
Python version:
"""UCC Filings Assistant — agent driver that uses the ucc MCP server."""
import asyncio
import sys
from claude_agent_sdk import query, ClaudeAgentOptions, AssistantMessage, TextBlock
from ucc_mcp import ucc_server
SYSTEM = """You are a UCC Filings Assistant for US public records.
Definitions:
- UCC-1 filing: a public lien notice; a "secured party" (lender) holding
a security interest in collateral belonging to a "debtor" (borrower).
You have two tools:
- mcp__ucc__lookup_filings(state) — list matches for a US state
- mcp__ucc__get_filing(filing_id) — full record for one filing
Be concise. Always cite filing ids in your answer. If the user asks for a
state by full name, infer the 2-letter code (Texas -> TX).
"""
async def main(prompt: str) -> None:
options = ClaudeAgentOptions(
system_prompt=SYSTEM,
mcp_servers={"ucc": ucc_server},
allowed_tools=["mcp__ucc__lookup_filings", "mcp__ucc__get_filing"],
max_turns=6,
permission_mode="bypassPermissions",
)
async for msg in query(prompt=prompt, options=options):
if isinstance(msg, AssistantMessage):
for block in msg.content:
if isinstance(block, TextBlock):
print(block.text, end="", flush=True)
print()
if __name__ == "__main__":
user_q = sys.argv[1] if len(sys.argv) > 1 else "Find me Texas filings."
asyncio.run(main(user_q))
TypeScript version:
// UCC Filings Assistant — agent driver that uses the ucc MCP server.
import { query } from "@anthropic-ai/claude-agent-sdk";
import { uccServer } from "./ucc_mcp";
const SYSTEM = `You are a UCC Filings Assistant for US public records.
Definitions:
- UCC-1 filing: a public lien notice; a "secured party" (lender) holding
a security interest in collateral belonging to a "debtor" (borrower).
You have two tools:
- mcp__ucc__lookup_filings(state) — list matches for a US state
- mcp__ucc__get_filing(filing_id) — full record for one filing
Be concise. Always cite filing ids in your answer. If the user asks for a
state by full name, infer the 2-letter code (Texas -> TX).`;
const prompt = process.argv[2] ?? "Find me Texas filings.";
const stream = query({
prompt,
options: {
systemPrompt: SYSTEM,
mcpServers: { ucc: uccServer },
allowedTools: ["mcp__ucc__lookup_filings", "mcp__ucc__get_filing"],
maxTurns: 6,
permissionMode: "bypassPermissions",
},
});
for await (const msg of stream) {
if (msg.type === "assistant") {
for (const block of msg.message.content) {
if (block.type === "text") process.stdout.write(block.text);
}
}
}
process.stdout.write("\n");
Step 7 — Run the agent
Python:
python ucc_agent.py "Find Texas filings and tell me which secured party shows up most"TypeScript (using tsx for one-shot execution):
npx tsx ucc_agent.ts "Find Texas filings and tell me which secured party shows up most"Expected output (shape will vary by Claude run):
Based on the Texas filings:
#1 — Lone Star Holdings LLC -> First National Bank
#2 — Pecos River Logistics Inc. -> Wells Fargo Equipment Finance
Each Texas filing has a different secured party — no single lender appears
more than once in the Texas subset. The two distinct secured parties are
First National Bank and Wells Fargo Equipment Finance.
You ran a complete agent loop in production code, no terminal. The SDK:
- Loaded your MCP server and registered
lookup_filings+get_filing - Sent the prompt + system prompt + tool descriptions to Claude
- Watched Claude pick
lookup_filingswithstate="TX", called your tool, fed the result back - Watched Claude reason over the result and produce a final answer
- Streamed the assistant text to your stdout
That's everything you'd otherwise hand-roll on top of the Messages API — tool dispatch, message accumulation, turn limits — collapsed into ~30 lines of driver code.
Step 8 — Stretch: connect this same agent to Claude Code as a subagent
The MCP server you just wrote is reusable. Add it to your .mcp.json from CC9, drop a subagent definition that uses these tools, and the same logic now runs inside Claude Code — same code, two delivery channels.
---
name: ucc-assistant
description: Answers natural-language questions about UCC public records
by querying the local PublicRecords API. Use when the user asks about
filings, debtors, secured parties, or collateral.
tools: mcp__ucc__lookup_filings, mcp__ucc__get_filing
model: haiku
---
You are a UCC Filings Assistant. You have two tools:
- `mcp__ucc__lookup_filings(state)` — list filings for a US state code
- `mcp__ucc__get_filing(filing_id)` — full record for one filing
Be concise. Always cite filing ids in your answer.
Now in Claude Code: "Use the ucc-assistant subagent to find me California filings." Same MCP server, same tools, same logic — routed through the CLI surface instead of your standalone Python program.
Stretch — Add a debtor-name search tool
The FilingRepository from CC0 only supports findByState. Add a derived query List<Filing> findByDebtorNameContainingIgnoreCase(String q), expose it as a new POST /search endpoint, and add a third tool search_debtor in your MCP server. Then ask the agent: "Find any UCC filing for a logistics company."
A standalone Python or TypeScript program that runs Claude as a tool-using agent against the PublicRecords API — no terminal, no human in the loop. The same MCP server can be reused as a Claude Code subagent for interactive use. You've now seen all three Claude surfaces in one project: Claude Code CLI (CC0–CC14), Agent SDK (this module), and the underlying Messages API (the layer the SDK is built on).
Lab 3 — Build a Visual Debugging UI for Your Agent Runs
The Debug & Observe section earlier in this module showed how to print messages, tool calls, and costs to stdout, and how to dump a repr()-based trace to JSONL. That works for one run. For a real tuning loop — tweak prompt, re-run, compare — stdout becomes a blur and a flat repr trace isn't introspectable. This lab turns the SDK's message stream into a browsable HTML timeline: every run captured, every tool call inspectable, every cost line visible at a glance. About 25 minutes.
The Agent SDK's query() yields a typed message stream — SystemMessage, AssistantMessage, UserMessage (tool results), ResultMessage. We tap that stream once at the top of the agent loop, serialize each message to a structured dict (not repr), and append it to traces/<timestamp>.jsonl. Then a ~70-line FastAPI app reads those files and renders them as a timeline. Re-run the agent: new trace appears. The viewer polls every 1.5s so you can tail a run live.
You're building the visual debugger the SDK doesn't ship with — small enough to vendor into any repo, transparent enough to extend (filtering, search, side-by-side run comparison).
Prerequisites
- Lab 2 complete — you have
ucc_mcp.pyanducc_agent.pyunder~/cc-labs/ucc-agent/, and the PublicRecords API running on port 8080. - The Python environment from Lab 2 with
claude-agent-sdkinstalled. - One extra dep for the viewer:
pip install fastapi uvicornStep 1 — Capture the message stream as structured JSONL
The trace_to_file.py snippet from the Debug section uses repr(msg)[:2000] — great for grep, terrible for a UI. We want each message rendered as a dict so the viewer can introspect content blocks, tool inputs, and costs individually.
"""Capture an Agent SDK message stream to structured JSONL for the viewer."""
import json, time
from pathlib import Path
from dataclasses import asdict, is_dataclass
TRACE_DIR = Path("traces")
TRACE_DIR.mkdir(exist_ok=True)
def _serialize(obj):
"""Recursively convert SDK message objects to JSON-safe dicts."""
if is_dataclass(obj):
return _serialize(asdict(obj))
if hasattr(obj, "__dict__"):
return {k: _serialize(v) for k, v in vars(obj).items()
if not k.startswith("_")}
if isinstance(obj, list):
return [_serialize(x) for x in obj]
if isinstance(obj, dict):
return {k: _serialize(v) for k, v in obj.items()}
return obj # primitives
class TraceRecorder:
def __init__(self, label: str = "run"):
self.path = TRACE_DIR / f"{int(time.time())}-{label}.jsonl"
self._f = self.path.open("w", encoding="utf-8")
def record(self, msg):
entry = {
"ts": time.time(),
"type": type(msg).__name__,
"data": _serialize(msg),
}
self._f.write(json.dumps(entry, default=str) + "\n")
self._f.flush() # so the viewer can tail live runs
def close(self):
self._f.close()
return self.path
Now tap the loop in ucc_agent.py:
from trace_recorder import TraceRecorder
async def main():
recorder = TraceRecorder(label="ucc-tx-query")
try:
async for msg in query(
prompt="List the first 3 UCC filings in Texas with their debtors.",
options=options,
):
recorder.record(msg) # <-- the only new line
# ... your existing print / stream logic stays the same ...
finally:
path = recorder.close()
print(f"\nTrace written: {path}")
Re-run the agent. A new file appears under traces/:
python ucc_agent.py && ls traces/1715520138-ucc-tx-query.jsonl
Append-only, crash-safe, diffable in git, greppable from the terminal, and the viewer can tail the file during a live run instead of waiting for completion. Teams trade up to SQLite or DuckDB only when they need cross-run analytics — for “what did my agent just do?”, JSONL is the right shape.
Step 2 — Spin up the viewer (FastAPI + a single HTML page)
Two endpoints (/ for the UI, /api/traces + /api/trace/<name> for data), one templated page. The viewer reads traces/ on every request — no DB, no build step.
"""Render captured agent traces as an HTML timeline."""
import json
from pathlib import Path
from fastapi import FastAPI, HTTPException
from fastapi.responses import HTMLResponse, JSONResponse
app = FastAPI()
TRACE_DIR = Path("traces")
@app.get("/api/traces")
def list_traces():
files = sorted(TRACE_DIR.glob("*.jsonl"), reverse=True)
return [{"name": f.name, "size": f.stat().st_size} for f in files]
@app.get("/api/trace/{name}")
def read_trace(name: str):
if ".." in name or "/" in name:
raise HTTPException(400, "invalid name")
p = TRACE_DIR / name
if not p.exists():
raise HTTPException(404)
entries = [json.loads(line) for line in p.read_text().splitlines() if line.strip()]
return JSONResponse(entries)
@app.get("/", response_class=HTMLResponse)
def index():
return Path(__file__).parent.joinpath("viewer.html").read_text()
And the page that does the rendering — one self-contained file, zero build step:
<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Agent Trace Viewer</title>
<style>
body{font:14px/1.5 system-ui;background:#0f172a;color:#e2e8f0;margin:0;display:flex;}
aside{width:260px;background:#1e293b;height:100vh;overflow-y:auto;padding:1rem;}
aside a{display:block;padding:0.4rem;color:#94a3b8;text-decoration:none;border-radius:4px;font-size:0.82rem;}
aside a:hover, aside a.active{background:#334155;color:#fff;}
main{flex:1;padding:1.5rem 2rem;overflow-y:auto;height:100vh;}
.msg{border-left:3px solid #475569;padding:0.6rem 1rem;margin:0.5rem 0;background:#1e293b;border-radius:0 6px 6px 0;}
.msg.SystemMessage{border-color:#64748b;}
.msg.AssistantMessage{border-color:#60a5fa;}
.msg.UserMessage{border-color:#a78bfa;}
.msg.ResultMessage{border-color:#34d399;background:#022c22;}
.type{font-size:0.7rem;text-transform:uppercase;letter-spacing:0.08em;color:#94a3b8;font-weight:600;}
.ts{float:right;color:#64748b;font-size:0.75rem;}
pre{background:#0f172a;padding:0.6rem;border-radius:4px;overflow-x:auto;font-size:0.78rem;margin:0.4rem 0 0;color:#cbd5e1;}
.tool{color:#fbbf24;font-weight:600;}
.cost{color:#34d399;font-weight:600;}
.banner{padding:0.8rem 1rem;border-radius:6px;margin-bottom:1rem;}
</style></head><body>
<aside id="list"><h3 style="margin-top:0;font-size:0.85rem;color:#94a3b8;">Runs</h3></aside>
<main id="trace"><p style="color:#64748b">Pick a run on the left.</p></main>
<script>
const esc = s => String(s).replace(/[&<>]/g, c => ({'&':'&','<':'<','>':'>'}[c]));
async function loadList(){
const r = await fetch('/api/traces'); const files = await r.json();
document.getElementById('list').innerHTML +=
files.map(f => `<a href="#" data-name="${esc(f.name)}">${esc(f.name)}</a>`).join('');
document.querySelectorAll('aside a').forEach(a => a.onclick = ev => {
ev.preventDefault();
document.querySelectorAll('aside a').forEach(x => x.classList.remove('active'));
a.classList.add('active');
render(a.dataset.name);
});
}
function summarize(e){
const d = e.data;
if(e.type === 'AssistantMessage' && d.content){
return d.content.map(c => {
if(c.type === 'text') return `<div>${esc(c.text || '')}</div>`;
if(c.type === 'tool_use')
return `<div class="tool">🔧 ${esc(c.name)}</div><pre>${esc(JSON.stringify(c.input, null, 2))}</pre>`;
if(c.type === 'thinking')
return `<em style="color:#94a3b8">thinking: ${esc((c.thinking || '').slice(0,200))}...</em>`;
return `<pre>${esc(JSON.stringify(c, null, 2))}</pre>`;
}).join('');
}
if(e.type === 'UserMessage' && d.content){
return d.content.map(c => c.type === 'tool_result'
? `<div class="tool">↩ tool_result</div><pre>${esc((typeof c.content === 'string' ? c.content : JSON.stringify(c.content, null, 2)).slice(0,400))}</pre>`
: `<pre>${esc(JSON.stringify(c, null, 2))}</pre>`
).join('');
}
if(e.type === 'ResultMessage'){
return `<div>turns: ${d.num_turns || '?'} · duration: ${((d.duration_ms || 0)/1000).toFixed(2)}s · <span class="cost">cost: $${(d.total_cost_usd || 0).toFixed(4)}</span></div>`;
}
return `<pre>${esc(JSON.stringify(d, null, 2).slice(0,300))}</pre>`;
}
async function render(name){
const r = await fetch('/api/trace/' + encodeURIComponent(name));
const entries = await r.json();
const result = entries.find(e => e.type === 'ResultMessage');
const banner = result
? `<div class="banner" style="background:#022c22">✅ ${result.data.num_turns || '?'} turns · ${((result.data.duration_ms || 0)/1000).toFixed(2)}s · <span class="cost">$${(result.data.total_cost_usd || 0).toFixed(4)}</span></div>`
: `<div class="banner" style="background:#451a03;color:#fbbf24">⏳ Run in progress...</div>`;
const body = entries.map(e => {
const t = new Date(e.ts * 1000).toLocaleTimeString();
return `<div class="msg ${e.type}"><span class="ts">${t}</span><div class="type">${e.type}</div>${summarize(e)}</div>`;
}).join('');
document.getElementById('trace').innerHTML = banner + body;
}
loadList();
// Live tail: re-render the active trace every 1.5s
setInterval(() => { const a = document.querySelector('aside a.active'); if(a) render(a.dataset.name); }, 1500);
</script></body></html>
Step 3 — Run it
uvicorn trace_viewer:app --reload --port 7000Open http://localhost:7000. The sidebar lists every JSONL file under traces/; click one to render the timeline. Each message is color-coded by type:
- Gray (SystemMessage) — init: model, MCP server load, available tools.
- Blue (AssistantMessage) — Claude's text +
tool_useblocks. Tool calls render as 🔧 with their input args expanded. - Purple (UserMessage) —
tool_resultblocks. The response your MCP tool returned, truncated to 400 chars. - Green (ResultMessage) — turns, duration, total cost. The receipt at the end of the run, also rolled up into the banner at the top.
Step 4 — Watch a live run
Open the viewer in one window. In another terminal, re-run the agent against a different prompt:
python ucc_agent.pyRefresh the viewer once — the new file appears in the sidebar. Click it; the viewer's setInterval polls every 1.5s and the recorder flush()es after every message, so the timeline streams in during the run. This is the inner-loop debug workflow: tweak prompt → re-run → watch tool selection in real time without scrolling stdout.
You've replaced “tail the stdout” with “watch a structured timeline.” The same agent code runs unchanged — the recorder is a passive observer of the message stream. Anything the SDK emits, you can see. Anything Claude decided, you can trace.
Step 5 — Inspect a tool-selection bug end-to-end
Try this prompt against your agent: “How many filings are there in TX?”. Watch the timeline.
- The first AssistantMessage shows Claude calling
mcp__ucc__lookup_filingswithstate="TX"— expand thetool_useblock. - The UserMessage that follows is the
tool_result— the actual JSON your MCP server returned. Look at the shape. - The next AssistantMessage is Claude reasoning over that result. Did it count correctly? Did it cite a filing ID it shouldn't have?
- The ResultMessage shows total cost. If a single Texas query cost $0.04, you're paying for too many turns — tighten the system prompt or add a
count_filingstool to skip the listing step.
This is the debugging loop that text-streamed stdout makes painful and a UI makes obvious.
Step 6 — Extend: side-by-side run comparison
Use case: you changed the system prompt and want to see whether tool selection changed. Add a URL parameter to render two runs in columns:
http://localhost:7000/?a=1715520138-tx-query.jsonl&b=1715520245-tx-query.jsonl
This is left as an extension (the viewer is intentionally small). Reading new URLSearchParams, fetching both traces, and rendering into display:grid with two columns is ~25 lines of JS. The real payoff: catching prompt-change regressions before they reach your eval suite (CC11).
Step 7 — The production-grade path
For teams beyond one developer, swap the local viewer for a hosted observability platform. The TraceRecorder abstraction stays — you change only the sink:
| Tool | How you'd wire it | Pick when |
|---|---|---|
| Langfuse (self-host or cloud) | Send each message as an observation via their Python SDK; one agent run = one trace. | You want hosted UI, multi-user access, eval scoring overlays. |
| OpenTelemetry + Honeycomb / Datadog / Jaeger | Wrap the loop in a span; emit one child span per tool call with input/output as attributes. | You already have OTel in your stack and want one pane of glass for app + agent. |
| This viewer + S3 | Upload JSONL to S3 on run completion; viewer reads from S3 instead of local disk. | You want zero vendor lock-in and team-wide trace history. |
For most teams the JSONL viewer is enough for months — ship it, learn what you actually need, then graduate.
Tool inputs may contain user PII; tool results often contain DB rows. Don't write trace files into directories that ship with logs to third parties without redaction. Apply the same masking rules the “One thing not to do” box in the Debug section called out — the viewer is convenient, which makes accidental PII exposure easier.
A self-hosted, zero-build, real-time visual debugger for Agent SDK runs. Recorder (~30 lines) hooks the message stream and writes structured JSONL. Viewer (1 FastAPI file + 1 HTML file, ~140 lines total) renders the JSONL as a color-coded timeline with tool-call inspection, cost roll-up, and a 1.5s live tail. The recorder works against any SDK agent; the viewer renders any trace. You've turned the opaque async iterator into a tool you can pair-debug agents with.
Knowledge Check
1. Your team needs a nightly cron job that audits a Postgres warehouse for new UCC filings and posts to Slack. Which Claude surface?
claude -p "audit and post to Slack" in a cron entry.2. In the lab, what's the role of create_sdk_mcp_server / createSdkMcpServer?
mcp_servers.options.mcp_servers; the SDK runs it in-process, so there's no separate subprocess to manage.@tool-decorated functions and the SDK's MCP loader. No remote registration, no UI — just an in-process server.3. allowed_tools=["mcp__ucc__lookup_filings", "mcp__ucc__get_filing"]. Why this list and not None (allow everything)?
4. The Agent SDK and the Anthropic Messages API both let you build agents. What does the SDK do for you that the Messages API doesn't?
5. You wrote the ucc_mcp.py file once. Where can it be reused?
ucc_mcp running inside a Claude Code subagent — same code, different delivery channel.Summary — You've Reached the End of the Track
Sixteen modules and you've used Claude across all three programmer-facing surfaces:
- Claude Code CLI (CC0–CC14) — interactive pair-programming, slash commands, subagents, hooks, MCP, headless CI.
- Claude Agent SDK (CC15) — programmatic agent loop with the same primitives.
- Anthropic Messages API (under both) — the foundation when you need maximum control over the loop.
Same project, same domain, same Java/Spring backend. The lessons compose: the CLAUDE.md from CC3 ships with your repo and is read by every Claude surface that opens that repo. The MCP server from CC9 powers both the CLI and your custom Agent SDK driver. The pii-auditor subagent from CC6 runs in the GitHub Action from CC14 and can also be invoked from a standalone SDK program. You write the tools once and use them everywhere.
Where to go next
- Build a tiny in-house agent for your team's actual data — replace UCC filings with whatever domain you work in.
- Add Postgres, GitHub, or Slack MCP servers to your SDK driver to handle multi-system workflows.
- Wire your SDK agent into your CI alongside the headless CLI from CC14 — the SDK's programmatic control opens up workflows the CLI can't express cleanly.