Capstone 7-A — Agent Evolution: Healthcare Pre-Auth

Project Brief

Three Ways to Build a House

Imagine you decide to build the same house three times. The first time, you fell every tree, mill every plank, and hammer every nail by hand. You learn exactly where the load-bearing walls go, why the joists are spaced 16 inches apart, and what happens when a sill plate is undersized. The build takes six months, but you understand every joint in the structure.

The second time, you order pre-cut lumber and pre-fab trusses, and you use a nail gun. You finish in six weeks. The walls go up in the same places, but you are no longer wondering why; you are picking which tools to use and where to apply them. You build about four times as fast and the structure is just as strong, but only because you already know what a structure should look like.

The third time, you hand a contractor a set of architectural drawings. Two weeks later, the house is done — built by people you never met to specifications you wrote in plain English. You are now an architect. The previous two builds taught you what to draw and what to leave to the crew.

This capstone is those three builds, compressed into one week. You will write the same pre-auth decision agent in raw API code, then with the Agent SDK and Claude Code, then as a 12-section spec that Claude Code reads and implements. By the end, you will know in your hands — not just in theory — why each layer of abstraction exists, what it costs, and when to reach for each one.

What You'll Build

You build a Pre-Authorization Decision Agent. It takes a pre-auth request — a procedure code (CPT), diagnosis code (ICD-10), member ID, and provider NPI — and reasons through five tool calls: looks up the payer's clinical criteria, checks whether the diagnosis meets medical necessity, verifies the provider's network status, pulls the member's benefit summary, and produces a structured determination (APPROVE / DENY / REQUEST_INFO) with a written clinical justification citing the policy section it relied on.
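The request and determination shapes described above can be sketched as plain data types. This is a minimal sketch, not the lab's reference code: the class names `PreAuthRequest`, `Decision`, and `Determination` are assumptions introduced here for illustration, and the `payer` field is added because the tools later need it.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(str, Enum):
    """The three outcomes the agent may return."""
    APPROVE = "APPROVE"
    DENY = "DENY"
    REQUEST_INFO = "REQUEST_INFO"


@dataclass(frozen=True)
class PreAuthRequest:  # hypothetical name
    cpt_code: str      # procedure code, e.g. "27447"
    icd10_code: str    # diagnosis code, e.g. "M17.11"
    member_id: str
    provider_npi: str  # 10-digit National Provider Identifier
    payer: str


@dataclass(frozen=True)
class Determination:  # hypothetical name
    decision: Decision
    rationale: str
    policy_citation: str  # the policy section the rationale relies on


request = PreAuthRequest("27447", "M17.11", "A123456", "1234567890", "Aetna")
example = Determination(Decision.APPROVE, "Meets CPB-0660 criteria.", "CPB-0660")
```

Making `Decision` a string enum keeps the value JSON-friendly while rejecting anything outside the three allowed outcomes.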

You build it three times:

Iteration 1 Raw API loop — ~250 lines, ~3 hours, hand-coded everything (M15B way)
Iteration 2 Agent SDK + Claude Code — ~120 lines, ~2 hours (M25–M26 way)
Iteration 3 Spec-driven — ~100 lines of spec, ~1 hour (production way)

HIPAA & PHI Are Not Optional

This scenario lets you exercise the same compliance constraints production health agents face: PHI redaction in hooks (member IDs and patient names should never enter logs verbatim), audit trail requirements (every tool call and determination must be persisted with a timestamp and case ID), and determination explainability (the rationale must cite specific clinical criteria sections, not "Claude said so"). All three iterations enforce the same rules — what changes is whether you write the redaction logic inline (Iter 1), as a hook (Iter 2), or specify it (Iter 3).
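The explainability rule above can be checked mechanically. The following is a minimal sketch, assuming the rationale is a plain string and the policy identifier (for example CPB-0660) is known in advance; the helper name `cites_policy` is hypothetical and not part of the lab code.

```python
import re


def cites_policy(rationale: str, policy_id: str) -> bool:
    """True when the rationale names the policy ID as a whole token."""
    # Lookarounds stop "CPB-0660" from matching inside "CPB-06601".
    pattern = rf"(?<![\w-]){re.escape(policy_id)}(?![\w-])"
    return re.search(pattern, rationale) is not None


good = cites_policy("Approving per policy CPB-0660.", "CPB-0660")
bad = cites_policy("Claude said so.", "CPB-0660")
```

A check like this only proves the citation is present. It does not prove the cited section supports the decision, which still needs human or eval review.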

Why It Matters

Production teams do not pick "raw API vs SDK vs spec" in the abstract; they pick based on what the team already understands and what the problem requires. If you skip Iteration 1, you cannot debug Iteration 3 when the generated code does something weird. If you skip Iteration 3, each build takes you about three times as long as it takes a team working from specs (3 hours versus 1 hour in this capstone). The point of building the same thing three times is that the differences teach you the trade-offs in a way no diagram ever will. You are not learning three different agents. You are learning three different levels of abstraction.

The Three-Iteration Concept

Each iteration produces a working agent that solves the SAME pre-auth decision with the SAME five tools and SAME mock data. The determination is the same across all three; only the wording of the rationale varies, because Claude generates fresh text on each run. What changes is everything around the agent: the lines you wrote, the time you spent, the abstractions you used, and the way you debug when something breaks.

An agent is "a system prompt + tools + a loop." Iteration 1 makes you build all three from scratch. Iteration 2 keeps the system prompt and tools the same, but the SDK runs the loop for you. Iteration 3 keeps everything the same, but Claude Code writes the prompt, tools, AND loop from a spec you gave it. The agent is unchanged. You changed.
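The "system prompt + tools + a loop" definition can be sketched without any API call. This is a toy under stated assumptions: `fake_model` is a scripted stand-in for the model, and the message shapes (including the "tool" role) are simplified for illustration and are not the Anthropic message format used later in the lab.

```python
from typing import Callable


def fake_model(system: str, messages: list[dict]) -> dict:
    """Hypothetical stand-in: request one tool call, then finish."""
    if any(m["role"] == "tool" for m in messages):
        return {"type": "final", "text": "In-network: True"}
    return {"type": "tool_call", "name": "check_network_status",
            "args": {"provider_npi": "1234567890", "payer": "Aetna"}}


# Tools: plain functions keyed by the name the model asks for.
TOOLS: dict[str, Callable[..., dict]] = {
    "check_network_status": lambda provider_npi, payer: {"in_network": True},
}


def run(system: str, question: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):                          # the loop
        reply = fake_model(system, messages)            # the system prompt goes in
        if reply["type"] == "final":
            return reply["text"]
        result = TOOLS[reply["name"]](**reply["args"])  # the tools
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("no answer within max_turns")


answer = run("You are a pre-auth analyst.", "Is NPI 1234567890 in-network?")
```

Iteration 1 replaces `fake_model` with a real API call; Iteration 2 hands the `for` loop to the SDK.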

A Common Misunderstanding

"Iteration 3 is just better — why bother with the others?" Because Iteration 3's generated code is not magic. When it produces a verify_diagnosis_match that does substring comparison instead of ICD-10 hierarchy traversal, you have to read the generated tools.py, find the bug, and fix it — either in the code or in the spec. Without Iteration 1's familiarity with what tool calls and message shapes look like, you cannot tell what the generated code is doing wrong. Iteration 3 is fast precisely because you can read its output.

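The substring bug mentioned above can be made concrete. The following is a minimal sketch, assuming ICD-10 children extend their parent's characters (M17 → M17.1 → M17.11), so a prefix test on dot-stripped codes approximates hierarchy traversal. The function names and the parent-level `approved` list are hypothetical, and a production system would consult a real code table.

```python
def normalize(code: str) -> str:
    """Drop the dot and uppercase: 'm17.11' -> 'M1711'."""
    return code.replace(".", "").upper()


def substring_match(code: str, approved: list[str]) -> bool:
    """The buggy version: any textual overlap counts as a match."""
    return any(code in a or a in code for a in approved)


def hierarchy_match(code: str, approved: list[str]) -> bool:
    """Match when the code equals an approved entry or is a more
    specific descendant of one."""
    c = normalize(code)
    return any(c.startswith(normalize(a)) for a in approved)


approved = ["M17.1"]  # hypothetical policy listing a parent category

# "M17" is the broader ancestor (it also covers bilateral codes), so it
# should not pass, but the substring check lets it through.
ancestor_substring = substring_match("M17", approved)
ancestor_hierarchy = hierarchy_match("M17", approved)
child_hierarchy = hierarchy_match("M17.11", approved)
```

When reading a generated `tools.py`, a bare `in` test on strings in the matching code is the thing to look for.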
The Scenario — Pre-Auth Decision Agent

The agent takes a pre-authorization request and produces a determination: APPROVE / DENY / REQUEST_INFO with a clinical justification. The five-tool decision pattern mirrors what real prior-authorization software does, just with mock data instead of payer APIs.

Business Question (use this in all three iterations)

"Should this pre-auth for knee replacement (CPT 27447) with diagnosis M17.11 (Unilateral primary osteoarthritis, right knee) be approved under Aetna for member ID A123456 with provider NPI 1234567890?"

Tools (5)

lookup_clinical_criteria(cpt_code, payer) — medical necessity rules
verify_diagnosis_match(icd10_code, criteria_id) — checks ICD-10 against criteria
check_network_status(provider_npi, payer) — in-network / out-of-network
get_benefit_summary(member_id, cpt_code) — coverage, copay, deductible
generate_determination(case_data) — produces structured APPROVE/DENY/REQUEST_INFO

Mock Data Shape

15 pre-auth requests across 5 procedures × 3 payers
Procedures: MRI knee (CPT 73721), Knee replacement (27447), Specialty drug (J9035), Cardiac cath (93458), Physical therapy (97110)
Payers: Aetna, UnitedHealth, BlueCross
Files: preauth_requests.json, clinical_criteria.json, provider_directory.json, benefits.json
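The 5 × 3 grid above can be generated as a skeleton before you fill in the clinical details by hand. This is a sketch only: the function name `request_skeletons` is an assumption, and member IDs, NPIs, and ICD-10 codes are deliberately left out for you to supply.

```python
from itertools import product

# 27447 is listed first so the first stub lines up with case PA-2025-0001.
PROCEDURES = ["27447", "73721", "J9035", "93458", "97110"]
PAYERS = ["Aetna", "UnitedHealth", "BlueCross"]


def request_skeletons() -> list[dict]:
    """One request stub per procedure/payer pair, with sequential case IDs."""
    return [
        {"case_id": f"PA-2025-{i:04d}", "cpt_code": cpt, "payer": payer}
        for i, (cpt, payer) in enumerate(product(PROCEDURES, PAYERS), start=1)
    ]


skeletons = request_skeletons()
```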

Want a Different Domain?

Switch to Domain B (B2B Order Exception) or Domain C (UCC Risk Analyzer — default). The lab structure is identical; only the tools and data change.

Animation 1: Three-Lane Evolution

Watch three lanes count down lines of code while the same six capabilities populate underneath each. The agent's capabilities never change — what shrinks is the code you write to express them.

Code Shrinks — Capabilities Stay

Iteration 1: Raw API (lines you wrote)
Iteration 2: Agent SDK (lines you wrote)
Iteration 3: Spec-Driven (lines of spec)

Animation 2: Code Size Waterfall

Iteration 1 is 250 lines you wrote. Iteration 2 is 120 lines you wrote. Iteration 3 is 100 lines of spec plus ~300 lines of code Claude Code generated — shown stacked. The total system size grows; the lines on your keyboard fall off a cliff.

Lines of Code: You Wrote vs Generated

Iter 1: 250 lines hand-written
Iter 2: 120 lines hand-written
Iter 3: 100 lines of spec (you wrote) + 300 lines generated by Claude Code

Animation 3: Time Comparison

Iteration 1 is the longest because you build everything — loop, validation, PHI redaction, audit, sessions, deployment. Iteration 2 cuts the loop and guardrails (the SDK and hooks handle them). Iteration 3 cuts almost everything except the thinking: writing what the agent should do.

Wall-Clock Hours per Iteration

Iter 1: 3 hours
Iter 2: 2 hours
Iter 3: 1 hour

Total: 6 hours across 3 sessions for the same agent, three different builds.

Animation 4: Architecture Per Iteration

Each iteration has the same logical architecture, but the physical architecture differs. The three versions follow in order: Iteration 1, Iteration 2, Iteration 3.

System Architecture — Three Versions

YOU OWN EVERYTHING IN BLUE
+----------------------------------------------------------+
| agent.py (~250 lines, all hand-written)                  |
|                                                          |
| while True:                        <-- loop YOU wrote    |
|   response = client.messages.create(...)                 |
|   if response.stop_reason == "end_turn": break           |
|   for block in response.content:                         |
|     if block.type == "tool_use":                         |
|       validate_input(block.input)  <-- YOU wrote         |
|       check_cost_cap()             <-- YOU wrote         |
|       log_with_timestamp(block)    <-- YOU wrote         |
|       result = execute_tool(...)   <-- YOU wrote         |
|       redact_phi(result)           <-- YOU wrote (HIPAA) |
|       append_messages(...)         <-- YOU wrote         |
|       write_audit_log(...)         <-- YOU wrote         |
+----------------------------------------------------------+
        |                    |                    |
        v                    v                    v
    tools.py          circuit_breaker      audit_log.jsonl
    (5 clinical       (YOU built)          (YOU rotate)
    tools YOU wrote)
                             |
                             v
+----------------------------------------------------------+
| server.py (FastAPI)       <-- YOU wrote everything       |
| Dockerfile                <-- YOU wrote                  |
+----------------------------------------------------------+

YOU OWN PINK; SDK OWNS GRAY
+----------------------------------------------------------+
| agent.py (~90 lines)                                     |
|                                                          |
| @tool("lookup_clinical_criteria", ..., {...})            |
| async def lookup_clinical_criteria(args): ...            |
| (5 @tool functions total)               << YOU wrote     |
|                                                          |
| preauth_server = create_sdk_mcp_server(                  |
|     name="preauth_tools", tools=[...])                   |
|                                                          |
| OPTIONS = ClaudeAgentOptions(                            |
|     system_prompt="Pre-auth analyst...",                 |
|     mcp_servers={"preauth": preauth_server},             |
|     allowed_tools=["mcp__preauth__..."],                 |
|     hooks={"PreToolUse": [HookMatcher(validate_input)]}, |
| )                                                        |
|                                                          |
| async for msg in query(prompt=..., options=OPTIONS): ... |
|                                                          |
| -- LOOP, MESSAGE-PASSING, RETRIES, STREAMING             |
| -- LIVE INSIDE claude-agent-sdk. YOU DO NOT WRITE THEM.  |
+----------------------------------------------------------+
          |                           |
          v                           v
+--------------------+      +--------------------+
| Claude Code        |      | claude-agent-sdk   |
| generated:         |      | managed:           |
| - server.py        |      | - query() loop     |
| - Dockerfile       |      | - MCP transport    |
| - tests            |      | - HookMatcher      |
| .claude/           |      | - resume tokens    |
|   settings.json    |      | - streaming        |
|   + hooks/*.py     |      +--------------------+
+--------------------+

YOU OWN ONLY THE GREEN BOX
+----------------------------------------------------------+
| spec/agent-spec.md (~100 lines)         << YOU wrote     |
| ----------------------------------                       |
| # Sections                                               |
| 1. Overview             8. API Wrapper                   |
| 2. Configuration        9. Deployment                    |
| 3. Tools (5 clinical)   10. Tests                        |
| 4. System Prompt        11. Evaluation (15 pre-auths)    |
| 5. Hooks (PHI redact)   12. File Structure               |
| 6. Sessions                                              |
| 7. Mock Data                                             |
+----------------------------------------------------------+
                             |
                             v
+----------------------------------------------------------+
| /generate-from-spec spec/agent-spec.md                   |
| Claude Code reads spec, generates ~18 files:             |
|   agent.py (claude-agent-sdk) | sessions.py              |
|   mock_data/*.json (4 files) | server.py | Dockerfile    |
|   .claude/settings.json | hooks/*.py | .claude/commands/ |
|   tests/test_*.py x5 | spec/agent-spec.md (you)          |
|   appendix/manual-loop.py (Iter-1 reference)             |
+----------------------------------------------------------+
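Before handing a spec to Claude Code, it can help to confirm that all twelve sections are present. The following is a minimal sketch, assuming each section is a numbered markdown heading such as `## 3. Tools`; the heading format and the helper name `missing_sections` are assumptions, so adjust the pattern to whatever your spec actually uses.

```python
import re

REQUIRED_SECTIONS = [
    "Overview", "Configuration", "Tools", "System Prompt", "Hooks",
    "Sessions", "Mock Data", "API Wrapper", "Deployment", "Tests",
    "Evaluation", "File Structure",
]


def missing_sections(spec_text: str) -> list[str]:
    """Return required section names with no numbered heading that starts with them."""
    headings = [
        m.group(1).lower()
        for m in re.finditer(r"^#+[ \t]*\d+\.[ \t]*(.+?)[ \t]*$", spec_text, re.MULTILINE)
    ]
    return [
        name for name in REQUIRED_SECTIONS
        if not any(h.startswith(name.lower()) for h in headings)
    ]


draft = "# Agent Spec\n## 1. Overview\n## 3. Tools (5 clinical)\n"
still_missing = missing_sections(draft)
```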

Animation 5: Spec-to-Code Flow

Watch the 12-section spec on the left get read line-by-line. Claude Code (the engine in the middle) translates each section into generated code on the right. Files appear as their corresponding spec section is consumed: section 3 (Tools) generates tools.py, section 5 (Hooks) generates hooks.py, and so on.

agent-spec.md → Claude Code → 18 Files

Claude Code: read → plan → write

Prerequisites

Required Modules

M05 — Tool Use: Tool definitions, tool_use blocks, the message loop
M12 — ReAct Agents: Multi-step reasoning across the 5-tool decision flow
M15B — Build Complete Agent: The whole Iter-1 mental model
M16–M17 — Guardrails & HITL: Hooks, especially PHI redaction
M19 — Tracing: Optional but useful for Iter 2 debugging
M21, M22B — Deployment: FastAPI + Docker + Tier 1
M25, M26 — Claude Code & Hooks: CLAUDE.md, slash commands, hooks API

If You Have Not Done M15B / M26

Iteration 1 is a re-implementation of the M15B reference agent for the pre-auth scenario. Do M15B first if you have not built an agent from raw API calls. Iteration 2 leans on the claude-agent-sdk patterns taught in M26 (Hooks & Sessions & Agent SDK); revisit it if @tool, HookMatcher, or ClaudeAgentOptions feel unfamiliar.

Tools You'll Need Installed

Python 3.10+ with pip
Claude Code CLI (npm i -g @anthropic-ai/claude-code) — for Iter 2 and 3
Docker Desktop for the Tier 1 deployment
ANTHROPIC_API_KEY environment variable
Optional: a Langfuse account for Iter 2 tracing
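A short preflight script can confirm the items in the list above before Session 1. This is a sketch under assumptions: it presumes the Claude Code CLI is on PATH as `claude` and Docker as `docker`, and the function name `preflight` is hypothetical.

```python
import os
import shutil
import sys


def preflight(env=os.environ) -> dict[str, bool]:
    """Report which prerequisites are visible from this Python process."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "claude CLI on PATH": shutil.which("claude") is not None,
        "docker on PATH": shutil.which("docker") is not None,
        "ANTHROPIC_API_KEY set": bool(env.get("ANTHROPIC_API_KEY")),
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'ok' if ok else 'MISSING':8}{check}")
```

The check only verifies that the key is set, not that it is valid; the first real API call in Step 3 is what confirms that.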

SESSION 1

Iteration 1: Raw API Loop

Build the agent the M15B way. You write the loop, the validation, the PHI redaction, the audit, the sessions, and the deployment. Every line is yours. Every bug is yours to find.

~3 hours | ~250 lines | 7 files | 0 abstractions

Step 1: Setup & Mock Data

15 min | mock_data/*.json

What & Why: Create the project folder, install anthropic + FastAPI, then build the four mock JSON files your tools will read. Mock data is what separates a "demo" agent from a "doesn't compile" agent — without realistic clinical criteria and ICD-10 mappings, every tool call returns garbage and you cannot tell if the loop is broken or the data is.

mkdir -p agent-iter1-raw/mock_data && cd agent-iter1-raw
python -m venv venv && source venv/bin/activate    # Windows: venv\Scripts\activate
pip install "anthropic>=0.40" "fastapi>=0.110" "uvicorn>=0.27" "pydantic>=2.0"

// mock_data/clinical_criteria.json (excerpt)
{
  "27447_AETNA": {
    "cpt_code": "27447",
    "procedure": "Total knee arthroplasty",
    "payer": "Aetna",
    "policy_id": "CPB-0660",
    "approved_indications": [
      "M17.10", "M17.11", "M17.12",
      "M17.30", "M17.31", "M17.32"
    ],
    "criteria": [
      "Symptomatic osteoarthritis confirmed by imaging",
      "Failed conservative therapy (NSAIDs, PT) for >= 6 months",
      "BMI < 40 OR documented exceptions"
    ]
  }
}

// mock_data/preauth_requests.json (excerpt)
[
  {
    "case_id": "PA-2025-0001",
    "member_id": "A123456",
    "provider_npi": "1234567890",
    "cpt_code": "27447",
    "icd10_code": "M17.11",
    "payer": "Aetna",
    "submitted": "2025-04-15"
  }
]

// mock_data/provider_directory.json (excerpt)
{
  "1234567890": {
    "npi": "1234567890",
    "name": "Dr. J. Smith",
    "specialty": "Orthopedic Surgery",
    "networks": ["Aetna", "BlueCross"]
  }
}

// mock_data/benefits.json (excerpt)
{
  "A123456_27447": {
    "covered": true,
    "deductible_remaining": 850.00,
    "copay": 250.00,
    "coinsurance_pct": 20,
    "prior_auth_required": true
  }
}

Build all four JSON files with realistic data: 15 pre-auth requests across 5 procedures (CPT 73721, 27447, J9035, 93458, 97110) and 3 payers (Aetna, UnitedHealth, BlueCross). Include 2 cases that should DENY (no medical necessity match), 2 that should REQUEST_INFO (out-of-network provider), and the rest that should APPROVE.

Run: python -c "import json; print(len(json.load(open('mock_data/preauth_requests.json'))), 'requests')"

15 requests

Checkpoint

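You should see 15 requests. If you see 0 or a JSON parse error, check that preauth_requests.json is valid JSON. The // filename labels in the excerpts above are not part of the files, since JSON has no comment syntax. This command only loads preauth_requests.json, so check the other three files the same way.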
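Valid JSON is not enough if the four files disagree with each other. The following is a minimal sketch of a cross-reference check, assuming the key formats used by the tools in this lab (`CPT_PAYER` in upper case for criteria, `member_cpt` for benefits); the function name `unresolved_references` is hypothetical. Some gaps may be deliberate test cases, so treat the output as a list to review and not as errors.

```python
def unresolved_references(requests, criteria, providers, benefits) -> list[str]:
    """List every request field that points at a record the other files lack."""
    problems = []
    for r in requests:
        criteria_key = f"{r['cpt_code']}_{r['payer'].upper()}"
        benefit_key = f"{r['member_id']}_{r['cpt_code']}"
        if criteria_key not in criteria:
            problems.append(f"{r['case_id']}: no criteria for {criteria_key}")
        if r["provider_npi"] not in providers:
            problems.append(f"{r['case_id']}: unknown NPI {r['provider_npi']}")
        if benefit_key not in benefits:
            problems.append(f"{r['case_id']}: no benefits for {benefit_key}")
    return problems


# Sample built from the excerpts above.
requests = [{"case_id": "PA-2025-0001", "member_id": "A123456",
             "provider_npi": "1234567890", "cpt_code": "27447", "payer": "Aetna"}]
criteria = {"27447_AETNA": {"policy_id": "CPB-0660"}}
providers = {"1234567890": {"networks": ["Aetna", "BlueCross"]}}
benefits = {"A123456_27447": {"covered": True}}

problems = unresolved_references(requests, criteria, providers, benefits)
```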

Step 2: Define Tools as JSON Schema

15 min | tools.py

What & Why: The Anthropic API needs every tool described as a JSON Schema object so Claude knows what arguments to pass. Pre-auth tools have stricter parameter shapes than most agents — ICD-10 codes follow a specific regex, NPIs are exactly 10 digits, CPT codes are 5-digit strings. Get the schema wrong and Claude either ignores the tool or passes the wrong types.

"""tools.py — pre-auth tool schemas + executors for the raw API loop."""
import json
from pathlib import Path

DATA = Path("mock_data")
CRITERIA = json.loads((DATA / "clinical_criteria.json").read_text())
PROVIDERS = json.loads((DATA / "provider_directory.json").read_text())
BENEFITS = json.loads((DATA / "benefits.json").read_text())

TOOLS = [
    {
        "name": "lookup_clinical_criteria",
        "description": "Get medical-necessity criteria for a CPT procedure under a specific payer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "cpt_code": {"type": "string", "pattern": "^[0-9]{5}$|^[A-Z][0-9]{4}$"},
                "payer": {"type": "string", "enum": ["Aetna", "UnitedHealth", "BlueCross"]},
            },
            "required": ["cpt_code", "payer"],
        },
    },
    {
        "name": "verify_diagnosis_match",
        "description": "Check if an ICD-10 code matches the approved indications in given criteria.",
        "input_schema": {
            "type": "object",
            "properties": {
                "icd10_code": {"type": "string"},
                "criteria_id": {"type": "string"},
            },
            "required": ["icd10_code", "criteria_id"],
        },
    },
    {
        "name": "check_network_status",
        "description": "Return the provider's in-network status for a given payer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "provider_npi": {"type": "string", "pattern": "^[0-9]{10}$"},
                "payer": {"type": "string"},
            },
            "required": ["provider_npi", "payer"],
        },
    },
    {
        "name": "get_benefit_summary",
        "description": "Return coverage, copay, deductible-remaining for the member + procedure.",
        "input_schema": {
            "type": "object",
            "properties": {
                "member_id": {"type": "string"},
                "cpt_code": {"type": "string"},
            },
            "required": ["member_id", "cpt_code"],
        },
    },
    {
        "name": "generate_determination",
        "description": "Produce a final structured determination from gathered case data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "case_id": {"type": "string"},
                "decision": {"type": "string", "enum": ["APPROVE", "DENY", "REQUEST_INFO"]},
                "rationale": {"type": "string"},
                "policy_citation": {"type": "string"},
            },
            "required": ["case_id", "decision", "rationale"],
        },
    },
]

def execute_tool(name, args):
    if name == "lookup_clinical_criteria":
        key = f"{args['cpt_code']}_{args['payer'].upper()}"
        return CRITERIA.get(key, {"error": "no policy found"})
    if name == "verify_diagnosis_match":
        # criteria_id is e.g. "27447_AETNA" or a policy_id
        for k, c in CRITERIA.items():
            if k == args["criteria_id"] or c.get("policy_id") == args["criteria_id"]:
                return {"match": args["icd10_code"] in c["approved_indications"],
                        "approved_indications": c["approved_indications"]}
        return {"error": "criteria not found"}
    if name == "check_network_status":
        p = PROVIDERS.get(args["provider_npi"])
        if not p: return {"error": "provider not found"}
        in_net = args["payer"] in p.get("networks", [])
        return {"npi": p["npi"], "specialty": p["specialty"],
                "in_network": in_net, "payer": args["payer"]}
    if name == "get_benefit_summary":
        key = f"{args['member_id']}_{args['cpt_code']}"
        return BENEFITS.get(key, {"covered": False, "error": "no benefit record"})
    if name == "generate_determination":
        return {"case_id": args["case_id"], "decision": args["decision"],
                "rationale": args["rationale"],
                "policy_citation": args.get("policy_citation", ""),
                "ts": "2025-04-15T12:00:00Z"}
    raise ValueError(f"Unknown tool: {name}")

Checkpoint

Run: python -c "from tools import execute_tool; print(execute_tool('check_network_status', {'provider_npi': '1234567890', 'payer': 'Aetna'}))"

Expected: {'npi': '1234567890', 'specialty': 'Orthopedic Surgery', 'in_network': True, 'payer': 'Aetna'}
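Schema mistakes are easier to catch before the first API call than after. The following is a minimal sketch of a structural lint for a tool list shaped like TOOLS above; the function name `lint_tool_schemas` is hypothetical, and it checks only a few common slips (duplicate names, missing descriptions, a non-object schema, required fields absent from properties).

```python
def lint_tool_schemas(tools: list[dict]) -> list[str]:
    """Return human-readable problems found in a list of tool definitions."""
    problems = []
    seen = set()
    for t in tools:
        name = t.get("name", "<unnamed>")
        if name in seen:
            problems.append(f"{name}: duplicate tool name")
        seen.add(name)
        if not t.get("description"):
            problems.append(f"{name}: missing description")
        schema = t.get("input_schema", {})
        if schema.get("type") != "object":
            problems.append(f"{name}: input_schema.type must be 'object'")
        props = schema.get("properties", {})
        for field in schema.get("required", []):
            if field not in props:
                problems.append(f"{name}: required field '{field}' is not in properties")
    return problems


good_tool = {
    "name": "check_network_status",
    "description": "Return the provider's in-network status for a given payer.",
    "input_schema": {
        "type": "object",
        "properties": {"provider_npi": {"type": "string"}, "payer": {"type": "string"}},
        "required": ["provider_npi", "payer"],
    },
}
lint_result = lint_tool_schemas([good_tool])
```

In the lab project you would call it as `lint_tool_schemas(TOOLS)` after importing TOOLS from tools.py.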

Step 3: Build the While Loop

45 min | agent.py | The CORE of Iter 1

What & Why: This is the heart of the iteration — the agentic loop you will replace twice over. Write it once by hand and you will recognize what the SDK and Claude Code generate later. The system prompt is the most important part for pre-auth: Claude needs to know to call all 5 tools in the right order, never approve without a network check, and always cite a policy.

"""agent.py — the raw API loop. Pre-auth decision flow.

Note: we split the loop into a private _run_messages() helper that takes a
message list and returns (text, updated_messages). run_agent() is a thin
wrapper for the single-shot case; in Step 6 the session manager will call
_run_messages directly to support multi-turn case follow-ups."""
import json, anthropic
from tools import TOOLS, execute_tool

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"
MAX_TURNS = 12
SYSTEM = """You are a pre-authorization decision agent for medical procedures.
For every case, follow this order:
1. lookup_clinical_criteria for the CPT + payer
2. verify_diagnosis_match for the patient's ICD-10
3. check_network_status for the provider + payer
4. get_benefit_summary for the member + CPT
5. generate_determination with APPROVE / DENY / REQUEST_INFO

Always cite the policy_id in your rationale. Never APPROVE without a successful
diagnosis match AND in-network provider AND covered benefit. Use REQUEST_INFO
when missing data, DENY when criteria fail."""

def _run_messages(messages: list) -> tuple[str, list]:
    """Drive the tool-use loop on the given messages list. Returns (final_text, messages)."""
    for turn in range(MAX_TURNS):
        response = client.messages.create(
            model=MODEL, max_tokens=4096, system=SYSTEM,
            tools=TOOLS, messages=messages,
        )
        # Always append assistant turn BEFORE handling tool calls.
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Default to "" so a reply with no text block cannot raise StopIteration.
            text = next((b.text for b in response.content if b.type == "text"), "")
            return text, messages

        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    try:
                        result = execute_tool(block.name, block.input)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": json.dumps(result, default=str),
                        })
                    except Exception as e:
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": f"ERROR: {e}",
                            "is_error": True,
                        })
            messages.append({"role": "user", "content": tool_results})
            continue

        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")
    raise RuntimeError(f"Agent exceeded {MAX_TURNS} turns without finishing")

def run_agent(question: str) -> str:
    """Single-shot entry point. Wraps _run_messages with a fresh history."""
    text, _ = _run_messages([{"role": "user", "content": question}])
    return text

if __name__ == "__main__":
    q = ("Should pre-auth PA-2025-0001 (CPT 27447, ICD-10 M17.11, "
         "member A123456, provider NPI 1234567890, payer Aetna) be approved?")
    print(run_agent(q))

// agent.ts — the raw API loop. Pre-auth decision flow.
import Anthropic from "@anthropic-ai/sdk";
import { TOOLS, executeTool } from "./tools.js";

const client = new Anthropic();
const MODEL = "claude-sonnet-4-6";
const SYSTEM = `You are a pre-authorization decision agent for medical procedures.
For every case, follow this order:
1. lookup_clinical_criteria for the CPT + payer
2. verify_diagnosis_match for the patient's ICD-10
3. check_network_status for the provider + payer
4. get_benefit_summary for the member + CPT
5. generate_determination with APPROVE / DENY / REQUEST_INFO

Always cite the policy_id in your rationale. Never APPROVE without a successful
diagnosis match AND in-network provider AND covered benefit. Use REQUEST_INFO
when missing data, DENY when criteria fail.`;

export async function runAgent(question: string, maxTurns = 12): Promise<string> {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: question }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await client.messages.create({
      model: MODEL, max_tokens: 4096, system: SYSTEM, tools: TOOLS, messages,
    });
    messages.push({ role: "assistant", content: response.content });
    if (response.stop_reason === "end_turn") {
      const text = response.content.find(b => b.type === "text");
      return text?.type === "text" ? text.text : "";
    }
    if (response.stop_reason === "tool_use") {
      const toolResults: Anthropic.ToolResultBlockParam[] = [];
      for (const block of response.content) {
        if (block.type === "tool_use") {
          try {
            const result = await executeTool(block.name, block.input);
            toolResults.push({ type: "tool_result", tool_use_id: block.id,
                                content: JSON.stringify(result) });
          } catch (e) {
            toolResults.push({ type: "tool_result", tool_use_id: block.id,
                                content: `ERROR: ${(e as Error).message}`, is_error: true });
          }
        }
      }
      messages.push({ role: "user", content: toolResults });
      continue;
    }
    throw new Error(`Unexpected stop_reason: ${response.stop_reason}`);
  }
  throw new Error(`Agent exceeded ${maxTurns} turns without finishing`);
}

Run: python agent.py

Expected output (paraphrased — Claude generates fresh text each run):

Determination: APPROVE
Case: PA-2025-0001 | Member: A123456 | CPT: 27447 (knee replacement)
Diagnosis: M17.11 (right knee primary osteoarthritis) — matches Aetna policy CPB-0660
approved indications. Provider NPI 1234567890 confirmed in-network. Member benefit
verified: covered, $250 copay, $850 deductible remaining.

Rationale: All Aetna CPB-0660 medical-necessity criteria satisfied (symptomatic OA
confirmed, in-network provider, benefit verified). Approving per policy CPB-0660 v2024.

✅ Checkpoint — Step 3

You should see Determination: APPROVE with a rationale citing policy CPB-0660, in-network confirmation, and covered benefit. If the agent stops after only 1–2 tool calls, the system prompt is too vague — reinforce the 5-step order with an explicit "use ALL FIVE tools before generating the determination."

Troubleshooting

Agent returns DENY when it should APPROVE → check that verify_diagnosis_match returns match: true for M17.11 against the CPB-0660 indications. Mock data must include M17.11 in the approved_indications list.
tool_use_id error on second turn → the assistant turn must be appended to messages BEFORE the tool_result turn. Check the order in _run_messages.
Agent loops forever, hits MAX_TURNS → the system prompt isn't telling it when to stop. Add: "Once you have called generate_determination, stop — do not search further."
Agent hallucinates a CPT code or member ID not in the question → tighten the system prompt: "Use ONLY the CPT code, ICD-10 code, NPI, and member ID provided in the user's request. Do not invent values."
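The tool_use_id ordering rule from the troubleshooting list can be verified on a saved conversation. This is a sketch under assumptions: it expects messages as plain dicts (for example after serializing the history to JSON), not SDK block objects, and the function name `check_tool_pairing` is hypothetical.

```python
def check_tool_pairing(messages: list[dict]) -> list[str]:
    """Flag tool_result blocks whose tool_use_id is absent from the
    assistant turn immediately before them."""
    problems = []
    for i, msg in enumerate(messages):
        if msg["role"] != "user" or not isinstance(msg["content"], list):
            continue
        prev = messages[i - 1] if i > 0 else None
        offered = set()
        if prev and prev["role"] == "assistant" and isinstance(prev["content"], list):
            offered = {b["id"] for b in prev["content"] if b.get("type") == "tool_use"}
        for block in msg["content"]:
            if block.get("type") == "tool_result" and block["tool_use_id"] not in offered:
                problems.append(f"message {i}: orphan tool_result {block['tool_use_id']}")
    return problems


ok_history = [
    {"role": "user", "content": "Should PA-2025-0001 be approved?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_1", "name": "check_network_status", "input": {}}]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_1", "content": "{}"}]},
]
# Same history with the assistant turn dropped: the tool_result is orphaned.
bad_history = [ok_history[0], ok_history[2]]
```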

Step 4: Add Guardrails Manually (incl. PHI Redaction)

30 min | guardrails.py

What & Why: A loop that does whatever Claude asks is not safe to ship in healthcare. Production pre-auth agents need at minimum: input validation (reject malformed CPT/NPI/ICD-10), PHI redaction (member IDs, names, DOBs scrubbed from logs), a cost cap, and a circuit breaker. You write all four by hand.

"""guardrails.py — HIPAA-aware checks for pre-auth agents."""
import re

# PHI patterns to redact from logs (NOT from the agent's working memory)
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
MEMBER_ID_RE = re.compile(r"\b[A-Z]\d{6,9}\b")        # e.g. A123456
DOB_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")         # ISO date

CPT_RE = re.compile(r"^[0-9]{5}$|^[A-Z][0-9]{4}$")
NPI_RE = re.compile(r"^[0-9]{10}$")
# Characters 3-7 of an ICD-10-CM code may be letters (e.g. S72.001A), not only digits.
ICD10_RE = re.compile(r"^[A-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?$")

COST_LIMIT_TOKENS = 50_000
CIRCUIT_FAIL_THRESHOLD = 3

def validate_input(tool_name: str, args: dict):
    if tool_name == "lookup_clinical_criteria":
        if not CPT_RE.match(args.get("cpt_code", "")):
            raise ValueError(f"Invalid CPT code: {args.get('cpt_code')}")
    if tool_name == "verify_diagnosis_match":
        if not ICD10_RE.match(args.get("icd10_code", "")):
            raise ValueError(f"Invalid ICD-10 code: {args.get('icd10_code')}")
    if tool_name == "check_network_status":
        if not NPI_RE.match(args.get("provider_npi", "")):
            raise ValueError(f"Invalid NPI (must be 10 digits): {args.get('provider_npi')}")

def redact_phi(payload: str) -> str:
    """Scrub PHI from a string before writing to logs/audit."""
    payload = SSN_RE.sub("[SSN_REDACTED]", payload)
    payload = PHONE_RE.sub("[PHONE_REDACTED]", payload)
    payload = MEMBER_ID_RE.sub("[MEMBER_ID_REDACTED]", payload)
    payload = DOB_RE.sub("[DOB_REDACTED]", payload)
    return payload

class CircuitBreaker:
    def __init__(self): self.failures = 0
    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= CIRCUIT_FAIL_THRESHOLD:
            raise RuntimeError("Circuit breaker tripped — aborting")

Now wire the guardrails into _run_messages in agent.py. Replace your existing _run_messages with the version below. Note: redact_phi is called only when writing to the audit log (Step 5) — the unredacted member_id flows back to Claude so the agent can chain tool calls (lookup → verify → check_network → benefits → determination).

# Add to the imports at the top of agent.py:
from guardrails import validate_input, CircuitBreaker, COST_LIMIT_TOKENS   # NEW

_breaker = CircuitBreaker()        # NEW — module-level instance

def _run_messages(messages: list) -> tuple[str, list]:
    total_tokens = 0                # NEW — cost cap counter
    for turn in range(MAX_TURNS):
        response = client.messages.create(
            model=MODEL, max_tokens=4096, system=SYSTEM,
            tools=TOOLS, messages=messages,
        )
        total_tokens += (response.usage.input_tokens
                         + response.usage.output_tokens)              # NEW
        if total_tokens > COST_LIMIT_TOKENS:                          # NEW
            raise RuntimeError(f"Cost cap exceeded: {total_tokens} tokens")

        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Default to "" so a reply with no text block cannot raise StopIteration.
            text = next((b.text for b in response.content if b.type == "text"), "")
            return text, messages

        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    try:
                        validate_input(block.name, block.input)        # NEW — CPT/NPI/ICD-10 check
                        result = execute_tool(block.name, block.input)
                        _breaker.record(ok=True)                        # NEW
                        # Pass the UNREDACTED result back to Claude.
                        # The agent needs the literal member_id to chain.
                        # PHI redaction happens inside append_audit (Step 5).
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": json.dumps(result, default=str),
                        })
                    except Exception as e:
                        _breaker.record(ok=False)                       # NEW — trips on 3rd fail
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": f"ERROR: {e}",
                            "is_error": True,
                        })
            messages.append({"role": "user", "content": tool_results})
            continue

        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

    raise RuntimeError(f"Agent exceeded {MAX_TURNS} turns without finishing")

Test that the guardrails fire on bad input:

python -c "from agent import run_agent; print(run_agent('Pre-auth for CPT BAD123 ICD M17.11 NPI 12345 Aetna member A123456'))"
# Expected: agent attempts lookup_clinical_criteria(cpt_code='BAD123'),
# validate_input raises ValueError, error flows back as is_error: true,
# Claude either reformulates with a valid CPT or apologizes.

✅ Checkpoint — Step 4

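The clean PA-2025-0001 query still produces APPROVE (same as Step 3). The "BAD123" query gets rejected at validate_input, and you should see the agent reformulate or apologize. If you see RuntimeError: Cost cap exceeded, the agent spent more than COST_LIMIT_TOKENS before finishing, usually because it kept retrying a failing tool. Verify the token sum is being checked after every messages.create, then read the turns that led up to the cap before raising the limit.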
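The cost cap in _run_messages is a running sum compared against a ceiling after every API call. Here is a minimal sketch of that idea in isolation. TokenBudget and the 5,000-token limit are hypothetical illustrations, not part of the capstone files.

```python
class TokenBudget:
    """Hypothetical helper: running token total with a hard ceiling."""

    def __init__(self, limit: int):
        self.limit = limit
        self.total = 0

    def charge(self, input_tokens: int, output_tokens: int) -> int:
        """Add one API call's usage; raise once the ceiling is crossed."""
        self.total += input_tokens + output_tokens
        if self.total > self.limit:
            raise RuntimeError(f"Cost cap exceeded: {self.total} tokens")
        return self.total


budget = TokenBudget(limit=5_000)   # illustrative limit, not COST_LIMIT_TOKENS
budget.charge(1_500, 300)           # call 1: running total 1800
budget.charge(2_000, 400)           # call 2: running total 4200
try:
    budget.charge(900, 200)         # call 3: running total 5300, over the cap
    cap_error = None
except RuntimeError as err:
    cap_error = str(err)
```

Charging after every call, as the loop does, means the cap can be overshot by at most one response.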

⚠️ PHI Redaction is Trickier Than It Looks

Notice we did NOT call redact_phi on the tool_result that flows back to Claude. This is critical: the agent NEEDS the unredacted member_id (e.g., A123456) to call get_benefit_summary in the next turn. If you redact too aggressively, the agent passes [MEMBER_ID_REDACTED] to get_benefit_summary and the lookup fails with "no benefit record".

PHI redaction happens in Step 5's append_audit function — only when writing to the persistent audit log, not in the runtime message stream.
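The difference between redacting the audit copy and redacting the runtime value can be reproduced in a few lines. This is a sketch under stated assumptions: redact_member_ids and the one-record BENEFITS dict are stand-ins for your redact_phi and mock data, and the member-ID pattern is the one used by the Step 10 hook scripts.

```python
import json
import re

# Same member-ID pattern the Step 10 hook uses; everything else is a toy stand-in.
MEMBER_RE = re.compile(r"\b[A-Z]\d{6,9}\b")
BENEFITS = {"A123456_27447": {"covered": True, "copay": 250}}


def redact_member_ids(text: str) -> str:
    return MEMBER_RE.sub("[MEMBER_ID_REDACTED]", text)


def get_benefit_summary(member_id: str, cpt_code: str) -> dict:
    return BENEFITS.get(f"{member_id}_{cpt_code}",
                        {"covered": False, "error": "no benefit record"})


args = {"member_id": "A123456", "cpt_code": "27447"}

# Correct: the runtime call sees the literal ID; only the audit copy is redacted.
runtime_result = get_benefit_summary(**args)
audit_line = redact_member_ids(json.dumps(args))

# Wrong: redacting before the call turns the lookup key into a placeholder.
broken_args = json.loads(redact_member_ids(json.dumps(args)))
broken_result = get_benefit_summary(**broken_args)
```

The broken path fails silently from the tool's point of view: it returns a normal "no benefit record" answer, which is why the symptom shows up as a wrong determination rather than an exception.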

Troubleshooting

ImportError: cannot import name 'validate_input' from 'guardrails' → guardrails.py isn't in the project folder, or the function name is misspelled.
Agent fails on get_benefit_summary with "no benefit record" → you redacted the member_id in the tool_result. Don't — only redact in append_audit.
Circuit breaker trips on first call → _breaker.record(ok=False) is in the wrong branch. It belongs only in the except, not after execute_tool.
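The last symptom is easier to reason about with the breaker's counting logic in view. The class below is a stand-in sketch only. It assumes the breaker counts consecutive failures, resets on any success, and raises on the third. The CircuitBreaker you wrote in guardrails.py is the authoritative version and may behave differently.

```python
class CircuitBreaker:
    """Stand-in sketch: opens after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0            # any success resets the streak
            return
        self.failures += 1
        if self.failures >= self.threshold:
            raise RuntimeError(
                f"Circuit breaker open after {self.failures} consecutive tool failures")


breaker = CircuitBreaker()
breaker.record(ok=False)
breaker.record(ok=True)       # resets the streak to zero
breaker.record(ok=False)
breaker.record(ok=False)
try:
    breaker.record(ok=False)  # third consecutive failure trips the breaker
    tripped = False
except RuntimeError:
    tripped = True
```

With this shape, a stray record(ok=False) on the success path would count every good call as a failure, which matches the "trips early" symptom described above.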

Step 5: Add HIPAA Audit Logging

15 min · audit.py + agent.py wiring

What & Why: HIPAA requires audit trails for any system that touches PHI. Every tool call needs a timestamped, redacted record: case_id, tool, redacted inputs, redacted output summary, token count. Critical: redaction happens here when writing to disk — the agent's runtime message stream stays unredacted so tool chaining works (per Step 4 callout).

Create a new file audit.py in the project folder:

"""audit.py — HIPAA-compliant audit log writer."""
import json, datetime
from guardrails import redact_phi

def append_audit(tool_name: str, args: dict, result, tokens: int,
                 case_id: str = "default") -> None:
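    rec = {
        # Timezone-aware UTC at second precision (utcnow() is deprecated since Python 3.12).
        "ts": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "case_id": case_id,
        "tool": tool_name,
        "input_redacted": redact_phi(json.dumps(args, default=str)),
        # json.dumps (not str) so the summary is JSON text, as in the expected output.
        "output_summary": redact_phi(json.dumps(result, default=str))[:200],
        "tokens": tokens,
    }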
    with open("audit_log.jsonl", "a") as f:
        f.write(json.dumps(rec) + "\n")

Now wire it into _run_messages. Add the import and the append_audit call right after the successful execute_tool:

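# Add to the imports at the top of agent.py:
import re                                         # NEW (case_id parsing)
from audit import append_audit                    # NEW

# Inside _run_messages, in the try-block right after `result = execute_tool(...)`:
                        result = execute_tool(block.name, block.input)
                        _breaker.record(ok=True)
                        # Use the case_id from the user's first message (PA-YYYY-NNNN).
                        # The regex tolerates surrounding punctuation; the isinstance
                        # guard covers a first message that holds tool_result blocks.
                        first = messages[0]["content"]
                        found = re.search(r"PA-\d{4}-\d{4}", first) if isinstance(first, str) else None
                        case_id = found.group(0) if found else "default"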
                        append_audit(                              # NEW
                            tool_name=block.name,
                            args=block.input,
                            result=result,
                            tokens=total_tokens,
                            case_id=case_id,
                        )
                        tool_results.append({...})
                        # ... rest of try-block unchanged ...

Run: python agent.py

Then inspect the audit file:

cat audit_log.jsonl   # macOS/Linux
type audit_log.jsonl  # Windows cmd
Get-Content audit_log.jsonl  # Windows PowerShell

Expected output (5 lines, one per tool call) — notice the member_id is redacted:

{"ts": "2026-05-09T12:01:14Z", "case_id": "PA-2025-0001", "tool": "lookup_clinical_criteria", "input_redacted": "{\"cpt_code\": \"27447\", \"payer\": \"Aetna\"}", "output_summary": "{\"policy_id\": \"CPB-0660\", \"approved_indications\": ...}", "tokens": 1842}
{"ts": "2026-05-09T12:01:18Z", "case_id": "PA-2025-0001", "tool": "verify_diagnosis_match", "input_redacted": "{\"icd10_code\": \"M17.11\", \"criteria_id\": \"27447_AETNA\"}", "output_summary": "{\"match\": true, ...}", "tokens": 2510}
{"ts": "2026-05-09T12:01:22Z", "case_id": "PA-2025-0001", "tool": "check_network_status", "input_redacted": "{\"provider_npi\": \"1234567890\", \"payer\": \"Aetna\"}", "output_summary": "{\"in_network\": true, ...}", "tokens": 3185}
{"ts": "2026-05-09T12:01:26Z", "case_id": "PA-2025-0001", "tool": "get_benefit_summary", "input_redacted": "{\"member_id\": \"[MEMBER_ID_REDACTED]\", \"cpt_code\": \"27447\"}", "output_summary": "{\"covered\": true, \"copay\": 250, ...}", "tokens": 3920}
{"ts": "2026-05-09T12:01:30Z", "case_id": "PA-2025-0001", "tool": "generate_determination", "input_redacted": "{\"case_id\": \"PA-2025-0001\", \"decision\": \"APPROVE\", ...}", "output_summary": "{\"case_id\": \"PA-2025-0001\", \"decision\": \"APPROVE\"}", "tokens": 4480}

✅ Checkpoint — Step 5

You should see 5 lines in audit_log.jsonl (one per tool call). The input_redacted for get_benefit_summary should contain [MEMBER_ID_REDACTED], NOT A123456. The agent's response (printed to stdout) should still APPROVE the case. If it reports "no benefit record" instead, you accidentally redacted the member_id in the runtime stream, not just the audit log.
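The checkpoint can also be checked mechanically instead of by eye. Here is a minimal sketch, assuming the audit records use the input_redacted and output_summary fields shown above and that member IDs follow the pattern used by the Step 10 hook scripts. find_phi_leaks is a hypothetical helper name.

```python
import json
import re
import tempfile
from pathlib import Path

MEMBER_RE = re.compile(r"\b[A-Z]\d{6,9}\b")   # same pattern as the Step 10 hook


def find_phi_leaks(path: Path) -> list[int]:
    """Return 1-based line numbers of audit records that still contain a member ID."""
    leaks = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            continue
        rec = json.loads(line)
        haystack = f"{rec.get('input_redacted', '')} {rec.get('output_summary', '')}"
        if MEMBER_RE.search(haystack):
            leaks.append(lineno)
    return leaks


# Demo on a throwaway file: line 1 is clean, line 2 leaks a literal member ID.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "audit_log.jsonl"
    clean = {"tool": "get_benefit_summary",
             "input_redacted": '{"member_id": "[MEMBER_ID_REDACTED]"}',
             "output_summary": '{"covered": true}'}
    leaky = {"tool": "get_benefit_summary",
             "input_redacted": '{"member_id": "A123456"}',
             "output_summary": '{"covered": true}'}
    demo.write_text(json.dumps(clean) + "\n" + json.dumps(leaky) + "\n")
    leaks = find_phi_leaks(demo)
```

Pointed at the real audit_log.jsonl, an empty list is the passing result.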

Troubleshooting

ImportError: No module named 'audit' → audit.py is in the wrong folder. Move it next to agent.py.
audit_log.jsonl shows literal member_id A123456 → either you forgot the redact_phi call inside append_audit, or your MEMBER_ID_RE regex doesn't match the format. Test it: python -c "from guardrails import redact_phi; print(redact_phi('A123456'))" should print [MEMBER_ID_REDACTED].
Agent returns "no benefit record" → you wrapped the tool_result in redact_phi before sending it back to Claude. Don't — only the audit write redacts.
case_id is "default" instead of "PA-2025-0001" → the parsing logic didn't find a case ID in your prompt. Include the literal ID (for example PA-2025-0001) as its own whitespace-separated word in the user's first message.

Step 6: Multi-Turn Case Sessions

15 min · session.py

What & Why: Clinical reviewers ask follow-ups on the same case: "What if the provider were out-of-network?" or "What about the same procedure but for a UnitedHealth member?" Iter 1 implements multi-turn case sessions by maintaining a per-case messages list, appending the new user message, and reusing the same _run_messages helper from Step 3. A sliding window keeps the list bounded.

Create a new file session.py in the project folder:

"""session.py — multi-turn case sessions over the same _run_messages helper."""
from agent import _run_messages

SESSIONS: dict[str, list] = {}   # case_id -> messages list
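WINDOW = 24                      # ~12 messages per full turn (question + 5 tool calls × 2 + answer), so about 2 turns

def _trim(msgs: list) -> list:
    """Keep about the last WINDOW messages, starting on a plain-text user turn
    so no tool_result is separated from the tool_use it answers."""
    start = max(0, len(msgs) - WINDOW)
    while start > 0 and not (msgs[start]["role"] == "user"
                             and isinstance(msgs[start]["content"], str)):
        start -= 1
    return msgs[start:]

def chat(case_id: str, user_msg: str) -> str:
    """Append the user's message to the case session, run the loop, return the answer."""
    msgs = SESSIONS.setdefault(case_id, [])
    msgs.append({"role": "user", "content": user_msg})
    answer, msgs = _run_messages(msgs)
    # Sliding window: bound the history so context doesn't grow forever.
    SESSIONS[case_id] = _trim(msgs)
    return answer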

Try it — multi-turn from the Python REPL:

python -c "
from session import chat
print(chat('PA-2025-0001', 'Should pre-auth PA-2025-0001 (CPT 27447, ICD M17.11, member A123456, NPI 1234567890, payer Aetna) be approved?'))
print('---')
print(chat('PA-2025-0001', 'What if the provider NPI were 9999999999 (out-of-network) instead?'))
"

Expected behavior: the second call references the same case context (CPT, ICD, member) but with the changed NPI — should produce REQUEST_INFO or DENY because the provider is out-of-network. Without session continuity, the agent would have no idea what "the same case" means.

✅ Checkpoint — Step 6

The second call references CPT 27447 / M17.11 (proving the prior context survived) and produces a DIFFERENT determination because the NPI changed. If the second call says "I need more information" without the original CPT/ICD context, the session isn't carrying over — verify SESSIONS is populated after the first call.

Troubleshooting

ImportError: cannot import name '_run_messages' from 'agent' → you skipped Step 3's refactor of agent.py. Go back and split the loop into _run_messages + run_agent.
Each call starts a fresh case → SESSIONS is a module-level dict, so it lives only in the memory of one Python process. If you're calling from separate Python processes (e.g., via subprocess), persist sessions in a database or Redis instead.
Context window exceeded after a few turns → WINDOW = 24 may be too generous. Each tool call adds 2 messages (assistant + tool_result), so 5 tools = 10 messages per turn. Drop to 12 for shorter cases.
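For the separate-processes case, a small persistent store can replace the in-memory dict. The sketch below uses SQLite from the standard library. SessionStore and sessions.db are hypothetical names. It assumes every message is a plain JSON-serializable dict, which means the SDK content-block objects that _run_messages appends for assistant turns would first need converting to dicts (for example with each block's model_dump()).

```python
import json
import sqlite3


class SessionStore:
    """Hypothetical SQLite-backed replacement for the in-memory SESSIONS dict."""

    def __init__(self, path: str = "sessions.db"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(case_id TEXT PRIMARY KEY, messages TEXT NOT NULL)")

    def load(self, case_id: str) -> list:
        row = self.db.execute(
            "SELECT messages FROM sessions WHERE case_id = ?", (case_id,)).fetchone()
        return json.loads(row[0]) if row else []

    def save(self, case_id: str, messages: list) -> None:
        # Assumes messages are plain dicts/lists/strings (JSON-serializable).
        self.db.execute(
            "INSERT OR REPLACE INTO sessions (case_id, messages) VALUES (?, ?)",
            (case_id, json.dumps(messages)))
        self.db.commit()


store = SessionStore(":memory:")   # pass a file path to survive restarts
store.save("PA-2025-0001", [{"role": "user", "content": "Should this be approved?"}])
restored = store.load("PA-2025-0001")
missing = store.load("PA-2025-9999")
```

Persisted sessions contain the unredacted runtime stream, so a store like this holds PHI and needs the same access controls as any other PHI at rest.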

Step 7: Deploy as FastAPI + Docker

20 min · server.py + Dockerfile

What & Why: Wrap the agent in an HTTP API. Same Tier-1 deployment as M22B.

Create server.py in the project folder:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from agent import run_agent
from session import chat

app = FastAPI()
class Q(BaseModel): question: str
class C(BaseModel): case_id: str; message: str

@app.get("/health")
def health(): return {"status": "ok"}

@app.post("/preauth")
def preauth(q: Q):
    try: return {"determination": run_agent(q.question)}
    except Exception as e: raise HTTPException(500, str(e))

@app.post("/chat")
def chat_ep(c: C):
    return {"answer": chat(c.case_id, c.message)}

Create a Dockerfile next to server.py:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
ENV PYTHONUNBUFFERED=1
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]

Also create requirements.txt at the project root:

anthropic>=0.40
fastapi>=0.110
uvicorn>=0.27
pydantic>=2.0

Run locally first (no Docker):

uvicorn server:app --reload --port 8000
# In another terminal:
curl localhost:8000/health
# Expected: {"status":"ok"}

curl -X POST localhost:8000/preauth -H "Content-Type: application/json" \
     -d '{"question":"Is PA-2025-0001 approvable?"}'
# Expected: {"determination":"Determination: APPROVE..."}

Then build and run with Docker:

docker build -t iter1-a .
docker run --rm -p 8000:8000 -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY iter1-a

Troubleshooting

docker: command not found → install Docker Desktop and confirm docker --version works.
OSError: [Errno 98] Address already in use → port 8000 is taken. Use --port 8001 for uvicorn, or -p 8001:8000 for docker run.
Container starts but /preauth returns 500 with "ANTHROPIC_API_KEY not set" → you forgot the -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY flag on docker run.
HIPAA concern: audit_log.jsonl is on the container's writable layer → in production, mount a volume: -v $(pwd)/audit:/app/audit and have append_audit write to /app/audit/audit_log.jsonl.
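One way to make the audit location configurable is to resolve the directory from an environment variable, so the same code writes to the working directory locally and to the mounted volume in Docker. This is a sketch only. AUDIT_LOG_DIR, audit_log_path, and write_audit_record are hypothetical names, and the record passed in is assumed to be already redacted.

```python
import json
import os
import tempfile
from pathlib import Path


def audit_log_path() -> Path:
    """Resolve the audit file from AUDIT_LOG_DIR (hypothetical variable); default is cwd."""
    directory = Path(os.environ.get("AUDIT_LOG_DIR", "."))
    directory.mkdir(parents=True, exist_ok=True)
    return directory / "audit_log.jsonl"


def write_audit_record(rec: dict) -> Path:
    """Append one already-redacted record as a JSON line and return the file path."""
    path = audit_log_path()
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(rec) + "\n")
    return path


# Demo: point the variable at a temp dir, as -e AUDIT_LOG_DIR=/app/audit would in Docker.
with tempfile.TemporaryDirectory() as tmp:
    os.environ["AUDIT_LOG_DIR"] = os.path.join(tmp, "audit")
    written = write_audit_record({"tool": "check_network_status", "tokens": 3185})
    lines = written.read_text(encoding="utf-8").splitlines()
    os.environ.pop("AUDIT_LOG_DIR")
```

In append_audit, the only change would be opening audit_log_path() instead of the hard-coded file name.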

🎉 Iteration 1 Complete — End-to-End Verification

You should now have 12 files in your project folder:

agent-iter1-raw/
├── agent.py            # _run_messages + run_agent
├── tools.py            # 5 pre-auth tool schemas + execute_tool
├── mock_data/
│   ├── preauth_requests.json     # 15 cases across 5 procedures × 3 payers
│   ├── clinical_criteria.json    # CPB-0660 etc.
│   ├── provider_directory.json   # 10 providers w/ networks
│   └── benefits.json             # 30 member-procedure records
├── guardrails.py       # validate_input + redact_phi + CircuitBreaker
├── audit.py            # HIPAA-compliant append_audit
├── session.py          # multi-turn case sessions
├── server.py           # FastAPI handlers
├── Dockerfile
└── requirements.txt

Run the full Iter-1 acceptance test:

# 1. Single pre-auth (should APPROVE)
curl -s -X POST localhost:8000/preauth -H "Content-Type: application/json" \
     -d '{"question":"Is PA-2025-0001 (CPT 27447, ICD M17.11, member A123456, NPI 1234567890, Aetna) approvable?"}' | python -m json.tool

# 2. Multi-turn case session (out-of-network what-if)
curl -s -X POST localhost:8000/chat -H "Content-Type: application/json" \
     -d '{"case_id":"PA-2025-0001","message":"Should this be approved? CPT 27447, ICD M17.11, member A123456, NPI 1234567890, Aetna"}' | python -m json.tool
curl -s -X POST localhost:8000/chat -H "Content-Type: application/json" \
     -d '{"case_id":"PA-2025-0001","message":"What if NPI 9999999999 (out-of-network) instead?"}' | python -m json.tool

# 3. Guardrail fires on bad CPT
curl -s -X POST localhost:8000/preauth -H "Content-Type: application/json" \
     -d '{"question":"Is CPT BAD123 approvable?"}' | python -m json.tool

# 4. HIPAA: audit log shows redacted PHI
docker exec $(docker ps -q --filter ancestor=iter1-a) cat audit_log.jsonl | head -5
# Look for "[MEMBER_ID_REDACTED]" instead of literal "A123456"

You pass Iter 1 if: (1) /health returns ok; (2) PA-2025-0001 returns APPROVE with policy CPB-0660 cited; (3) the second /chat call returns a DIFFERENT determination (REQUEST_INFO or DENY) because the NPI changed; (4) the BAD123 query is rejected by validate_input and the agent reformulates or apologizes; (5) audit_log.jsonl has redacted PHI but the agent's runtime stream still resolves member benefits correctly.

Iteration 1 Metrics

Files: 12 (agent.py, tools.py, mock_data/×4, guardrails.py, audit.py, session.py, server.py, Dockerfile, requirements.txt)
Lines you wrote: ~250
Time: ~3 hours
Abstractions used: none — just anthropic Messages API and FastAPI

Debugging in Iteration 1: Print Statements + Manual Inspection

When the agent gives wrong output in Iter 1, you debug like you debug any Python program: by reading your own code. There is no abstraction between you and Claude.

A. Add a debug_turn() helper

Drop this into agent.py and call it after each messages.create:

def debug_turn(turn_num, response, messages):
    print(f"\n=== TURN {turn_num} === stop_reason: {response.stop_reason}")
    for block in response.content:
        if block.type == "tool_use":
            print(f"  TOOL: {block.name}({block.input})")
        elif block.type == "text":
            print(f"  TEXT: {block.text[:120]}...")
    print(f"  Tokens: in={response.usage.input_tokens} out={response.usage.output_tokens}")

B. Inspect messages list manually

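The most common Iter-1 bug is malformed messages. Add print(json.dumps(messages, default=str, indent=2)) before each messages.create. You should see strict user/assistant alternation, starting with a user message, with every tool_use answered by a tool_result carrying the same id in the very next user message.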
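The same inspection can be automated. The following is a sketch of a shape checker, assuming the messages have already been converted to plain dicts (the raw list holds SDK block objects in assistant turns). check_messages is a hypothetical helper and the toolu_ ids are illustrative.

```python
def check_messages(messages: list) -> list[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []
    if messages and messages[0]["role"] != "user":
        problems.append("first message must have role 'user'")
    pending_ids: set = set()   # tool_use ids waiting for a tool_result
    for i, msg in enumerate(messages):
        if i > 0 and msg["role"] == messages[i - 1]["role"]:
            problems.append(f"messages {i - 1} and {i} are both '{msg['role']}'")
        blocks = msg["content"] if isinstance(msg["content"], list) else []
        if msg["role"] == "assistant":
            pending_ids = {b["id"] for b in blocks if b.get("type") == "tool_use"}
        else:
            answered = {b["tool_use_id"] for b in blocks
                        if b.get("type") == "tool_result"}
            if answered != pending_ids:
                problems.append(
                    f"message {i}: tool_result ids do not match the previous tool_use ids")
            pending_ids = set()
    return problems


good = [
    {"role": "user", "content": "Is PA-2025-0001 approvable?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "lookup_clinical_criteria",
         "input": {"cpt_code": "27447", "payer": "Aetna"}}]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "{}"}]},
]
# Same conversation, but the tool_result answers an id that was never issued.
bad = good[:2] + [{"role": "user", "content": [
    {"type": "tool_result", "tool_use_id": "toolu_99", "content": "{}"}]}]

good_problems = check_messages(good)
bad_problems = check_messages(bad)
```

Calling it right before messages.create turns an opaque 400 from the API into a local message that names the offending index.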

C. Common Iter-1 bugs and how to spot them

Wrong tool_use_id in tool_result → 400 from API. Match the id on the tool_use block.
Forgot to append the assistant turn → API complains about message order.
Loop never stops → Claude keeps asking for tools (system prompt too vague), or you forgot to handle end_turn.
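Domain-specific: agent calls verify_diagnosis_match with icd10_code = "M17" instead of "M17.11". A format regex that makes the decimal part optional (as the Step 10 ICD10_RE does) accepts "M17", so the likely symptom is match: false from verify_diagnosis_match rather than a validation error. Check the system prompt's instruction to use the FULL ICD-10 code from the request.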
Domain-specific: agent APPROVES even when check_network_status returned in_network: false. The system prompt did not enforce the gating rule firmly enough. Strengthen with: "Never APPROVE if in_network is false — that case is REQUEST_INFO."
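Prompt wording alone is a soft guarantee, so the gating rule can also be enforced in code as a backstop. The sketch below makes several assumptions: gate_decision and enforce are hypothetical helpers, None stands for "the tool gave no answer", and an uncovered procedure is treated as a DENY (the source only says to DENY when criteria fail).

```python
from typing import Optional


def gate_decision(match: Optional[bool], in_network: Optional[bool],
                  covered: Optional[bool]) -> str:
    """Deterministic backstop for the decision rules."""
    if None in (match, in_network, covered):
        return "REQUEST_INFO"      # missing data
    if not match or not covered:
        return "DENY"              # criteria fail (coverage rule is an assumption)
    if not in_network:
        return "REQUEST_INFO"      # out-of-network is never an APPROVE
    return "APPROVE"


def enforce(model_decision: str, match: Optional[bool],
            in_network: Optional[bool], covered: Optional[bool]) -> str:
    """Let the model be stricter than the gate, but never more lenient."""
    gated = gate_decision(match, in_network, covered)
    if model_decision == "APPROVE" and gated != "APPROVE":
        return gated
    return model_decision


overridden = enforce("APPROVE", match=True, in_network=False, covered=True)
kept = enforce("DENY", match=True, in_network=True, covered=True)
```

A natural place for a check like this would be inside generate_determination, using the results the earlier tools returned for the case.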

D. Debug exercise (do this before moving on)

In guardrails.py (where validate_input lives), change the icd10_code regex to be too strict (require 4 digits after the dot). Re-run the agent. You should see validate_input raise a ValueError, the agent receive the error in tool_result.is_error, and either retry with a different code or fail gracefully. Restore the regex and re-run. This is the muscle memory that lets you debug Iter 3 generated code later.
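Before touching the agent, the effect of the stricter pattern can be previewed in isolation. This is a sketch that assumes your Iter 1 regex matches the ICD10_RE shown in the Step 10 validator. validate_icd10 is a hypothetical stand-in for the ICD-10 branch of validate_input.

```python
import re

# Pattern as shown in the Step 10 validator: decimal part optional, 1 to 4 digits.
ICD10_RE = re.compile(r"^[A-Z]\d{2}(\.\d{1,4})?$")
# Deliberately too strict for the exercise: exactly 4 digits after the dot.
ICD10_TOO_STRICT = re.compile(r"^[A-Z]\d{2}\.\d{4}$")


def validate_icd10(code: str, pattern: re.Pattern = ICD10_RE) -> str:
    if not pattern.match(code):
        raise ValueError(f"Invalid ICD-10: {code!r}")
    return code


normal_ok = validate_icd10("M17.11")               # passes with the normal pattern
try:
    validate_icd10("M17.11", ICD10_TOO_STRICT)     # only 2 digits after the dot
    strict_error = None
except ValueError as err:
    strict_error = str(err)
```

The string in strict_error is what the agent would see inside the tool_result error, so it is worth keeping that message specific enough for the model to act on.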

SESSION 2

Iteration 2: Agent SDK + Claude Code

Now you let the SDK run the loop, hooks handle PHI redaction and validation, sessions handle multi-turn, and Claude Code does most of the typing. Same agent. Half the lines. Different debugging.

~2 hours · ~120 lines · 8 files · SDK + hooks + sessions

Step 8: Create CLAUDE.md via Claude Code

10 min · CLAUDE.md

What & Why: CLAUDE.md is the project memory file Claude Code reads at every prompt. For pre-auth, this is also where you encode the decision-order rules so Claude Code can generate a system prompt that gets it right the first time.

mkdir agent-iter2-sdk && cd agent-iter2-sdk
python -m venv venv && source venv/bin/activate    # Windows: venv\Scripts\activate
pip install "claude-agent-sdk>=0.2" "fastapi>=0.110" "uvicorn>=0.27" "pydantic>=2.0"
npm i -g @anthropic-ai/claude-code   # if not already installed
claude
> /init

# Agent: Pre-Authorization Decision Agent (Iteration 2)

## Stack
- Python 3.11+
- `claude-agent-sdk` (the official Agent SDK — NOT a wrapper around `client.messages.create()`)
- FastAPI + Docker for deployment
- Mock data in mock_data/*.json (4 files)

## File Layout
- agent.py            — query() entry point + 5 @tool-decorated MCP tools + create_sdk_mcp_server
- hooks/              — PreToolUse + PostToolUse hook scripts
- .claude/settings.json — hooks registration (matchers + commands)
- sessions.py         — multi-case session resume
- server.py           — FastAPI async wrapper

## Compliance Rules (encode these in hooks)
- PHI MUST be redacted in PostToolUse hooks BEFORE writing to audit log
- PHI MUST NOT be stripped from the tool_result returned to the agent
  (the agent needs the literal member_id to chain the next tool call)
- All determinations MUST cite a policy_id

## System Prompt Rules
For every case, follow this order:
1. lookup_clinical_criteria(cpt_code, payer)
2. verify_diagnosis_match(icd10_code, criteria_id from step 1)
3. check_network_status(provider_npi, payer)
4. get_benefit_summary(member_id, cpt_code)
5. generate_determination(...)

Never APPROVE without successful match AND in-network AND covered.
REQUEST_INFO when missing data; DENY when criteria fail.

Step 9: Build Agent with @tool Decorators (claude-agent-sdk)

15 min · agent.py

What & Why: The claude-agent-sdk lets you define tools as @tool-decorated async functions registered with an in-process MCP server. query() drives the loop. This is a real package — do NOT simulate it with client.messages.create().

What the Real SDK Looks Like

If you've used client.messages.create() in Iter 1, you might expect the SDK to be a thin Agent class wrapping it. It is not. The SDK is built around MCP tools + an async query() generator + options/hooks via ClaudeAgentOptions. Tools return {"content": [{"type": "text", "text": ...}]} (MCP shape), not bare Python values.
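The MCP return shape is repetitive enough that a small wrapper keeps the tool bodies readable. This is a minimal sketch. mcp_text and unwrap_mcp_text are hypothetical helpers, not SDK functions.

```python
import json
from typing import Any


def mcp_text(payload: Any) -> dict:
    """Wrap a JSON-serializable value in the MCP text-content shape."""
    return {"content": [{"type": "text", "text": json.dumps(payload, default=str)}]}


def unwrap_mcp_text(result: dict) -> Any:
    """Inverse helper for tests: parse the first text block back into Python."""
    return json.loads(result["content"][0]["text"])


wrapped = mcp_text({"in_network": True, "payer": "Aetna"})
round_trip = unwrap_mcp_text(wrapped)
```

With a helper like this, each tool could end in return mcp_text(out), and unit tests can assert on the unwrapped dict instead of on a JSON string.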

> Create agent.py using `claude-agent-sdk`. Define five pre-auth tools as
> @tool-decorated async functions returning MCP-shaped {"content":[...]}:
> lookup_clinical_criteria, verify_diagnosis_match, check_network_status,
> get_benefit_summary, generate_determination. Wire them into a
> create_sdk_mcp_server, build ClaudeAgentOptions with the system prompt
> from CLAUDE.md, and expose async run(question) driving query() and
> concatenating AssistantMessage text.

"""agent.py — claude-agent-sdk version. ~90 lines incl. 5 tools."""
import json
from pathlib import Path
from claude_agent_sdk import (
    query, tool, create_sdk_mcp_server,
    ClaudeAgentOptions, AssistantMessage,
)

DATA = Path("mock_data")
CRITERIA = json.loads((DATA / "clinical_criteria.json").read_text())
PROVIDERS = json.loads((DATA / "provider_directory.json").read_text())
BENEFITS = json.loads((DATA / "benefits.json").read_text())

@tool("lookup_clinical_criteria",
      "Get medical-necessity criteria for a CPT + payer.",
      {"cpt_code": str, "payer": str})
async def lookup_clinical_criteria(args):
    rec = CRITERIA.get(f"{args['cpt_code']}_{args['payer'].upper()}",
                       {"error": "no policy"})
    return {"content": [{"type": "text", "text": json.dumps(rec)}]}

@tool("verify_diagnosis_match",
      "Check if an ICD-10 matches the approved indications.",
      {"icd10_code": str, "criteria_id": str})
async def verify_diagnosis_match(args):
    for k, c in CRITERIA.items():
        if k == args["criteria_id"] or c.get("policy_id") == args["criteria_id"]:
            out = {"match": args["icd10_code"] in c["approved_indications"],
                   "approved_indications": c["approved_indications"]}
            return {"content": [{"type": "text", "text": json.dumps(out)}]}
    return {"content": [{"type": "text", "text": json.dumps({"error": "criteria not found"})}]}

@tool("check_network_status",
      "Return in-network status for a provider + payer.",
      {"provider_npi": str, "payer": str})
async def check_network_status(args):
    p = PROVIDERS.get(args["provider_npi"])
    if not p:
        out = {"error": "provider not found"}
    else:
        out = {"npi": p["npi"], "specialty": p["specialty"],
               "in_network": args["payer"] in p.get("networks", []),
               "payer": args["payer"]}
    return {"content": [{"type": "text", "text": json.dumps(out)}]}

@tool("get_benefit_summary",
      "Coverage, copay, deductible-remaining for member + procedure.",
      {"member_id": str, "cpt_code": str})
async def get_benefit_summary(args):
    rec = BENEFITS.get(f"{args['member_id']}_{args['cpt_code']}",
                       {"covered": False, "error": "no benefit record"})
    return {"content": [{"type": "text", "text": json.dumps(rec)}]}

@tool("generate_determination",
      "Produce final structured determination.",
      {"case_id": str, "decision": str, "rationale": str, "policy_citation": str})
async def generate_determination(args):
    assert args["decision"] in ("APPROVE", "DENY", "REQUEST_INFO")
    out = {"case_id": args["case_id"], "decision": args["decision"],
           "rationale": args["rationale"],
           "policy_citation": args.get("policy_citation", "")}
    return {"content": [{"type": "text", "text": json.dumps(out)}]}

preauth_server = create_sdk_mcp_server(
    name="preauth_tools", version="1.0.0",
    tools=[lookup_clinical_criteria, verify_diagnosis_match,
           check_network_status, get_benefit_summary, generate_determination],
)

OPTIONS = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    system_prompt=("You are a pre-auth decision agent. Always call tools in "
                   "this order: lookup_clinical_criteria, verify_diagnosis_match, "
                   "check_network_status, get_benefit_summary, generate_determination. "
                   "Never APPROVE without successful match AND in-network AND covered. "
                   "Always cite policy_id in rationale."),
    mcp_servers={"preauth": preauth_server},
    allowed_tools=[f"mcp__preauth__{n}" for n in (
        "lookup_clinical_criteria", "verify_diagnosis_match",
        "check_network_status", "get_benefit_summary", "generate_determination")],
    max_turns=12,
)

async def run(question: str) -> str:
    parts = []
    async for msg in query(prompt=question, options=OPTIONS):
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if getattr(block, "text", None):
                    parts.append(block.text)
    return "\n".join(parts)

// agent.ts — @anthropic-ai/claude-agent-sdk version
import { query, tool, createSdkMcpServer } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
import * as fs from "fs";

const CRITERIA  = JSON.parse(fs.readFileSync("mock_data/clinical_criteria.json", "utf8"));
const PROVIDERS = JSON.parse(fs.readFileSync("mock_data/provider_directory.json", "utf8"));
const BENEFITS  = JSON.parse(fs.readFileSync("mock_data/benefits.json", "utf8"));

const lookupClinicalCriteria = tool(
  "lookup_clinical_criteria",
  "Get medical-necessity criteria for a CPT + payer.",
  { cpt_code: z.string(), payer: z.string() },
  async (args) => {
    const rec = CRITERIA[`${args.cpt_code}_${args.payer.toUpperCase()}`]
              ?? { error: "no policy" };
    return { content: [{ type: "text", text: JSON.stringify(rec) }] };
  }
);
// (verify_diagnosis_match, check_network_status, get_benefit_summary,
//  generate_determination defined the same way — see Python for full set)

const preauthServer = createSdkMcpServer({
  name: "preauth_tools",
  tools: [lookupClinicalCriteria, /* ... 4 more ... */],
});

const OPTIONS = {
  model: "claude-sonnet-4-6",
  systemPrompt: "You are a pre-auth decision agent. Always call tools in order: " +
                "lookup_clinical_criteria, verify_diagnosis_match, check_network_status, " +
                "get_benefit_summary, generate_determination. Never APPROVE without " +
                "match AND in-network AND covered. Always cite policy_id.",
  mcpServers: { preauth: preauthServer },
  allowedTools: [
    "mcp__preauth__lookup_clinical_criteria",
    "mcp__preauth__verify_diagnosis_match",
    "mcp__preauth__check_network_status",
    "mcp__preauth__get_benefit_summary",
    "mcp__preauth__generate_determination",
  ],
  maxTurns: 12,
};

export async function run(question: string): Promise<string> {
  const parts: string[] = [];
  for await (const msg of query({ prompt: question, options: OPTIONS })) {
    if (msg.type === "assistant") {
      // In the TS SDK the API message is nested under `message` on assistant events.
      for (const block of msg.message.content) {
        if ("text" in block) parts.push(block.text);
      }
    }
  }
  return parts.join("\n");
}

What Just Disappeared

You no longer write the message loop, the stop_reason check, the tool_result append, or JSON schema dicts. The 90-line raw loop from Iter 1 collapsed to one async for msg in query(...) (Python) / for await (TS). Same five-tool decision flow, ~90 lines.

Troubleshooting

ModuleNotFoundError: No module named 'claude_agent_sdk' → pip install "claude-agent-sdk>=0.2" in your venv.
ImportError: cannot import name 'Agent' from 'anthropic' → the anthropic package has no Agent class; that import comes from an API that does not exist. The real SDK is claude_agent_sdk.
Tool call rejected with "not allowed" → add the tool's mcp__preauth__<name> entry to allowed_tools.

Step 10: HIPAA Hooks via .claude/settings.json + HookMatcher

20 min · hooks/*.py + .claude/settings.json

What & Why: The SDK supports two hook surfaces: (a) file-based hooks in .claude/settings.json that shell out to scripts (production-friendly, language-agnostic, ideal for the audit-redaction pipeline) and (b) in-process hooks via HookMatcher in ClaudeAgentOptions(hooks={...}) for validation that needs to deny the call. We use both. Critical: the audit redactor only modifies what gets WRITTEN to the audit log, not what flows back to the agent — otherwise tool chaining breaks.

{
  "hooks": {
    "PreToolUse": [
      { "matcher": "*",
        "hooks": [ { "type": "command", "command": "python hooks/log_redacted.py" } ] }
    ],
    "PostToolUse": [
      { "matcher": "*",
        "hooks": [ { "type": "command", "command": "python hooks/audit_redacted.py" } ] }
    ]
  }
}

"""hooks/audit_redacted.py — redact PHI only for the audit log,
pass the original payload back to the agent unchanged."""
import sys, json, re, datetime

MEMBER_RE = re.compile(r"\b[A-Z]\d{6,9}\b")
SSN_RE    = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_RE  = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
DOB_RE    = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def _redact(s: str) -> str:
    s = MEMBER_RE.sub("[MEMBER_ID_REDACTED]", s)
    s = SSN_RE.sub("[SSN_REDACTED]", s)
    s = PHONE_RE.sub("[PHONE_REDACTED]", s)
    s = DOB_RE.sub("[DOB_REDACTED]", s)
    return s

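payload = json.load(sys.stdin)
tool_name   = payload.get("tool_name", "")
tool_input  = payload.get("tool_input", {})
# Claude Code's PostToolUse payload names the output "tool_response";
# fall back to "tool_result" so the standalone smoke test still works.
tool_result = payload.get("tool_response", payload.get("tool_result"))

# Timezone-aware UTC at second precision (utcnow() is deprecated since Python 3.12).
ts = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
rec = {"ts": ts,
       "tool": tool_name,
       "input_redacted": _redact(json.dumps(tool_input)),
       "output_summary": _redact(json.dumps(tool_result))[:200]}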
with open("audit_log.jsonl", "a") as f:
    f.write(json.dumps(rec) + "\n")

# CRITICAL: return the ORIGINAL payload so the agent gets unredacted PHI
# for chaining (e.g., member_id needs to flow into the next tool call).
json.dump(payload, sys.stdout)

"""Add to agent.py: in-process input validation via HookMatcher."""
import re
from claude_agent_sdk import HookMatcher

CPT_RE   = re.compile(r"^[0-9]{5}$|^[A-Z][0-9]{4}$")
NPI_RE   = re.compile(r"^[0-9]{10}$")
ICD10_RE = re.compile(r"^[A-Z]\d{2}(\.\d{1,4})?$")

async def validate_input(input_data, tool_use_id, context):
    name  = input_data.get("tool_name", "")
    args  = input_data.get("tool_input", {}) or {}
    fail = None
    if name.endswith("lookup_clinical_criteria") and not CPT_RE.match(args.get("cpt_code", "")):
        fail = f"Invalid CPT: {args.get('cpt_code')!r}"
    elif name.endswith("verify_diagnosis_match") and not ICD10_RE.match(args.get("icd10_code", "")):
        fail = f"Invalid ICD-10: {args.get('icd10_code')!r}"
    elif name.endswith("check_network_status") and not NPI_RE.match(args.get("provider_npi", "")):
        fail = f"Invalid NPI (must be 10 digits): {args.get('provider_npi')!r}"
    if fail:
        return {"hookSpecificOutput": {"hookEventName": "PreToolUse",
                                        "permissionDecision": "deny",
                                        "permissionDecisionReason": fail}}
    return {}

# Then update OPTIONS in agent.py:
OPTIONS = ClaudeAgentOptions(
    # ... existing fields ...
    hooks={"PreToolUse": [HookMatcher(matcher="mcp__preauth__*",
                                      hooks=[validate_input])]},
)
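Because a hook is a plain async function, it can be exercised without starting the SDK. The sketch below is a table-driven variant of the validator above, with the same three regexes and the same deny payload. validate_input_table and RULES are hypothetical names, and the get_benefit_summary row is an addition that the original hook does not have.

```python
import asyncio
import re

CPT_RE = re.compile(r"^[0-9]{5}$|^[A-Z][0-9]{4}$")
NPI_RE = re.compile(r"^[0-9]{10}$")
ICD10_RE = re.compile(r"^[A-Z]\d{2}(\.\d{1,4})?$")

# (tool-name suffix, argument name, pattern, label used in the deny reason)
RULES = [
    ("lookup_clinical_criteria", "cpt_code", CPT_RE, "CPT"),
    ("get_benefit_summary", "cpt_code", CPT_RE, "CPT"),   # added row, see lead-in
    ("verify_diagnosis_match", "icd10_code", ICD10_RE, "ICD-10"),
    ("check_network_status", "provider_npi", NPI_RE, "NPI"),
]


async def validate_input_table(input_data, tool_use_id, context):
    name = input_data.get("tool_name", "")
    args = input_data.get("tool_input", {}) or {}
    for suffix, arg, pattern, label in RULES:
        value = str(args.get(arg, ""))
        if name.endswith(suffix) and not pattern.match(value):
            return {"hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": f"Invalid {label}: {value!r}"}}
    return {}   # empty dict means "no objection"


# Direct calls with hand-built payloads; tool_use_id and context are unused here.
denied = asyncio.run(validate_input_table(
    {"tool_name": "mcp__preauth__lookup_clinical_criteria",
     "tool_input": {"cpt_code": "BAD123", "payer": "Aetna"}}, None, None))
allowed = asyncio.run(validate_input_table(
    {"tool_name": "mcp__preauth__check_network_status",
     "tool_input": {"provider_npi": "1234567890", "payer": "Aetna"}}, None, None))
```

Adding a new check then means adding one row to RULES rather than another elif branch.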

Create the supporting log_redacted.py hook — same stdin/stdout pattern, prints to stderr (so the log line itself doesn't accidentally leak PHI to anything that captures stdout):

"""hooks/log_redacted.py — PreToolUse: print redacted call info to stderr."""
import sys, json, re, datetime

MEMBER_RE = re.compile(r"\b[A-Z]\d{6,9}\b")
SSN_RE    = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

payload = json.load(sys.stdin)
ts = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")  # utcnow() is deprecated since 3.12
name = payload.get("tool_name", "?")
args_str = json.dumps(payload.get("tool_input", {}))
args_str = MEMBER_RE.sub("[MEMBER_ID_REDACTED]", args_str)
args_str = SSN_RE.sub("[SSN_REDACTED]", args_str)
print(f"[{ts}] PRE  {name}({args_str})", file=sys.stderr)
json.dump(payload, sys.stdout)   # CRITICAL: pass-through unchanged

Smoke-test each hook script standalone before wiring them up:

# Test the log hook (should print redacted to stderr, pass through stdout):
echo '{"tool_name":"get_benefit_summary","tool_input":{"member_id":"A123456","cpt_code":"27447"}}' \
  | python hooks/log_redacted.py

# Test the audit hook (should append a redacted line to audit_log.jsonl):
echo '{"tool_name":"get_benefit_summary","tool_input":{"member_id":"A123456"},"tool_result":{"covered":true}}' \
  | python hooks/audit_redacted.py
cat audit_log.jsonl

Then run the agent end-to-end:

python -c "import asyncio, agent; print(asyncio.run(agent.run('Is PA-2025-0001 approvable? CPT 27447 ICD M17.11 member A123456 NPI 1234567890 Aetna')))"

✅ Checkpoint — Step 10

You should see (1) [timestamp] PRE log lines on stderr with member_id REDACTED, (2) the agent's APPROVE determination on stdout with the literal member_id correctly used in the rationale, (3) audit_log.jsonl has redacted records. Try a 2-character CPT — the in-process HookMatcher validator should deny it.

⚠️ PHI Redaction is Trickier Than It Looks (still)

The agent NEEDS the unredacted member_id and provider_npi in its working memory to call the next tool. The hook script writes the REDACTED record to audit_log.jsonl, but it returns the ORIGINAL payload via stdout. If you redact the stdout payload too, the agent can't chain: get_benefit_summary will receive [MEMBER_ID_REDACTED] as the member_id and return "no benefit record". Test by running the agent and confirming both: (1) audit_log.jsonl has redacted records, (2) the agent still completes all 5 tool calls successfully.

Troubleshooting

Hooks don't run → the SDK looks for .claude/settings.json in cwd. Run from the project root.
Audit log shows literal member_id → you forgot to call _redact() inside the audit script. Test the redactor standalone first.
Agent fails on get_benefit_summary with "no benefit record" → you redacted the stdout payload too aggressively. Only redact what's WRITTEN to the audit log; pass through the original payload to stdout.
In-process validator never fires → check that hooks={"PreToolUse": [HookMatcher(...)]} is a kwarg on ClaudeAgentOptions with the matcher pattern "mcp__preauth__*".

Step 11: Sessions — Multi-Case + Fork

15 min · sessions.py

What & Why: Pre-auth reviewers often want what-ifs: "What if the member were on UnitedHealth instead?" Forking the session clones the conversation at that point and runs the hypothetical without polluting the official decision history.

"""sessions.py — multi-case via SDK resume tokens."""
from dataclasses import replace
from claude_agent_sdk import query, AssistantMessage
from agent import OPTIONS

SESSIONS: dict[str, str] = {}   # case_id -> resume token (session_id)

async def _drive(prompt, options):
    parts, sid = [], None
    async for msg in query(prompt=prompt, options=options):
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if getattr(block, "text", None): parts.append(block.text)
        s = getattr(msg, "session_id", None)
        if s: sid = s
    return "\n".join(parts), sid

async def chat(case_id: str, msg: str) -> str:
    resume = SESSIONS.get(case_id)
    options = replace(OPTIONS, resume=resume) if resume else OPTIONS
    text, sid = await _drive(msg, options)
    if sid: SESSIONS[case_id] = sid
    return text

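async def what_if(case_id: str, hypothetical: str) -> str:
    """Fork: 'what if the payer were UnitedHealth?' Do NOT save the new sid."""
    resume = SESSIONS.get(case_id)
    # fork_session=True asks the SDK to branch into a new session id instead of
    # appending the hypothetical to the case's own transcript (assumes your
    # installed claude-agent-sdk exposes this ClaudeAgentOptions field).
    options = replace(OPTIONS, resume=resume, fork_session=True) if resume else OPTIONS
    text, _ = await _drive(hypothetical, options)
    return text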

Try it — multi-case + fork demo:

python -c "
import asyncio
from sessions import chat, what_if

async def main():
    print('T1:', await chat('PA-2025-0001', 'Is this approvable? CPT 27447, ICD M17.11, member A123456, NPI 1234567890, Aetna'))
    print('FORK:', await what_if('PA-2025-0001', 'What if the payer were UnitedHealth instead?'))
    print('T2:', await chat('PA-2025-0001', 'Stick with Aetna. What about a peer-to-peer review?'))

asyncio.run(main())
"

✅ Checkpoint — Step 11

T1 produces APPROVE under Aetna. FORK shows what would happen with UnitedHealth (different policy, possibly different determination) WITHOUT polluting the main case session. T2 continues from T1 (Aetna context preserved) and addresses the peer-to-peer ask. If T2 references the UnitedHealth hypothetical, your what_if is leaking state into SESSIONS.

Step 12: Slash Commands

15 min · .claude/commands/*.md (3 files)

What & Why: /run-preauth PA-2025-0001, /test-agent, /eval-agent turn the agent into a one-line workflow inside Claude Code. Reviewers can adjudicate cases without leaving their IDE.

Create 3 files in .claude/commands/ (run-preauth.md, test-agent.md, and eval-agent.md, shown in that order):

---
description: Run the pre-auth agent on a case_id from preauth_requests.json
argument-hint: [case_id]
---
Look up case $ARGUMENTS in mock_data/preauth_requests.json. Build the question
string. Run `python -c "import asyncio, agent; print(asyncio.run(agent.run(q)))"`
where q is the question. Print determination, rationale, policy citation,
total tokens, total cost from the ResultMessage emitted by query().

---
description: Run the unit test suite for the pre-auth agent
---
Run `pytest tests/ -v`. Critical tests: test_phi_not_in_audit (PHI must be
redacted in audit_log.jsonl), test_member_id_passes_to_get_benefit_summary
(unredacted in agent stream), test_approves_pa_2025_0001 (canonical case
should APPROVE with policy CPB-0660 cited).

---
description: Run the 15-case evaluation suite
---
Read test_scenarios.json (15 pre-auth cases: 11 APPROVE, 2 DENY, 2 REQUEST_INFO).
For each, call agent.run(), score on: correct decision, policy_id cited, all 5
tools called, tone is appropriate for clinical context. Report per-case score
and overall percentage.
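The scoring that /eval-agent asks for could be sketched as below. This is one possible rubric, not the course's grader: CaseResult, score_case, and overall_percentage are hypothetical names, the three checks are weighted equally by assumption, and the tool names in the example are placeholders. The "tone" criterion is left out because it needs a human or model judge rather than a string comparison.

```python
from dataclasses import dataclass, field

REQUIRED_TOOL_COUNT = 5  # the brief specifies five tool calls per case


@dataclass
class CaseResult:
    decision: str                                   # APPROVE / DENY / REQUEST_INFO
    rationale: str                                  # free-text justification
    tools_called: set[str] = field(default_factory=set)


def score_case(expected_decision: str, expected_policy_id: str, result: CaseResult) -> float:
    """Fraction of the three mechanical checks that pass (0.0 to 1.0)."""
    checks = [
        result.decision == expected_decision,
        expected_policy_id in result.rationale,
        len(result.tools_called) >= REQUIRED_TOOL_COUNT,
    ]
    return sum(checks) / len(checks)


def overall_percentage(scores: list[float]) -> float:
    """Mean score across cases, expressed as a percentage."""
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


# Placeholder tool names; substitute the five real tool names from agent.py.
result = CaseResult(
    decision="APPROVE",
    rationale="Meets medical necessity under CPB-0660.",
    tools_called={"tool_1", "tool_2", "tool_3", "tool_4", "tool_5"},
)
print(score_case("APPROVE", "CPB-0660", result))
```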

✅ Checkpoint — Step 12

When you type / in Claude Code, the three commands appear in autocomplete. /run-preauth PA-2025-0001 produces the APPROVE determination. /test-agent requires a tests/ folder — in Iter 3 the spec generates these for you.

Step 13: Deploy via Claude Code

15 min · server.py + Dockerfile

What & Why: Same FastAPI + Docker pattern as Iter 1, but Claude Code writes it.

> Create server.py and Dockerfile. Endpoints: GET /health,
> POST /preauth (single-shot determination), POST /chat (case_id + message).
> Async FastAPI handlers awaiting agent.run() and sessions.chat() (both are
> async coroutines from claude-agent-sdk). python:3.11-slim base, install
> claude-agent-sdk + dependencies, expose 8000. Copy .claude/ and hooks/ into
> the image so settings.json + hook scripts resolve at runtime.

"""server.py — async FastAPI wrapper around the SDK pre-auth agent."""
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from agent import run as agent_run
from sessions import chat as session_chat

app = FastAPI(title="Pre-Auth Decision Agent (Iter 2 — SDK)")

class Q(BaseModel):
    question: str

class C(BaseModel):
    case_id: str
    message: str

@app.get("/health")
def health():
    return {"status": "ok", "iter": 2}

@app.post("/preauth")
async def preauth(q: Q):
    try:
        return {"determination": await agent_run(q.question)}
    except Exception as e:
        raise HTTPException(500, str(e))

@app.post("/chat")
async def chat_ep(c: C):
    try:
        return {"answer": await session_chat(c.case_id, c.message)}
    except Exception as e:
        raise HTTPException(500, str(e))

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
ENV PYTHONUNBUFFERED=1
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]

Run locally first, then Docker:

uvicorn server:app --reload --port 8000
# In another terminal:
curl localhost:8000/health
curl -X POST localhost:8000/preauth -H "Content-Type: application/json" \
     -d '{"question":"Is PA-2025-0001 approvable?"}'

# Then build the container (COPY . . bakes .claude/ and hooks/ into the image):
docker build -t iter2-a .
docker run --rm -p 8000:8000 -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY iter2-a

🎉 Iteration 2 Complete — End-to-End Verification

You should now have ~14 files in your project:

agent-iter2-sdk/
├── CLAUDE.md
├── agent.py            # query() + 5 @tool functions + create_sdk_mcp_server
├── sessions.py         # chat() + what_if() over SDK resume tokens
├── mock_data/ ×4       # same as Iter 1
├── .claude/
│   ├── settings.json
│   └── commands/run-preauth.md, test-agent.md, eval-agent.md
├── hooks/
│   └── log_redacted.py, audit_redacted.py
└── server.py | Dockerfile | requirements.txt

Acceptance test — same shape as Iter 1:

# 1. Single pre-auth (should APPROVE, same as Iter 1)
curl -s -X POST localhost:8000/preauth -H "Content-Type: application/json" \
     -d '{"question":"Is PA-2025-0001 approvable?"}' | python -m json.tool

# 2. Multi-case with SDK resume tokens
curl -s -X POST localhost:8000/chat -H "Content-Type: application/json" \
     -d '{"case_id":"PA-2025-0001","message":"Is this approvable? CPT 27447, ICD M17.11"}' | python -m json.tool
curl -s -X POST localhost:8000/chat -H "Content-Type: application/json" \
     -d '{"case_id":"PA-2025-0001","message":"What about a peer-to-peer review?"}' | python -m json.tool

# 3. PHI verification
docker exec $(docker ps -q --filter ancestor=iter2-a) cat audit_log.jsonl | head -5
# Look for [MEMBER_ID_REDACTED] but the agent's response should still resolve benefits.
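Eyeballing the first five lines works for a demo, but the PHI check can be made mechanical. The sketch below scans log lines for anything that still looks like raw PHI. The function name find_phi_leaks and the member-ID pattern (one uppercase letter plus six digits, matching the sample ID A123456) are assumptions for this example, and the two sample records are made up to show one clean line and one leak.

```python
import re

MEMBER_RE = re.compile(r"\b[A-Z]\d{6}\b")        # assumed member-ID shape
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_phi_leaks(lines) -> list[tuple[int, str]]:
    """Return (line_number, kind) for every line that still contains raw PHI."""
    leaks = []
    for n, line in enumerate(lines, start=1):
        if MEMBER_RE.search(line):
            leaks.append((n, "member_id"))
        if SSN_RE.search(line):
            leaks.append((n, "ssn"))
    return leaks


# Made-up records: line 1 is properly redacted, line 2 leaks a member ID.
sample = [
    '{"tool":"get_benefit_summary","input":{"member_id":"[MEMBER_ID_REDACTED]"}}',
    '{"tool":"get_benefit_summary","input":{"member_id":"A123456"}}',
]
print(find_phi_leaks(sample))  # [(2, 'member_id')]
```

Because an open file iterates line by line, the same function can be pointed at the real log with find_phi_leaks(open("audit_log.jsonl")); an empty list means no pattern matched, which is necessary but not sufficient evidence that the log is clean.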

You pass Iter 2 if: all 3 outputs are functionally equivalent to Iter 1, but you wrote ~120 lines instead of ~250. Iter-2's hooks and SDK do the work the Iter-1 loop did manually.

Iteration 2 Metrics

Files: ~10 (CLAUDE.md, agent.py, sessions.py, server.py, Dockerfile, .claude/settings.json, hooks/{log_redacted,audit_redacted}.py, slash commands) + 4 mock JSON
Lines you wrote: ~120
Time: ~2 hours
Abstractions used: claude-agent-sdk (query / @tool / MCP server / ClaudeAgentOptions / HookMatcher) + .claude/settings.json + Claude Code

Iteration	Primary debug method	Secondary	Speed to fix
1 Raw	print() in the loop	Manual message inspection	Slow (find the line)
2 SDK	Hooks + Console Web UI	Langfuse traces	Medium (modular probes)
3 Spec	Spec vs code comparison	Tests + evals + Console	Fast (Claude Code finds it)

Metric	Iter 1: Raw API	Iter 2: Agent SDK	Iter 3: Spec-Driven
Lines YOU wrote	~250	~120	~100 (spec only)
Time to build	~3 hours	~2 hours	~1 hour
Agent output	Baseline	Same	Same
PHI redaction	Inline in loop	One `.claude/settings.json` entry + a 20-line stdin/stdout script	5 lines in spec section 5
Multi-case sessions	Manual history dict	SDK sessions	SDK sessions (generated)
Adding a 6th tool (peer-to-peer)	Edit 3 files + add validation	One Claude Code prompt	Update spec, ask to regen
Tests (must pass for HIPAA)	Manual	Claude Code generated	Spec generates them
Documentation for auditors	Separate	CLAUDE.md	Spec IS the doc auditors read
Control over internals	Full	SDK-managed	Least direct (but reviewable)
Understanding needed	Every line	SDK abstractions	Architecture-level
Debugging	print() in loop	Hooks + Console + Langfuse	Spec compare + tests + evals
Onboarding a new clinician-engineer	Read 7 files	Read CLAUDE.md + 8 files	Read 1 spec file

Capstone 7-A — Agent Evolution: Healthcare Pre-Auth

Project Brief

The Three-Iteration Concept

The Scenario — Pre-Auth Decision Agent

Tools (5)

Mock Data Shape

Animation 1: Three-Lane Evolution

Animation 2: Code Size Waterfall

Animation 3: Time Comparison

Animation 4: Architecture Per Iteration

Animation 5: Spec-to-Code Flow

Prerequisites

Iteration 1: Raw API Loop

Debugging in Iteration 1: Print Statements + Manual Inspection

Iteration 2: Agent SDK + Claude Code

Debugging in Iteration 2: Hooks + Console Web UI + Langfuse

Iteration 3: Spec-Driven

Debugging in Iteration 3: Spec Comparison + Tests + Evals

The Comparison Table

Grading Rubric

Reflection Prompts

Knowledge Check

Q1: All three iterations produce the same pre-auth determination. What is the most defensible reason to still go through Iteration 1 rather than skipping straight to Iteration 3?

Q2: Your post_tool_use hook redacts PHI from `tool_result` AND returns the redacted value to the agent. What breaks?

Q3: In Iteration 3, an eval case fails because the agent APPROVED a request from an out-of-network provider. The CORRECT fix is:

Q4: The most common Iteration 1 bug in the pre-auth scenario is:

Q5: You need to add ICD-10 hierarchy lookup (e.g., "M17.11" should match a criteria entry of "M17"). Which iteration requires the LEAST disruptive change?

Q6: When would you NOT pick Iteration 3 (spec-driven) for a real production pre-auth system?

Q7: The agent sometimes calls `verify_diagnosis_match` with `icd10_code = "M17"` instead of `"M17.11"` from the request. The bug is most likely:

Going Further (Optional)
