Building AI Agents with Claude
Capstone Project 3
Capstone 3 of 5 | 3–4 hours | Domain C: Public Records / UCC

Capstone 3 — Domain C: Entity Resolution Agent

Build a ReAct agent that reasons through multi-step entity resolution — searching filings, computing fuzzy matches, cross-referencing registries, and merging entity profiles.

Prerequisites

Complete M05 (Function Calling), M06 (Multi-Tool Orchestration), M12 (ReAct Agent Loop), and M13 (Planning & Task Decomposition) before starting this capstone. You should be comfortable defining tool schemas, handling tool_use / tool_result message flows, and building agentic loops that check stop_reason.

Project Brief

Business Context

Business names in UCC filings are notoriously inconsistent. The same company appears as "Acme Logistics LLC", "ACME LOGISTICS, L.L.C.", "Acme Logistics Company", and "Acme Logistics Inc." across different states. A human analyst looking at these four names must decide: are these all the same company, or four different companies?

The pain is scale and subtlety. A commercial data provider processes millions of filings. Manual entity resolution costs $2–5 per entity. At 100,000 entities per month, that is $200K–$500K per month ($2.4M–$6M annually) in analyst time. Worse, humans are inconsistent: one analyst merges two entities, another keeps them separate, and the database ends up with conflicting records.

Your agent automates this: given a business name, it searches filings across states, computes fuzzy match scores between candidates, cross-references official business registries, and produces a merged entity profile with a confidence score. (Fuzzy match scores are numerical similarity scores between two text strings, computed with algorithms such as Levenshtein distance, Jaro-Winkler, and token sort ratio. Higher scores indicate more similar strings; they are used to judge whether two different-looking business names refer to the same real entity.) The agent must reason about each match: "ACME LOGISTICS, L.L.C." vs "Acme Logistics LLC" is clearly the same entity, but "Acme Logistics Inc." in a different state needs more investigation.

What You Will Build

A ReAct agent with 5 tools that implements the Thought-Action-Observation cycle for entity resolution. (A ReAct agent follows the Reason-Act-Observe loop: it thinks about what to do next (Thought), takes an action by calling a tool (Action), observes the result (Observation), then reasons again. This explicit reasoning chain makes the agent's decision process transparent and debuggable.) The tools are:

  • search_filings_by_name — find filing candidates across states
  • fuzzy_match_score — compute similarity between name pairs
  • get_filing_details — retrieve full filing data for comparison
  • get_business_registry_data — cross-reference official SOS registrations
  • merge_entity_profile — create a unified profile from matched entities

The key challenge: the agent must decide at each step what to do next based on what it just learned. If fuzzy match scores are high, proceed to merge. If they are ambiguous, gather more evidence from the registry. If they are low, mark as distinct. This branching logic is what makes it a reasoning agent, not a script.
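The branching described above can be sketched as a small pure function. The thresholds are borrowed from the decision criteria in the Step 2 system prompt; the function name `next_step`, its return labels, and the handling of scores the prompt leaves unspecified (for example 0.70–0.80 in the same state) are illustrative assumptions. In the actual agent, Claude makes this decision from the prompt rather than from hard-coded logic.

```python
def next_step(token_sort_ratio: float, same_state: bool, same_address: bool = False) -> str:
    """Map one fuzzy-match observation to the agent's next move (illustrative only)."""
    if token_sort_ratio < 0.70:
        return "mark_distinct"
    if token_sort_ratio >= 0.90 and same_state and same_address:
        return "merge"
    # Anything in between is treated as ambiguous: gather registry evidence first
    return "check_registry"


print(next_step(0.98, same_state=True, same_address=True))  # merge
print(next_step(0.85, same_state=True))                     # check_registry
print(next_step(0.72, same_state=False))                    # check_registry
print(next_step(0.40, same_state=False))                    # mark_distinct
```

A function like this is also handy as an oracle in tests: it states what the agent is expected to do for a given observation, so a trace that merges at 0.72 across states stands out.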

Environment Setup

You need Python 3.10+ or Node.js 18+ and an Anthropic API key. Run these commands to create your project:

mkdir capstone-3-entity-resolution && cd capstone-3-entity-resolution
python -m venv venv && source venv/bin/activate   # Windows: venv\Scripts\Activate.ps1

# Record the minimum SDK version (>= sets a lower bound, not an exact pin)
echo "anthropic>=0.40.0" > requirements.txt
pip install -r requirements.txt
export ANTHROPIC_API_KEY=your-key-here             # Windows PowerShell: $env:ANTHROPIC_API_KEY="your-key-here"

File Structure

capstone-3-entity-resolution/
├── entity_tools.py      # Fuzzy matching + mock data tools
├── entity_agent.py      # Entity resolution agent with ReAct loop
├── entity_tools.ts      # TypeScript tools
├── entity_agent.ts      # TypeScript agent
└── requirements.txt     # Dependencies

Domain Glossary

Entity Resolution
Determining whether two or more records refer to the same real-world entity. Also called record linkage, deduplication, or entity matching.
Fuzzy Matching
Comparing strings using approximate algorithms (Levenshtein, Jaro-Winkler, token sort) that return a similarity score rather than requiring exact equality.
Canonical Name
The "official" version of an entity's name chosen as the standard reference. Usually matches the SOS business registration.
Business Registry
The official state database of registered business entities (LLCs, corporations). Contains the legal entity name, formation date, status, and registered agent.
Confidence Score
A 0.0-1.0 number indicating how certain the agent is that two entities are the same. Above 0.9 = definite match. 0.7-0.9 = likely match. Below 0.7 = needs more evidence.
Token Sort Ratio
A fuzzy matching algorithm that sorts the words in both strings alphabetically before comparing. "Acme Logistics LLC" and "LLC Logistics Acme" would score 100% — order does not matter.
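To make the token sort definition concrete, here is a minimal standard-library sketch. It uses `difflib.SequenceMatcher` as the underlying string comparison, which is an assumption made to avoid a dependency; dedicated libraries typically use an edit-distance based ratio instead, so their scores on non-identical strings will differ somewhat.

```python
from difflib import SequenceMatcher


def token_sort_ratio(a: str, b: str) -> float:
    """Sort each name's lowercase tokens, then compare the rebuilt strings."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return round(SequenceMatcher(None, sorted_a, sorted_b).ratio(), 2)


# Word order does not matter once both sides are sorted
print(token_sort_ratio("Acme Logistics LLC", "LLC Logistics Acme"))  # 1.0

# Different suffixes still lower the score, since no normalization is applied here
print(token_sort_ratio("Acme Logistics LLC", "Acme Logistics Inc."))
```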

ReAct Architecture

The agent follows a Thought → Action → Observation loop. Each cycle, it reasons about what it knows, what it needs, and which tool to call next. Here is a typical resolution trace:

ReAct Reasoning Trace: Entity Resolution

THOUGHT: I need to find UCC filings for "Acme Logistics LLC". Let me search across states.
ACTION: search_filings_by_name("Acme Logistics LLC", match_type="fuzzy")
OBSERVE: 3 candidates: "ACME LOGISTICS, L.L.C." (DE), "Acme Logistics Company" (DE), "Acme Logistics Inc." (NY)
THOUGHT: I have 3 candidates. Let me check fuzzy match scores for the first one.
ACTION: fuzzy_match_score("Acme Logistics LLC", "ACME LOGISTICS, L.L.C.")
OBSERVE: token_sort_ratio: 0.98, recommendation: "likely_match"
THOUGHT: High match score (0.98). Same state (DE). Let me verify with the business registry.
ACTION: get_business_registry_data("Acme Logistics LLC", "DE")
OBSERVE: Registered: "Acme Logistics LLC", address: 456 Commerce Blvd, Dover, DE (MATCH)
RESULT: Merge confirmed: 2 DE entities are the same. NY entity needs separate investigation.

Entity Merge Decision Flow

Input entity: Acme Logistics LLC (DE)
  ↓ fuzzy search
  • ACME LOGISTICS, L.L.C. (DE, 3 filings): 0.98 MATCH
  • Acme Logistics Company (DE, 1 filing): 0.85 LIKELY
  • Acme Logistics Inc. (NY, 2 filings): 0.72 POSSIBLE
  ↓ registry verification
  • DE entities merged: same address, same registry
  • NY entity kept separate: different registry, different address
  ↓ final profile
Acme Logistics LLC (canonical): 4 filings across DE, 2 secured parties, confidence 0.94
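The counts in the final profile can be checked by hand against the mock data: the filing counts of the two DE candidates, and the secured parties named on the detailed DE filings. A minimal sketch follows, with the values copied inline from the Step 1 mock tables so it runs on its own; the variable names are illustrative.

```python
# DE candidates as returned by the filing search
de_candidates = [
    {"name": "ACME LOGISTICS, L.L.C.", "filing_count": 3},
    {"name": "Acme Logistics Company", "filing_count": 1},
]

# Secured parties on the three detailed DE filings
secured_parties = [
    "First National Bank",
    "Delaware Capital Partners",
    "First National Bank",
]

total_filings = sum(c["filing_count"] for c in de_candidates)
distinct_parties = sorted(set(secured_parties))

print(total_filings)     # 4
print(distinct_parties)  # ['Delaware Capital Partners', 'First National Bank']
```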

Mock Data Specification

{
  "request_id": "ER-2024-0078",
  "input_name": "Acme Logistics LLC",
  "input_state": "DE",
  "candidates": [
    {"name": "ACME LOGISTICS, L.L.C.", "state": "DE", "filing_count": 3},
    {"name": "Acme Logistics Company", "state": "DE", "filing_count": 1},
    {"name": "Acme Logistics Inc.", "state": "NY", "filing_count": 2}
  ]
}
{
  "entity_a": "Acme Logistics LLC",
  "entity_b": "ACME LOGISTICS, L.L.C.",
  "scores": {
    "exact": 0.45,
    "normalized": 0.95,
    "token_sort_ratio": 0.98
  },
  "recommendation": "likely_match"
}
// Note: levenshtein and jaro_winkler are stretch metrics — install
// `rapidfuzz` (`pip install rapidfuzz`) and add them to the scores
// dict to enable. The base capstone uses three deterministic scores.
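Before installing `rapidfuzz`, it can help to see what the Levenshtein stretch metric computes. The following is a dependency-free sketch of edit distance and a 0.0–1.0 similarity derived from it. The function names and the choice to divide by the longer string's length are assumptions for illustration, and a library implementation will be much faster.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # drop ch_a
                               current[j - 1] + 1,       # drop ch_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the distance into 0.0-1.0 so it can sit next to the other scores."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return round(1 - levenshtein_distance(a, b) / longest, 2)


print(levenshtein_distance("kitten", "sitting"))  # 3
print(levenshtein_similarity("Acme Logistics LLC", "Acme Logistics Inc."))
```

To use it, add the result to the `scores` dict under a key such as `levenshtein` and decide whether it should take part in the average that drives the recommendation.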

Implementation Phases

  1. Phase 1 — Mock Tools (60 min): Implement all 5 tools with mock data covering 3+ entities, name variations, and edge cases (registry not found, conflicting data).
  2. Phase 2 — System Prompt (30 min): Write a ReAct-style system prompt that instructs Claude to externalize reasoning, plan before acting, and adapt strategy based on intermediate results.
  3. Phase 3 — ReAct Loop (45 min): Implement the agentic loop with stop_reason checking. The loop must handle 5+ tool calls in a single resolution and branch based on fuzzy match results.
  4. Phase 4 — Branching Logic (30 min): Ensure the agent adapts: high confidence → merge directly. Ambiguous → gather more evidence from registry. Low confidence → mark as distinct.
  5. Phase 5 — Error Recovery (30 min): Handle mid-chain failures: registry unavailable, timeout on filing search. Agent should proceed with partial data and lower confidence.
  6. Phase 6 — Testing (45 min): Run 10 test cases verifying correct merges, correct separations, and proper handling of ambiguous/adversarial inputs.

Step 1: Create entity_tools.py

What & Why

What: You will create a file containing 5 mock tool functions that simulate searching UCC filings, computing fuzzy match scores, retrieving filing details, querying business registries, and merging entity profiles.

Why: Mock tools let you develop and test the agent's reasoning logic without calling real APIs. Every tool returns structured data with explicit error handling, so the agent always knows whether a call succeeded or failed and can adapt its strategy accordingly.

Create the file entity_tools.py and paste the following code:

# entity_tools.py — Mock tools for entity resolution
import re

# --- Tool 1: Search filings by name ---
MOCK_CANDIDATES = {
    "acme logistics llc": [
        {"name": "ACME LOGISTICS, L.L.C.", "state": "DE", "filing_count": 3, "most_recent": "2024-01-15"},
        {"name": "Acme Logistics Company", "state": "DE", "filing_count": 1, "most_recent": "2023-08-20"},
        {"name": "Acme Logistics Inc.", "state": "NY", "filing_count": 2, "most_recent": "2023-12-01"},
    ],
    "buildright construction": [
        {"name": "BuildRight Construction LLC", "state": "NY", "filing_count": 1, "most_recent": "2024-01-10"},
        {"name": "Build Right Construction", "state": "NY", "filing_count": 1, "most_recent": "2022-03-15"},
    ],
}

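def search_filings_by_name(business_name: str, state: str | None = None, match_type: str = "fuzzy") -> dict:
    # Mock behavior: match_type is accepted for schema compatibility but ignored;
    # lookup is a case-insensitive exact match on the keys of MOCK_CANDIDATES.
    try:
        key = business_name.lower().strip()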
        candidates = MOCK_CANDIDATES.get(key, [])
        if state:
            candidates = [c for c in candidates if c["state"] == state.upper()]
        if not candidates:
            return {"is_error": True, "error_category": "NO_RESULTS", "is_retryable": False,
                    "context": f"No filings found for '{business_name}'"}
        return {"is_error": False, "candidates": candidates, "total": len(candidates)}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}


# --- Tool 2: Fuzzy match score ---
# NOTE: punctuation is replaced with spaces BEFORE this regex runs, so
# "L.L.C." becomes "l l c" — match that form (with spaces), not "l.l.c."
_SUFFIX_RE = re.compile(
    r'\b(llc|l l c|inc|incorporated|corp|corporation|company|co|ltd)\b'
)

def _normalize(name: str) -> str:
    name = name.lower().strip()
    name = re.sub(r'[,.\-]', ' ', name)
    name = re.sub(r'\s+', ' ', name)
    # Strip ONLY whole-word suffixes (not substrings — "Coca" must
    # not lose "co", "Incorporated" must outrank "corp")
    name = _SUFFIX_RE.sub('', name)
    name = re.sub(r'\s+', ' ', name).strip()
    return name

def fuzzy_match_score(entity_a: str, entity_b: str) -> dict:
    try:
        if not entity_a or not entity_b:
            return {"is_error": True, "error_category": "EMPTY_INPUT", "is_retryable": False, "context": "Both entities required"}
        raw_a, raw_b = entity_a.lower(), entity_b.lower()
        norm_a, norm_b = _normalize(entity_a), _normalize(entity_b)

        def jaccard(x: set, y: set) -> float:
            # Symmetric overlap |x & y| / |x | y|, so score(a, b) == score(b, a)
            return len(x & y) / max(len(x | y), 1)

        # "exact": character-set overlap of the raw (lowercased) names
        exact = 1.0 if raw_a == raw_b else round(jaccard(set(raw_a), set(raw_b)), 2)
        # "normalized": token overlap after punctuation and suffix stripping
        normalized = 1.0 if norm_a == norm_b else round(jaccard(set(norm_a.split()), set(norm_b.split())), 2)
        # Mock stand-in for token sort ratio: small boost when token overlap is already high
        token_sort = min(round(normalized * 1.03, 2), 1.0) if normalized > 0.8 else normalized
        avg = round((exact + normalized + token_sort) / 3, 2)
        if avg >= 0.85:
            rec = "likely_match"
        elif avg >= 0.65:
            rec = "possible_match"
        else:
            rec = "unlikely_match"
        return {"is_error": False, "entity_a": entity_a, "entity_b": entity_b,
                "scores": {"exact": exact, "normalized": normalized, "token_sort_ratio": token_sort},
                "recommendation": rec}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}


# --- Tool 3: Get filing details ---
MOCK_FILINGS = {
    ("acme logistics, l.l.c.", "DE"): {"filings": [
        {"filing_number": "2023-1234567", "secured_party": "First National Bank", "collateral": "All inventory and equipment", "status": "active", "estimated_amount": 750_000},
        {"filing_number": "2022-9876543", "secured_party": "Delaware Capital Partners", "collateral": "All vehicles", "status": "active", "estimated_amount": 350_000},
        {"filing_number": "2024-0011223", "secured_party": "First National Bank", "collateral": "Accounts receivable", "status": "active", "estimated_amount": 1_200_000},
    ]},
}

def get_filing_details(business_name: str, state: str) -> dict:
    try:
        key = (business_name.lower().strip(), state.upper())
        result = MOCK_FILINGS.get(key)
        if result:
            return {"is_error": False, **result}
        return {"is_error": False, "filings": []}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}


# --- Tool 4: Business registry ---
MOCK_REGISTRY = {
    ("acme logistics llc", "DE"): {
        "entity_name": "Acme Logistics LLC", "state": "DE", "entity_type": "LLC",
        "file_number": "DE-LLC-2019-4567890", "formation_date": "2019-03-15",
        "status": "active", "principal_address": "456 Commerce Blvd, Dover, DE 19901",
    },
    ("acme logistics inc.", "NY"): {
        "entity_name": "Acme Logistics Inc.", "state": "NY", "entity_type": "Corporation",
        "file_number": "NY-CORP-2020-1234567", "formation_date": "2020-07-01",
        "status": "active", "principal_address": "100 Broadway, New York, NY 10001",
    },
}

def get_business_registry_data(business_name: str, state: str) -> dict:
    try:
        key = (business_name.lower().strip(), state.upper())
        result = MOCK_REGISTRY.get(key)
        if result:
            return {"is_error": False, **result}
        return {"is_error": True, "error_category": "NOT_FOUND", "is_retryable": False,
                "context": f"No registry entry for '{business_name}' in {state}"}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}


# --- Tool 5: Merge entity profile ---
def merge_entity_profile(primary_entity: dict, merge_candidates: list, confidence: float) -> dict:
    try:
        if confidence < 0.5:
            return {"is_error": True, "error_category": "INSUFFICIENT_EVIDENCE",
                    "is_retryable": False, "context": f"Confidence {confidence} below 0.5 threshold"}
        total_filings = sum(c.get("filing_count", 0) for c in merge_candidates) + primary_entity.get("filing_count", 0)
        total_lien_exposure = sum(c.get("estimated_amount", 0) for c in merge_candidates) + primary_entity.get("estimated_amount", 0)
        return {"is_error": False,
                "merged_profile_id": f"MP-{primary_entity.get('name', 'unknown')[:10]}-{confidence:.0%}",
                "canonical_name": primary_entity.get("name", "Unknown"),
                "total_filings": total_filings,
                "total_lien_exposure": total_lien_exposure,
                "states": list(set([primary_entity.get("state", "")] + [c.get("state", "") for c in merge_candidates])),
                "confidence": confidence,
                "merge_log": [f"Merged '{c.get('name', '')}' (score: {c.get('match_score', 'N/A')})" for c in merge_candidates]}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}

TypeScript equivalent (save as entity_tools.ts):

// entity_tools.ts — Mock tools for entity resolution

interface ToolResult { is_error: boolean; [key: string]: unknown; }

const MOCK_CANDIDATES: Record<string, Array<{name:string;state:string;filing_count:number;most_recent:string}>> = {
  "acme logistics llc": [
    {name:"ACME LOGISTICS, L.L.C.",state:"DE",filing_count:3,most_recent:"2024-01-15"},
    {name:"Acme Logistics Company",state:"DE",filing_count:1,most_recent:"2023-08-20"},
    {name:"Acme Logistics Inc.",state:"NY",filing_count:2,most_recent:"2023-12-01"},
  ],
  "buildright construction": [
    {name:"BuildRight Construction LLC",state:"NY",filing_count:1,most_recent:"2024-01-10"},
    {name:"Build Right Construction",state:"NY",filing_count:1,most_recent:"2022-03-15"},
  ],
};

export function searchFilingsByName(name: string, state?: string): ToolResult {
  try {
    let cands = MOCK_CANDIDATES[name.toLowerCase().trim()] ?? [];
    if (state) cands = cands.filter(c => c.state === state.toUpperCase());
    if (!cands.length) return {is_error:true,error_category:"NO_RESULTS",is_retryable:false,context:`No filings for '${name}'`};
    return {is_error:false,candidates:cands,total:cands.length};
  } catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}

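function normalize(n: string): string {
  // Punctuation becomes spaces first, so "L.L.C." is matched as "l l c"
  return n.toLowerCase().replace(/[,.\-]/g,' ').replace(/\s+/g,' ')
    .replace(/\b(llc|l l c|inc|incorporated|corp|corporation|company|co|ltd)\b/g,'')
    .replace(/\s+/g,' ').trim();
}

const round2 = (x: number): number => Math.round(x * 100) / 100;

// Symmetric overlap: size of the intersection divided by size of the union
function jaccard(x: Set<string>, y: Set<string>): number {
  const inter = [...x].filter(v => y.has(v)).length;
  return inter / Math.max(new Set([...x, ...y]).size, 1);
}

export function fuzzyMatchScore(a: string, b: string): ToolResult {
  try {
    if (!a || !b) return {is_error:true,error_category:"EMPTY_INPUT",is_retryable:false,context:"Both entities required"};
    const la = a.toLowerCase(), lb = b.toLowerCase();
    const na = normalize(a), nb = normalize(b);
    // exact: character-set overlap of the raw names; normalized: token overlap after suffix stripping
    const exact = la === lb ? 1 : round2(jaccard(new Set(la), new Set(lb)));
    const normalized = na === nb ? 1 : round2(jaccard(new Set(na.split(' ')), new Set(nb.split(' '))));
    // Mock stand-in for token sort ratio: boost only when token overlap is already high, as in entity_tools.py
    const tokenSort = normalized > 0.8 ? Math.min(round2(normalized * 1.03), 1) : normalized;
    const avg = round2((exact + normalized + tokenSort) / 3);
    const rec = avg >= 0.85 ? "likely_match" : avg >= 0.65 ? "possible_match" : "unlikely_match";
    return {is_error:false,entity_a:a,entity_b:b,scores:{exact,normalized,token_sort_ratio:tokenSort},recommendation:rec};
  } catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}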

const MOCK_FILINGS: Record<string, {filings: Array<{filing_number:string;secured_party:string;collateral:string;status:string;estimated_amount:number}>}> = {
  "acme logistics, l.l.c.|DE": {filings:[
    {filing_number:"2023-1234567",secured_party:"First National Bank",collateral:"All inventory and equipment",status:"active",estimated_amount:750000},
    {filing_number:"2022-9876543",secured_party:"Delaware Capital Partners",collateral:"All vehicles",status:"active",estimated_amount:350000},
    {filing_number:"2024-0011223",secured_party:"First National Bank",collateral:"Accounts receivable",status:"active",estimated_amount:1200000},
  ]},
};

export function getFilingDetails(businessName: string, state: string): ToolResult {
  try {
    const key = `${businessName.toLowerCase().trim()}|${state.toUpperCase()}`;
    const r = MOCK_FILINGS[key];
    if (r) return {is_error:false,...r};
    return {is_error:false,filings:[]};
  } catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}

const MOCK_REGISTRY: Record<string, object> = {
  "acme logistics llc|DE": {entity_name:"Acme Logistics LLC",state:"DE",entity_type:"LLC",file_number:"DE-LLC-2019-4567890",formation_date:"2019-03-15",status:"active",principal_address:"456 Commerce Blvd, Dover, DE 19901"},
  "acme logistics inc.|NY": {entity_name:"Acme Logistics Inc.",state:"NY",entity_type:"Corporation",file_number:"NY-CORP-2020-1234567",formation_date:"2020-07-01",status:"active",principal_address:"100 Broadway, New York, NY 10001"},
};

export function getBusinessRegistryData(name: string, state: string): ToolResult {
  try {
    const r = MOCK_REGISTRY[`${name.toLowerCase().trim()}|${state.toUpperCase()}`];
    if (r) return {is_error:false,...r};
    return {is_error:true,error_category:"NOT_FOUND",is_retryable:false,context:`No registry for '${name}' in ${state}`};
  } catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}

export function mergeEntityProfile(primary: {name:string;state:string;filing_count?:number;estimated_amount?:number}, candidates: Array<{name:string;state:string;match_score?:number;filing_count?:number;estimated_amount?:number}>, confidence: number): ToolResult {
  try {
    if (confidence < 0.5) return {is_error:true,error_category:"INSUFFICIENT_EVIDENCE",is_retryable:false,context:`Confidence ${confidence} below threshold`};
    const total = (primary.filing_count ?? 0) + candidates.reduce((s,c) => s + (c.filing_count ?? 0), 0);
    const totalLienExposure = (primary.estimated_amount ?? 0) + candidates.reduce((s,c) => s + (c.estimated_amount ?? 0), 0);
    return {is_error:false,canonical_name:primary.name,total_filings:total,total_lien_exposure:totalLienExposure,confidence,
      states:[...new Set([primary.state,...candidates.map(c=>c.state)])],
      merge_log:candidates.map(c => `Merged '${c.name}' (score: ${c.match_score ?? 'N/A'})`)};
  } catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}

Quick verify — run this in your terminal:

python -c "from entity_tools import search_filings_by_name; print(search_filings_by_name('Acme Logistics LLC'))"

Expected output:

{'is_error': False, 'candidates': [{'name': 'ACME LOGISTICS, L.L.C.', 'state': 'DE', 'filing_count': 3, 'most_recent': '2024-01-15'}, {'name': 'Acme Logistics Company', 'state': 'DE', 'filing_count': 1, 'most_recent': '2023-08-20'}, {'name': 'Acme Logistics Inc.', 'state': 'NY', 'filing_count': 2, 'most_recent': '2023-12-01'}], 'total': 3}
Checkpoint: Step 1 Complete

You built 5 interconnected tools. The agent will call them in sequence based on its reasoning: first search_filings_by_name to find candidates, then fuzzy_match_score to compare each pair, then get_business_registry_data to verify matches with official records, and finally merge_entity_profile to create the unified profile. The fuzzy matching uses a simplified name normalization that strips legal suffixes (LLC, Inc, Corp) before comparing, because "Acme Logistics LLC" and "Acme Logistics Inc." share the same base name and differ only in their entity-type designation. That difference can still mark two separate legal entities, which is why the registry check follows the name comparison.
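The effect of suffix stripping is easy to see in isolation. The sketch below repeats the normalization steps from entity_tools.py as a standalone function (named `normalize` here so it runs without the module) and applies it to the Acme variants plus one name that contains "co" as a substring.

```python
import re

# Punctuation is replaced with spaces first, so "L.L.C." is matched as "l l c"
_SUFFIX_RE = re.compile(
    r'\b(llc|l l c|inc|incorporated|corp|corporation|company|co|ltd)\b'
)


def normalize(name: str) -> str:
    name = re.sub(r'[,.\-]', ' ', name.lower().strip())
    name = re.sub(r'\s+', ' ', name)
    name = _SUFFIX_RE.sub('', name)  # whole-word suffixes only
    return re.sub(r'\s+', ' ', name).strip()


for raw in ["Acme Logistics LLC", "ACME LOGISTICS, L.L.C.",
            "Acme Logistics Company", "Acme Logistics Inc.", "Coca Cola Co"]:
    print(f"{raw!r} -> {normalize(raw)!r}")
# 'Acme Logistics LLC' -> 'acme logistics'
# 'ACME LOGISTICS, L.L.C.' -> 'acme logistics'
# 'Acme Logistics Company' -> 'acme logistics'
# 'Acme Logistics Inc.' -> 'acme logistics'
# 'Coca Cola Co' -> 'coca cola'
```

All four Acme variants collapse to the same string, so name similarity alone cannot separate the Delaware LLC from the New York corporation. State, address, and registry data have to do that work.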

Troubleshooting

If you see ModuleNotFoundError: No module named 'entity_tools' when testing the import, make sure you saved the file as entity_tools.py (not entity-tools.py) and your terminal is in the same directory as the file.

Step 2: Create entity_agent.py

What & Why

What: You will create the ReAct agent that defines tool schemas for Claude, writes a system prompt encoding entity resolution decision criteria, and implements the agentic loop that routes tool calls to your mock functions.

Why: This is the core of the capstone — the agent must reason about which tool to call next based on intermediate results. High fuzzy scores lead directly to merge; ambiguous scores trigger registry lookups for more evidence; low scores mark entities as distinct. The loop caps at 15 iterations to prevent runaway resolution chains.
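The loop's control flow can be exercised without an API key by injecting the model call. The sketch below is not the capstone's agent: `run_loop` and the scripted responses are hypothetical stand-ins built from `SimpleNamespace`, and it treats any stop reason other than `tool_use` as final, which is a simplification.

```python
import json
from types import SimpleNamespace as NS


def run_loop(create_message, handlers: dict, query: str, max_iters: int = 15) -> str:
    """Agent-style loop with the model call passed in as a function."""
    messages = [{"role": "user", "content": query}]
    for _ in range(max_iters):
        response = create_message(messages)
        if response.stop_reason != "tool_use":
            return " ".join(b.text for b in response.content if b.type == "text")
        results = []
        for block in response.content:
            if block.type == "tool_use":
                handler = handlers.get(block.name)
                result = handler(block.input) if handler else {
                    "is_error": True, "error_category": "UNKNOWN_TOOL"}
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": json.dumps(result)})
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    return "Resolution exceeded maximum iterations."


# Scripted stand-in for the API: one tool call, then a final answer
script = iter([
    NS(stop_reason="tool_use", content=[
        NS(type="text", text="Let me score the pair."),
        NS(type="tool_use", id="toolu_01", name="fuzzy_match_score",
           input={"entity_a": "Acme Logistics LLC", "entity_b": "ACME LOGISTICS, L.L.C."}),
    ]),
    NS(stop_reason="end_turn", content=[NS(type="text", text="Same entity.")]),
])
handlers = {"fuzzy_match_score": lambda a: {"is_error": False, "recommendation": "likely_match"}}

print(run_loop(lambda messages: next(script), handlers, "Resolve Acme Logistics LLC"))
# Same entity.
```

A scripted model that always requests a tool is a quick way to confirm the iteration cap actually stops the loop.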

Create the file entity_agent.py and paste the following code:

# entity_agent.py — ReAct Entity Resolution Agent
import anthropic, json
from entity_tools import (search_filings_by_name, fuzzy_match_score,
    get_filing_details, get_business_registry_data, merge_entity_profile)

SYSTEM_PROMPT = """You are an Entity Resolution Agent for a commercial data provider.
Given a business entity name, you determine whether UCC filing records
under different name variations refer to the same real-world company.

## Your Reasoning Process (ReAct)
For each resolution request, think step by step:
1. THINK: What do I know? What do I need to find out?
2. ACT: Call the appropriate tool to gather evidence.
3. OBSERVE: What did the tool return? What does it tell me?
4. REPEAT until you have enough evidence to make a decision.

## Decision Criteria
- token_sort_ratio >= 0.90 AND same state AND same address → MERGE (high confidence)
- token_sort_ratio >= 0.80 AND same state → LIKELY MERGE (verify with registry)
- token_sort_ratio >= 0.70 AND different state → INVESTIGATE (check registry)
- token_sort_ratio < 0.70 → DISTINCT ENTITY

## Output Format
After resolution, call merge_entity_profile with your findings.
Include a confidence score (0.0-1.0) based on evidence strength.
Explain your reasoning for each merge/separate decision.

## Rules
- Always check the business registry for ambiguous matches.
- If registry data is unavailable, lower your confidence accordingly.
- Never force a merge — flag conflicts for human review.
- If there are 10+ candidates, filter by state before matching."""

TOOLS = [
    {"name": "search_filings_by_name", "description": "Search UCC filings by business name across states. Returns candidate entities with filing counts.",
     "input_schema": {"type": "object", "properties": {"business_name": {"type": "string"}, "state": {"type": "string"}, "match_type": {"type": "string", "enum": ["exact", "fuzzy"]}}, "required": ["business_name"]}},
    {"name": "fuzzy_match_score", "description": "Compute similarity scores between two entity names using multiple algorithms.",
     "input_schema": {"type": "object", "properties": {"entity_a": {"type": "string"}, "entity_b": {"type": "string"}}, "required": ["entity_a", "entity_b"]}},
    {"name": "get_filing_details", "description": "Get full filing details for an entity in a state.",
     "input_schema": {"type": "object", "properties": {"business_name": {"type": "string"}, "state": {"type": "string"}}, "required": ["business_name", "state"]}},
    {"name": "get_business_registry_data", "description": "Cross-reference entity against official SOS business registry.",
     "input_schema": {"type": "object", "properties": {"business_name": {"type": "string"}, "state": {"type": "string"}}, "required": ["business_name", "state"]}},
    {"name": "merge_entity_profile", "description": "Create a merged entity profile from confirmed matches.",
     "input_schema": {"type": "object", "properties": {"primary_entity": {"type": "object"}, "merge_candidates": {"type": "array"}, "confidence": {"type": "number"}}, "required": ["primary_entity", "merge_candidates", "confidence"]}},
]

HANDLERS = {
    "search_filings_by_name": lambda a: search_filings_by_name(a["business_name"], a.get("state"), a.get("match_type", "fuzzy")),
    "fuzzy_match_score": lambda a: fuzzy_match_score(a["entity_a"], a["entity_b"]),
    "get_filing_details": lambda a: get_filing_details(a["business_name"], a["state"]),
    "get_business_registry_data": lambda a: get_business_registry_data(a["business_name"], a["state"]),
    "merge_entity_profile": lambda a: merge_entity_profile(a["primary_entity"], a["merge_candidates"], a["confidence"]),
}

def run_entity_agent(query: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY env var
    messages = [{"role": "user", "content": query}]

    for _ in range(15):  # entity resolution may need many tool calls
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6", max_tokens=4096,
                system=SYSTEM_PROMPT, tools=TOOLS, messages=messages,
            )
        except anthropic.APIError as e:
            return f"Error: {e}"

        if response.stop_reason == "tool_use":
            results = []
            for block in response.content:
                if block.type == "tool_use":
                    handler = HANDLERS.get(block.name)
                    try:
                        result = handler(block.input) if handler else {"is_error": True, "error_category": "UNKNOWN_TOOL"}
                    except (KeyError, TypeError) as e:
                        # Malformed arguments from the model: report back instead of crashing
                        result = {"is_error": True, "error_category": "INVALID_INPUT", "is_retryable": True, "context": str(e)}
                    results.append({"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(result)})
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": results})
        else:
            # end_turn, max_tokens, etc.: return the text so far rather than
            # re-sending the same request until the iteration cap is hit
            return " ".join(b.text for b in response.content if hasattr(b, "text"))

    return "Resolution exceeded maximum iterations."

if __name__ == "__main__":
    result = run_entity_agent("Resolve entity: Acme Logistics LLC in Delaware. Check for name variations across states.")
    print(result)

TypeScript equivalent (save as entity_agent.ts):

// entity_agent.ts — ReAct Entity Resolution Agent
import Anthropic from "@anthropic-ai/sdk";
import { searchFilingsByName, fuzzyMatchScore, getFilingDetails, getBusinessRegistryData, mergeEntityProfile } from "./entity_tools.js";

const SYSTEM_PROMPT = `You are an Entity Resolution Agent for a commercial data provider.
Given a business entity name, you determine whether UCC filing records
under different name variations refer to the same real-world company.

## Your Reasoning Process (ReAct)
For each resolution request, think step by step:
1. THINK: What do I know? What do I need to find out?
2. ACT: Call the appropriate tool to gather evidence.
3. OBSERVE: What did the tool return? What does it tell me?
4. REPEAT until you have enough evidence to make a decision.

## Decision Criteria
- token_sort_ratio >= 0.90 AND same state AND same address → MERGE (high confidence)
- token_sort_ratio >= 0.80 AND same state → LIKELY MERGE (verify with registry)
- token_sort_ratio >= 0.70 AND different state → INVESTIGATE (check registry)
- token_sort_ratio < 0.70 → DISTINCT ENTITY

## Output Format
After resolution, call merge_entity_profile with your findings.
Include a confidence score (0.0-1.0) based on evidence strength.
Explain your reasoning for each merge/separate decision.

## Rules
- Always check the business registry for ambiguous matches.
- If registry data is unavailable, lower your confidence accordingly.
- Never force a merge — flag conflicts for human review.
- If there are 10+ candidates, filter by state before matching.`;

const TOOLS: Anthropic.Tool[] = [
  {name:"search_filings_by_name",description:"Search filings by name across states",input_schema:{type:"object" as const,properties:{business_name:{type:"string"},state:{type:"string"},match_type:{type:"string",enum:["exact","fuzzy"]}},required:["business_name"]}},
  {name:"fuzzy_match_score",description:"Compute similarity between two entity names",input_schema:{type:"object" as const,properties:{entity_a:{type:"string"},entity_b:{type:"string"}},required:["entity_a","entity_b"]}},
  {name:"get_filing_details",description:"Get full filing details for an entity in a state",input_schema:{type:"object" as const,properties:{business_name:{type:"string"},state:{type:"string"}},required:["business_name","state"]}},
  {name:"get_business_registry_data",description:"Check official SOS registration",input_schema:{type:"object" as const,properties:{business_name:{type:"string"},state:{type:"string"}},required:["business_name","state"]}},
  {name:"merge_entity_profile",description:"Create merged profile from matches",input_schema:{type:"object" as const,properties:{primary_entity:{type:"object"},merge_candidates:{type:"array"},confidence:{type:"number"}},required:["primary_entity","merge_candidates","confidence"]}},
];

type AnyArgs = Record<string, unknown>;
const H: Record<string, (a: AnyArgs) => unknown> = {
  search_filings_by_name: a => searchFilingsByName(a.business_name as string, a.state as string | undefined),
  fuzzy_match_score: a => fuzzyMatchScore(a.entity_a as string, a.entity_b as string),
  get_filing_details: a => getFilingDetails(a.business_name as string, a.state as string),
  get_business_registry_data: a => getBusinessRegistryData(a.business_name as string, a.state as string),
  merge_entity_profile: a => mergeEntityProfile(
    a.primary_entity as {name:string;state:string},
    a.merge_candidates as Array<{name:string;state:string;match_score?:number;filing_count?:number}>,
    a.confidence as number),
};

export async function runEntityAgent(query: string): Promise<string> {
  const client = new Anthropic();
  const messages: Anthropic.MessageParam[] = [{role:"user",content:query}];
  for (let i = 0; i < 15; i++) {
    let resp: Anthropic.Message;
    try {
      resp = await client.messages.create({model:"claude-sonnet-4-6",max_tokens:4096,system:SYSTEM_PROMPT,tools:TOOLS,messages});
    } catch(e) { return `Error: ${e}`; }
    if (resp.stop_reason === "tool_use") {
      const results: Anthropic.ToolResultBlockParam[] = [];
      for (const b of resp.content) {
        if (b.type === "tool_use") {
          const h = H[b.name];
          const r = h ? h(b.input as AnyArgs) : {is_error:true,error_category:"UNKNOWN"};
          results.push({type:"tool_result",tool_use_id:b.id,content:JSON.stringify(r)});
        }
      }
      messages.push({role:"assistant",content:resp.content});
      messages.push({role:"user",content:results});
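    } else {
      // Any other stop_reason (end_turn, max_tokens, ...) ends the loop here.
      // Looping again with unchanged messages would only resend the same request.
      return resp.content.filter((b): b is Anthropic.TextBlock => b.type === "text").map(b => b.text).join(" ");
    }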
  }
  return "Resolution exceeded maximum iterations.";
}
Checkpoint: Step 2 Complete

You built a ReAct entity resolution agent that: (1) searches for name variations using fuzzy matching, (2) computes similarity scores between candidate pairs, (3) cross-references official business registries for verification, and (4) merges confirmed matches into a unified profile with a confidence score. The system prompt encodes the decision criteria (which score ranges mean merge, investigate further, or treat as distinct), and the agent adapts its strategy based on intermediate results. The safety cap is 15 iterations because a complex resolution can require 8-12 tool calls, which leaves a small margin before the loop gives up.
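
The three-way decision described above can be pictured as a simple threshold rule. This is only a sketch: the cutoffs 0.90 and 0.70 and the function name classify_match are assumptions for illustration, and the values that actually apply are whatever your system prompt states.

```python
def classify_match(score: float,
                   merge_at: float = 0.90,
                   investigate_at: float = 0.70) -> str:
    """Map a similarity score in [0, 1] to one of three actions.

    The default cutoffs are hypothetical; substitute the thresholds
    written in your own system prompt.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score >= merge_at:
        return "merge"
    if score >= investigate_at:
        return "investigate"
    return "distinct"


for s in (0.95, 0.80, 0.30):
    print(s, classify_match(s))
```

In the agent itself this rule lives in the prompt rather than in code, which lets the model weigh registry and address evidence alongside the raw score.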

Step 3: Test the Entity Resolution Agent

What & Why

What: Run the agent end-to-end to resolve "Acme Logistics LLC" across states and observe the full ReAct reasoning trace.

Why: Testing confirms that the agentic loop correctly routes tool calls, that the agent reasons about fuzzy match scores before deciding to merge or separate, and that the final merged profile includes all expected data.

Run the agent from your terminal:

# Python
python entity_agent.py

# TypeScript
npx tsx entity_agent.ts

Expected output (your exact wording will vary — the agent reasons in natural language):

I'll resolve the entity "Acme Logistics LLC" by searching for name variations across states.

[Agent calls search_filings_by_name, finds 3 candidates]
[Agent calls fuzzy_match_score for each candidate pair]
[Agent calls get_business_registry_data to verify DE and NY registrations]
[Agent calls merge_entity_profile for the confirmed DE matches]

Resolution complete:
- Canonical name: Acme Logistics LLC
- 2 DE entities merged (ACME LOGISTICS, L.L.C. + Acme Logistics Company) — confidence: 0.94
- 1 NY entity (Acme Logistics Inc.) kept separate — different registry, different address
- Total filings consolidated: 4 across Delaware
- Total lien exposure: $2,300,000
Checkpoint: Step 3 Complete

Your agent successfully resolved entity variations by reasoning through multiple tool calls. It merged the two Delaware entities (high fuzzy match + same registry) and correctly kept the New York entity separate (different address, different registry entry). The confidence score reflects the strength of the evidence gathered.

Verify Everything Works

Run this end-to-end verification to confirm all components work together:

# e2e_test.py — Quick verification script
from entity_tools import (search_filings_by_name, fuzzy_match_score,
    get_filing_details, get_business_registry_data, merge_entity_profile)

# 1. Search works
result = search_filings_by_name("Acme Logistics LLC")
assert not result["is_error"], "Search failed"
assert result["total"] == 3, f"Expected 3 candidates, got {result['total']}"
print("Search:   OK — 3 candidates found")

# 2. Fuzzy matching works
score = fuzzy_match_score("Acme Logistics LLC", "ACME LOGISTICS, L.L.C.")
assert not score["is_error"], "Fuzzy match failed"
assert score["recommendation"] == "likely_match", f"Expected likely_match, got {score['recommendation']}"
print(f"Fuzzy:    OK — {score['recommendation']} (token_sort: {score['scores']['token_sort_ratio']})")

# 3. Registry lookup works
reg = get_business_registry_data("Acme Logistics LLC", "DE")
assert not reg["is_error"], "Registry lookup failed"
print(f"Registry: OK — {reg['entity_name']} in {reg['state']}")

# 4. Merge works
merged = merge_entity_profile(
    {"name": "Acme Logistics LLC", "state": "DE", "filing_count": 3},
    [{"name": "ACME LOGISTICS, L.L.C.", "state": "DE", "match_score": 0.98, "filing_count": 1}],
    confidence=0.94
)
assert not merged["is_error"], "Merge failed"
print(f"Merge:    OK — {merged['canonical_name']}, confidence: {merged['confidence']}")

# 5. Error handling works
empty = search_filings_by_name("Nonexistent Corp XYZ")
assert empty["is_error"], "Expected error for unknown entity"
print("Errors:   OK — graceful handling for unknown entities")

print("\nAll checks passed. Run 'python entity_agent.py' for the full agent test.")

Expected output:

Search:   OK — 3 candidates found
Fuzzy:    OK — likely_match (token_sort: 1.0)
Registry: OK — Acme Logistics LLC in DE
Merge:    OK — Acme Logistics LLC, confidence: 0.94
Errors:   OK — graceful handling for unknown entities

All checks passed. Run 'python entity_agent.py' for the full agent test.

Testing Guide

Type | Scenario | Expected Behavior
Happy | Two DE name variants (LLC vs L.L.C.) | Merges with high confidence (>0.9), cites same address
Happy | Three candidates: 2 match, 1 distinct | Merges 2 DE entities, keeps NY entity separate
Happy | Legal suffix difference only | High match score, quick merge
Happy | Entity with filings in multiple states | Cross-references registry to confirm single entity
Happy | Clean resolution, no conflicts | Merged profile with all filings consolidated
Edge | Similar names, different companies in same state | Distinguishes using address and registry data
Edge | Registry returns NOT_FOUND for one candidate | Proceeds with available data, lowers confidence
Edge | "possible_match" fuzzy score | Gathers additional evidence before deciding
Adversarial | Common name with 50+ candidates | Filters by state, limits scope, does not attempt all
Adversarial | Same name, same state, different file numbers | Flags conflict for human review, does not force merge
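
To see why the "LLC vs L.L.C." and "legal suffix difference only" scenarios should score high, it can help to reproduce the idea outside the agent. The sketch below uses only the standard library (difflib) and is not the course's fuzzy_match_score implementation; the suffix list and both function names are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately short list of legal suffixes to ignore.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "co", "company", "ltd",
                  "incorporated", "corporation", "limited"}


def normalize_name(name: str) -> str:
    """Lowercase, collapse punctuation, and drop legal suffix tokens."""
    cleaned = name.lower().replace(".", "")          # "L.L.C." -> "llc"
    cleaned = re.sub(r"[^a-z0-9\s]", " ", cleaned)   # commas etc. -> space
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized business names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


print(name_similarity("Acme Logistics LLC", "ACME LOGISTICS, L.L.C."))
print(name_similarity("Acme Logistics LLC", "Apex Logistics LLC"))
```

Note that stripping suffixes also makes "Acme Logistics Inc." look identical to "Acme Logistics LLC", so a perfect score here marks a candidate for registry verification, not a confirmed match.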

Troubleshooting

Common issues and how to fix them:

ModuleNotFoundError: No module named 'anthropic'

You have not installed the Anthropic SDK. Run pip install anthropic in your activated virtual environment. If you are using Node.js, run npm install @anthropic-ai/sdk.

AuthenticationError: Invalid API key

Your ANTHROPIC_API_KEY environment variable is not set or contains an invalid key. Verify with echo $ANTHROPIC_API_KEY (Linux/Mac) or echo %ANTHROPIC_API_KEY% (Windows). The key should start with sk-ant-. Re-export it if needed: export ANTHROPIC_API_KEY=sk-ant-...
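
If you prefer to check the key from Python rather than the shell, a small pre-flight check along these lines may help. It is a sketch only: check_api_key is a hypothetical helper, it tests just the presence and the sk-ant- prefix mentioned above, and it cannot tell you whether the key is actually valid.

```python
import os


def check_api_key(env=os.environ) -> str:
    """Return 'ok', 'missing', or 'unexpected_prefix' without printing the key."""
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key:
        return "missing"
    if not key.startswith("sk-ant-"):
        return "unexpected_prefix"
    return "ok"


if __name__ == "__main__":
    print(f"ANTHROPIC_API_KEY status: {check_api_key()}")
```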

ImportError: cannot import name 'search_filings_by_name' from 'entity_tools'

Both entity_tools.py and entity_agent.py must be in the same directory. Also check that you saved the tools file as entity_tools.py (underscores, not hyphens) and that it contains all 5 function definitions.

Agent loops without resolving (hits 15 iterations)

If the agent keeps calling tools without reaching a conclusion, check: (1) your MOCK_CANDIDATES data contains entries for the query you are testing, (2) the system prompt decision criteria thresholds are present, and (3) the merge_entity_profile tool is included in the TOOLS list so the agent can finalize its resolution. You can lower the iteration cap from 15 to 10 during debugging to fail faster.
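
One way to see why the loop is not converging is to record every tool call and look for exact repeats. The sketch below assumes you append each tool_use block to a list as a dict with "name" and "input" keys; that logging convention and the helper find_repeated_calls are illustrative, not part of the capstone code.

```python
import json
from collections import Counter


def find_repeated_calls(tool_calls, min_repeats=2):
    """Return (tool name, input, count) for calls made min_repeats+ times."""
    counts = Counter(
        (call["name"], json.dumps(call["input"], sort_keys=True))
        for call in tool_calls
    )
    return [
        (name, json.loads(raw_input), n)
        for (name, raw_input), n in counts.items()
        if n >= min_repeats
    ]


# Example trace, as it might be collected inside the agent loop.
trace = [
    {"name": "search_filings_by_name",
     "input": {"business_name": "Acme Logistics LLC"}},
    {"name": "fuzzy_match_score",
     "input": {"entity_a": "Acme Logistics LLC",
               "entity_b": "ACME LOGISTICS, L.L.C."}},
    {"name": "search_filings_by_name",
     "input": {"business_name": "Acme Logistics LLC"}},
]

for name, args, n in find_repeated_calls(trace):
    print(f"{name} called {n}x with {args}")
```

A tool that is called repeatedly with identical input usually means its result is empty or an error the agent cannot act on, which points back to check (1) above.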

Compliance & Regulatory Notes

Entity Resolution Standards

False positive risk: Merging two distinct entities into one creates incorrect risk profiles. A lender relying on a false merge might deny a loan to a clean company because it was conflated with a heavily-liened entity. Always prefer conservative merges with clear evidence.

FCRA implications: If merged entity profiles are used for credit decisions, accuracy requirements under the Fair Credit Reporting Act apply. Incorrect merges could trigger disputes and regulatory action.

Audit trail: Every merge decision must be logged with the evidence that supported it (match scores, registry data, address comparisons). The merge_log in the output serves this purpose.
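
As an illustration of what a per-decision audit record could contain, here is a minimal sketch. The MergeLogEntry fields are assumptions chosen to mirror the evidence types listed above; they are not the actual schema of the merge_log that merge_entity_profile returns.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class MergeLogEntry:
    """Hypothetical audit record for a single merge decision."""
    primary: str
    candidate: str
    decision: str            # e.g. "merged", "kept_separate", "flagged"
    match_score: float
    registry_checked: bool
    evidence: list = field(default_factory=list)
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def to_json_line(entry: MergeLogEntry) -> str:
    """Serialize one entry as a single JSON line for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = MergeLogEntry(
    primary="Acme Logistics LLC",
    candidate="ACME LOGISTICS, L.L.C.",
    decision="merged",
    match_score=0.98,
    registry_checked=True,
    evidence=["same state (DE)", "registry entry found"],
)
print(to_json_line(entry))
```

Writing one JSON object per line keeps the log append-only and easy to filter later when a merge is disputed.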

Going Further

  • [OPTIONAL] Stretch: SOS validation tool — Add validate_sos_registration to check entity standing (active, dissolved, revoked) for additional merge evidence.
  • Address matching — Add a tool that normalizes and compares addresses, since "456 Commerce Blvd" vs "456 Commerce Boulevard" should match.
  • Multi-agent pipeline — Split into coordinator + subagents: one for filing search, one for fuzzy matching, one for registry verification. This leads to Capstone 4.
  • Confidence calibration — Track merge decisions over time and calibrate confidence scores against human reviewer decisions.
  • Batch resolution — Use the Message Batches API (M25) to resolve thousands of entities at a 50% cost reduction.
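
The address-matching idea from the list above can be prototyped in a few lines before you wrap it as a tool. This is a sketch under stated assumptions: the abbreviation table is a small hypothetical sample, and normalize_address and addresses_match are illustrative names, not functions from entity_tools.py.

```python
import re

# Hypothetical sample of street-type abbreviations; a real tool needs more.
STREET_ABBREVIATIONS = {
    "blvd": "boulevard",
    "ave": "avenue",
    "rd": "road",
    "dr": "drive",
    "ln": "lane",
    "hwy": "highway",
    "ste": "suite",
}


def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and expand known abbreviations."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", address.lower())
    return " ".join(STREET_ABBREVIATIONS.get(tok, tok) for tok in cleaned.split())


def addresses_match(a: str, b: str) -> bool:
    """True when two addresses are identical after normalization."""
    return normalize_address(a) == normalize_address(b)


print(addresses_match("456 Commerce Blvd", "456 Commerce Boulevard"))
print(addresses_match("456 Commerce Blvd", "457 Commerce Blvd"))
```

Ambiguous abbreviations such as "St" (street or saint) are left out on purpose; handling them needs position-aware rules or a dedicated address-parsing library.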

Knowledge Check

Test your understanding of the entity resolution concepts covered in this capstone.

Q1: What does "ReAct" stand for?

Q2: Why is fuzzy matching important for UCC entity resolution?

Q3: What is the purpose of the confidence score in entity matching?

Q4: The agent finds "ACME LOGISTICS LLC" in Delaware and "Acme Logistics L.L.C." in New York. Should these be merged?

Q5: Why does the agent search multiple states rather than just one?

References & Resources