Capstone 3 — Domain C: Entity Resolution Agent
Build a ReAct agent that reasons through multi-step entity resolution — searching filings, computing fuzzy matches, cross-referencing registries, and merging entity profiles.
Complete M05 (Function Calling), M06 (Multi-Tool Orchestration), M12 (ReAct Agent Loop), and M13 (Planning & Task Decomposition) before starting this capstone. You should be comfortable defining tool schemas, handling tool_use / tool_result message flows, and building agentic loops that check stop_reason.
Project Brief
Business names in UCC filings are notoriously inconsistent. The same company appears as "Acme Logistics LLC", "ACME LOGISTICS, L.L.C.", "Acme Logistics Company", and "Acme Logistics Inc." across different states. A human analyst looking at these four names must decide: are these all the same company, or four different companies?
The pain is scale and subtlety. A commercial data provider processes millions of filings. Manual entity resolution costs $2–5 per entity. At 100,000 entities per month, that is $200K–$500K per month (roughly $2.4M–$6M annually) in analyst time. Worse, humans are inconsistent: one analyst merges two entities, another keeps them separate, and the database ends up with conflicting records.
Your agent automates this: given a business name, it searches filings across states, computes fuzzy match scores between candidates, cross-references official business registries, and produces a merged entity profile with a confidence score. (Fuzzy match scores are numerical similarity scores between two text strings, computed using algorithms like Levenshtein distance, Jaro-Winkler, and token sort ratio. Higher scores indicate more similar strings. They are used to determine whether two different-looking business names refer to the same real entity.) The agent must reason about each match: "ACME LOGISTICS, L.L.C." vs "Acme Logistics LLC" is clearly the same entity, but "Acme Logistics Inc." in a different state needs more investigation.
A ReAct agent follows the Reason-Act-Observe loop: it thinks about what to do next (Thought), takes an action by calling a tool (Action), observes the result (Observation), then reasons again. This explicit reasoning chain makes the agent's decision process transparent and debuggable. The agent you build has 5 tools and implements the Thought-Action-Observation cycle for entity resolution:
- search_filings_by_name: find filing candidates across states
- fuzzy_match_score: compute similarity between name pairs
- get_filing_details: retrieve full filing data for comparison
- get_business_registry_data: cross-reference official SOS registrations
- merge_entity_profile: create a unified profile from matched entities
The key challenge: the agent must decide at each step what to do next based on what it just learned. If fuzzy match scores are high, proceed to merge. If they are ambiguous, gather more evidence from the registry. If they are low, mark as distinct. This branching logic is what makes it a reasoning agent, not a script.
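The three-way branch described above can be sketched as a small routing function. This is only an illustration: in the capstone the decision is made by Claude's reasoning, not by hard-coded rules. The name `next_step` is hypothetical, and the 0.85 and 0.65 cut-offs are assumptions borrowed from the recommendation thresholds that fuzzy_match_score uses later in this capstone.

```python
def next_step(avg_score: float, high: float = 0.85, low: float = 0.65) -> str:
    """Map a similarity score to the next action (hypothetical helper)."""
    if not 0.0 <= avg_score <= 1.0:
        raise ValueError("avg_score must be in [0, 1]")
    if avg_score >= high:
        return "merge"            # strong evidence: proceed to merge
    if avg_score >= low:
        return "check_registry"   # ambiguous: gather more evidence
    return "mark_distinct"        # weak evidence: treat as separate entities


for score in (0.95, 0.72, 0.40):
    print(score, "->", next_step(score))
```

A script would stop here. The agent differs in that it can revisit the decision after each new observation, for example downgrading a "merge" once the registry shows a conflicting address.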
Environment Setup
You need Python 3.10+ or Node.js 18+ and an Anthropic API key. Run these commands to create your project:
mkdir capstone-3-entity-resolution && cd capstone-3-entity-resolution
python -m venv venv && source venv/bin/activate # Windows: venv\Scripts\Activate.ps1
# Pin dependencies for reproducibility
echo "anthropic>=0.40.0" > requirements.txt
pip install -r requirements.txt
export ANTHROPIC_API_KEY=your-key-here # Windows PowerShell: $env:ANTHROPIC_API_KEY="your-key-here" (cmd.exe: set ANTHROPIC_API_KEY=your-key-here)
File Structure
capstone-3-entity-resolution/
├── entity_tools.py # Fuzzy matching + mock data tools
├── entity_agent.py # Entity resolution agent with ReAct loop
├── entity_tools.ts # TypeScript tools
├── entity_agent.ts # TypeScript agent
└── requirements.txt # Dependencies
Domain Glossary
ReAct Architecture
The agent follows a Thought → Action → Observation loop. Each cycle, it reasons about what it knows, what it needs, and which tool to call next. Here is a typical resolution trace:
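The sketch below is not the course's trace. It is a hypothetical, fully scripted walk-through of the loop's shape, with a stub policy standing in for Claude and stub tools standing in for the real ones. All names (`Step`, `run_react`, `stub_policy`, `STUB_TOOLS`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    thought: str       # why the agent chose this action
    action: str        # name of the tool that was called
    observation: dict  # what the tool returned


Decision = Optional[tuple[str, str, dict]]


def run_react(policy: Callable[[Optional[dict]], Decision],
              tools: dict[str, Callable[..., dict]],
              max_steps: int = 15) -> list[Step]:
    """Repeat Thought -> Action -> Observation until the policy stops."""
    trace: list[Step] = []
    observation: Optional[dict] = None
    for _ in range(max_steps):
        decision = policy(observation)          # Thought
        if decision is None:                    # enough evidence gathered
            break
        thought, action, kwargs = decision
        observation = tools[action](**kwargs)   # Action, then Observation
        trace.append(Step(thought, action, observation))
    return trace


def stub_policy(last: Optional[dict]) -> Decision:
    """Stand-in for the model: picks the next tool from the last result."""
    if last is None:
        return ("I have no candidates yet, so search first.",
                "search", {"name": "Acme Logistics LLC"})
    if "candidates" in last:
        return ("Compare the first candidate with the input name.",
                "score", {"a": "Acme Logistics LLC", "b": last["candidates"][0]})
    return None  # a score was observed; stop


STUB_TOOLS = {
    "search": lambda name: {"candidates": ["ACME LOGISTICS, L.L.C."]},
    "score": lambda a, b: {"score": 1.0},
}

trace = run_react(stub_policy, STUB_TOOLS)
for step in trace:
    print(f"Thought: {step.thought}\nAction: {step.action}\nObservation: {step.observation}\n")
```

The point of the shape is that each decision takes the previous observation as input, which is what separates a ReAct loop from a fixed pipeline.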
Mock Data Specification
{
"request_id": "ER-2024-0078",
"input_name": "Acme Logistics LLC",
"input_state": "DE",
"candidates": [
{"name": "ACME LOGISTICS, L.L.C.", "state": "DE", "filing_count": 3},
{"name": "Acme Logistics Company", "state": "DE", "filing_count": 1},
{"name": "Acme Logistics Inc.", "state": "NY", "filing_count": 2}
]
}
{
"entity_a": "Acme Logistics LLC",
"entity_b": "ACME LOGISTICS, L.L.C.",
"scores": {
"exact": 0.45,
"normalized": 0.95,
"token_sort_ratio": 0.98
},
"recommendation": "likely_match"
}
// Note: levenshtein and jaro_winkler are stretch metrics — install
// `rapidfuzz` (`pip install rapidfuzz`) and add them to the scores
// dict to enable. The base capstone uses three deterministic scores.
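To make the stretch metrics less abstract, here is a dependency-free sketch of what a Levenshtein-based similarity computes. It is a stand-in for illustration, not the rapidfuzz implementation; the function names `levenshtein_distance` and `levenshtein_similarity` are assumptions, and a real project would likely prefer the optimized library version.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return round(1 - levenshtein_distance(a, b) / longest, 2)


print(levenshtein_distance("kitten", "sitting"))    # 3
print(levenshtein_similarity("kitten", "sitting"))  # 0.57
```

A character-level score like this could be added to the scores dict alongside the three base metrics, which helps with spacing and typo variants that token-based scores miss.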
Implementation Phases
- Phase 1 — Mock Tools (60 min): Implement all 5 tools with mock data covering 3+ entities, name variations, and edge cases (registry not found, conflicting data).
- Phase 2 — System Prompt (30 min): Write a ReAct-style system prompt that instructs Claude to externalize reasoning, plan before acting, and adapt strategy based on intermediate results.
- Phase 3 — ReAct Loop (45 min): Implement the agentic loop with stop_reason checking. The loop must handle 5+ tool calls in a single resolution and branch based on fuzzy match results.
- Phase 4 — Branching Logic (30 min): Ensure the agent adapts: high confidence → merge directly. Ambiguous → gather more evidence from registry. Low confidence → mark as distinct.
- Phase 5 — Error Recovery (30 min): Handle mid-chain failures: registry unavailable, timeout on filing search. Agent should proceed with partial data and lower confidence.
- Phase 6 — Testing (45 min): Run 10 test cases verifying correct merges, correct separations, and proper handling of ambiguous/adversarial inputs.
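Phase 5's recovery behavior can be sketched with two small helpers. This is one possible approach under stated assumptions: the error dict shape (is_error, is_retryable, error_category, context) follows the tools in this capstone, while `call_with_retry`, `degrade_confidence`, the "TIMEOUT" category, and the 0.15 penalty are hypothetical choices made for illustration.

```python
import time
from typing import Callable


def call_with_retry(tool: Callable[..., dict], *args,
                    retries: int = 2, delay_s: float = 0.0) -> dict:
    """Call a tool and retry only while it reports is_retryable=True."""
    result = tool(*args)
    attempts = 0
    while result.get("is_error") and result.get("is_retryable") and attempts < retries:
        time.sleep(delay_s)
        attempts += 1
        result = tool(*args)
    return result


def degrade_confidence(base: float, missing_sources: int, penalty: float = 0.15) -> float:
    """Lower confidence for each evidence source that could not be reached."""
    return round(max(0.0, base - penalty * missing_sources), 2)


# Simulated registry that times out once, then succeeds.
calls = {"n": 0}


def flaky_registry(name: str, state: str) -> dict:
    calls["n"] += 1
    if calls["n"] == 1:
        return {"is_error": True, "error_category": "TIMEOUT",
                "is_retryable": True, "context": "simulated timeout"}
    return {"is_error": False, "entity_name": name, "state": state}


result = call_with_retry(flaky_registry, "Acme Logistics LLC", "DE")
print(result)
print(degrade_confidence(0.9, missing_sources=1))
```

Non-retryable errors such as NOT_FOUND pass straight through, so the agent can proceed with partial data and a reduced confidence instead of looping.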
Step 1: Create entity_tools.py
What: You will create a file containing 5 mock tool functions that simulate searching UCC filings, computing fuzzy match scores, retrieving filing details, querying business registries, and merging entity profiles.
Why: Mock tools let you develop and test the agent's reasoning logic without calling real APIs. Every tool returns structured data with explicit error handling, so the agent always knows whether a call succeeded or failed and can adapt its strategy accordingly.
Create the file entity_tools.py and paste the following code:
# entity_tools.py — Mock tools for entity resolution
import re
# --- Tool 1: Search filings by name ---
MOCK_CANDIDATES = {
"acme logistics llc": [
{"name": "ACME LOGISTICS, L.L.C.", "state": "DE", "filing_count": 3, "most_recent": "2024-01-15"},
{"name": "Acme Logistics Company", "state": "DE", "filing_count": 1, "most_recent": "2023-08-20"},
{"name": "Acme Logistics Inc.", "state": "NY", "filing_count": 2, "most_recent": "2023-12-01"},
],
"buildright construction": [
{"name": "BuildRight Construction LLC", "state": "NY", "filing_count": 1, "most_recent": "2024-01-10"},
{"name": "Build Right Construction", "state": "NY", "filing_count": 1, "most_recent": "2022-03-15"},
],
}
def search_filings_by_name(business_name: str, state: str | None = None, match_type: str = "fuzzy") -> dict:
    # NOTE: this mock looks names up by exact lowercase key; match_type is
    # accepted for schema compatibility but is not used.
    try:
        key = business_name.lower().strip()
        candidates = MOCK_CANDIDATES.get(key, [])
        if state:
            candidates = [c for c in candidates if c["state"] == state.upper()]
        if not candidates:
            return {"is_error": True, "error_category": "NO_RESULTS", "is_retryable": False,
                    "context": f"No filings found for '{business_name}'"}
        return {"is_error": False, "candidates": candidates, "total": len(candidates)}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}
# --- Tool 2: Fuzzy match score ---
# NOTE: punctuation is replaced with spaces BEFORE this regex runs, so
# "L.L.C." becomes "l l c" — match that form (with spaces), not "l.l.c."
_SUFFIX_RE = re.compile(
r'\b(llc|l l c|inc|incorporated|corp|corporation|company|co|ltd)\b'
)
def _normalize(name: str) -> str:
    name = name.lower().strip()
    name = re.sub(r'[,.\-]', ' ', name)
    name = re.sub(r'\s+', ' ', name)
    # Strip ONLY whole-word suffixes, not substrings: "Coca" must not lose
    # "co", and "incorporated" must be removed whole rather than cut to "inc"
    name = _SUFFIX_RE.sub('', name)
    name = re.sub(r'\s+', ' ', name).strip()
    return name
def fuzzy_match_score(entity_a: str, entity_b: str) -> dict:
    try:
        if not entity_a or not entity_b:
            return {"is_error": True, "error_category": "EMPTY_INPUT", "is_retryable": False, "context": "Both entities required"}
        norm_a, norm_b = _normalize(entity_a), _normalize(entity_b)
        # "exact": 1.0 on a case-insensitive match; otherwise the share of
        # entity_a's distinct characters that also occur in entity_b
        # (a crude character-set overlap, not an edit distance)
        chars_a, chars_b = set(entity_a.lower()), set(entity_b.lower())
        exact = 1.0 if entity_a.lower() == entity_b.lower() else round(len(chars_a & chars_b) / max(len(chars_a), 1), 2)
        # "normalized": Jaccard overlap of tokens after suffix stripping
        tokens_a, tokens_b = set(norm_a.split()), set(norm_b.split())
        normalized = 1.0 if norm_a == norm_b else round(len(tokens_a & tokens_b) / max(len(tokens_a | tokens_b), 1), 2)
        # "token_sort_ratio": mock stand-in that boosts high overlap by 3%, capped at 1.0
        token_sort = round(normalized * 1.03, 2) if normalized > 0.8 else normalized
        token_sort = min(token_sort, 1.0)
        avg = round((exact + normalized + token_sort) / 3, 2)
        if avg >= 0.85:
            rec = "likely_match"
        elif avg >= 0.65:
            rec = "possible_match"
        else:
            rec = "unlikely_match"
        return {"is_error": False, "entity_a": entity_a, "entity_b": entity_b,
                "scores": {"exact": exact, "normalized": normalized, "token_sort_ratio": token_sort},
                "recommendation": rec}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}
# --- Tool 3: Get filing details ---
MOCK_FILINGS = {
("acme logistics, l.l.c.", "DE"): {"filings": [
{"filing_number": "2023-1234567", "secured_party": "First National Bank", "collateral": "All inventory and equipment", "status": "active", "estimated_amount": 750_000},
{"filing_number": "2022-9876543", "secured_party": "Delaware Capital Partners", "collateral": "All vehicles", "status": "active", "estimated_amount": 350_000},
{"filing_number": "2024-0011223", "secured_party": "First National Bank", "collateral": "Accounts receivable", "status": "active", "estimated_amount": 1_200_000},
]},
}
def get_filing_details(business_name: str, state: str) -> dict:
    try:
        key = (business_name.lower().strip(), state.upper())
        result = MOCK_FILINGS.get(key)
        if result:
            return {"is_error": False, **result}
        return {"is_error": False, "filings": []}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}
# --- Tool 4: Business registry ---
MOCK_REGISTRY = {
("acme logistics llc", "DE"): {
"entity_name": "Acme Logistics LLC", "state": "DE", "entity_type": "LLC",
"file_number": "DE-LLC-2019-4567890", "formation_date": "2019-03-15",
"status": "active", "principal_address": "456 Commerce Blvd, Dover, DE 19901",
},
("acme logistics inc.", "NY"): {
"entity_name": "Acme Logistics Inc.", "state": "NY", "entity_type": "Corporation",
"file_number": "NY-CORP-2020-1234567", "formation_date": "2020-07-01",
"status": "active", "principal_address": "100 Broadway, New York, NY 10001",
},
}
def get_business_registry_data(business_name: str, state: str) -> dict:
    try:
        key = (business_name.lower().strip(), state.upper())
        result = MOCK_REGISTRY.get(key)
        if result:
            return {"is_error": False, **result}
        return {"is_error": True, "error_category": "NOT_FOUND", "is_retryable": False,
                "context": f"No registry entry for '{business_name}' in {state}"}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}
# --- Tool 5: Merge entity profile ---
def merge_entity_profile(primary_entity: dict, merge_candidates: list, confidence: float) -> dict:
    try:
        if confidence < 0.5:
            return {"is_error": True, "error_category": "INSUFFICIENT_EVIDENCE",
                    "is_retryable": False, "context": f"Confidence {confidence} below 0.5 threshold"}
        total_filings = sum(c.get("filing_count", 0) for c in merge_candidates) + primary_entity.get("filing_count", 0)
        total_lien_exposure = sum(c.get("estimated_amount", 0) for c in merge_candidates) + primary_entity.get("estimated_amount", 0)
        return {"is_error": False,
                "merged_profile_id": f"MP-{primary_entity.get('name', 'unknown')[:10]}-{confidence:.0%}",
                "canonical_name": primary_entity.get("name", "Unknown"),
                "total_filings": total_filings,
                "total_lien_exposure": total_lien_exposure,
                "states": list(set([primary_entity.get("state", "")] + [c.get("state", "") for c in merge_candidates])),
                "confidence": confidence,
                "merge_log": [f"Merged '{c.get('name', '')}' (score: {c.get('match_score', 'N/A')})" for c in merge_candidates]}
    except Exception as e:
        return {"is_error": True, "error_category": "INTERNAL_ERROR", "is_retryable": True, "context": str(e)}
// entity_tools.ts — Mock tools for entity resolution
interface ToolResult { is_error: boolean; [key: string]: unknown; }
const MOCK_CANDIDATES: Record<string, Array<{name:string;state:string;filing_count:number;most_recent:string}>> = {
"acme logistics llc": [
{name:"ACME LOGISTICS, L.L.C.",state:"DE",filing_count:3,most_recent:"2024-01-15"},
{name:"Acme Logistics Company",state:"DE",filing_count:1,most_recent:"2023-08-20"},
{name:"Acme Logistics Inc.",state:"NY",filing_count:2,most_recent:"2023-12-01"},
],
"buildright construction": [
{name:"BuildRight Construction LLC",state:"NY",filing_count:1,most_recent:"2024-01-10"},
{name:"Build Right Construction",state:"NY",filing_count:1,most_recent:"2022-03-15"},
],
};
export function searchFilingsByName(name: string, state?: string): ToolResult {
try {
let cands = MOCK_CANDIDATES[name.toLowerCase().trim()] ?? [];
if (state) cands = cands.filter(c => c.state === state.toUpperCase());
if (!cands.length) return {is_error:true,error_category:"NO_RESULTS",is_retryable:false,context:`No filings for '${name}'`};
return {is_error:false,candidates:cands,total:cands.length};
} catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}
function normalize(n: string): string {
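return n.toLowerCase().replace(/[,.\-]/g,' ').replace(/\s+/g,' ')
.replace(/\b(llc|l l c|inc|incorporated|corp|corporation|company|co|ltd)\b/g,'')
.replace(/\s+/g,' ').trim(); // collapse gaps left by removed suffixes so split(' ') yields no empty tokens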
}
export function fuzzyMatchScore(a: string, b: string): ToolResult {
try {
if (!a || !b) return {is_error:true,error_category:"EMPTY_INPUT",is_retryable:false,context:"Both entities required"};
const na = normalize(a), nb = normalize(b);
const tokensA = new Set(na.split(' ')), tokensB = new Set(nb.split(' '));
const inter = [...tokensA].filter(t => tokensB.has(t)).length;
const union = new Set([...tokensA,...tokensB]).size;
const normalized = Math.round((inter / Math.max(union,1)) * 100) / 100;
const tokenSort = normalized > 0.8 ? Math.min(Math.round(normalized * 103) / 100, 1) : normalized; // boost only high overlap, matching the Python version
const avg = Math.round(((normalized + tokenSort) / 2) * 100) / 100;
const rec = avg >= 0.85 ? "likely_match" : avg >= 0.65 ? "possible_match" : "unlikely_match";
return {is_error:false,entity_a:a,entity_b:b,scores:{normalized,token_sort_ratio:tokenSort},recommendation:rec};
} catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}
const MOCK_FILINGS: Record<string, {filings: Array<{filing_number:string;secured_party:string;collateral:string;status:string;estimated_amount:number}>}> = {
"acme logistics, l.l.c.|DE": {filings:[
{filing_number:"2023-1234567",secured_party:"First National Bank",collateral:"All inventory and equipment",status:"active",estimated_amount:750000},
{filing_number:"2022-9876543",secured_party:"Delaware Capital Partners",collateral:"All vehicles",status:"active",estimated_amount:350000},
{filing_number:"2024-0011223",secured_party:"First National Bank",collateral:"Accounts receivable",status:"active",estimated_amount:1200000},
]},
};
export function getFilingDetails(businessName: string, state: string): ToolResult {
try {
const key = `${businessName.toLowerCase().trim()}|${state.toUpperCase()}`;
const r = MOCK_FILINGS[key];
if (r) return {is_error:false,...r};
return {is_error:false,filings:[]};
} catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}
const MOCK_REGISTRY: Record<string, object> = {
"acme logistics llc|DE": {entity_name:"Acme Logistics LLC",state:"DE",entity_type:"LLC",file_number:"DE-LLC-2019-4567890",status:"active",principal_address:"456 Commerce Blvd, Dover, DE 19901"},
"acme logistics inc.|NY": {entity_name:"Acme Logistics Inc.",state:"NY",entity_type:"Corporation",file_number:"NY-CORP-2020-1234567",status:"active",principal_address:"100 Broadway, New York, NY 10001"},
};
export function getBusinessRegistryData(name: string, state: string): ToolResult {
try {
const r = MOCK_REGISTRY[`${name.toLowerCase().trim()}|${state.toUpperCase()}`];
if (r) return {is_error:false,...r};
return {is_error:true,error_category:"NOT_FOUND",is_retryable:false,context:`No registry for '${name}' in ${state}`};
} catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}
export function mergeEntityProfile(primary: {name:string;state:string;filing_count?:number;estimated_amount?:number}, candidates: Array<{name:string;state:string;match_score?:number;filing_count?:number;estimated_amount?:number}>, confidence: number): ToolResult {
try {
if (confidence < 0.5) return {is_error:true,error_category:"INSUFFICIENT_EVIDENCE",is_retryable:false,context:`Confidence ${confidence} below threshold`};
const total = (primary.filing_count ?? 0) + candidates.reduce((s,c) => s + (c.filing_count ?? 0), 0);
const totalLienExposure = (primary.estimated_amount ?? 0) + candidates.reduce((s,c) => s + (c.estimated_amount ?? 0), 0);
return {is_error:false,canonical_name:primary.name,total_filings:total,total_lien_exposure:totalLienExposure,confidence,
states:[...new Set([primary.state,...candidates.map(c=>c.state)])],
merge_log:candidates.map(c => `Merged '${c.name}' (score: ${c.match_score ?? 'N/A'})`)};
} catch(e) { return {is_error:true,error_category:"INTERNAL_ERROR",is_retryable:true,context:String(e)}; }
}
Quick verify — run this in your terminal:
python -c "from entity_tools import search_filings_by_name; print(search_filings_by_name('Acme Logistics LLC'))"
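Expected output:
{'is_error': False, 'candidates': [{'name': 'ACME LOGISTICS, L.L.C.', 'state': 'DE', 'filing_count': 3, 'most_recent': '2024-01-15'}, {'name': 'Acme Logistics Company', 'state': 'DE', 'filing_count': 1, 'most_recent': '2023-08-20'}, {'name': 'Acme Logistics Inc.', 'state': 'NY', 'filing_count': 2, 'most_recent': '2023-12-01'}], 'total': 3}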
You built 5 interconnected tools. The agent will call them in sequence based on its reasoning: first search_filings_by_name to find candidates, then fuzzy_match_score to compare each pair, then get_business_registry_data to verify matches with official records, and finally merge_entity_profile to create the unified profile. The fuzzy matching uses a simplified name normalization that strips legal suffixes (LLC, Inc, Corp) before comparing, because "Acme Logistics LLC" and "Acme Logistics Inc." share the same base name and differ only in their entity type designation. A high name score alone does not prove they are the same company; that is what the registry lookup is for.
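To see the suffix stripping in action, the sketch below mirrors the normalization logic from entity_tools.py in a standalone form and applies it to the four Acme variants from the project brief. The function names `normalize` and `jaccard` are chosen here for illustration only.

```python
import re

SUFFIXES = re.compile(r"\b(llc|l l c|inc|incorporated|corp|corporation|company|co|ltd)\b")


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip whole-word legal suffixes."""
    name = re.sub(r"[,.\-]", " ", name.lower())
    name = re.sub(r"\s+", " ", name)
    name = SUFFIXES.sub("", name)
    return re.sub(r"\s+", " ", name).strip()


def jaccard(a: str, b: str) -> float:
    """Token overlap of the normalized names: |A and B| / |A or B|."""
    tokens_a, tokens_b = set(normalize(a).split()), set(normalize(b).split())
    if not tokens_a and not tokens_b:
        return 1.0
    return round(len(tokens_a & tokens_b) / len(tokens_a | tokens_b), 2)


names = ["Acme Logistics LLC", "ACME LOGISTICS, L.L.C.",
         "Acme Logistics Company", "Acme Logistics Inc."]
for n in names:
    print(f"{n!r} -> {normalize(n)!r}")   # each one becomes 'acme logistics'

print(jaccard("BuildRight Construction LLC", "Build Right Construction"))  # 0.25
```

The last line shows a limit of token-based scoring: "BuildRight" and "Build Right" share only one of four distinct tokens, so a spacing variant scores low even though a human would consider it a near match.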
If you see ModuleNotFoundError: No module named 'entity_tools' when testing the import, make sure you saved the file as entity_tools.py (not entity-tools.py) and your terminal is in the same directory as the file.
Step 2: Create entity_agent.py
What: You will create the ReAct agent that defines tool schemas for Claude, writes a system prompt encoding entity resolution decision criteria, and implements the agentic loop that routes tool calls to your mock functions.
Why: This is the core of the capstone — the agent must reason about which tool to call next based on intermediate results. High fuzzy scores lead directly to merge; ambiguous scores trigger registry lookups for more evidence; low scores mark entities as distinct. The loop caps at 15 iterations to prevent runaway resolution chains.
Create the file entity_agent.py and paste the following code:
# entity_agent.py — ReAct Entity Resolution Agent
import anthropic, json
from entity_tools import (search_filings_by_name, fuzzy_match_score,
get_filing_details, get_business_registry_data, merge_entity_profile)
SYSTEM_PROMPT = """You are an Entity Resolution Agent for a commercial data provider.
Given a business entity name, you determine whether UCC filing records
under different name variations refer to the same real-world company.
## Your Reasoning Process (ReAct)
For each resolution request, think step by step:
1. THINK: What do I know? What do I need to find out?
2. ACT: Call the appropriate tool to gather evidence.
3. OBSERVE: What did the tool return? What does it tell me?
4. REPEAT until you have enough evidence to make a decision.
## Decision Criteria
- token_sort_ratio >= 0.90 AND same state AND same address → MERGE (high confidence)
- token_sort_ratio >= 0.80 AND same state → LIKELY MERGE (verify with registry)
- token_sort_ratio >= 0.70 AND different state → INVESTIGATE (check registry)
- token_sort_ratio < 0.70 → DISTINCT ENTITY
## Output Format
After resolution, call merge_entity_profile with your findings.
Include a confidence score (0.0-1.0) based on evidence strength.
Explain your reasoning for each merge/separate decision.
## Rules
- Always check the business registry for ambiguous matches.
- If registry data is unavailable, lower your confidence accordingly.
- Never force a merge — flag conflicts for human review.
- If there are 10+ candidates, filter by state before matching."""
TOOLS = [
{"name": "search_filings_by_name", "description": "Search UCC filings by business name across states. Returns candidate entities with filing counts.",
"input_schema": {"type": "object", "properties": {"business_name": {"type": "string"}, "state": {"type": "string"}, "match_type": {"type": "string", "enum": ["exact", "fuzzy"]}}, "required": ["business_name"]}},
{"name": "fuzzy_match_score", "description": "Compute similarity scores between two entity names using multiple algorithms.",
"input_schema": {"type": "object", "properties": {"entity_a": {"type": "string"}, "entity_b": {"type": "string"}}, "required": ["entity_a", "entity_b"]}},
{"name": "get_filing_details", "description": "Get full filing details for an entity in a state.",
"input_schema": {"type": "object", "properties": {"business_name": {"type": "string"}, "state": {"type": "string"}}, "required": ["business_name", "state"]}},
{"name": "get_business_registry_data", "description": "Cross-reference entity against official SOS business registry.",
"input_schema": {"type": "object", "properties": {"business_name": {"type": "string"}, "state": {"type": "string"}}, "required": ["business_name", "state"]}},
{"name": "merge_entity_profile", "description": "Create a merged entity profile from confirmed matches.",
"input_schema": {"type": "object", "properties": {"primary_entity": {"type": "object"}, "merge_candidates": {"type": "array"}, "confidence": {"type": "number"}}, "required": ["primary_entity", "merge_candidates", "confidence"]}},
]
HANDLERS = {
"search_filings_by_name": lambda a: search_filings_by_name(a["business_name"], a.get("state"), a.get("match_type", "fuzzy")),
"fuzzy_match_score": lambda a: fuzzy_match_score(a["entity_a"], a["entity_b"]),
"get_filing_details": lambda a: get_filing_details(a["business_name"], a["state"]),
"get_business_registry_data": lambda a: get_business_registry_data(a["business_name"], a["state"]),
"merge_entity_profile": lambda a: merge_entity_profile(a["primary_entity"], a["merge_candidates"], a["confidence"]),
}
def run_entity_agent(query: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY env var
    messages = [{"role": "user", "content": query}]
    for _ in range(15):  # entity resolution may need many tool calls
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6", max_tokens=4096,
                system=SYSTEM_PROMPT, tools=TOOLS, messages=messages,
            )
        except anthropic.APIError as e:
            return f"Error: {e}"
        if response.stop_reason == "tool_use":
            results = []
            for block in response.content:
                if block.type == "tool_use":
                    handler = HANDLERS.get(block.name)
                    result = handler(block.input) if handler else {"is_error": True, "error_category": "UNKNOWN_TOOL"}
                    results.append({"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(result)})
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": results})
        elif response.stop_reason == "end_turn":
            return " ".join(b.text for b in response.content if hasattr(b, "text"))
        else:
            # e.g. "max_tokens": stop instead of resending the same messages
            return f"Stopped early (stop_reason={response.stop_reason})."
    return "Resolution exceeded maximum iterations."

if __name__ == "__main__":
    result = run_entity_agent("Resolve entity: Acme Logistics LLC in Delaware. Check for name variations across states.")
    print(result)
// entity_agent.ts — ReAct Entity Resolution Agent
import Anthropic from "@anthropic-ai/sdk";
import { searchFilingsByName, fuzzyMatchScore, getFilingDetails, getBusinessRegistryData, mergeEntityProfile } from "./entity_tools.js";
const SYSTEM_PROMPT = `You are an Entity Resolution Agent for a commercial data provider.
Given a business entity name, you determine whether UCC filing records
under different name variations refer to the same real-world company.
## Your Reasoning Process (ReAct)
For each resolution request, think step by step:
1. THINK: What do I know? What do I need to find out?
2. ACT: Call the appropriate tool to gather evidence.
3. OBSERVE: What did the tool return? What does it tell me?
4. REPEAT until you have enough evidence to make a decision.
## Decision Criteria
- token_sort_ratio >= 0.90 AND same state AND same address → MERGE (high confidence)
- token_sort_ratio >= 0.80 AND same state → LIKELY MERGE (verify with registry)
- token_sort_ratio >= 0.70 AND different state → INVESTIGATE (check registry)
- token_sort_ratio < 0.70 → DISTINCT ENTITY
## Output Format
After resolution, call merge_entity_profile with your findings.
Include a confidence score (0.0-1.0) based on evidence strength.
Explain your reasoning for each merge/separate decision.
## Rules
- Always check the business registry for ambiguous matches.
- If registry data is unavailable, lower your confidence accordingly.
- Never force a merge — flag conflicts for human review.
- If there are 10+ candidates, filter by state before matching.`;
const TOOLS: Anthropic.Tool[] = [
{name:"search_filings_by_name",description:"Search filings by name across states",input_schema:{type:"object" as const,properties:{business_name:{type:"string"},state:{type:"string"},match_type:{type:"string",enum:["exact","fuzzy"]}},required:["business_name"]}},
{name:"fuzzy_match_score",description:"Compute similarity between two entity names",input_schema:{type:"object" as const,properties:{entity_a:{type:"string"},entity_b:{type:"string"}},required:["entity_a","entity_b"]}},
{name:"get_filing_details",description:"Get full filing details for an entity in a state",input_schema:{type:"object" as const,properties:{business_name:{type:"string"},state:{type:"string"}},required:["business_name","state"]}},
{name:"get_business_registry_data",description:"Check official SOS registration",input_schema:{type:"object" as const,properties:{business_name:{type:"string"},state:{type:"string"}},required:["business_name","state"]}},
{name:"merge_entity_profile",description:"Create merged profile from matches",input_schema:{type:"object" as const,properties:{primary_entity:{type:"object"},merge_candidates:{type:"array"},confidence:{type:"number"}},required:["primary_entity","merge_candidates","confidence"]}},
];
type AnyArgs = Record<string, unknown>;
const H: Record<string, (a: AnyArgs) => unknown> = {
search_filings_by_name: a => searchFilingsByName(a.business_name as string, a.state as string | undefined),
fuzzy_match_score: a => fuzzyMatchScore(a.entity_a as string, a.entity_b as string),
get_filing_details: a => getFilingDetails(a.business_name as string, a.state as string),
get_business_registry_data: a => getBusinessRegistryData(a.business_name as string, a.state as string),
merge_entity_profile: a => mergeEntityProfile(
a.primary_entity as {name:string;state:string},
a.merge_candidates as Array<{name:string;state:string;match_score?:number;filing_count?:number}>,
a.confidence as number),
};
export async function runEntityAgent(query: string): Promise<string> {
const client = new Anthropic();
const messages: Anthropic.MessageParam[] = [{role:"user",content:query}];
for (let i = 0; i < 15; i++) {
let resp: Anthropic.Message;
try {
resp = await client.messages.create({model:"claude-sonnet-4-6",max_tokens:4096,system:SYSTEM_PROMPT,tools:TOOLS,messages});
} catch(e) { return `Error: ${e}`; }
if (resp.stop_reason === "tool_use") {
const results: Anthropic.ToolResultBlockParam[] = [];
for (const b of resp.content) {
if (b.type === "tool_use") {
const h = H[b.name];
const r = h ? h(b.input as AnyArgs) : {is_error:true,error_category:"UNKNOWN"};
results.push({type:"tool_result",tool_use_id:b.id,content:JSON.stringify(r)});
}
}
messages.push({role:"assistant",content:resp.content});
messages.push({role:"user",content:results});
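} else if (resp.stop_reason === "end_turn") {
return resp.content.filter((b): b is Anthropic.TextBlock => b.type === "text").map(b => b.text).join(" ");
} else {
// e.g. "max_tokens": stop instead of resending the same messages
return `Stopped early (stop_reason=${resp.stop_reason}).`;
}
}
return "Resolution exceeded maximum iterations.";
}
// Entry point so that `npx tsx entity_agent.ts` prints a result
runEntityAgent("Resolve entity: Acme Logistics LLC in Delaware. Check for name variations across states.").then(console.log);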
You built a ReAct entity resolution agent that: (1) searches for name variations using fuzzy matching, (2) computes similarity scores between each candidate pair, (3) cross-references official business registries for verification, and (4) merges confirmed matches into a unified profile with a confidence score. The system prompt encodes the decision criteria (what threshold = merge vs. investigate vs. distinct), and the agent adapts its strategy based on intermediate results. The safety cap is 15 iterations because entity resolution can require 8-12 tool calls for complex cases.
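The tool-routing step of the loop can be exercised offline, without an API key, by feeding it fake content blocks. The sketch below is one way to do that under stated assumptions: `dispatch_tool_calls`, the "BAD_ARGUMENTS" category, and the fake blocks built with SimpleNamespace are hypothetical, and only the block fields the loop reads (type, id, name, input) are imitated. It also sets the optional is_error flag on each tool_result so failures are marked explicitly.

```python
import json
from types import SimpleNamespace


def dispatch_tool_calls(content_blocks, handlers) -> list[dict]:
    """Turn tool_use blocks into tool_result dicts, never raising."""
    results = []
    for block in content_blocks:
        if getattr(block, "type", None) != "tool_use":
            continue  # skip text blocks that carry the model's reasoning
        handler = handlers.get(block.name)
        if handler is None:
            result = {"is_error": True, "error_category": "UNKNOWN_TOOL",
                      "is_retryable": False, "context": block.name}
        else:
            try:
                result = handler(block.input)
            except (KeyError, TypeError) as exc:
                # The model omitted or mistyped an argument.
                result = {"is_error": True, "error_category": "BAD_ARGUMENTS",
                          "is_retryable": True, "context": repr(exc)}
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": json.dumps(result),
                        "is_error": bool(result.get("is_error"))})
    return results


fake_handlers = {
    "fuzzy_match_score": lambda a: {"is_error": False, "pair": [a["entity_a"], a["entity_b"]]},
}
fake_blocks = [
    SimpleNamespace(type="text", text="Thought: compare the two names."),
    SimpleNamespace(type="tool_use", id="t1", name="fuzzy_match_score",
                    input={"entity_a": "A", "entity_b": "B"}),
    SimpleNamespace(type="tool_use", id="t2", name="fuzzy_match_score",
                    input={"entity_a": "A"}),  # missing entity_b
    SimpleNamespace(type="tool_use", id="t3", name="no_such_tool", input={}),
]
results = dispatch_tool_calls(fake_blocks, fake_handlers)
for r in results:
    print(r["tool_use_id"], r["is_error"], r["content"])
```

Catching argument errors here matters because an uncaught KeyError inside a handler would otherwise end the whole resolution instead of letting the agent correct its call on the next turn.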
Step 3: Test the Entity Resolution Agent
What: Run the agent end-to-end to resolve "Acme Logistics LLC" across states and observe the full ReAct reasoning trace.
Why: Testing confirms that the agentic loop correctly routes tool calls, that the agent reasons about fuzzy match scores before deciding to merge or separate, and that the final merged profile includes all expected data.
Run the agent from your terminal:
python entity_agent.py
npx tsx entity_agent.ts
Expected output (your exact wording will vary — the agent reasons in natural language):
Your agent successfully resolved entity variations by reasoning through multiple tool calls. It merged the two Delaware entities (high fuzzy match + same registry) and correctly kept the New York entity separate (different address, different registry entry). The confidence score reflects the strength of the evidence gathered.
Verify Everything Works
Run this end-to-end verification to confirm all components work together:
# e2e_test.py — Quick verification script
from entity_tools import (search_filings_by_name, fuzzy_match_score,
get_filing_details, get_business_registry_data, merge_entity_profile)
# 1. Search works
result = search_filings_by_name("Acme Logistics LLC")
assert not result["is_error"], "Search failed"
assert result["total"] == 3, f"Expected 3 candidates, got {result['total']}"
print("Search: OK — 3 candidates found")
# 2. Fuzzy matching works
score = fuzzy_match_score("Acme Logistics LLC", "ACME LOGISTICS, L.L.C.")
assert not score["is_error"], "Fuzzy match failed"
assert score["recommendation"] == "likely_match", f"Expected likely_match, got {score['recommendation']}"
print(f"Fuzzy: OK — {score['recommendation']} (token_sort: {score['scores']['token_sort_ratio']})")
# 3. Registry lookup works
reg = get_business_registry_data("Acme Logistics LLC", "DE")
assert not reg["is_error"], "Registry lookup failed"
print(f"Registry: OK — {reg['entity_name']} in {reg['state']}")
# 4. Merge works
merged = merge_entity_profile(
{"name": "Acme Logistics LLC", "state": "DE", "filing_count": 3},
[{"name": "ACME LOGISTICS, L.L.C.", "state": "DE", "match_score": 0.98, "filing_count": 1}],
confidence=0.94
)
assert not merged["is_error"], "Merge failed"
print(f"Merge: OK — {merged['canonical_name']}, confidence: {merged['confidence']}")
# 5. Error handling works
empty = search_filings_by_name("Nonexistent Corp XYZ")
assert empty["is_error"], "Expected error for unknown entity"
print(f"Errors: OK — graceful handling for unknown entities")
print("\nAll checks passed. Run 'python entity_agent.py' for the full agent test.")
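Expected output: five status lines beginning with Search: OK, Fuzzy: OK, Registry: OK, Merge: OK, and Errors: OK, followed by the "All checks passed" message. If any assertion fails, the script stops with an AssertionError that names the failing step.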
Testing Guide
| Type | Scenario | Expected Behavior |
|---|---|---|
| Happy | Two DE name variants (LLC vs L.L.C.) | Merges with high confidence (>0.9), cites same address |
| Happy | Three candidates: 2 match, 1 distinct | Merges 2 DE entities, keeps NY entity separate |
| Happy | Legal suffix difference only | High match score, quick merge |
| Happy | Entity with filings in multiple states | Cross-references registry to confirm single entity |
| Happy | Clean resolution, no conflicts | Merged profile with all filings consolidated |
| Edge | Similar names, different companies in same state | Distinguishes using address and registry data |
| Edge | Registry returns NOT_FOUND for one candidate | Proceeds with available data, lowers confidence |
| Edge | "possible_match" fuzzy score | Gathers additional evidence before deciding |
| Adversarial | Common name with 50+ candidates | Filters by state, limits scope, does not attempt all |
| Adversarial | Same name, same state, different file numbers | Flags conflict for human review, does not force merge |
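The scenarios in the table can be read as a small decision rule. The sketch below is one hypothetical way to encode them so that test cases can be written against it. The function name `expected_action`, its parameters, the returned labels, and the "FOUND" registry status are all assumptions made for illustration; only "likely_match", "possible_match", and "NOT_FOUND" appear in this capstone.

```python
def expected_action(fuzzy_recommendation, same_address, registry_status,
                    file_numbers_conflict=False):
    """Map gathered evidence to the behavior the testing guide expects.

    Hypothetical helper: the labels returned here are illustrative and are
    not part of entity_tools.py or entity_agent.py.
    """
    # Same name and state but different file numbers: never force a merge.
    if file_numbers_conflict:
        return "flag_for_review"
    # An ambiguous fuzzy score means the agent should keep investigating.
    if fuzzy_recommendation == "possible_match":
        return "gather_more_evidence"
    # A strong name match plus a shared address supports a merge.
    if fuzzy_recommendation == "likely_match" and same_address:
        if registry_status == "NOT_FOUND":
            return "merge_with_lower_confidence"
        return "merge"
    # Anything else is treated conservatively.
    return "keep_separate"


# Two Delaware variants with the same address and a registry hit.
print(expected_action("likely_match", True, "FOUND"))
# New York candidate with a different address.
print(expected_action("likely_match", False, "FOUND"))
```

Defaulting to "keep_separate" mirrors the conservative stance the capstone takes on merges: a wrong merge is treated as more costly than a missed one.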
Troubleshooting
Common issues and how to fix them:
ModuleNotFoundError: No module named 'anthropic'
You have not installed the Anthropic SDK. Run pip install anthropic in your activated virtual environment. If you are using Node.js, run npm install @anthropic-ai/sdk.
AuthenticationError: Invalid API key
Your ANTHROPIC_API_KEY environment variable is not set or contains an invalid key. Verify with echo $ANTHROPIC_API_KEY (Linux/Mac) or echo %ANTHROPIC_API_KEY% (Windows). The key should start with sk-ant-. Re-export it if needed: export ANTHROPIC_API_KEY=sk-ant-...
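If you prefer to check the key from Python rather than the shell, a minimal sketch follows. The helper name `check_api_key` and its return labels are assumptions for illustration. It only inspects the environment variable and the sk-ant- prefix mentioned above, so a result of "ok" does not prove the key is valid with the API.

```python
import os


def check_api_key(env=None):
    """Return a short diagnosis of the ANTHROPIC_API_KEY setting.

    Hypothetical helper: checks presence and prefix only, never calls the API.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key:
        return "missing"
    if key != key.strip():
        # Copy-paste errors often leave a trailing space or newline.
        return "has_whitespace"
    if not key.startswith("sk-ant-"):
        return "unexpected_prefix"
    return "ok"


print(f"ANTHROPIC_API_KEY check: {check_api_key()}")
```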
ImportError: cannot import name 'search_filings_by_name' from 'entity_tools'
Both entity_tools.py and entity_agent.py must be in the same directory. Also check that you saved the tools file as entity_tools.py (underscores, not hyphens) and that it contains all 5 function definitions.
Agent loops without resolving (hits 15 iterations)
If the agent keeps calling tools without reaching a conclusion, check: (1) your MOCK_CANDIDATES data contains entries for the query you are testing, (2) the system prompt decision criteria thresholds are present, and (3) the merge_entity_profile tool is included in the TOOLS list so the agent can finalize its resolution. You can lower the iteration cap from 15 to 10 during debugging to fail faster.
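When diagnosing a looping run, it can help to summarize which tools were called before the cap was reached. The sketch below assumes you have collected the run's tool calls as a list of (tool_name, arguments_as_JSON_string) pairs, one per iteration. That recording format and the helper name `diagnose_tool_calls` are assumptions and are not part of entity_agent.py.

```python
from collections import Counter


def diagnose_tool_calls(calls, max_iterations=15):
    """Summarize one agent run from its recorded tool calls.

    `calls` is a list of (tool_name, args_json) tuples, assumed to hold
    one entry per loop iteration.
    """
    names = [name for name, _ in calls]
    # Identical name + arguments repeated usually means the agent is stuck.
    repeats = {call: n for call, n in Counter(calls).items() if n > 1}
    return {
        "iterations_used": len(calls),
        "hit_cap": len(calls) >= max_iterations,
        "merge_called": "merge_entity_profile" in names,
        "repeated_calls": repeats,
    }


example = [("search_filings_by_name", '{"name": "Acme Logistics LLC"}')] * 15
print(diagnose_tool_calls(example))
```

A run that hits the cap with many repeated calls and no call to merge_entity_profile points toward checks (1) and (3) above.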
Compliance & Regulatory Notes
False positive risk: Merging two distinct entities into one creates incorrect risk profiles. A lender relying on a false merge might deny a loan to a clean company because it was conflated with a heavily-liened entity. Always prefer conservative merges with clear evidence.
FCRA implications: If merged entity profiles are used for credit decisions, accuracy requirements under the Fair Credit Reporting Act apply. Incorrect merges could trigger disputes and regulatory action.
Audit trail: Every merge decision must be logged with the evidence that supported it (match scores, registry data, address comparisons). The merge_log in the output serves this purpose.
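As an illustration of the audit-trail requirement, the sketch below builds one log record that ties a decision to its evidence. The field names and the helper `build_merge_log_entry` are hypothetical. The actual shape of merge_log is defined by your merge_entity_profile implementation and may differ.

```python
import json
from datetime import datetime, timezone

ALLOWED_DECISIONS = {"merge", "keep_separate", "needs_review"}


def build_merge_log_entry(canonical_name, merged_names, decision, confidence, evidence):
    """Return a JSON-serializable audit record for one resolution decision.

    Hypothetical schema: adapt the field names to your own merge_log.
    """
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    if not evidence:
        # A decision with no supporting evidence cannot be audited later.
        raise ValueError("at least one piece of evidence is required")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "canonical_name": canonical_name,
        "merged_names": list(merged_names),
        "decision": decision,
        "confidence": confidence,
        "evidence": list(evidence),
    }


entry = build_merge_log_entry(
    "Acme Logistics LLC",
    ["ACME LOGISTICS, L.L.C."],
    "merge",
    0.94,
    [
        {"type": "fuzzy_match", "detail": "likely_match"},
        {"type": "registry", "detail": "same DE registry entry"},
    ],
)
print(json.dumps(entry, indent=2))
```

Rejecting entries with empty evidence makes the logging step itself enforce the rule that every merge must be explainable.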
Going Further
- [OPTIONAL] Stretch: SOS validation tool — Add validate_sos_registration to check entity standing (active, dissolved, revoked) for additional merge evidence.
- Address matching — Add a tool that normalizes and compares addresses, since "456 Commerce Blvd" vs "456 Commerce Boulevard" should match.
- Multi-agent pipeline — Split into coordinator + subagents: one for filing search, one for fuzzy matching, one for registry verification. This leads to Capstone 4.
- Confidence calibration — Track merge decisions over time and calibrate confidence scores against human reviewer decisions.
- Batch resolution — Use the Message Batches API (M25) to resolve thousands of entities at 50% cost reduction.
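For the address matching idea, a minimal normalization sketch might look like the following. The abbreviation table and the function names `normalize_address` and `addresses_match` are assumptions for illustration. A production tool would need a much fuller abbreviation list and care with ambiguous tokens (for example, "St" can mean "Street" or "Saint").

```python
import re

# Small, illustrative abbreviation table; not exhaustive.
ABBREVIATIONS = {
    "blvd": "boulevard",
    "st": "street",
    "ave": "avenue",
    "rd": "road",
    "dr": "drive",
    "ste": "suite",
}


def normalize_address(address):
    """Lowercase, strip punctuation, and expand common street abbreviations."""
    cleaned = re.sub(r"[^\w\s]", " ", address.lower())
    tokens = cleaned.split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def addresses_match(a, b):
    """Exact comparison after normalization; no fuzzy scoring is applied."""
    return normalize_address(a) == normalize_address(b)


print(addresses_match("456 Commerce Blvd", "456 Commerce Boulevard"))  # True
print(addresses_match("456 Commerce Blvd", "789 Commerce Blvd"))       # False
```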
Knowledge Check
Test your understanding of the entity resolution concepts covered in this capstone.
Q1: What does "ReAct" stand for?
Q2: Why is fuzzy matching important for UCC entity resolution?
Q3: What is the purpose of the confidence score in entity matching?
Q4: The agent finds "ACME LOGISTICS LLC" in Delaware and "Acme Logistics L.L.C." in New York. Should these be merged?
Q5: Why does the agent search multiple states rather than just one?
References & Resources
- Claude Tool Use Documentation — Multi-tool orchestration
- Claude Model Overview — Model capabilities for reasoning tasks
- Anthropic Cookbook — ReAct agent examples
- UCC Article 9 (Cornell Law) — Legal reference for secured transactions