CC13: Workflows, Agents & Computer Use
The four canonical patterns for orchestrating multi-step Claude work — chaining, parallelization, routing, the agent loop — plus environment inspection and computer use, the GUI-driving capability that lets Claude operate desktop apps directly.
Learning Objectives
- Distinguish workflows (deterministic, predictable) from agents (adaptive, looping).
- Implement the three core workflow patterns: chaining, parallelization, routing.
- Build an agent loop that picks tools, observes outcomes, and stops correctly.
- Use environment inspection so Claude probes its world before deciding.
- Recognize when computer use (screen + mouse + keyboard) is appropriate — and when it isn't.
- Identify which Claude Code features map to which pattern.
Workflows vs Agents — The Most Useful Distinction
A workflow is a recipe: step 1, step 2, step 3, done. Anyone can follow it. The recipe doesn't change based on whether the milk smells funny — you just follow the steps.
An agent is a chef. Given a goal ("make dinner") and a kitchen, they decide what to cook, taste as they go, change course if the cream curdles, and stop when the meal is ready. Adaptive. Looping. Sometimes goes wrong in surprising ways.
Both are useful. Recipes are predictable, cheap, debuggable. Chefs handle novel situations and partial information. Most teams reach for "agent" when "workflow" would be cheaper and more reliable.
A workflow is a deterministic orchestration of one or more LLM calls in a fixed sequence (or DAG). The control flow is in your code. An agent is an LLM running a loop where it picks tools, observes results, and decides what to do next. The control flow is in the model. Workflows are predictable; agents are adaptive.
| Property | Workflow | Agent |
|---|---|---|
| Control flow | Your code | The LLM |
| Steps | Known in advance | Decided at runtime |
| Cost predictability | High | Low (loops can run away) |
| Debugging | Easy — standard tracing | Hard — reasoning is opaque |
| Best for | Defined process, repeatable | Open-ended, partial information |
If you can structure the work as a fixed pipeline, do. You'll have less drama, lower bills, and easier debugging. Reach for an agent only when the steps genuinely can't be enumerated in advance — e.g. open-ended research, multi-file refactors with unknown extent.
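The distinction in the table can be made concrete with a toy sketch. The model is replaced by plain Python callables here (`fake_decide` and the step functions are hypothetical stand-ins, not Anthropic API calls), so the only thing on display is who owns the control flow.

```python
from typing import Callable

def run_workflow(steps: list[Callable[[str], str]], data: str) -> str:
    """Workflow: the code fixes the order of steps in advance."""
    for step in steps:
        data = step(data)
    return data

def run_agent(decide: Callable[[list[str]], str],
              tools: dict[str, Callable[[], str]],
              max_iters: int = 10) -> list[str]:
    """Agent: a decision function (the model, in practice) picks the next tool."""
    observations: list[str] = []
    for _ in range(max_iters):
        choice = decide(observations)
        if choice == "done":
            break
        observations.append(tools[choice]())
    return observations

# Hypothetical stand-in for the model's decision at each turn.
def fake_decide(observations: list[str]) -> str:
    # Look around first, then run tests, then stop.
    if not observations:
        return "list_files"
    if len(observations) == 1:
        return "run_tests"
    return "done"

workflow_out = run_workflow([str.strip, str.upper], "  fix typo  ")
agent_out = run_agent(fake_decide, {"list_files": lambda: "a.py b.py",
                                    "run_tests": lambda: "2 passed"})
```

In `run_workflow` the list of steps is the whole plan. In `run_agent` the plan only exists as a sequence of decisions made at runtime, which is why the loop needs a cap.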
Chaining Workflow — Sequential LLM Calls
The simplest workflow: call A's output is call B's input. Use when you can decompose the task into stages, each smaller and more focused than the whole.
| Step | LLM call | Input | Output |
|---|---|---|---|
| 1 | Summarize | The diff | 3-sentence summary |
| 2 | Classify | Summary | {kind: feat|fix|chore, risk: low|med|high} |
| 3 | Route | Classification | Reviewer assignment |
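import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
def triage(diff: str) -> dict:
    # Step 1: summarize the raw diff.
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=200, temperature=0,
        system="Summarize a diff in 3 sentences.",
        messages=[{"role": "user", "content": diff}],
    ).content[0].text
    # Step 2: classify the summary. Forcing the tool call yields structured output.
    classify = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=100, temperature=0,
        tools=CLASSIFY_TOOL,  # a list holding one tool definition named "classify"
        tool_choice={"type": "tool", "name": "classify"},
        messages=[{"role": "user", "content": summary}],
    )
    cls = next(b.input for b in classify.content if b.type == "tool_use")
    # Step 3 (reviewer assignment) consumes this dict.
    return {"summary": summary, **cls}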
Each step is smaller, narrower, and easier to evaluate (CC11). Smaller prompts allow cheaper models — the summary can be Haiku even if the final routing needs Sonnet. Chains are easier to debug because every intermediate output is visible.
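The `triage` function references a `CLASSIFY_TOOL` that is never shown. A minimal sketch of what it might look like follows: the enum values come from the table above, while the description text and the `check_classification` helper are assumptions added for illustration.

```python
# Hypothetical definition of the CLASSIFY_TOOL referenced by triage().
# The enum values come from the chaining table; the rest is an assumption.
CLASSIFY_TOOL = [{
    "name": "classify",
    "description": "Classify a change summary by kind and risk.",
    "input_schema": {
        "type": "object",
        "properties": {
            "kind": {"type": "string", "enum": ["feat", "fix", "chore"]},
            "risk": {"type": "string", "enum": ["low", "med", "high"]},
        },
        "required": ["kind", "risk"],
    },
}]

def check_classification(cls: dict) -> dict:
    """Validate a tool_use input against the schema's enums before routing on it."""
    props = CLASSIFY_TOOL[0]["input_schema"]["properties"]
    for field, spec in props.items():
        if cls.get(field) not in spec["enum"]:
            raise ValueError(f"unexpected {field!r}: {cls.get(field)!r}")
    return cls
```

Validating between steps is cheap, and it keeps a malformed classification from silently reaching the routing step.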
Parallelization Workflow — Many Calls at Once
Two flavors:
1. Sectioning — one task per piece
Split the work, run each piece in its own call, merge the results.
import asyncio
from anthropic import AsyncAnthropic
aclient = AsyncAnthropic()
async def review_file(path: str, src: str) -> dict:
    r = await aclient.messages.create(
        model="claude-sonnet-4-6", max_tokens=512, temperature=0,
        system="Review this file. Return JSON: {issues: [{line, severity, msg}]}",
        messages=[{"role": "user", "content": f"# {path}\n\n{src}"}],
    )
    return {"file": path, "review": r.content[0].text}
async def review_pr(files: dict[str, str]) -> list[dict]:
    return await asyncio.gather(*[review_file(p, s) for p, s in files.items()])
# Sequential: 10 files * 4s = 40s. Parallel: ~5s.
2. Voting — same task, multiple calls, majority wins
Run the same prompt N times at a nonzero temperature so the samples can differ, then aggregate. Useful for high-stakes classifications where an occasional flipped label matters.
from collections import Counter
async def classify_with_vote(text: str, n: int = 5) -> str:
    async def one() -> str:
        r = await aclient.messages.create(
            model="claude-haiku-4-5-20251001", max_tokens=8, temperature=0.3,
            system="Classify into: spam | ham | unsure",
            messages=[{"role": "user", "content": text}],
        )
        return r.content[0].text.strip().lower()
    votes = await asyncio.gather(*[one() for _ in range(n)])
    # Most common label wins (a plurality); ties go to the label seen first.
    return Counter(votes).most_common(1)[0][0]
Anthropic enforces rate limits on requests and tokens. If you fan out 100 parallel calls at once, expect 429 errors. Use a semaphore (e.g. asyncio.Semaphore(20)) to cap concurrency. For very high volumes, use the Batch API: it is priced at a 50% discount and processes requests asynchronously within a 24-hour window, which suits offline parallel work.
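The semaphore advice can be sketched as a small wrapper around `asyncio.gather`. The name `gather_bounded` is made up for this example, and `fake_call` is a stand-in for a real API call so the snippet runs offline.

```python
import asyncio

async def gather_bounded(coros, limit: int = 20):
    """Run awaitables concurrently, with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*[guarded(c) for c in coros])

# Demo with a stand-in for an API call; it records peak concurrency.
in_flight = 0
peak = 0

async def fake_call(i: int) -> int:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulated network latency
    in_flight -= 1
    return i * 2

results = asyncio.run(gather_bounded([fake_call(i) for i in range(50)], limit=5))
```

Results come back in input order, as with plain `gather`. Only the number of calls running at the same moment changes.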
Routing Workflow — First Call Picks the Path
Use one cheap LLM call to classify the request, then dispatch to a specialized handler. Avoids putting every system prompt in front of every request.
| Class | Handler |
|---|---|
| "explain code" | Sonnet, system: explainer prompt |
| "debug error" | Sonnet, with tools: read file, run tests |
| "refactor" | Opus + extended thinking; multi-file plan |
| "trivia / capital city" | Haiku, no tools |
ROUTE_TOOL = [{
    "name": "route",
    "description": "Pick the handler for the user's question.",
    "input_schema": {
        "type": "object",
        "properties": {"handler": {"type": "string",
                                   "enum": ["explain", "debug", "refactor", "trivia"]}},
        "required": ["handler"],
    },
}]
def handle(user_q: str) -> str:
    r = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=64, temperature=0,
        tools=ROUTE_TOOL, tool_choice={"type": "tool", "name": "route"},
        messages=[{"role": "user", "content": user_q}],
    )
    handler = next(b.input["handler"] for b in r.content if b.type == "tool_use")
    return HANDLERS[handler](user_q)  # dispatch to the right specialist
Putting all four system prompts in one giant prompt for every request bloats tokens and confuses Claude. A 50-token routing call followed by a focused 4K-token handler call is cheaper, faster, and more accurate. Claude Code's subagent dispatch (CC6) follows the same idea.
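The `HANDLERS` table used by `handle` is not defined in the snippet. One possible shape is sketched below. The handler bodies are placeholders (in practice each would make its own focused model call), and falling back to `explain` for an unknown name is an assumption, not something the module prescribes.

```python
from typing import Callable

# Hypothetical handlers; each would normally wrap a specialized model call.
def explain(q: str) -> str:
    return f"[explain] {q}"

def debug(q: str) -> str:
    return f"[debug] {q}"

def refactor(q: str) -> str:
    return f"[refactor] {q}"

def trivia(q: str) -> str:
    return f"[trivia] {q}"

HANDLERS: dict[str, Callable[[str], str]] = {
    "explain": explain, "debug": debug, "refactor": refactor, "trivia": trivia,
}

def dispatch_route(handler: str, user_q: str, default: str = "explain") -> str:
    """Look up the handler by name; fall back to a default if the name is unknown."""
    return HANDLERS.get(handler, HANDLERS[default])(user_q)
```

Keeping the mapping in a dict means adding a new route is two small changes: one enum value in the tool schema and one entry here.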
The Agent Loop — When Steps Aren't Predictable
Same loop you saw in CC8, formalized:
def agent(goal: str, tools: list, max_iters: int = 30, max_cost: float = 1.0) -> str:
    history = [{"role": "user", "content": goal}]
    spent = 0.0
    for step in range(max_iters):
        r = client.messages.create(
            model="claude-sonnet-4-6", max_tokens=4096,
            tools=tools, messages=history,
        )
        history.append({"role": "assistant", "content": r.content})
        spent += estimate_cost(r.usage)  # your own pricing helper
        if spent > max_cost:
            return "[BUDGET EXCEEDED]"
        if r.stop_reason != "tool_use":
            # Finished (or cut off): return the text, or "" if there is none.
            return next((b.text for b in r.content if b.type == "text"), "")
        # Run every requested tool and send all results back in one user turn.
        results = [{"type": "tool_result", "tool_use_id": b.id,
                    "content": str(dispatch(b.name, b.input))}
                   for b in r.content if b.type == "tool_use"]
        history.append({"role": "user", "content": results})
    return "[MAX ITERS]"
The four termination conditions
- Natural: stop_reason == "end_turn" (Claude says "done").
- Iteration cap: too many loops, usually a bug in tools or prompt.
- Cost cap: dollar budget exceeded — abort before runaway spend.
- Time cap: wallclock budget exceeded — UX requirement.
An unbounded agent loop is a footgun: a buggy tool returning {"error": "try again"} can spin forever, billing the whole time. Iteration + cost + time caps make every agent loop bounded by something. Pick all three for production.
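The `agent` function above enforces the iteration and cost caps but not the time cap. A sketch of a loop skeleton with all three follows. The model round trip is abstracted into a `step` callable so the example runs without network access. `run_bounded` and this token-count version of `estimate_cost` are hypothetical helpers, and the per-million-token prices are supplied by the caller rather than assumed.

```python
import time
from typing import Callable

def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Dollar cost of one call; per-million-token prices are caller-supplied."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000

def run_bounded(step: Callable[[], tuple[bool, float]],
                max_iters: int = 30, max_cost: float = 1.0,
                max_seconds: float = 120.0) -> str:
    """Call `step` until it reports done or any cap trips.

    `step` stands in for one model round trip and returns (done, cost_usd).
    The return value names the termination condition that fired.
    """
    spent = 0.0
    deadline = time.monotonic() + max_seconds
    for _ in range(max_iters):
        if time.monotonic() > deadline:
            return "time_cap"
        done, cost = step()
        spent += cost
        if spent > max_cost:
            return "cost_cap"
        if done:
            return "natural"
    return "iteration_cap"
```

`time.monotonic()` is used instead of `time.time()` so a system clock adjustment cannot extend or shorten the budget. Note that the time check only runs between steps, so a single slow step can still overshoot the deadline.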
Environment Inspection
Agents perform better when they look around before acting. Give them tools to inspect the environment — list files, check git status, fetch a URL's status code — and prompt them to use these tools first.
SYSTEM = """You are a debugging agent. Before fixing anything:
1. Use list_files to see what's in the project.
2. Use read_file to inspect any file you'll change.
3. Use git_status to confirm clean working tree.
ONLY THEN propose changes."""
# Tools: list_files, read_file, git_status, write_file, run_tests
An agent without inspection tools either hallucinates the environment or refuses to act. With inspection tools, it grounds itself in reality first. This is exactly the pattern Claude Code itself uses internally — the CLI starts every session with implicit environment awareness (cwd, git status, recent files).
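The read-only tools named in the system prompt could be implemented along these lines. This is a sketch using only the standard library: the function signatures, the truncation limit, and the `dispatch` helper are assumptions, and `write_file` and `run_tests` are left out because they change state.

```python
import subprocess
from pathlib import Path

def list_files(root: str = ".") -> list[str]:
    """Relative paths of all regular files under root, sorted."""
    base = Path(root)
    return sorted(str(p.relative_to(base)) for p in base.rglob("*") if p.is_file())

def read_file(path: str, max_chars: int = 20_000) -> str:
    """File contents, truncated so one large file cannot flood the context."""
    return Path(path).read_text(encoding="utf-8", errors="replace")[:max_chars]

def git_status(root: str = ".") -> str:
    """Output of `git status --porcelain`; an empty string means a clean tree."""
    try:
        proc = subprocess.run(["git", "status", "--porcelain"], cwd=root,
                              capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"error: {exc}"
    return proc.stdout if proc.returncode == 0 else f"error: {proc.stderr.strip()}"

INSPECTION_TOOLS = {"list_files": list_files, "read_file": read_file,
                    "git_status": git_status}

def dispatch(name: str, args: dict) -> str:
    """Run an inspection tool by name and return its result as text."""
    if name not in INSPECTION_TOOLS:
        return f"error: unknown tool {name!r}"
    return str(INSPECTION_TOOLS[name](**args))
```

Errors are returned as text rather than raised, so the agent sees the failure in its next turn and can react to it.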
Computer Use — Driving the GUI
Computer use is a Claude capability where the model looks at a screenshot of a desktop or browser, decides where to click and what to type, and emits actions like {"action": "left_click", "coordinate": [x, y]} or {"action": "type", "text": "..."}. Your client takes the screenshot, executes the action, takes another screenshot, and feeds it back. It is the same agent-loop pattern as tool use; the tools just happen to be GUI primitives.
The three computer-use tools
| Tool | Purpose |
|---|---|
| computer_20250124 | Screenshot, click, type, scroll, key combos. |
| text_editor_20250124 | Open, view, edit text files (same as in CC8). |
| bash_20250124 | Run shell commands. |
Minimal request
resp = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=2048,
    tools=[
        {"type": "computer_20250124", "name": "computer",
         "display_width_px": 1920, "display_height_px": 1080, "display_number": 1},
        {"type": "text_editor_20250124", "name": "str_replace_editor"},
        {"type": "bash_20250124", "name": "bash"},
    ],
    extra_headers={"anthropic-beta": "computer-use-2025-01-24"},
    messages=[{"role": "user", "content":
               "Open Chrome, search for 'anthropic computer use docs', "
               "open the first result, and screenshot the title."}],
)
# Claude responds with tool_use blocks whose input names an action: screenshot, left_click, type, ...
# Your client runs each on the actual desktop and feeds back the next screenshot.
# Tool type strings and the beta header are versioned together; check the current docs for your model.
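The client side of that loop, the part that "runs each on the actual desktop," might look like the sketch below. It handles only the three actions named in this section (`screenshot`, `left_click`, `type`). The `Desktop` interface and `FakeDesktop` class are hypothetical; a real backend would wrap whatever automation layer your sandbox provides.

```python
from typing import Protocol

class Desktop(Protocol):
    """Hypothetical backend interface for driving a sandboxed screen."""
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...
    def screenshot_png(self) -> bytes: ...

def execute_action(action: dict, desktop: Desktop, width: int, height: int) -> dict:
    """Run one GUI action and return a summary of what happened."""
    kind = action.get("action")
    if kind == "left_click":
        x, y = action["coordinate"]
        if not (0 <= x < width and 0 <= y < height):
            return {"ok": False, "error": f"coordinate ({x}, {y}) is off-screen"}
        desktop.click(x, y)
    elif kind == "type":
        desktop.type_text(action["text"])
    elif kind != "screenshot":
        return {"ok": False, "error": f"unsupported action {kind!r}"}
    # Every handled action ends with a fresh screenshot for the model's next turn.
    return {"ok": True, "png": desktop.screenshot_png()}

class FakeDesktop:
    """In-memory stand-in that records calls instead of touching a real screen."""
    def __init__(self) -> None:
        self.log: list[tuple] = []

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))

    def screenshot_png(self) -> bytes:
        return b"\x89PNG-placeholder"
```

Rejecting off-screen coordinates and unknown actions with an error result, instead of raising, lets the model see the problem and correct itself on the next turn.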
When to reach for computer use
- Yes: automating apps that have no good API (legacy desktop tools, internal tools without integration), QA testing GUIs, browser flows where Selenium is too brittle.
- No: any task with a real API. Computer use is dramatically slower and more error-prone than direct API calls. If the website has an API, use the API.
Claude with computer use can click anywhere, type anywhere, run shell commands. A prompt-injection attack in a webpage can convince it to do unintended things. Always run computer use inside a sandboxed VM or container with no access to credentials, secrets, or the internet beyond what the task requires. Anthropic ships a reference Docker image; use it.
A multi-screen task can take 30+ tool round trips, each with a screenshot (large image input). Costs add up fast. For browser automation specifically, prefer Playwright + a small Claude classification call when possible — cheaper, faster, more reliable.
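To see why the costs add up, consider that each turn re-sends the conversation so far, including earlier screenshots. The back-of-envelope sketch below counts image input tokens only. `tokens_per_screenshot` is a placeholder you would measure from your own `usage` data, and the model ignores text tokens and prompt caching.

```python
from typing import Optional

def image_input_tokens(round_trips: int, tokens_per_screenshot: int,
                       keep_last: Optional[int] = None) -> int:
    """Total image input tokens across a run.

    With keep_last=None every earlier screenshot stays in the history, so turn k
    sends k screenshots. With keep_last=n only the n most recent are kept.
    """
    total = 0
    for turn in range(1, round_trips + 1):
        kept = turn if keep_last is None else min(turn, keep_last)
        total += kept * tokens_per_screenshot
    return total

full_history = image_input_tokens(30, 1000)             # 465000
latest_only = image_input_tokens(30, 1000, keep_last=1)  # 30000
```

Under these assumptions, growth is quadratic in the number of round trips when the full history is kept and linear when older screenshots are dropped, so pruning old screenshots is one of the first levers to try.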
Mapping to Claude Code
The patterns aren't abstract — you've used them all in Claude Code:
| Pattern | Where in Claude Code |
|---|---|
| Agent loop | The CLI itself. Every session is one agent loop until exit. |
| Routing | Subagent dispatch (CC6) — description matching picks the specialist. |
| Chaining | Slash commands (CC5) — multi-step prompts that run sequentially. |
| Parallelization | Background subagents and worktrees (CC14) — multiple Claude sessions simultaneously. |
| Environment inspection | Built-in: cwd, git status, recent files — CLI surfaces these by default. |
| Computer use | Not a CLI feature; you can wrap it in an MCP server (CC9) if you need GUI work. |
Every "Claude Code feature" you've learned is one of these patterns specialized for development workflows. CLAUDE.md is environment inspection (Claude inspects rules at session start). Subagents are routing + isolation. Hooks are a workflow gate around the agent loop. Once you see the patterns, the CC features compose — and you can build new ones.
Hands-On Lab — Replace an Agent with a Workflow
You'll start with an agent that "reviews a PR" by looping with tools, measure its cost and accuracy, then refactor to a 3-step chain that does the same job. Often the chain wins on both axes, and this lab lets you measure it.
Step 1 — The agent baseline
# review_agent.py
TOOLS = [
    {"name": "list_files", "description": "List changed files in the PR.",
     "input_schema": {"type": "object", "properties": {}, "required": []}},
    {"name": "read_file", "description": "Read a file.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"}},
                      "required": ["path"]}},
    {"name": "submit_review", "description": "Submit final review.",
     "input_schema": {"type": "object", "properties": {
         "verdict": {"type": "string", "enum": ["approve", "request_changes"]},
         "summary": {"type": "string"}}, "required": ["verdict", "summary"]}},
]
# Run the loop from this module's agent() function. Track total tokens + tool calls.
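To track tokens and tool calls for the baseline, something like the following could sit next to the loop. `PR_FILES`, `RunStats`, and this three-argument `dispatch` are hypothetical scaffolding for the lab. The PR is held in memory so the tools run without a real repository.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical in-memory PR so the tools run without a real repository.
PR_FILES = {"app.py": "print('hi')\n", "README.md": "# Demo\n"}

@dataclass
class RunStats:
    """Totals for one agent run: token usage plus a per-tool call count."""
    input_tokens: int = 0
    output_tokens: int = 0
    tool_calls: Counter = field(default_factory=Counter)

    def add_usage(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

def dispatch(name: str, args: dict, stats: RunStats) -> str:
    """Execute one of the lab's three tools and count the call."""
    stats.tool_calls[name] += 1
    if name == "list_files":
        return "\n".join(sorted(PR_FILES))
    if name == "read_file":
        return PR_FILES.get(args["path"], f"error: no such file {args['path']}")
    if name == "submit_review":
        return f"recorded: {args['verdict']}"
    return f"error: unknown tool {name}"
```

Inside the loop, calling `stats.add_usage(r.usage.input_tokens, r.usage.output_tokens)` after each response gives the per-PR token total that Step 3 asks for.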
Step 2 — The chain alternative
# review_chain.py
def review_chain(diff: str, files: dict[str, str]) -> dict:
    # Step A: Summarize the diff (Haiku)
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=300, temperature=0,
        system="Summarize a PR diff in 5 bullets max.",
        messages=[{"role": "user", "content": diff}],
    ).content[0].text
    # Step B: Per-file review in parallel (Sonnet)
    async def review_one(path: str, src: str) -> str:
        r = await aclient.messages.create(
            model="claude-sonnet-4-6", max_tokens=512, temperature=0,
            system="Review one file. Output: severity bullets only.",
            messages=[{"role": "user", "content": f"# {path}\n\n{src}"}],
        )
        return r.content[0].text
    async def review_all() -> list[str]:
        # asyncio.run() needs a coroutine, so gather() is awaited inside one.
        return await asyncio.gather(*[review_one(p, s) for p, s in files.items()])
    file_reviews = "\n\n".join(asyncio.run(review_all()))
    # Step C: Verdict (Sonnet, structured tool output)
    # VERDICT_TOOL: a one-item list holding the submit_review definition from Step 1.
    verdict = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=200, temperature=0,
        tools=VERDICT_TOOL, tool_choice={"type": "tool", "name": "submit_review"},
        messages=[{"role": "user", "content":
                   f"<summary>{summary}</summary>\n<reviews>{file_reviews}</reviews>"}],
    )
    return next(b.input for b in verdict.content if b.type == "tool_use")
Step 3 — Compare on 10 PRs
Run both on a 10-PR test set (CC11 lab). Track:
- Total tokens (input + output) per PR
- Wallclock time per PR
- Verdict accuracy (vs. your hand labels)
| Metric | Agent | Chain |
|---|---|---|
| Avg tokens | 18,400 | 9,800 |
| Avg time | 24s | 6s (parallel files) |
| Accuracy | 82% | 86% |
Numbers are illustrative, but the shape is real: the chain typically uses about half the tokens, finishes several times faster, and matches or beats accuracy because each smaller call is more focused.
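The three metrics can be computed from per-PR run records with a few lines. The record layout (`pr`, `tokens`, `seconds`, `verdict`) and the `summarize` helper are assumptions for this lab, not a fixed format.

```python
from statistics import mean

def summarize(runs: list[dict], labels: dict[str, str]) -> dict:
    """Aggregate per-PR runs into average tokens, average seconds, and accuracy.

    Each run is a dict with keys: pr, tokens, seconds, verdict.
    `labels` maps a PR id to its hand-labeled verdict.
    """
    correct = sum(1 for r in runs if r["verdict"] == labels[r["pr"]])
    return {
        "avg_tokens": mean(r["tokens"] for r in runs),
        "avg_seconds": mean(r["seconds"] for r in runs),
        "accuracy": correct / len(runs),
    }

# Tiny synthetic example; real runs would come from the agent and the chain.
labels = {"1": "approve", "2": "request_changes"}
runs = [
    {"pr": "1", "tokens": 100, "seconds": 2.0, "verdict": "approve"},
    {"pr": "2", "tokens": 300, "seconds": 4.0, "verdict": "approve"},
]
report = summarize(runs, labels)
```

Run `summarize` once on the agent's records and once on the chain's, using the same `labels`, and the two dicts give you the rows of the comparison table.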
Step 4 — When NOT to switch
If your task involves "follow the trail until you find the bug" — where step 4's existence depends on step 3's output — you can't replace the agent with a chain. Keep the loop, add caps.
You now have two implementations of the same task and a comparison sheet. The habit that pays off most: every time you reach for "agent," ask whether you can structure the work as a chain first. If you can, you'll likely save about half the cost and most of the latency.
Knowledge Check
1. You can decompose your task into 3 fixed steps. Should you build an agent or a workflow?
2. You're fanning out 50 parallel calls and getting 429 errors. Best fix?
3. Your agent loop hangs occasionally. What did you forget?
4. A user asks "what's the weather in Tokyo?" through your routing-based bot. The router should send it to:
5. You want Claude to fill in a form on a legacy desktop app with no API. What's appropriate?
Module Summary
- Workflow vs agent: workflow when steps are knowable; agent when not. Default to workflow.
- Chaining: sequential calls, smaller models per step where possible. Output of A is input of B.
- Parallelization: sectioning (split work) + voting (same work, multiple times, aggregate). Cap concurrency; use Batch API for big offline.
- Routing: cheap classifier picks the specialist. Avoid one-prompt-fits-all bloat.
- Agent loop: tool_use loop with three caps — iteration, cost, time. Always.
- Environment inspection: give the agent tools to look around; prompt it to do so first.
- Computer use: screenshot + click + type. For legacy GUIs and apps with no API. Sandbox in a VM. If an API exists, prefer the API.
- Claude Code maps these patterns: subagents = routing, slash commands = chains, worktrees + background = parallelization, the CLI itself = agent loop.