CC13: Workflows, Agents & Computer Use

Learning Objectives

Distinguish workflows (deterministic, predictable) from agents (adaptive, looping).
Implement the three core workflow patterns: chaining, parallelization, routing.
Build an agent loop that picks tools, observes outcomes, and stops correctly.
Use environment inspection so Claude probes its world before deciding.
Recognize when computer use (screen + mouse + keyboard) is appropriate — and when it isn't.
Identify which Claude Code features map to which pattern.

CC8 was one tool call. CC13 is what happens when the task is bigger than one call — and when to use which pattern instead of throwing everything at "an agent."

Workflows vs Agents — The Most Useful Distinction

Everyday Analogy

A workflow is a recipe: step 1, step 2, step 3, done. Anyone can follow it. The recipe doesn't change based on whether the milk smells funny — you just follow the steps.

An agent is a chef. Given a goal ("make dinner") and a kitchen, they decide what to cook, taste as they go, change course if the cream curdles, and stop when the meal is ready. Adaptive. Looping. Sometimes goes wrong in surprising ways.

Both are useful. Recipes are predictable, cheap, debuggable. Chefs handle novel situations and partial information. Most teams reach for "agent" when "workflow" would be cheaper and more reliable.

Technical Definition

A workflow is an orchestration of one or more LLM calls in a fixed sequence (or DAG). The control flow is in your code, so the path through the calls is deterministic even though each model output is not. An agent is an LLM running a loop where it picks tools, observes results, and decides what to do next. The control flow is in the model. Workflows are predictable; agents are adaptive.

Property	Workflow	Agent
Control flow	Your code	The LLM
Steps	Known in advance	Decided at runtime
Cost predictability	High	Low (loops can run away)
Debugging	Easy — standard tracing	Hard — reasoning is opaque
Best for	Defined process, repeatable	Open-ended, partial information

Default to workflow

If you can structure the work as a fixed pipeline, do. You'll have less drama, lower bills, and easier debugging. Reach for an agent only when the steps genuinely can't be enumerated in advance — e.g. open-ended research, multi-file refactors with unknown extent.

Chaining Workflow — Sequential LLM Calls

The simplest workflow: call A's output is call B's input. Use when you can decompose the task into stages, each smaller and more focused than the whole.

Chaining example: PR triage

Step	LLM call	Input	Output
1	Summarize	The diff	3-sentence summary
2	Classify	Summary	{kind: feat|fix|chore, risk: low|med|high}
3	Route	Classification	Reviewer assignment

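from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# CLASSIFY_TOOL is assumed to be a one-item tool list named "classify" whose
# input_schema has the kind and risk enums from the table above.

def triage(diff: str) -> dict:
    # Step 1: summarize the raw diff.
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=200, temperature=0,
        system="Summarize a diff in 3 sentences.",
        messages=[{"role": "user", "content": diff}],
    ).content[0].text

    # Step 2: classify the summary; tool_choice forces a structured tool call.
    classify = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=100, temperature=0,
        tools=CLASSIFY_TOOL,
        tool_choice={"type": "tool", "name": "classify"},
        messages=[{"role": "user", "content": summary}],
    )
    cls = next(b.input for b in classify.content if b.type == "tool_use")

    # Step 3 (reviewer assignment) is left to the caller, which receives kind and risk.
    return {"summary": summary, **cls}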
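
Step 3 in the table needs no LLM call: once the classification is structured, reviewer assignment can be an ordinary lookup. A minimal sketch, assuming a hypothetical REVIEWERS table and placeholder team names that are not part of the original pipeline:

```python
# Hypothetical reviewer table; team names are placeholders for illustration.
REVIEWERS = {
    ("feat", "high"): "senior-backend",
    ("fix", "high"): "on-call",
}
DEFAULT_REVIEWER = "team-rotation"

def assign_reviewer(classification: dict) -> str:
    """Map the {kind, risk} dict from step 2 to a reviewer with plain code."""
    key = (classification.get("kind"), classification.get("risk"))
    return REVIEWERS.get(key, DEFAULT_REVIEWER)

print(assign_reviewer({"kind": "fix", "risk": "high"}))
print(assign_reviewer({"kind": "chore", "risk": "low"}))
```

Keeping this step in code rather than in a prompt means the routing rules can be reviewed and unit-tested like any other config.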

Why chain instead of one big prompt

Each step is smaller, narrower, and easier to evaluate (CC11). Smaller prompts allow cheaper models — the summary can be Haiku even if the final routing needs Sonnet. Chains are easier to debug because every intermediate output is visible.

Parallelization Workflow — Many Calls at Once

Two flavors:

1. Sectioning — one task per piece

Split the work, run each piece in its own call, merge the results.

import asyncio
from anthropic import AsyncAnthropic

aclient = AsyncAnthropic()

async def review_file(path: str, src: str) -> dict:
    r = await aclient.messages.create(
        model="claude-sonnet-4-6", max_tokens=512, temperature=0,
        system="Review this file. Return JSON: {issues: [{line, severity, msg}]}",
        messages=[{"role": "user", "content": f"# {path}\n\n{src}"}],
    )
    return {"file": path, "review": r.content[0].text}

async def review_pr(files: dict[str, str]) -> list[dict]:
    return await asyncio.gather(*[review_file(p, s) for p, s in files.items()])

# Sequential: 10 files * 4s = 40s. Parallel: ~5s.

2. Voting — same task, multiple calls, majority wins

Run the same prompt N times at a nonzero temperature (so the samples can actually differ), then aggregate. Useful for high-stakes classifications where occasional flips matter.

async def classify_with_vote(text: str, n: int = 5) -> str:
    async def one():
        r = await aclient.messages.create(
            model="claude-haiku-4-5-20251001", max_tokens=8, temperature=0.3,
            system="Classify into: spam | ham | unsure", messages=[{"role":"user","content":text}],
        )
        return r.content[0].text.strip().lower()
    votes = await asyncio.gather(*[one() for _ in range(n)])
    return max(set(votes), key=votes.count)   # plurality vote; ties break arbitrarily
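
The one-liner above returns whichever label appears most often, even on a 2-2-1 split. A slightly more defensive aggregation is sketched here. The `min_share` parameter and `fallback` label are assumptions added for illustration; the function returns the top label only when it holds a strict share of the votes:

```python
from collections import Counter

def aggregate_votes(votes: list[str], min_share: float = 0.5,
                    fallback: str = "unsure") -> str:
    """Return the top label only if it holds more than min_share of the votes."""
    if not votes:
        return fallback
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_share else fallback

print(aggregate_votes(["spam", "spam", "ham", "spam", "ham"]))
print(aggregate_votes(["spam", "spam", "ham", "ham", "unsure"]))
```

Falling back to "unsure" on a weak plurality turns disagreement between samples into a signal you can route to a human or a stronger model.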

Parallel limits

Anthropic enforces per-account rate limits. If you fan out 100 parallel calls, you will likely get 429 responses. Use a semaphore (e.g. asyncio.Semaphore(20)) to cap concurrency. For very high volumes, use the Message Batches API: batched requests are billed at a 50% discount and most batches complete within 24 hours, which suits offline parallel work.
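
The semaphore pattern can be sketched without touching the API. In this sketch `gather_bounded` is a hypothetical helper and `fake_call` stands in for a real `aclient.messages.create` call; the demo records the peak number of tasks in flight so you can see the cap holding:

```python
import asyncio

async def gather_bounded(coro_fns, limit: int = 20):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def run(fn):
        async with sem:          # waits here once `limit` tasks are running
            return await fn()

    return await asyncio.gather(*(run(fn) for fn in coro_fns))

def demo(n: int = 50, limit: int = 5):
    state = {"in_flight": 0, "peak": 0}

    async def fake_call(i: int) -> int:   # stand-in for an API call
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.01)
        state["in_flight"] -= 1
        return i * 2

    fns = [lambda i=i: fake_call(i) for i in range(n)]
    results = asyncio.run(gather_bounded(fns, limit=limit))
    return results, state["peak"]

results, peak = demo()
print(len(results), "results, peak concurrency", peak)
```

Passing coroutine functions rather than already-created coroutines keeps each call from starting until it holds a semaphore slot.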

Routing Workflow — First Call Picks the Path

Use one cheap LLM call to classify the request, then dispatch to a specialized handler. Avoids putting every system prompt in front of every request.

Routing for a code-help bot

Class	Handler
"explain code"	Sonnet, system: explainer prompt
"debug error"	Sonnet, with tools: read file, run tests
"refactor"	Opus + extended thinking; multi-file plan
"trivia / capital city"	Haiku, no tools

ROUTE_TOOL = [{
    "name": "route",
    "description": "Pick the handler for the user's question.",
    "input_schema": {
        "type": "object",
        "properties": {"handler": {"type": "string",
            "enum": ["explain", "debug", "refactor", "trivia"]}},
        "required": ["handler"],
    },
}]

def handle(user_q: str) -> str:
    r = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=64, temperature=0,
        tools=ROUTE_TOOL, tool_choice={"type":"tool","name":"route"},
        messages=[{"role":"user","content":user_q}],
    )
    handler = next(b.input["handler"] for b in r.content if b.type=="tool_use")
    return HANDLERS[handler](user_q)   # dispatch to the right specialist
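
`HANDLERS` is not defined in the snippet above. One way to fill it in is a plain dict from handler name to function. The stub functions below are hypothetical placeholders for the specialist calls in the table, and the fallback is an added safety net rather than part of the original design:

```python
from typing import Callable

# Hypothetical stubs; real handlers would each make their own model call.
def explain(q: str) -> str:
    return f"[explain] {q}"

def debug(q: str) -> str:
    return f"[debug] {q}"

def refactor(q: str) -> str:
    return f"[refactor] {q}"

def trivia(q: str) -> str:
    return f"[trivia] {q}"

HANDLERS: dict[str, Callable[[str], str]] = {
    "explain": explain, "debug": debug, "refactor": refactor, "trivia": trivia,
}

def dispatch_route(handler: str, user_q: str) -> str:
    """Look up the handler; fall back to explain on an unexpected name."""
    return HANDLERS.get(handler, explain)(user_q)

print(dispatch_route("debug", "why does this test fail?"))
```

Because the router's enum and the dict keys must match, it can help to derive the enum from `list(HANDLERS)` so the two cannot drift apart.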

Why routing

Putting all four system prompts in one giant prompt for every request bloats tokens and dilutes the instructions Claude has to follow. A roughly 50-token routing call followed by a focused handler call (say, 4K tokens) is typically cheaper and faster, and often more accurate. Claude Code's subagent dispatch (CC6) follows the same idea.

The Agent Loop — When Steps Aren't Predictable

Same loop you saw in CC8, formalized:

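import time

def agent(goal: str, tools: list, max_iters: int = 30,
          max_cost: float = 1.0, max_seconds: float = 120.0) -> str:
    # client, estimate_cost, and dispatch are assumed to be defined elsewhere.
    # max_seconds is an illustrative default; tune it to your UX budget.
    history = [{"role": "user", "content": goal}]
    spent = 0.0
    deadline = time.monotonic() + max_seconds
    for step in range(max_iters):                      # iteration cap
        if time.monotonic() > deadline:                # time cap
            return "[TIME EXCEEDED]"
        r = client.messages.create(
            model="claude-sonnet-4-6", max_tokens=4096,
            tools=tools, messages=history,
        )
        history.append({"role": "assistant", "content": r.content})
        spent += estimate_cost(r.usage)
        if r.stop_reason != "tool_use":                # natural stop: return the answer
            return next((b.text for b in r.content if b.type == "text"), "")
        if spent > max_cost:                           # cost cap, checked before more work
            return "[BUDGET EXCEEDED]"
        results = [{"type":"tool_result", "tool_use_id":b.id,
                    "content": str(dispatch(b.name, b.input))}
                   for b in r.content if b.type == "tool_use"]
        history.append({"role": "user", "content": results})
    return "[MAX ITERS]"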

The four termination conditions

Natural: stop_reason == "end_turn" — Claude says "done."
Iteration cap: too many loops — bug in tools or prompt.
Cost cap: dollar budget exceeded — abort before runaway spend.
Time cap: wallclock budget exceeded — UX requirement.

Agents need three caps. Always.

An unbounded agent loop is a footgun: a buggy tool returning {"error": "try again"} can spin forever, billing the whole time. Iteration + cost + time caps make every agent loop bounded by something. Pick all three for production.
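
The cost cap depends on an `estimate_cost(r.usage)` helper that the loop calls but does not define. A minimal sketch, assuming the usage object exposes `input_tokens` and `output_tokens` (as Messages API responses do) and using placeholder per-million-token prices that you would replace with current published rates:

```python
from types import SimpleNamespace

# Placeholder prices in USD per million tokens; NOT real rates.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00

def estimate_cost(usage) -> float:
    """Approximate the dollar cost of one response from its token counts."""
    return (usage.input_tokens * INPUT_USD_PER_MTOK
            + usage.output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000

# Example with a stand-in usage object.
example = SimpleNamespace(input_tokens=2_000, output_tokens=500)
print(f"${estimate_cost(example):.4f}")
```

An estimate is enough here: the cap exists to stop runaway loops, not to reproduce the invoice.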

Environment Inspection

Agents perform better when they look around before acting. Give them tools to inspect the environment — list files, check git status, fetch a URL's status code — and prompt them to use these tools first.

SYSTEM = """You are a debugging agent. Before fixing anything:
1. Use list_files to see what's in the project.
2. Use read_file to inspect any file you'll change.
3. Use git_status to confirm clean working tree.
ONLY THEN propose changes."""

# Tools: list_files, read_file, git_status, write_file, run_tests

Why it works

An agent without inspection tools either hallucinates the environment or refuses to act. With inspection tools, it grounds itself in reality first. This is exactly the pattern Claude Code itself uses internally — the CLI starts every session with implicit environment awareness (cwd, git status, recent files).
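
The three read-only tools named in the prompt can be backed by a few lines of standard-library code. This is a sketch under assumptions: the function signatures, the `max_chars` truncation, and the `INSPECTION_TOOLS` registry are illustrative choices, not a prescribed implementation.

```python
import subprocess
from pathlib import Path

def list_files(root: str = ".") -> str:
    """Return relative paths of files under root, one per line, skipping .git."""
    base = Path(root)
    paths = sorted(
        str(p.relative_to(base)) for p in base.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(base).parts
    )
    return "\n".join(paths)

def read_file(path: str, max_chars: int = 20_000) -> str:
    """Return file contents, truncated so one tool result cannot flood the context."""
    return Path(path).read_text(encoding="utf-8", errors="replace")[:max_chars]

def git_status(root: str = ".") -> str:
    """Return `git status --porcelain` output; an empty string means a clean tree."""
    out = subprocess.run(["git", "status", "--porcelain"], cwd=root,
                         capture_output=True, text=True, check=False)
    return out.stdout if out.returncode == 0 else f"git error: {out.stderr.strip()}"

INSPECTION_TOOLS = {"list_files": list_files, "read_file": read_file,
                    "git_status": git_status}

def dispatch(name: str, args: dict) -> str:
    """Route a tool call to its implementation; unknown names return an error string."""
    fn = INSPECTION_TOOLS.get(name)
    return fn(**args) if fn else f"unknown tool: {name}"
```

Returning error strings instead of raising lets the model see what went wrong and pick a different action on the next turn.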

Computer Use — Driving the GUI

Technical Definition

Computer use is a Claude capability where the model looks at a screenshot of a desktop or browser, decides where to click and what to type, and emits actions like {"action": "left_click", "coordinate": [x, y]} or {"action": "type", "text": "..."}. Your client takes the screenshot, executes the action, takes another screenshot, and feeds it back. It is the same agent-loop pattern as tool use; the tools just happen to be GUI primitives.

The three computer-use tools

Tool	Purpose
`computer_20250124`	Screenshot, click, type, scroll, key combos.
`text_editor_20250124`	Open, view, edit text files (same as in CC8).
`bash_20250124`	Run shell commands.

Minimal request

resp = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=2048,
    tools=[
        {"type": "computer_20250124", "name": "computer",
         "display_width_px": 1920, "display_height_px": 1080, "display_number": 1},
        {"type": "text_editor_20250124", "name": "str_replace_editor"},
        {"type": "bash_20250124", "name": "bash"},
    ],
    extra_headers={"anthropic-beta": "computer-use-2025-01-24"},
    messages=[{"role": "user", "content":
        "Open Chrome, search for 'anthropic computer use docs', "
        "open the first result, and screenshot the title."}],
)
# Claude responds with tool_use blocks whose input carries an action: screenshot, left_click, type, ...
# Your client runs each action on the actual desktop and sends back a tool_result (usually the next screenshot).
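
On the client side, each tool_use block's input is a dict with an "action" key that your code must translate into a real GUI operation. A minimal sketch of that translation, using a hypothetical `FakeDesktop` that only records calls (a real client would drive a GUI automation library inside the sandbox instead):

```python
class FakeDesktop:
    """Hypothetical stand-in for a real desktop driver; it only logs calls."""

    def __init__(self):
        self.log = []

    def screenshot(self) -> str:
        self.log.append(("screenshot",))
        return "<base64 png placeholder>"

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))

def execute_action(desktop: FakeDesktop, action: dict) -> str:
    """Run one action dict and return a short text result for the tool_result."""
    kind = action.get("action")
    if kind == "screenshot":
        return desktop.screenshot()
    if kind == "left_click":
        x, y = action["coordinate"]
        desktop.click(x, y)
        return "clicked"
    if kind == "type":
        desktop.type_text(action["text"])
        return "typed"
    return f"unsupported action: {kind}"   # report back instead of raising

desk = FakeDesktop()
print(execute_action(desk, {"action": "left_click", "coordinate": [640, 360]}))
```

Only three actions are handled here; the real tool defines more (scrolling, key combos, and so on), and anything unhandled is reported back so the loop can continue.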

When to reach for computer use

Yes: automating apps that have no good API (legacy desktop tools, internal tools without integration), QA testing GUIs, browser flows where Selenium is too brittle.
No: any task with a real API. Computer use is dramatically slower and more error-prone than direct API calls. If the website has an API, use the API.

Security: run it in a VM

Claude with computer use can click anywhere, type anywhere, run shell commands. A prompt-injection attack in a webpage can convince it to do unintended things. Always run computer use inside a sandboxed VM or container with no access to credentials, secrets, or the internet beyond what the task requires. Anthropic ships a reference Docker image; use it.

Latency & cost reality

A multi-screen task can take 30+ tool round trips, each with a screenshot (large image input). Costs add up fast. For browser automation specifically, prefer Playwright + a small Claude classification call when possible — cheaper, faster, more reliable.

Mapping to Claude Code

The patterns aren't abstract — you've used them all in Claude Code:

Pattern	Where in Claude Code
Agent loop	The CLI itself. Every session is one agent loop until exit.
Routing	Subagent dispatch (CC6) — description matching picks the specialist.
Chaining	Slash commands (CC5) — multi-step prompts that run sequentially.
Parallelization	Background subagents and worktrees (CC14) — multiple Claude sessions simultaneously.
Environment inspection	Built-in: cwd, git status, recent files — CLI surfaces these by default.
Computer use	Not a CLI feature; you can wrap it in an MCP server (CC9) if you need GUI work.

The unifying insight

Every "Claude Code feature" you've learned is one of these patterns specialized for development workflows. CLAUDE.md is environment inspection (Claude inspects rules at session start). Subagents are routing + isolation. Hooks are a workflow gate around the agent loop. Once you see the patterns, the CC features compose — and you can build new ones.

Hands-On Lab — Replace an Agent with a Workflow

You'll start with an agent that "reviews a PR" by looping with tools, measure its cost and accuracy, then refactor to a 3-step chain that does the same job. Often the chain wins on cost, latency, and accuracy; this lab lets you measure it.

Step 1 — The agent baseline

# review_agent.py
TOOLS = [
    {"name": "list_files", "description": "List changed files in the PR.",
     "input_schema": {"type":"object","properties":{},"required":[]}},
    {"name": "read_file", "description": "Read a file.",
     "input_schema": {"type":"object","properties":{"path":{"type":"string"}},"required":["path"]}},
    {"name": "submit_review", "description": "Submit final review.",
     "input_schema": {"type":"object","properties":{
         "verdict":{"type":"string","enum":["approve","request_changes"]},
         "summary":{"type":"string"}}, "required":["verdict","summary"]}},
]
# Run the loop from CC13's agent() function. Track total tokens + tool calls.
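
To run the baseline you need implementations behind those three tool names and a way to count calls. A minimal sketch, assuming the PR is available as an in-memory dict of path to contents; `make_pr_dispatch` and its `stats` dict are hypothetical helpers, not part of the lab's starter code:

```python
def make_pr_dispatch(files: dict[str, str]):
    """Build a dispatch(name, args) function over an in-memory PR plus a stats dict."""
    stats = {"tool_calls": 0, "review": None}

    def dispatch(name: str, args: dict) -> str:
        stats["tool_calls"] += 1
        if name == "list_files":
            return "\n".join(sorted(files))
        if name == "read_file":
            return files.get(args.get("path", ""), "error: no such file in this PR")
        if name == "submit_review":
            stats["review"] = {"verdict": args["verdict"], "summary": args["summary"]}
            return "review recorded"
        return f"error: unknown tool {name}"

    return dispatch, stats

dispatch, stats = make_pr_dispatch({"app.py": "print('hi')\n"})
print(dispatch("list_files", {}))
```

After the loop finishes, `stats["tool_calls"]` gives the tool-call count and `stats["review"]` holds the submitted verdict for scoring.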

Step 2 — The chain alternative

# review_chain.py
def review_chain(diff: str, files: dict[str,str]) -> dict:
    # Step A: Summarize the diff (Haiku)
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=300, temperature=0,
        system="Summarize a PR diff in 5 bullets max.",
        messages=[{"role":"user","content":diff}],
    ).content[0].text

    # Step B: Per-file review in parallel (Sonnet)
    async def review_one(path, src):
        r = await aclient.messages.create(
            model="claude-sonnet-4-6", max_tokens=512, temperature=0,
            system="Review one file. Output: severity bullets only.",
            messages=[{"role":"user","content":f"# {path}\n\n{src}"}],
        )
        return r.content[0].text
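    async def review_all():
        # gather must be awaited inside a coroutine; asyncio.run() rejects a bare gather future
        return await asyncio.gather(*[review_one(p, s) for p, s in files.items()])
    file_reviews = asyncio.run(review_all())
    joined_reviews = "\n\n".join(file_reviews)

    # Step C: Verdict (Sonnet, structured tool output)
    # VERDICT_TOOL is assumed to be a one-item list holding the submit_review schema from Step 1.
    verdict = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=200, temperature=0,
        tools=VERDICT_TOOL, tool_choice={"type":"tool","name":"submit_review"},
        messages=[{"role":"user","content":
            f"<summary>{summary}</summary>\n<reviews>{joined_reviews}</reviews>"}],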
    )
    return next(b.input for b in verdict.content if b.type=="tool_use")

Step 3 — Compare on 10 PRs

Run both on a 10-PR test set (CC11 lab). Track:

Total tokens (input + output) per PR
Wallclock time per PR
Verdict accuracy (vs. your hand labels)

  metric        agent     chain
  ----------------------------------
  avg tokens    18,400    9,800
  avg time      24s       6s   (parallel files)
  accuracy      82%       86%

Numbers are illustrative, but the shape is real: the chain typically halves cost, cuts latency sharply, and matches or beats accuracy because each smaller call is more focused.
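
To produce your own version of that table, record one dict per PR for each approach and aggregate. A minimal sketch, assuming hypothetical record keys (`tokens`, `seconds`, `verdict`, `label`); the sample rows are made up purely to show the input shape and are not measured results:

```python
from statistics import mean

def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate per-PR records into the three metrics tracked in this step."""
    return {
        "avg_tokens": mean(r["tokens"] for r in runs),
        "avg_seconds": mean(r["seconds"] for r in runs),
        "accuracy": mean(1.0 if r["verdict"] == r["label"] else 0.0 for r in runs),
    }

# Made-up records purely to show the input shape; not measured results.
sample = [
    {"tokens": 10_000, "seconds": 6.0, "verdict": "approve", "label": "approve"},
    {"tokens": 9_000, "seconds": 5.0, "verdict": "approve", "label": "request_changes"},
]
print(summarize_runs(sample))
```

Run it once over the agent's records and once over the chain's, then compare the two dicts side by side.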

Step 4 — When NOT to switch

Module Summary

Workflow vs agent: workflow when steps are knowable; agent when not. Default to workflow.
Chaining: sequential calls, smaller models per step where possible. Output of A is input of B.
Parallelization: sectioning (split work) + voting (same work, multiple times, aggregate). Cap concurrency; use Batch API for big offline.
Routing: cheap classifier picks the specialist. Avoid one-prompt-fits-all bloat.
Agent loop: tool_use loop with three caps — iteration, cost, time. Always.
Environment inspection: give the agent tools to look around; prompt it to do so first.
Computer use: screenshot + click + type. For legacy GUIs and apps with no API. Sandbox in a VM. If the API exists, prefer the API.
Claude Code maps these patterns: subagents = routing, slash commands = chains, worktrees + background = parallelization, the CLI itself = agent loop.