CC2: Prompt Engineering for Claude Code

Learning Objectives

Diagnose vague prompts and rewrite them so Claude does the right thing on the first try.
Apply the five core techniques: clarity, specificity, XML structure, examples, thinking triggers.
Recognize where each technique pays off most: CLAUDE.md vs skill description vs subagent prompt vs slash command.
Quantify a prompt change with a 5-case eval (preview of CC11).

CC1 gave you the API. CC2 gives you the words that go into the API. Bad prompts waste tokens and erode trust; good prompts are short and reliably steer Claude.

Where Prompts Live in Claude Code

"Prompt engineering" sounds abstract until you realize everything you write in Claude Code is a prompt:

Every prompt surface in Claude Code

Surface	What you write	Goes where
`CLAUDE.md`	Project conventions, "don't do X"	Always-on system prompt
Skill description (frontmatter)	"When to use this skill"	System prompt, when active
Skill body (SKILL.md)	Step-by-step instructions	System prompt, when invoked
Subagent system prompt	Role + format + constraints	System prompt for that subagent
Slash command body	The actual instruction Claude executes	User message, on invocation
Hook reasoning prompt	"Block? Allow? Why?"	User message, in HTTP hook

Why it matters

The same techniques that make a one-off API call work make your CLAUDE.md, skills, and subagents work. Bad prompt engineering shows up as Claude "ignoring" your rules — usually it's not ignoring them, the prompt was too vague to bite.

Technique 1 — Be Clear & Direct

Everyday Analogy

Tell a contractor "make the kitchen nicer" and you'll get whatever they think nice means — maybe a tile color you hate. Tell them "replace the laminate counters with quartz, leave cabinets, repaint walls Benjamin Moore Cloud White" and you get exactly that.

The pain without clarity: every conversation re-litigates what you wanted. Half the result is correction.

Same with Claude. Vague: "review this code." Direct: "Find SQL injection risks in this file. Output a numbered list. If there are none, say 'No SQL injection risks found.'" One of these gets you the same answer twice; one gets you a different essay every time.

Three rules of clarity

Say what to do, not what not to do. "Output JSON" beats "don't write prose."
State the format up front. "Reply with: VERDICT: ship|hold — one line." Claude follows the explicit shape.
Eliminate ambiguity. "Refactor this" is ambiguous. "Extract every duplicated string into a constant at the top of the file" is not.

× Vague

review this PR for issues

✓ Clear & direct

Find security issues in this PR.
Categories: SQLi, XSS, secrets,
auth bypass, IDOR. Skip style.
Output: bullet list of findings,
each with file:line and severity
(critical/high/medium).

Politeness costs tokens, not answers

"Could you please, when you have a moment..." adds noise. Claude doesn't need to be buttered up. Direct imperatives ("Find...", "Output...", "Skip...") are easier to follow and shorter.

Technique 2 — Be Specific

Specificity is what separates "directionally correct" output from "ready to ship." Three flavors of specificity:

1. Specific inputs

Don't make Claude guess context it doesn't have. Either provide it, or tell Claude to ask.

× Missing context

Add caching to this endpoint.

✓ Specific

Add Redis caching to GET /api/users/:id.
Key: "user:<id>". TTL: 300s.
Invalidate on PUT/DELETE same id.
We use ioredis (see src/cache.ts).
Don't cache if the user is admin.

2. Specific format

If your downstream code parses the output, the format must be deterministic.

× Loose format

Summarize the diff briefly.

✓ Locked format

Summarize the diff in this exact form:
TITLE: <5 word summary>
RISK: low|medium|high
CHANGES:
- <file>: <1-line description>
[repeat as needed]

3. Specific constraints

Hard limits Claude must respect — word counts, file paths, sentinel values, allowed enums.

Constraints:
- Maximum 3 bullets.
- Each bullet starts with an action verb.
- Reference files only from src/billing/. If a finding is elsewhere, drop it.
- If no findings, output exactly the string: "NO_FINDINGS"

Why it works

Specificity narrows the space of "valid" Claude outputs. A vague prompt has thousands of plausible responses, of which most aren't what you wanted. A specific prompt has dozens, almost all of which are.

Technique 3 — Structure with XML Tags

Everyday Analogy

If a manager hands you a single page that says "Read all this and do something useful," your eyes glaze over. Hand you the same page with bold headings — Background, Constraints, Deliverable, Examples — and you can find the part you need in two seconds.

The pain without structure: every reader (or LLM) has to re-parse the soup of text on every read. Easy to misattribute — is "use uppercase" a rule or an example?

XML tags do that bolding for Claude. Anthropic explicitly trained Claude to recognize and respect <tag> structure in prompts. It's not magic syntax — it's a strong convention.

Technical Definition

XML-tag prompting is the practice of wrapping prompt regions in named tags — <context>, <rules>, <example>, <input> — so Claude can refer to specific regions and you can refer to those regions in instructions ("Apply <rules> to the content of <input>"). Tag names are arbitrary; the structure is what matters.

Before / after

× Soup

Here are some rules: no PII in
logs, all amounts in cents,
return JSON only. Below is the
data. Process it: customer
john@x.com paid $42.50.

✓ Tagged

<rules>
- No PII in logs
- All amounts in cents
- Return JSON only
</rules>

<input>
customer john@x.com paid $42.50
</input>

Apply <rules> to <input>.

Common tags worth standardizing on

<context> — background info, codebase facts
<rules> or <constraints> — what Claude must respect
<example> — few-shot demonstrations
<input> or <data> — the thing to operate on
<output_format> — how the answer should look

In Claude Code

Look at any well-written subagent system prompt or skill body — they almost always use XML tags. It's not stylistic; it's a measurable accuracy bump on instruction-following. Adopt this in CLAUDE.md too for sections like <security_rules> that you want Claude to bring up by name.

Technique 4 — Provide Examples (Few-Shot)

For tasks where "the right shape of answer" is hard to describe in words, show, don't tell. Few-shot prompting — giving Claude 1–5 examples of input/output pairs — usually beats more rules.

The three-example sweet spot

One example is often enough, three is the sweet spot for most extraction/classification tasks, more than five usually hits diminishing returns.

<task>
Classify each git commit message as: feat | fix | chore | docs | refactor.
</task>

<examples>
<example>
<input>Add OAuth login endpoint</input>
<output>feat</output>
</example>
<example>
<input>Resolve null deref in pricing</input>
<output>fix</output>
</example>
<example>
<input>Bump axios to 1.7</input>
<output>chore</output>
</example>
</examples>

<input>Extract auth middleware to its own module</input>
<output>

Claude sees three examples that establish the shape, then continues the pattern. Output here will be refactor — with very high confidence and no preamble, because the examples implicitly forbade preamble.

Pick examples that cover edge cases

One typical case
One ambiguous case (with the disambiguating decision shown in the output)
One that resembles a failure mode you've seen Claude make

Examples drift the system prompt

Examples are tokens too. A 5-example few-shot block in a system prompt is billed every turn. For frequently-invoked surfaces (CLAUDE.md, skill descriptions), keep examples lean — or move them into a skill body that loads only when the skill is invoked.

Technique 5 — Trigger Thinking

For hard problems, asking Claude to think step-by-step before answering measurably improves accuracy. Two ways to do it — an old prompting trick and a new built-in feature.

Old: "think step by step" in the prompt

Decide whether to merge this PR. Think step by step:
1. What changed?
2. What could break?
3. Is the test coverage adequate?
4. Final verdict: merge | hold | request-changes.

Output only the final verdict line.

The numbered scaffold lets Claude reason aloud through the steps, then produce only the final answer (because you said so).

New: Extended Thinking (Claude 4.x)

Technical Definition

Extended Thinking is a Claude 4.x feature where the model produces a hidden reasoning trace before the final answer, with a configurable token budget. You enable it via the thinking parameter on a Messages API call. In Claude Code, it's triggered by including phrases like "think hard" or "think deeply" in your prompt — the CLI translates that into the API parameter.

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content":
        "Refactor src/auth.py to use dependency injection. "
        "Identify all call sites and propose a migration plan."
    }],
)
# resp.content has thinking blocks (internal reasoning) and text blocks (the final answer)

When to trigger thinking

Task	Thinking?
"What's the capital of France?"	No — trivial, wastes tokens.
Code review of a 200-line PR	Yes — multiple things to weigh.
Multi-file refactor planning	Yes — budget 4–8K thinking tokens.
Picking which test to run from a name	No — pattern match.

In Claude Code

The CLI maps natural-language hints to thinking budgets: "think" ≈ small budget, "think hard" ≈ medium, "think deeply / ultrathink" ≈ large. Use them sparingly — thinking tokens are billed at the same rate as output tokens. CC14 (Power User) covers thinking UX in detail.

Applied to CLAUDE.md, Skills & Subagents

Each surface has its own prompt-engineering sweet spot. Here's how to apply the five techniques where they pay off most:

CLAUDE.md — tight, rule-flavored, no examples

## Conventions
- Money in cents (integers), never floats.
- Errors: throw HttpError(status, msg). Never throw strings.
- DB access via Drizzle ORM only. No raw SQL outside src/db/migrations/.

## Don't
- Don't write to src/generated/* (codegen output).
- Don't add deps without asking; pnpm-lock.yaml is sacred.

Why this works: clear, specific, no XML soup (unnecessary for short rules), no examples (CLAUDE.md is billed every turn). It's a list of imperatives.

Skill body — structured, with examples

---
name: review-migration
description: Reviews a Drizzle migration file for safety. Triggers on .sql files in src/db/migrations/.
---

<rules>
- Flag any DROP, RENAME, or NOT NULL on existing column with data.
- Flag indexes added without CONCURRENTLY on tables > 1M rows.
- Approve idempotent DDL (CREATE IF NOT EXISTS, ALTER IF EXISTS).
</rules>

<output_format>
VERDICT: safe | risky | block
ISSUES:
- <file>:<line> — <reason>
</output_format>

<example>
<input>ALTER TABLE users ADD COLUMN email NOT NULL;</input>
<output>
VERDICT: block
ISSUES:
- migration.sql:1 — NOT NULL on existing column without default fails on rows.
</output>
</example>

Why this works: skill bodies load only on invocation, so you can afford XML structure + an example. Locked output format means downstream parsers are stable.

Subagent system prompt — role + format + thinking

---
name: deep-reviewer
model: claude-opus-4-7
description: Multi-file PR reviewer. Use for changes spanning 5+ files.
---

You are a senior engineer reviewing a pull request.

<process>
1. Read every changed file.
2. Think hard about cross-file effects.
3. Categorize each finding: bug | perf | security | style.
4. Output in <output_format>.
</process>

<output_format>
SUMMARY: <3 sentences>
FINDINGS (sorted by severity):
- [SEVERITY] file:line — description
RECOMMENDATION: merge | hold | request-changes
</output_format>

Be terse. Skip style nits unless they're severity high.

What just happened?

You saw the same five techniques tuned for three surfaces. The pattern: tighter for always-on (CLAUDE.md), richer for on-demand (skills, subagents). The cost of richness is paid when invoked, not always.

Hands-On Lab — Rewrite a Vague Subagent

You're going to take a deliberately bad subagent prompt, measure it on 5 cases, then rewrite it using the five techniques and re-measure. This previews evaluation (CC11) without the full machinery.

Step 1 — The bad subagent

---
name: commit-classifier-v0
description: Classifies commits.
---

Look at this commit and tell me what type it is.

Save as ~/.claude/agents/commit-classifier-v0.md.

Step 2 — Five test cases (golden answers)

1. "Add OAuth login endpoint"           → feat
2. "Fix null pointer in pricing.ts"      → fix
3. "Bump axios from 1.6 to 1.7"          → chore
4. "Update README install steps"         → docs
5. "Extract auth middleware to module"   → refactor

Step 3 — Run v0, count hits

In Claude Code, invoke the subagent on each commit message (e.g. @commit-classifier-v0 "Add OAuth login endpoint", or call it via the Task tool). Score: how many of the 5 outputs are exactly one of {feat, fix, chore, docs, refactor}? You'll typically see 2–3/5 — lots of "this is a feature commit because..." prose.

Step 4 — Rewrite as v1

---
name: commit-classifier-v1
description: Classifies a git commit message into exactly one type. Use when categorizing commits in CI or changelogs.
model: claude-haiku-4-5-20251001
---

Classify a single git commit message into exactly one category.

<categories>
- feat: new user-facing functionality
- fix: bug fix
- chore: dependency bumps, tooling, no user-visible change
- docs: documentation only
- refactor: structural change, behavior preserved
</categories>

<examples>
<example><input>Add password reset flow</input><output>feat</output></example>
<example><input>Resolve race in cache eviction</input><output>fix</output></example>
<example><input>Bump TypeScript 5.4 -> 5.5</input><output>chore</output></example>
</examples>

<output_format>
Exactly one word, lowercase, from the categories list. No explanation, no punctuation.
</output_format>

Save as ~/.claude/agents/commit-classifier-v1.md (sibling to v0).

Step 5 — Re-run, re-count

Run v1 on the same 5 cases. You should now see 5/5 — one-word answers, all correct. Same model would have worked, but the prompt now tells Claude exactly what shape and what categories.

Step 6 — What changed

Technique	v0	v1
Clear & direct	× "tell me what type"	✓ "Classify into exactly one category"
Specific	× types not enumerated	✓ 5 categories, defined
XML structure	×	✓ `<categories>`, `<examples>`, `<output_format>`
Examples	×	✓ 3 few-shots covering normal cases
Output format locked	×	✓ "exactly one word, no punctuation"

An environment variable.

Correct. Examples are token-heavy. Putting them in CLAUDE.md inflates every turn. Putting them in a skill body means you pay only when the skill triggers.

Look again. CLAUDE.md content is billed every turn. Skill bodies load only on invocation — that's where token-heavy content (like examples) belongs.

Module Summary

Be clear & direct: imperatives, not hedges. State the format up front. Eliminate ambiguity.
Be specific: specific inputs (with context), specific format (parseable), specific constraints (hard limits).
XML tags: wrap regions like <rules>, <input>, <example>. Refer to them by name in instructions.
Few-shot examples: 3 is the sweet spot — typical, ambiguous, known-failure. Lives in skill bodies, not CLAUDE.md.
Trigger thinking for hard tasks. In Claude Code, "think" / "think hard" / "ultrathink" set the budget.
Match technique to surface: tight rules for always-on (CLAUDE.md); rich structure + examples for on-demand (skills, subagents).
Prompt rewrites can produce 50%+ accuracy gains — cheaper than upgrading the model.