CC2: Prompt Engineering for Claude Code
Five techniques — clear & direct, specific, XML-structured, example-driven, thinking-triggered — applied directly to the prompts that you write in Claude Code: CLAUDE.md, skill descriptions, subagent system prompts, and slash-command bodies.
Learning Objectives
- Diagnose vague prompts and rewrite them so Claude does the right thing on the first try.
- Apply the five core techniques: clarity, specificity, XML structure, examples, thinking triggers.
- Recognize where each technique pays off most: CLAUDE.md vs skill description vs subagent prompt vs slash command.
- Quantify a prompt change with a 5-case eval (preview of CC11).
Where Prompts Live in Claude Code
"Prompt engineering" sounds abstract until you realize everything you write in Claude Code is a prompt:
| Surface | What you write | Goes where |
|---|---|---|
CLAUDE.md | Project conventions, "don't do X" | Always-on system prompt |
| Skill description (frontmatter) | "When to use this skill" | System prompt, when active |
| Skill body (SKILL.md) | Step-by-step instructions | System prompt, when invoked |
| Subagent system prompt | Role + format + constraints | System prompt for that subagent |
| Slash command body | The actual instruction Claude executes | User message, on invocation |
| Hook reasoning prompt | "Block? Allow? Why?" | User message, in HTTP hook |
The same techniques that make a one-off API call work make your CLAUDE.md, skills, and subagents work. Bad prompt engineering shows up as Claude "ignoring" your rules — usually it's not ignoring them, the prompt was too vague to bite.
Technique 1 — Be Clear & Direct
Tell a contractor "make the kitchen nicer" and you'll get whatever they think nice means — maybe a tile color you hate. Tell them "replace the laminate counters with quartz, leave cabinets, repaint walls Benjamin Moore Cloud White" and you get exactly that.
The pain without clarity: every conversation re-litigates what you wanted. Half the result is correction.
Same with Claude. Vague: "review this code." Direct: "Find SQL injection risks in this file. Output a numbered list. If there are none, say 'No SQL injection risks found.'" One of these gets you the same answer twice; one gets you a different essay every time.
Three rules of clarity
- Say what to do, not what not to do. "Output JSON" beats "don't write prose."
- State the format up front. "Reply with: VERDICT: ship|hold — one line." Claude follows the explicit shape.
- Eliminate ambiguity. "Refactor this" is ambiguous. "Extract every duplicated string into a constant at the top of the file" is not.
× Vague
review this PR for issues
✓ Clear & direct
Find security issues in this PR. Categories: SQLi, XSS, secrets, auth bypass, IDOR. Skip style. Output: bullet list of findings, each with file:line and severity (critical/high/medium).
"Could you please, when you have a moment..." adds noise. Claude doesn't need to be buttered up. Direct imperatives ("Find...", "Output...", "Skip...") are easier to follow and shorter.
Technique 2 — Be Specific
Specificity is what separates "directionally correct" output from "ready to ship." Three flavors of specificity:
1. Specific inputs
Don't make Claude guess context it doesn't have. Either provide it, or tell Claude to ask.
× Missing context
Add caching to this endpoint.
✓ Specific
Add Redis caching to GET /api/users/:id. Key: "user:<id>". TTL: 300s. Invalidate on PUT/DELETE same id. We use ioredis (see src/cache.ts). Don't cache if the user is admin.
2. Specific format
If your downstream code parses the output, the format must be deterministic.
× Loose format
Summarize the diff briefly.
✓ Locked format
Summarize the diff in this exact form: TITLE: <5 word summary> RISK: low|medium|high CHANGES: - <file>: <1-line description> [repeat as needed]
3. Specific constraints
Hard limits Claude must respect — word counts, file paths, sentinel values, allowed enums.
Constraints:
- Maximum 3 bullets.
- Each bullet starts with an action verb.
- Reference files only from src/billing/. If a finding is elsewhere, drop it.
- If no findings, output exactly the string: "NO_FINDINGS"
Specificity narrows the space of "valid" Claude outputs. A vague prompt has thousands of plausible responses, of which most aren't what you wanted. A specific prompt has dozens, almost all of which are.
Technique 3 — Structure with XML Tags
If a manager hands you a single page that says "Read all this and do something useful," your eyes glaze over. Hand you the same page with bold headings — Background, Constraints, Deliverable, Examples — and you can find the part you need in two seconds.
The pain without structure: every reader (or LLM) has to re-parse the soup of text on every read. Easy to misattribute — is "use uppercase" a rule or an example?
XML tags do that bolding for Claude. Anthropic explicitly trained Claude to recognize and respect <tag> structure in prompts. It's not magic syntax — it's a strong convention.
XML-tag prompting is the practice of wrapping prompt regions in named tags — <context>, <rules>, <example>, <input> — so Claude can refer to specific regions and you can refer to those regions in instructions ("Apply <rules> to the content of <input>"). Tag names are arbitrary; the structure is what matters.
Before / after
× Soup
Here are some rules: no PII in logs, all amounts in cents, return JSON only. Below is the data. Process it: customer john@x.com paid $42.50.
✓ Tagged
<rules> - No PII in logs - All amounts in cents - Return JSON only </rules> <input> customer john@x.com paid $42.50 </input> Apply <rules> to <input>.
Common tags worth standardizing on
<context>— background info, codebase facts<rules>or<constraints>— what Claude must respect<example>— few-shot demonstrations<input>or<data>— the thing to operate on<output_format>— how the answer should look
Look at any well-written subagent system prompt or skill body — they almost always use XML tags. It's not stylistic; it's a measurable accuracy bump on instruction-following. Adopt this in CLAUDE.md too for sections like <security_rules> that you want Claude to bring up by name.
Technique 4 — Provide Examples (Few-Shot)
For tasks where "the right shape of answer" is hard to describe in words, show, don't tell. Few-shot prompting — giving Claude 1–5 examples of input/output pairs — usually beats more rules.
The three-example sweet spot
One example is often enough, three is the sweet spot for most extraction/classification tasks, more than five usually hits diminishing returns.
<task>
Classify each git commit message as: feat | fix | chore | docs | refactor.
</task>
<examples>
<example>
<input>Add OAuth login endpoint</input>
<output>feat</output>
</example>
<example>
<input>Resolve null deref in pricing</input>
<output>fix</output>
</example>
<example>
<input>Bump axios to 1.7</input>
<output>chore</output>
</example>
</examples>
<input>Extract auth middleware to its own module</input>
<output>
Claude sees three examples that establish the shape, then continues the pattern. Output here will be refactor — with very high confidence and no preamble, because the examples implicitly forbade preamble.
Pick examples that cover edge cases
- One typical case
- One ambiguous case (with the disambiguating decision shown in the output)
- One that resembles a failure mode you've seen Claude make
Examples are tokens too. A 5-example few-shot block in a system prompt is billed every turn. For frequently-invoked surfaces (CLAUDE.md, skill descriptions), keep examples lean — or move them into a skill body that loads only when the skill is invoked.
Technique 5 — Trigger Thinking
For hard problems, asking Claude to think step-by-step before answering measurably improves accuracy. Two ways to do it — an old prompting trick and a new built-in feature.
Old: "think step by step" in the prompt
Decide whether to merge this PR. Think step by step:
1. What changed?
2. What could break?
3. Is the test coverage adequate?
4. Final verdict: merge | hold | request-changes.
Output only the final verdict line.
The numbered scaffold lets Claude reason aloud through the steps, then produce only the final answer (because you said so).
New: Extended Thinking (Claude 4.x)
Extended Thinking is a Claude 4.x feature where the model produces a hidden reasoning trace before the final answer, with a configurable token budget. You enable it via the thinking parameter on a Messages API call. In Claude Code, it's triggered by including phrases like "think hard" or "think deeply" in your prompt — the CLI translates that into the API parameter.
resp = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
thinking={"type": "enabled", "budget_tokens": 8000},
messages=[{"role": "user", "content":
"Refactor src/auth.py to use dependency injection. "
"Identify all call sites and propose a migration plan."
}],
)
# resp.content has thinking blocks (internal reasoning) and text blocks (the final answer)
When to trigger thinking
| Task | Thinking? |
|---|---|
| "What's the capital of France?" | No — trivial, wastes tokens. |
| Code review of a 200-line PR | Yes — multiple things to weigh. |
| Multi-file refactor planning | Yes — budget 4–8K thinking tokens. |
| Picking which test to run from a name | No — pattern match. |
The CLI maps natural-language hints to thinking budgets: "think" ≈ small budget, "think hard" ≈ medium, "think deeply / ultrathink" ≈ large. Use them sparingly — thinking tokens are billed at the same rate as output tokens. CC14 (Power User) covers thinking UX in detail.
Applied to CLAUDE.md, Skills & Subagents
Each surface has its own prompt-engineering sweet spot. Here's how to apply the five techniques where they pay off most:
CLAUDE.md — tight, rule-flavored, no examples
## Conventions
- Money in cents (integers), never floats.
- Errors: throw HttpError(status, msg). Never throw strings.
- DB access via Drizzle ORM only. No raw SQL outside src/db/migrations/.
## Don't
- Don't write to src/generated/* (codegen output).
- Don't add deps without asking; pnpm-lock.yaml is sacred.
Why this works: clear, specific, no XML soup (unnecessary for short rules), no examples (CLAUDE.md is billed every turn). It's a list of imperatives.
Skill body — structured, with examples
---
name: review-migration
description: Reviews a Drizzle migration file for safety. Triggers on .sql files in src/db/migrations/.
---
<rules>
- Flag any DROP, RENAME, or NOT NULL on existing column with data.
- Flag indexes added without CONCURRENTLY on tables > 1M rows.
- Approve idempotent DDL (CREATE IF NOT EXISTS, ALTER IF EXISTS).
</rules>
<output_format>
VERDICT: safe | risky | block
ISSUES:
- <file>:<line> — <reason>
</output_format>
<example>
<input>ALTER TABLE users ADD COLUMN email NOT NULL;</input>
<output>
VERDICT: block
ISSUES:
- migration.sql:1 — NOT NULL on existing column without default fails on rows.
</output>
</example>
Why this works: skill bodies load only on invocation, so you can afford XML structure + an example. Locked output format means downstream parsers are stable.
Subagent system prompt — role + format + thinking
---
name: deep-reviewer
model: claude-opus-4-7
description: Multi-file PR reviewer. Use for changes spanning 5+ files.
---
You are a senior engineer reviewing a pull request.
<process>
1. Read every changed file.
2. Think hard about cross-file effects.
3. Categorize each finding: bug | perf | security | style.
4. Output in <output_format>.
</process>
<output_format>
SUMMARY: <3 sentences>
FINDINGS (sorted by severity):
- [SEVERITY] file:line — description
RECOMMENDATION: merge | hold | request-changes
</output_format>
Be terse. Skip style nits unless they're severity high.
You saw the same five techniques tuned for three surfaces. The pattern: tighter for always-on (CLAUDE.md), richer for on-demand (skills, subagents). The cost of richness is paid when invoked, not always.
Hands-On Lab — Rewrite a Vague Subagent
You're going to take a deliberately bad subagent prompt, measure it on 5 cases, then rewrite it using the five techniques and re-measure. This previews evaluation (CC11) without the full machinery.
Step 1 — The bad subagent
---
name: commit-classifier-v0
description: Classifies commits.
---
Look at this commit and tell me what type it is.
Save as ~/.claude/agents/commit-classifier-v0.md.
Step 2 — Five test cases (golden answers)
1. "Add OAuth login endpoint" → feat
2. "Fix null pointer in pricing.ts" → fix
3. "Bump axios from 1.6 to 1.7" → chore
4. "Update README install steps" → docs
5. "Extract auth middleware to module" → refactor
Step 3 — Run v0, count hits
In Claude Code, invoke the subagent on each commit message (e.g. @commit-classifier-v0 "Add OAuth login endpoint", or call it via the Task tool). Score: how many of the 5 outputs are exactly one of {feat, fix, chore, docs, refactor}? You'll typically see 2–3/5 — lots of "this is a feature commit because..." prose.
Step 4 — Rewrite as v1
---
name: commit-classifier-v1
description: Classifies a git commit message into exactly one type. Use when categorizing commits in CI or changelogs.
model: claude-haiku-4-5-20251001
---
Classify a single git commit message into exactly one category.
<categories>
- feat: new user-facing functionality
- fix: bug fix
- chore: dependency bumps, tooling, no user-visible change
- docs: documentation only
- refactor: structural change, behavior preserved
</categories>
<examples>
<example><input>Add password reset flow</input><output>feat</output></example>
<example><input>Resolve race in cache eviction</input><output>fix</output></example>
<example><input>Bump TypeScript 5.4 -> 5.5</input><output>chore</output></example>
</examples>
<output_format>
Exactly one word, lowercase, from the categories list. No explanation, no punctuation.
</output_format>
Save as ~/.claude/agents/commit-classifier-v1.md (sibling to v0).
Step 5 — Re-run, re-count
Run v1 on the same 5 cases. You should now see 5/5 — one-word answers, all correct. Same model would have worked, but the prompt now tells Claude exactly what shape and what categories.
Step 6 — What changed
| Technique | v0 | v1 |
|---|---|---|
| Clear & direct | × "tell me what type" | ✓ "Classify into exactly one category" |
| Specific | × types not enumerated | ✓ 5 categories, defined |
| XML structure | × | ✓ <categories>, <examples>, <output_format> |
| Examples | × | ✓ 3 few-shots covering normal cases |
| Output format locked | × | ✓ "exactly one word, no punctuation" |
Two subagents and a 5-case scoresheet showing the jump from v0 to v1. You've now done a tiny eval — in CC11 you'll automate this. The lesson: prompt rewrites can produce 50%+ accuracy gains without changing models. Almost always cheaper than upgrading to Opus.
Knowledge Check
1. Your CLAUDE.md says "review code carefully." Reviews come back inconsistent. Best fix?
2. You want a skill body to take input and apply rules to it. Best structure?
<rules>...</rules> and <input>...</input> with an instruction "Apply <rules> to <input>."{rules: [...], input: ...}# Rules / # Input3. Your subagent does classification. You give it 0 examples and a vague description. Best first improvement?
4. You write "think hard" in a Claude Code prompt. What does the CLI do?
thinking API parameter with a medium token budget.thinking parameter.5. You want to add few-shot examples to teach Claude a coding style. Where should they live?
Module Summary
- Be clear & direct: imperatives, not hedges. State the format up front. Eliminate ambiguity.
- Be specific: specific inputs (with context), specific format (parseable), specific constraints (hard limits).
- XML tags: wrap regions like
<rules>,<input>,<example>. Refer to them by name in instructions. - Few-shot examples: 3 is the sweet spot — typical, ambiguous, known-failure. Lives in skill bodies, not CLAUDE.md.
- Trigger thinking for hard tasks. In Claude Code, "think" / "think hard" / "ultrathink" set the budget.
- Match technique to surface: tight rules for always-on (CLAUDE.md); rich structure + examples for on-demand (skills, subagents).
- Prompt rewrites can produce 50%+ accuracy gains — cheaper than upgrading the model.