M03: Prompts — Programming in Natural Language
Prompts are how you program Claude. This module teaches you the anatomy of effective prompts, battle-tested engineering patterns, and how to build a conversation manager that gives your agent a persistent memory.
Learning Objectives
- Explain the roles of system, user, and assistant messages in the Messages API
- Apply zero-shot, few-shot, and chain-of-thought prompting patterns and predict which works best for a given task
- Describe the stateless prompt-to-completion loop and why your code must manage context
- Build effective system prompts with structured sections for personaA role or identity assigned to Claude via the system prompt. For example, "You are a senior Python developer" makes Claude respond with that expertise and perspective., constraints, and output format
- Implement a ConversationManager class that maintains multi-turnA conversation with multiple back-and-forth exchanges between user and assistant. Since the API is stateless, your code must store and resend the full message history with each new turn. context
Anatomy of a Prompt: Message Roles
Before: Imagine you could only communicate with an AI by typing one giant blob of text with no structure — your instructions, your question, and any prior context all mashed together with no labels.
The pain: The AI had no way to distinguish "this is how you should behave" from "this is what the user is asking," which led to inconsistent, hard-to-control responses.
The mapping: The Messages API solves this like a screenplay: the system messageA special instruction sent with every API call that defines Claude's persona, rules, and behavior. It's invisible to end users but shapes every response. It counts as input tokens. sets the stage directions (persistent rules the audience never sees), the user delivers their lines (the question or task), and the assistant responds in character. The director (system) never appears on screen but controls the entire performance — just like a well-structured system prompt invisibly shapes every reply.
What this actually looks like in code: Here's the actual structure you send to the API. Notice how each piece has its own dedicated slot — system prompt separate, messages tagged with roles:
message = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system="You are a helpful coding assistant.", # ← the "director"
messages=[
{"role": "user", "content": "How do I reverse a string?"}, # ← actor 1
{"role": "assistant", "content": "Use slicing: s[::-1]"}, # ← actor 2
{"role": "user", "content": "What about in JavaScript?"}, # ← actor 1 again
] # Roles MUST alternate: user, assistant, user, assistant...
)
system parameter — a string of persistent instructions that Claude always follows but the end user never sees. Second, a messages array — a list of objects, each tagged with a role.
The roles must strictly alternate:
"user" (what the human says) then "assistant" (what Claude said previously). Think of it as handing Claude the full script of the conversation so far, plus the director's notes. Claude reads the entire array as one seamless context and generates the next assistant turn. It does not "remember" anything from previous API calls — each request starts from scratch.
reversed_str = my_string[::-1]str.split('').reverse().join('')Prompt Engineering Patterns
Before: Early LLM users had exactly one tool: type a question and hope for the best. There was no systematic way to improve the quality of the response beyond rewording your question and retrying.
The pain: Results were wildly inconsistent — the same model might get a math problem right once and wrong three times, and users had no framework for understanding why or how to fix it.
The mapping: Prompt patterns are like teaching strategies that give you repeatable levers. Sometimes you just ask the question (zero-shotA prompting pattern where you give the model a task with no examples. It relies entirely on the model's pre-trained knowledge. Works well for simple, common tasks.), like asking a student a pop quiz question. Sometimes you show examples first (few-shotA prompting pattern where you include 2-5 input/output examples before your actual question. The model learns the desired pattern from the examples. Great for formatting and classification tasks.), like demonstrating solved problems before the test. And sometimes you walk through the reasoning step by step (chain-of-thoughtA prompting pattern that instructs the model to show its reasoning step by step before giving the final answer. Dramatically improves accuracy on math, logic, and multi-step problems. Often triggered by adding "Let's think step by step" to the prompt.), like a tutor working through a problem on a whiteboard so the student can follow the logic.
What these patterns actually look like: Here's the same question sent three ways. Notice how the prompt structure changes — not the question itself:
# ZERO-SHOT — just the question, no help
"Classify this email as spam or not-spam: 'You won $1M! Click here!'"
# FEW-SHOT — show examples first, then ask
"""Classify these emails:
Email: 'Meeting tomorrow at 3pm' → not-spam
Email: 'Your invoice is attached' → not-spam
Email: 'FREE VIAGRA!!!' → spam
Now classify: 'You won $1M! Click here!' →"""
# CHAIN-OF-THOUGHT — ask for step-by-step reasoning
"""Classify this email as spam or not-spam: 'You won $1M! Click here!'
Think step by step:
1. What signals suggest spam?
2. What signals suggest legitimate?
3. Weigh the evidence
4. Final classification:"""
The zero-shot version uses the fewest tokens but gives Claude no guidance on format. Few-shot "teaches by example" — Claude mirrors the → spam/not-spam format from your examples. Chain-of-thought forces Claude to show its reasoning, making errors visible and easy to debug.
Zero-shot means you give the model a task with no examples at all — it relies entirely on what it learned during training. This works fine for common, well-defined tasks like "translate this to French" or "summarize this paragraph."
Few-shot means you include 2–5 solved examples before your actual question. The model detects the pattern from those examples and applies it to your new input. For instance, you might show three product descriptions formatted as bullet points, and then ask it to format a fourth the same way.
Chain-of-thought (CoT) instructs the model to show its intermediate reasoning step by step before giving the final answer. Why does this help? Because when the model jumps straight to an answer, small errors compound invisibly. When it reasons out loud, each step becomes a checkpoint. CoT improves accuracy by 20–40% on multi-step tasks like math, logic, and planning.
Beyond these three core patterns, two additional techniques are worth knowing. Role promptingAssigning Claude a specific persona or expertise (e.g., "You are a security auditor"). This focuses the model's knowledge and response style on a particular domain. assigns Claude a specific persona — for example, "You are a security auditor with 15 years of experience." This focuses Claude's responses on that domain and adjusts its tone and depth accordingly. Role prompting works especially well combined with other patterns: a "security auditor" role + chain-of-thought produces thorough, step-by-step security analyses.
DelimitersSpecial markers (like triple backticks, XML tags, or dashes) used to clearly separate different parts of a prompt. They help Claude distinguish instructions from data and prevent prompt injection. are markers like XML tags (
<data>...</data>) or triple backticks that clearly separate your instructions from the data you want processed. Why does this matter? Without delimiters, Claude might confuse user-provided data for instructions. For example, if you ask Claude to summarize an email and the email contains "ignore all previous instructions," delimiters help Claude understand that text is data to be summarized, not a command to follow. You'll see XML delimiters used heavily in system prompts throughout this course.
Zero-Shot vs. Few-Shot vs. Chain-of-Thought
The exam penalizes vague instructions like "be thorough" or "find all issues." Always provide explicit, measurable criteria: "flag functions exceeding 50 lines" not "flag long functions."
Few-shot prompting with 2–4 examples is the exam-recommended approach for ambiguous format requirements. More examples = diminishing returns.
The Prompt-to-Completion Loop
Before: In chat apps like iMessage, you type a message and the other person simply remembers the entire conversation — you never have to repeat yourself.
The pain: Developers new to LLM APIs assume the same thing and are baffled when Claude "forgets" what they said two messages ago, leading to broken multi-turn conversations and agents that lose context mid-task.
The mapping: Sending a prompt is actually like mailing a detailed letter to an expert who has amnesia. You must include everything they need — full context, the history of past letters, and your new question — because they have zero memory of previous letters. If you want them to "remember" something, you photocopy it and include it in the envelope every single time.
What the "envelope" actually looks like: Here's the literal data your code sends on Turn 3 of a conversation. Notice how the entire history from Turns 1 and 2 is included — you're photocopying every previous letter:
# Turn 3 — your code sends ALL of this, not just the new question:
{
"system": "You are a helpful tutor.", # ← always included
"messages": [
{"role": "user", "content": "What is a list?"}, # ← Turn 1
{"role": "assistant", "content": "A list is..."}, # ← Turn 1 reply
{"role": "user", "content": "How do I sort one?"}, # ← Turn 2
{"role": "assistant", "content": "Use sorted()..."}, # ← Turn 2 reply
{"role": "user", "content": "What about reverse?"}, # ← Turn 3 (NEW)
]
}
# Total input: system prompt + all 5 messages. You pay for ALL of it.
And here's the response object you get back: The key fields are content (Claude's answer), stop_reason (why it stopped), and usage (your token bill):
# What you get back from client.messages.create(...)
{
"role": "assistant",
"content": [{"type": "text", "text": "Use sorted()..."}],
"stop_reason": "end_turn", # "end_turn" = finished naturally
# "max_tokens" = hit the limit
# "tool_use" = wants to call a tool
"usage": {
"input_tokens": 35, # what you sent (you pay for this)
"output_tokens": 52 # what Claude generated (costs more)
}
}
First, your code constructs the full messages array. That array contains three things: the system prompt, the conversation history, and the new user message. Second, you send that array to the API. Third, you receive back a completionThe model's generated response to your prompt. A completion includes the assistant's content blocks, a stop_reason (why generation stopped), and usage metadata (input/output token counts)..
The completion contains three things. The assistant's response text is the main one. Next is
stop_reasonA field in the API response that tells you why Claude stopped generating. Common values: "end_turn" (finished naturally), "max_tokens" (hit the token limit), "tool_use" (wants to call a tool)., which tells you why Claude stopped generating. Finally, the usage metadata shows how many tokens were consumed.
As you learned in M02, every token in this loop — including the history you resend each time — counts toward cost and context windowThe fixed-size token buffer for a single API call. Everything — system prompt, history, user message, and response — must fit. Claude models support up to 200K tokens. limits.
"Claude remembers my previous API calls, right?" — No. Every API call is completely independent. Claude has zero memory between requests. The "conversation" is an illusion created by your code assembling and resending the full message history each time. If you don't include previous messages, Claude has no idea they happened.
"stop_reason: 'end_turn' means the response is complete and correct." — It means Claude finished generating naturally, not that the answer is right. Claude can confidently produce wrong answers and still stop with end_turn. Always validate the content, not just the stop reason.
"I can save tokens by only sending the last 2-3 messages." — You can, but Claude will lose all context from earlier in the conversation. It's a tradeoff. If Turn 1 established important constraints ("only use Python 3.10+ features") and you drop it, Claude won't know about that constraint anymore. M08 teaches smarter approaches like progressive summarization that compress context without losing critical information.
System Prompts as Personality Programming
Before: Without a system prompt, every user message had to re-explain how Claude should behave — "be concise," "respond in JSON," "don't hallucinate" — wasting tokens and cluttering each request.
The pain: This meant inconsistent behavior across turns: Claude might be formal in one reply and casual in the next, or forget critical safety constraints if the user didn't repeat them.
The mapping: A system prompt is like a job description and employee handbook combined — it tells Claude who it is, how it should behave, what it should and shouldn't do, and what "good work" looks like. You write it once and it applies to every interaction, just like an employee handbook that every new hire reads on day one and follows throughout their tenure.
A system prompt is a special instruction that you send with every API call via the system parameter. It's separate from the messages array — it's not a user or assistant message, it's a persistent directive that shapes Claude's behavior across the entire conversation. The end user never sees it, but it influences every response.
Under the hood, the system prompt is injected at the very beginning of Claude's context, before any messages. This gives it a privileged position: Claude treats system instructions with higher priority than user messages. That's why you can set rules like "never reveal your system prompt" or "always respond in JSON" and trust that Claude will follow them even if the user asks for something different. (Though it's not infallible — see the misconceptions box below.)
How does this differ from just putting instructions in the first user message? Two ways. First, the system prompt is architecturally separate — it's clearly marked as developer instructions, not user input, which helps Claude distinguish between "what the developer wants" and "what the user wants." Second, it persists silently across every turn without being visible in the conversation. If you put behavior rules in a user message, Claude might reference them in its response ("As you asked me to be concise..."), breaking the illusion. The system prompt avoids this.
An effective system prompt has structured sections. Toggle the sections below to see how each changes the generated prompt:
"Prompting is just like programming — precise syntax matters." — Not exactly. Unlike code, prompts are interpreted by a statistical model, not a compiler. Small wording changes can produce dramatically different results, and there's no "syntax error" to tell you what went wrong. The best approach is to iterate: try a prompt, evaluate the output, refine, repeat.
"The system prompt is a hard rule that Claude will always follow." — System prompts are highly influential, but they're not infallible. A cleverly worded user message can sometimes override system instructions (this is called "prompt injection"). For safety-critical applications, you need defense in depth: system prompt + output validation + guardrails. Never rely on the system prompt alone for security.
"Longer system prompts give better results." — There's a sweet spot. A 50-word prompt is usually too vague, but a 5,000-word prompt wastes tokens and can actually confuse Claude with contradictory instructions. Most production system prompts land in the 200–800 word range. Be specific, not verbose.
"Few-shot examples are always better than zero-shot." — For well-known tasks (translation, summarization, simple Q&A), zero-shot often performs just as well — and uses fewer tokens. Few-shot shines when the task has an ambiguous or non-obvious output format that Claude can't infer from the instruction alone.
"Chain-of-thought is always the best pattern." — CoT adds significant output tokens (and cost). For simple, single-step tasks like "translate this to Spanish," forcing step-by-step reasoning is wasteful and can even reduce quality. Use CoT for multi-step reasoning, math, logic, and planning — not for everything.
Code Walkthrough
ConversationManager class that handles the stateless loop for you — the pattern you will reuse throughout this entire course.
Single-Turn with System Prompt
Let's start with the simplest case: a single API call with a well-structured system prompt. The system prompt below uses XML-tagged sections — <role>, <constraints>, <output_format>. Why XML? Because Claude treats XML tags as semantic boundaries. When Claude sees <constraints>, it knows "everything inside here is a rule I must follow." This makes it much better at following multi-part instructions compared to a wall of plain text.
Here's a mistake that trips up nearly every beginner: putting the system prompt inside the messages array as a user message. Don't do that. The system parameter is its own top-level field, architecturally separate from the conversation. The tricky part? Getting this wrong won't throw an error — your code will run fine. But Claude's behavior will be subtly worse, and you'll spend hours debugging something that "should work" without understanding why the output quality is inconsistent.
# pip install anthropic>=0.30.0
import anthropic
client = anthropic.Anthropic()
# A well-structured system prompt with clear sections
system_prompt = """You are a senior Python developer conducting code reviews.
You review code for bugs, performance issues, and style violations.
- Be concise: max 3 bullet points per issue category
- Always suggest a fix, not just identify the problem
- If the code is clean, say so — don't invent issues
## Bugs
- ...
## Performance
- ...
## Style
- ...
"""
try:
message = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system=system_prompt,
messages=[
{"role": "user", "content": "Review this:\ndef add(a,b): return a+b"}
]
)
print(message.content[0].text)
print(f"\nTokens: {message.usage.input_tokens} in, {message.usage.output_tokens} out")
except anthropic.APIError as e:
print(f"API error: {e.status_code} - {e.message}")
// npm install @anthropic-ai/sdk@^0.30.0
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const systemPrompt = `You are a senior Python developer conducting code reviews.
<role>You review code for bugs, performance issues, and style violations.</role>
<constraints>
- Be concise: max 3 bullet points per issue category
- Always suggest a fix, not just identify the problem
- If the code is clean, say so — don't invent issues
</constraints>
<output_format>
## Bugs
- ...
## Performance
- ...
## Style
- ...
</output_format>`;
try {
const message = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 1024,
system: systemPrompt,
messages: [
{ role: 'user', content: 'Review this:\ndef add(a,b): return a+b' }
]
});
console.log(message.content[0].text);
console.log(`\nTokens: ${message.usage.input_tokens} in, ${message.usage.output_tokens} out`);
} catch (error) {
if (error instanceof Anthropic.APIError) {
console.error(`API error: ${error.status} - ${error.message}`);
} else { throw error; }
}
try/except block ensures your app does not crash on API errors.
Comparing Prompt Patterns
Now let's see the three patterns compete head-to-head. The code below sends the exact same math word problem three different ways — zero-shot, few-shot, and chain-of-thought — and prints the results side by side. This is the fastest way to build intuition for which pattern to reach for in your own projects.
One thing to watch in the output: notice how the input_tokens count differs between patterns. Few-shot examples are "free" in terms of accuracy, but they're not free in tokens — each example adds to your input cost. Chain-of-thought also produces longer outputs (Claude "thinks out loud"), so budget accordingly on both sides.
import anthropic
client = anthropic.Anthropic()
question = "A store sells apples for $1.50 each. If you buy 5 or more, you get a 20% discount. How much do 7 apples cost?"
prompts = {
"zero-shot": question,
"few-shot": f"""Example: A shirt costs $25. With a 10% discount, it costs $25 * 0.90 = $22.50.
Example: A book costs $15. Buy 3+ and get 15% off. 4 books = $15 * 4 * 0.85 = $51.00.
Now solve: {question}""",
"chain-of-thought": f"""{question}
Let's solve this step by step:
1. First, determine the base price
2. Check if the discount applies
3. Calculate the discount amount
4. Compute the final price"""
}
for name, prompt in prompts.items():
try:
msg = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=300,
messages=[{"role": "user", "content": prompt}]
)
print(f"\n{'='*40}")
print(f"Pattern: {name}")
print(f"Response: {msg.content[0].text[:200]}")
print(f"Tokens: {msg.usage.input_tokens} in, {msg.usage.output_tokens} out")
except anthropic.APIError as e:
print(f"Error ({name}): {e.message}")
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const question = 'A store sells apples for $1.50 each. If you buy 5 or more, you get a 20% discount. How much do 7 apples cost?';
const prompts = {
'zero-shot': question,
'few-shot': `Example: A shirt costs $25. With a 10% discount, it costs $25 * 0.90 = $22.50.
Example: A book costs $15. Buy 3+ and get 15% off. 4 books = $15 * 4 * 0.85 = $51.00.
Now solve: ${question}`,
'chain-of-thought': `${question}
Let's solve this step by step:
1. First, determine the base price
2. Check if the discount applies
3. Calculate the discount amount
4. Compute the final price`
};
for (const [name, prompt] of Object.entries(prompts)) {
try {
const msg = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 300,
messages: [{ role: 'user', content: prompt }]
});
console.log(`\n${'='.repeat(40)}`);
console.log(`Pattern: ${name}`);
console.log(`Response: ${msg.content[0].text.slice(0, 200)}`);
console.log(`Tokens: ${msg.usage.input_tokens} in, ${msg.usage.output_tokens} out`);
} catch (error) {
if (error instanceof Anthropic.APIError) {
console.error(`Error (${name}): ${error.message}`);
} else { throw error; }
}
}
ConversationManager Class
This is the most important code in the module — you will extend this class throughout the course. Let me walk you through it the way you'd think about building it yourself.
The constructor stores three things: the system prompt, the model name, and an empty messages list. That messages list is the heart of the entire class. Since the API is stateless, this Python list is literally the only place the conversation exists. If your process crashes and you lose this list, the entire conversation history is gone — poof. (In M08, you'll learn how to persist this to a database so it survives restarts.)
Now let's look at the send() method, where the real magic happens. The flow is: append the user message to the list, call the API with the full history, then append Claude's response. By keeping both sides in the list, each subsequent call automatically includes the full conversation context. Here's the subtle but critical detail: if the API call fails, we pop() the user message back off the list. Why? Without this rollback, you'd have an orphaned user message sitting in the list with no assistant reply after it. The next call would send two consecutive user messages, violate the strict role alternation rule, and throw a confusing error that doesn't mention the real cause.
Finally, the demo at the bottom sends two turns and prints the token counts. Pay attention to the numbers: the second call uses noticeably more input tokens than the first, because it includes the entire first turn's content. This is the cost growth pattern you must plan for in any long-running agent — and it's exactly why M08 teaches conversation summarization techniques.
import anthropic
class ConversationManager:
"""Manages multi-turn conversations with Claude."""
def __init__(self, system_prompt: str, model: str = "claude-sonnet-4-6"):
self.client = anthropic.Anthropic()
self.system = system_prompt
self.model = model
self.messages: list[dict] = []
def send(self, user_message: str) -> tuple[str, dict]:
"""Send a message and get a response. Returns (text, usage)."""
self.messages.append({"role": "user", "content": user_message})
try:
response = self.client.messages.create(
model=self.model,
max_tokens=1024,
system=self.system,
messages=self.messages,
)
assistant_text = response.content[0].text
self.messages.append({"role": "assistant", "content": assistant_text})
return assistant_text, {
"input_tokens": response.usage.input_tokens,
"output_tokens": response.usage.output_tokens,
}
except anthropic.APIError as e:
self.messages.pop() # Remove failed user message
raise
def get_history(self) -> list[dict]:
"""Return the full conversation history."""
return self.messages.copy()
def clear(self):
"""Clear conversation history (keeps system prompt)."""
self.messages = []
# Usage
conv = ConversationManager(
system_prompt="You are a helpful Python tutor. Be concise."
)
try:
reply, usage = conv.send("What is a list comprehension?")
print(f"Claude: {reply}")
print(f"Tokens: {usage}")
reply, usage = conv.send("Show me an example with filtering.")
print(f"\nClaude: {reply}")
print(f"Tokens: {usage}")
print(f"History length: {len(conv.get_history())} messages")
except anthropic.APIError as e:
print(f"Error: {e.message}")
import Anthropic from '@anthropic-ai/sdk';
class ConversationManager {
constructor(systemPrompt, model = 'claude-sonnet-4-6') {
this.client = new Anthropic();
this.system = systemPrompt;
this.model = model;
this.messages = [];
}
async send(userMessage) {
this.messages.push({ role: 'user', content: userMessage });
try {
const response = await this.client.messages.create({
model: this.model,
max_tokens: 1024,
system: this.system,
messages: this.messages,
});
const assistantText = response.content[0].text;
this.messages.push({ role: 'assistant', content: assistantText });
return {
text: assistantText,
usage: {
inputTokens: response.usage.input_tokens,
outputTokens: response.usage.output_tokens,
},
};
} catch (error) {
this.messages.pop(); // Remove failed user message
throw error;
}
}
getHistory() { return [...this.messages]; }
clear() { this.messages = []; }
}
// Usage
const conv = new ConversationManager(
'You are a helpful Python tutor. Be concise.'
);
try {
let result = await conv.send('What is a list comprehension?');
console.log(`Claude: ${result.text}`);
console.log(`Tokens:`, result.usage);
result = await conv.send('Show me an example with filtering.');
console.log(`\nClaude: ${result.text}`);
console.log(`Tokens:`, result.usage);
console.log(`History length: ${conv.getHistory().length} messages`);
} catch (error) {
if (error instanceof Anthropic.APIError) {
console.error(`Error: ${error.message}`);
} else { throw error; }
}
ConversationManager that solves the stateless API problem. It stores the full message history in a list, sends it with every API call, appends both user and assistant messages to maintain alternation, and gracefully rolls back on errors. After two turns, the history contains 4 messages (2 user + 2 assistant), and the second API call consumed more input tokens because it included the first turn's content. This class is the foundation you will build on for tool use (M07), ReAct agents (M12), and production conversation management.
Hands-On Exercise
What You'll Build
A multi-turn Code Review Agent that uses a structured system prompt, compares prompt patterns, and tracks token growth across turns using the ConversationManager class.
Time estimate: 25–35 minutes • Prerequisites: Completed M01/M02 labs (API key set, SDK installed) • Files you'll create: review_agent.py (or review_agent.mjs)
Environment Setup
If you completed the M01 lab, you're already set up. Otherwise:
pip install "anthropic>=0.30.0" # or: npm install "@anthropic-ai/sdk@^0.30.0"
export ANTHROPIC_API_KEY="your-key-here" # or (Windows): set ANTHROPIC_API_KEY=your-key-here
Step 1: Build a Code Review System Prompt
A great system prompt is the highest-leverage thing you can write for an agent. This step builds one with 5 XML-tagged sections — the same structure production teams use. XML tags help Claude parse multi-part instructions cleanly.
Create a new file called review_agent.py (or review_agent.mjs):
import anthropic
client = anthropic.Anthropic()
REVIEW_SYSTEM_PROMPT = """You are a senior software engineer conducting code reviews.
<role>You review code for correctness, performance, security, and style.</role>
<expertise>Python, JavaScript, SQL. You know OWASP top 10 and PEP 8.</expertise>
<review_criteria>
- Bugs: logic errors, off-by-one, null handling
- Performance: unnecessary loops, missing caching opportunities
- Security: injection risks, hardcoded secrets, unsafe deserialization
- Style: naming conventions, function length, missing docstrings
</review_criteria>
<output_format>
For each category with findings, use this format:
## [Category]
- **Issue**: description
- **Fix**: suggested code change
If a category has no issues, omit it entirely.
</output_format>
<tone>Be constructive and specific. Praise good patterns. Never be dismissive.</tone>"""
# Test the system prompt with a simple review
test_code = '''def get_user(id):
query = f"SELECT * FROM users WHERE id = {id}"
return db.execute(query)'''
try:
msg = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system=REVIEW_SYSTEM_PROMPT,
messages=[{"role": "user", "content": f"Review this code:\n```python\n{test_code}\n```"}]
)
print(msg.content[0].text)
print(f"\nTokens: {msg.usage.input_tokens} in, {msg.usage.output_tokens} out")
except anthropic.APIError as e:
print(f"Error: {e.message}")
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const REVIEW_SYSTEM_PROMPT = `You are a senior software engineer conducting code reviews.
<role>You review code for correctness, performance, security, and style.</role>
<expertise>Python, JavaScript, SQL. You know OWASP top 10 and PEP 8.</expertise>
<review_criteria>
- Bugs: logic errors, off-by-one, null handling
- Performance: unnecessary loops, missing caching opportunities
- Security: injection risks, hardcoded secrets, unsafe deserialization
- Style: naming conventions, function length, missing docstrings
</review_criteria>
<output_format>
For each category with findings, use this format:
## [Category]
- **Issue**: description
- **Fix**: suggested code change
If a category has no issues, omit it entirely.
</output_format>
<tone>Be constructive and specific. Praise good patterns. Never be dismissive.</tone>`;
const testCode = `def get_user(id):
query = f"SELECT * FROM users WHERE id = {id}"
return db.execute(query)`;
try {
const msg = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 1024,
system: REVIEW_SYSTEM_PROMPT,
messages: [{ role: 'user', content: `Review this code:\n\`\`\`python\n${testCode}\n\`\`\`` }]
});
console.log(msg.content[0].text);
console.log(`\nTokens: ${msg.usage.input_tokens} in, ${msg.usage.output_tokens} out`);
} catch (error) {
console.error(`Error: ${error.message}`);
}
Run it: python review_agent.py (or node review_agent.mjs)
<output_format> section you defined.
Troubleshooting
ModuleNotFoundError: No module named 'anthropic'— Runpip install anthropicto install the SDK.AuthenticationError: Could not resolve API key— Make sure you've setexport ANTHROPIC_API_KEY="sk-..."in your terminal (orset ANTHROPIC_API_KEY=sk-...on Windows).- Claude doesn't follow the output format — Make sure the
<output_format>XML tags are inside thesystemparameter, not in themessagesarray. System-level instructions get higher priority.
Step 2: Compare Prompt Patterns
Now let's see how zero-shot, few-shot, and chain-of-thought affect the same review task. This step sends the same buggy code three ways and prints the results so you can compare quality and token cost side by side.
Create a new file called pattern_compare.py (or pattern_compare.mjs):
import anthropic
client = anthropic.Anthropic()
code = '''def process_items(items):
result = []
for i in range(len(items)):
if items[i] != None:
result.append(items[i].upper())
return result'''
patterns = {
"zero-shot": f"Review this Python code for issues:\n```python\n{code}\n```",
"few-shot": f"""Here are example code reviews:
Code: `x = x + 1` → Style: Use `x += 1` for augmented assignment.
Code: `if x == None` → Bug: Use `is None` instead of `== None` for identity checks.
Now review this code:
```python
{code}
```""",
"chain-of-thought": f"""Review this Python code step by step:
```python
{code}
```
Think through it methodically:
1. Read each line and check for bugs
2. Look for performance issues
3. Check for style violations
4. Summarize your findings""",
}
for name, prompt in patterns.items():
try:
msg = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=500,
messages=[{"role": "user", "content": prompt}]
)
print(f"\n{'='*50}")
print(f"Pattern: {name}")
print(f"Tokens: {msg.usage.input_tokens} in, {msg.usage.output_tokens} out")
print(f"Response:\n{msg.content[0].text[:300]}")
except anthropic.APIError as e:
print(f"Error ({name}): {e.message}")
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const code = `def process_items(items):
result = []
for i in range(len(items)):
if items[i] != None:
result.append(items[i].upper())
return result`;
const patterns = {
'zero-shot': `Review this Python code for issues:\n\`\`\`python\n${code}\n\`\`\``,
'few-shot': `Here are example code reviews:\n\nCode: \`x = x + 1\` → Style: Use \`x += 1\`.\nCode: \`if x == None\` → Bug: Use \`is None\`.\n\nNow review:\n\`\`\`python\n${code}\n\`\`\``,
'chain-of-thought': `Review this step by step:\n\`\`\`python\n${code}\n\`\`\`\n\n1. Check for bugs\n2. Performance issues\n3. Style violations\n4. Summarize`,
};
for (const [name, prompt] of Object.entries(patterns)) {
try {
const msg = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 500,
messages: [{ role: 'user', content: prompt }]
});
console.log(`\n${'='.repeat(50)}`);
console.log(`Pattern: ${name}`);
console.log(`Tokens: ${msg.usage.input_tokens} in, ${msg.usage.output_tokens} out`);
console.log(`Response:\n${msg.content[0].text.slice(0, 300)}`);
} catch (error) {
console.error(`Error (${name}): ${error.message}`);
}
}
Run it: python pattern_compare.py (or node pattern_compare.mjs)
!= None (should be is not None) and the un-Pythonic range(len(...)) loop.
Troubleshooting
ModuleNotFoundError: No module named 'anthropic'— Runpip install anthropicto install the SDK.AuthenticationError— Check yourANTHROPIC_API_KEYenvironment variable is set correctly.- Only one or two patterns print — One of the API calls may have hit a rate limit. Wait a few seconds and re-run. The loop will resume from the failed pattern.
Step 3: Multi-Turn Review Conversation
Now combine the system prompt from Step 1 with the ConversationManager from the code walkthrough to have a multi-turn review conversation. This demonstrates how context builds across turns — and how token costs grow. This step uses the ConversationManager class from the Code Walkthrough section above.
Create a new file called review_conversation.py (or review_conversation.mjs). Copy the ConversationManager class from the Code Walkthrough above, then add:
# After the ConversationManager class definition...
REVIEW_SYSTEM_PROMPT = """You are a senior software engineer conducting code reviews.
<role>Review code for correctness, performance, security, and style.</role>
<output_format>Use ## Category headers with bullet points. Be concise.</output_format>
<tone>Be constructive. Praise good patterns.</tone>"""
conv = ConversationManager(system_prompt=REVIEW_SYSTEM_PROMPT)
total_in = 0
total_out = 0
turns = [
"Review this:\n```python\ndef get_user(id):\n query = f'SELECT * FROM users WHERE id = {id}'\n return db.execute(query)\n```",
"Can you show me the fixed version with parameterized queries?",
"Now add error handling for the case where the user is not found.",
"What about connection pooling — is that important here?",
"Summarize all the improvements we discussed in a checklist.",
]
for i, turn in enumerate(turns, 1):
try:
reply, usage = conv.send(turn)
total_in += usage["input_tokens"]
total_out += usage["output_tokens"]
print(f"\n--- Turn {i} ---")
print(f"You: {turn[:60]}...")
print(f"Claude: {reply[:150]}...")
print(f"This turn: {usage['input_tokens']} in, {usage['output_tokens']} out")
print(f"Cumulative: {total_in} in, {total_out} out")
except Exception as e:
print(f"Error on turn {i}: {e}")
break
print(f"\n{'='*50}")
print(f"Total: {len(conv.get_history())} messages, {total_in} input + {total_out} output tokens")
// After the ConversationManager class definition...
const conv = new ConversationManager(
`You are a senior software engineer conducting code reviews.
<role>Review code for correctness, performance, security, and style.</role>
<output_format>Use ## Category headers with bullet points. Be concise.</output_format>
<tone>Be constructive. Praise good patterns.</tone>`
);
let totalIn = 0, totalOut = 0;
const turns = [
"Review this:\n```python\ndef get_user(id):\n query = f'SELECT * FROM users WHERE id = {id}'\n return db.execute(query)\n```",
'Can you show me the fixed version with parameterized queries?',
'Now add error handling for the case where the user is not found.',
'What about connection pooling — is that important here?',
'Summarize all the improvements we discussed in a checklist.',
];
for (let i = 0; i < turns.length; i++) {
try {
const { text, usage } = await conv.send(turns[i]);
totalIn += usage.inputTokens;
totalOut += usage.outputTokens;
console.log(`\n--- Turn ${i + 1} ---`);
console.log(`You: ${turns[i].slice(0, 60)}...`);
console.log(`Claude: ${text.slice(0, 150)}...`);
console.log(`This turn: ${usage.inputTokens} in, ${usage.outputTokens} out`);
console.log(`Cumulative: ${totalIn} in, ${totalOut} out`);
} catch (error) {
console.error(`Error on turn ${i + 1}: ${error.message}`);
break;
}
}
console.log(`\n${'='.repeat(50)}`);
console.log(`Total: ${conv.getHistory().length} messages, ${totalIn} input + ${totalOut} output tokens`);
Run it: python review_conversation.py (or node review_conversation.mjs)
ConversationManager is working correctly.
Troubleshooting
NameError: name 'ConversationManager' is not defined— Make sure you copied the fullConversationManagerclass from the Code Walkthrough section into the top of your file.- Roles alternation error — If an API call fails, the
pop()rollback insend()should handle it. If you manually edited the messages list, ensure it strictly alternates user/assistant. - High token counts — This is expected! Each turn resends the full history. At 5 turns, cumulative input of 1,500–3,000 tokens is normal. This is exactly why M08 (Conversation Management) teaches summarization techniques.
Verify Everything Works
Run both scripts to confirm your setup:
python review_agent.py && python pattern_compare.py && python review_conversation.py
Stretch Goals (Optional)
- Auto-summarization: When token count exceeds 80% of the context window, automatically summarize older turns
- Prompt template library: Build a function that switches between zero-shot, few-shot, and CoT patterns with a single parameter
Knowledge Check
Test your understanding of prompts, patterns, and conversation management.
Q1: What happens if you send two consecutive "user" messages without an "assistant" message between them?
Q2: A system prompt is 200 tokens and your conversation has 10 turns averaging 100 tokens each (50 user + 50 assistant). Approximately how many input tokens will the next API call use?
Q3: Which prompting pattern should you use for a multi-step math word problem?
Q4: Which system prompt is most effective for a code review agent?
Q5: Why must your code send the full conversation history with every API call?
Q6: Fill in the blank to make a valid API call with a system prompt:
message = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
______="You are a helpful assistant.",
messages=[{"role": "user", "content": "Hello!"}]
)promptsysteminstructionscontextsystem. It's a top-level parameter in the Messages API, separate from the messages array. The system prompt is not a message with a role — it's its own dedicated parameter.Module Summary
Key Takeaways
- Three roles shape every conversation — system (director), user (questioner), assistant (responder). The system prompt is your most powerful lever.
- Pattern choice matters enormously — zero-shot for simple tasks, few-shot for formatting/classification, chain-of-thought for reasoning. CoT improves accuracy 20-40%.
- Every call is stateless — Claude has zero memory between requests. Your code is the memory manager.
- Structure your system prompts — use XML sections for role, constraints, format, and tone. This compounds across every interaction.
- The ConversationManager pattern — a reusable class that maintains history and sends full context with each call. You'll extend this throughout the course.
Next Module Preview: M04 — Structured Output
Now that you can prompt Claude effectively, the next challenge is getting structured, parseable responses. In Module 4, you'll learn JSON mode, Pydantic/Zod validation, and error recovery — turning Claude's natural language into data your agent can reliably act on.