Intermediate

Prompt Patterns in BMAD

A catalog of proven prompt engineering patterns with examples, plus guidance on prompt versioning, A/B testing, prompt libraries, and anti-patterns to avoid.

Pattern Catalog

Chain of Thought (CoT)

Instructs the model to reason step by step before giving a final answer. This markedly improves accuracy on complex reasoning tasks.

Chain of Thought Example
System: You are a code review assistant. When
analyzing code, think through each issue step by
step before providing your assessment.

User: Review this function for bugs:

```python
def calculate_discount(price, discount_pct):
    return price * discount_pct / 100
```

Assistant: Let me analyze this step by step:

Step 1: Check the function signature
- Takes price and discount_pct as parameters
- No type hints (minor issue)

Step 2: Check the calculation logic
- price * discount_pct / 100 calculates the
  discount AMOUNT, not the discounted PRICE
- If price=100, discount_pct=20, returns 20
- But the user likely wants 80 (the final price)

Step 3: Check edge cases
- No validation for negative values
- No validation for discount_pct > 100

Bug found: Function returns the discount amount
instead of the discounted price. Should be:
  return price * (1 - discount_pct / 100)

Tree of Thought (ToT)

Explores multiple reasoning paths and evaluates them before selecting the best approach. Useful for complex decisions with trade-offs.

Tree of Thought Example
System: When solving architecture decisions,
explore at least 3 different approaches. For each
approach, list pros, cons, and a score from 1-10.
Then recommend the best option with your reasoning.

User: We need to implement real-time notifications.
Our stack is Node.js with PostgreSQL. We have
~10K concurrent users.

Assistant:
Approach 1: WebSockets (Socket.io)
Pros: True real-time, bidirectional, mature library
Cons: Stateful connections, harder to scale
Score: 7/10

Approach 2: Server-Sent Events (SSE)
Pros: Simple, HTTP-based, auto-reconnect
Cons: Unidirectional, limited browser connections
Score: 8/10

Approach 3: Polling with Long-Poll Fallback
Pros: Simplest, works everywhere, stateless
Cons: Higher latency, more server load
Score: 5/10

Recommendation: Approach 2 (SSE)
For 10K users with mostly server-to-client
notifications, SSE provides the best balance of
simplicity and real-time capability.
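In code, the explore-then-select step of Tree of Thought reduces to generating several candidate approaches, scoring each one, and keeping the best. A minimal sketch, assuming hypothetical `generate` and `evaluate` callables that wrap model calls (parsing scores out of output like the transcript above):

```python
def tree_of_thought(generate, evaluate, problem, n_branches=3):
    """Generate several candidate approaches, score each with a
    separate evaluation call, and return the best-scoring one."""
    branches = [generate(problem) for _ in range(n_branches)]
    scored = [(evaluate(problem, b), b) for b in branches]
    return max(scored)[1]  # highest score wins
```

In practice the evaluation call is a second prompt that asks the model to score a single approach in isolation, which tends to be more reliable than asking it to rank all approaches at once.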

ReAct (Reasoning + Acting)

Combines reasoning with action steps. The model thinks about what to do, takes an action, observes the result, and repeats. Essential for AI agent workflows.

ReAct Pattern Example
System: You are a debugging assistant. Use the
Thought/Action/Observation cycle to diagnose issues.

Available actions:
- READ_FILE(path) - Read a source file
- SEARCH(query) - Search the codebase
- RUN_TEST(name) - Run a specific test

User: The login endpoint returns 500 errors.

Assistant:
Thought: I need to find the login endpoint handler
and check for potential error sources.

Action: SEARCH("login endpoint handler")

Observation: Found in src/routes/auth.ts, line 42

Thought: Let me read the file to see the handler
implementation and identify the error source.

Action: READ_FILE("src/routes/auth.ts")

Observation: The handler calls db.users.findOne()
but doesn't await the promise.

Thought: Missing await would cause the handler to
proceed with an undefined user object, causing a
TypeError when accessing user.password.

Diagnosis: Missing await on line 45. The database
query returns a Promise that is not being awaited.

Self-Consistency

Run the same prompt multiple times and select the most common answer. Reduces variability and improves reliability for critical outputs.

Python - Self-Consistency Implementation
from collections import Counter

async def classify_with_consistency(text, n=5):
    """Run classification N times, return the majority answer.

    Assumes an `llm` client with an async classify() method.
    """
    results = []
    for _ in range(n):
        result = await llm.classify(
            text,
            temperature=0.7  # Some randomness between runs
        )
        results.append(result)

    # Return the most common classification
    counter = Counter(results)
    best, count = counter.most_common(1)[0]

    confidence = count / n
    return {
        "classification": best,
        "confidence": confidence,
        "all_results": dict(counter)
    }

Prompt Versioning

Treat prompts like code — version them, track changes, and test before deploying:

Prompt Version File
# prompts/code-review/v2.3.yaml
name: code-review
version: 2.3
model: claude-sonnet
temperature: 0.3
changelog:
  - v2.3: Added security check instructions
  - v2.2: Improved false positive rate
  - v2.1: Added step-by-step reasoning
system: |
  You are a senior code reviewer. Analyze code
  changes for bugs, security issues, and style
  problems. Use step-by-step reasoning.
eval_dataset: datasets/code-review-v2.json
baseline_accuracy: 0.91
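A version file like the one above can be loaded at runtime so prompts ship independently of application code. A minimal sketch, assuming PyYAML is installed and the field names shown in the example; `load_prompt` is a hypothetical helper, not part of any library:

```python
import yaml  # PyYAML, assumed installed

def load_prompt(path):
    """Load a versioned prompt file and return the fields the
    calling code needs to issue an LLM request."""
    with open(path) as f:
        spec = yaml.safe_load(f)
    return {
        "system": spec["system"],
        "model": spec["model"],
        "temperature": spec["temperature"],
        "version": spec["version"],
    }

# prompt = load_prompt("prompts/code-review/v2.3.yaml")
```

Because the file also records the eval dataset and baseline accuracy, a CI step can re-run the evaluation whenever the prompt file changes and block deploys that regress below the baseline.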

A/B Testing Prompts

Compare prompt versions in production to find the best performer:

Python - A/B Testing
import hashlib
from collections import defaultdict

class PromptABTest:
    def __init__(self, variants, split=0.5):
        self.variants = variants
        self.split = split
        self.metrics = defaultdict(list)

    def assign(self, user_id):
        # Stable per-user assignment. Python's built-in hash()
        # is salted per process, so use a cryptographic hash
        # to keep assignments consistent across restarts.
        bucket = int(hashlib.sha256(
            user_id.encode()).hexdigest(), 16) % 100
        return "A" if bucket < self.split * 100 else "B"

    async def run(self, input_data, user_id):
        variant = self.assign(user_id)
        prompt = self.variants[variant]
        result = await llm.call(prompt, input_data)

        # Track metrics for later analysis
        self.metrics[variant].append({
            "latency": result.latency,
            "tokens": result.token_count,
            "user_rating": None  # Filled in later
        })
        return result

Prompt Libraries

Organize and share proven prompts across your team:

  • Code Analysis: Code review, bug detection, refactoring suggestions, documentation generation
  • Data Processing: Classification, extraction, summarization, translation
  • Content Generation: Technical writing, email drafting, report generation
  • Quality Assurance: Test case generation, output validation, bias detection
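A lightweight way to organize such a library is a registry keyed by category and prompt name. This is an illustrative sketch; the `<root>/<category>/<name>.txt` path convention is an assumption, not a standard:

```python
from pathlib import Path

class PromptLibrary:
    """Registry of prompt files laid out as
    <root>/<category>/<name>.txt (path convention assumed)."""

    def __init__(self, root):
        self.root = Path(root)

    def get(self, category, name):
        path = self.root / category / f"{name}.txt"
        if not path.exists():
            raise KeyError(f"No prompt {category}/{name}")
        return path.read_text()

    def list_category(self, category):
        # Prompt names available under one category
        return sorted(p.stem for p in
                      (self.root / category).glob("*.txt"))
```

Keeping prompts as plain files means they diff cleanly in code review and can be shared across services without a database.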

Anti-Patterns to Avoid

These patterns lead to unreliable, expensive, or unmaintainable AI features:
  • Mega-prompt: Cramming every instruction into one massive prompt. Break complex tasks into a chain of focused prompts instead.
  • Hope-driven development: Testing a prompt once, seeing it work, and shipping it. Always test against a diverse evaluation dataset.
  • Prompt hardcoding: Embedding prompts directly in source code. Use external prompt files that can be versioned and updated independently.
  • Ignoring token costs: Not tracking or optimizing token usage. A prompt that costs $0.01 per call adds up to $10,000 at 1M requests.
  • No fallback plan: Assuming the AI model will always be available and produce good results. Always build fallback mechanisms.
  • Temperature neglect: Using default temperature for all tasks. Use low temperature (0.0-0.3) for deterministic tasks, higher (0.7-1.0) for creative ones.
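The "no fallback plan" anti-pattern above can be avoided with a small wrapper: retry the model call with backoff, then degrade to a safe default instead of crashing. A minimal sketch; `call_model` and the default message are placeholders for your own client and product copy:

```python
import time

def call_with_fallback(call_model, prompt, retries=2,
                       default="Service unavailable; try again later."):
    """Try the model a few times with exponential backoff;
    on persistent failure, return a safe default response."""
    for attempt in range(retries + 1):
        try:
            return call_model(prompt)
        except Exception:
            if attempt < retries:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    return default
```

In production the default branch might serve a cached response or route to a cheaper backup model rather than a static message, but the principle is the same: the feature degrades instead of failing.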