Best Practices for BMAD

Practical guidance on team structure, estimating AI work, managing uncertainty, documentation standards, scaling AI development, and managing inference costs.

Team Structure for AI Projects

Effective AI teams blend traditional development skills with AI-specific expertise:

Small Team (3-5 people)

1 AI Engineer (also does prompt design), 1-2 Full-Stack Developers, 1 QA Engineer (covers AI QA), 1 Product Owner.

Medium Team (6-10 people)

2 AI Engineers, 1 Prompt Designer, 3-4 Developers, 1 AI QA Engineer, 1 QA Engineer, 1 Product Owner, 1 Scrum Master.

Large Team (10+ people)

Dedicated AI platform team, multiple feature teams with embedded AI engineers, centralized prompt library and evaluation infrastructure.

Key principle: Every team member should understand AI basics, even if they are not writing prompts. When developers understand how AI models work, they make better architectural decisions and write better integration code.

Estimating AI Work

AI tasks are inherently harder to estimate than traditional development. BMAD uses a modified estimation approach:

Work Type            Estimation Approach                     Buffer
───────────────────────────────────────────────────────────────────────────────
Integration code     Standard story points                   Normal (20%)
Prompt engineering   Time-boxed experiments                  High (50-100%)
Model evaluation     Fixed time per model + dataset size     Medium (30%)
Quality tuning       Time-box with exit criteria             Very high (100-200%)
New AI feature       Discovery spike first, then estimate    Do not estimate without a spike
Estimation Example
Feature: AI-powered ticket classification

Spike (1 day):
  - Can an LLM classify our ticket categories? YES
  - Baseline accuracy with zero-shot: 78%
  - With few-shot examples: 89%
  - Target accuracy: 90%

Estimates:
  Prompt engineering to reach 90%:  2-4 days
  Integration code:                 3 days
  Evaluation framework:             2 days
  Testing and QA:                   2 days
  Monitoring setup:                 1 day
  ─────────────────────────────────
  Total: 10-12 days (with buffer: 15 days)
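The spike's accuracy numbers come from running the candidate prompt against a labeled sample. A minimal sketch of that measurement, where `classify` is a toy keyword rule standing in for a hypothetical LLM-backed classifier and the four cases stand in for a real ticket sample:

```python
# Minimal accuracy harness for a classification spike. `classify` is a
# toy stand-in for an LLM call; swap in your real API wrapper.

def accuracy(classify, labeled_cases):
    """Fraction of cases where the predicted label matches the expected one."""
    correct = sum(1 for text, expected in labeled_cases if classify(text) == expected)
    return correct / len(labeled_cases)

def classify(text):
    # Toy keyword rule standing in for a real model call.
    return "billing" if "invoice" in text.lower() else "technical"

cases = [
    ("Invoice shows the wrong amount", "billing"),
    ("App crashes on login", "technical"),
    ("Need a copy of last month's invoice", "billing"),
    ("Error 500 when saving", "technical"),
]

print(f"Accuracy: {accuracy(classify, cases):.0%}")
```

The same harness, pointed at a few hundred labeled tickets, produces the zero-shot and few-shot baselines the spike reports.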

Managing Uncertainty

AI projects have more unknowns than traditional software. BMAD manages this through:

  1. Discovery Spikes

    Before committing to an AI approach, run a time-boxed spike (1-3 days) to validate feasibility. If the spike fails, you have saved weeks of wasted effort.

  2. Progressive Commitment

    Start with the simplest possible AI approach. Only increase complexity if the simple approach does not meet quality targets. Many teams over-engineer when a well-crafted few-shot prompt would suffice.

  3. Exit Criteria

    Define clear exit criteria for experiments: "If we cannot reach 85% accuracy in 3 days, we will try approach B." This prevents infinite optimization loops.

  4. Fallback Plans

    Always have a non-AI fallback. If the AI component fails or does not meet quality targets, what is the manual or rule-based alternative?
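The fallback idea can be sketched as a wrapper that tries the AI path first and degrades to rules on failure. Here `llm_classify` is a hypothetical function that may raise on provider outages, and the keyword table is illustrative:

```python
# Graceful-degradation sketch: try the AI classifier, fall back to rules.
# `llm_classify` is a placeholder for a real API call that may raise.

RULES = {"invoice": "billing", "refund": "billing", "crash": "technical"}

def rule_based_classify(text):
    """Non-AI fallback: first keyword match wins, else a default bucket."""
    lowered = text.lower()
    for keyword, category in RULES.items():
        if keyword in lowered:
            return category
    return "general"

def classify_with_fallback(text, llm_classify):
    try:
        return llm_classify(text)
    except Exception:
        # Provider outage or hard failure: degrade to rules, never crash.
        return rule_based_classify(text)
```

The rule-based path is deliberately crude; its job is to keep the feature alive, not to match the model's quality.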

Documentation Standards

BMAD projects require additional documentation beyond standard software docs:

  • Prompt documentation: Every production prompt should have a README explaining its purpose, expected inputs/outputs, known limitations, and version history.
  • Evaluation reports: Document evaluation results for each prompt version, including accuracy, sample outputs, and failure analysis.
  • Model cards: For each AI model used, document its capabilities, limitations, cost, and any known biases.
  • Runbooks: Create operational runbooks for common AI issues: quality degradation, model outages, cost spikes, and bias incidents.
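One way to keep the prompt README's fields machine-readable alongside the prompt itself is a small record type. The field names below are illustrative, not a BMAD-mandated schema:

```python
# Illustrative prompt-documentation record; field names are assumptions,
# not a standard schema.
from dataclasses import dataclass, field

@dataclass
class PromptDoc:
    name: str
    purpose: str
    expected_input: str
    expected_output: str
    known_limitations: list = field(default_factory=list)
    version_history: list = field(default_factory=list)  # (version, note) pairs

ticket_classifier = PromptDoc(
    name="ticket-classifier",
    purpose="Assign a support ticket to one category",
    expected_input="Ticket subject and body as plain text",
    expected_output="One category label from the fixed taxonomy",
    known_limitations=["Struggles with multi-issue tickets"],
    version_history=[("1.0", "Zero-shot baseline"), ("1.1", "Added few-shot examples")],
)
```

Keeping this next to the prompt in version control means the documentation and the prompt are reviewed and versioned together.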

Scaling AI Development

Centralized Prompt Library

Maintain a shared repository of tested, versioned prompts that teams can reuse. Prevents duplication and ensures quality.

Evaluation Infrastructure

Build shared tools for prompt testing, model benchmarking, and quality monitoring. Amortize the cost across all AI features.

AI Platform Team

At scale, a dedicated platform team manages model access, caching, rate limiting, cost controls, and shared monitoring.

Knowledge Sharing

Regular cross-team sessions to share prompt engineering learnings, evaluation techniques, and lessons from production incidents.

Cost Management

AI inference costs can grow rapidly. BMAD treats cost as a first-class engineering concern:

Cost Management Strategies
1. Model Tiering
   Use expensive models only when needed:
   - Simple tasks → Haiku ($0.25/1M tokens)
   - Standard tasks → Sonnet ($3/1M tokens)
   - Complex tasks → Opus ($15/1M tokens)

2. Response Caching
   Cache identical or similar requests:
   - Exact match cache: 90%+ hit rate typical
   - Semantic cache: 60-80% hit rate

3. Prompt Optimization
   Shorter prompts = lower costs:
   - Remove redundant instructions
   - Use concise few-shot examples
   - Limit output length with max_tokens

4. Batching
   Group requests when real-time is not needed:
   - Batch classification jobs
   - Process during off-peak hours

5. Budget Controls
   Set hard limits to prevent surprises:
   - Per-feature daily budget caps
   - Alert at 80% of budget
   - Auto-disable at 100% with fallback
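Three of these strategies (tiering, exact-match caching, and a hard budget cap) fit in one thin client wrapper. A sketch, where the model names, per-token prices, and `call_model` function are placeholders rather than a real provider API:

```python
# Sketch: model tiering + exact-match cache + daily budget cap.
# Prices and model names are placeholders taken from the tiers above.

PRICE_PER_TOKEN = {"haiku": 0.25e-6, "sonnet": 3e-6, "opus": 15e-6}

def pick_model(complexity):
    """Route by task complexity so expensive models run only when needed."""
    if complexity == "simple":
        return "haiku"
    if complexity == "standard":
        return "sonnet"
    return "opus"

class BudgetedClient:
    def __init__(self, call_model, daily_budget_usd):
        self.call_model = call_model   # placeholder for a real API call
        self.budget = daily_budget_usd
        self.spent = 0.0
        self.cache = {}                # exact-match response cache

    def complete(self, prompt, complexity="standard"):
        model = pick_model(complexity)
        key = (model, prompt)
        if key in self.cache:          # cache hits cost nothing
            return self.cache[key]
        if self.spent >= self.budget:
            raise RuntimeError("Daily budget exhausted; use fallback path")
        response, tokens = self.call_model(model, prompt)
        self.spent += tokens * PRICE_PER_TOKEN[model]
        self.cache[key] = response
        return response
```

A production version would also alert at 80% of budget and reset `spent` daily, but the shape is the same: every request passes through one place that knows the tier, the cache, and the remaining budget.
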
Watch out: A common surprise is the cost of evaluation and testing. Running your prompt against a 1,000-case test dataset costs the same as 1,000 production requests. Budget for evaluation costs separately from production inference.

Frequently Asked Questions

Can we adopt BMAD on top of our existing Agile process?

Yes. BMAD is designed to layer on top of existing Agile practices. Start by adding experiment sprints and the Experiment Review ceremony. Introduce AI-specific estimation practices gradually. You do not need to restructure your entire process at once.

How do we justify the cost of AI QA and evaluation to stakeholders?

Frame it in business terms: "Without AI QA, we are deploying features that fail 15% of the time and we do not know it." Track the cost of AI errors (support tickets, user churn, manual corrections) and compare it to the cost of evaluation infrastructure. The ROI is usually clear within one quarter.

Our team has no AI experience. Where should we start?

Start with a learning sprint: have the team experiment with AI APIs, build simple prototypes, and run basic evaluations. Most developers can become productive with AI in 2-4 weeks. For specialized needs like bias testing or model fine-tuning, consider bringing in a consultant for the initial setup.

What happens if our AI provider has an outage?

BMAD's Actualize phase explicitly requires fallback mechanisms. At minimum, have a secondary model provider configured. Better yet, design your system so AI features degrade gracefully: show cached results, fall back to rule-based logic, or queue requests for processing when the provider recovers.

Does BMAD apply to all AI development, or only to LLM applications?

BMAD's principles apply to all AI development, though the specific practices in this course focus on LLM-based applications. For ML training projects, the Blueprint and Deploy phases work the same way, while the Model phase would focus on data preparation and model training instead of prompt engineering.

How do we measure the ROI of AI features?

Define measurable success metrics during the Blueprint phase: time saved, tasks automated, user satisfaction improvement, or revenue impact. Track inference costs alongside these metrics. A simple ROI formula: (Value generated - Inference costs - Development costs) / Total investment. Review quarterly and sunset features that do not justify their cost.
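The ROI formula above, as a one-line function with made-up quarterly numbers (the dollar figures are purely illustrative):

```python
# ROI = (value - inference costs - development costs) / total investment,
# where total investment is inference plus development spend.

def roi(value_generated, inference_costs, development_costs):
    total_investment = inference_costs + development_costs
    return (value_generated - inference_costs - development_costs) / total_investment

# Illustrative quarter: $120k value, $10k inference, $50k development.
print(f"ROI: {roi(120_000, 10_000, 50_000):.0%}")
```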