Best Practices for BMAD
Practical guidance on team structure, estimating AI work, managing uncertainty, documentation standards, scaling AI development, and managing inference costs.
Team Structure for AI Projects
Effective AI teams blend traditional development skills with AI-specific expertise:
Small Team (3-5 people)
1 AI Engineer (also does prompt design), 1-2 Full-Stack Developers, 1 QA Engineer (covers AI QA), 1 Product Owner.
Medium Team (6-10 people)
2 AI Engineers, 1 Prompt Designer, 3-4 Developers, 1 AI QA Engineer, 1 QA Engineer, 1 Product Owner, 1 Scrum Master.
Large Team (10+ people)
Dedicated AI platform team, multiple feature teams with embedded AI engineers, centralized prompt library and evaluation infrastructure.
Estimating AI Work
AI tasks are inherently harder to estimate than traditional development. BMAD uses a modified estimation approach:
| Work Type | Estimation Approach | Buffer |
|---|---|---|
| Integration code | Standard story points | Normal (20%) |
| Prompt engineering | Time-boxed experiments | High (50-100%) |
| Model evaluation | Fixed time per model + dataset size | Medium (30%) |
| Quality tuning | Time-box with exit criteria | Very high (100-200%) |
| New AI feature | Discovery spike first, then estimate | Do not estimate without a spike |
Feature: AI-powered ticket classification

Spike (1 day):
- Can an LLM classify our ticket categories? YES
- Baseline accuracy with zero-shot: 78%
- With few-shot examples: 89%
- Target accuracy: 90%

Estimates:
- Prompt engineering to reach 90%: 2-4 days
- Integration code: 3 days
- Evaluation framework: 2 days
- Testing and QA: 2 days
- Monitoring setup: 1 day

Total: 10-12 days (with buffer: 15 days)
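The buffer arithmetic from the table can be sketched as a small helper. This is an illustrative calculation, not a prescribed tool; the task names, base estimates, and buffer midpoints below are assumptions chosen from the ranges in the table.

```python
# Sketch: applying BMAD-style estimation buffers to a task list.
# Buffer rates are midpoints of the ranges in the table above (illustrative).
BUFFERS = {
    "integration": 0.20,         # standard story points, normal buffer
    "prompt_engineering": 0.75,  # midpoint of the 50-100% range
    "evaluation": 0.30,
    "quality_tuning": 1.50,      # midpoint of the 100-200% range
}

def buffered_estimate(tasks):
    """tasks: list of (work_type, base_days). Returns total days with buffers."""
    return sum(days * (1 + BUFFERS[work_type]) for work_type, days in tasks)

# Toy plan loosely mirroring the ticket-classification example:
plan = [
    ("prompt_engineering", 3),  # midpoint of the 2-4 day range
    ("integration", 3),
    ("evaluation", 2),
]
print(round(buffered_estimate(plan), 2))
```

Note that a new AI feature still gets no estimate at all until the discovery spike has run; the buffers only apply after feasibility is known.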
Managing Uncertainty
AI projects have more unknowns than traditional software. BMAD manages this through:
- Discovery Spikes: Before committing to an AI approach, run a time-boxed spike (1-3 days) to validate feasibility. If the spike fails, you have saved weeks of wasted effort.
- Progressive Commitment: Start with the simplest possible AI approach. Only increase complexity if the simple approach does not meet quality targets. Many teams over-engineer when a well-crafted few-shot prompt would suffice.
- Exit Criteria: Define clear exit criteria for experiments: "If we cannot reach 85% accuracy in 3 days, we will try approach B." This prevents infinite optimization loops.
- Fallback Plans: Always have a non-AI fallback. If the AI component fails or does not meet quality targets, what is the manual or rule-based alternative?
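A fallback plan can be made concrete as a degradation chain: try the AI path first, fall through to a rule-based alternative, and route to a human queue as a last resort. This is a minimal sketch; the handler names and keyword rules are hypothetical stand-ins, not part of BMAD itself.

```python
# Sketch: a graceful-degradation chain for an AI feature (names hypothetical).
def classify_with_llm(ticket):
    raise RuntimeError("provider outage")  # simulate the AI path failing

def classify_with_rules(ticket):
    # Rule-based fallback: crude keyword matching, no model call needed.
    if "refund" in ticket.lower():
        return "billing"
    return "general"

def classify(ticket):
    for handler in (classify_with_llm, classify_with_rules):
        try:
            return handler(ticket)
        except Exception:
            continue  # fall through to the next, simpler handler
    return "unclassified"  # last resort: route to a human queue

print(classify("Please refund my order"))
```

The key design choice is that each handler is independently testable, so the rule-based path can ship and be validated before the AI path exists.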
Documentation Standards
BMAD projects require additional documentation beyond standard software docs:
- Prompt documentation: Every production prompt should have a README explaining its purpose, expected inputs/outputs, known limitations, and version history.
- Evaluation reports: Document evaluation results for each prompt version, including accuracy, sample outputs, and failure analysis.
- Model cards: For each AI model used, document its capabilities, limitations, cost, and any known biases.
- Runbooks: Create operational runbooks for common AI issues: quality degradation, model outages, cost spikes, and bias incidents.
Scaling AI Development
Centralized Prompt Library
Maintain a shared repository of tested, versioned prompts that teams can reuse. Prevents duplication and ensures quality.
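A minimal shape for such a library is a registry keyed by prompt name and version. The class and field names below are an illustrative sketch, not a reference implementation; real libraries typically back this with a repository and CI checks.

```python
# Sketch: a minimal versioned prompt registry (names illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class Prompt:
    name: str
    version: str
    template: str
    notes: str = ""  # purpose, known limitations, links to eval reports

class PromptLibrary:
    def __init__(self):
        self._prompts = {}

    def register(self, prompt: Prompt):
        self._prompts[(prompt.name, prompt.version)] = prompt

    def get(self, name: str, version: str) -> Prompt:
        return self._prompts[(name, version)]

lib = PromptLibrary()
lib.register(Prompt(
    name="ticket_classifier",
    version="1.2.0",
    template="Classify this support ticket into one of {categories}:\n{ticket}",
    notes="89% accuracy with few-shot examples; weak on mixed-language tickets",
))
p = lib.get("ticket_classifier", "1.2.0")
print(p.template.format(categories="[billing, technical]", ticket="App crashes"))
```

Pinning features to an exact prompt version makes quality regressions traceable: a drop in accuracy maps to a specific version bump.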
Evaluation Infrastructure
Build shared tools for prompt testing, model benchmarking, and quality monitoring. Amortize the cost across all AI features.
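The core of shared evaluation infrastructure is a harness that runs any classifier over a labeled set and reports accuracy plus failures for analysis. The sketch below assumes a plain callable under test; the stub classifier and toy dataset are invented for illustration.

```python
# Sketch: a tiny evaluation harness computing accuracy over a labeled set.
def evaluate(classify_fn, labeled_examples):
    """labeled_examples: list of (input_text, expected_label).
    Returns (accuracy, failures) for the failure-analysis report."""
    failures = []
    for text, expected in labeled_examples:
        got = classify_fn(text)
        if got != expected:
            failures.append((text, expected, got))
    accuracy = 1 - len(failures) / len(labeled_examples)
    return accuracy, failures

# Toy run with a stub standing in for the real prompt-backed classifier.
dataset = [("app crashes", "technical"), ("refund please", "billing"),
           ("password reset", "technical"), ("invoice wrong", "billing")]
stub = lambda t: "billing" if ("refund" in t or "invoice" in t) else "technical"
accuracy, failures = evaluate(stub, dataset)
print(accuracy)
```

Because the harness only depends on a callable, the same tool benchmarks prompt versions, model tiers, and rule-based fallbacks alike, which is what amortizes its cost across features.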
AI Platform Team
At scale, a dedicated platform team manages model access, caching, rate limiting, cost controls, and shared monitoring.
Knowledge Sharing
Regular cross-team sessions to share prompt engineering learnings, evaluation techniques, and lessons from production incidents.
Cost Management
AI inference costs can grow rapidly. BMAD treats cost as a first-class engineering concern:
1. Model Tiering. Use expensive models only when needed:
- Simple tasks → Haiku ($0.25/1M tokens)
- Standard tasks → Sonnet ($3/1M tokens)
- Complex tasks → Opus ($15/1M tokens)
2. Response Caching. Cache identical or similar requests:
- Exact-match cache: 90%+ hit rate typical
- Semantic cache: 60-80% hit rate
3. Prompt Optimization. Shorter prompts mean lower costs:
- Remove redundant instructions
- Use concise few-shot examples
- Limit output length with max_tokens
4. Batching. Group requests when real-time processing is not needed:
- Batch classification jobs
- Process during off-peak hours
5. Budget Controls. Set hard limits to prevent surprises:
- Per-feature daily budget caps
- Alert at 80% of budget
- Auto-disable at 100% with fallback
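Model tiering and budget controls compose naturally into one routing layer. The sketch below uses the per-million-token prices listed above; the task-to-tier mapping, budget figure, and class name are assumptions for illustration, and the alert hook is left as a placeholder.

```python
# Sketch: model tiering with a daily budget cap (mapping and budget illustrative).
PRICE_PER_1M_TOKENS = {"haiku": 0.25, "sonnet": 3.00, "opus": 15.00}
TIER_FOR_TASK = {"simple": "haiku", "standard": "sonnet", "complex": "opus"}

class BudgetedRouter:
    def __init__(self, daily_budget_usd):
        self.budget = daily_budget_usd
        self.spent = 0.0

    def route(self, task_type, est_tokens):
        model = TIER_FOR_TASK[task_type]
        cost = est_tokens / 1_000_000 * PRICE_PER_1M_TOKENS[model]
        if self.spent + cost > self.budget:
            return "fallback"  # auto-disable at 100%: use the non-AI path
        if self.spent + cost > 0.8 * self.budget:
            pass  # alert at 80% of budget (wire up real alerting here)
        self.spent += cost
        return model

router = BudgetedRouter(daily_budget_usd=10.0)
print(router.route("simple", est_tokens=2_000))      # routes to the cheap tier
print(router.route("complex", est_tokens=500_000))   # 500k tokens of Opus: $7.50
```

Returning a sentinel like "fallback" rather than raising keeps the cap enforceable at the call site without exception handling in every feature.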
Frequently Asked Questions
Can we adopt BMAD alongside our existing Agile process?
Yes. BMAD is designed to layer on top of existing Agile practices. Start by adding experiment sprints and the Experiment Review ceremony. Introduce AI-specific estimation practices gradually. You do not need to restructure your entire process at once.
How do we justify the cost of AI QA and evaluation infrastructure to leadership?
Frame it in business terms: "Without AI QA, we are deploying features that fail 15% of the time and we do not know it." Track the cost of AI errors (support tickets, user churn, manual corrections) and compare it to the cost of evaluation infrastructure. The ROI is usually clear within one quarter.
How do we get started if our team has no AI experience?
Start with a learning sprint: have the team experiment with AI APIs, build simple prototypes, and run basic evaluations. Most developers can become productive with AI in 2-4 weeks. For specialized needs like bias testing or model fine-tuning, consider bringing in a consultant for the initial setup.
What happens if our AI model provider has an outage?
BMAD's Actualize phase explicitly requires fallback mechanisms. At minimum, have a secondary model provider configured. Better yet, design your system so AI features degrade gracefully: show cached results, fall back to rule-based logic, or queue requests for processing when the provider recovers.
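The failover-plus-cache pattern described here can be sketched in a few lines. The provider functions and cache contents below are hypothetical stand-ins; a real system would queue the request for retry where this sketch returns a placeholder message.

```python
# Sketch: secondary-provider failover with a cached-result fallback
# (provider functions and cache contents are hypothetical).
cache = {"summarize:release notes": "Cached summary from a previous run."}

def primary_provider(task):
    raise TimeoutError("primary provider down")  # simulate an outage

def secondary_provider(task):
    return f"secondary result for {task!r}"

def run(task):
    for provider in (primary_provider, secondary_provider):
        try:
            return provider(task)
        except Exception:
            continue  # try the next configured provider
    # Both providers failed: degrade gracefully to the last cached result,
    # or tell the user the request is queued for when a provider recovers.
    return cache.get(task, "Service temporarily unavailable; request queued.")

print(run("summarize:release notes"))
```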
Does BMAD apply to ML training projects, or only to LLM applications?
BMAD's principles apply to all AI development, though the specific practices in this course focus on LLM-based applications. For ML training projects, the Blueprint and Deploy phases work the same way, while the Model phase would focus on data preparation and model training instead of prompt engineering.
How do we measure the ROI of AI features?
Define measurable success metrics during the Blueprint phase: time saved, tasks automated, user satisfaction improvement, or revenue impact. Track inference costs alongside these metrics. A simple ROI formula: (Value generated - Inference costs - Development costs) / Total investment. Review quarterly and sunset features that do not justify their cost.
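The ROI formula above is simple enough to encode directly, taking total investment as inference plus development costs. The function name and the sample quarterly figures are illustrative, not from the course.

```python
# Sketch: the ROI formula from the answer above, as a tiny helper.
def ai_feature_roi(value_generated, inference_costs, development_costs):
    total_investment = inference_costs + development_costs
    return (value_generated - inference_costs - development_costs) / total_investment

# Illustrative quarterly figures (hypothetical):
# $120k value, $15k inference, $45k development → (120k - 60k) / 60k = 1.0
print(ai_feature_roi(120_000, 15_000, 45_000))
```

An ROI at or below zero over a review period is the signal to sunset the feature or move it to a cheaper model tier.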
Lilly Tech Systems