Advanced

Best Practices & Checklist

Everything you need to review before launching a computer vision system in production. This lesson consolidates the entire course into actionable checklists, covers annotation pipeline design for continuous model improvement, defines when to trigger model retraining, and answers the most frequently asked questions about building CV systems at scale.

Production CV System Checklist

Use this checklist every time you build or deploy a CV system. Each item maps to a lesson in this course.

Architecture (Lesson 1)

  • Processing mode is chosen deliberately — batch vs real-time decision is based on actual latency requirements, not assumptions
  • Edge vs cloud decision is documented — latency, bandwidth, offline, and cost tradeoffs are evaluated for your specific use case
  • Pipeline stages are independently monitored — ingest, preprocess, inference, postprocess, and delivery each have health checks
  • Failure modes are defined — what happens when a camera goes down, GPU fails, or network drops?
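
The last two items above (per-stage health checks and defined failure modes) can be enforced with a small heartbeat registry. A minimal sketch; the stage names and timeout are illustrative, not prescribed by the course:

```python
import time

class StageHealth:
    """Track per-stage heartbeats; a silent stage is flagged unhealthy."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, stage: str) -> None:
        """Each stage (ingest, preprocess, inference, ...) calls this per unit of work."""
        self.last_seen[stage] = time.monotonic()

    def unhealthy(self) -> list[str]:
        """Stages that have not reported within the timeout window."""
        now = time.monotonic()
        return [s for s, t in self.last_seen.items() if now - t > self.timeout_s]
```

A watchdog polls unhealthy() and triggers the documented failure-mode response (restart the stage, fail over the camera, page an operator).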

Image Processing (Lesson 2)

  • Preprocessing matches training exactly — same resize method, normalization values, color space, and letterboxing as training
  • GPU batching is implemented — dynamic batching with configurable max batch size and max wait time
  • NMS parameters are tuned — IoU threshold and confidence threshold are validated on your specific data
  • Coordinate mapping is correct — detections are mapped back to original image coordinates accurately
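
The "preprocessing matches training" and "coordinate mapping" items reduce to the same letterbox arithmetic. A sketch of that math (a 640×640 model input is assumed; use whatever size your training pipeline used):

```python
def letterbox_params(orig_w: int, orig_h: int, target: int = 640):
    """Scale and padding for an aspect-preserving resize into a square input."""
    scale = min(target / orig_w, target / orig_h)
    new_w, new_h = round(orig_w * scale), round(orig_h * scale)
    pad_x = (target - new_w) / 2  # padding bars left/right
    pad_y = (target - new_h) / 2  # padding bars top/bottom
    return scale, pad_x, pad_y

def unmap_box(box, scale, pad_x, pad_y):
    """Map a detection from model-input space back to original image coords."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)
```

For a 1920×1080 frame the scale is 1/3 with 140 px of vertical padding, so a detection at y=140 in model space is y=0 in the original frame.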

Video Processing (Lesson 3)

  • Frame selection strategy is chosen — not processing every frame unless latency genuinely requires it
  • Hardware decoding is enabled — NVDEC or QuickSync for video decode, not software decoding
  • Object tracking is implemented — SORT or DeepSORT for persistent identity across frames
  • Backpressure handling works — frames are dropped gracefully when processing cannot keep up
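
Backpressure handling from the last item can be as simple as a bounded drop-oldest buffer between decode and inference. A sketch (the buffer depth of 4 is an illustrative choice):

```python
from collections import deque

class FrameBuffer:
    """Drop-oldest frame buffer between video decode and inference."""

    def __init__(self, maxlen: int = 4):
        self.buf = deque(maxlen=maxlen)  # deque evicts the oldest on overflow
        self.dropped = 0                 # expose this counter as a metric

    def push(self, frame) -> None:
        if len(self.buf) == self.buf.maxlen:
            self.dropped += 1            # the oldest frame is about to be lost
        self.buf.append(frame)

    def pop_latest(self):
        """Take the newest frame; anything older is stale and discarded."""
        if not self.buf:
            return None
        frame = self.buf.pop()
        self.dropped += len(self.buf)    # remaining frames are stale, too
        self.buf.clear()
        return frame
```

Dropping stale frames keeps end-to-end latency bounded; the dropped counter should feed your monitoring so sustained drops trigger scaling rather than going unnoticed.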

Model Optimization (Lesson 4)

  • Model is TensorRT or OpenVINO optimized — not running raw PyTorch in production
  • FP16 is the minimum optimization — INT8 if accuracy validation passes on your data
  • Benchmark numbers are documented — FPS, latency (p50, p99), GPU memory, and accuracy on your test set
  • Model size fits target hardware — GPU memory usage verified under peak batch size
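
For the benchmark-documentation item, p50/p99 should come from wall-clock measurements on the target hardware, not a single average. A minimal harness sketch, where infer stands in for your optimized engine's inference call:

```python
import statistics
import time

def benchmark(infer, inputs, warmup: int = 10) -> dict:
    """Measure per-call latency percentiles and throughput for `infer`."""
    for x in inputs[:warmup]:      # warm up caches and lazy initialization
        infer(x)
    latencies = []
    for x in inputs:
        t0 = time.perf_counter()
        infer(x)
        latencies.append((time.perf_counter() - t0) * 1000)  # milliseconds
    latencies.sort()
    return {
        "p50_ms": latencies[len(latencies) // 2],
        "p99_ms": latencies[min(int(len(latencies) * 0.99), len(latencies) - 1)],
        "fps": 1000 / statistics.mean(latencies),
    }
```

For GPU inference, make sure infer synchronizes the device before returning, otherwise the measured latency only covers kernel launch.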

Edge Deployment (Lesson 5)

  • Offline operation is tested — system works correctly without cloud connectivity
  • Model update mechanism exists — can push new models to edge devices without manual intervention
  • Rollback is automatic — edge device reverts to previous model if new model fails health checks
  • Fleet monitoring is active — all edge devices report health metrics to central dashboard
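
The automatic-rollback item reduces to a small state machine on the device. A sketch, with version strings and the health check left as placeholders:

```python
class EdgeModelManager:
    """Swap in a new model version; revert automatically if it fails health checks."""

    def __init__(self, active: str = "v1"):
        self.active = active
        self.fallback = None

    def update(self, new_version: str, health_check) -> bool:
        """Activate new_version; roll back to the previous model on failure."""
        self.fallback, self.active = self.active, new_version
        if health_check(new_version):
            return True
        # Health check failed: revert to the known-good model
        self.active, self.fallback = self.fallback, None
        return False
```

The health check should run real inference on a few golden images and compare against expected outputs, not just verify that the model file loads.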

Scaling (Lesson 6)

  • Capacity planning is documented — GPU count, storage growth, and cost projections for 6-12 months
  • Autoscaling is configured — GPU workers scale based on queue depth or utilization
  • Storage lifecycle is automated — hot to warm to cold tiering with retention policies
  • Load testing is complete — system tested at 2x expected peak load
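
The capacity-planning item is back-of-envelope arithmetic once you have real benchmark numbers. A sketch with illustrative inputs (substitute your own measured per-GPU throughput and frame sizes):

```python
import math

def capacity_plan(cameras: int, fps_per_camera: float, kept_frame_ratio: float,
                  gpu_throughput_fps: float, frame_kb: float,
                  retention_days: int) -> dict:
    """GPU count and storage footprint from measured numbers."""
    inference_fps = cameras * fps_per_camera          # after frame skipping
    gpus = math.ceil(inference_fps / gpu_throughput_fps)
    kept_fps = inference_fps * kept_frame_ratio       # only detection frames stored
    storage_gb = kept_fps * frame_kb * 86_400 * retention_days / 1e6
    return {"gpus": gpus, "storage_gb": round(storage_gb)}
```

Run it against your 6- and 12-month camera-count projections and at 2x peak load to match the load-testing item above.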

Annotation Pipeline Design

Your model is only as good as your training data. A production annotation pipeline continuously improves your model by collecting, labeling, and incorporating new data from production.

# Production annotation pipeline architecture
#
# ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
# │ 1. Collect   │───▶│ 2. Select    │───▶│ 3. Label     │───▶│ 4. Validate  │
# │ Production   │    │ (Active      │    │ (Human       │    │ (Quality     │
# │ samples      │    │  Learning)   │    │  annotators) │    │  checks)     │
# └──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
#        ▲                                                           │
#        │                                                           ▼
# ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
# │ 8. Deploy    │◀───│ 7. Validate  │◀───│ 6. Train     │◀───│ 5. Dataset   │
# │ new model    │    │ accuracy     │    │ new model    │    │ merge        │
# └──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘

import numpy as np

# LabelStudioClient is assumed to be a thin wrapper around the Label Studio
# API (e.g. built on label-studio-sdk) that exposes an
# import_tasks(project_id, tasks) method.

class AnnotationPipeline:
    """
    Continuous improvement pipeline for production CV models.
    Uses active learning to select the most valuable samples for labeling.
    """

    def __init__(self, model, label_studio_url: str):
        self.model = model
        self.label_studio = LabelStudioClient(label_studio_url)

    def collect_samples(self, production_results: list[dict],
                        sample_rate: float = 0.01) -> list[dict]:
        """
        Collect production samples for potential labeling.
        Sample 1% of all predictions, plus 100% of edge cases.
        """
        samples = []
        for result in production_results:
            # Always collect edge cases
            is_edge_case = (
                result["max_confidence"] < 0.7 or           # Low confidence
                result["max_confidence"] > 0.99 or           # Suspiciously high
                result["detection_count"] > 50 or            # Unusually many detections
                result["detection_count"] == 0 or            # Nothing detected (might be wrong)
                result.get("user_flagged", False)            # User reported wrong result
            )

            if is_edge_case or np.random.random() < sample_rate:
                samples.append({
                    "image_key": result["image_key"],
                    "predictions": result["detections"],
                    "confidence": result["max_confidence"],
                    "reason": "edge_case" if is_edge_case else "random_sample",
                })

        return samples

    def select_for_labeling(self, samples: list[dict],
                            budget: int = 500) -> list[dict]:
        """
        Active learning: select the most informative samples.
        Prioritize samples where the model is most uncertain.
        """
        # Score by uncertainty (lower confidence = higher value)
        scored = []
        for sample in samples:
            uncertainty = 1 - sample["confidence"]
            # Boost edge cases
            if sample["reason"] == "edge_case":
                uncertainty *= 2
            scored.append((uncertainty, sample))

        # Select top-k most uncertain
        scored.sort(reverse=True, key=lambda x: x[0])
        selected = [s[1] for s in scored[:budget]]

        return selected

    def send_to_labeling(self, samples: list[dict], project_id: int):
        """Send selected samples to Label Studio for human annotation"""
        tasks = []
        for sample in samples:
            task = {
                "data": {
                    "image": sample["image_key"],
                },
                "predictions": [{
                    "result": self._format_predictions(sample["predictions"]),
                    "score": sample["confidence"],
                }],
            }
            tasks.append(task)

        self.label_studio.import_tasks(project_id, tasks)
        print(f"Sent {len(tasks)} samples to Label Studio project {project_id}")

    def validate_annotations(self, annotations: list[dict],
                             min_agreement: float = 0.8) -> list[dict]:
        """
        Quality control: validate annotations before adding to training data.
        Only bounding box sanity checks (not too small, not too large) are
        shown here; a full pipeline also checks inter-annotator agreement
        (against min_agreement) and class consistency.
        """
        valid = []
        for ann in annotations:
            issues = []
            for bbox in ann["bboxes"]:
                area = (bbox["x2"] - bbox["x1"]) * (bbox["y2"] - bbox["y1"])
                if area < 100:
                    issues.append("bbox_too_small")
                if area > ann["image_width"] * ann["image_height"] * 0.9:
                    issues.append("bbox_too_large")

            if not issues:
                valid.append(ann)

        rejection_rate = 1 - len(valid) / len(annotations) if annotations else 0
        print(f"Validated: {len(valid)}/{len(annotations)} ({rejection_rate:.1%} rejected)")
        return valid

Model Retraining Triggers

Knowing when to retrain your model is as important as knowing how to train it. Here are the triggers that indicate your model needs updating.

| Trigger | How to Detect | Action |
|---|---|---|
| Accuracy drift | Weekly accuracy audit on labeled production samples drops below threshold | Retrain with recent production data |
| New classes needed | Business request: "detect new product type" or "new defect category" | Collect and label new class data, retrain |
| Distribution shift | Confidence distribution changes (more low-confidence predictions) | Investigate cause, collect samples from the new distribution, retrain |
| Environment change | New cameras, lighting changes, seasonal variation (snow, rain) | Collect samples in new conditions, augment training data, retrain |
| Scheduled cadence | Calendar: monthly or quarterly retrain | Incorporate all new labeled data collected since last training |
| Sufficient new data | Accumulated 10,000+ new labeled samples since last training | Retrain with combined old + new data |

# Automated retraining trigger monitoring
class RetrainingMonitor:
    """Monitor production metrics and trigger retraining when needed"""

    def __init__(self, config: dict):
        self.accuracy_threshold = config.get("min_accuracy", 0.90)
        self.confidence_drift_threshold = config.get("max_confidence_drift", 0.10)
        self.new_data_threshold = config.get("min_new_samples", 10000)
        self.max_days_since_training = config.get("max_days", 90)

    def should_retrain(self, metrics: dict) -> dict:
        """Check all retraining triggers and return recommendation"""
        triggers = []

        # Trigger 1: Accuracy dropped below threshold
        if metrics["weekly_accuracy"] < self.accuracy_threshold:
            triggers.append({
                "trigger": "accuracy_drift",
                "current": metrics["weekly_accuracy"],
                "threshold": self.accuracy_threshold,
                "priority": "high",
            })

        # Trigger 2: Confidence distribution shifted
        current_mean_conf = metrics["mean_confidence_7d"]
        baseline_mean_conf = metrics["mean_confidence_baseline"]
        drift = abs(current_mean_conf - baseline_mean_conf)
        if drift > self.confidence_drift_threshold:
            triggers.append({
                "trigger": "confidence_drift",
                "drift": drift,
                "threshold": self.confidence_drift_threshold,
                "priority": "medium",
            })

        # Trigger 3: Enough new labeled data accumulated
        if metrics["new_labeled_samples"] >= self.new_data_threshold:
            triggers.append({
                "trigger": "sufficient_new_data",
                "samples": metrics["new_labeled_samples"],
                "threshold": self.new_data_threshold,
                "priority": "low",
            })

        # Trigger 4: Time since last training
        if metrics["days_since_last_training"] >= self.max_days_since_training:
            triggers.append({
                "trigger": "scheduled_cadence",
                "days": metrics["days_since_last_training"],
                "threshold": self.max_days_since_training,
                "priority": "low",
            })

        # Triggers are appended in descending priority order (high, medium,
        # low), so the first fired trigger is the highest-priority one
        return {
            "should_retrain": len(triggers) > 0,
            "triggers": triggers,
            "highest_priority": triggers[0]["priority"] if triggers else None,
        }

Frequently Asked Questions

How do I choose between YOLO, Faster R-CNN, and DETR for object detection?

Use YOLOv8 for 90% of production use cases. It offers the best balance of speed and accuracy, has an excellent ecosystem (Ultralytics), and is easy to optimize with TensorRT. Use Faster R-CNN when you need higher accuracy on small objects and can tolerate 3-5x slower inference (medical imaging, satellite analysis). Use DETR or RT-DETR when you want end-to-end detection without NMS post-processing, or when your objects have complex spatial relationships. For edge deployment, YOLOv8n or YOLOv8s are almost always the right choice due to their speed-to-accuracy ratio.

How many labeled images do I need to train a custom object detector?

With transfer learning from a COCO-pretrained YOLOv8: 100-500 images per class for a decent baseline, 1,000-5,000 per class for production quality, and 10,000+ for challenging scenarios (small objects, similar-looking classes, variable conditions). Start with 500 images, train, evaluate on a held-out test set, and collect more data specifically for the failure cases. Quality of annotations matters more than quantity — 500 perfectly labeled images beat 5,000 noisy ones. Use data augmentation (mosaic, mixup, copy-paste) to effectively multiply your dataset 5-10x.

How do I handle varying lighting conditions (day/night, indoor/outdoor)?

Three approaches, in order of effectiveness: (1) Diverse training data — collect and label images in all lighting conditions your system will encounter. This is the most reliable approach. (2) Augmentation — use brightness, contrast, saturation, and hue jittering during training. YOLOv8 does this by default, but increase the range if your conditions vary widely. (3) Preprocessing normalization — apply histogram equalization (CLAHE) as a preprocessing step to normalize contrast before model inference. For IR cameras (night vision), train a separate model on IR imagery or use multi-spectral fusion if both visible and IR cameras are available.
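
To make the preprocessing option concrete, here is plain global histogram equalization in NumPy; CLAHE is the tiled, contrast-limited variant of the same idea, and in production you would typically use OpenCV's cv2.createCLAHE rather than rolling your own:

```python
import numpy as np

def equalize_hist(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for a uint8 grayscale image
    (assumes the image contains more than one gray level)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                    # first nonzero histogram bin
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)  # one LUT entry per gray level
    return lut[gray]
```

A low-contrast image whose pixels span only 0-50 is stretched to the full 0-255 range, which is exactly what helps a detector trained on well-exposed imagery.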

What is the real cost of running a production CV system?

Rough estimates for AWS (2026 pricing): Small (10 cameras): 1 T4 GPU ($180-450/month) + storage ($50/month) + compute ($100/month) = $330-600/month. Medium (100 cameras): 4 T4 GPUs ($720-1800/month) + storage ($200/month) + compute ($300/month) = $1,200-2,300/month. Large (1000 cameras): 20 T4 GPUs ($3,600-9,000/month) + storage ($1,000/month) + compute ($1,500/month) = $6,100-11,500/month. The biggest cost levers: (1) frame skipping (5-10x savings), (2) model optimization (TensorRT FP16), (3) spot instances (60-70% savings), (4) storing only detection frames (not all frames).

How do I test a CV system before deploying to production?

Follow this testing hierarchy: (1) Unit tests: verify preprocessing produces correct tensor shapes and values, NMS output is correct, coordinate mapping is accurate. (2) Integration tests: run the full pipeline on a test video, verify detections match expected ground truth within tolerance. (3) Performance tests: measure FPS, latency (p50/p99), GPU memory under sustained load for 24+ hours. (4) Shadow mode: run the new model alongside the current production model, compare results without affecting production output. Only promote when shadow results meet accuracy and performance targets. (5) Canary deployment: deploy to 5-10% of production traffic, monitor accuracy and latency for 24 hours before full rollout.
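
The shadow-mode step hinges on a way to score agreement between the two models' outputs. One simple sketch: the fraction of production detections the shadow model reproduces at a given IoU threshold (boxes are (x1, y1, x2, y2) tuples; the 0.5 threshold is an assumption to tune):

```python
def iou(a, b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def shadow_agreement(prod_boxes, shadow_boxes, iou_thresh: float = 0.5) -> float:
    """Fraction of production detections the shadow model also finds."""
    if not prod_boxes:
        return 1.0
    matched = sum(1 for p in prod_boxes
                  if any(iou(p, s) >= iou_thresh for s in shadow_boxes))
    return matched / len(prod_boxes)
```

Track this per class and per camera; a global average can hide a regression on one camera or one rare class.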

Should I build my own CV pipeline or use a managed service?

Use managed services (AWS Rekognition, Google Vision AI, Azure Computer Vision) if you have simple, standard requirements (face detection, general object detection, OCR), your team lacks GPU/ML expertise, and you process fewer than 100,000 images per month. Cost: $1-4 per 1,000 images. Use custom pipelines if you need domain-specific models (defect detection, product recognition), require edge deployment or offline operation, process more than 100,000 images per month (managed services become expensive at scale), or need control over model updates and accuracy. At scale, custom pipelines are 10-50x cheaper than managed services. The breakeven point is typically around 50,000-100,000 images per month.
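
The breakeven figure falls out of simple arithmetic; a sketch with illustrative prices (substitute your actual managed-service rate and infrastructure quote):

```python
def breakeven_images_per_month(managed_cost_per_1k: float,
                               custom_fixed_cost: float,
                               custom_cost_per_1k: float = 0.0) -> float:
    """Monthly volume at which a custom pipeline becomes cheaper
    than paying a managed service's per-image price."""
    saving_per_image = (managed_cost_per_1k - custom_cost_per_1k) / 1000
    return custom_fixed_cost / saving_per_image
```

At $1.50 per 1,000 managed images against a $150/month GPU instance, the breakeven is 100,000 images per month, consistent with the range above.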

How do I handle privacy concerns with camera-based CV systems?

Critical considerations: (1) Data minimization: process frames on-device and send only metadata (bounding boxes, counts) to the cloud, never raw images unless needed. (2) Face blurring: apply face detection + blur before any storage or transmission if faces are not relevant to your use case. (3) Retention limits: define how long images are stored, auto-delete after the retention period. (4) Consent signage: display clear signs that CV systems are in operation (required in many jurisdictions). (5) Access controls: restrict who can view raw images vs aggregated analytics. (6) GDPR/CCPA compliance: implement data subject access requests and right to deletion. Edge processing naturally improves privacy because images never leave the device.
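
Data minimization in practice means the event leaving the device carries no pixels. A sketch of such a metadata-only payload (field names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def to_metadata_event(result: dict, retention_days: int = 30) -> dict:
    """Strip the raw frame: forward only boxes, counts, and an
    auto-delete timestamp; the image itself never leaves the device."""
    now = datetime.now(timezone.utc)
    return {
        "camera_id": result["camera_id"],
        "timestamp": now.isoformat(),
        "detection_count": len(result["detections"]),
        "boxes": [d["bbox"] for d in result["detections"]],  # no pixels
        "delete_after": (now + timedelta(days=retention_days)).isoformat(),
    }
```

The delete_after field lets the storage tier enforce the retention limit mechanically instead of relying on manual cleanup.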

Course Summary

You have completed the Designing Computer Vision Pipelines at Scale course. Here is a recap of the key concepts from each lesson:

| Lesson | Key Takeaway |
|---|---|
| 1. Pipeline Architecture | Every CV pipeline has 5 stages (ingest, preprocess, inference, postprocess, deliver). Choose batch vs real-time based on actual latency requirements. Use hybrid edge-cloud for most production systems. |
| 2. Image Processing | Preprocessing must match training exactly (letterbox, normalize, color space). GPU batching gives 4-8x throughput improvement. Dynamic batching balances latency and throughput. |
| 3. Video Processing | Never process every frame. Adaptive frame selection reduces GPU cost 10-20x. Use hardware decoding (NVDEC) for video streams. SORT tracking is fast (~1 ms) and sufficient for most use cases. |
| 4. Model Optimization | TensorRT FP16 is the minimum for production (2-4x speedup). INT8 quantization gives another 1.5-2x with proper calibration. YOLOv8s with TensorRT FP16 is the sweet spot for most applications. |
| 5. Edge Deployment | NVIDIA Jetson for GPU edge, OpenVINO for Intel, Core ML/TFLite for mobile. Always implement offline operation, automatic model updates, and fleet monitoring. |
| 6. Scaling Infrastructure | Use Kubernetes GPU autoscaling for cloud inference. Store only detection frames to save up to 99% of storage. Capacity plan with real benchmarks, not estimates. Frame skipping is the biggest cost lever. |
| 7. Best Practices | Build an annotation pipeline for continuous improvement. Use active learning to select the most valuable samples. Monitor accuracy weekly and retrain when triggers fire. |
💡 Keep building: The best way to learn CV system design is to build real systems. Start with a single camera and a YOLOv8n model on a Jetson or in the cloud. Add tracking, optimize the model, add monitoring, then scale. Each layer you add teaches you something the documentation cannot. The patterns in this course are used by teams building CV systems from 1 camera to 100,000 cameras.