Federated Learning Best Practices
Practical guidance for handling non-IID data, optimizing communication, securing FL systems, and deploying federated models in production.
Handling Non-IID Data
- Data sharing: Share a small public dataset across clients to provide a common baseline. Even 5% shared data can dramatically improve convergence.
- FedProx: Add a proximal term to keep local models from drifting too far from the global model during local training.
- Personalization: Train a global model as a starting point, then fine-tune locally for each client. This works well when clients have different data distributions.
- Clustered FL: Group clients with similar data distributions and train separate models per cluster.
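The FedProx technique above is simple to state concretely: the proximal term mu/2 * ||w - w_global||^2 just adds mu * (w - w_global) to the local gradient, pulling each client back toward the global model. A minimal NumPy sketch (function and variable names here are illustrative, not from any FL library):

```python
import numpy as np

def fedprox_local_update(w_global, grad_fn, mu=0.01, lr=0.1, epochs=5):
    """Local SGD with a FedProx proximal term.

    grad_fn(w) returns the gradient of the local loss at w.
    The proximal term mu/2 * ||w - w_global||^2 contributes
    mu * (w - w_global) to the gradient, limiting client drift.
    """
    w = w_global.copy()
    for _ in range(epochs):
        grad = grad_fn(w) + mu * (w - w_global)
        w -= lr * grad
    return w

# Toy example: local quadratic loss 0.5 * ||w - target||^2
target = np.array([3.0, -1.0])
grad_fn = lambda w: w - target
w_global = np.zeros(2)

w_prox = fedprox_local_update(w_global, grad_fn, mu=1.0)
w_plain = fedprox_local_update(w_global, grad_fn, mu=0.0)
# With mu > 0, the local model stays closer to w_global than plain SGD does.
assert np.linalg.norm(w_prox - w_global) < np.linalg.norm(w_plain - w_global)
```

Larger mu means less drift but slower local progress; in practice mu is tuned per federation.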
Communication Optimization
| Technique | Compression Ratio | Impact on Accuracy |
|---|---|---|
| Gradient quantization | 4-8x | Minimal |
| Top-k sparsification | 10-100x | Small with error feedback |
| Federated distillation | 100-1000x | Moderate, task-dependent |
| Fewer rounds, more local epochs | Proportional to local epochs | Small with FedProx |
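Top-k sparsification with error feedback, from the table above, can be sketched in a few lines: only the k largest-magnitude coordinates are transmitted, and the dropped remainder is carried into the next round so nothing is permanently lost. A NumPy sketch under our own naming (not a library API):

```python
import numpy as np

def topk_with_feedback(grad, residual, k):
    """Transmit only the k largest-magnitude entries of grad + residual.

    Coordinates not sent are accumulated in `residual` and added back to
    the next round's gradient (error feedback), which is what keeps the
    accuracy loss small at 10-100x compression.
    """
    corrected = grad + residual
    idx = np.argsort(np.abs(corrected))[-k:]   # k indices to transmit
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]
    new_residual = corrected - sparse          # everything we did not send
    return sparse, new_residual

rng = np.random.default_rng(0)
grad = rng.normal(size=100)
residual = np.zeros(100)

sparse, residual = topk_with_feedback(grad, residual, k=10)
print(np.count_nonzero(sparse))  # → 10 (a 10x reduction for this update)
```

Note that `sparse + residual` always equals the corrected gradient, which is the invariant error feedback relies on.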
Security Considerations
Byzantine-Robust Aggregation
Use robust aggregation methods (Krum, Trimmed Mean, Median) that can tolerate malicious clients sending adversarial updates.
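Coordinate-wise Median and Trimmed Mean are easy to illustrate: both sort client values per coordinate so a minority of malicious updates cannot drag the aggregate arbitrarily far, unlike a plain mean. A NumPy sketch (our own helper names, not a framework API):

```python
import numpy as np

def trimmed_mean(updates, trim=1):
    """Coordinate-wise trimmed mean: drop the `trim` largest and
    smallest values in each coordinate, then average the rest."""
    stacked = np.sort(np.stack(updates), axis=0)
    return stacked[trim:-trim].mean(axis=0)

def coordinate_median(updates):
    """Coordinate-wise median, robust to a minority of Byzantine clients."""
    return np.median(np.stack(updates), axis=0)

# Nine honest clients near the true update, one malicious outlier.
honest = [np.array([1.0, 1.0]) + 0.01 * i for i in range(9)]
malicious = [np.array([100.0, -100.0])]
updates = honest + malicious

print(np.mean(np.stack(updates), axis=0))  # badly skewed by the outlier
print(coordinate_median(updates))          # stays near [1, 1]
```

Krum works differently (it selects the client update closest to its neighbors), but the goal is the same: bound the influence of any single participant.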
Client Authentication
Verify client identities to prevent unauthorized participants from joining the federation.
Anomaly Detection
Monitor client updates for anomalies (unusually large gradients, statistically unlikely updates) that may indicate attacks.
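One common flavor of this monitoring is norm-based outlier detection on client updates. The sketch below (our own construction, not a standard library routine) uses a median/MAD z-score rather than mean/std, so the detector itself is not skewed by the outliers it is hunting:

```python
import numpy as np

def flag_anomalous(updates, z_thresh=3.0):
    """Return indices of client updates whose L2 norm is an outlier.

    Median and MAD (median absolute deviation) replace mean and std so
    that a few huge malicious updates cannot mask themselves by
    inflating the statistics.
    """
    norms = np.array([np.linalg.norm(u) for u in updates])
    med = np.median(norms)
    mad = np.median(np.abs(norms - med)) + 1e-12
    z = 0.6745 * (norms - med) / mad   # MAD-based robust z-score
    return np.where(z > z_thresh)[0]

# Four well-behaved clients and one sending an unusually large update.
updates = [np.ones(10) * s for s in (1.0, 1.1, 0.9, 1.05, 50.0)]
print(flag_anomalous(updates))  # → [4]
```

Flagged clients can be excluded from the round, down-weighted, or queued for manual review, depending on your governance policy.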
Privacy Budget Management
Track cumulative privacy expenditure across rounds. Set hard limits on total epsilon to prevent excessive information leakage over time.
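A minimal budget tracker can enforce that hard limit at the server. This sketch (class and method names are ours) uses basic sequential composition, which simply sums per-round epsilons; real deployments typically use a tighter accountant (advanced composition or RDP), so treat this as a conservative upper bound:

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition."""

    def __init__(self, epsilon_limit):
        self.epsilon_limit = epsilon_limit
        self.spent = 0.0

    def can_run(self, round_epsilon):
        return self.spent + round_epsilon <= self.epsilon_limit

    def record(self, round_epsilon):
        if not self.can_run(round_epsilon):
            raise RuntimeError("privacy budget exhausted")
        self.spent += round_epsilon

budget = PrivacyBudget(epsilon_limit=8.0)
rounds = 0
while budget.can_run(0.5):
    budget.record(0.5)   # each training round spends epsilon = 0.5
    rounds += 1
print(rounds)  # → 16 rounds before the epsilon = 8.0 limit is reached
```

Once the budget is exhausted, the federation must stop training on that data or accept weaker guarantees; deciding which in advance is part of governance.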
Deployment Checklist
- Start with simulation: Test your FL system in simulation before deploying to real devices. Flower and TFF both support simulation mode.
- Define governance: Establish clear agreements about data ownership, model ownership, and liability before starting a federation.
- Plan for heterogeneity: Devices have different compute capabilities, memory, and network speeds. Design for the weakest participant.
- Monitoring: Track per-client metrics, aggregation quality, convergence speed, and privacy budget consumption.
- Fallback strategy: Have a plan for when clients drop out, network partitions occur, or the model fails to converge.
- Compliance: Even with FL, consult legal counsel about data protection regulations in each jurisdiction where clients operate.
Frequently Asked Questions
Should I always prefer federated learning over centralized training?
No. If you can centralize data (with consent and compliance), centralized training is simpler, faster, and often produces better models. FL is the right choice when centralization is impossible or undesirable due to privacy, regulation, bandwidth, or competitive reasons.
How many clients do I need?
For cross-silo FL (e.g., hospitals), 3-20 participants can work well. For cross-device FL (e.g., mobile phones), hundreds to millions of devices are typical. The key is having enough total data across all clients, not the number of clients per se.
Does federated learning guarantee privacy?
FL alone does not guarantee privacy. Model updates can leak information through gradient inversion attacks. For strong privacy guarantees, combine FL with differential privacy and secure aggregation. The level of privacy depends on the privacy budget (epsilon) chosen.
Lilly Tech Systems