# Explainable AI Best Practices
Navigate regulatory requirements, choose the right explanation methods, communicate results effectively, and deploy explanations in production.
## Regulatory Landscape
| Regulation | Region | XAI Requirements |
|---|---|---|
| GDPR Art. 22 | EU | Right to meaningful information about automated decision-making logic |
| EU AI Act | EU | High-risk AI systems must be sufficiently transparent and provide explanations |
| SR 11-7 | US (Banking) | Model risk management requires documentation and validation of model decisions |
| ECOA / FCRA | US (Credit) | Adverse action notices must provide specific reasons for credit decisions |
| FDA | US (Healthcare) | AI/ML-based medical devices require clinical transparency and validation |
## Choosing the Right Method
```
What do you need to explain?
├── Individual prediction (local)
│   ├── Tabular data
│   │   ├── Tree model → SHAP TreeExplainer
│   │   ├── Any model → LIME or SHAP KernelExplainer
│   │   └── Actionable recourse → Counterfactuals (DiCE)
│   ├── Text data
│   │   ├── Transformer → Attention weights + SHAP
│   │   └── Any model → LIME Text
│   └── Image data
│       ├── CNN → Grad-CAM
│       ├── Any model → LIME Image
│       └── Detailed attribution → Integrated Gradients
├── Overall model behavior (global)
│   ├── Feature ranking → Permutation importance or SHAP
│   ├── Feature effects → PDP / ICE plots
│   └── Interactions → 2D PDP or SHAP dependence plots
└── Model comparison
    └── Compare SHAP summary plots across models
```
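Permutation importance, listed above for global feature ranking, is simple enough to sketch directly: shuffle one column and measure how much the score drops. The toy model and data below are illustrative assumptions; with real models you would typically reach for `sklearn.inspection.permutation_importance` instead.

```python
import random

def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Score drop when a column is shuffled = that feature's importance."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - metric(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy model: the prediction depends only on feature 0,
# so feature 1 should get zero importance.
accuracy = lambda y_true, y_pred: sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)
predict = lambda row: int(row[0] > 0.5)
X = [[0.1, 0.9], [0.9, 0.2], [0.2, 0.8], [0.8, 0.1], [0.3, 0.7], [0.7, 0.3]]
y = [predict(row) for row in X]
imp = permutation_importance(predict, X, y, accuracy, n_repeats=20)
```

Shuffling the irrelevant column never changes a prediction, so its importance is exactly zero, while shuffling the decisive column degrades accuracy on average.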
## Communicating Explanations
Different audiences need different levels of detail:
- Data scientists: Full SHAP plots, feature interactions, model diagnostics. They need technical depth.
- Business stakeholders: Top-3 factors driving predictions, simple bar charts, natural language summaries.
- End users: Plain language explanations, counterfactuals ("If X were different, the outcome would change").
- Regulators: Documentation of methodology, validation results, fairness assessments, audit trails.
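For end users in particular, the top factors can be rendered as a plain-language sentence rather than a chart. A minimal sketch of such a renderer (the function name, wording template, and example features are illustrative assumptions, not a fixed API):

```python
def explain_in_plain_language(top_factors, outcome):
    """Turn (feature, signed impact) pairs into a short, non-technical summary."""
    parts = []
    for feature, impact in top_factors:
        direction = "increased" if impact > 0 else "decreased"
        parts.append(f"your {feature} {direction} the score")
    return f"The application was {outcome} mainly because " + ", ".join(parts) + "."

msg = explain_in_plain_language(
    [("credit utilization", 0.42), ("account age", -0.17)], "declined"
)
# -> "The application was declined mainly because your credit utilization
#     increased the score, your account age decreased the score."
```

In a real system the pairs would come from the model's SHAP values, and the phrasing would be reviewed for the adverse-action wording your regulator expects.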
## Production Deployment
```python
from fastapi import FastAPI
import joblib
import numpy as np
import shap

app = FastAPI()

model = joblib.load("model.pkl")                 # trained tree-based model
explainer = shap.TreeExplainer(model)
feature_names = list(getattr(model, "feature_names_in_", []))  # or set explicitly

@app.post("/predict")
def predict(features: dict):
    # preprocess (defined elsewhere) maps the raw dict to a (1, n_features) array
    X = preprocess(features)
    prediction = model.predict(X)[0]

    # Compute SHAP values for this prediction
    shap_values = explainer.shap_values(X)
    if isinstance(shap_values, list):            # classifiers may return one array per class
        shap_values = shap_values[-1]            # explain the positive class

    # Return the top-5 contributing features by absolute impact
    feature_contributions = sorted(
        zip(feature_names, shap_values[0]),
        key=lambda x: abs(x[1]),
        reverse=True,
    )[:5]

    return {
        "prediction": float(prediction),
        "explanation": {
            "top_factors": [
                {"feature": name, "impact": float(val)}
                for name, val in feature_contributions
            ],
            # expected_value can be a scalar or per-class array; take the matching entry
            "base_value": float(np.ravel(explainer.expected_value)[-1]),
        },
    }
```
## Testing Explanations
- Sanity checks: Verify that SHAP values sum to the prediction minus the base value.
- Stability tests: Run LIME multiple times and check that top features remain consistent.
- Domain validation: Have domain experts review top features. If "patient ID" ranks highly, something is wrong.
- Fairness auditing: Check that explanations don't disproportionately rely on protected attributes.
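The first check above, local additivity, amounts to a one-line assertion. A sketch with illustrative numbers (in practice the inputs come from your explainer and model):

```python
def check_additivity(shap_values, base_value, prediction, tol=1e-6):
    """SHAP values for one instance should sum to prediction minus base value."""
    return abs(sum(shap_values) - (prediction - base_value)) < tol

# Illustrative numbers: base value 0.30, contributions sum to +0.25
assert check_additivity([0.10, 0.20, -0.05], base_value=0.30, prediction=0.55)
assert not check_additivity([0.10, 0.20, -0.05], base_value=0.30, prediction=0.70)
```

Running this check on a sample of live traffic catches silent train/serve skew, such as a preprocessing mismatch between the model and the explainer.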
## Common Pitfalls
- Confusing correlation with causation: SHAP shows feature importance, not causal relationships. A feature may be important because it correlates with the true cause.
- Ignoring feature interactions: Single-feature explanations miss important interactions. Use 2D PDPs and SHAP interaction values.
- Over-trusting attention: Attention weights are not always reliable explanations. Validate with gradient-based methods.
- One-size-fits-all: Using the same explanation format for all audiences. Tailor the detail level to your audience.
- Explanation washing: Providing superficial explanations to check a compliance box without genuine transparency.
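The 2D partial dependence mentioned above can be sketched in pure Python: pin two features to grid values, average the model's prediction over the data, and look at the interaction contrast. The toy model is an assumption for illustration; in practice use `sklearn.inspection.partial_dependence`.

```python
def pdp_2d(predict, X, i, j, grid_i, grid_j):
    """Average prediction over the data while pinning features i and j."""
    table = []
    for vi in grid_i:
        row = []
        for vj in grid_j:
            preds = []
            for x in X:
                x = list(x)
                x[i], x[j] = vi, vj
                preds.append(predict(x))
            row.append(sum(preds) / len(preds))
        table.append(row)
    return table

# Toy model with a genuine interaction: x0 * x1 (feature 2 is irrelevant)
predict = lambda x: x[0] * x[1]
X = [[0.2, 0.4, 1.0], [0.6, 0.8, 2.0], [0.9, 0.1, 3.0]]
grid = pdp_2d(predict, X, 0, 1, [0.0, 1.0], [0.0, 1.0])

# A non-zero interaction contrast reveals that x0 and x1 interact;
# single-feature explanations would miss this.
interaction = grid[1][1] - grid[1][0] - grid[0][1] + grid[0][0]
```

Here the contrast is 1.0: the effect of raising x0 depends entirely on the value of x1, which no single-feature plot can show.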
## Frequently Asked Questions
**Does every AI system need explainability?**

Not necessarily. Low-risk applications (content recommendation, spam filtering) may not require formal explainability. High-risk applications (healthcare, finance, hiring) almost always do. Consider both regulatory requirements and stakeholder needs.
**Does explainability hurt production latency?**

It can. SHAP TreeExplainer adds minimal overhead, while LIME and KernelExplainer add significant latency. For real-time systems, consider pre-computing explanations for common inputs, caching, or using faster methods like TreeExplainer. You can also compute explanations asynchronously.
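The caching approach mentioned here can be sketched with `functools.lru_cache`, keyed on a hashable, order-stable form of the input. The function names and the stand-in explanation below are hypothetical; a real service would call its SHAP explainer inside the cached function.

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)
def cached_explanation(feature_tuple):
    """Compute (or replay) an explanation for one canonicalized input."""
    # Stand-in for an expensive explainer.shap_values(...) call
    return tuple(v * 0.1 for v in feature_tuple)

def explain(features: dict):
    # Canonicalize the dict into a hashable key with a stable feature order
    key = tuple(features[k] for k in sorted(features))
    return cached_explanation(key)

explain({"age": 30, "income": 50})   # computed on the first request
explain({"income": 50, "age": 30})   # cache hit: same canonical key
assert cached_explanation.cache_info().hits == 1
```

Canonicalizing the key matters: without sorting, two requests with the same features in a different order would miss the cache and recompute.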
**Can explanations themselves be manipulated?**

Yes. Research has shown that adversarial models can produce misleading explanations. This is why it's important to use multiple explanation methods, validate with domain experts, and maintain proper model governance. No single explanation method should be trusted in isolation.