Feature Engineering
Feature engineering is the art and science of transforming raw network data into meaningful inputs for ML models. Good features can make a simple model outperform a complex one trained on raw data.
Common Network Features
| Category | Raw Data | Engineered Features |
|---|---|---|
| Traffic Volume | Byte counters | Bytes/sec, rate of change, rolling average, peak-to-mean ratio |
| Timing | Timestamps | Hour of day, day of week, is_business_hours, minutes_since_last_event |
| Errors | Error counters | Error rate, error ratio (errors/total packets), error trend |
| Flows | Flow records | Unique src/dst pairs, flow duration distribution, new flow rate |
| Topology | Device connections | Hop count, path diversity, device centrality score |
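Interface byte and error counters are often cumulative (SNMP `ifInOctets` and `ifInErrors` are examples), so the Traffic Volume and Errors rows above usually start by differencing consecutive polls. The sketch below is a minimal illustration, not a reference implementation. It assumes hypothetical column names (`timestamp`, `in_octets`, `in_packets`, `in_errors`), and its helper `counter_features` is likewise hypothetical. It also assumes counters that never wrap or reset.

```python
import pandas as pd


def counter_features(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Derive rate and ratio features from cumulative interface counters.

    Hypothetical input columns: 'timestamp', 'in_octets', 'in_packets',
    'in_errors'. Counters are assumed to only increase (no wraps or resets).
    """
    out = df.copy()
    out['timestamp'] = pd.to_datetime(out['timestamp'])
    out = out.sort_values('timestamp').reset_index(drop=True)

    # Seconds elapsed between consecutive polls
    elapsed = out['timestamp'].diff().dt.total_seconds()

    # Bytes/sec: counter delta divided by the polling interval
    out['bytes_per_sec'] = out['in_octets'].diff() / elapsed

    # Rolling average and rate of change of the per-second rate
    out['bytes_per_sec_avg'] = out['bytes_per_sec'].rolling(window).mean()
    out['bytes_per_sec_change'] = out['bytes_per_sec'].diff()

    # Peak-to-mean ratio over the window; values well above 1 indicate bursts
    rolling = out['bytes_per_sec'].rolling(window)
    out['peak_to_mean'] = rolling.max() / rolling.mean()

    # Error ratio: errors per packet received in each interval
    # (intervals with no packets become NaN instead of dividing by zero)
    packets = out['in_packets'].diff()
    out['error_ratio'] = out['in_errors'].diff() / packets.where(packets > 0)

    return out
```

A 32-bit counter that rolls over at 2^32 would appear here as a large negative delta. Real pipelines need to detect and correct such wraps before computing rates.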
Time-Series Feature Extraction
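```python
import pandas as pd


def create_network_features(df: pd.DataFrame) -> pd.DataFrame:
    """Create ML features from network time-series data.

    Assumes one row per polling interval (e.g. 1-minute samples), so a
    5-row rolling window covers roughly 5 minutes.
    """
    df = df.copy()
    df['timestamp'] = pd.to_datetime(df['timestamp'])  # .dt needs datetimes
    df = df.sort_values('timestamp')  # rolling/diff assume time order

    # Rolling statistics over the last 5 samples
    df['bytes_in_avg_5m'] = df['bytes_in'].rolling(5).mean()
    df['bytes_in_std_5m'] = df['bytes_in'].rolling(5).std()

    # Rate of change between consecutive samples
    df['bytes_in_delta'] = df['bytes_in'].diff()
    df['bytes_in_pct_change'] = df['bytes_in'].pct_change()

    # Temporal features
    df['hour'] = df['timestamp'].dt.hour
    df['day_of_week'] = df['timestamp'].dt.dayofweek  # Monday=0 ... Sunday=6
    # 08:00 to 17:59; inclusive='left' (pandas >= 1.3) excludes hour 18
    df['is_business_hours'] = df['hour'].between(8, 18, inclusive='left').astype(int)

    # Z-score for anomaly detection (whole-series mean and std)
    df['bytes_in_zscore'] = (df['bytes_in'] - df['bytes_in'].mean()) / df['bytes_in'].std()

    return df
```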
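The `bytes_in_zscore` column above normalizes each sample with the mean and standard deviation of the entire series. As a result, every row is scored partly with data from its own future. For streaming anomaly detection, one option is a trailing-window z-score that only looks backwards. Below is a minimal sketch using a hypothetical `rolling_zscore` helper, with the window measured in samples rather than minutes.

```python
import pandas as pd


def rolling_zscore(series: pd.Series, window: int = 60) -> pd.Series:
    """Score each sample against the preceding `window` samples only.

    shift(1) excludes the current sample from its own baseline, so a
    spike cannot inflate the mean and std it is compared against.
    """
    baseline = series.shift(1).rolling(window)
    return (series - baseline.mean()) / baseline.std()
```

The first `window` rows have no complete baseline and come out as NaN. If the baseline is perfectly flat, the standard deviation is zero and the score becomes infinite or NaN. A small floor on the denominator is one way to handle that.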
Feature Selection
Not all features improve model performance. Use these techniques to select the best ones:
- Correlation Analysis: when two features are highly correlated, they carry largely redundant information, so keep one and drop the other
- Feature Importance: train a tree ensemble such as Random Forest or XGBoost and rank features by their importance scores
- Recursive Feature Elimination: repeatedly fit a model and remove the least important feature(s) until the desired number remains
- Domain Knowledge: network engineers know which metrics matter most for specific problems
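The correlation step is straightforward to automate with pandas. The sketch below shows one possible greedy approach, not a standard API: `select_uncorrelated` is a hypothetical helper that works on numeric feature columns. It keeps a column only if its absolute Pearson correlation with every already-kept column is at or below a threshold (0.95 is an arbitrary assumption). A `keep` argument protects features chosen on domain grounds.

```python
import pandas as pd


def select_uncorrelated(features: pd.DataFrame, threshold: float = 0.95,
                        keep: tuple = ()) -> list:
    """Greedily select columns whose |Pearson r| with every already-selected
    column is <= threshold. Columns in `keep` are selected first, unconditionally.
    """
    corr = features.corr().abs()
    # Domain-chosen features go in first, so they win any correlated pair
    selected = [c for c in keep if c in corr.columns]
    for col in corr.columns:
        if col in selected:
            continue
        # NaN correlations (e.g. constant columns) compare as False,
        # so such columns are rejected once anything has been selected
        if all(corr.loc[col, other] <= threshold for other in selected):
            selected.append(col)
    return selected
```

Passing the networking-driven features in `keep` fixes those choices, and the automated filter then prunes only the remaining columns.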
Pro Tip: Combine domain knowledge with automated feature selection. Start with features that make sense from a networking perspective, then use ML techniques to validate and refine your choices.
Lilly Tech Systems