Intermediate
Feature Engineering for Time Series
Transform raw time series into powerful features — lag features, rolling statistics, Fourier terms, and calendar-based encodings.
Lag Features
Lag features capture the dependency on past values. They turn a time series problem into a supervised learning problem.
Python — Creating lag features
import pandas as pd
def create_lag_features(df, column, lags):
    """Create lag features for a given column.

    Assumes df is sorted by date with one row per time step.
    """
    for lag in lags:
        df[f'{column}_lag_{lag}'] = df[column].shift(lag)
    return df
# Create lags at 1, 7, 14, 28 days
df = create_lag_features(df, 'sales', [1, 7, 14, 28])
# For seasonal data, include seasonal lags
# e.g., same weekday last week, same date last year
df['sales_lag_7'] = df['sales'].shift(7)      # last week (already created above)
df['sales_lag_365'] = df['sales'].shift(365)  # last year (use 364 to match the weekday)
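To make the "supervised learning" framing concrete, the following is a minimal sketch on a toy series. The data, the lag choices, and the 70/30 split ratio are illustrative assumptions, not part of the original example. It shows that shifting leaves NaN values in the first rows, and that the train/test split should follow time order.

```python
import numpy as np
import pandas as pd

# Toy daily series; the values are illustrative only.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
toy = pd.DataFrame({"date": dates, "sales": np.arange(10, dtype=float)})

# Shift-based lags, as in create_lag_features.
for lag in [1, 2, 3]:
    toy[f"sales_lag_{lag}"] = toy["sales"].shift(lag)

# The first max(lags) rows have no complete history, so drop them.
supervised = toy.dropna().reset_index(drop=True)

feature_cols = ["sales_lag_1", "sales_lag_2", "sales_lag_3"]
X = supervised[feature_cols]
y = supervised["sales"]

# Split by time, not at random, so test rows come after training rows.
split = int(len(supervised) * 0.7)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]
```

Each row of `X` now holds only values observed before the corresponding entry of `y`, which is the property any regressor trained on it relies on.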
Rolling Statistics
Python — Rolling window features
# Assuming the target is sales at time t, shift by one step so every
# window ends at t-1 and never sees the value being predicted.
past_sales = df['sales'].shift(1)

# Rolling mean, standard deviation, min, and max
for window in [7, 14, 30]:
    df[f'rolling_mean_{window}'] = (
        past_sales.rolling(window=window).mean()
    )
    df[f'rolling_std_{window}'] = (
        past_sales.rolling(window=window).std()
    )
    df[f'rolling_min_{window}'] = (
        past_sales.rolling(window=window).min()
    )
    df[f'rolling_max_{window}'] = (
        past_sales.rolling(window=window).max()
    )
# Expanding statistics (cumulative)
df['expanding_mean'] = past_sales.expanding().mean()
# Exponentially weighted moving average
df['ewm_mean_7'] = past_sales.ewm(span=7).mean()
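It may help to see what the `span` and `expanding` calls compute. In pandas, `span` maps to a smoothing factor of `alpha = 2 / (span + 1)`, and the expanding mean is the running sum divided by the running count. The sketch below checks both on a small made-up series; the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 12.0, 9.0, 15.0, 11.0, 14.0, 13.0, 16.0])

# span=7 corresponds to alpha = 2 / (7 + 1) = 0.25
span = 7
alpha = 2 / (span + 1)
by_span = s.ewm(span=span).mean()
by_alpha = s.ewm(alpha=alpha).mean()

# The expanding mean is the cumulative sum over the number of points so far.
expanding_mean = s.expanding().mean()
manual_mean = s.cumsum() / np.arange(1, len(s) + 1)
```

A larger `span` gives a smaller `alpha`, so older observations decay more slowly.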
Calendar Features
Python — Date-based features
df['date'] = pd.to_datetime(df['date'])
# Basic calendar features
df['day_of_week'] = df['date'].dt.dayofweek # 0=Monday
df['day_of_month'] = df['date'].dt.day
df['day_of_year'] = df['date'].dt.dayofyear
df['week_of_year'] = df['date'].dt.isocalendar().week
df['month'] = df['date'].dt.month
df['quarter'] = df['date'].dt.quarter
df['year'] = df['date'].dt.year
# Boolean features
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)
df['is_month_start'] = df['date'].dt.is_month_start.astype(int)
df['is_month_end'] = df['date'].dt.is_month_end.astype(int)
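Holiday flags can be built in the same style. The sketch below uses a hypothetical two-entry holiday list and a toy date range. Both are assumptions for illustration and should be replaced with the calendar that applies to your data.

```python
import pandas as pd

# Hypothetical holiday list; replace with the calendar for your
# business or region.
holidays = pd.to_datetime(["2024-01-01", "2024-12-25"])

cal = pd.DataFrame({"date": pd.date_range("2023-12-30", periods=5, freq="D")})

# 1 if the date itself is a holiday
cal["is_holiday"] = cal["date"].isin(holidays).astype(int)

# 1 if the following day is a holiday; behavior may differ on these days too
cal["is_day_before_holiday"] = (
    (cal["date"] + pd.Timedelta(days=1)).isin(holidays).astype(int)
)
```

Because holiday dates are known in advance, flags like these can be used at prediction time.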
Fourier Terms
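Fourier features encode cyclical patterns (seasonality) as pairs of sine and cosine terms, sin(2πkt/P) and cos(2πkt/P), for a period P and harmonics k = 1, ..., K. More harmonics let a model fit sharper seasonal shapes, at the cost of more features.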
Python — Fourier features for seasonality
import numpy as np
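def fourier_features(dates, period, n_terms):
    """Generate Fourier features for seasonality.

    Assumes `dates` is a sorted, evenly spaced Series with no gaps,
    because the time index is simply the row position.
    """
    t = np.arange(len(dates))
    features = {}
    for k in range(1, n_terms + 1):
        features[f'sin_{period}_{k}'] = np.sin(2 * np.pi * k * t / period)
        features[f'cos_{period}_{k}'] = np.cos(2 * np.pi * k * t / period)
    # Reuse the index of `dates` so the result aligns with df in pd.concat
    return pd.DataFrame(features, index=dates.index)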
# Weekly seasonality (period=7)
weekly = fourier_features(df['date'], period=7, n_terms=3)
# Yearly seasonality (period=365.25)
yearly = fourier_features(df['date'], period=365.25, n_terms=5)
df = pd.concat([df, weekly, yearly], axis=1)
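As a rough check of how a model uses these terms, the sketch below fits an ordinary least-squares regression on two weekly harmonics and recovers the coefficients of a synthetic signal. The signal, its amplitudes, and the choice of `np.linalg.lstsq` as the fitting method are assumptions for illustration. Any linear model would play the same role.

```python
import numpy as np
import pandas as pd

# Synthetic daily series with an exact 7-day cycle (assumed data).
n = 70
period = 7
t = np.arange(n)
signal = (
    10
    + 3 * np.sin(2 * np.pi * t / period)
    + 1.5 * np.cos(2 * np.pi * 2 * t / period)
)

# Intercept plus the first two harmonics of the weekly period.
features = {"intercept": np.ones(n)}
for k in (1, 2):
    features[f"sin_{period}_{k}"] = np.sin(2 * np.pi * k * t / period)
    features[f"cos_{period}_{k}"] = np.cos(2 * np.pi * k * t / period)
X = pd.DataFrame(features)

# Least-squares fit: one coefficient per column of X.
coef, *_ = np.linalg.lstsq(X.to_numpy(), signal, rcond=None)
fitted = X.to_numpy() @ coef
```

The fitted coefficients line up with the columns of `X`, so the weights on `sin_7_1` and `cos_7_2` come back as the amplitudes used to build the signal.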
Feature Engineering Checklist
| Category | Features | When to Use |
|---|---|---|
| Lag features | t-1, t-7, t-30, t-365 | Always (core features) |
| Rolling stats | Mean, std, min, max over windows | Capture recent trends |
| Calendar | Day of week, month, holidays | Business data with weekly/yearly patterns |
| Fourier | Sin/cos with different periods | Complex seasonality |
| External | Weather, prices, promotions | When external factors drive the series |
Data leakage warning: When creating features, use only information available at prediction time. Lag features must use past values only. Rolling statistics must use a trailing window, not centered. Never include future information in your features.
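The difference between safe and leaky windows is easy to see on a tiny series with one spike. This is a minimal sketch with made-up values, assuming the target at each row is the current value of the series.

```python
import numpy as np
import pandas as pd

# Made-up series with a spike at position 5.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0, 7.0])

# Past values only: shift first, then roll.
safe = s.shift(1).rolling(window=3).mean()

# Trailing window that still includes the current value.
includes_current = s.rolling(window=3).mean()

# Centered window: includes the next value as well.
centered = s.rolling(window=3, center=True).mean()

comparison = pd.DataFrame({
    "value": s,
    "safe": safe,
    "includes_current": includes_current,
    "centered": centered,
})
```

At position 5, `safe` is the mean of 3, 4, and 5 and is unaffected by the spike, while `includes_current` already contains it. At position 4, `centered` is pulled up by the spike one step before it happens, which is exactly the future information a feature must not carry.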