Intermediate

Feature Engineering for Time Series

Transform raw time series into model-ready features: lag features, rolling statistics, Fourier terms, and calendar-based encodings.

Lag Features

Lag features expose past values of the series as input columns. This turns forecasting into an ordinary supervised learning problem: each row pairs the value at time t with the values observed at t-1, t-7, and so on.

Python — Creating lag features
import pandas as pd

def create_lag_features(df, column, lags):
    """Return a copy of df with one lagged column per entry in lags."""
    df = df.copy()  # avoid mutating the caller's DataFrame
    for lag in lags:
        df[f'{column}_lag_{lag}'] = df[column].shift(lag)
    return df

# shift() counts rows, not days: df must be sorted by date, one row per day
# Create lags at 1, 7, 14, 28 days
df = create_lag_features(df, 'sales', [1, 7, 14, 28])

# For seasonal data, include seasonal lags
# e.g., same day last week, same date last year
df['sales_lag_7'] = df['sales'].shift(7)     # last week (already created above)
df['sales_lag_365'] = df['sales'].shift(365) # last year (use 364 to match the weekday)
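
To make the supervised learning framing concrete, the sketch below builds a feature matrix X and a target y from lagged columns. The toy series, the `make_supervised` helper, and the choice to drop rows with incomplete history are illustrative assumptions, not part of the example above.

```python
import pandas as pd

def make_supervised(series, lags):
    """Return (X, y) where each row of X holds past values of the series."""
    frame = pd.DataFrame({"y": series})
    for lag in lags:
        frame[f"lag_{lag}"] = series.shift(lag)
    # The first max(lags) rows have no complete history, so drop them.
    frame = frame.dropna()
    return frame.drop(columns="y"), frame["y"]

# Hypothetical toy data: ten consecutive daily observations.
sales = pd.Series(
    [10, 12, 13, 15, 14, 16, 18, 17, 19, 21],
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
    dtype=float,
)
X, y = make_supervised(sales, lags=[1, 2, 3])
```

Each row of X now contains only values that were known before the corresponding entry of y, so any regression model can be trained on the pair.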

Rolling Statistics

Python — Rolling window features
# Shift by one row first so each window ends at t-1 and excludes the
# current value, which is the forecasting target
past_sales = df['sales'].shift(1)

# Rolling mean, standard deviation, min, and max
for window in [7, 14, 30]:
    rolling = past_sales.rolling(window=window)
    df[f'rolling_mean_{window}'] = rolling.mean()
    df[f'rolling_std_{window}'] = rolling.std()
    df[f'rolling_min_{window}'] = rolling.min()
    df[f'rolling_max_{window}'] = rolling.max()

# Expanding statistics (cumulative, past values only)
df['expanding_mean'] = past_sales.expanding().mean()

# Exponentially weighted moving average (past values only)
df['ewm_mean_7'] = past_sales.ewm(span=7).mean()
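
A quick way to see what a rolling window actually covers is to compare it with and without a one-row shift. This is a minimal sketch on made-up numbers. It assumes the model predicts the value in the same row, so a window that ends at that row would contain the target itself.

```python
import pandas as pd

# Hypothetical toy series.
sales = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Window ends at the current row, so it includes the current value.
mean_incl_current = sales.rolling(window=3).mean()

# Shift first: the window now ends at the previous row.
mean_past_only = sales.shift(1).rolling(window=3).mean()

comparison = pd.DataFrame({
    "sales": sales,
    "mean_incl_current": mean_incl_current,
    "mean_past_only": mean_past_only,
})
```

At row 3 (sales = 4.0) the unshifted mean averages 2, 3, 4 while the shifted mean averages 1, 2, 3. Only the second is available before the value at row 3 is observed.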

Calendar Features

Python — Date-based features
df['date'] = pd.to_datetime(df['date'])

# Basic calendar features
df['day_of_week'] = df['date'].dt.dayofweek      # 0=Monday
df['day_of_month'] = df['date'].dt.day
df['day_of_year'] = df['date'].dt.dayofyear
df['week_of_year'] = df['date'].dt.isocalendar().week
df['month'] = df['date'].dt.month
df['quarter'] = df['date'].dt.quarter
df['year'] = df['date'].dt.year

# Boolean features
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)
df['is_month_start'] = df['date'].dt.is_month_start.astype(int)
df['is_month_end'] = df['date'].dt.is_month_end.astype(int)
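
Holiday flags follow the same pattern as the boolean features above. The sketch below uses the `USFederalHolidayCalendar` that ships with pandas. Treating US federal holidays as the relevant calendar is an assumption made for illustration; other regions or business calendars need a different holiday source.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Hypothetical one-week frame around a known holiday.
df = pd.DataFrame({"date": pd.date_range("2023-07-01", "2023-07-07", freq="D")})

# Assumption: US federal holidays are the relevant calendar for this series.
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start=df["date"].min(), end=df["date"].max())

# 1 on observed holidays, 0 otherwise.
df["is_holiday"] = df["date"].isin(holidays).astype(int)
```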

Fourier Terms

Fourier features encode cyclical patterns (seasonality) using sine and cosine functions.

Python — Fourier features for seasonality
import numpy as np

def fourier_features(dates, period, n_terms):
    """Generate Fourier features for seasonality.

    dates is a datetime Series; period is measured in days.
    """
    # Elapsed days since the first date, so gaps in the data keep the phase right
    t = (dates - dates.iloc[0]).dt.days.to_numpy()
    features = {}
    for k in range(1, n_terms + 1):
        features[f'sin_{period}_{k}'] = np.sin(2 * np.pi * k * t / period)
        features[f'cos_{period}_{k}'] = np.cos(2 * np.pi * k * t / period)
    # Reuse the row index of dates so pd.concat aligns the result with df
    return pd.DataFrame(features, index=dates.index)

# Weekly seasonality (period=7)
weekly = fourier_features(df['date'], period=7, n_terms=3)

# Yearly seasonality (period=365.25)
yearly = fourier_features(df['date'], period=365.25, n_terms=5)

df = pd.concat([df, weekly, yearly], axis=1)
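
To illustrate how sine and cosine pairs encode a seasonal shape, the sketch below fits a made-up weekly pattern by ordinary least squares on Fourier columns. The `fourier_matrix` helper and the pattern values are hypothetical, and plain NumPy least squares stands in for whatever model is actually used downstream.

```python
import numpy as np

def fourier_matrix(t, period, n_terms):
    """Stack an intercept plus sin/cos columns for harmonics 1..n_terms."""
    columns = [np.ones_like(t, dtype=float)]
    for k in range(1, n_terms + 1):
        angle = 2 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Hypothetical weekly pattern (Mon..Sun), repeated for eight weeks.
weekly_pattern = np.array([5.0, 6.0, 6.5, 7.0, 9.0, 12.0, 10.0])
t = np.arange(56)
y = weekly_pattern[t % 7]

X = fourier_matrix(t, period=7, n_terms=3)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
max_abs_error = np.max(np.abs(X @ coef - y))
```

With a period of 7, an intercept plus three harmonics gives seven columns, enough to reproduce any pattern that repeats every seven days, so the error here is at the level of floating-point noise. For yearly seasonality a few terms give a smooth approximation instead, and `n_terms` controls how much detail the features can represent.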

Feature Engineering Checklist

| Category | Features | When to Use |
| --- | --- | --- |
| Lag features | t-1, t-7, t-30, t-365 | Always (core features) |
| Rolling stats | Mean, std, min, max over windows | Capture recent trends |
| Calendar | Day of week, month, holidays | Business data with weekly/yearly patterns |
| Fourier | Sin/cos with different periods | Complex seasonality |
| External | Weather, prices, promotions | When external factors drive the series |

Data leakage warning: When creating features, use only information available at prediction time. Lag features must use past values only. Rolling statistics must use a trailing window that ends before the value being predicted, not a centered window. Never include future information in your features.
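
One way to test a feature for leakage is to change only the values after a given position and check whether the feature at that position moves. The sketch below is a minimal version of that check. The `uses_future` helper and the toy series are hypothetical, and a single probe position is not a proof that a feature is leak-free.

```python
import numpy as np
import pandas as pd

def uses_future(feature_fn, series, t):
    """Return True if the feature at position t changes when later values change."""
    perturbed = series.copy()
    perturbed.iloc[t + 1:] += 1000.0  # alter only values after position t
    before = feature_fn(series).iloc[t]
    after = feature_fn(perturbed).iloc[t]
    return not np.isclose(before, after, equal_nan=True)

# Hypothetical toy series.
series = pd.Series(np.arange(20, dtype=float))

# Trailing window over past values only vs. a centered window.
trailing = lambda s: s.shift(1).rolling(window=3).mean()
centered = lambda s: s.rolling(window=3, center=True).mean()

trailing_leaks = uses_future(trailing, series, t=10)
centered_leaks = uses_future(centered, series, t=10)
```

The trailing feature is unaffected by the perturbation, while the centered window picks up the value at position 11 and is flagged.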