Introduction to Time Series Forecasting
Understand the fundamental concepts of time series data, including stationarity, trend, seasonality, and the standard forecasting workflow.
What is a Time Series?
A time series is a sequence of data points collected at successive, equally spaced points in time. Unlike regular tabular data, the order of observations matters — each value depends on previous values, creating temporal dependencies.
Time series data is everywhere: stock prices, weather readings, website traffic, sensor data, sales figures, energy consumption, and heart rate monitors.
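To make the temporal ordering concrete, here is a minimal sketch using pandas with synthetic data (the column and variable names are illustrative, not from any real dataset):

```python
import numpy as np
import pandas as pd

# Synthetic daily series: a random walk, where each value depends on the
# previous one -- shuffling the rows would destroy the structure.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
values = np.cumsum(np.random.default_rng(0).normal(size=10))
series = pd.Series(values, index=dates, name="value")

# The DatetimeIndex carries the temporal order; lag-1 values come from shift()
lagged = series.shift(1)
print(series.head(3))
```

Note that `shift(1)` leaves the first lagged value missing, which is why lag-based features typically require a `dropna()` before modeling.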
Key Components
Trend
Long-term increase or decrease in the data. A stock market index may show an upward trend over decades.
Seasonality
Regular, repeating patterns at fixed intervals. Retail sales spike every December; ice cream sales peak in summer.
Cyclical Patterns
Fluctuations not at a fixed frequency, often driven by economic or business cycles lasting years.
Residual / Noise
Random variation that cannot be attributed to trend, seasonality, or cycles. The unpredictable component.
Stationarity
A time series is stationary when its statistical properties (mean, variance, autocorrelation) do not change over time. Most classical forecasting methods require stationary data.
import pandas as pd
from statsmodels.tsa.stattools import adfuller
# Augmented Dickey-Fuller test
def test_stationarity(series):
    result = adfuller(series, autolag='AIC')
    print(f"ADF Statistic: {result[0]:.4f}")
    print(f"p-value: {result[1]:.4f}")
    if result[1] < 0.05:
        print("Series IS stationary (reject null hypothesis)")
    else:
        print("Series is NOT stationary (fail to reject)")
import numpy as np

# Make a series stationary by differencing
df['value_diff'] = df['value'].diff()           # First difference
df['value_diff2'] = df['value'].diff().diff()   # Second difference

# Or stabilize variance with a log transform, then difference
df['value_log_diff'] = np.log(df['value']).diff()
Autocorrelation
Autocorrelation measures the correlation between a time series and a lagged copy of itself. The ACF (autocorrelation function) and PACF (partial autocorrelation function) plots are essential diagnostic tools for choosing model orders.
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
plot_acf(df['value'].dropna(), ax=axes[0], lags=40)
plot_pacf(df['value'].dropna(), ax=axes[1], lags=40)
axes[0].set_title("ACF")
axes[1].set_title("PACF")
plt.tight_layout()
plt.show()
The Forecasting Workflow
Explore and visualize
Plot the data, identify trend, seasonality, outliers, and missing values.
Preprocess
Handle missing values, remove outliers, apply transformations (log, Box-Cox) to stabilize variance.
Check stationarity
Use the ADF test and visual inspection. Apply differencing if needed.
Select a model
Choose between classical (ARIMA), modern (Prophet), or deep learning (LSTM) based on data characteristics.
Train and validate
Use time-series-aware splitting (no random shuffling). Apply walk-forward validation.
Forecast and evaluate
Generate predictions, compute metrics (MAPE, RMSE), and assess uncertainty with confidence intervals.
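The train-and-validate step above can be sketched with scikit-learn's `TimeSeriesSplit`, which always places training data strictly before test data. The synthetic random-walk series and the naive last-value baseline are illustrative assumptions, not a recommended production model:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(size=120)) + 100  # synthetic series

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(y)):
    # Every training index precedes every test index: no leakage from the future
    assert train_idx.max() < test_idx.min()
    # Naive baseline: repeat the last training value across the test horizon
    forecast = np.full(len(test_idx), y[train_idx[-1]])
    mape = mean_absolute_percentage_error(y[test_idx], forecast)
    rmse = np.sqrt(mean_squared_error(y[test_idx], forecast))
    print(f"Fold {fold}: MAPE={mape:.4f}, RMSE={rmse:.4f}")
```

A naive baseline like this is worth keeping around: a candidate ARIMA, Prophet, or LSTM model that cannot beat it on walk-forward folds is not adding value.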
Common Applications
- Stock market: Price prediction, volatility forecasting, algorithmic trading signals.
- Weather: Temperature, precipitation, and wind forecasting for days to weeks ahead.
- Demand forecasting: Retail sales, inventory management, supply chain optimization.
- Energy: Electricity load forecasting, renewable energy output prediction.
- Healthcare: Patient volume forecasting, epidemic curve modeling.