Deep Learning for Time Series
Build LSTM, GRU, and Transformer-based models for complex time series forecasting with multivariate inputs and long-range dependencies.
When to Use Deep Learning
Deep learning shines when you have:
- Large datasets: Thousands of time steps or many related series.
- Multivariate inputs: Multiple features influencing the forecast.
- Complex patterns: Non-linear relationships that classical methods cannot capture.
- Multi-step forecasting: Predicting multiple future time steps simultaneously.
Data Preparation
Python — Creating sequences for LSTM
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
def create_sequences(data, seq_length, forecast_horizon=1):
"""Create input-output pairs for sequence models."""
X, y = [], []
for i in range(len(data) - seq_length - forecast_horizon + 1):
X.append(data[i:i + seq_length])
y.append(data[i + seq_length:i + seq_length + forecast_horizon])
return np.array(X), np.array(y)
# Normalize the data (in practice, fit the scaler on the training portion only to avoid leakage)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data.reshape(-1, 1))  # data: 1-D NumPy array of observations
# Create sequences
seq_length = 60 # use 60 past steps
X, y = create_sequences(scaled_data, seq_length, forecast_horizon=1)
# Convert to PyTorch tensors
X_tensor = torch.FloatTensor(X)                      # shape: (num_samples, seq_length, 1)
y_tensor = torch.FloatTensor(y).reshape(len(y), -1)  # shape: (num_samples, forecast_horizon), matches the model output
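The Dataset and DataLoader imports above are there for mini-batch training. Below is a minimal sketch of how the window tensors could be batched; the TensorDataset wrapper and the batch size of 32 are illustrative choices, not part of the original pipeline.

Python — Batching the windows (sketch)
from torch.utils.data import TensorDataset

# Each (X, y) pair is a self-contained window, so shuffling windows is safe;
# keep the train/test split itself chronological to avoid leakage.
dataset = TensorDataset(X_tensor, y_tensor)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch_X, batch_y in loader:
    print(batch_X.shape, batch_y.shape)  # e.g. (32, 60, 1) and (32, 1)
    break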
LSTM Model
Python — LSTM for time series
import torch.nn as nn
class LSTMForecaster(nn.Module):
def __init__(self, input_size=1, hidden_size=64,
num_layers=2, output_size=1, dropout=0.2):
super().__init__()
self.lstm = nn.LSTM(
input_size, hidden_size, num_layers,
batch_first=True, dropout=dropout
)
self.fc = nn.Linear(hidden_size, output_size)
def forward(self, x):
lstm_out, _ = self.lstm(x)
# Use only the last time step's output
last_output = lstm_out[:, -1, :]
prediction = self.fc(last_output)
return prediction
# Training (simple chronological split; the last 20% is held out for evaluation)
split = int(len(X_tensor) * 0.8)
X_train, y_train = X_tensor[:split], y_tensor[:split]
X_test, y_test = X_tensor[split:], y_tensor[split:]
model = LSTMForecaster(input_size=1, hidden_size=128, num_layers=2)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
for epoch in range(100):
model.train()
optimizer.zero_grad()
output = model(X_train)
loss = criterion(output, y_train)
loss.backward()
optimizer.step()
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1}, Loss: {loss.item():.6f}")
Transformer for Time Series
Python — Transformer encoder for forecasting
class TimeSeriesTransformer(nn.Module):
def __init__(self, input_size=1, d_model=64, nhead=4,
num_layers=2, dim_feedforward=256,
seq_length=60, output_size=1, dropout=0.1):
super().__init__()
self.input_projection = nn.Linear(input_size, d_model)
self.pos_encoding = nn.Parameter(
torch.randn(1, seq_length, d_model)
)
encoder_layer = nn.TransformerEncoderLayer(
d_model=d_model, nhead=nhead,
dim_feedforward=dim_feedforward,
dropout=dropout, batch_first=True
)
self.transformer = nn.TransformerEncoder(
encoder_layer, num_layers=num_layers
)
self.fc = nn.Linear(d_model, output_size)
def forward(self, x):
x = self.input_projection(x) + self.pos_encoding
x = self.transformer(x)
x = x[:, -1, :] # last time step
return self.fc(x)
model = TimeSeriesTransformer(
input_size=1, d_model=64, nhead=4, num_layers=3
)
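Whichever model you train, its predictions live in the scaled space, so they should be mapped back through the fitted scaler before reporting errors. A minimal evaluation sketch, assuming the held-out X_test and y_test tensors from the chronological split in the training section:

Python — Evaluating in original units (sketch)
model.eval()
with torch.no_grad():
    preds = model(X_test)  # scaled predictions, shape (n_test, 1)

# Undo the MinMax scaling to get values in the original units
preds_orig = scaler.inverse_transform(preds.numpy())
actual_orig = scaler.inverse_transform(y_test.numpy())

mae = np.mean(np.abs(preds_orig - actual_orig))
print(f"Test MAE (original units): {mae:.3f}")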
Model Comparison
| Architecture | Strengths | Weaknesses | Best For |
|---|---|---|---|
| LSTM | Captures long-range dependencies, handles variable lengths | Slow training, sequential processing | Medium-length sequences, multivariate |
| GRU | Faster than LSTM, fewer parameters | May miss very long dependencies | When LSTM is too slow |
| TCN | Parallelizable, stable gradients | Fixed receptive field | When speed matters |
| Transformer | Parallel processing, attention mechanism | Needs more data, positional encoding | Long sequences, multi-step forecasting |
Deep learning pitfalls:
- Always normalize/scale your data before feeding it to neural networks.
- Start with a simple LSTM before trying Transformers.
- Use early stopping to prevent overfitting (see the sketch below).
- Compare against a classical baseline: ARIMA often wins on small datasets.
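A minimal early-stopping sketch, with two stated assumptions: model, criterion, and optimizer refer to whichever forecaster you are training, and the chronological hold-out above doubles as a validation set (in practice you would carve out a separate validation window). The patience of 10 epochs is an illustrative choice.

Python — Early stopping (sketch)
best_val, patience, wait = float("inf"), 10, 0
for epoch in range(500):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    # Validate on the held-out window (ideally a separate validation split)
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_test), y_test).item()

    if val_loss < best_val:
        best_val, wait = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")  # keep the best weights
    else:
        wait += 1
        if wait >= patience:
            print(f"Stopping at epoch {epoch+1}, best val loss: {best_val:.6f}")
            break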