
Deep Learning for Time Series

Build LSTM, GRU, and Transformer-based models for complex time series forecasting with multivariate inputs and long-range dependencies.

When to Use Deep Learning

Deep learning shines when you have:

  • Large datasets: Thousands of time steps or many related series.
  • Multivariate inputs: Multiple features influencing the forecast.
  • Complex patterns: Non-linear relationships that classical methods cannot capture.
  • Multi-step forecasting: Predicting multiple future time steps simultaneously.

Data Preparation

Python — Creating sequences for LSTM
import numpy as np
import torch

def create_sequences(data, seq_length, forecast_horizon=1):
    """Create input-output pairs for sequence models."""
    X, y = [], []
    for i in range(len(data) - seq_length - forecast_horizon + 1):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length:i + seq_length + forecast_horizon])
    return np.array(X), np.array(y)

# Normalize data; `data` is assumed to be a 1-D NumPy array of observations.
# In practice, fit the scaler on the training portion only to avoid leakage.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data.reshape(-1, 1))

# Create sequences
seq_length = 60  # use 60 past steps
X, y = create_sequences(scaled_data, seq_length, forecast_horizon=1)

# Convert to PyTorch tensors
X_tensor = torch.FloatTensor(X)              # (num_samples, seq_length, 1)
y_tensor = torch.FloatTensor(y).squeeze(-1)  # (num_samples, forecast_horizon)

# Chronological train/validation split; never shuffle across the boundary
split = int(0.8 * len(X_tensor))
X_train, y_train = X_tensor[:split], y_tensor[:split]
X_val, y_val = X_tensor[split:], y_tensor[split:]
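
The training examples below use full-batch updates for simplicity; for larger datasets, mini-batches via TensorDataset and DataLoader are more practical. A minimal sketch; the batch size of 32 is an arbitrary choice, not a recommendation.

Python — Mini-batch loading (illustrative)
from torch.utils.data import TensorDataset, DataLoader

# Wrap the training tensors so the loader yields (X_batch, y_batch) pairs
train_loader = DataLoader(
    TensorDataset(X_train, y_train),
    batch_size=32,  # assumption: tune to your dataset and memory budget
    shuffle=True    # shuffling whole windows within the training split is safe
)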

LSTM Model

Python — LSTM for time series
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, input_size=1, hidden_size=64,
                 num_layers=2, output_size=1, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size, hidden_size, num_layers,
            batch_first=True, dropout=dropout
        )
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        # Use only the last time step's output
        last_output = lstm_out[:, -1, :]
        prediction = self.fc(last_output)
        return prediction

# Training
model = LSTMForecaster(input_size=1, hidden_size=128, num_layers=2)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Full-batch training for simplicity; swap in the DataLoader for mini-batches
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.6f}")

Transformer for Time Series

Python — Transformer encoder for forecasting
class TimeSeriesTransformer(nn.Module):
    def __init__(self, input_size=1, d_model=64, nhead=4,
                 num_layers=2, dim_feedforward=256,
                 seq_length=60, output_size=1, dropout=0.1):
        super().__init__()
        self.input_projection = nn.Linear(input_size, d_model)
        # Learned positional encoding; ties the model to a fixed seq_length
        self.pos_encoding = nn.Parameter(
            torch.randn(1, seq_length, d_model)
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=dim_feedforward,
            dropout=dropout, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(
            encoder_layer, num_layers=num_layers
        )
        self.fc = nn.Linear(d_model, output_size)

    def forward(self, x):
        # x: (batch, seq_length, input_size); length must match pos_encoding
        x = self.input_projection(x) + self.pos_encoding
        x = self.transformer(x)
        x = x[:, -1, :]  # last time step
        return self.fc(x)

model = TimeSeriesTransformer(
    input_size=1, d_model=64, nhead=4, num_layers=3
)
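
The training loop from the LSTM example works unchanged here. A quick sanity check that the window length matches the positional encoding (seq_length defaults to 60 above):

Python — Shape check (illustrative)
with torch.no_grad():
    preds = model(X_train[:8])  # input (8, 60, 1)
print(preds.shape)              # torch.Size([8, 1])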

Model Comparison

Architecture | Strengths | Weaknesses | Best For
LSTM | Captures long-range dependencies, handles variable lengths | Slow training, sequential processing | Medium-length sequences, multivariate
GRU | Faster than LSTM, fewer parameters | May miss very long dependencies | When LSTM is too slow
TCN | Parallelizable, stable gradients | Fixed receptive field | When speed matters
Transformer | Parallel processing, attention mechanism | Needs more data, positional encoding | Long sequences, multi-step forecasting
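
GRU appears in the comparison but not in the code above; in PyTorch it is nearly a one-line swap. A minimal sketch mirroring LSTMForecaster:

Python — GRU variant
class GRUForecaster(nn.Module):
    def __init__(self, input_size=1, hidden_size=64,
                 num_layers=2, output_size=1, dropout=0.2):
        super().__init__()
        # nn.GRU is a drop-in replacement for nn.LSTM in this architecture
        self.gru = nn.GRU(
            input_size, hidden_size, num_layers,
            batch_first=True, dropout=dropout
        )
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        gru_out, _ = self.gru(x)
        return self.fc(gru_out[:, -1, :])  # last time step only

Training is identical to the LSTM example.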
Deep learning pitfalls:

  • Always normalize/scale your data before feeding it to neural networks.
  • Start with a simple LSTM before trying Transformers.
  • Use early stopping to prevent overfitting (a minimal sketch follows this list).
  • Compare against a classical baseline; ARIMA often wins on small datasets.
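
One way to implement early stopping is to track validation loss with a fixed patience; the patience of 10 and epoch cap of 500 below are arbitrary choices.

Python — Early stopping (illustrative)
best_val_loss, patience, wait = float("inf"), 10, 0

for epoch in range(500):
    # One full-batch training step, as in the LSTM example
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    # Evaluate on the held-out split
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()

    if val_loss < best_val_loss:
        best_val_loss, wait = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")  # hypothetical path
    else:
        wait += 1
        if wait >= patience:
            print(f"Early stopping at epoch {epoch+1}")
            break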