Category 4: Time Series

The time series category tests your ability to forecast future values from sequential data. You must create windowed datasets from raw time series, build RNN/LSTM models, and achieve target MAE thresholds. This is often considered the hardest exam category.

What the Exam Tests

You receive a time series (e.g., temperature, stock prices, sunspot activity) and must build a model that predicts future values. The key challenge is creating windowed training data from a single sequence and choosing the right model architecture.

💡
Exam tip: The windowed dataset function is the most critical piece of code for this category. Memorize the tf.data.Dataset.window() pattern — you will use it in every time series task. Getting this wrong means your model trains on garbage data.

Creating Windowed Datasets

The core technique for time series: split a sequence into overlapping windows where each window is a training example. The last value in each window is the label (what we predict).

import tensorflow as tf
import numpy as np

# ---- The windowed dataset function (MEMORIZE THIS) ----
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    """
    Convert a time series into a windowed tf.data.Dataset.

    Args:
        series: numpy array of time series values
        window_size: number of time steps to use as input
        batch_size: batch size for training
        shuffle_buffer: buffer size for shuffling

    Returns:
        tf.data.Dataset of (window, label) pairs
    """
    dataset = tf.data.Dataset.from_tensor_slices(series)
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    dataset = dataset.flat_map(lambda w: w.batch(window_size + 1))
    dataset = dataset.map(lambda w: (w[:-1], w[-1]))  # (input, label)
    dataset = dataset.shuffle(shuffle_buffer)
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset

# ---- Example usage ----
# Generate synthetic time series
time = np.arange(0, 1000)
series = 10 + np.sin(time * 0.1) * 10 + np.random.randn(1000) * 2
series = series.astype(np.float32)

# Split into train/validation
SPLIT_TIME = 800
train_series = series[:SPLIT_TIME]
val_series = series[SPLIT_TIME:]

# Create windowed datasets
WINDOW_SIZE = 20
BATCH_SIZE = 32
SHUFFLE_BUFFER = 1000

train_dataset = windowed_dataset(
    train_series, WINDOW_SIZE, BATCH_SIZE, SHUFFLE_BUFFER
)

# Inspect one batch
for x, y in train_dataset.take(1):
    print(f"Input shape: {x.shape}")   # (32, 20)
    print(f"Label shape: {y.shape}")   # (32,)
    print(f"Input[0]: {x[0].numpy()[:5]}...")
    print(f"Label[0]: {y[0].numpy()}")
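Before trusting the pipeline, it helps to see exactly which (window, label) pairs it yields. A plain-NumPy sketch of the same pairing (naive_windows is an illustrative helper introduced here, not part of the exam code — it mirrors what windowed_dataset() produces before shuffling and batching):

```python
import numpy as np

def naive_windows(series, window_size):
    # Each window of window_size consecutive values is an input;
    # the value immediately after the window is its label.
    inputs = np.stack([series[i:i + window_size]
                       for i in range(len(series) - window_size)])
    labels = series[window_size:]
    return inputs, labels

series = np.arange(10, dtype=np.float32)
x, y = naive_windows(series, window_size=4)
print(x[0], y[0])        # [0. 1. 2. 3.] 4.0
print(x.shape, y.shape)  # (6, 4) (6,)
```

Note the overlap: consecutive windows share window_size - 1 values, which is what shift=1 gives you in the tf.data version.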

Practice Model 1: Dense Network for Time Series

Start with a simple dense network. It works surprisingly well on many exam tasks and trains much faster than an LSTM.

import tensorflow as tf
import numpy as np

# ---- Generate synthetic time series with trend + seasonality ----
def generate_series(time, trend_slope=0.05, seasonality_period=365,
                    seasonality_amplitude=40, noise_level=5):
    trend = trend_slope * time
    seasonal = seasonality_amplitude * np.sin(2 * np.pi * time / seasonality_period)
    noise = noise_level * np.random.randn(len(time))
    return (trend + seasonal + noise).astype(np.float32)

time = np.arange(0, 1500)
series = generate_series(time)

SPLIT_TIME = 1200
train_series = series[:SPLIT_TIME]
val_series = series[SPLIT_TIME:]

WINDOW_SIZE = 30
BATCH_SIZE = 32

# Reuse windowed_dataset function from above
train_dataset = windowed_dataset(train_series, WINDOW_SIZE, BATCH_SIZE, 1000)

# ---- Simple Dense model ----
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=[WINDOW_SIZE]),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='mse',
    metrics=['mae']
)

history = model.fit(
    train_dataset,
    epochs=50,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5)
    ]
)

# ---- Forecast validation period ----
def forecast_series(model, series, window_size, split_time):
    forecast = []
    for t in range(split_time, len(series)):
        window = series[t - window_size:t][np.newaxis]
        pred = model.predict(window, verbose=0)[0, 0]
        forecast.append(pred)
    return np.array(forecast)

forecast = forecast_series(model, series, WINDOW_SIZE, SPLIT_TIME)
mae = np.mean(np.abs(forecast - val_series[:len(forecast)]))
print(f"Validation MAE: {mae:.4f}")

model.save('timeseries_dense.h5')
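The forecast loop above is correct but slow, because it calls model.predict() once per time step. A batched sketch (make_forecast_windows is a hypothetical helper introduced here; the model.predict() call is left commented because it assumes the trained model above):

```python
import numpy as np

def make_forecast_windows(series, window_size, split_time):
    # Stack every validation-time input window into one 2D array so the
    # whole validation period can be predicted in a single call.
    return np.stack([
        series[t - window_size:t]
        for t in range(split_time, len(series))
    ])

# Usage with the trained model:
# windows = make_forecast_windows(series, WINDOW_SIZE, SPLIT_TIME)
# forecast = model.predict(windows, verbose=0).squeeze()
# mae = np.mean(np.abs(forecast - val_series[:len(forecast)]))
```

One predict call over a (num_steps, window_size) array is typically orders of magnitude faster than num_steps separate calls.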

Practice Model 2: LSTM for Time Series

When a dense network is not enough, use an LSTM. The key difference is the input shape: an LSTM expects 3D input, (batch, timesteps, features).

import tensorflow as tf
import numpy as np

# Assume: train_dataset created with windowed_dataset()
WINDOW_SIZE = 30

# ---- LSTM model ----
# CRITICAL: LSTM expects 3D input: (batch_size, timesteps, features)
# Our windowed data is 2D: (batch_size, window_size)
# We need to add a feature dimension using Lambda or Reshape

model = tf.keras.Sequential([
    # Add feature dimension: (batch, window_size) -> (batch, window_size, 1)
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1),
                           input_shape=[WINDOW_SIZE]),

    # Bidirectional LSTM
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),

    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])

# ---- Learning rate scheduling (important for time series) ----
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-4 * 10**(epoch / 20)  # Exponential increase to find best LR
)

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
    loss='mse',
    metrics=['mae']
)

# Step 1: Find optimal learning rate (run ~100 epochs with lr_schedule)
# history = model.fit(train_dataset, epochs=100, callbacks=[lr_schedule])
# Plot loss vs learning_rate to find the minimum

# Step 2: Train with the optimal learning rate
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9),
    loss='mse',
    metrics=['mae']
)

# history = model.fit(train_dataset, epochs=200)

model.save('timeseries_lstm.h5')
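To read the LR-finder results from Step 1, reconstruct the learning rate used at each epoch (it must match the lambda passed to LearningRateScheduler) and plot it against the recorded loss. A sketch — the plotting lines are commented because they assume the Step 1 history object and matplotlib:

```python
import numpy as np

# Learning rate at each epoch of the 100-epoch LR-finder run,
# matching the scheduler: 1e-4 * 10**(epoch / 20)
epochs = np.arange(100)
lrs = 1e-4 * 10 ** (epochs / 20)
# lrs[0] = 1e-4, lrs[20] = 1e-3, lrs[40] = 1e-2, ...

# Pair these with the recorded losses and pick the LR just before
# the loss curve starts to climb:
# import matplotlib.pyplot as plt
# plt.semilogx(lrs, history.history['loss'])
# plt.xlabel('learning rate'); plt.ylabel('loss')
# plt.show()
```

Rule of thumb: choose a learning rate about 10x smaller than the point where loss is lowest, since the minimum itself sits on the edge of instability.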

Practice Model 3: Conv1D + LSTM Hybrid

A powerful combination that often outperforms a pure LSTM on exam tasks: the Conv1D layers extract local patterns, while the LSTM captures longer-range dependencies.

import tensorflow as tf

# ---- Conv1D + LSTM hybrid model ----
WINDOW_SIZE = 30

model = tf.keras.Sequential([
    # Expand dimensions for Conv1D
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1),
                           input_shape=[WINDOW_SIZE]),

    # Conv1D to extract local patterns
    tf.keras.layers.Conv1D(64, kernel_size=5, strides=1,
                           padding='causal', activation='relu'),
    tf.keras.layers.Conv1D(64, kernel_size=3, strides=1,
                           padding='causal', activation='relu'),

    # LSTM for sequence modeling
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),

    # Output
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='huber',  # Huber loss is robust to outliers
    metrics=['mae']
)

# history = model.fit(
#     train_dataset,
#     epochs=100,
#     callbacks=[
#         tf.keras.callbacks.EarlyStopping(
#             monitor='loss', patience=10,
#             restore_best_weights=True
#         )
#     ]
# )

model.save('timeseries_conv_lstm.h5')
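Why padding='causal' and not 'same'? Causal padding pads only on the left, so each output step depends on the current and earlier inputs — no information leaks from the future, and the sequence length is preserved. A NumPy sketch of the idea (causal_conv1d is an illustrative helper, not the TF implementation):

```python
import numpy as np

def causal_conv1d(x, kernel):
    # Left-pad with zeros so output[t] is a weighted sum of x[t] and
    # earlier values only -- no future leakage -- and the output keeps
    # the input's length, just like padding='causal' in Conv1D.
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([np.dot(padded[t:t + k], kernel)
                     for t in range(len(x))])

x = np.arange(5.0)
print(causal_conv1d(x, np.array([0.0, 0.0, 1.0])))  # identity kernel -> [0. 1. 2. 3. 4.]
```

With 'same' padding, output[t] would also see values after t — harmless for images, but a subtle form of cheating when forecasting.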

Time Series Quick Reference

# Time Series Exam Cheat Sheet

# 1. ALWAYS split time series chronologically (no random shuffle for split)
SPLIT_TIME = int(0.8 * len(series))
train = series[:SPLIT_TIME]
val = series[SPLIT_TIME:]

# 2. Window size selection:
# - Too small: model cannot capture patterns
# - Too large: model trains slowly, may overfit
# - Good starting points: 20-50 for daily data, 7-14 for weekly patterns

# 3. Model architecture decision:
# - Start with Dense (fastest to train, often good enough)
# - Move to LSTM if Dense MAE is too high
# - Try Conv1D + LSTM hybrid for best results

# 4. Loss functions for time series:
# - 'mse': standard, penalizes large errors more
# - 'mae': robust to outliers
# - 'huber': quadratic for small errors, linear for large ones (often best for the exam)

# 5. Common mistakes:
# - Shuffling the train/val split (must be chronological!)
# - Forgetting to expand dims for LSTM/Conv1D input
# - Not using the windowed_dataset function correctly
# - Setting window_size larger than available data
# - Using too high a learning rate (time series models are sensitive)
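Why Huber is recommended above: it behaves like MSE near zero and like MAE for large errors, so a single outlier cannot dominate the loss. A NumPy sketch of all three on the same errors (re-implemented here for illustration; delta=1.0 matches the Keras default):

```python
import numpy as np

def mse(e):
    return np.mean(e ** 2)

def mae(e):
    return np.mean(np.abs(e))

def huber(e, delta=1.0):
    # Quadratic for |e| <= delta, linear beyond -- outliers count less
    quad = 0.5 * e ** 2
    lin = delta * (np.abs(e) - 0.5 * delta)
    return np.mean(np.where(np.abs(e) <= delta, quad, lin))

errors = np.array([0.1, -0.2, 0.3, 8.0])  # three small errors, one outlier
print(f"MSE={mse(errors):.3f} MAE={mae(errors):.3f} Huber={huber(errors):.3f}")
```

The single 8.0 error dominates MSE (it contributes 64 before averaging), while Huber grows only linearly in it — which is why models trained with Huber tend not to overreact to noisy spikes in the series.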

Key Takeaways

💡
  • Memorize the windowed_dataset() function — it is the foundation of every time series exam task
  • Always split time series chronologically, never randomly
  • Start with a Dense model first — if MAE is too high, move to LSTM
  • LSTM requires 3D input: use Lambda(lambda x: tf.expand_dims(x, axis=-1))
  • Huber loss often works better than MSE for time series
  • Learning rate scheduling can significantly improve time series model performance
  • A Conv1D + LSTM hybrid is often the strongest architecture for exam tasks