Eigenvalues & SVD
Eigenvalue decomposition and the SVD are among the most widely used tools in linear algebra. They power PCA, recommender systems, image compression, and latent semantic analysis. This lesson implements five key algorithms from scratch so you understand exactly how they work.
Problem 1: Power Iteration (Dominant Eigenvalue)
Problem: Given a square matrix A, find its dominant eigenvalue (largest in absolute value) and corresponding eigenvector using the power iteration method.
Why it matters: Power iteration is the foundation of Google's PageRank algorithm. It is also the simplest eigenvalue algorithm, making it a common interview question.
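To make the PageRank connection concrete, here is a minimal sketch on an invented 3-page link graph (the graph and its numbers are illustrative, not from any real ranking system): repeatedly multiplying a rank vector by the column-stochastic link matrix is exactly power iteration, and it converges to the stationary ranking.

```python
# Minimal PageRank sketch on a hypothetical 3-page link graph.
# M[i][j] is the probability of following a link from page j to page i;
# each column sums to 1 (column-stochastic).
M = [
    [0.0, 0.5, 1.0],  # page 0 is linked to by pages 1 and 2
    [0.5, 0.0, 0.0],  # page 1 is linked to by page 0
    [0.5, 0.5, 0.0],  # page 2 is linked to by pages 0 and 1
]

rank = [1.0 / 3] * 3  # start from a uniform distribution
for _ in range(100):  # power iteration: rank <- M * rank
    rank = [sum(M[i][j] * rank[j] for j in range(3)) for i in range(3)]

print([round(r, 4) for r in rank])  # [0.4444, 0.2222, 0.3333]
```

The stationary vector here is exactly (4/9, 2/9, 3/9); real PageRank adds a damping factor, which this sketch omits.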
import math

def dot_product(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def vector_norm(v):
    """L2 norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def mat_vec_multiply(A, v):
    """Multiply matrix A by vector v."""
    return [dot_product(row, v) for row in A]
def power_iteration(A, max_iter=1000, tol=1e-10):
    """
    Find the dominant eigenvalue and eigenvector of square matrix A.
    Time: O(n^2 * max_iter), Space: O(n)
    """
    n = len(A)
    # Deterministic start vector for reproducibility (a random start also
    # works, provided it is not orthogonal to the dominant eigenvector)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(max_iter):
        # Multiply: w = A * v
        w = mat_vec_multiply(A, v)
        # Eigenvalue estimate (Rayleigh quotient; v is unit-norm)
        new_eigenvalue = dot_product(w, v)
        # Normalize
        norm = vector_norm(w)
        if norm < 1e-15:
            raise ValueError("Converged to zero vector - check matrix")
        v = [x / norm for x in w]
        # Check convergence
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue, v
        eigenvalue = new_eigenvalue
    return eigenvalue, v
# Test: symmetric matrix with known eigenvalues
A = [[4, 1], [1, 3]]
# Eigenvalues: (7 + sqrt(5))/2 ≈ 4.618, (7 - sqrt(5))/2 ≈ 2.382
eigenvalue, eigenvector = power_iteration(A)
print(f"Dominant eigenvalue: {eigenvalue:.6f}") # ≈ 4.618034
print(f"Eigenvector: [{eigenvector[0]:.4f}, {eigenvector[1]:.4f}]")
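If NumPy happens to be available, the power-iteration result is easy to cross-check against a library eigensolver (NumPy is an assumption here; the lesson itself stays dependency-free):

```python
import numpy as np

# np.linalg.eigh handles symmetric matrices and returns eigenvalues
# in ascending order, so the dominant one is last.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
vals, vecs = np.linalg.eigh(A)

dominant, v = vals[-1], vecs[:, -1]
print(f"{dominant:.6f}")  # 4.618034, i.e. (7 + sqrt(5)) / 2

# A true eigenpair has residual ||A v - lambda v|| near zero
print(np.linalg.norm(A @ v - dominant * v) < 1e-12)  # True
```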
Problem 2: QR Algorithm (All Eigenvalues)
Problem: Compute all eigenvalues of a square matrix using the QR algorithm. The algorithm repeatedly decomposes the current iterate as A = QR (where Q is orthogonal and R is upper triangular), then forms the next iterate A' = RQ. For symmetric matrices, the iterates converge to a nearly diagonal matrix with the eigenvalues on the diagonal.
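Why this works: each step is a similarity transform. From A = QR it follows that RQ = Q^T A Q, so every iterate has exactly the same eigenvalues as the original matrix. A quick NumPy check of that invariant (NumPy is assumed here, for verification only):

```python
import numpy as np

# One QR step: A -> R @ Q = Q.T @ A @ Q is a similarity transform,
# so the eigenvalues are unchanged.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
Q, R = np.linalg.qr(A)
A_next = R @ Q

before = np.sort(np.linalg.eigvals(A))
after = np.sort(np.linalg.eigvals(A_next))
print(np.allclose(before, after))  # True
```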
def qr_decomposition(A):
    """
    QR decomposition using the Gram-Schmidt process.
    A = Q * R where Q is orthogonal, R is upper triangular.
    Time: O(n^3), Space: O(n^2)
    """
    n = len(A)
    Q = [[0.0] * n for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    # Extract columns of A
    columns = [[A[i][j] for i in range(n)] for j in range(n)]
    orthogonal = []
    for j in range(n):
        v = columns[j][:]
        # Subtract projections onto previous orthogonal vectors
        for k in range(j):
            R[k][j] = dot_product(orthogonal[k], columns[j])
            for i in range(n):
                v[i] -= R[k][j] * orthogonal[k][i]
        R[j][j] = vector_norm(v)
        if R[j][j] < 1e-15:
            # Column is linearly dependent - use a zero vector
            q = [0.0] * n
        else:
            q = [x / R[j][j] for x in v]
        orthogonal.append(q)
        for i in range(n):
            Q[i][j] = q[i]
    return Q, R
def qr_eigenvalues(A, max_iter=200, tol=1e-10):
    """
    Compute all eigenvalues using the QR algorithm.
    Time: O(n^3 * max_iter), Space: O(n^2)
    """
    n = len(A)
    Ak = [row[:] for row in A]
    for _ in range(max_iter):
        Q, R = qr_decomposition(Ak)
        # A_{k+1} = R * Q
        Ak = matrix_multiply_simple(R, Q, n)
        # Check convergence: off-diagonal elements should be near zero
        off_diag = sum(abs(Ak[i][j]) for i in range(n)
                       for j in range(n) if i != j)
        if off_diag < tol:
            break
    return [Ak[i][i] for i in range(n)]
def matrix_multiply_simple(A, B, n):
    """Matrix multiply for square n x n matrices."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            for j in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
# Test
A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]]
eigenvalues = qr_eigenvalues(A)
eigenvalues.sort(reverse=True)
print([f"{e:.4f}" for e in eigenvalues])
# Eigenvalues: [4.7321, 3.0000, 1.2679] (exactly 3 + sqrt(3), 3, 3 - sqrt(3))
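For this matrix the answer is also available in closed form, which makes a precise check possible: the characteristic polynomial is lambda^3 - 9*lambda^2 + 24*lambda - 18 = (lambda - 3)(lambda^2 - 6*lambda + 6), so the eigenvalues are exactly 3 + sqrt(3), 3, and 3 - sqrt(3). A NumPy cross-check (NumPy assumed, for verification only):

```python
import numpy as np

# Cross-check against NumPy's symmetric eigensolver.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
vals = np.sort(np.linalg.eigvalsh(A))[::-1]  # descending order

exact = [3 + 3 ** 0.5, 3.0, 3 - 3 ** 0.5]
print([f"{v:.4f}" for v in vals])  # ['4.7321', '3.0000', '1.2679']
print(np.allclose(vals, exact))    # True
```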
Problem 3: SVD Computation
Problem: Compute the Singular Value Decomposition A = U * S * V^T for any m x n matrix. The singular values are the square roots of the eigenvalues of A^T * A.
def svd(A):
    """
    Compute the singular values of A (the S in A = U * S * V^T).
    A: m x n matrix
    Returns: list of singular values, sorted descending
    (simplified: U and V are outlined in comments below but not built).
    Time: O(m*n^2 + n^3), Space: O(m*n)
    """
    m, n = len(A), len(A[0])
    # Step 1: Compute A^T * A (n x n symmetric matrix)
    At = [[A[j][i] for j in range(m)] for i in range(n)]
    AtA = matrix_multiply_general(At, A)
    # Step 2: Find eigenvalues of A^T * A; its eigenvectors give V
    # and its eigenvalues are sigma^2
    eigenvalues_AtA = qr_eigenvalues(AtA)
    # Step 3: Singular values = sqrt of eigenvalues (sorted descending).
    # Handle numerical noise: tiny negatives should be treated as zero.
    singular_values = []
    for ev in sorted(eigenvalues_AtA, reverse=True):
        if ev > 1e-10:
            singular_values.append(math.sqrt(ev))
        else:
            singular_values.append(0.0)
    # Step 4 (not implemented): V's columns are eigenvectors of A^T * A
    # Step 5 (not implemented): U's columns are u_i = (1/sigma_i) * A * v_i
    return singular_values
def matrix_multiply_general(A, B):
    """General matrix multiplication: (m x n) times (n x p)."""
    m, n = len(A), len(A[0])
    p = len(B[0])
    assert len(B) == n, "inner dimensions must match"
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for k in range(n):
            for j in range(p):
                C[i][j] += A[i][k] * B[k][j]
    return C
# Test
A = [[3, 2, 2], [2, 3, -2]]
singular_values = svd(A)
print([f"{s:.4f}" for s in singular_values])
# Approximately [5.0, 3.0, 0.0] for this specific matrix
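A NumPy cross-check confirms both the values and the sqrt-of-eigenvalues relation (NumPy is assumed here, used only for verification):

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0], [2.0, 3.0, -2.0]])

# Route 1: library SVD (singular values come back in descending order)
s = np.linalg.svd(A, compute_uv=False)

# Route 2: square roots of the eigenvalues of A^T A
evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
s_from_eigs = np.sqrt(np.clip(evals, 0.0, None))  # clip numerical noise

print(np.round(s, 4))            # [5. 3.]
print(np.round(s_from_eigs, 4))  # [5. 3. 0.]
```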
Problem 4: Low-Rank Approximation
Problem: Given a matrix A and a target rank k, compute the best rank-k approximation using truncated SVD. This is the Eckart-Young theorem: the best rank-k approximation in Frobenius norm is obtained by keeping only the top k singular values.
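The theorem is easy to verify numerically: if you truncate an SVD after k terms, the Frobenius error equals the square root of the sum of the discarded squared singular values. A NumPy demonstration (the random matrix is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Best rank-k approximation: keep only the top k singular triplets
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(A - A_k, "fro")
predicted = np.sqrt(np.sum(s[k:] ** 2))  # Eckart-Young: discarded energy
print(np.isclose(err, predicted))  # True
print(np.linalg.matrix_rank(A_k))  # 2
```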
def low_rank_approximation(A, k):
    """
    Compute the ingredients of the best rank-k approximation of A.
    Uses the truncated-SVD idea: keep only the top k singular components.
    This from-scratch version finds the top-k eigenpairs of A^T * A by
    power iteration with deflation; reconstruction is sketched below.
    Time: O(m*n^2 + k*n^2 per iteration), Space: O(n^2)
    """
    m, n = len(A), len(A[0])
    # Step 1: Compute A^T * A
    At = [[A[j][i] for j in range(m)] for i in range(n)]
    AtA = matrix_multiply_general(At, A)
    # Step 2: Find top-k eigenpairs using repeated power iteration
    # with deflation (subtract out each eigenpair once found)
    eigenpairs = []
    M = [row[:] for row in AtA]
    for _ in range(min(k, n)):
        eigenvalue, eigenvector = power_iteration(M)
        eigenpairs.append((eigenvalue, eigenvector))
        # Deflate: M = M - lambda * v * v^T
        for i in range(n):
            for j in range(n):
                M[i][j] -= eigenvalue * eigenvector[i] * eigenvector[j]
    # Step 3: Reconstruct using the top-k components:
    # A_k = sum_{i=1}^{k} sigma_i * u_i * v_i^T, with sigma_i = sqrt(lambda_i)
    print(f"Top {k} singular values: ", end="")
    print([f"{math.sqrt(max(ev, 0)):.4f}" for ev, _ in eigenpairs])
    return eigenpairs  # eigenvalue-eigenvector pairs of A^T * A
# Compression ratio example
print("Rank-2 approximation of a 100x100 matrix:")
print("Original storage: 100 * 100 = 10,000 values")
print("Rank-2 storage: 2 * (100 + 100 + 1) = 402 values")
print("Compression ratio: 24.9x")
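The same arithmetic for an arbitrary shape, wrapped in a small helper (the function name is my own, not part of the lesson's API):

```python
def rank_k_compression_ratio(m, n, k):
    """Storage of a full m x n matrix divided by rank-k SVD storage.

    The rank-k factors need k left vectors (m values each), k right
    vectors (n values each), and k singular values.
    """
    return (m * n) / (k * (m + n + 1))

print(f"{rank_k_compression_ratio(100, 100, 2):.1f}x")  # 24.9x
```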
Problem 5: PCA from Scratch
Problem: Implement Principal Component Analysis (PCA) to reduce the dimensionality of a dataset. PCA finds the directions of maximum variance in the data by computing the eigenvectors of the covariance matrix.
def pca(X, n_components):
    """
    Principal Component Analysis from scratch.
    X: m x n data matrix (m samples, n features)
    n_components: number of principal components to keep
    Returns: transformed data (m x n_components), components,
             explained variance ratios
    Time: O(m*n^2 + n_components*n^2 per iteration), Space: O(m*n)
    """
    m, n = len(X), len(X[0])
    # Step 1: Center the data (subtract the mean of each feature)
    means = [sum(X[i][j] for i in range(m)) / m for j in range(n)]
    X_centered = [[X[i][j] - means[j] for j in range(n)] for i in range(m)]
    # Step 2: Compute the covariance matrix (n x n)
    # Cov = (1/(m-1)) * X_centered^T * X_centered
    Xt = [[X_centered[i][j] for i in range(m)] for j in range(n)]
    cov = matrix_multiply_general(Xt, X_centered)
    for i in range(n):
        for j in range(n):
            cov[i][j] /= (m - 1)
    # Step 3: Find the top-k eigenvectors of the covariance matrix
    eigenpairs = []
    M = [row[:] for row in cov]
    for _ in range(n_components):
        eigenvalue, eigenvector = power_iteration(M)
        eigenpairs.append((eigenvalue, eigenvector))
        # Deflate: M = M - lambda * v * v^T
        for i in range(n):
            for j in range(n):
                M[i][j] -= eigenvalue * eigenvector[i] * eigenvector[j]
    # Step 4: Project the data onto the principal components
    # components is a k x n matrix (each row is an eigenvector)
    components = [vec for _, vec in eigenpairs]
    eigenvalues = [ev for ev, _ in eigenpairs]
    total_var = sum(cov[i][i] for i in range(n))
    # Transform: X_transformed = X_centered * components^T
    X_transformed = []
    for i in range(m):
        row = []
        for k in range(n_components):
            val = sum(X_centered[i][j] * components[k][j] for j in range(n))
            row.append(val)
        X_transformed.append(row)
    # Explained variance ratio: eigenvalue / total variance
    explained_var = [ev / total_var for ev in eigenvalues]
    return X_transformed, components, explained_var
# Test: 2D data that lies mostly along one direction
X = [
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
    [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
    [1.5, 1.6], [1.1, 0.9],
]
X_pca, components, explained = pca(X, n_components=1)
print(f"Explained variance: {explained[0]:.4f}") # ~0.96
print(f"First PC direction: [{components[0][0]:.4f}, {components[0][1]:.4f}]")
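As with the other problems, a NumPy cross-check is reassuring: diagonalizing the covariance matrix directly should reproduce the explained-variance ratio and the first principal direction (up to sign). NumPy is assumed here, for verification only:

```python
import numpy as np

# Cross-check PCA via a direct eigendecomposition of the covariance matrix.
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
    [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
    [1.5, 1.6], [1.1, 0.9],
])
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)

vals, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
explained = vals[-1] / vals.sum()
print(f"{explained:.4f}")  # ~0.96, matching the from-scratch run

pc1 = vecs[:, -1]
print(pc1 * np.sign(pc1[0]))  # first PC direction, sign-normalized
```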
Key Takeaways
- Power iteration finds the dominant eigenvalue in O(n^2) per iteration. It is simple but only gives one eigenvalue.
- The QR algorithm finds all eigenvalues by repeatedly decomposing into Q and R and forming RQ.
- SVD decomposes any matrix as A = U * S * V^T. Singular values are square roots of eigenvalues of A^T * A.
- Low-rank approximation keeps only the top k singular values. This is the optimal rank-k approximation (Eckart-Young theorem).
- PCA projects data onto the eigenvectors of the covariance matrix with the largest eigenvalues, preserving maximum variance.
Lilly Tech Systems