Beginner

Heap Fundamentals

A heap is a complete binary tree stored as an array that satisfies the heap property: every parent is smaller (min-heap) or larger (max-heap) than its children. Heaps power priority queues, enabling O(log n) insertion and O(1) access to the min/max element — essential for top-K selection, beam search decoding, and priority scheduling in ML systems.

What Is a Heap?

A binary heap is a complete binary tree where each node satisfies the heap property:

Min-heap: Every parent ≤ its children. The root is the minimum element.
Max-heap: Every parent ≥ its children. The root is the maximum element.

Because the tree is complete, we store it compactly in an array. For a node at index i:

Left child: 2*i + 1
Right child: 2*i + 2
Parent: (i - 1) // 2

Python's heapq Module

Python's heapq module implements a min-heap. It operates on a regular list and provides these core operations:

import heapq

# Create a heap from a list - O(n)
nums = [5, 3, 8, 1, 2, 7]
heapq.heapify(nums)
print(nums)  # [1, 2, 7, 5, 3, 8] - min-heap order

# Push an element - O(log n)
heapq.heappush(nums, 0)
print(nums[0])  # 0 - new minimum

# Pop the smallest element - O(log n)
smallest = heapq.heappop(nums)
print(smallest)  # 0

# Push and pop in one operation - O(log n)
result = heapq.heappushpop(nums, 4)  # push 4, then pop smallest

# Pop and push in one operation - O(log n)
result = heapq.heapreplace(nums, 6)  # pop smallest, then push 6

# Get n largest/smallest - O(n log k)
top3 = heapq.nlargest(3, nums)
bottom3 = heapq.nsmallest(3, nums)

Simulating a Max-Heap in Python

Since Python only provides a min-heap, we negate values to simulate a max-heap:

import heapq

# Max-heap by negating values
max_heap = []
for val in [5, 3, 8, 1, 2]:
    heapq.heappush(max_heap, -val)

# Pop the largest element
largest = -heapq.heappop(max_heap)
print(largest)  # 8

# Peek at the largest
print(-max_heap[0])  # 5

Heap with Custom Objects

Use tuples (priority, tiebreaker, value) for custom ordering. The tiebreaker prevents comparison errors when priorities are equal:

import heapq

# Priority queue with (priority, counter, task)
counter = 0
pq = []

tasks = [("Write code", 3), ("Fix bug", 1), ("Review PR", 2)]
for task, priority in tasks:
    heapq.heappush(pq, (priority, counter, task))
    counter += 1

while pq:
    priority, _, task = heapq.heappop(pq)
    print(f"Priority {priority}: {task}")
# Priority 1: Fix bug
# Priority 2: Review PR
# Priority 3: Write code

Complexity Summary

Operation	Time	Notes
`heapify()`	O(n)	Build heap from unsorted list
`heappush()`	O(log n)	Insert element maintaining heap
`heappop()`	O(log n)	Remove and return smallest
`heap[0]`	O(1)	Peek at smallest
`nlargest(k, heap)`	O(n log k)	Better than sorting when k << n
`nsmallest(k, heap)`	O(n log k)	Better than sorting when k << n

When to Use a Heap

Signal in Problem	Heap Pattern	Example
"K largest" or "K smallest"	Top-K with min/max heap	Kth largest element
"Merge K sorted"	Min-heap of K elements	Merge K sorted lists
"Median" or "middle element"	Two heaps (max + min)	Find median from stream
"Schedule" or "interval"	Min-heap on end times	Meeting rooms II
"Next greater/smaller"	Lazy deletion heap	Sliding window maximum

ML Applications of Heaps

🔍

Beam Search Decoding

In NLP sequence-to-sequence models (translation, summarization), beam search uses a heap to track the top-K most probable partial sequences at each decoding step. A min-heap of size K prunes low-probability beams efficiently.

📊

Top-K Recommendations

Recommendation engines score millions of items and need the top-K results. A min-heap of size K processes scores in O(n log K) — far faster than sorting all n items when K << n.

⚙

Priority Scheduling

ML training job schedulers use priority queues to manage GPU allocation. High-priority fine-tuning jobs preempt lower-priority hyperparameter sweeps, with heaps enabling O(log n) scheduling decisions.

📈

A* and Dijkstra's Algorithm

Graph-based ML (knowledge graphs, GNNs) often uses shortest-path algorithms. Both A* and Dijkstra rely on a min-heap to always expand the lowest-cost node first.

✅

Pro Tip: If a problem asks for "K largest" elements, use a min-heap of size K (not a max-heap). As you process elements, push each one and pop when the heap exceeds size K. The heap always contains the K largest seen so far, with the Kth largest at the root.

← Previous Course Overview Next → Top-K Elements