Beginner

Heap Fundamentals

A heap is a complete binary tree stored as an array that satisfies the heap property: every parent is smaller (min-heap) or larger (max-heap) than its children. Heaps power priority queues, enabling O(log n) insertion and O(1) access to the min/max element — essential for top-K selection, beam search decoding, and priority scheduling in ML systems.

What Is a Heap?

A binary heap is a complete binary tree where each node satisfies the heap property:

  • Min-heap: Every parent ≤ its children. The root is the minimum element.
  • Max-heap: Every parent ≥ its children. The root is the maximum element.

Because the tree is complete, we store it compactly in an array. For a node at index i:

  • Left child: 2*i + 1
  • Right child: 2*i + 2
  • Parent: (i - 1) // 2

Python's heapq Module

Python's heapq module implements a min-heap. It operates on a regular list and provides these core operations:

import heapq

# Create a heap from a list - O(n)
nums = [5, 3, 8, 1, 2, 7]
heapq.heapify(nums)
print(nums)  # [1, 2, 7, 5, 3, 8] - min-heap order

# Push an element - O(log n)
heapq.heappush(nums, 0)
print(nums[0])  # 0 - new minimum

# Pop the smallest element - O(log n)
smallest = heapq.heappop(nums)
print(smallest)  # 0

# Push and pop in one operation - O(log n)
result = heapq.heappushpop(nums, 4)  # push 4, then pop smallest

# Pop and push in one operation - O(log n)
result = heapq.heapreplace(nums, 6)  # pop smallest, then push 6

# Get n largest/smallest - O(n log k)
top3 = heapq.nlargest(3, nums)
bottom3 = heapq.nsmallest(3, nums)

Simulating a Max-Heap in Python

Since Python only provides a min-heap, we negate values to simulate a max-heap:

import heapq

# Max-heap by negating values
max_heap = []
for val in [5, 3, 8, 1, 2]:
    heapq.heappush(max_heap, -val)

# Pop the largest element
largest = -heapq.heappop(max_heap)
print(largest)  # 8

# Peek at the largest
print(-max_heap[0])  # 5

Heap with Custom Objects

Use tuples (priority, tiebreaker, value) for custom ordering. The tiebreaker prevents comparison errors when priorities are equal:

import heapq

# Priority queue with (priority, counter, task)
counter = 0
pq = []

tasks = [("Write code", 3), ("Fix bug", 1), ("Review PR", 2)]
for task, priority in tasks:
    heapq.heappush(pq, (priority, counter, task))
    counter += 1

while pq:
    priority, _, task = heapq.heappop(pq)
    print(f"Priority {priority}: {task}")
# Priority 1: Fix bug
# Priority 2: Review PR
# Priority 3: Write code

Complexity Summary

OperationTimeNotes
heapify()O(n)Build heap from unsorted list
heappush()O(log n)Insert element maintaining heap
heappop()O(log n)Remove and return smallest
heap[0]O(1)Peek at smallest
nlargest(k, heap)O(n log k)Better than sorting when k << n
nsmallest(k, heap)O(n log k)Better than sorting when k << n

When to Use a Heap

Signal in ProblemHeap PatternExample
"K largest" or "K smallest"Top-K with min/max heapKth largest element
"Merge K sorted"Min-heap of K elementsMerge K sorted lists
"Median" or "middle element"Two heaps (max + min)Find median from stream
"Schedule" or "interval"Min-heap on end timesMeeting rooms II
"Next greater/smaller"Lazy deletion heapSliding window maximum

ML Applications of Heaps

🔍

Beam Search Decoding

In NLP sequence-to-sequence models (translation, summarization), beam search uses a heap to track the top-K most probable partial sequences at each decoding step. A min-heap of size K prunes low-probability beams efficiently.

📊

Top-K Recommendations

Recommendation engines score millions of items and need the top-K results. A min-heap of size K processes scores in O(n log K) — far faster than sorting all n items when K << n.

Priority Scheduling

ML training job schedulers use priority queues to manage GPU allocation. High-priority fine-tuning jobs preempt lower-priority hyperparameter sweeps, with heaps enabling O(log n) scheduling decisions.

📈

A* and Dijkstra's Algorithm

Graph-based ML (knowledge graphs, GNNs) often uses shortest-path algorithms. Both A* and Dijkstra rely on a min-heap to always expand the lowest-cost node first.

Pro Tip: If a problem asks for "K largest" elements, use a min-heap of size K (not a max-heap). As you process elements, push each one and pop when the heap exceeds size K. The heap always contains the K largest seen so far, with the Kth largest at the root.