Intermediate
Top-K Elements
The Top-K pattern is the most common heap pattern in interviews. The core idea: maintain a heap of size K to efficiently track the K largest or smallest elements from a stream or array. This is the exact mechanism behind ML beam search (top-K beams) and recommendation engines (top-K items).
Problem 1: Kth Largest Element in an Array
Problem: Given an integer array
Example:
nums and an integer k, return the kth largest element. Note that it is the kth largest element in sorted order, not the kth distinct element.Example:
nums = [3,2,1,5,6,4], k = 2 → Output: 5import heapq
def findKthLargest(nums: list[int], k: int) -> int:
"""Find the kth largest element using a min-heap of size k.
Strategy: Maintain a min-heap of size k. After processing all
elements, the root of the heap is the kth largest.
Time: O(n log k) - each push/pop is O(log k), done n times
Space: O(k) - heap stores at most k elements
ML Application: In beam search, this finds the k-th best
candidate sequence at each decoding step.
"""
min_heap = []
for num in nums:
heapq.heappush(min_heap, num)
if len(min_heap) > k:
heapq.heappop(min_heap) # remove smallest
return min_heap[0] # kth largest is the smallest in our k-size heap
# Test
print(findKthLargest([3, 2, 1, 5, 6, 4], 2)) # 5
print(findKthLargest([3, 2, 3, 1, 2, 4, 5, 5, 6], 4)) # 4
Problem 2: Top K Frequent Elements
Problem: Given an integer array
Example:
nums and an integer k, return the k most frequent elements. You may return the answer in any order.Example:
nums = [1,1,1,2,2,3], k = 2 → Output: [1, 2]import heapq
from collections import Counter
def topKFrequent(nums: list[int], k: int) -> list[int]:
"""Find k most frequent elements using a min-heap of size k.
Strategy:
1. Count frequencies with Counter
2. Use a min-heap of size k keyed by frequency
3. Elements remaining in heap are the top-k most frequent
Time: O(n + m log k) where m = unique elements
Space: O(m + k) for counter and heap
ML Application: Finding top-K most common tokens in a corpus
for vocabulary construction in NLP models.
"""
freq = Counter(nums)
min_heap = []
for num, count in freq.items():
heapq.heappush(min_heap, (count, num))
if len(min_heap) > k:
heapq.heappop(min_heap)
return [num for count, num in min_heap]
# Test
print(topKFrequent([1, 1, 1, 2, 2, 3], 2)) # [2, 1]
print(topKFrequent([1], 1)) # [1]
Problem 3: K Closest Points to Origin
Problem: Given an array of
Example:
points where points[i] = [x, y] represents a point on the X-Y plane and an integer k, return the k closest points to the origin (0, 0).Example:
points = [[1,3],[-2,2]], k = 1 → Output: [[-2,2]]import heapq
def kClosest(points: list[list[int]], k: int) -> list[list[int]]:
"""Find k closest points to origin using a max-heap of size k.
Strategy: Use a max-heap (negate distances) of size k.
Points with larger distances get evicted, leaving the k closest.
We skip the sqrt since we only need relative ordering.
Time: O(n log k)
Space: O(k)
ML Application: K-nearest neighbors (KNN) classification
uses this exact pattern to find the K closest training points.
"""
max_heap = [] # max-heap via negation
for x, y in points:
dist = -(x * x + y * y) # negate for max-heap
heapq.heappush(max_heap, (dist, x, y))
if len(max_heap) > k:
heapq.heappop(max_heap) # remove farthest point
return [[x, y] for _, x, y in max_heap]
# Test
print(kClosest([[1, 3], [-2, 2]], 1)) # [[-2, 2]]
print(kClosest([[3, 3], [5, -1], [-2, 4]], 2)) # [[3,3],[-2,4]]
Problem 4: Sort Characters by Frequency
Problem: Given a string
Example:
s, sort it in decreasing order based on the frequency of the characters. If multiple answers are possible, return any of them.Example:
s = "tree" → Output: "eert"import heapq
from collections import Counter
def frequencySort(s: str) -> str:
"""Sort characters by frequency using a max-heap.
Strategy:
1. Count character frequencies
2. Push all (neg_freq, char) into a max-heap
3. Pop all and build result string
Time: O(n log k) where k = unique characters
Space: O(n) for the result string
ML Application: Feature importance ranking - sorting
features by their importance scores in tree-based models.
"""
freq = Counter(s)
# Max-heap: negate frequency for descending order
max_heap = [(-count, char) for char, count in freq.items()]
heapq.heapify(max_heap)
result = []
while max_heap:
neg_count, char = heapq.heappop(max_heap)
result.append(char * (-neg_count))
return ''.join(result)
# Test
print(frequencySort("tree")) # "eert" or "eetr"
print(frequencySort("cccaaa")) # "aaaccc" or "cccaaa"
print(frequencySort("Aabb")) # "bbAa" or "bbaA"
Problem 5: Reorganize String
Problem: Given a string
Example:
s, rearrange the characters so that no two adjacent characters are the same. If not possible, return an empty string.Example:
s = "aab" → Output: "aba"import heapq
from collections import Counter
def reorganizeString(s: str) -> str:
"""Reorganize string so no two adjacent chars are the same.
Strategy:
1. Count frequencies; if any char > (n+1)//2, impossible
2. Use a max-heap to always place the most frequent char
3. Alternate: place top char, hold it aside, place next top
Time: O(n log k) where k = unique characters
Space: O(n)
ML Application: Task scheduling in ML pipelines where
identical GPU-bound tasks should not run back-to-back
to avoid memory thrashing.
"""
freq = Counter(s)
max_heap = [(-count, char) for char, count in freq.items()]
heapq.heapify(max_heap)
result = []
prev_count, prev_char = 0, ''
while max_heap:
neg_count, char = heapq.heappop(max_heap)
# Re-add previous character if it still has remaining count
if prev_count < 0:
heapq.heappush(max_heap, (prev_count, prev_char))
result.append(char)
prev_count = neg_count + 1 # used one occurrence
prev_char = char
result_str = ''.join(result)
return result_str if len(result_str) == len(s) else ""
# Test
print(reorganizeString("aab")) # "aba"
print(reorganizeString("aaab")) # "" (impossible)
print(reorganizeString("aabb")) # "abab" or "baba"
Top-K Pattern Summary: For "K largest", use a min-heap of size K. For "K smallest", use a max-heap of size K. This gives O(n log K) time and O(K) space — optimal when K << n. In beam search, K is the beam width (typically 3-10), making this extremely efficient even over large vocabularies.
Lilly Tech Systems