Intermediate

Top-K Elements

The Top-K pattern is the most common heap pattern in interviews. The core idea: maintain a heap of size K to efficiently track the K largest or smallest elements from a stream or array. This is the exact mechanism behind ML beam search (top-K beams) and recommendation engines (top-K items).

Problem 1: Kth Largest Element in an Array

📝
Problem: Given an integer array nums and an integer k, return the kth largest element. Note that it is the kth largest element in sorted order, not the kth distinct element.

Example: nums = [3,2,1,5,6,4], k = 2 → Output: 5
import heapq

def findKthLargest(nums: list[int], k: int) -> int:
    """Find the kth largest element using a min-heap of size k.

    Strategy: Maintain a min-heap of size k. After processing all
    elements, the root of the heap is the kth largest.

    Time: O(n log k) - each push/pop is O(log k), done n times
    Space: O(k) - heap stores at most k elements

    ML Application: In beam search, this finds the k-th best
    candidate sequence at each decoding step.
    """
    min_heap = []
    for num in nums:
        heapq.heappush(min_heap, num)
        if len(min_heap) > k:
            heapq.heappop(min_heap)  # remove smallest
    return min_heap[0]  # kth largest is the smallest in our k-size heap

# Test
print(findKthLargest([3, 2, 1, 5, 6, 4], 2))  # 5
print(findKthLargest([3, 2, 3, 1, 2, 4, 5, 5, 6], 4))  # 4

Problem 2: Top K Frequent Elements

📝
Problem: Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Example: nums = [1,1,1,2,2,3], k = 2 → Output: [1, 2]
import heapq
from collections import Counter

def topKFrequent(nums: list[int], k: int) -> list[int]:
    """Find k most frequent elements using a min-heap of size k.

    Strategy:
    1. Count frequencies with Counter
    2. Use a min-heap of size k keyed by frequency
    3. Elements remaining in heap are the top-k most frequent

    Time: O(n + m log k) where m = unique elements
    Space: O(m + k) for counter and heap

    ML Application: Finding top-K most common tokens in a corpus
    for vocabulary construction in NLP models.
    """
    freq = Counter(nums)
    min_heap = []

    for num, count in freq.items():
        heapq.heappush(min_heap, (count, num))
        if len(min_heap) > k:
            heapq.heappop(min_heap)

    return [num for count, num in min_heap]

# Test
print(topKFrequent([1, 1, 1, 2, 2, 3], 2))  # [2, 1]
print(topKFrequent([1], 1))  # [1]

Problem 3: K Closest Points to Origin

📝
Problem: Given an array of points where points[i] = [x, y] represents a point on the X-Y plane and an integer k, return the k closest points to the origin (0, 0).

Example: points = [[1,3],[-2,2]], k = 1 → Output: [[-2,2]]
import heapq

def kClosest(points: list[list[int]], k: int) -> list[list[int]]:
    """Find k closest points to origin using a max-heap of size k.

    Strategy: Use a max-heap (negate distances) of size k.
    Points with larger distances get evicted, leaving the k closest.

    We skip the sqrt since we only need relative ordering.

    Time: O(n log k)
    Space: O(k)

    ML Application: K-nearest neighbors (KNN) classification
    uses this exact pattern to find the K closest training points.
    """
    max_heap = []  # max-heap via negation

    for x, y in points:
        dist = -(x * x + y * y)  # negate for max-heap
        heapq.heappush(max_heap, (dist, x, y))
        if len(max_heap) > k:
            heapq.heappop(max_heap)  # remove farthest point

    return [[x, y] for _, x, y in max_heap]

# Test
print(kClosest([[1, 3], [-2, 2]], 1))  # [[-2, 2]]
print(kClosest([[3, 3], [5, -1], [-2, 4]], 2))  # [[3,3],[-2,4]]

Problem 4: Sort Characters by Frequency

📝
Problem: Given a string s, sort it in decreasing order based on the frequency of the characters. If multiple answers are possible, return any of them.

Example: s = "tree" → Output: "eert"
import heapq
from collections import Counter

def frequencySort(s: str) -> str:
    """Sort characters by frequency using a max-heap.

    Strategy:
    1. Count character frequencies
    2. Push all (neg_freq, char) into a max-heap
    3. Pop all and build result string

    Time: O(n log k) where k = unique characters
    Space: O(n) for the result string

    ML Application: Feature importance ranking - sorting
    features by their importance scores in tree-based models.
    """
    freq = Counter(s)
    # Max-heap: negate frequency for descending order
    max_heap = [(-count, char) for char, count in freq.items()]
    heapq.heapify(max_heap)

    result = []
    while max_heap:
        neg_count, char = heapq.heappop(max_heap)
        result.append(char * (-neg_count))

    return ''.join(result)

# Test
print(frequencySort("tree"))    # "eert" or "eetr"
print(frequencySort("cccaaa"))  # "aaaccc" or "cccaaa"
print(frequencySort("Aabb"))    # "bbAa" or "bbaA"

Problem 5: Reorganize String

📝
Problem: Given a string s, rearrange the characters so that no two adjacent characters are the same. If not possible, return an empty string.

Example: s = "aab" → Output: "aba"
import heapq
from collections import Counter

def reorganizeString(s: str) -> str:
    """Reorganize string so no two adjacent chars are the same.

    Strategy:
    1. Count frequencies; if any char > (n+1)//2, impossible
    2. Use a max-heap to always place the most frequent char
    3. Alternate: place top char, hold it aside, place next top

    Time: O(n log k) where k = unique characters
    Space: O(n)

    ML Application: Task scheduling in ML pipelines where
    identical GPU-bound tasks should not run back-to-back
    to avoid memory thrashing.
    """
    freq = Counter(s)
    max_heap = [(-count, char) for char, count in freq.items()]
    heapq.heapify(max_heap)

    result = []
    prev_count, prev_char = 0, ''

    while max_heap:
        neg_count, char = heapq.heappop(max_heap)
        # Re-add previous character if it still has remaining count
        if prev_count < 0:
            heapq.heappush(max_heap, (prev_count, prev_char))

        result.append(char)
        prev_count = neg_count + 1  # used one occurrence
        prev_char = char

    result_str = ''.join(result)
    return result_str if len(result_str) == len(s) else ""

# Test
print(reorganizeString("aab"))    # "aba"
print(reorganizeString("aaab"))   # "" (impossible)
print(reorganizeString("aabb"))   # "abab" or "baba"
Top-K Pattern Summary: For "K largest", use a min-heap of size K. For "K smallest", use a max-heap of size K. This gives O(n log K) time and O(K) space — optimal when K << n. In beam search, K is the beam width (typically 3-10), making this extremely efficient even over large vocabularies.