Non-Comparison Sorts: Counting, Radix, Bucket
Comparison-based sorting has a proven lower bound of Ω(n log n) comparisons. But when your data has special properties — bounded integers, fixed-length keys, or uniform distributions — non-comparison sorts achieve O(n) time. These algorithms are critical for feature binning, histogram construction, and high-throughput data processing in ML.
1. Counting Sort
Counting sort works by counting the occurrences of each value and using cumulative counts to place elements directly. It runs in O(n + k) time where k is the range of input values.
| Property | Value |
|---|---|
| Time | O(n + k) where k = range of values |
| Space | O(n + k) |
| Stable? | Yes (when implemented correctly) |
| Constraint | Only works for non-negative integers (or values mappable to non-negative integers) |
```python
def counting_sort(arr: list, max_val: int | None = None) -> list:
    """Stable counting sort for non-negative integers.

    Time: O(n + k) where k = max value
    Space: O(n + k)

    How it works:
      1. Count occurrences of each value
      2. Compute cumulative counts (prefix sum)
      3. Place elements in output using cumulative counts
    """
    if not arr:
        return []
    if max_val is None:
        max_val = max(arr)

    # Step 1: Count occurrences
    count = [0] * (max_val + 1)
    for val in arr:
        count[val] += 1

    # Step 2: Cumulative count (prefix sum)
    # count[i] now = number of elements <= i
    for i in range(1, len(count)):
        count[i] += count[i - 1]

    # Step 3: Build output array (iterate backwards for stability)
    output = [0] * len(arr)
    for i in range(len(arr) - 1, -1, -1):
        val = arr[i]
        count[val] -= 1
        output[count[val]] = val
    return output

# Example: sort user ratings (0-5 scale)
ratings = [3, 1, 4, 1, 5, 2, 3, 5, 3, 2, 1, 4]
sorted_ratings = counting_sort(ratings, max_val=5)
print(sorted_ratings)  # [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
```
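The constraint table above notes that counting sort also works for values that can be mapped to non-negative integers. A minimal sketch of that mapping for signed integers (`counting_sort_signed` is this section's illustration, not a standard API; it rebuilds values from counts, so it trades away stability, which is fine for plain ints):

```python
def counting_sort_signed(arr: list) -> list:
    """Counting sort extended to signed integers by shifting every
    value up by -min(arr) so count indices are non-negative."""
    if not arr:
        return []
    offset = min(arr)        # may be negative
    k = max(arr) - offset    # shifted range
    count = [0] * (k + 1)
    for val in arr:
        count[val - offset] += 1
    out = []
    for shifted, c in enumerate(count):  # emit values in ascending order
        out.extend([shifted + offset] * c)
    return out

print(counting_sort_signed([3, -2, 0, -5, 3]))  # [-5, -2, 0, 3, 3]
```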
ML Application: Feature Binning with Counting Sort
Counting sort is perfect for discretizing continuous features into bins. When you convert float features to integer bin indices, counting sort processes them in O(n) instead of O(n log n).
```python
def bin_features(values: list, num_bins: int) -> dict:
    """Bin continuous features into equal-width bins via the counting
    step of counting sort.

    ML use case: Feature discretization for gradient boosting trees
    (XGBoost, LightGBM use histogram-based binning internally).

    Time: O(n + num_bins)
    Space: O(n + num_bins)
    """
    if not values:
        return {}
    min_val, max_val = min(values), max(values)
    bin_width = (max_val - min_val) / num_bins if max_val != min_val else 1

    # Map each value to a bin index
    bin_indices = []
    for v in values:
        idx = min(int((v - min_val) / bin_width), num_bins - 1)
        bin_indices.append(idx)

    # Counting step of counting sort over the bin indices
    count = [0] * num_bins
    for idx in bin_indices:
        count[idx] += 1

    # Build histogram
    histogram = {}
    for i in range(num_bins):
        bin_start = min_val + i * bin_width
        bin_end = bin_start + bin_width
        histogram[f"[{bin_start:.2f}, {bin_end:.2f})"] = count[i]
    return histogram

# Example: bin model confidence scores
scores = [0.12, 0.45, 0.67, 0.89, 0.23, 0.91, 0.34, 0.78, 0.56, 0.95]
hist = bin_features(scores, num_bins=5)
for bin_range, count in hist.items():
    print(f"  {bin_range}: {count}")
# Output:
#   [0.12, 0.29): 2
#   [0.29, 0.45): 2
#   [0.45, 0.62): 1
#   [0.62, 0.78): 2
#   [0.78, 0.95): 3
```
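The binning function stops after the counting step. The placement step of counting sort can reuse the same counts to emit the original values grouped by bin, stably, still in O(n + num_bins). A sketch, where `sort_by_bin` is a name invented for this example:

```python
def sort_by_bin(values: list, num_bins: int) -> list:
    """Full counting sort over bin indices: returns the values grouped
    by bin, with original order preserved within each bin (stable)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins if hi != lo else 1
    idxs = [min(int((v - lo) / width), num_bins - 1) for v in values]

    # Prefix sums give each bin its starting position in the output
    count = [0] * (num_bins + 1)
    for i in idxs:
        count[i + 1] += 1
    for b in range(1, num_bins + 1):
        count[b] += count[b - 1]

    # Stable placement: scan forward, advancing each bin's write cursor
    out = [None] * len(values)
    pos = count[:num_bins]
    for v, i in zip(values, idxs):
        out[pos[i]] = v
        pos[i] += 1
    return out

scores = [0.12, 0.45, 0.67, 0.89, 0.23, 0.91, 0.34, 0.78, 0.56, 0.95]
print(sort_by_bin(scores, num_bins=5))
```

Note the result is only bin-sorted, not fully sorted: 0.45 precedes 0.34 because both fall in the same bin and stability keeps their input order.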
2. Radix Sort
Radix sort processes digits from least significant to most significant, using a stable sort (usually counting sort) as a subroutine for each digit position. It runs in O(d(n + k)) where d is the number of digits and k is the base.
| Property | Value |
|---|---|
| Time | O(d(n + k)) where d = digits, k = base (10 for decimal) |
| Space | O(n + k) |
| Stable? | Yes |
| Constraint | Works for integers or fixed-length strings |
```python
def radix_sort(arr: list) -> list:
    """LSD Radix Sort for non-negative integers.

    Time: O(d * (n + 10)) where d = number of digits
    Space: O(n + 10)

    Processes digits from least significant to most significant,
    using counting sort as a subroutine for each digit.
    """
    if not arr:
        return []
    max_val = max(arr)
    result = arr.copy()
    exp = 1  # Current digit position (1s, 10s, 100s, ...)
    while max_val // exp > 0:
        result = counting_sort_by_digit(result, exp)
        exp *= 10
    return result

def counting_sort_by_digit(arr: list, exp: int) -> list:
    """Stable counting sort on a specific digit position.

    exp=1 sorts by ones digit, exp=10 by tens, etc.
    """
    n = len(arr)
    output = [0] * n
    count = [0] * 10  # Digits 0-9

    # Count occurrences of each digit
    for val in arr:
        digit = (val // exp) % 10
        count[digit] += 1

    # Cumulative count
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Build output (backwards for stability)
    for i in range(n - 1, -1, -1):
        digit = (arr[i] // exp) % 10
        count[digit] -= 1
        output[count[digit]] = arr[i]
    return output

# Example: sort user IDs
user_ids = [170, 45, 75, 90, 802, 24, 2, 66]
sorted_ids = radix_sort(user_ids)
print(sorted_ids)  # [2, 24, 45, 66, 75, 90, 170, 802]
```
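Base 10 is convenient for exposition, but production radix sorts usually process one byte (base 256) at a time, so fewer passes are needed for large keys. A sketch of that variant (`radix_sort_base256` is an illustrative name, not a library function):

```python
def radix_sort_base256(arr: list) -> list:
    """LSD radix sort one byte (base 256) at a time: four passes
    cover any 32-bit non-negative key, versus ten for base 10."""
    if not arr:
        return []
    result = list(arr)
    shift = 0
    while (max(arr) >> shift) > 0:
        # Stable counting sort on the current byte
        count = [0] * 256
        for val in result:
            count[(val >> shift) & 0xFF] += 1
        for i in range(1, 256):
            count[i] += count[i - 1]
        output = [0] * len(result)
        for val in reversed(result):  # backwards for stability
            byte = (val >> shift) & 0xFF
            count[byte] -= 1
            output[count[byte]] = val
        result = output
        shift += 8
    return result

print(radix_sort_base256([170, 45, 75, 90, 802, 24, 2, 66]))  # [2, 24, 45, 66, 75, 90, 170, 802]
```

The IDs above fit in two bytes, so this takes two passes instead of the three that base-10 radix sort needs.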
ML Application: Sorting Retrieval Results by Quantized Score
```python
def sort_retrieval_results(doc_ids: list, scores: list) -> list:
    """Sort retrieval results by quantized score using radix sort.

    In retrieval systems, scores are often quantized to integers
    (e.g., BM25 scores * 1000). Radix sort handles millions of
    results faster than comparison sort.

    Time: O(d * n) where d = digits in max score
    Space: O(n)
    """
    # Pair (quantized_score, doc_id) so each pass carries the doc_id along
    n = len(doc_ids)
    paired = list(zip(scores, doc_ids))

    # LSD radix sort on the quantized scores
    max_score = max(scores) if scores else 0
    exp = 1
    while max_score // exp > 0:
        # Counting sort by current digit
        output = [None] * n
        count = [0] * 10
        for score, doc_id in paired:
            digit = (score // exp) % 10
            count[digit] += 1
        for i in range(1, 10):
            count[i] += count[i - 1]
        for i in range(n - 1, -1, -1):
            digit = (paired[i][0] // exp) % 10
            count[digit] -= 1
            output[count[digit]] = paired[i]
        paired = output
        exp *= 10

    # Return (doc_id, score) pairs sorted by score (ascending)
    return [(doc_id, score) for score, doc_id in paired]

# Example: BM25 retrieval results
doc_ids = [101, 205, 303, 412, 509, 678]
bm25_scores = [2340, 1876, 3102, 987, 2340, 1543]
results = sort_retrieval_results(doc_ids, bm25_scores)
print("Sorted by BM25 score:")
for doc_id, score in results:
    print(f"  Doc {doc_id}: {score}")
```
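The docstring above assumes scores arrive already quantized. A minimal fixed-point quantizer along the lines it describes (`quantize_scores` and the `scale=1000` default are assumptions of this sketch, not part of any library):

```python
def quantize_scores(scores: list, scale: int = 1000) -> list:
    """Fixed-point quantization: float scores become radix-sortable ints.
    Precision finer than 1/scale is discarded, which is usually acceptable
    for ranking, where ties that close are effectively equal."""
    return [int(round(s * scale)) for s in scores]

print(quantize_scores([2.3404, 1.876, 0.9871]))  # [2340, 1876, 987]
```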
3. Bucket Sort
Bucket sort distributes elements into buckets based on their value range, sorts each bucket individually, then concatenates. It achieves O(n) average time when elements are uniformly distributed.
| Property | Value |
|---|---|
| Average Time | O(n + k) when data is uniformly distributed |
| Worst Time | O(n²) when all elements land in one bucket |
| Space | O(n + k) where k = number of buckets |
| Stable? | Yes (if inner sort is stable) |
```python
def bucket_sort(arr: list, num_buckets: int = 10) -> list:
    """Bucket sort for floating-point numbers in [0, 1).

    Time: O(n) average when data is uniformly distributed
    Space: O(n + num_buckets)

    Perfect for sorting probability scores, normalized features,
    and other float data in [0, 1) range.
    """
    if not arr:
        return []

    # Create empty buckets
    buckets = [[] for _ in range(num_buckets)]

    # Distribute elements into buckets
    for val in arr:
        bucket_idx = min(int(val * num_buckets), num_buckets - 1)
        buckets[bucket_idx].append(val)

    # Sort each bucket; Timsort is near-linear on the small,
    # nearly-sorted buckets uniform data produces
    for bucket in buckets:
        bucket.sort()

    # Concatenate all buckets
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result

# Example: sort model prediction probabilities
probabilities = [0.78, 0.12, 0.95, 0.34, 0.67, 0.89, 0.23, 0.56, 0.91, 0.45]
sorted_probs = bucket_sort(probabilities)
print(sorted_probs)
# [0.12, 0.23, 0.34, 0.45, 0.56, 0.67, 0.78, 0.89, 0.91, 0.95]
```
Generalized Bucket Sort for Any Range
```python
def bucket_sort_general(arr: list, num_buckets: int = 10) -> list:
    """Generalized bucket sort for any numeric range.

    Maps values to [0, 1) range then applies bucket sort.

    Time: O(n) average
    Space: O(n + num_buckets)
    """
    if len(arr) <= 1:
        return arr.copy()
    min_val = min(arr)
    max_val = max(arr)
    if min_val == max_val:
        return arr.copy()
    range_val = max_val - min_val

    buckets = [[] for _ in range(num_buckets)]
    for val in arr:
        # Normalize to [0, 1) then map to bucket
        normalized = (val - min_val) / range_val
        bucket_idx = min(int(normalized * num_buckets), num_buckets - 1)
        buckets[bucket_idx].append(val)

    result = []
    for bucket in buckets:
        bucket.sort()
        result.extend(bucket)
    return result

# ML Application: Sort feature values for percentile computation
feature_values = [23.5, 67.8, 12.1, 89.3, 45.6, 34.2, 78.9, 56.7, 91.0, 15.4]
sorted_features = bucket_sort_general(feature_values)
n = len(sorted_features)
percentiles = {
    "p25": sorted_features[n // 4],
    "p50": sorted_features[n // 2],
    "p75": sorted_features[3 * n // 4],
    "p90": sorted_features[int(0.9 * n)],
}
print(f"Feature percentiles: {percentiles}")
```
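The index-based percentiles above are a nearest-rank approximation. When interpolated cut points are needed, the standard library's `statistics.quantiles` computes them directly (shown here as a cross-check, not part of the pipeline above):

```python
import statistics

feature_values = [23.5, 67.8, 12.1, 89.3, 45.6, 34.2, 78.9, 56.7, 91.0, 15.4]
# Three interpolated cut points splitting the data into quartiles;
# method="inclusive" treats the data as the whole population
q1, q2, q3 = statistics.quantiles(feature_values, n=4, method="inclusive")
print(f"p25={q1:.3f}, p50={q2:.3f}, p75={q3:.3f}")
```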
When to Use Non-Comparison Sorts
| Algorithm | Use When | ML Application |
|---|---|---|
| Counting Sort | Small integer range (k = O(n)), need stability | Sorting categorical features, label distribution, rating histograms |
| Radix Sort | Fixed-length integers, strings, or keys | Sorting embedding indices, IP addresses, document IDs at scale |
| Bucket Sort | Uniformly distributed floats or continuous data | Sorting probability scores, normalized features, percentile computation |
- Non-comparison sorts break the O(n log n) barrier by exploiting data structure (integers, bounded range, uniform distribution)
- Counting sort is ideal when k (value range) is O(n) — directly applicable to categorical features and small integer data
- Radix sort handles large integers by processing one digit at a time — used in database index sorting and log processing
- Bucket sort is perfect for uniformly distributed floats — exactly the distribution of probability scores from sigmoid/softmax outputs
- XGBoost and LightGBM use histogram-based binning internally, which is essentially counting sort on discretized features
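One practical habit: any of the sorts above can be sanity-checked against Python's built-in `sorted` on random data. A minimal check for counting sort (`counting_sort_simple` is a condensed, unstable variant written just for this test):

```python
import random

def counting_sort_simple(arr: list, k: int) -> list:
    """Condensed counting sort over the range [0, k); unstable,
    which is fine when sorting bare integers."""
    count = [0] * k
    for v in arr:
        count[v] += 1
    out = []
    for v, c in enumerate(count):
        out.extend([v] * c)
    return out

data = [random.randrange(100) for _ in range(10_000)]
assert counting_sort_simple(data, 100) == sorted(data)
print("counting sort agrees with sorted() on 10,000 random ints")
```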
Lilly Tech Systems