Intermediate

Probability Questions

12 essential probability interview questions and model answers covering conditional probability, Bayes' theorem, combinatorics, expected value, and classic probability puzzles.

Q1: What is conditional probability? Give a practical example.

💡
Model Answer: Conditional probability is the probability of event A occurring given that event B has already occurred, written as P(A|B) = P(A ∩ B) / P(B). It captures how knowledge of one event changes the likelihood of another. Practical example: in an e-commerce platform, the probability of a user purchasing (A) given they added an item to cart (B) is much higher than the unconditional purchase probability. If P(purchase AND cart) = 0.03 and P(cart) = 0.10, then P(purchase|cart) = 0.30, or 30%. This is fundamentally different from the overall conversion rate, which might be only 3%. Conditional probability is the basis for recommendation engines, spam filters, and any system that updates predictions based on observed behavior.
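The cart example above can be sketched in a few lines (the 0.03 and 0.10 figures are the hypothetical numbers from the answer, not real data):

```python
def conditional_probability(p_joint, p_given):
    """P(A|B) = P(A and B) / P(B)."""
    if p_given == 0:
        raise ValueError("P(B) must be positive")
    return p_joint / p_given

p_purchase_and_cart = 0.03   # P(purchase AND added to cart), illustrative
p_cart = 0.10                # P(added to cart), illustrative
print(conditional_probability(p_purchase_and_cart, p_cart))  # ≈ 0.30
```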

Q2: State and explain Bayes' Theorem. When do you use it?

💡
Model Answer: Bayes' Theorem states: P(A|B) = P(B|A) × P(A) / P(B). It lets you reverse conditional probabilities — given how likely we are to observe evidence B if hypothesis A is true, we can calculate how likely A is given we observed B. The components are: P(A) is the prior (initial belief about A), P(B|A) is the likelihood (probability of observing the evidence given A is true), and P(A|B) is the posterior (updated belief after seeing B). Classic use case: medical testing. If a disease affects 1% of the population (prior = 0.01) and a test has 95% sensitivity (P(positive|disease) = 0.95) and 90% specificity (P(negative|no disease) = 0.90), then a positive test result only gives P(disease|positive) = (0.95 × 0.01) / (0.95 × 0.01 + 0.10 × 0.99) = 8.76%. This counterintuitive result demonstrates why base rates matter enormously.
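The medical-testing computation above is easy to verify with a small sketch (using the same hypothetical 1% prevalence, 95% sensitivity, and 90% specificity):

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity          # false positive rate
    # Denominator from the Law of Total Probability:
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_pos

print(round(posterior(0.01, 0.95, 0.90), 4))  # ≈ 0.0876, i.e. 8.76%
```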

Q3: You flip a fair coin 10 times. What is the probability of getting exactly 7 heads?

💡
Model Answer: This is a binomial probability problem. The number of heads in 10 fair coin flips follows a Binomial(n=10, p=0.5) distribution. P(X = 7) = C(10,7) × (0.5)^7 × (0.5)^3 = C(10,7) × (0.5)^10. C(10,7) = 10! / (7! × 3!) = 120. So P(X = 7) = 120 / 1024 = 0.1172, or approximately 11.7%. The key insight is recognizing this as a binomial setting: fixed number of independent trials, each with the same probability of success. In interviews, explicitly state the distribution you are using and why it applies — this shows structured thinking.
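A direct implementation of the binomial PMF confirms the arithmetic:

```python
from math import comb

def binomial_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(comb(10, 7))              # 120 ways to place 7 heads in 10 flips
print(binomial_pmf(10, 7, 0.5)) # 120/1024 = 0.1171875
```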

Q4: What is expected value and how is it used in data science?

💡
Model Answer: Expected value (E[X]) is the long-run average of a random variable over many repetitions. For a discrete random variable: E[X] = ∑ x × P(X = x). It represents the "center of mass" of a probability distribution. In data science, expected value is used everywhere: (1) Revenue forecasting — E[revenue] = P(purchase) × average order value. (2) A/B test analysis — comparing expected outcomes between treatment and control. (3) Decision analysis — choosing the action with the highest expected payoff. (4) Risk assessment — expected loss = P(fraud) × transaction amount. Important properties: E[aX + b] = aE[X] + b (linearity), and E[X + Y] = E[X] + E[Y] regardless of whether X and Y are independent. Linearity of expectation is an extremely powerful tool for solving probability puzzles.
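A minimal sketch of the discrete formula, using a fair die to illustrate both E[X] and the linearity property E[aX + b] = aE[X] + b:

```python
def expected_value(dist):
    """E[X] = sum of x * P(X = x) for a discrete distribution {x: prob}."""
    return sum(x * p for x, p in dist.items())

die = {face: 1/6 for face in range(1, 7)}
print(expected_value(die))                            # ≈ 3.5

# Linearity: E[2X + 1] = 2 * E[X] + 1 = 8
shifted = {2 * face + 1: 1/6 for face in range(1, 7)}
print(expected_value(shifted))                        # ≈ 8.0
```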

Q5: What is the difference between permutations and combinations?

💡
Model Answer: Permutations count the number of ways to arrange items where order matters. P(n, k) = n! / (n-k)!. Example: how many ways can you assign gold, silver, and bronze medals to 10 athletes? P(10, 3) = 720. Combinations count the number of ways to choose items where order does not matter. C(n, k) = n! / (k! × (n-k)!). Example: how many ways can you choose 3 athletes for a team from 10? C(10, 3) = 120. The relationship: C(n, k) = P(n, k) / k!, because each combination corresponds to k! permutations. In data science, combinations appear in feature selection (how many ways to choose 5 features from 50) and in calculating binomial probabilities, while permutations appear in ranking problems and sequence analysis.
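Both counts, and the k! relationship between them, can be checked with the standard library (`math.perm` and `math.comb` are available from Python 3.8):

```python
from math import comb, perm, factorial

print(perm(10, 3))   # 720  -- medal assignments, order matters
print(comb(10, 3))   # 120  -- team selections, order ignored

# Each combination of 3 athletes can be ordered in 3! = 6 ways:
assert comb(10, 3) == perm(10, 3) // factorial(3)
```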

Q6: (Puzzle) You have two envelopes, each containing money. One has twice the amount of the other. You pick one and see $100. Should you switch?

💡
Model Answer: This is the classic Two Envelopes Paradox. The naive analysis says: if your envelope has $100, the other envelope has either $50 or $200 with equal probability, so the expected value of switching is (50 + 200)/2 = $125, which seems to favor switching. But this reasoning is flawed because it applies symmetrically to both envelopes — you would always want to switch, which is contradictory. The resolution: you cannot simultaneously assign equal probability to the other envelope being $50 and $200 without specifying a prior distribution over the amounts. With a proper Bayesian analysis using a prior over the smaller amount, the expected value of switching equals the expected value of staying. The key interview insight: this puzzle tests whether you can identify flawed probabilistic reasoning and think carefully about assumptions, not just compute expected values mechanically.
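A Monte Carlo sanity check makes the resolution concrete. Under the assumption that the pair of amounts is fixed (here, $100 and $200, chosen purely for illustration) and you pick an envelope uniformly at random, staying and switching have the same expected value:

```python
import random

random.seed(0)
trials = 100_000
stay_total = switch_total = 0.0
for _ in range(trials):
    envelopes = [100, 200]         # fixed pair: smaller amount = $100
    pick = random.randrange(2)     # choose one envelope uniformly
    stay_total += envelopes[pick]
    switch_total += envelopes[1 - pick]

print(stay_total / trials, switch_total / trials)  # both ≈ 150
```

The naive "$125" argument never survives a simulation, because a simulation forces you to commit to an actual distribution over the amounts.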

Q7: What is the Birthday Problem? How is it relevant to data science?

💡
Model Answer: The Birthday Problem asks: how many people do you need in a room before there is a greater than 50% chance that two people share a birthday? The surprisingly small answer is 23. The calculation works by computing the complement: P(all different) = (365/365) × (364/365) × (363/365) × ... × ((365-n+1)/365). At n=23, this product drops below 0.5. The key insight is that we are comparing pairs, and C(23, 2) = 253 pairs, each with a 1/365 chance of matching. In data science, this principle is relevant to: (1) hash collisions — when assigning users to experiment groups or generating unique IDs, collision probability is higher than intuition suggests, (2) duplicate detection in large datasets, (3) A/B test bucketing — understanding why overlap between experiments occurs more often than expected.
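The complement calculation above translates directly into a short loop, confirming that the probability first crosses 50% at n = 23:

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), via the complement."""
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (days - i) / days
    return 1 - p_all_different

print(p_shared_birthday(22))  # ≈ 0.476
print(p_shared_birthday(23))  # ≈ 0.507
```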

Q8: (Puzzle) Three doors, one has a prize. You pick door 1. The host opens door 3 (no prize). Should you switch to door 2?

💡
Model Answer: This is the Monty Hall Problem, and yes, you should always switch. Switching wins 2/3 of the time, while staying wins only 1/3. Here is why: initially, your door has a 1/3 chance and the other two doors collectively have a 2/3 chance. When the host (who knows where the prize is) opens a losing door, that 2/3 probability concentrates on the remaining unopened door. The host's action provides information because it is constrained — the host never opens the door with the prize. Using Bayes' theorem: P(prize behind door 2 | host opens door 3) = [P(host opens 3 | prize behind 2) × P(prize behind 2)] / P(host opens 3) = [1 × 1/3] / [1/2] = 2/3. This problem tests your ability to update probabilities given new information, which is exactly what Bayesian inference does in data science.
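If an interviewer pushes back on the 2/3 figure, a quick simulation settles it. The sketch below encodes the host's constraint (never open the picked door, never open the prize door):

```python
import random

random.seed(1)
trials = 100_000
stay_wins = switch_wins = 0
for _ in range(trials):
    prize = random.randrange(3)
    pick = random.randrange(3)
    # Host opens a door that is neither your pick nor the prize.
    # (When pick == prize the host has two options; which one he opens
    # does not affect the win rates, so we take the first.)
    host = next(d for d in range(3) if d != pick and d != prize)
    switch = next(d for d in range(3) if d != pick and d != host)
    stay_wins += (pick == prize)
    switch_wins += (switch == prize)

print(stay_wins / trials, switch_wins / trials)  # ≈ 0.333 vs ≈ 0.667
```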

Q9: What is the Law of Total Probability?

💡
Model Answer: The Law of Total Probability states that if B₁, B₂, ..., Bₙ form a complete partition of the sample space (mutually exclusive and collectively exhaustive), then for any event A: P(A) = ∑ P(A|Bᵢ) × P(Bᵢ). It lets you compute the overall probability of an event by breaking it down into conditional probabilities across mutually exclusive scenarios. Example: to calculate overall conversion rate P(purchase), you can break it down by traffic source: P(purchase) = P(purchase|organic) × P(organic) + P(purchase|paid) × P(paid) + P(purchase|social) × P(social). This is useful because conversion rates differ dramatically by channel. The Law of Total Probability is also a key step in deriving Bayes' theorem (it provides the denominator) and is used extensively in decision trees and probabilistic models.
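The conversion-rate decomposition is just a weighted sum over the partition. The per-channel rates and traffic shares below are hypothetical numbers for illustration:

```python
# (P(purchase | channel), P(channel)) -- shares must sum to 1 (a partition)
channels = {
    "organic": (0.05, 0.50),
    "paid":    (0.08, 0.30),
    "social":  (0.02, 0.20),
}

assert abs(sum(share for _, share in channels.values()) - 1.0) < 1e-9

p_purchase = sum(conv * share for conv, share in channels.values())
print(p_purchase)  # ≈ 0.053, the overall conversion rate
```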

Q10: You roll two fair dice. What is the probability their sum is 7?

💡
Model Answer: There are 6 × 6 = 36 equally likely outcomes when rolling two dice. The combinations that sum to 7 are: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) — that is 6 favorable outcomes. So P(sum = 7) = 6/36 = 1/6 ≈ 16.7%. Note that 7 is the most likely sum (followed by 6 and 8 with 5 favorable outcomes each). The distribution of sums is triangular, peaking at 7. In an interview, you can also mention that the expected sum is E[X+Y] = E[X] + E[Y] = 3.5 + 3.5 = 7, which is consistent with 7 being the mode. This demonstrates linearity of expectation, one of the most useful properties in probability.
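Enumerating the 36-outcome sample space verifies both the 1/6 answer and the triangular shape of the distribution of sums:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))     # all 36 (die1, die2) pairs
favorable = [o for o in outcomes if sum(o) == 7]

print(Fraction(len(favorable), len(outcomes)))      # 1/6

# Counts per sum peak at 7: sums 6 and 8 each have 5 favorable outcomes.
counts = {s: sum(1 for o in outcomes if sum(o) == s) for s in range(2, 13)}
print(counts[6], counts[7], counts[8])              # 5 6 5
```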

Q11: What is independence vs conditional independence?

💡
Model Answer: Two events A and B are independent if P(A ∩ B) = P(A) × P(B), meaning knowing one event tells you nothing about the other. Two events A and B are conditionally independent given C if P(A ∩ B | C) = P(A|C) × P(B|C), meaning they become independent once you know C. Importantly, independence does NOT imply conditional independence, and vice versa. Example of independence breaking under conditioning: two fair dice are independent, but conditioned on their sum (a common effect), they become dependent — if the sum is 7 and the first die shows 1, the second must show 6. Example of the reverse: exam scores of two students might be dependent (both reflect course difficulty), yet conditionally independent given each student's individual ability. This concept is fundamental to Naive Bayes classifiers (which assume feature independence given the class label) and to Bayesian networks.
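One direction can be checked exhaustively with two fair dice: the dice are marginally independent, but conditioning on their sum (a common effect) destroys independence:

```python
from itertools import product
from fractions import Fraction

# Joint distribution of two fair dice: 36 equally likely outcomes.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

# Marginal independence: P(X=6 and Y=6) == P(X=6) * P(Y=6)
assert prob(lambda o: o == (6, 6)) == prob(lambda o: o[0] == 6) * prob(lambda o: o[1] == 6)

# Condition on the sum being 7: only 6 outcomes remain.
given = [o for o in space if sum(o) == 7]
def cprob(event):
    return Fraction(sum(1 for o in given if event(o)), len(given))

lhs = cprob(lambda o: o == (1, 6))                            # 1/6
rhs = cprob(lambda o: o[0] == 1) * cprob(lambda o: o[1] == 6) # 1/36
print(lhs, rhs)  # unequal, so X and Y are NOT independent given the sum
```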

Q12: (Puzzle) You randomly draw cards from a standard 52-card deck. What is the expected number of cards you need to draw to get an ace?

💡
Model Answer: Think of it this way: there are 4 aces and 48 non-aces. By symmetry, the 4 aces divide the deck into 5 "gaps" (before the first ace, between aces, and after the last ace), and by the symmetry of random arrangements, each gap has an expected size of 48/5 = 9.6 non-aces. The expected position of the first ace is therefore 9.6 + 1 = 10.6, meaning you expect to draw about 10.6 cards. More formally: the expected position of the first ace out of 4 in a deck of 52 is (52 + 1) / (4 + 1) = 53/5 = 10.6. This uses the general formula: if there are k special items in n total items, the expected position of the first special item is (n+1)/(k+1). This type of symmetry argument appears frequently in probability interviews and is much faster than summing over all possible positions.
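The (n+1)/(k+1) formula is easy to confirm by simulation — shuffle the deck repeatedly and average the 1-based position of the first ace:

```python
import random

random.seed(2)
deck = ["ace"] * 4 + ["other"] * 48
trials = 200_000
total = 0
for _ in range(trials):
    random.shuffle(deck)
    total += deck.index("ace") + 1   # 1-based position of the first ace

print(total / trials)  # ≈ 10.6 = (52 + 1) / (4 + 1)
```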
Pro Tip: For probability puzzles in interviews, always start by identifying the sample space and whether events are independent. State your approach before computing. If a problem seems hard, look for symmetry or try computing the complement — these shortcuts are what interviewers want to see.