PROBABILITY Comprehensive Reference

Sample Spaces • Axioms • Counting • Conditional Prob • Bayes • Independence • Random Variables • Distributions • Joint Distributions • Covariance • MGF • LLN • CLT • Inequalities
1. Sample Spaces & Events
Sample Space $S$: All possible outcomes
Event $A \subseteq S$: Subset of outcomes
Elementary event: Single outcome
Set Operations
Union: $A \cup B$ (A or B)
Intersection: $A \cap B$ (A and B)
Complement: $A^c$ (not A)
De Morgan: $(A \cup B)^c = A^c \cap B^c$
Disjoint: $A \cap B = \emptyset$
Partition
$A_1, A_2, \ldots$ disjoint, $\bigcup A_i = S$
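The set operations and De Morgan's law above can be checked directly with Python's built-in sets; a small sketch where the sample space and events are illustrative choices:

```python
# Verify the set identities on a one-die sample space.
S = set(range(1, 7))          # sample space: one die roll
A = {2, 4, 6}                 # event "even"
B = {4, 5, 6}                 # event "at least 4"

union = A | B                 # A ∪ B
intersection = A & B          # A ∩ B
A_c = S - A                   # complement A^c

# De Morgan: (A ∪ B)^c = A^c ∩ B^c
assert S - (A | B) == (S - A) & (S - B)
# A and A^c are disjoint and together partition S
assert A & A_c == set() and A | A_c == S
```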
2. Axioms of Probability
Kolmogorov Axioms
$0 \le P(A) \le 1$ for all $A$
$P(S) = 1$
$P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$
for pairwise disjoint $A_i$ (countable additivity)
Consequences
$P(\emptyset) = 0$
$P(A^c) = 1 - P(A)$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
If $A \subseteq B$ then $P(A) \le P(B)$
$P(A) = \sum_i P(A \cap B_i)$ (partition)
3. Counting Principles
Permutations
$P(n,k) = \frac{n!}{(n-k)!}$
Order matters, no replacement
Combinations
$C(n,k) = \binom{n}{k} = \frac{n!}{k!(n-k)!}$
Order doesn't matter
$\binom{n}{k} = \binom{n}{n-k}$
Multinomial
$\binom{n}{k_1,k_2,\ldots,k_r} = \frac{n!}{k_1! k_2! \cdots k_r!}$
Partition $n$ into $r$ groups
Multiplication Rule
$n_1 \times n_2 \times \cdots \times n_k$ outcomes
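These counting formulas map directly onto Python's standard library (`math.perm` and `math.comb`, available since Python 3.8); the `multinomial` helper below is a hypothetical convenience function, not part of the stdlib:

```python
from math import perm, comb, factorial, prod

n, k = 10, 3
assert perm(n, k) == factorial(n) // factorial(n - k)   # P(10,3) = 720
assert comb(n, k) == perm(n, k) // factorial(k)         # C(10,3) = 120
assert comb(n, k) == comb(n, n - k)                     # symmetry

def multinomial(n, ks):
    """n! / (k_1! k_2! ... k_r!); assumes sum(ks) == n."""
    assert sum(ks) == n
    return factorial(n) // prod(factorial(k) for k in ks)

# Partition 10 items into groups of sizes 5, 3, 2
assert multinomial(10, [5, 3, 2]) == 2520
```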
4. Conditional Probability
Definition
$P(A|B) = \frac{P(A \cap B)}{P(B)}$ if $P(B) > 0$
Multiplication Rule
$P(A \cap B) = P(A|B)P(B)$
$P(A_1 \cap \cdots \cap A_n)$
$= P(A_1)P(A_2|A_1)\cdots P(A_n|A_1 \cap \cdots \cap A_{n-1})$
Law of Total Prob
$P(A) = \sum_i P(A|B_i)P(B_i)$
where $B_i$ form partition of $S$
Properties
$P(\cdot|B)$ is itself a probability measure
$P(A|A) = 1$
$P(A^c|B) = 1 - P(A|B)$
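The definition and the law of total probability can be checked by brute-force enumeration over two fair dice; a sketch with illustrative events, using exact `Fraction` arithmetic:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] + w[1] == 8        # sum is 8
B = lambda w: w[0] == 6               # first die shows 6

# P(A|B) = P(A ∩ B) / P(B): only (6,2) among the six (6,*) pairs
p_A_given_B = prob(lambda w: A(w) and B(w)) / prob(B)
assert p_A_given_B == Fraction(1, 6)

# Law of total probability over the partition {first die = i}
total = sum(prob(lambda w, i=i: A(w) and w[0] == i) for i in range(1, 7))
assert total == prob(A) == Fraction(5, 36)
```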
5. Bayes' Theorem
Basic Form
$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
Prior: $P(A)$
Likelihood: $P(B|A)$
Posterior: $P(A|B)$
Extended Form
$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_j P(B|A_j)P(A_j)}$
Applications
Medical testing
Spam filtering
Classification
Bayesian inference
Normalization: $\sum P(A_i|B) = 1$
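A worked medical-testing example of Bayes' theorem; the prevalence, sensitivity, and specificity below are made-up illustrative numbers:

```python
from fractions import Fraction

# Illustrative (made-up) numbers for a diagnostic test
p_disease = Fraction(1, 100)       # prior P(D)
sensitivity = Fraction(95, 100)    # likelihood P(+|D)
specificity = Fraction(90, 100)    # P(-|D^c)

# Evidence P(+) via the law of total probability
p_pos = sensitivity * p_disease + (1 - specificity) * (1 - p_disease)
# Bayes: posterior P(D|+)
posterior = sensitivity * p_disease / p_pos

assert p_pos == Fraction(217, 2000)
# Despite 95% sensitivity, the posterior is below 9% (base-rate effect)
assert posterior == Fraction(19, 217)   # ≈ 0.0876
```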
6. Independence
Definition
$P(A \cap B) = P(A)P(B)$
or $P(A|B) = P(A)$ if $P(B) > 0$
Multiple Events
Mutually independent: for every finite subcollection,
$P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k})$
Pairwise independent: $P(A_i \cap A_j) = P(A_i)P(A_j)$ for all $i \ne j$
Pairwise independence is strictly weaker than mutual independence
Properties
$A, B$ indep $\Rightarrow$ $A, B^c$ indep
$A, B$ disjoint & $P(A), P(B) > 0$ $\Rightarrow$ not indep
Independence $\ne$ disjointness
Conditional Indep
$P(A \cap B|C) = P(A|C)P(B|C)$
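The classic counterexample showing pairwise independence without mutual independence, sketched in Python: two fair coins with $C$ = "the coins match":

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))   # 4 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] == "H"      # first coin heads
B = lambda w: w[1] == "H"      # second coin heads
C = lambda w: w[0] == w[1]     # coins match

# Every pair factors ...
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(lambda w: X(w) and Y(w)) == prob(X) * prob(Y)

# ... but the triple does not: P(A ∩ B ∩ C) = 1/4 ≠ 1/8
assert prob(lambda w: A(w) and B(w) and C(w)) != prob(A) * prob(B) * prob(C)
```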
7. Random Variables
Definition
Function $X: S \to \mathbb{R}$
$\{X \le x\}$ is an event for every $x$
Discrete RV
Countable range
PMF: $p(x) = P(X = x)$
$\sum_x p(x) = 1$
Continuous RV
Uncountable range
PDF: $f(x) \ge 0$
$\int_{-\infty}^{\infty} f(x)dx = 1$
Mixed RV
Both discrete & continuous parts
8. Probability Mass Function
Definition
$p(x) = P(X = x)$
$p(x) \ge 0$ for all $x$
$\sum_x p(x) = 1$
Properties
$P(a \le X \le b) = \sum_{a \le x \le b} p(x)$
$P(X \in A) = \sum_{x \in A} p(x)$
Support
$\text{supp}(X) = \{x: p(x) > 0\}$
Graph: vertical bars at discrete values
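A concrete PMF built by enumeration, the sum of two fair dice, checking the normalization and interval-probability properties above:

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

assert sum(pmf.values()) == 1                  # Σ_x p(x) = 1
assert pmf[7] == Fraction(6, 36)               # mode of the sum
# P(4 ≤ X ≤ 6) = Σ_{4 ≤ x ≤ 6} p(x) = (3 + 4 + 5)/36
assert sum(pmf[x] for x in range(4, 7)) == Fraction(12, 36)
```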
9. Probability Density Function
Definition
$f(x) \ge 0$ for all $x$
$\int_{-\infty}^{\infty} f(x)dx = 1$
Properties
$P(a < X < b) = \int_a^b f(x)dx$
$P(X = x) = 0$ (point probability)
$P(X \le x) = P(X < x)$
PDF vs CDF
$f(x) = \frac{d}{dx}F(x)$
Area under curve = probability
10. Cumulative Distribution Function
Definition
$F(x) = P(X \le x)$
Properties
$F$ is non-decreasing
$\lim_{x \to -\infty} F(x) = 0$
$\lim_{x \to \infty} F(x) = 1$
$F$ is right-continuous
Using CDF
$P(X > x) = 1 - F(x)$
$P(a < X \le b) = F(b) - F(a)$
$P(X = x) = F(x) - F(x^-)$
Inverse CDF
Quantile: $F^{-1}(p) = \inf\{x: F(x) \ge p\}$
Median: $F^{-1}(0.5)$
11. Expected Value & Moments
Discrete
$E[X] = \sum_x x \cdot p(x)$
Continuous
$E[X] = \int_{-\infty}^{\infty} x f(x)dx$
Linearity of Expectation
$E[aX + b] = aE[X] + b$
$E[X + Y] = E[X] + E[Y]$
Higher Moments
$E[X^k] = \sum_x x^k p(x)$ (discrete)
$E[X^k] = \int x^k f(x)dx$ (continuous)
Law of the Unconscious Statistician
$E[g(X)] = \sum_x g(x)p(x)$ (discrete)
$E[g(X)] = \int g(x)f(x)dx$ (continuous)
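Expectation, LOTUS, and linearity for a fair die, computed directly from the PMF (exact arithmetic via `Fraction`):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

E_X = sum(x * p for x, p in pmf.items())
assert E_X == Fraction(7, 2)

# LOTUS with g(x) = x²: no need for the distribution of X² itself
E_X2 = sum(x**2 * p for x, p in pmf.items())
assert E_X2 == Fraction(91, 6)

# Linearity: E[aX + b] = aE[X] + b
a, b = 3, 2
assert sum((a * x + b) * p for x, p in pmf.items()) == a * E_X + b
```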
12. Variance & Standard Deviation
Definition
$\text{Var}(X) = E[(X - \mu)^2]$
$= E[X^2] - (E[X])^2$
Standard Deviation
$\sigma = \sqrt{\text{Var}(X)}$
Properties
$\text{Var}(aX + b) = a^2\text{Var}(X)$
$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
$+ 2\text{Cov}(X,Y)$
If $X, Y$ indep: $\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)$
Coefficient of Variation
$CV = \frac{\sigma}{\mu}$ (relative variability; requires $\mu \ne 0$)
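A sketch checking both variance formulas and the scaling rule on a fair die:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die
mu = sum(x * p for x, p in pmf.items())          # 7/2

var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mu**2
assert var_def == var_shortcut == Fraction(35, 12)

# Var(aX + b) = a² Var(X): the shift b drops out, the scale enters squared
a, b = 3, 2
mu_ab = a * mu + b
var_ab = sum((a * x + b - mu_ab) ** 2 * p for x, p in pmf.items())
assert var_ab == a**2 * var_def
```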
13. Common Discrete Distributions
Bernoulli $(p)$
$P(X=k) = p^k(1-p)^{1-k}$, $k \in \{0,1\}$
$E[X] = p$, $\text{Var}(X) = p(1-p)$
Binomial $B(n,p)$
$P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$
$E[X] = np$, $\text{Var}(X) = np(1-p)$
Poisson $\text{Po}(\lambda)$
$P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}$
$E[X] = \lambda$, $\text{Var}(X) = \lambda$
Limit of $B(n,p)$ for rare events: large $n$, small $p$, $np \approx \lambda$
Geometric $\text{Geom}(p)$
$P(X=k) = (1-p)^{k-1}p$, $k \ge 1$
$E[X] = 1/p$, $\text{Var}(X) = (1-p)/p^2$
Trials until first success
13. Discrete Dists (cont'd)
Negative Binomial $NB(r,p)$
$P(X=k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}$, $k \ge r$
$E[X] = r/p$, $\text{Var}(X) = r(1-p)/p^2$
Trials until $r$-th success
Hypergeometric
$P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
Sampling $n$ without replacement from $N$ items, $K$ of which are successes
$E[X] = nK/N$
Uniform Discrete
$P(X=k) = 1/n$, $k=1,\ldots,n$
$E[X] = (n+1)/2$, $\text{Var}(X) = (n^2-1)/12$
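The binomial formulas above can be verified exactly: build the PMF from $\binom{n}{k}p^k(1-p)^{n-k}$ and check $E[X] = np$, $\text{Var}(X) = np(1-p)$ with rational arithmetic (the values of $n$ and $p$ are illustrative):

```python
from fractions import Fraction
from math import comb

# Binomial B(n, p) built from its PMF formula
n, p = 10, Fraction(3, 10)
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

assert sum(pmf.values()) == 1
mean = sum(k * q for k, q in pmf.items())
var = sum(k**2 * q for k, q in pmf.items()) - mean**2
assert mean == n * p               # np = 3
assert var == n * p * (1 - p)      # np(1-p) = 21/10
```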
14. Common Continuous Distributions
Uniform $U(a,b)$
$f(x) = \frac{1}{b-a}$ on $[a,b]$
$E[X] = (a+b)/2$, $\text{Var}(X) = (b-a)^2/12$
Exponential $\text{Exp}(\lambda)$
$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
$E[X] = 1/\lambda$, $\text{Var}(X) = 1/\lambda^2$
Memoryless: $P(X > s+t|X > s) = P(X > t)$
Normal $N(\mu, \sigma^2)$
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$
$E[X] = \mu$, $\text{Var}(X) = \sigma^2$
Standard Normal: $Z = (X-\mu)/\sigma \sim N(0,1)$
14. Continuous Dists (cont'd)
Gamma $\Gamma(\alpha, \beta)$
$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$
$E[X] = \alpha/\beta$, $\text{Var}(X) = \alpha/\beta^2$
Beta $B(\alpha, \beta)$
$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$
$E[X] = \alpha/(\alpha+\beta)$
$x \in [0,1]$, flexible shape
Lognormal
$\ln(X) \sim N(\mu, \sigma^2)$
$E[X] = e^{\mu+\sigma^2/2}$
Right-skewed, positive support
Chi-square $\chi^2_k$
Sum of $k$ squared standard normals
$E[X] = k$, $\text{Var}(X) = 2k$
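The memoryless property of the exponential can be verified exactly through its survival function $P(X > t) = e^{-\lambda t}$, and its mean $1/\lambda$ recovered numerically (λ and the grid step are illustrative choices):

```python
from math import exp, isclose

lam = 1.5
surv = lambda t: exp(-lam * t)     # exact P(X > t) for Exp(λ)

# Memorylessness: P(X > s+t | X > s) = P(X > t)
s, t = 0.7, 1.2
cond = surv(s + t) / surv(s)
assert isclose(cond, surv(t))

# E[X] = 1/λ via a crude Riemann sum of x·f(x), truncated at x = 20
dx = 1e-4
mean = sum(i * dx * lam * exp(-lam * i * dx) * dx for i in range(int(20 / dx)))
assert isclose(mean, 1 / lam, rel_tol=1e-3)
```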
15. Joint Distributions
Joint PMF/PDF
$p(x,y) = P(X=x, Y=y)$ (discrete)
$f(x,y) \ge 0$, $\iint f(x,y)\,dx\,dy = 1$ (cont.)
Marginal Distribution
$p_X(x) = \sum_y p(x,y)$ (discrete)
$f_X(x) = \int f(x,y)dy$ (continuous)
Conditional Distribution
$P(X=x|Y=y) = \frac{p(x,y)}{p_Y(y)}$
$f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)}$
Independence
$p(x,y) = p_X(x)p_Y(y)$
$f(x,y) = f_X(x)f_Y(y)$
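A small joint PMF table with its marginals, a conditional, and an independence check; the table is built as a product so it is independent by design (all numbers illustrative):

```python
from fractions import Fraction

# Joint PMF built as a product, so X and Y are independent by design
p_X_in = [(0, Fraction(1, 2)), (1, Fraction(1, 2))]
p_Y_in = [(0, Fraction(1, 3)), (1, Fraction(2, 3))]
joint = {(x, y): px * py for x, px in p_X_in for y, py in p_Y_in}
assert sum(joint.values()) == 1

# Marginals: sum the joint over the other variable
p_X = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
p_Y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}
assert p_X[0] == Fraction(1, 2) and p_Y[1] == Fraction(2, 3)

# Conditional P(X=1 | Y=1) = p(1,1)/p_Y(1); equals p_X(1) under independence
assert joint[(1, 1)] / p_Y[1] == p_X[1]
# Independence: p(x,y) = p_X(x)·p_Y(y) in every cell
assert all(joint[(x, y)] == p_X[x] * p_Y[y] for x in (0, 1) for y in (0, 1))
```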
16. Covariance & Correlation
Covariance
$\text{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$
$= E[XY] - E[X]E[Y]$
Correlation
$\rho = \text{Corr}(X,Y) = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$
$-1 \le \rho \le 1$
$\rho = \pm 1$: perfect positive/negative linear relation
$\rho = 0$: uncorrelated
Properties
Indep $\Rightarrow$ uncorrelated
Uncorrelated $\not\Rightarrow$ indep
Cov$(aX, bY) = ab \cdot$Cov$(X,Y)$
Cov$(X+Y, Z) = $Cov$(X,Z) + $Cov$(Y,Z)$
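Covariance and correlation computed from a small joint PMF with positive dependence (the table entries are illustrative):

```python
from fractions import Fraction
from math import sqrt, isclose

# Joint PMF on {0,1}² with positive dependence
joint = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5)}
assert sum(joint.values()) == 1

E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
EX, EY, EXY = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)

cov = EXY - EX * EY                     # Cov = E[XY] − E[X]E[Y]
assert cov == Fraction(3, 20)

var_X = E(lambda x, y: x * x) - EX**2   # = 1/4, and likewise for Y
var_Y = E(lambda x, y: y * y) - EY**2
rho = float(cov) / (sqrt(var_X) * sqrt(var_Y))
assert isclose(rho, 0.6)                # within −1 ≤ ρ ≤ 1, as required
```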
17. Moment Generating Functions
Definition
$M_X(t) = E[e^{tX}]$
Properties
$M_X(0) = 1$
$M_X^{(k)}(0) = E[X^k]$
$M_X'(0) = E[X]$
$M_X''(0) = E[X^2]$
Uniqueness
When finite near $0$, the MGF uniquely determines the distribution
Sums
If $X, Y$ indep: $M_{X+Y}(t) = M_X(t)M_Y(t)$
Characteristic Function
$\phi_X(t) = E[e^{itX}]$ (always exists)
18. Law of Large Numbers
Weak LLN
$\bar{X}_n \xrightarrow{P} \mu$ as $n \to \infty$
Convergence in probability
For any $\epsilon > 0$: $P(|\bar{X}_n - \mu| > \epsilon) \to 0$
Strong LLN
$\bar{X}_n \xrightarrow{a.s.} \mu$ as $n \to \infty$
Almost sure convergence
$P(\lim_{n \to \infty} \bar{X}_n = \mu) = 1$
Sample Mean
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$
Justifies estimating probabilities by long-run empirical frequencies
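A seeded simulation sketch of the LLN: running means of fair die rolls drift toward $\mu = 3.5$ (the seed and sample sizes are arbitrary choices):

```python
import random

# Reproducible simulation: running means of die rolls approach μ = 3.5
random.seed(0)
mu = 3.5
rolls = [random.randint(1, 6) for _ in range(100_000)]

def running_mean(n):
    return sum(rolls[:n]) / n

devs = [abs(running_mean(n) - mu) for n in (100, 10_000, 100_000)]
# At n = 100,000 the sample mean is very close to μ (σ/√n ≈ 0.005)
assert devs[-1] < 0.05
```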
19. Central Limit Theorem
Statement
$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1)$
as $n \to \infty$
Practical Form
$\bar{X}_n \approx N(\mu, \sigma^2/n)$ for large $n$
Conditions
i.i.d. samples
Finite mean $\mu$ and variance $\sigma^2$
Works for many distributions
Applications
Confidence intervals
Hypothesis testing
Normal approximation to binomial
Remarkable: distribution-free result
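A seeded simulation sketch of the CLT: means of $n$ Uniform(0,1) draws cluster around $\mu = 0.5$ with spread $\approx \sigma/\sqrt{n}$ (sample sizes, trial count, and tolerances are arbitrary choices):

```python
import random
from math import sqrt
from statistics import mean, stdev

# Reproducible simulation: means of n Uniform(0,1) draws look normal
random.seed(1)
n, trials = 100, 2000
sample_means = [mean(random.random() for _ in range(n)) for _ in range(trials)]

se = (1 / sqrt(12)) / sqrt(n)          # σ/√n for Uniform(0,1), ≈ 0.0289
assert abs(mean(sample_means) - 0.5) < 0.01
assert abs(stdev(sample_means) - se) < 0.005

# Normal tail check: roughly 95% of means within 1.96·σ/√n of μ
within = sum(abs(m - 0.5) < 1.96 * se for m in sample_means)
assert 0.90 < within / trials < 0.99
```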
20. Probability Inequalities
Markov Inequality
$P(X \ge a) \le \frac{E[X]}{a}$
For $X \ge 0$, $a > 0$
Tail bound, often loose
Chebyshev Inequality
$P(|X - \mu| \ge k\sigma) \le 1/k^2$
$P(|X - \mu| > \epsilon) \le \sigma^2/\epsilon^2$
Usually tighter than Markov (Markov applied to $(X-\mu)^2$)
Chernoff Bound
$P(X \ge a) \le e^{-ta}M_X(t)$ for all $t > 0$
Exponential bound; minimize over $t$ for the tightest version
Jensen Inequality
If $g$ convex: $E[g(X)] \ge g(E[X])$
If $g$ concave: $E[g(X)] \le g(E[X])$
Ex: $E[X^2] \ge (E[X])^2$
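Markov, Chebyshev, and Jensen can be checked against the exact tail of the exponential, $P(X \ge a) = e^{-\lambda a}$ (λ and the test points are illustrative):

```python
from math import exp

# For Exp(λ) the tail is exact: P(X ≥ a) = e^{-λa}; μ = σ = 1/λ
lam = 1.0
mean = 1 / lam
tail = lambda a: exp(-lam * a)

# Markov: P(X ≥ a) ≤ E[X]/a for a > 0
for a in (1, 2, 5):
    assert tail(a) <= mean / a

# Chebyshev: P(|X − μ| ≥ kσ) ≤ 1/k². Here, for k ≥ 1 and X ≥ 0,
# the event |X − 1| ≥ k is exactly {X ≥ 1 + k}.
for k in (1, 2, 3):
    assert tail(1 + k) <= 1 / k**2

# Jensen with convex g(x) = x²: E[X²] = 2/λ² ≥ (E[X])² = 1/λ²
assert 2 / lam**2 >= (1 / lam) ** 2
```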