PROBABILITY Comprehensive Reference

Sample Spaces • Axioms • Counting • Conditional Prob • Bayes • Independence • Random Variables • Distributions • Joint Distributions • Covariance • MGF • LLN • CLT • Inequalities
1. Sample Spaces & Events
Sample Space $S$: All possible outcomes
Event $A \subseteq S$: Subset of outcomes
Elementary event: Single outcome
Set Operations
Union: $A \cup B$ (A or B)
Intersection: $A \cap B$ (A and B)
Complement: $A^c$ (not A)
De Morgan: $(A \cup B)^c = A^c \cap B^c$
Disjoint: $A \cap B = \emptyset$
Partition
$A_1, A_2, \ldots$ disjoint, $\bigcup A_i = S$
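The set operations and De Morgan's law above can be checked directly with Python's built-in sets; a small sketch where the sample space and events are illustrative choices:

```python
# Verify the set identities on a one-die sample space.
S = set(range(1, 7))          # sample space: one die roll
A = {2, 4, 6}                 # event "even"
B = {4, 5, 6}                 # event "at least 4"

union = A | B                 # A ∪ B
intersection = A & B          # A ∩ B
A_c = S - A                   # complement A^c

# De Morgan: (A ∪ B)^c = A^c ∩ B^c
assert S - (A | B) == (S - A) & (S - B)
# A and A^c are disjoint and together partition S
assert A & A_c == set() and A | A_c == S
```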
2. Axioms of Probability
Kolmogorov Axioms
$0 \le P(A) \le 1$ for all $A$
$P(S) = 1$
$P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$
for pairwise disjoint $A_i$ (countable additivity)
Consequences
$P(\emptyset) = 0$
$P(A^c) = 1 - P(A)$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
If $A \subseteq B$ then $P(A) \le P(B)$
$P(A) = \sum_i P(A \cap B_i)$ (partition)
3. Counting Principles
Permutations
$P(n,k) = \frac{n!}{(n-k)!}$
Order matters, no replacement
Combinations
$C(n,k) = \binom{n}{k} = \frac{n!}{k!(n-k)!}$
Order doesn't matter
$\binom{n}{k} = \binom{n}{n-k}$
Multinomial
$\binom{n}{k_1,k_2,\ldots,k_r} = \frac{n!}{k_1! k_2! \cdots k_r!}$
Partition $n$ into $r$ groups
Multiplication Rule
$n_1 \times n_2 \times \cdots \times n_k$ outcomes
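These counting formulas map directly onto Python's standard library (`math.perm` and `math.comb`, available since Python 3.8); the `multinomial` helper below is a hypothetical convenience function, not part of the stdlib:

```python
from math import perm, comb, factorial, prod

n, k = 10, 3
assert perm(n, k) == factorial(n) // factorial(n - k)   # P(10,3) = 720
assert comb(n, k) == perm(n, k) // factorial(k)         # C(10,3) = 120
assert comb(n, k) == comb(n, n - k)                     # symmetry

def multinomial(n, ks):
    """n! / (k_1! k_2! ... k_r!); assumes sum(ks) == n."""
    assert sum(ks) == n
    return factorial(n) // prod(factorial(k) for k in ks)

# Partition 10 items into groups of sizes 5, 3, 2
assert multinomial(10, [5, 3, 2]) == 2520
```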
4. Conditional Probability
Definition
$P(A|B) = \frac{P(A \cap B)}{P(B)}$ if $P(B) > 0$
Multiplication Rule
$P(A \cap B) = P(A|B)P(B)$
$P(A_1 \cap \cdots \cap A_n)$
$= P(A_1)P(A_2|A_1)\cdots P(A_n|A_1 \cap \cdots \cap A_{n-1})$
Law of Total Prob
$P(A) = \sum_i P(A|B_i)P(B_i)$
where $B_i$ form partition of $S$
Properties
$P(\cdot|B)$ is itself a probability measure
$P(A|A) = 1$
$P(A^c|B) = 1 - P(A|B)$
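The definition and the law of total probability can be checked by brute-force enumeration over two fair dice; a sketch with illustrative events, using exact `Fraction` arithmetic:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] + w[1] == 8        # sum is 8
B = lambda w: w[0] == 6               # first die shows 6

# P(A|B) = P(A ∩ B) / P(B): only (6,2) among the six (6,*) pairs
p_A_given_B = prob(lambda w: A(w) and B(w)) / prob(B)
assert p_A_given_B == Fraction(1, 6)

# Law of total probability over the partition {first die = i}
total = sum(prob(lambda w, i=i: A(w) and w[0] == i) for i in range(1, 7))
assert total == prob(A) == Fraction(5, 36)
```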
5. Bayes' Theorem
Basic Form
$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
Prior: $P(A)$
Likelihood: $P(B|A)$
Posterior: $P(A|B)$
Extended Form
$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_j P(B|A_j)P(A_j)}$
Applications
Medical testing
Spam filtering
Classification
Bayesian inference
Normalization: $\sum P(A_i|B) = 1$
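A worked medical-testing example of Bayes' theorem; the prevalence, sensitivity, and specificity below are made-up illustrative numbers:

```python
from fractions import Fraction

# Illustrative (made-up) numbers for a diagnostic test
p_disease = Fraction(1, 100)       # prior P(D)
sensitivity = Fraction(95, 100)    # likelihood P(+|D)
specificity = Fraction(90, 100)    # P(-|D^c)

# Evidence P(+) via the law of total probability
p_pos = sensitivity * p_disease + (1 - specificity) * (1 - p_disease)
# Bayes: posterior P(D|+)
posterior = sensitivity * p_disease / p_pos

assert p_pos == Fraction(217, 2000)
# Despite 95% sensitivity, the posterior is below 9% (base-rate effect)
assert posterior == Fraction(19, 217)   # ≈ 0.0876
```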
6. Independence
Definition
$P(A \cap B) = P(A)P(B)$
or $P(A|B) = P(A)$ if $P(B) > 0$
Multiple Events
Mutually independent: for every finite subcollection,
$P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k})$
Pairwise independent: $P(A_i \cap A_j) = P(A_i)P(A_j)$ for all $i \ne j$
Pairwise independence is strictly weaker than mutual independence
Properties
$A, B$ indep $\Rightarrow$ $A, B^c$ indep
$A, B$ disjoint & $P(A), P(B) > 0$ $\Rightarrow$ not indep
Independence $\ne$ disjointness
Conditional Indep
$P(A \cap B|C) = P(A|C)P(B|C)$
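The classic counterexample showing pairwise independence without mutual independence, sketched in Python: two fair coins with $C$ = "the coins match":

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))   # 4 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] == "H"      # first coin heads
B = lambda w: w[1] == "H"      # second coin heads
C = lambda w: w[0] == w[1]     # coins match

# Every pair factors ...
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(lambda w: X(w) and Y(w)) == prob(X) * prob(Y)

# ... but the triple does not: P(A ∩ B ∩ C) = 1/4 ≠ 1/8
assert prob(lambda w: A(w) and B(w) and C(w)) != prob(A) * prob(B) * prob(C)
```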
7. Random Variables
Definition
Function $X: S \to \mathbb{R}$
$\{X \le x\}$ is an event for every $x$
Discrete RV
Countable range
PMF: $p(x) = P(X = x)$
$\sum_x p(x) = 1$
Continuous RV
Uncountable range
PDF: $f(x) \ge 0$
$\int_{-\infty}^{\infty} f(x)dx = 1$
Mixed RV
Both discrete & continuous parts
8. Probability Mass Function
Definition
$p(x) = P(X = x)$
$p(x) \ge 0$ for all $x$
$\sum_x p(x) = 1$
Properties
$P(a \le X \le b) = \sum_{a \le x \le b} p(x)$
$P(X \in A) = \sum_{x \in A} p(x)$
Support
$\text{supp}(X) = \{x: p(x) > 0\}$
Graph: vertical bars at discrete values
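A concrete PMF built by enumeration, the sum of two fair dice, checking the normalization and interval-probability properties above:

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

assert sum(pmf.values()) == 1                  # Σ_x p(x) = 1
assert pmf[7] == Fraction(6, 36)               # mode of the sum
# P(4 ≤ X ≤ 6) = Σ_{4 ≤ x ≤ 6} p(x) = (3 + 4 + 5)/36
assert sum(pmf[x] for x in range(4, 7)) == Fraction(12, 36)
```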
9. Probability Density Function
Definition
$f(x) \ge 0$ for all $x$
$\int_{-\infty}^{\infty} f(x)dx = 1$
Properties
$P(a < X < b) = \int_a^b f(x)dx$
$P(X = x) = 0$ (point probability)
$P(X \le x) = P(X < x)$
PDF vs CDF
$f(x) = \frac{d}{dx}F(x)$
Area under curve = probability
10. Cumulative Distribution Function
Definition
$F(x) = P(X \le x)$
Properties
$F$ is non-decreasing
$\lim_{x \to -\infty} F(x) = 0$
$\lim_{x \to \infty} F(x) = 1$
$F$ is right-continuous
Using CDF
$P(X > x) = 1 - F(x)$
$P(a < X \le b) = F(b) - F(a)$
$P(X = x) = F(x) - F(x^-)$
Inverse CDF
Quantile: $F^{-1}(p) = \inf\{x: F(x) \ge p\}$
Median: $F^{-1}(0.5)$
11. Expected Value & Moments
Discrete
$E[X] = \sum_x x \cdot p(x)$
Continuous
$E[X] = \int_{-\infty}^{\infty} x f(x)dx$
Linearity of Expectation
$E[aX + b] = aE[X] + b$
$E[X + Y] = E[X] + E[Y]$
Higher Moments
$E[X^k] = \sum_x x^k p(x)$ (discrete)
$E[X^k] = \int x^k f(x)dx$ (continuous)
Law of the Unconscious Statistician
$E[g(X)] = \sum_x g(x)p(x)$ (discrete)
$E[g(X)] = \int g(x)f(x)dx$ (continuous)
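Expectation, LOTUS, and linearity for a fair die, computed directly from the PMF (exact arithmetic via `Fraction`):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

E_X = sum(x * p for x, p in pmf.items())
assert E_X == Fraction(7, 2)

# LOTUS with g(x) = x²: no need for the distribution of X² itself
E_X2 = sum(x**2 * p for x, p in pmf.items())
assert E_X2 == Fraction(91, 6)

# Linearity: E[aX + b] = aE[X] + b
a, b = 3, 2
assert sum((a * x + b) * p for x, p in pmf.items()) == a * E_X + b
```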
12. Variance & Standard Deviation
Definition
$\text{Var}(X) = E[(X - \mu)^2]$
$= E[X^2] - (E[X])^2$
Standard Deviation
$\sigma = \sqrt{\text{Var}(X)}$
Properties
$\text{Var}(aX + b) = a^2\text{Var}(X)$
$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
$+ 2\text{Cov}(X,Y)$
If $X, Y$ indep: $\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)$
Coefficient of Variation
$CV = \frac{\sigma}{\mu}$ (relative variability; requires $\mu \ne 0$)
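A sketch checking both variance formulas and the scaling rule on a fair die:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die
mu = sum(x * p for x, p in pmf.items())          # 7/2

var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mu**2
assert var_def == var_shortcut == Fraction(35, 12)

# Var(aX + b) = a² Var(X): the shift b drops out, the scale enters squared
a, b = 3, 2
mu_ab = a * mu + b
var_ab = sum((a * x + b - mu_ab) ** 2 * p for x, p in pmf.items())
assert var_ab == a**2 * var_def
```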
13. Common Discrete Distributions
Bernoulli $(p)$
$P(X=k) = p^k(1-p)^{1-k}$, $k \in \{0,1\}$
$E[X] = p$, $\text{Var}(X) = p(1-p)$
Binomial $B(n,p)$
$P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$
$E[X] = np$, $\text{Var}(X) = np(1-p)$
Poisson $\text{Po}(\lambda)$
$P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}$
$E[X] = \lambda$, $\text{Var}(X) = \lambda$
Limit of $B(n,p)$ for rare events: large $n$, small $p$, $np \approx \lambda$
Geometric $\text{Geom}(p)$
$P(X=k) = (1-p)^{k-1}p$, $k \ge 1$
$E[X] = 1/p$, $\text{Var}(X) = (1-p)/p^2$
Trials until first success
13. Discrete Dists (cont'd)
Negative Binomial $NB(r,p)$
$P(X=k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}$, $k \ge r$
$E[X] = r/p$, $\text{Var}(X) = r(1-p)/p^2$
Trials until $r$-th success
Hypergeometric
$P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
Sampling $n$ without replacement from $N$ items, $K$ of which are successes
$E[X] = nK/N$
Uniform Discrete
$P(X=k) = 1/n$, $k=1,\ldots,n$
$E[X] = (n+1)/2$, $\text{Var}(X) = (n^2-1)/12$
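The binomial formulas above can be verified exactly: build the PMF from $\binom{n}{k}p^k(1-p)^{n-k}$ and check $E[X] = np$, $\text{Var}(X) = np(1-p)$ with rational arithmetic (the values of $n$ and $p$ are illustrative):

```python
from fractions import Fraction
from math import comb

# Binomial B(n, p) built from its PMF formula
n, p = 10, Fraction(3, 10)
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

assert sum(pmf.values()) == 1
mean = sum(k * q for k, q in pmf.items())
var = sum(k**2 * q for k, q in pmf.items()) - mean**2
assert mean == n * p               # np = 3
assert var == n * p * (1 - p)      # np(1-p) = 21/10
```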
14. Common Continuous Distributions
Uniform $U(a,b)$
$f(x) = \frac{1}{b-a}$ on $[a,b]$
$E[X] = (a+b)/2$, $\text{Var}(X) = (b-a)^2/12$
Exponential $\text{Exp}(\lambda)$
$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
$E[X] = 1/\lambda$, $\text{Var}(X) = 1/\lambda^2$
Memoryless: $P(X > s+t|X > s) = P(X > t)$
Normal $N(\mu, \sigma^2)$
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$
$E[X] = \mu$, $\text{Var}(X) = \sigma^2$
Standard Normal: $Z = (X-\mu)/\sigma \sim N(0,1)$
14. Continuous Dists (cont'd)
Gamma $\Gamma(\alpha, \beta)$
$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$
$E[X] = \alpha/\beta$, $\text{Var}(X) = \alpha/\beta^2$
Beta $B(\alpha, \beta)$
$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$
$E[X] = \alpha/(\alpha+\beta)$
$x \in [0,1]$, flexible shape
Lognormal
$\ln(X) \sim N(\mu, \sigma^2)$
$E[X] = e^{\mu+\sigma^2/2}$
Right-skewed, positive support
Chi-square $\chi^2_k$
Sum of $k$ squared standard normals
$E[X] = k$, $\text{Var}(X) = 2k$
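The memoryless property of the exponential can be verified exactly through its survival function $P(X > t) = e^{-\lambda t}$, and its mean $1/\lambda$ recovered numerically (λ and the grid step are illustrative choices):

```python
from math import exp, isclose

lam = 1.5
surv = lambda t: exp(-lam * t)     # exact P(X > t) for Exp(λ)

# Memorylessness: P(X > s+t | X > s) = P(X > t)
s, t = 0.7, 1.2
cond = surv(s + t) / surv(s)
assert isclose(cond, surv(t))

# E[X] = 1/λ via a crude Riemann sum of x·f(x), truncated at x = 20
dx = 1e-4
mean = sum(i * dx * lam * exp(-lam * i * dx) * dx for i in range(int(20 / dx)))
assert isclose(mean, 1 / lam, rel_tol=1e-3)
```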
15. Joint Distributions
Joint PMF/PDF
$p(x,y) = P(X=x, Y=y)$ (discrete)
$f(x,y) \ge 0$, $\iint f(x,y)\,dx\,dy = 1$ (cont.)
Marginal Distribution
$p_X(x) = \sum_y p(x,y)$ (discrete)
$f_X(x) = \int f(x,y)dy$ (continuous)
Conditional Distribution
$P(X=x|Y=y) = \frac{p(x,y)}{p_Y(y)}$
$f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)}$
Independence
$p(x,y) = p_X(x)p_Y(y)$
$f(x,y) = f_X(x)f_Y(y)$
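A small joint PMF table with its marginals, a conditional, and an independence check; the table is built as a product so it is independent by design (all numbers illustrative):

```python
from fractions import Fraction

# Joint PMF built as a product, so X and Y are independent by design
p_X_in = [(0, Fraction(1, 2)), (1, Fraction(1, 2))]
p_Y_in = [(0, Fraction(1, 3)), (1, Fraction(2, 3))]
joint = {(x, y): px * py for x, px in p_X_in for y, py in p_Y_in}
assert sum(joint.values()) == 1

# Marginals: sum the joint over the other variable
p_X = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
p_Y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}
assert p_X[0] == Fraction(1, 2) and p_Y[1] == Fraction(2, 3)

# Conditional P(X=1 | Y=1) = p(1,1)/p_Y(1); equals p_X(1) under independence
assert joint[(1, 1)] / p_Y[1] == p_X[1]
# Independence: p(x,y) = p_X(x)·p_Y(y) in every cell
assert all(joint[(x, y)] == p_X[x] * p_Y[y] for x in (0, 1) for y in (0, 1))
```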
16. Covariance & Correlation
Covariance
$\text{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$
$= E[XY] - E[X]E[Y]$
Correlation
$\rho = \text{Corr}(X,Y) = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$
$-1 \le \rho \le 1$
$\rho = \pm 1$: perfect positive/negative linear relation
$\rho = 0$: uncorrelated
Properties
Indep $\Rightarrow$ uncorrelated
Uncorrelated $\not\Rightarrow$ indep
Cov$(aX, bY) = ab \cdot$Cov$(X,Y)$
Cov$(X+Y, Z) = $Cov$(X,Z) + $Cov$(Y,Z)$
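Covariance and correlation computed from a small joint PMF with positive dependence (the table entries are illustrative):

```python
from fractions import Fraction
from math import sqrt, isclose

# Joint PMF on {0,1}² with positive dependence
joint = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5)}
assert sum(joint.values()) == 1

E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
EX, EY, EXY = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)

cov = EXY - EX * EY                     # Cov = E[XY] − E[X]E[Y]
assert cov == Fraction(3, 20)

var_X = E(lambda x, y: x * x) - EX**2   # = 1/4, and likewise for Y
var_Y = E(lambda x, y: y * y) - EY**2
rho = float(cov) / (sqrt(var_X) * sqrt(var_Y))
assert isclose(rho, 0.6)                # within −1 ≤ ρ ≤ 1, as required
```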
17. Moment Generating Functions
Definition
$M_X(t) = E[e^{tX}]$
Properties
$M_X(0) = 1$
$M_X^{(k)}(0) = E[X^k]$
$M_X'(0) = E[X]$
$M_X''(0) = E[X^2]$
Uniqueness
When finite near $0$, the MGF uniquely determines the distribution
Sums
If $X, Y$ indep: $M_{X+Y}(t) = M_X(t)M_Y(t)$
Characteristic Function
$\phi_X(t) = E[e^{itX}]$ (always exists)
18. Law of Large Numbers
Weak LLN
$\bar{X}_n \xrightarrow{P} \mu$ as $n \to \infty$
Convergence in probability
For any $\epsilon > 0$: $P(|\bar{X}_n - \mu| > \epsilon) \to 0$
Strong LLN
$\bar{X}_n \xrightarrow{a.s.} \mu$ as $n \to \infty$
Almost sure convergence
$P(\lim_{n \to \infty} \bar{X}_n = \mu) = 1$
Sample Mean
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$
Justifies estimating probabilities by long-run empirical frequencies
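A seeded simulation sketch of the LLN: running means of fair die rolls drift toward $\mu = 3.5$ (the seed and sample sizes are arbitrary choices):

```python
import random

# Reproducible simulation: running means of die rolls approach μ = 3.5
random.seed(0)
mu = 3.5
rolls = [random.randint(1, 6) for _ in range(100_000)]

def running_mean(n):
    return sum(rolls[:n]) / n

devs = [abs(running_mean(n) - mu) for n in (100, 10_000, 100_000)]
# At n = 100,000 the sample mean is very close to μ (σ/√n ≈ 0.005)
assert devs[-1] < 0.05
```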
19. Central Limit Theorem
Statement
$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1)$
as $n \to \infty$
Practical Form
$\bar{X}_n \approx N(\mu, \sigma^2/n)$ for large $n$
Conditions
i.i.d. samples
Finite mean $\mu$ and variance $\sigma^2$
Works for many distributions
Applications
Confidence intervals
Hypothesis testing
Normal approximation to binomial
Remarkable: distribution-free result
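A seeded simulation sketch of the CLT: means of $n$ Uniform(0,1) draws cluster around $\mu = 0.5$ with spread $\approx \sigma/\sqrt{n}$ (sample sizes, trial count, and tolerances are arbitrary choices):

```python
import random
from math import sqrt
from statistics import mean, stdev

# Reproducible simulation: means of n Uniform(0,1) draws look normal
random.seed(1)
n, trials = 100, 2000
sample_means = [mean(random.random() for _ in range(n)) for _ in range(trials)]

se = (1 / sqrt(12)) / sqrt(n)          # σ/√n for Uniform(0,1), ≈ 0.0289
assert abs(mean(sample_means) - 0.5) < 0.01
assert abs(stdev(sample_means) - se) < 0.005

# Normal tail check: roughly 95% of means within 1.96·σ/√n of μ
within = sum(abs(m - 0.5) < 1.96 * se for m in sample_means)
assert 0.90 < within / trials < 0.99
```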
20. Probability Inequalities
Markov Inequality
$P(X \ge a) \le \frac{E[X]}{a}$
For $X \ge 0$, $a > 0$
Tail bound, often loose
Chebyshev Inequality
$P(|X - \mu| \ge k\sigma) \le 1/k^2$
$P(|X - \mu| > \epsilon) \le \sigma^2/\epsilon^2$
Usually tighter than Markov (Markov applied to $(X-\mu)^2$)
Chernoff Bound
$P(X \ge a) \le e^{-ta}M_X(t)$ for all $t > 0$
Exponential bound; minimize over $t$ for the tightest version
Jensen Inequality
If $g$ convex: $E[g(X)] \ge g(E[X])$
If $g$ concave: $E[g(X)] \le g(E[X])$
Ex: $E[X^2] \ge (E[X])^2$
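Markov, Chebyshev, and Jensen can be checked against the exact tail of the exponential, $P(X \ge a) = e^{-\lambda a}$ (λ and the test points are illustrative):

```python
from math import exp

# For Exp(λ) the tail is exact: P(X ≥ a) = e^{-λa}; μ = σ = 1/λ
lam = 1.0
mean = 1 / lam
tail = lambda a: exp(-lam * a)

# Markov: P(X ≥ a) ≤ E[X]/a for a > 0
for a in (1, 2, 5):
    assert tail(a) <= mean / a

# Chebyshev: P(|X − μ| ≥ kσ) ≤ 1/k². Here, for k ≥ 1 and X ≥ 0,
# the event |X − 1| ≥ k is exactly {X ≥ 1 + k}.
for k in (1, 2, 3):
    assert tail(1 + k) <= 1 / k**2

# Jensen with convex g(x) = x²: E[X²] = 2/λ² ≥ (E[X])² = 1/λ²
assert 2 / lam**2 >= (1 / lam) ** 2
```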