Random Variables

From Sample Spaces to Measurable Functions

probability

foundations

Author

Universe Office

Published

April 4, 2026

Introduction

Suppose you roll two dice and want to know the total. The sample space has 36 outcomes — $(1,1), (1,2), \ldots, (6,6)$ — but you do not care which specific pair appears. You only care about the sum. That sum is a function that converts each outcome into a number: $(1,3) \mapsto 4$, $(5,6) \mapsto 11$, and so on.

This is the idea behind a random variable: a rule that assigns a real number to every outcome in the sample space $\Omega$. Why not work with $\Omega$ directly? Because $\Omega$ can be anything — coin labels, weather states, loan outcomes — but arithmetic requires numbers. A random variable is the bridge from abstract outcomes to the concrete world of sums, averages, and integrals.

Think of a random variable as a translator: it listens to what the probability space says (in its native language of outcomes) and writes it down as a number you can compute with.

The previous article established the probability space $(\Omega, \mathcal{F}, P)$. This article introduces the tool that makes that foundation quantitatively useful (Casella and Berger 2002; Wasserman 2004).

This article covers:

The formal definition of a random variable as a measurable function
Discrete vs. continuous random variables
PMF, PDF, and CDF
Transformations of random variables
Simulation and visual verification

From Dice Sums to the Formal Definition

Why We Need Random Variables

Consider the two-dice example. The sample space $\Omega = \{(i, j) : i, j \in \{1,\ldots,6\}\}$ has 36 elements. If you want the probability that the total is 7, you must count pairs: $(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)$. The random variable $X(\omega) = i + j$ maps each pair to its sum, letting you write $P(X = 7) = 6/36 = 1/6$ without enumerating outcomes every time.
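The counting argument above can be sketched in a few lines of Python. This is a minimal illustration, assuming two fair dice so that all 36 outcomes are equally likely; the names `omega`, `X`, and `event_seven` are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (i, j) from two six-sided dice.
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: map an outcome (i, j) to the total i + j."""
    i, j = outcome
    return i + j

# The event {X = 7} is the set of outcomes that X maps to 7.
event_seven = [w for w in omega if X(w) == 7]

# Equally likely outcomes (assumption: fair dice), so P(X = 7) = |event| / |omega|.
p_seven = Fraction(len(event_seven), len(omega))

print(event_seven)  # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(p_seven)      # 1/6
```

Once `X` is defined, any other question about the total (for example $P(X \le 4)$) only needs a different filter on the same list.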

This pattern repeats everywhere. In credit risk, you do not track every borrower attribute — you summarize default as $D = 1$ or $D = 0$. In physics, you measure a particle’s position, not the full microstate of the system. Random variables extract the numerical quantity you care about from the richer probability space underneath.

The Formal Definition

Definition (Random Variable; Casella & Berger, 2002)

A random variable is a measurable function $X : \Omega \to \mathbb{R}$. “Measurable” means that for every Borel set $B \subseteq \mathbb{R}$, the preimage $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$ belongs to $\mathcal{F}$.

In practice, the measurability condition ensures that you can always ask “what is the probability that $X$ falls in the interval $[a, b]$?” and get a well-defined answer. For finite or countable sample spaces equipped with the power set as $\mathcal{F}$ (the usual choice), every function from $\Omega$ to $\mathbb{R}$ is automatically measurable. The condition mainly matters for uncountable sample spaces, where it rules out pathological functions that would break the probability machinery.
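To see what the definition is checking, it can help to look at a toy case where $\mathcal{F}$ is deliberately smaller than the power set. The sketch below is a hypothetical four-point example (the sample space, the $\sigma$-algebra, and the helper `is_measurable` are all made up for illustration), not something taken from the references.

```python
# Hypothetical four-point sample space with a coarse sigma-algebra:
# the only events are the empty set, {a, b}, {c, d}, and omega itself.
omega = frozenset({"a", "b", "c", "d"})
sigma_algebra = {
    frozenset(),
    frozenset({"a", "b"}),
    frozenset({"c", "d"}),
    omega,
}

def is_measurable(X, omega, sigma_algebra):
    """Check that the preimage of every value of X is an event in F.

    On a finite sample space this is enough: the preimage of any Borel set
    is a finite union of such preimages, and F is closed under unions.
    """
    values = {X[w] for w in omega}
    return all(
        frozenset(w for w in omega if X[w] == v) in sigma_algebra
        for v in values
    )

# Constant on each block {a, b} and {c, d}: every preimage is an event.
X_coarse = {"a": 0.0, "b": 0.0, "c": 1.0, "d": 1.0}
# Separates "a" from "b", but {a} is not an event in F.
X_fine = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 2.0}

print(is_measurable(X_coarse, omega, sigma_algebra))  # True
print(is_measurable(X_fine, omega, sigma_algebra))    # False
```

With this $\mathcal{F}$, the question "what is $P(X = 0)$?" has no answer for `X_fine`, because $P$ is only defined on the four listed events.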

Discrete vs. Continuous

A random variable $X$ is discrete if it takes values in a countable set $\{x_1, x_2, \ldots\}$. It is continuous if its CDF is absolutely continuous — equivalently, if there exists a density function $f$ such that $P(a \le X \le b) = \int_a^b f(x)\,dx$ for all $a \le b$.

The intuition: a discrete random variable assigns probability to individual points (like landing on a specific face of a die). A continuous random variable spreads probability over intervals (like a spinner that can land anywhere on a circle).

from scipy import stats

# Discrete: Binomial(n=20, p=0.3)
X_binom = stats.binom(n=20, p=0.3)
print(f"P(X = 6)  = {X_binom.pmf(6):.4f}")
print(f"P(X <= 6) = {X_binom.cdf(6):.4f}")

# Continuous: Normal(0, 1)
X_norm = stats.norm(0, 1)
print(f"f(0)      = {X_norm.pdf(0):.4f}")
print(f"P(X <= 0) = {X_norm.cdf(0):.4f}")

P(X = 6)  = 0.1916
P(X <= 6) = 0.6080
f(0)      = 0.3989
P(X <= 0) = 0.5000

Probability Mass Function and Probability Density Function

PMF (Discrete): Probability Lives on Points

For a discrete random variable, you can ask “what is the probability of exactly this value?” The answer is the probability mass function (PMF):

\[ p(x) = P(X = x) \]

The PMF satisfies $p(x) \ge 0$ for all $x$, and $\sum_x p(x) = 1$. Visually, a PMF is a bar chart — each bar’s height is the probability of that value.
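As a concrete instance, the PMF of the two-dice total from the introduction can be tabulated directly. This is a small sketch assuming fair dice; exact fractions are used so the sum-to-one check is not affected by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each total.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))

# PMF: p(x) = P(X = x) for each possible total x.
pmf = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)

print(sum(pmf.values()))  # 1
```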

PDF (Continuous): Probability Lives in Areas

For a continuous random variable, the probability of any single point is zero. Instead, probability comes from intervals. The probability density function (PDF) satisfies:

\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]

Note that $f(x)$ is not a probability: it is a density, and it can exceed 1. What must equal 1 is the total area under the curve: $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Visually, a PDF is a curve (smooth for distributions such as the Normal, piecewise flat for the Uniform), and probability is the area under the curve between two points.
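A quick way to convince yourself that a density value is not a probability is to pick a distribution concentrated on a short interval. The sketch below uses a Uniform on $[0, 0.5]$, chosen here purely as an illustration; in `scipy.stats.uniform` this corresponds to `loc=0, scale=0.5`.

```python
from scipy import integrate, stats

# Uniform on [0, 0.5]: scipy parametrizes the support as [loc, loc + scale].
U = stats.uniform(loc=0, scale=0.5)

density_at_point = U.pdf(0.25)                 # height of the density
total_area, _ = integrate.quad(U.pdf, 0, 0.5)  # area over the support
prob_interval = U.cdf(0.3) - U.cdf(0.1)        # P(0.1 <= X <= 0.3)

print(f"f(0.25)            = {density_at_point:.4f}")
print(f"total area         = {total_area:.4f}")
print(f"P(0.1 <= X <= 0.3) = {prob_interval:.4f}")
```

The density is 2 everywhere on the support, yet the total area is still 1 and every interval probability stays between 0 and 1.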

Figure 1 shows six common distributions: three discrete (Bernoulli, Binomial, Poisson) and three continuous (Normal, Exponential, Uniform).

Figure 1: PMF and PDF of common distributions

Cumulative Distribution Function

The CDF unifies discrete and continuous cases into a single framework:

\[ F(x) = P(X \le x) \]

Every CDF has three properties (Casella & Berger, 2002):

Non-decreasing: $x_1 < x_2 \implies F(x_1) \le F(x_2)$
Right-continuous: $\lim_{h \to 0^+} F(x + h) = F(x)$
Limits: $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
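These three properties can be spot-checked numerically. The following sketch uses the Binomial(20, 0.3) distribution from the earlier example; the grid, the jump point $x = 6$, and the step `eps` are arbitrary choices made for illustration, and a numerical check of this kind is not a proof.

```python
import numpy as np
from scipy import stats

X = stats.binom(n=20, p=0.3)

# Non-decreasing: evaluate F on an increasing grid and check the differences
# (up to a tiny floating-point tolerance).
grid = np.linspace(-5, 25, 3001)
F = X.cdf(grid)
non_decreasing = bool(np.all(np.diff(F) >= -1e-12))

# Right-continuity at the jump point x = 6: approaching from the right
# gives F(6), while approaching from the left falls short by P(X = 6).
eps = 1e-9
right_gap = X.cdf(6 + eps) - X.cdf(6)
left_gap = X.cdf(6) - X.cdf(6 - eps)

# Limits: F is 0 below the support and 1 at or above its top value.
lower, upper = X.cdf(-1), X.cdf(20)

print(non_decreasing)
print(f"right gap = {right_gap:.4f}, left gap = {left_gap:.4f}")
print(f"F(-1) = {lower:.4f}, F(20) = {upper:.4f}")
```

The left gap equals the PMF value at 6, which is exactly the jump height of the step function at that point.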

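For discrete random variables, the CDF is a step function with jumps at each value. For continuous random variables, the PDF and CDF are related by the following identities, where the derivative formula holds at every point where $f$ is continuous: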

\[ F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad f(x) = F'(x) \]

Figure 2 illustrates this relationship for $N(0, 1)$. The shaded area under the PDF up to $x = 1.0$ equals $F(1.0) \approx 0.841$.
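The same relationship can be checked numerically for $N(0, 1)$ at $x = 1.0$. This is a rough sketch: the integral is computed with `scipy.integrate.quad`, and the derivative with a central finite difference whose step size `h` is an arbitrary choice.

```python
from scipy import integrate, stats

Z = stats.norm(0, 1)
x = 1.0

# F(x) as the area under the PDF from -infinity to x.
area, _ = integrate.quad(Z.pdf, -float("inf"), x)

# f(x) as the slope of the CDF, via a central finite difference.
h = 1e-5
slope = (Z.cdf(x + h) - Z.cdf(x - h)) / (2 * h)

print(f"integral of pdf up to 1.0 = {area:.4f}")
print(f"cdf(1.0)                  = {Z.cdf(x):.4f}")
print(f"finite-difference slope   = {slope:.4f}")
print(f"pdf(1.0)                  = {Z.pdf(x):.4f}")
```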

Figure 2: Relationship between PDF and CDF for Normal(0,1)

Transformations of Random Variables

Given a random variable $X$ with known distribution, suppose you define $Y = g(X)$. What is the distribution of $Y$?

Discrete Case

If $X$ is discrete, collect all $x$ values that map to the same $y$:

\[ P(Y = y) = \sum_{\{x : g(x) = y\}} P(X = x) \]
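A small sketch of this "collect and add" rule, using a hypothetical example: $X$ uniform on $\{-2, -1, 0, 1, 2\}$ and $g(x) = x^2$. Neither the distribution nor the names `pmf_X` and `pmf_Y` come from the text; they are chosen so that two different $x$ values map to the same $y$.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: X uniform on {-2, -1, 0, 1, 2}.
pmf_X = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}

def g(x):
    """Transformation applied to X."""
    return x * x

# P(Y = y) is the sum of P(X = x) over all x with g(x) = y.
pmf_Y = defaultdict(Fraction)
for x, p in pmf_X.items():
    pmf_Y[g(x)] += p

for y in sorted(pmf_Y):
    print(y, pmf_Y[y])
```

Here $x = 1$ and $x = -1$ both land on $y = 1$, so their probabilities add to $2/5$, while $y = 0$ keeps the single contribution $1/5$.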

Continuous Case (Change of Variables)

If $g$ is strictly monotone and differentiable with inverse $g^{-1}$, the PDF of $Y$ is, for $y$ in the range of $g$:

\[ f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right| \]

Intuition first. The factor $|dg^{-1}/dy|$ — the Jacobian — corrects for how the transformation stretches or compresses the number line. If $g$ squeezes a wide interval of $x$ into a narrow interval of $y$, the density must increase to keep the total probability at 1. The Jacobian measures exactly that stretching factor.

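Formally, an infinitesimal interval of length $dy$ around $y$ corresponds to an interval of length $|dg^{-1}/dy|\,dy$ around $x = g^{-1}(y)$. Both intervals must carry the same probability, so $f_Y(y)\,dy = f_X(g^{-1}(y))\,|dg^{-1}/dy|\,dy$, which is the formula above.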
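To see the formula at work on a strictly monotone map, consider $Y = e^X$ with $X \sim N(0, 1)$, an example picked here for illustration. Then $g^{-1}(y) = \ln y$ and $|dg^{-1}/dy| = 1/y$, and the result should agree with the standard log-normal density, which SciPy exposes as `stats.lognorm(s=1)`.

```python
import numpy as np
from scipy import stats

f_X = stats.norm(0, 1).pdf

def f_Y(y):
    """Density of Y = exp(X) via change of variables.

    g(x) = exp(x), so g^{-1}(y) = log(y) and |d/dy g^{-1}(y)| = 1 / y.
    """
    y = np.asarray(y, dtype=float)
    return f_X(np.log(y)) / y

y = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
reference = stats.lognorm(s=1).pdf(y)  # scipy's standard log-normal density

print(np.allclose(f_Y(y), reference))  # True
```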

Example: $Y = X^2$, $X \sim N(0, 1)$

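Since $g(x) = x^2$ is not monotone on $\mathbb{R}$, split it into the branches $x > 0$ and $x < 0$, apply the change-of-variables formula on each branch, and add the two contributions. The result is a $\chi^2(1)$ distribution: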

\[ f_Y(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2}, \quad y > 0 \]
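The two-branch computation can be written out as follows. For $y > 0$ the inverse branches are $x = \sqrt{y}$ and $x = -\sqrt{y}$, each with $|dx/dy| = 1/(2\sqrt{y})$. Writing $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ for the $N(0, 1)$ density (the symbol $\phi$ is notation introduced here):

\[ f_Y(y) = \phi(\sqrt{y}) \cdot \frac{1}{2\sqrt{y}} + \phi(-\sqrt{y}) \cdot \frac{1}{2\sqrt{y}} = \frac{1}{\sqrt{y}} \cdot \frac{1}{\sqrt{2\pi}} e^{-y/2} = \frac{1}{\sqrt{2\pi y}} e^{-y/2} \]

The middle step uses the symmetry $\phi(\sqrt{y}) = \phi(-\sqrt{y})$, so the two branches contribute equally.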

Figure 3 shows the simulated histogram of $Y = X^2$ overlaid with the theoretical $\chi^2(1)$ density.

Figure 3: Transformation: $X \sim N(0,1) \to Y = X^2 \sim \chi^2(1)$
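A comparison of this kind might be reproduced roughly as follows. This is only a sketch, not the script behind Figure 3: the seed, the sample size, and the bin edges are assumptions, and the bins start at 0.5 to stay away from the singularity of the density at $y = 0$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
x = rng.standard_normal(100_000)
y = x**2

# Empirical density on a few bins: count / (sample size * bin width).
edges = np.linspace(0.5, 4.0, 8)
counts, _ = np.histogram(y, bins=edges)
empirical = counts / (len(y) * np.diff(edges))

# Average of the theoretical density over each bin, from the chi2(1) CDF.
chi2 = stats.chi2(df=1)
theoretical = np.diff(chi2.cdf(edges)) / np.diff(edges)

max_gap = np.max(np.abs(empirical - theoretical))

# Kolmogorov-Smirnov distance between the sample and the chi2(1) CDF.
ks = stats.kstest(y, chi2.cdf)

print(f"largest bin-density gap = {max_gap:.4f}")
print(f"KS statistic            = {ks.statistic:.4f}")
```

Both numbers should be small for a sample of this size; a large value would suggest that either the simulation or the derived density is wrong.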

Practical Example: Simulation and Verification

How do you check whether a simulation matches a theoretical distribution? Two standard tools:

Histogram overlay: compare the empirical density to the theoretical PDF
QQ plot: plot sample quantiles against theoretical quantiles; if the data follow the target distribution, the points lie close to the 45-degree line

Figure 4 shows a QQ plot for 100,000 samples drawn from $N(0, 1)$. The Kolmogorov–Smirnov test provides a formal check by measuring the largest gap between the empirical CDF of the sample and the theoretical CDF.

Figure 4: QQ plot: 100,000 simulated $N(0,1)$ samples vs. theory
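The numbers behind a QQ plot and the accompanying Kolmogorov–Smirnov test can be computed without any plotting. The sketch below is one possible setup, not the code used for Figure 4: the seed and the grid of probability levels are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
sample = rng.standard_normal(100_000)

# QQ comparison: sample quantiles vs. theoretical N(0, 1) quantiles
# at the same probability levels.
probs = np.linspace(0.01, 0.99, 99)
sample_q = np.quantile(sample, probs)
theory_q = stats.norm.ppf(probs)
max_qq_gap = np.max(np.abs(sample_q - theory_q))

# Kolmogorov-Smirnov test against the N(0, 1) CDF.
ks = stats.kstest(sample, "norm")

print(f"largest quantile gap = {max_qq_gap:.4f}")
print(f"KS statistic = {ks.statistic:.4f}, p-value = {ks.pvalue:.3f}")
```

Plotting `theory_q` against `sample_q` gives the QQ plot itself; points near the line $y = x$ correspond to a small `max_qq_gap`.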

Summary and Connections

This article introduced random variables as the bridge from abstract probability spaces to numerical computation. The key takeaways:

A random variable converts abstract outcomes into numbers you can compute with — it is the reason we can use arithmetic in probability
Discrete random variables have PMFs (bar charts); continuous ones have PDFs (curves where probability = area)
The CDF unifies both cases and fully characterizes a distribution
Transformations follow the change-of-variables formula, where the Jacobian corrects for stretching
Simulation + QQ plots provide a practical way to verify distributional assumptions

Next: Expectation and Variance — once you have a random variable, the natural next question is “what is its average value, and how spread out is it?”

Application preview: In credit risk modeling, a borrower’s default is a Bernoulli random variable: $D \sim \text{Bernoulli}(PD)$, where $PD$ is the probability of default. The number of defaults in a portfolio is Binomial (under independence) or follows more complex distributions when correlation is present. The tools introduced here are the starting point for every risk model.
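As a taste of that application, here is a minimal sketch of the independent-default case. The portfolio size (100 loans) and the default probability (2%) are hypothetical numbers chosen for illustration, and the correlated case mentioned above is not covered.

```python
from scipy import stats

# Hypothetical portfolio: 100 independent loans, each with PD = 2%.
n_loans, prob_default = 100, 0.02

D = stats.bernoulli(prob_default)           # one borrower's default indicator
N = stats.binom(n=n_loans, p=prob_default)  # number of defaults (independence assumed)

print(f"P(D = 1)  = {D.pmf(1):.4f}")
print(f"E[N]      = {N.mean():.2f}")
print(f"P(N >= 5) = {N.sf(4):.4f}")  # sf(4) = P(N > 4) = P(N >= 5)
```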

References

Casella, George, and Roger L. Berger. 2002. Statistical Inference. 2nd ed. Cengage Learning.

Wasserman, Larry. 2004. All of Statistics: A Concise Course in Statistical Inference. Springer. https://doi.org/10.1007/978-0-387-21736-9.
