Statistics for Economists and Intelligent Data Analysts


Designed by Professor Sasha Shapoval
University of Lodz
Review of probabilities

Overview

We recall properties of random variables related to waiting times, Poisson processes, and order statistics. They will be used as soon as we turn to estimation. The end of this part covers the influential Kolmogorov-Smirnov statistic.

Exponential distribution

$\newcommand{\Prob}[1]{\mathsf{P}\{#1\}}$ $\newcommand{\Exp}{\mathbf{E}}$

Exponential waiting time

Let $T$ be the moment of the first success in a sequence of Bernoulli trials with success probability $p_{\delta}$, where the trials occur at the time moments $\delta$, $2\delta$, $\ldots$. Then $\Prob{T > n\delta} = (1-p_{\delta})^n$. The expected number of trials up to and including the first success is $1/p_{\delta}$ (why?). Therefore, the expected waiting time is $\delta/p_{\delta}$. The continuous case is obtained by letting $\delta$ tend to zero. A reasonable assumption on the remaining parameter is that the expected waiting time is preserved: $\delta/p_{\delta} \to a$. Then, with $t = n\delta$ held fixed,

$$ \Prob{T > n\delta} = e^{n\ln(1-p_{\delta})} \sim e^{-np_{\delta}} \to e^{\frac{-n\delta}{a}}. $$

Eventually,

$$ \Prob{T > t} = e^{-t/a}. $$

In other words, the random variable $T$ has the exponential distribution with the mean $a$.
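As an illustration, here is a minimal simulation sketch (the parameter values are assumptions for the example: mean waiting time $a = 2$, step $\delta = 0.01$, tail evaluated at $t = 3$): the discrete waiting time $\delta$ times a geometric number of trials should be close to the exponential distribution with mean $a$.

```python
import numpy as np

rng = np.random.default_rng(0)
a, delta, t = 2.0, 0.01, 3.0
p_delta = delta / a                    # success probability per trial, so that delta / p_delta = a

# waiting time in the discrete model: step length times the number of trials until the first success
T_discrete = delta * rng.geometric(p_delta, size=200_000)

print("discrete model  P{T > t}:", np.mean(T_discrete > t))
print("exponential     P{T > t}:", np.exp(-t / a))
```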


Markovian Property

Let $U(t) = \Prob{T > t}$ be the complementary distribution function. It may be interpreted as the probability, assigned at birth, that the lifetime exceeds $t$. The conditional probability that the residual lifetime exceeds $t$, given that the lifetime exceeds $s$, is $\Prob{T > s+t \mid T > s} = U(s+t) / U(s)$. It coincides with the unconditional probability to live at least $t$ time units if and only if

$$ U(s+t) = U(s)U(t). \quad\quad \textrm{(MARKOV)} $$

Equation (MARKOV) holds for the exponential distribution.

Prove the converse statement: if equation (MARKOV) holds, then the distribution is exponential.

This lack of memory is known as the Markov property of the distribution.
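A minimal check of the lack of memory, assuming (for the example only) mean $a = 1.5$ and offsets $s = 1$, $t = 0.8$: the conditional tail $\Prob{T > s+t \mid T > s}$ estimated on the survivors should agree with the unconditional tail $\Prob{T > t}$.

```python
import numpy as np

rng = np.random.default_rng(1)
a, s, t = 1.5, 1.0, 0.8
T = rng.exponential(scale=a, size=1_000_000)

conditional = np.mean(T[T > s] > s + t)   # estimate of P{T > s + t | T > s}
unconditional = np.mean(T > t)            # estimate of P{T > t}
print(conditional, unconditional, np.exp(-t / a))
```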


Convolution

Let $X$ and $Y$ be independent random variables with cumulative distribution functions $F$ and $G$ and probability densities $f$ and $g$, respectively. Define $Z = X+Y$ and let $H$ be the cumulative distribution function of $Z$. Then

\begin{equation*} H(s) = \Prob { X+Y < s } = \sum_y \Prob{Y\approx y \text{ and } X < s-y} = \int_{-\infty}^{\infty} F(s-y) g(y) dy. \end{equation*}

Differentiating,

$$ h(s) = \int_{-\infty}^{\infty} f(s-y) g(y) dy. $$

The probability density $h$ is the convolution of $f$ and $g$: $h = f*g$.
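A numerical sketch of the convolution formula, assuming (for the example) that $X$ and $Y$ are exponential with rate $a = 1$ and using a grid step of $10^{-3}$: the discretized convolution of $f$ and $g$ is compared with the density of $X+Y$ obtained in the next subsection.

```python
import numpy as np

a, dx = 1.0, 0.001
x = np.arange(0.0, 20.0, dx)
f = a * np.exp(-a * x)                    # density of X
g = a * np.exp(-a * x)                    # density of Y

# h(s) = \int f(s - y) g(y) dy, approximated by a discrete convolution on the grid
h = np.convolve(f, g)[: len(x)] * dx
h_exact = a * (a * x) * np.exp(-a * x)    # the gamma density g_2 from the next subsection

print("max discretization error:", np.max(np.abs(h - h_exact)))
```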


Gamma Distribution
Theorem:
If $X_1$, $\ldots$, $X_n$ are mutually independent random variables with the common exponential distribution of density $g_1(x) = ae^{-ax}$, then their sum $X_1 + \ldots + X_n$ has a density $g_n$ and distribution function $G_n$ given by \begin{equation*} g_n(x) = a \frac{(ax)^{n-1}}{(n-1)!}e^{-ax}, \quad x > 0, \quad\quad \mathrm{(Gamma-Density)} \end{equation*} \begin{equation*} G_n(x) = 1-e^{-ax} \left( 1 + \frac{ax}{1!} + \ldots + \frac{(ax)^{n-1}}{(n-1)!}\right), \quad x > 0. \quad\quad \mathrm{(gamma-cdf)} \end{equation*}
Proof:
Induction. Base case $n=1$: the assertion coincides with the definition of the exponential distribution. Step $n \to n+1$: the density $g_{n+1}(t)$ is given by the convolution: $$ g_{n+1}(t) = \int_0^t g_1(t-x)g_n(x) \, dx = \frac{a^{n}}{(n-1)!} \int_0^t a x^{n-1} e^{-a(t-x)} e^{-ax} \, dx $$ $$ = \frac{a^{n}}{(n-1)!}e^{-at} \int_0^t a x^{n-1} \, dx = a\frac{(at)^{n}}{n!}e^{-at}. $$ Equation (Gamma-Density) is proved. Prove (gamma-cdf) as an exercise (differentiate it and compare with (Gamma-Density)).
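A minimal numerical check of (gamma-cdf), assuming (for the example) rate $a = 2$, $n = 5$ summands, and the point $x = 3$: the empirical distribution function of simulated sums is compared with the explicit formula.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(2)
a, n, x = 2.0, 5, 3.0

# S = X_1 + ... + X_n, each X_i exponential with density a * exp(-a x), i.e. mean 1/a
S = rng.exponential(scale=1 / a, size=(200_000, n)).sum(axis=1)

G_n = 1 - exp(-a * x) * sum((a * x) ** j / factorial(j) for j in range(n))   # (gamma-cdf)
print(np.mean(S <= x), G_n)
```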


Waiting Time and Poisson Distribution

Let $X_1$, $X_2$, $\ldots$ be independent random variables with the common exponential distribution. Put $S_0 = 0$ and \begin{equation*} S_n = X_1 + \ldots + X_n, \quad n = 1, 2, \ldots \end{equation*} We introduce a family of new random variables $N(t)$ as follows: $N(t)$ is the number of indices $k \ge 1$ such that $S_k \le t$. The event $\{N(t) = n\}$ occurs if and only if $S_n \le t$ and $S_{n+1} > t$. Therefore,

$$ \Prob{N(t) = n} = G_{n}(t) - G_{n+1}(t) = \frac{(at)^n}{n!}e^{-at}. $$

(To prove the first equality, note that $\{S_n \le t\}$ is the disjoint union of the events $\{S_n \le t < S_{n+1}\}$ and $\{S_{n+1} \le t\}$.) In words, the random variable $N(t)$ has the Poisson distribution with expectation $at$. To repeat, $N(t)$ is the number of events occurring in the interval $[0, t]$, where the waiting times between events are exponentially distributed.
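A minimal simulation sketch of this connection, with assumed example values: rate $a = 1.5$, horizon $t = 4$, and count $n = 6$; the fraction of realizations with $N(t) = n$ is compared with the Poisson probability $(at)^n e^{-at}/n!$.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(3)
a, t, n, m = 1.5, 4.0, 6, 50_000

counts = np.empty(m, dtype=int)
for i in range(m):
    # S_1, S_2, ...: cumulative sums of exponential waiting times (30 terms are ample for a*t = 6)
    arrivals = np.cumsum(rng.exponential(scale=1 / a, size=30))
    counts[i] = np.searchsorted(arrivals, t, side="right")     # N(t) = number of S_k <= t

print(np.mean(counts == n), (a * t) ** n * exp(-a * t) / factorial(n))
```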


Persistence of bad luck. Records

Let $X_0$ be my waiting time for (or financial loss from) some random event. My friends' results are $X_1$, $X_2$, $\ldots$, where $X_0$, $X_1$, $X_2$, $\ldots$ are independent random variables with a common continuous distribution.

A measure of my bad luck is the time required for my friends to obtain a result worse than mine. Formally, let $N$ be the value of the first subscript $n$ such that $X_n > X_0$. Then the event $\{N > n-1\}$ means that $X_0$ is the maximal term among $X_0$, $X_1$, $X_2$, $\ldots$, $X_{n-1}$. By symmetry, the probability of this event is $n^{-1}$: $\Prob{N > n-1} = 1/n$. As $\{N = n\} = \{N > n-1\} \setminus \{N > n\}$,

$$ \Prob{N = n} = \frac{1}{n} - \frac{1}{n+1} = \frac{1}{n(n+1)}, \quad \Exp N = +\infty. $$
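A minimal simulation sketch of the record index $N$, assuming a standard exponential common distribution for the example (any continuous distribution yields the same law of $N$).

```python
import numpy as np

rng = np.random.default_rng(4)
m = 50_000
N = np.empty(m, dtype=int)
for i in range(m):
    x0 = rng.exponential()               # my result
    n = 1
    while rng.exponential() <= x0:       # friends keep trying until one result exceeds X_0
        n += 1
    N[i] = n

for n in (1, 2, 5):
    print(n, np.mean(N == n), 1 / (n * (n + 1)))
```
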
Persistence of bad luck. Ratios

Let $X$ and $Y$ be two independent random variables with a common exponential distribution; put $Z = Y/X$. We denote by $R$ and $r$ the cdf and pdf of $Z$, respectively.

$$ R(t) = \int_0^{+\infty} \Prob{Y \le tx} ae^{-ax} dx = \int_0^{+\infty} \big(1-e^{-atx}\big) ae^{-ax} dx = \frac{t}{1+t}. $$$$ r(t) = \frac{1}{(1+t)^2}. $$

Note that $\Exp Z = +\infty$.

Task:
Generate samples of values of the random variable $Z$. Compute the mean of each sample. Sketch a graph that displays the dependence of the sample mean on the sample size. Do you observe any regularities in the graph?
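A hedged sketch of this task, assuming unit-rate exponentials and using the running mean of one long sample in place of means of samples of growing size; since $\Exp Z = +\infty$, the running mean keeps jumping instead of settling down.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
n_max = 100_000
Z = rng.exponential(size=n_max) / rng.exponential(size=n_max)   # Z = Y / X

sizes = np.arange(1, n_max + 1)
running_mean = np.cumsum(Z) / sizes      # sample mean of the first k observations, k = 1..n_max

plt.plot(sizes, running_mean)
plt.xscale("log")
plt.xlabel("sample size")
plt.ylabel("sample mean of Z = Y / X")
plt.show()
```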

Waiting Time and Order Statistics

Notation: the tuple $(x_1, x_2, \ldots, x_n)$ arranged in increasing order is written as

$$ (x_{(1)}, x_{(2)}, \ldots, x_{(n)}), \quad\text{where}\quad x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}. $$

Further,

$$ (X_{(1)}, X_{(2)}, \ldots, X_{(n)}), $$

denotes the ordered values of the random variables $(X_1, X_2, \ldots, X_n)$. Let $X_1$, $\ldots$, $X_n$ be independent random variables with the common exponential distribution. Then the event $\{X_{(1)} > t\}$ means that all values are larger than $t$:

$$ \Prob{X_{(1)} > t} = (1-F(t))^n = e^{-nat}. $$

At the epoch $X_{(1)}$, the waiting time for the next event is $X_{(2)}-X_{(1)}$, which is the waiting time for the first of the remaining $n-1$ events. Because of the lack of memory, this waiting time follows the same kind of distribution but with $n-1$ random variables:

$$ \Prob{X_{(2)} - X_{(1)} > t} = (1-F(t))^{n-1} = e^{-(n-1)at}. $$
Theorem:
The cumulative distribution function of the random variable $X_{(k+1)} - X_{(k)}$ is $1-e^{-(n-k)at}$.
Corollary:
$$ \Exp(X_{(\nu)}) = \frac{1}{a} \left( \frac{1}{n} + \frac{1}{n-1} + \ldots + \frac{1}{n-\nu+1}\right). $$
Theorem:
$$ \Prob{X_{(k)} \le t} = \sum_{j=k}^n \binom{n}{j} (1-e^{-at})^j \big(e^{-at}\big)^{n-j}. $$
Proof:
The event $\{X_{(k)} \le t\}$ means that at least $k$ among the $n$ variables $X_j$ are not greater than $t$. This is the event of at least $k$ successes in $n$ independent trials with success probability $1-e^{-at}$.
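A minimal simulation check of the spacing theorem above, with assumed example values: rate $a = 1$, $n = 10$ variables, $k = 3$, and threshold $t = 0.2$; the tail of $X_{(k+1)} - X_{(k)}$ should be $e^{-(n-k)at}$.

```python
import numpy as np

rng = np.random.default_rng(6)
a, n, k, t, m = 1.0, 10, 3, 0.2, 100_000

X = np.sort(rng.exponential(scale=1 / a, size=(m, n)), axis=1)   # rows of ordered samples
spacing = X[:, k] - X[:, k - 1]          # X_(k+1) - X_(k) (columns are 0-based)

print(np.mean(spacing > t), np.exp(-(n - k) * a * t))
```
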
Uniform Distribution and Order Statistics

Let $X_1$, $\ldots$, $X_n$ be independent random variables that are uniformly distributed on $(0, 1)$.

Theorem:
$$ \Prob{X_{(k)} < t} = \sum_{j=k}^n \binom{n}{j} t^{j}(1-t)^{n-j} $$
Proof:
Each term in the sum is the probability of exactly $j$ successes and $n-j$ failures in $n$ Bernoulli trials, where success in trial $i$ is the inequality $X_i < t$. The probability of success is $t$.
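A minimal check of the theorem for the uniform case, with assumed example values $n = 8$, $k = 3$, $t = 0.4$.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(7)
n, k, t, m = 8, 3, 0.4, 200_000

X = np.sort(rng.uniform(size=(m, n)), axis=1)
empirical = np.mean(X[:, k - 1] < t)                                   # estimate of P{X_(k) < t}

binomial_tail = sum(comb(n, j) * t**j * (1 - t) ** (n - j) for j in range(k, n + 1))
print(empirical, binomial_tail)
```
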
Limit Theorems

Verify that $\Exp(X_{(1)}) = 1/(n+1)$. This defines $1/(n+1)$ as a unit of measure. Let $n$ tend to $\infty$. Then

$$ \Prob{n X_{(1)} > t} = \left( 1-\frac{t}{n} \right)^n \to e^{-t}. $$

In words, $nX_{(1)}$ is asymptotically exponentially distributed with expectation $1$, so that $X_{(1)}$ has expectation close to $n^{-1}$. Further exploiting the binomial formula for the distribution of $X_{(k)}$, we get

$$ \Prob{n X_{(2)} > t} = \left( 1-\frac{t}{n} \right)^n +\binom{n}{1}\frac{t}{n} \left( 1-\frac{t}{n} \right)^{n-1} \to e^{-t} + te^{-t}. $$

As $n\to\infty$, the distribution of $nX_{(k)}$ tends to the gamma distribution $G_k$. If $X_1$, $\ldots$, $X_n$ determine a partition of $[0,1]$ into $n+1$ intervals, then $X_{(k)}$ is the sum of the lengths of the first $k$ intervals. Our computation shows that the lengths of the successive intervals of the partition behave in the limit as if they were mutually independent exponentially distributed variables.
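A minimal simulation sketch of the limit for $nX_{(2)}$, assuming (for the example) $n = 200$ uniform variables and the threshold $t = 1.5$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, t = 200, 20_000, 1.5

U = rng.uniform(size=(m, n))
X2 = np.partition(U, 1, axis=1)[:, 1]    # the second smallest value X_(2) in each row

print(np.mean(n * X2 > t), np.exp(-t) + t * np.exp(-t))   # compare with the limit e^{-t} + t e^{-t}
```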

Empirical Distributions and Kolmogorov-Smirnov Test

Definition:
The empirical distribution function $F_n$ of $n$ points $a_1$, $\ldots$, $a_n$ on the line is the step function with jumps $1/n$ at $a_1$, $\ldots$, $a_n$.

The definition yields that $F_n(x; a_1,\ldots,a_n)$ is the fraction of the points $a_1$, $\ldots$, $a_n$ that do not exceed $x$.

Definition:
Given $n$ random variables $X_1$, $X_2$, $\ldots$, $X_n$, their values at a particular point of the sample space form an $n$-tuple of numbers and its empirical distribution function is called the empirical sample distribution.

For each $x$, the value $\mathbf{F}_n(x)$ of the empirical sample distribution defines a new random variable: the fraction of the values of the random variables $X_1$, $X_2$, $\ldots$, $X_n$ that do not exceed $x$: $$ \mathbf{F}_n(x) = \frac{1}{n}\cdot\# \{X_j : X_j \le x, \; j = 1, \ldots, n\}. $$

We will assume today that $X_1$, $X_2$, $\ldots$, $X_n$ are mutually independent random variables with a common continuous distribution $F$. For fixed $x$, the number of variables $X_k$ with $X_k \le x$ has the binomial distribution with success probability $p = F(x)$. Therefore, $n\mathbf{F}_n(x)$ has a binomial distribution, and $\mathbf{F}_n(x)$ takes the values $0$, $1/n$, $\ldots$, $1$.
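A minimal sketch of $\mathbf{F}_n(x)$ as a random variable, with assumed example choices: standard normal $X_1, \ldots, X_n$, $n = 25$, the point $x = 0.3$, and the value $k = 10$; the count $n\mathbf{F}_n(x)$ is binomial with parameters $n$ and $p = F(x)$.

```python
import numpy as np
from scipy.stats import norm, binom

rng = np.random.default_rng(9)
n, x, k, m = 25, 0.3, 10, 100_000

X = rng.standard_normal(size=(m, n))
counts = np.sum(X <= x, axis=1)          # n * F_n(x) for each realization of the sample

p = norm.cdf(x)                          # the success probability F(x)
print(np.mean(counts == k), binom.pmf(k, n, p))
```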

KS statistics

More interesting is the deviation of the whole graph of $\mathbf{F}_n(x)$, which depends on the realization of the random variables, from the graph of $F(x)$. Put

$$ D_n = \sup_{x} |\mathbf{F}_n(x) - F(x)|. $$
Theorem:
Let $F$ be a continuous distribution function. Then the probability distribution of the random variable $D_n$ is independent of $F$.
Proof:
1. $Y_k = F(X_k)$ is uniformly distributed over $[0, 1]$.
Indeed, let $G$ be the distribution function of $Y_k$. Then \[ G(t) = \Prob{Y_k < t} = \Prob{X_k < F^{-1}(t)} = F(F^{-1}(t)) = t. \]
2. Transformation to the uniform distribution.
We denote by $\mathbf{G}_n(t)$ the empirical distribution of the variables $Y_1$, $\ldots$, $Y_n$. As the fraction of the random variables $Y_1$, $\ldots$, $Y_n$ with values less than $t$ equals the fraction of the random variables $X_1$, $\ldots$, $X_n$ with values less than $F^{-1}(t)$, the random variables $\mathbf{G}_n(t)$ and $\mathbf{F}_n(F^{-1}(t))$ are identical. Since $t = F(F^{-1}(t))$, we have $|\mathbf{G}_n(t) - t| = |\mathbf{F}_n(F^{-1}(t)) - F(F^{-1}(t))|$. Finally, with $x = F^{-1}(t)$ running over the whole line as $t$ runs over $(0,1)$, \[ \sup_t |\mathbf{G}_n(t) - t| = \sup_x |\mathbf{F}_n(x) - F(x)| = D_n. \] The left-hand side involves only the uniform variables $Y_k$, so the distribution of $D_n$ does not depend on $F$.
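A minimal sketch computing $D_n$ directly from its definition, assuming (for the example) a standard exponential $F$ and $n = 50$; the value agrees with the statistic reported by scipy's kstest.

```python
import numpy as np
from scipy.stats import expon, kstest

rng = np.random.default_rng(10)
n = 50
X = np.sort(rng.exponential(size=n))

F = expon.cdf(X)                                       # F at the ordered sample points
i = np.arange(1, n + 1)
D_n = np.max(np.maximum(i / n - F, F - (i - 1) / n))   # sup_x |F_n(x) - F(x)| for the step function F_n

print(D_n, kstest(X, expon.cdf).statistic)
```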
KS statistics for the pair of samples
  • Let $X_1$, $\ldots$, $X_n$, $X'_1$, $\ldots$, $X'_n$ be $2n$ mutually independent random variables with the common continuous distribution $F$, and denote the empirical distributions of $(X_1, \ldots, X_n)$ and $(X'_1, \ldots, X'_n)$ by $\mathbf{F}_n$ and $\mathbf{F}'_n$, respectively.

  • Put,

$$ D_{n,n} = \sup_x |\mathbf{F}_n(x) - \mathbf{F}'_n(x)|. $$
  • This is the maximum discrepancy between the two empirical distributions.
  • Like $D_n$, the random variable $D_{n,n}$ has a distribution that does not depend on $F$.
Theorem:
The probability $\Prob{D_{n,n} < r/n}$ equals the probability in a symmetric random walk that a path of length $2n$ starting and terminating at the origin does not reach the points $\pm r$.
Proof:
  • Order the $2n$ variables $X_1$, $\ldots$, $X'_n$ in increasing order
  • and put $\varepsilon_k = 1$ if the $k$th place is occupied by an $X_j$ and $\varepsilon_k = -1$ if the $k$th place is occupied by an $X'_j$.
  • The resulting arrangement contains $n$ plus ones and $n$ minus ones.
  • All these $\binom{2n}{n}$ orderings are equally likely.
  • Therefore, each $(\varepsilon_1,\ldots,\varepsilon_{2n})$ corresponds to a path of length $2n$ starting and terminating at the origin, and vice versa.
  • If $\varepsilon_1 + \ldots + \varepsilon_j = k$, i.e., the first $j$ places contain $(j+k)/2$ elements without primes and $(j-k)/2$ elements with primes, then there is $x$ such that $\mathbf{F}_n(x) = \frac{j+k}{2n}$ and $\mathbf{F}'_n(x) = \frac{j-k}{2n}$.
  • Then $|\mathbf{F}_n(x) - \mathbf{F}'_n(x)| = |k| / n$, and $D_{n,n} \ge |k| / n$.
  • The same arguments in the reverse order complete the proof.
  • As you can see, the probability $\Prob{D_{n,n} < r/n}$ is the probability that a particle starting at the origin returns to the origin at epoch $2n$ without touching $\pm r$.
  • These probabilities are characterized by a typical scale of order $\sqrt{n}$ (we give neither the statement nor the proof here).
  • Eventually, the limiting distributions of $\sqrt{n}\,D_{n,n}$ and $\sqrt{n}\,D_n$ exist; a simulation sketch follows this list.
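A minimal simulation sketch of the theorem, with assumed example values $n = 20$ and $r = 6$ and an exponential $F$: the probability $\Prob{D_{n,n} < r/n}$ is estimated both from pairs of samples and from random-walk paths.

```python
import numpy as np

rng = np.random.default_rng(11)
n, r, m = 20, 6, 20_000

# Two-sample side: D_{n,n} from the empirical distributions of two samples of size n.
hits_samples = 0
for _ in range(m):
    X = np.sort(rng.exponential(size=n))
    Y = np.sort(rng.exponential(size=n))
    grid = np.concatenate([X, Y])                      # the step functions change only at these points
    F_cnt = np.searchsorted(X, grid, side="right")     # n * F_n at the grid points
    G_cnt = np.searchsorted(Y, grid, side="right")     # n * F'_n at the grid points
    if np.max(np.abs(F_cnt - G_cnt)) < r:              # equivalent to D_{n,n} < r / n
        hits_samples += 1

# Random-walk side: paths of length 2n from 0 to 0 that never touch +/- r.
hits_walk = 0
for _ in range(m):
    walk = np.cumsum(rng.permutation(np.repeat([1, -1], n)))
    if np.max(np.abs(walk)) < r:
        hits_walk += 1

print(hits_samples / m, hits_walk / m)
```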

Summary

In this manual, we discussed

  • Exponential distribution
  • Sums of independent exponentially distributed random variables
  • Order statistics for exponential and uniform distributions
  • Empirical distribution functions
  • Kolmogorov-Smirnov statistics