The distribution function completely characterizes a random variable.
Assume that $X$ is a continuous random variable whose distribution function $F_X$ is strictly increasing, so that $F_X^{-1}$ exists, and let $Z = F_X(X)$. If $x \in [0,1]$, the cumulative distribution function of $Z$ is $$ F_Z(x) = P[Z \leq x] = P[F_X(X) \leq x] = P[X \leq F_X^{-1}(x)] = F_X(F_X^{-1}(x)) = x. $$ In other words, the distribution function of a continuous random variable, evaluated at that random variable, follows a $U(0,1)$ distribution.
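The result above can be checked numerically. The following is a minimal sketch, assuming for illustration that $X$ is exponential with rate 1, so that $F_X(x) = 1 - e^{-x}$; the seed and sample size are arbitrary choices, and only Julia's `Random` standard library is used.

```julia
using Random

# Assumption for illustration: X ~ Exponential(1), whose CDF is F(x) = 1 - exp(-x).
F(x) = 1 - exp(-x)

rng = MersenneTwister(2024)   # arbitrary seed, used only for reproducibility
x = randexp(rng, 100_000)     # draws of X
z = map(F, x)                 # Z = F_X(X)

# If Z ~ U(0,1), then E[Z] = 1/2 and P[Z <= q] = q for any q in [0,1].
mean_z = sum(z) / length(z)
frac_below_quarter = count(v -> v <= 0.25, z) / length(z)

println("sample mean of Z      = ", mean_z)
println("estimated P[Z <= 1/4] = ", frac_below_quarter)
```

Both printed values should be close to their uniform counterparts, 0.5 and 0.25, up to Monte Carlo error.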
Assume that we observe $\{x_1,\dots,x_n\}$, and sort them in ascending order. The sorted observations are denoted by $\{x_{(1)},\dots,x_{(n)}\}$. The empirical distribution function is defined as $$ {\hat F_n(x)} = \frac{1}{n} \sum_{i=1}^n I[x_{(i)} \le x], $$ where $$ I[y \le x] = \begin{cases} 1 & \mbox{if } y \le x, \\ 0 & \mbox{otherwise.} \end{cases} $$
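The definition translates almost literally into code. The sketch below is a hand-written version for illustration only; `empirical_cdf` is a hypothetical helper name, and the small sample is chosen arbitrarily. Since the sum does not depend on the order of the observations, no sorting is needed.

```julia
# Hand-written empirical distribution function:
# the fraction of observations that are less than or equal to x.
# `empirical_cdf` is a hypothetical helper, not a library function.
empirical_cdf(sample, x) = count(v -> v <= x, sample) / length(sample)

sample = [1, 3, 5, 6, 6, 21]
for t in (0, 1, 5.5, 6, 21)
    println("F_n(", t, ") = ", empirical_cdf(sample, t))
end
```

The function is a step function that jumps by $1/n$ at each observation (and by $2/n$ at the repeated value 6 in this sample).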
using RandomStreams
using Distributions
const SEED = 12345
seeds = [SEED, SEED, SEED, SEED, SEED, SEED]
gen = MRG32k3aGen(seeds)
unif = next_stream(gen)
n = 10
# Draw n Poisson(10000) variates, rescaled so that they concentrate around 1.
x = [rand(Poisson(10000))/10000 for i in 1:n]
x
We can directly represent the empirical distribution function in Julia using the function `ecdf` from StatsBase, which returns a callable object that we name `ef`.
using StatsBase
ef = ecdf(x)
methods(ef)
We can then evaluate it like any other distribution function.
u = ef(0.99)
Several definitions of the quantile of a sample exist, but all of them are consistent as Monte Carlo estimators of the quantile of the underlying distribution.
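To make the difference between definitions concrete, here is a minimal sketch comparing two of them on a small arbitrary sample. `quantile_inverse_ecdf` is a hypothetical helper that returns the order statistic $x_{(\lceil np \rceil)}$, that is, the generalized inverse of the empirical distribution function. The second value comes from `quantile`, assumed here to be loaded from the `Statistics` standard library (Julia 1.x), which by default interpolates linearly between order statistics.

```julia
using Statistics

# Hypothetical helper: generalized inverse of the empirical distribution
# function, i.e. the order statistic of rank ceil(n*p) (rank 1 when p = 0).
function quantile_inverse_ecdf(sample, p)
    y = sort(sample)
    k = max(1, ceil(Int, length(y) * p))
    return y[k]
end

sample = [1, 3, 5, 6, 6, 21]
q_inv = quantile_inverse_ecdf(sample, 0.5)  # always one of the observations
q_lin = quantile(sample, 0.5)               # linear interpolation between order statistics

println("inverse-ECDF median:  ", q_inv)
println("interpolated median:  ", q_lin)
```

The two estimates differ on this small sample, but the gap between neighbouring order statistics typically shrinks as $n$ grows, which is why the various definitions agree asymptotically.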
y = sort(x)
l = length(y)
m = floor(Int, n*0.45)
y[m]
y[m+1]
y[m+2]
n*0.6
quantile(y,0.45)
ef(quantile(y,0.6))
0.5*(y[m+1]+y[m+2])
n/2.0
methods(quantile)
ef(9)
ef(10)
?ecdf
ef(quantile(y,0.6))
quantile(y,0.6)
# Indexing y[700] requires a sample of at least 700 observations (n = 10 above is too small).
y[700]
ef(y[700])
ceil(n*0.6)
y[ceil(Int, n*0.6)]
?quantile
x = [1, 3, 5, 6, 6, 21]
quantile(x,0.5)