
Analytics

19 March 2015

Sampling from a standard normal distribution: CLT, Box-Muller and rnorm.

Following Exercise 2.3 from Introducing Monte Carlo Methods with R by Robert & Casella.

Given
U_1, U_2, \dots, U_{12} \overset{iid}{\sim} \mathbb{U}[-\tfrac{1}{2}, \tfrac{1}{2}], set Z = \sum_{i=1}^{12}{U_i}.
Then \mathbb{E}[Z] = \mathbb{E}[\sum_{i=1}^{12}{U_i}] = \sum_{i=1}^{12}{\mathbb{E}[U_i]} = \sum_{i=1}^{12}{0} = 0, and \mathbb{V}[Z] = \mathbb{V}[\sum_{i=1}^{12}{U_i}] = \sum_{i=1}^{12}{\mathbb{V}[U_i]} = \sum_{i=1}^{12}{\tfrac{1}{12}} = 1, where the variance of the sum equals the sum of the variances because U_1, U_2, \dots, U_{12} are independent, and each \mathbb{U}[-\tfrac{1}{2}, \tfrac{1}{2}] variable has variance \tfrac{1}{12}. By the CLT, Z is therefore approximately standard normal.
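As a quick sanity check of these moments (a sketch; the sample size here is just for the check, not the n used below):

```r
set.seed(1)
u = runif(10^6, -0.5, 0.5)
mean(u) # should be close to 0
var(u)  # should be close to 1/12, i.e. about 0.0833
```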

We sample n = 10^4 observations in three different ways: appealing to the CLT (see above), using the Box-Muller algorithm, and finally calling R's base rnorm.

set.seed(42)
n = 10^4
k = 12
u = matrix(runif(n*k, -0.5, 0.5), n)
z1 = rowSums(u) # CLT
z2 = sqrt(-2 * log(runif(n))) * cos(2*pi*runif(n)) # Box-Muller (cosine variate of each pair)
z3 = rnorm(n) # R's base rnorm


A quick look reveals that all three samples show the visual characteristics of a standard normal distribution: bell shape, symmetry, and centering at 0. We focus on the tails now.
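A density overlay of the three samples against the exact N(0, 1) density can be produced with something like the following (a sketch reusing z1, z2, z3 from above; the plotting details are illustrative, not the original figure's code):

```r
# Kernel density estimates of the three samples, plus the exact density
plot(density(z1), col = "red", main = "CLT vs Box-Muller vs rnorm",
     xlab = "z", ylim = c(0, 0.45))
lines(density(z2), col = "blue")
lines(density(z3), col = "darkgreen")
curve(dnorm(x), add = TRUE, lty = 2)
legend("topright", legend = c("CLT", "Box-Muller", "rnorm", "N(0, 1)"),
       col = c("red", "blue", "darkgreen", "black"), lty = c(1, 1, 1, 2))
```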


The central values are well aligned, meaning the three algorithms produce similarly distributed samples near the center, but the tails of the Box-Muller sample differ from those of the CLT and rnorm samples.


From the KDE, we can see two things. First, the CLT approximation is wider on [-2, -\tfrac{1}{2}] and [\tfrac{1}{2}, 2], yet it doesn't offer a good approximation at the peak. Second, rnorm produces fatter tails than the other two algorithms.

Still, I think it's worth coming up with additional ways to compare the samples, especially to characterise the tails. Any ideas on how to do that?
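One possibility (a sketch, reusing z1, z2, z3 from above): compare empirical tail quantiles against the theoretical normal quantiles, and inspect QQ-plots. Note that the CLT sum of 12 uniforms is bounded in [-6, 6], so its extreme tails must be lighter than the true normal's.

```r
# Upper-tail quantiles, empirical vs theoretical
p = c(0.99, 0.999, 0.9999)
rbind(CLT       = quantile(z1, p),
      BoxMuller = quantile(z2, p),
      rnorm     = quantile(z3, p),
      theory    = qnorm(p))
qqnorm(z1); qqline(z1) # normal QQ-plot for the CLT sample
```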


Sampling from the uniform and the inverse transform

Following Exercise 2.2 from Introducing Monte Carlo Methods with R by Robert & Casella.

Given

u = F(x) = \frac{1}{1 + e^{\frac{-(x-\mu)}{\beta}}},

we rearrange for x: since 1/u - 1 = e^{-(x-\mu)/\beta}, taking logarithms and solving gives

x = F^{-1}(u) = \mu - \beta \ln(1/u - 1),

with u \sim U(0, 1).

Sampling u from the uniform and applying the inverse transform yields results similar to sampling directly from the logistic distribution via rlogis.

set.seed(42)
antiF <- function(u, mu, beta) {mu - beta * log(1/u - 1)}
n = 10000
mu = 5
beta = 2
s1 = antiF(runif(n), mu, beta)
s2 = rlogis(n, mu, beta)
We visualise the similarity,



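Beyond the visual check, a Kolmogorov-Smirnov test can quantify the agreement (a sketch, reusing s1, s2, mu and beta from above):

```r
ks.test(s1, s2) # two-sample test: inverse transform vs rlogis
ks.test(s1, "plogis", location = mu, scale = beta) # against the theoretical CDF
```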
Neato!

5 March 2015

Welcome!



We're working on our first posts, trying hard to come up with some neat code and shit. Please stand by!