
Analytics

Thursday, January 24, 2013

Sampling a Normal

The normal distribution is one of the most commonly used probability distributions because it turns out that many probabilities in the real world are approximately normal as a result of the central limit theorem. The standard normal distribution has a probability density function of

f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}
For something so common, its density function is not particularly nice to deal with; its cumulative distribution cannot be expressed in terms of elementary functions. This makes it a bit tricky to sample from as well, since the easiest way to sample a random variable is to sample a value from the uniform distribution on [0, 1] and then invert the cumulative distribution function. Fortunately, people have come up with very clever ways of sampling a normal, and I'll provide a derivation of one, called the Box-Muller transform, here.
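To make the inversion technique concrete before applying it, here's a minimal sketch of it for a distribution whose cumulative distribution *can* be inverted in closed form: the exponential distribution with rate \lambda, where F(x) = 1 - e^{-\lambda x} and hence F^{-1}(u) = -\ln(1-u)/\lambda. (The class and method names here are just for illustration.)

```java
import java.util.Random;

public class ExponentialSampler {
    private static final Random RANDOM = new Random();
    private final double lambda;

    public ExponentialSampler(double lambda) {
        this.lambda = lambda;
    }

    public double next() {
        // Draw u uniformly from [0, 1), then invert the CDF.
        // Using 1 - u keeps the argument to log strictly positive.
        double u = RANDOM.nextDouble();
        return -Math.log(1 - u) / lambda;
    }
}
```

The normal distribution resists exactly this approach because F^{-1} has no closed form, which is what motivates the trick below.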

Consider the joint distribution of two independent normal variables:

f(x, y) = f(x)f(y) = \frac{1}{2\pi} e^{-\frac{1}{2}(x^2+y^2)}
This is a distribution across all points in the two-dimensional plane given by their (x, y) coordinates. If we instead consider the polar representation of points, we can transform the given joint distribution via the change of variables formula. Using x = r\cos{\theta} and y = r\sin{\theta} gives (most calculations omitted for brevity)

f(r, \theta) = f(x, y) \left| \begin{array}{cc} \cos{\theta} & -r\sin{\theta} \\ \sin{\theta} & r\cos{\theta} \end{array} \right| = f(x, y) \cdot r = \frac{1}{2\pi} r e^{-\frac{1}{2} r^2}
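For completeness, the omitted calculation is just the four partial derivatives in the Jacobian:

\frac{\partial x}{\partial r} = \cos{\theta}, \quad \frac{\partial x}{\partial \theta} = -r\sin{\theta}, \quad \frac{\partial y}{\partial r} = \sin{\theta}, \quad \frac{\partial y}{\partial \theta} = r\cos{\theta}

so the determinant is r\cos^2{\theta} + r\sin^2{\theta} = r by the Pythagorean identity.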
Integrating out \theta over the range [0, 2\pi) leaves us with a density whose cumulative distribution we can compute in closed form:

f(r) = r e^{-\frac{1}{2} r^2} \Rightarrow F(r) = \int_0^r f(t) dt = \int_0^r t e^{-\frac{1}{2} t^2} dt = 1 - e^{-\frac{1}{2} r^2}
Now we can apply the inversion technique to sample r; let U_1 be a random variable drawn from a uniform distribution on [0, 1]. We can compute r = F^{-1}(1-U_1) = \sqrt{-2\ln{U_1}} (we decide to swap U_1 with 1-U_1 to make the result nicer since the two have the same distribution). Notice that we can sample \theta as well because it's actually uniformly distributed on [0, 2\pi), so if U_2 is another uniform random variable on [0, 1] then we can take \theta = 2 \pi U_2. Putting it all together, we get two (independent) sampled normals
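Explicitly, setting F(r) = 1 - U_1 and solving for r gives

1 - e^{-\frac{1}{2}r^2} = 1 - U_1 \Rightarrow e^{-\frac{1}{2}r^2} = U_1 \Rightarrow r^2 = -2\ln{U_1} \Rightarrow r = \sqrt{-2\ln{U_1}}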

x = r\cos{\theta} = \sqrt{-2\ln{U_1}} \cos{(2 \pi U_2)} \\ y = r\sin{\theta} = \sqrt{-2\ln{U_1}} \sin{(2 \pi U_2)}
To wrap things up and make it a bit more practical, here's a Java snippet of this in action.

import java.util.Random;

public class NormalSampler {
    private static final Random RANDOM = new Random();

    public double next() {
        // nextDouble() returns a value in [0, 1), so use 1 - U1 to keep
        // the argument to log strictly positive; 1 - U1 has the same
        // distribution as U1.
        double U1 = RANDOM.nextDouble(), U2 = RANDOM.nextDouble();
        return Math.sqrt(-2 * Math.log(1 - U1)) * Math.cos(2 * Math.PI * U2);
    }
}
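As a quick sanity check on the formula, we can draw a large number of samples and confirm that the sample mean is near 0 and the sample variance is near 1. This is just an illustrative sketch (the class and method names are mine, not from the post):

```java
import java.util.Random;

public class NormalSamplerCheck {
    // Returns {sampleMean, sampleVariance} over n Box-Muller draws.
    static double[] summarize(int n) {
        Random random = new Random(42); // fixed seed so the check is repeatable
        double sum = 0, sumSq = 0;
        for (int i = 0; i < n; i++) {
            double u1 = random.nextDouble(), u2 = random.nextDouble();
            // Box-Muller; 1 - u1 keeps the log argument strictly positive.
            double x = Math.sqrt(-2 * Math.log(1 - u1)) * Math.cos(2 * Math.PI * u2);
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / n;
        return new double[] { mean, sumSq / n - mean * mean };
    }

    public static void main(String[] args) {
        double[] stats = summarize(1_000_000);
        System.out.printf("mean=%.4f variance=%.4f%n", stats[0], stats[1]);
    }
}
```

With a million draws the standard error of the mean is about 0.001, so both statistics should land very close to their theoretical values.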
