Pseudo-random number sampling

Short description: Generating pseudo-random numbers that follow a probability distribution

Pseudo-random number sampling or non-uniform pseudo-random variate generation is the numerical practice of generating pseudo-random numbers that are distributed according to a given probability distribution.

Methods of sampling a non-uniform distribution are typically based on the availability of a pseudo-random number generator producing numbers X that are uniformly distributed. Computational algorithms are then used to manipulate a single random variate, X, or often several such variates, into a new random variate Y such that these values have the required distribution.

Historically, basic methods of pseudo-random number sampling were developed for Monte-Carlo simulations in the Manhattan project; they were first published by John von Neumann in the early 1950s.[1]

Finite discrete distributions

For a discrete probability distribution with a finite number n of indices at which the probability mass function f takes non-zero values, the basic sampling algorithm is straightforward. The interval [0, 1) is divided in n intervals [0, f(1)), [f(1), f(1) + f(2)), ... The width of interval i equals the probability f(i). One draws a uniformly distributed pseudo-random number X, and searches for the index i of the corresponding interval. The so determined i will have the distribution f(i).

Formalizing this idea becomes easier by using the cumulative distribution function

$\displaystyle{ F(i)=\sum_{j=1}^i f(j). }$

It is convenient to set F(0) = 0. The n intervals are then simply [F(0), F(1)), [F(1), F(2)), ..., [F(n − 1), F(n)). The main computational task is then to determine i for which F(i − 1) ≤ X < F(i).

This can be done by different algorithms:

• Linear search, computational time linear in n.
• Binary search, computational time goes with log n.
• Indexed search,[2] also called the cutpoint method.[3]
• Alias method, computational time is constant, using some pre-computed tables.
• There are other methods that cost constant time.[4]

Continuous distributions

Generic methods for generating independent samples:

Generic methods for generating correlated samples (often necessary for unusually-shaped or high-dimensional distributions):

For generating a normal distribution:

For generating a Poisson distribution:

Software libraries

GNU Scientific Library has a section entitled "Random Number Distributions" with routines for sampling under more than twenty different distributions.

Footnotes

1. Von Neumann, John (1951). "Various Techniques Used in Connection with Random Digits". Monte Carlo Methods. National Bureau of Standards Applied Mathematics Series. 12. US Government Printing Office. pp. 36–38. "Any one who considers arithmetical methods of producing random digits is of course, in a state of sin."  Also online is a low-quality scan of the original publication.
2. Ripley (1987)
3. Fishman (1996)
4. Fishman (1996)