# Pseudo-random number sampling

__: Generating pseudo-random numbers that follow a probability distribution__

**Short description**

**Pseudo-random number sampling** or **non-uniform pseudo-random variate generation** is the numerical practice of generating pseudo-random numbers that are distributed according to a given probability distribution.

Methods of sampling a non-uniform distribution are typically based on the availability of a pseudo-random number generator producing numbers *X* that are uniformly distributed. Computational algorithms are then used to manipulate a single random variate, *X*, or often several such variates, into a new random variate *Y* such that these values have the required distribution.

Historically, basic methods of pseudo-random number sampling were developed for Monte-Carlo simulations in the Manhattan project;^{[citation needed]} they were first published by John von Neumann in the early 1950s.^{[1]}

## Finite discrete distributions

For a discrete probability distribution with a finite number *n* of indices at which the probability mass function *f* takes non-zero values, the basic sampling algorithm is straightforward. The interval [0, 1) is divided in *n* intervals [0, *f*(1)), [*f*(1), *f*(1) + *f*(2)), ... The width of interval *i* equals the probability *f*(*i*).
One draws a uniformly distributed pseudo-random number *X*, and searches for the index *i* of the corresponding interval. The so determined *i* will have the distribution *f*(*i*).

Formalizing this idea becomes easier by using the cumulative distribution function

- [math]\displaystyle{ F(i)=\sum_{j=1}^i f(j). }[/math]

It is convenient to set *F*(0) = 0. The *n* intervals are then simply [*F*(0), *F*(1)), [*F*(1), *F*(2)), ..., [*F*(*n* − 1), *F*(*n*)). The main computational task is then to determine *i* for which *F*(*i* − 1) ≤ *X* < *F*(*i*).

This can be done by different algorithms:

- Linear search, computational time linear in
*n*. - Binary search, computational time goes with log
*n*. - Indexed search,
^{[2]}also called the*cutpoint method*.^{[3]} - Alias method, computational time is constant, using some pre-computed tables.
- There are other methods that cost constant time.
^{[4]}

## Continuous distributions

Generic methods for generating independent samples:

- Rejection sampling for arbitrary density functions
- Inverse transform sampling for distributions whose CDF is known
- Ratio of uniforms, combining a change of variables and rejection sampling
- Slice sampling
- Ziggurat algorithm, for monotonically decreasing density functions as well as symmetric unimodal distributions
- Convolution random number generator, not a sampling method in itself: it describes the use of arithmetics on top of one or more existing sampling methods to generate more involved distributions.

Generic methods for generating correlated samples (often necessary for unusually-shaped or high-dimensional distributions):

- Markov chain Monte Carlo, the general principle
- Metropolis–Hastings algorithm
- Gibbs sampling
- Slice sampling
- Reversible-jump Markov chain Monte Carlo, when the number of dimensions is not fixed (e.g. when estimating a mixture model and simultaneously estimating the number of mixture components)
- Particle filters, when the observed data is connected in a Markov chain and should be processed sequentially

For generating a normal distribution:

For generating a Poisson distribution:

## Software libraries

GNU Scientific Library has a section entitled "Random Number Distributions" with routines for sampling under more than twenty different distributions.

## Footnotes

- ↑ Von Neumann, John (1951). "Various Techniques Used in Connection with Random Digits".
*Monte Carlo Methods*. National Bureau of Standards Applied Mathematics Series.**12**. US Government Printing Office. pp. 36–38. https://mcnp.lanl.gov/pdf_files/nbs_vonneumann.pdf. "Any one who considers arithmetical methods of producing random digits is of course, in a state of sin." Also online is a low-quality scan of the original publication. - ↑ Ripley (1987)
^{[page needed]} - ↑ Fishman (1996)
^{[page needed]} - ↑ Fishman (1996)
^{[page needed]}

## Literature

- Devroye, L. (1986)
*Non-Uniform Random Variate Generation.*New York: Springer - Fishman, G.S. (1996)
*Monte Carlo. Concepts, Algorithms, and Applications.*New York: Springer - Hörmann, W.; J Leydold, G Derflinger (2004,2011)
*Automatic Nonuniform Random Variate Generation.*Berlin: Springer. - Knuth, D.E. (1997)
*The Art of Computer Programming*, Vol. 2*Seminumerical Algorithms*, Chapter 3.4.1 (3rd edition). - Ripley, B.D. (1987)
*Stochastic Simulation*. Wiley.