Xorshift

From HandWiki
Short description: Class of pseudorandom number generators
Example random distribution of Xorshift128

Xorshift random number generators, also called shift-register generators, are a class of pseudorandom number generators that were invented by George Marsaglia.[1] They are a subset of linear-feedback shift registers (LFSRs) which allow a particularly efficient implementation in software without the excessive use of sparse polynomials.[2] They generate the next number in their sequence by repeatedly taking the exclusive or of a number with a bit-shifted version of itself. This makes execution extremely efficient on modern computer architectures, but it does not benefit efficiency in a hardware implementation. Like all LFSRs, the parameters have to be chosen very carefully in order to achieve a long period.[3]

For execution in software, xorshift generators are among the fastest PRNGs, requiring very small code and state. However, they do not pass every statistical test without further refinement. This weakness is amended by combining them with a non-linear function, as described in the original paper. Because plain xorshift generators (without a non-linear step) fail some statistical tests, they have been accused of being unreliable.[3]:360

Example implementation

A C version[lower-alpha 1] of three xorshift algorithms[1]:4,5 is given here. The first has one 32-bit word of state, and period 232−1. The second has one 64-bit word of state and period 264−1. The last one has four 32-bit words of state, and period 2128−1. The 128-bit algorithm passes the diehard tests. However, it fails the MatrixRank and LinearComp tests of the BigCrush test suite from the TestU01 framework.

All use three shifts and three or four exclusive-or operations:

#include <stdint.h>

struct xorshift32_state {
    uint32_t a;
};

/* The state must be initialized to non-zero */
uint32_t xorshift32(struct xorshift32_state *state)
{
	/* Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs" */
	uint32_t x = state->a;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return state->a = x;
}

struct xorshift64_state {
    uint64_t a;
};

uint64_t xorshift64(struct xorshift64_state *state)
{
	uint64_t x = state->a;
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return state->a = x;
}

/* struct xorshift128_state can alternatively be defined as a pair
   of uint64_t or a uint128_t where supported */
struct xorshift128_state {
    uint32_t x[4];
};

/* The state must be initialized to non-zero */
uint32_t xorshift128(struct xorshift128_state *state)
{
	/* Algorithm "xor128" from p. 5 of Marsaglia, "Xorshift RNGs" */
	uint32_t t  = state->x[3];
    
    uint32_t s  = state->x[0];  /* Perform a contrived 32-bit shift. */
	state->x[3] = state->x[2];
	state->x[2] = state->x[1];
	state->x[1] = s;

	t ^= t << 11;
	t ^= t >> 8;
	return state->x[0] = t ^ s ^ (s >> 19);
}

Non-linear variations

All xorshift generators fail some tests in the BigCrush test suite. This is true for all generators based on linear recurrences, such as the Mersenne Twister or WELL. However, it is easy to scramble the output of such generators to improve their quality.

The scramblers known as + and * still leave weakness in the low bits,[4] so they are intended for floating point use, as double-precision floating-point numbers only use 53 bits, so the lower 11 bits are not used. For general purpose, the scrambler ** (pronounced starstar) makes the LFSR generators pass in all bits.

xorwow

Marsaglia suggested scrambling the output by combining it with a simple additive counter modulo 232 (which he calls a "Weyl sequence" after Weyl's equidistribution theorem). This also increases the period by a factor of 232, to 2192−232:

#include <stdint.h>

struct xorwow_state {
    uint32_t x[5];
    uint32_t counter;
};

/* The state array must be initialized to not be all zero in the first four words */
uint32_t xorwow(struct xorwow_state *state)
{
    /* Algorithm "xorwow" from p. 5 of Marsaglia, "Xorshift RNGs" */
    uint32_t t  = state->x[4];
 
    uint32_t s  = state->x[0];  /* Perform a contrived 32-bit shift. */
    state->x[4] = state->x[3];
    state->x[3] = state->x[2];
    state->x[2] = state->x[1];
    state->x[1] = s;
 
    t ^= t >> 2;
    t ^= t << 1;
    t ^= s ^ (s << 4);
    state->x[0] = t;
    state->counter += 362437;
    return t + state->counter;
}

This performs well, but fails a few tests in BigCrush.[5] This generator is the default in Nvidia's CUDA toolkit.[6]

xorshift*

An xorshift* generator applies an invertible multiplication (modulo the word size) as a non-linear transformation to the output of an xorshift generator, as suggested by Marsaglia.[1] All xorshift* generators emit a sequence of values that is equidistributed in the maximum possible dimension (except that they will never output zero for 16 calls, i.e. 128 bytes, in a row).[7]

The following 64-bit generator has a maximal period of 264−1.[7]

#include <stdint.h>

/* xorshift64s, variant A_1(12,25,27) with multiplier M_32 from line 3 of table 5 */
uint64_t xorshift64star(void) {
    /* initial seed must be nonzero, don't use a static variable for the state if multithreaded */
    static uint64_t x = 1;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    return x * 0x2545F4914F6CDD1DULL;
}

The generator fails only the MatrixRank test of BigCrush, however if the generator is modified to return only the high 32 bits, then it passes BigCrush with zero failures.[8]:7 In fact, a reduced version with only 40 bits of internal state passes the suite, suggesting a large safety margin.[8]:19 A similar generator suggested in Numerical Recipes[9] as RanQ1 also fails the BirthdaySpacings test.

Vigna[7] suggests the following xorshift1024* generator with 1024 bits of state and a maximal period of 21024−1; however, it does not always pass BigCrush.[4] xoshiro256** is therefore a much better option.

#include <stdint.h>

/* The state must be seeded so that there is at least one non-zero element in array */
struct xorshift1024s_state {
	uint64_t x[16];
	int index;
};

uint64_t xorshift1024s(struct xorshift1024s_state *state)
{
	int index = state->index;
	uint64_t const s = state->x[index++];
	uint64_t t = state->x[index &= 15];
	t ^= t << 31;		// a
	t ^= t >> 11;		// b  -- Again, the shifts and the multipliers are tunable
	t ^= s ^ (s >> 30);	// c
	state->x[index] = t;
	state->index = index;
	return t * 1181783497276652981ULL;
}

xorshift+

An xorshift+ generator can achieve an order of magnitude fewer failures than Mersenne Twister or WELL. A native C implementation of an xorshift+ generator that passes all tests from the BigCrush suite can typically generate a random number in fewer than 10 clock cycles on x86, thanks to instruction pipelining.[10]

Rather than using multiplication, it is possible to use addition as a faster non-linear transformation. The idea was first proposed by Saito and Matsumoto (also responsible for the Mersenne Twister) in the XSadd generator, which adds two consecutive outputs of an underlying xorshift generator based on 32-bit shifts.[11] However, one disadvantage of adding consecutive outputs is that, while the underlying xorshift128 generator is 2-dimensionally equidistributed, the xorshift128+ generator is only 1-dimensionally equidistributed.[12]

XSadd has some weakness in the low-order bits of its output; it fails several BigCrush tests when the output words are bit-reversed. To correct this problem, Vigna introduced the xorshift+ family,[12] based on 64-bit shifts. xorshift+ generators, even as large as xorshift1024+, exhibit some detectable linearity in the low-order bits of their output;[4] it passes BigCrush, but doesn't when the 32 lowest-order bits are used in reverse order from each 64-bit word.[4] This generator is one of the fastest generators passing BigCrush.[10]

The following xorshift128+ generator uses 128 bits of state and has a maximal period of 2128−1.

#include <stdint.h>

struct xorshift128p_state {
    uint64_t x[2];
};

/* The state must be seeded so that it is not all zero */
uint64_t xorshift128p(struct xorshift128p_state *state)
{
	uint64_t t = state->x[0];
	uint64_t const s = state->x[1];
	state->x[0] = s;
	t ^= t << 23;		// a
	t ^= t >> 18;		// b -- Again, the shifts and the multipliers are tunable
	t ^= s ^ (s >> 5);	// c
	state->x[1] = t;
	return t + s;
}

xoshiro

xoshiro and xoroshiro use rotations in addition to shifts. According to Vigna, they are faster and produce better quality output than xorshift.[13][14]

This class of generator has variants for 32-bit and 64-bit integer and floating point output; for floating point, one takes the upper 53 bits (for binary64) or the upper 23 bits (for binary32), since the upper bits are of better quality than the lower bits in the floating point generators. The algorithms also include a jump function, which sets the state forward by some number of steps – usually a power of two that allows many threads of execution to start at distinct initial states.

For 32-bit output, xoshiro128** and xoshiro128+ are exactly equivalent to xoshiro256** and xoshiro256+, with uint32_t in place of uint64_t, and with different shift/rotate constants.

More recently, the xoshiro++ generators have been made as an alternative to the xoshiro** generators. They are used in some implementations of Fortran compilers such as GNU Fortran, Java, and Julia.[15]

xoshiro256**

xoshiro256** is the family's general-purpose random 64-bit number generator. It is used in GNU Fortran compiler, Lua (as of Lua 5.4), and the .NET framework (as of .NET 6.0).[15]

/*  Adapted from the code included on Sebastiano Vigna's website */

#include <stdint.h>

uint64_t rol64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

struct xoshiro256ss_state {
	uint64_t s[4];
};

uint64_t xoshiro256ss(struct xoshiro256ss_state *state)
{
	uint64_t *s = state->s;
	uint64_t const result = rol64(s[1] * 5, 7) * 9;
	uint64_t const t = s[1] << 17;

	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];

	s[2] ^= t;
	s[3] = rol64(s[3], 45);

	return result;
}

xoshiro256+

xoshiro256+ is approximately 15% faster than xoshiro256**, but the lowest three bits have low linear complexity; therefore, it should be used only for floating point results by extracting the upper 53 bits.

#include <stdint.h>

uint64_t rol64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

struct xoshiro256p_state {
	uint64_t s[4];
};

uint64_t xoshiro256p(struct xoshiro256p_state *state)
{
	uint64_t* s = state->s;
	uint64_t const result = s[0] + s[3];
	uint64_t const t = s[1] << 17;

	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];

	s[2] ^= t;
	s[3] = rol64(s[3], 45);

	return result;
}

xoroshiro

If space is at a premium, xoroshiro128** and xoroshiro128+ are equivalent to xoshiro256** and xoshiro256+. These have smaller state spaces, and thus are less useful for massively parallel programs. xoroshiro128+ also exhibits a mild dependency in the population count, generating a failure after TB of output. The authors do not believe that this can be detected in real world programs.

xoroshiro64** and xoroshiro64* are equivalent to xoroshiro128** and xoroshiro128+. Unlike the xoshiro generators, they are not straightforward ports of their higher-precision counterparts.

Initialization

In the xoshiro paper, it is recommended to initialize the state of the generators using a generator which is radically different from the initialized generators, as well as one which will never give the "all-zero" state; for shift-register generators, this state is impossible to escape from.[14][16] The authors specifically recommend using the SplitMix64 generator, from a 64-bit seed, as follows:

#include <stdint.h>

struct splitmix64_state {
	uint64_t s;
};

uint64_t splitmix64(struct splitmix64_state *state) {
	uint64_t result = (state->s += 0x9E3779B97f4A7C15);
	result = (result ^ (result >> 30)) * 0xBF58476D1CE4E5B9;
	result = (result ^ (result >> 27)) * 0x94D049BB133111EB;
	return result ^ (result >> 31);
}

struct xorshift128_state {
    uint32_t x[4];
};

// one could do the same for any of the other generators
void xorshift128_init(struct xorshift128_state *state, uint64_t seed) {
	struct splitmix64_state smstate = {seed};

	uint64_t tmp = splitmix64(&smstate);
	state->x[0] = (uint32_t)tmp;
	state->x[1] = (uint32_t)(tmp >> 32);

	tmp = splitmix64(&smstate);
	state->x[2] = (uint32_t)tmp;
	state->x[3] = (uint32_t)(tmp >> 32);
}

Notes

  1. In C and most other C-based languages, ^ represents bitwise XOR, and << and >> represent bitwise shifts.

References

  1. 1.0 1.1 1.2 "Xorshift RNGs". Journal of Statistical Software 8 (14). July 2003. doi:10.18637/jss.v008.i14. 
  2. "Note on Marsaglia's Xorshift Random Number Generators". Journal of Statistical Software 11 (5). August 2004. doi:10.18637/jss.v011.i05. 
  3. 3.0 3.1 Panneton, François; L'Ecuyer, Pierre (October 2005). "On the xorshift random number generators". ACM Transactions on Modeling and Computer Simulation 15 (4): 346–361. doi:10.1145/1113316.1113319. https://www.iro.umontreal.ca/~lecuyer/myftp/papers/xorshift.pdf. 
  4. 4.0 4.1 4.2 4.3 Lemire, Daniel; O’Neill, Melissa E. (April 2019). "Xorshift1024*, Xorshift1024+, Xorshift128+ and Xoroshiro128+ Fail Statistical Tests for Linearity". Computational and Applied Mathematics 350: 139–142. doi:10.1016/j.cam.2018.10.019. "We report that these scrambled generators systematically fail Big Crush—specifically the linear-complexity and matrix-rank tests that detect linearity—when taking the 32 lowest-order bits in reverse order from each 64-bit word.". 
  5. Le Floc'h, Fabien (12 January 2011). "XORWOW L'ecuyer TestU01 Results". https://chasethedevil.github.io/post/xorwow-lecuyer-testu01-results/. 
  6. "cuRAND Testing". Nvidia. https://docs.nvidia.com/cuda/curand/testing.html. 
  7. 7.0 7.1 7.2 Vigna, Sebastiano (July 2016). "An experimental exploration of Marsaglia's xorshift generators, scrambled". ACM Transactions on Mathematical Software 42 (4): 30. doi:10.1145/2845077. http://vigna.di.unimi.it/ftp/papers/xorshift.pdf.  Proposes xorshift* generators, adding a final multiplication by a constant.
  8. 8.0 8.1 Template:Cite tech report
  9. "Section 7.1.2.A. 64-bit Xorshift Method". Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. 2007. ISBN 978-0-521-88068-8. http://apps.nrbook.com/empanel/index.html#pg=345. 
  10. 10.0 10.1 Vigna, Sebastiano. "xorshift*/xorshift+ generators and the PRNG shootout". http://prng.di.unimi.it. 
  11. Saito, Mutsuo; Matsumoto, Makoto (2014). "XORSHIFT-ADD (XSadd): A variant of XORSHIFT". http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/XSADD/. 
  12. 12.0 12.1 Vigna, Sebastiano (May 2017). "Further scramblings of Marsaglia's xorshift generators". Journal of Computational and Applied Mathematics 315 (C): 175–181. doi:10.1016/j.cam.2016.11.006. http://vigna.di.unimi.it/ftp/papers/xorshiftplus.pdf.  Describes xorshift+ generators, a generalization of XSadd.
  13. Vigna, Sebastiano. "xoshiro/xoroshiro generators and the PRNG shootout". http://xoroshiro.di.unimi.it. 
  14. 14.0 14.1 Blackman, David; Vigna, Sebastiano (2018). "Scrambled Linear Pseudorandom Number Generators". Data Structures and Algorithms. 
  15. 15.0 15.1 "xoshiro / xoroshiro generators and the PRNG shootout". https://prng.di.unimi.it. 
  16. Matsumoto, Makoto; Wada, Isaku; Kuramoto, Ai; Ashihara, Hyo (September 2007). "Common defects in initialization of pseudorandom number generators". ACM Transactions on Modeling and Computer Simulation 17 (4): 15–es. doi:10.1145/1276927.1276928. 

Further reading