# Generating Uniform Random Numbers on [0,1]

Chris Lomont, March 2017

Common advice for generating uniform random numbers in [0,1] in many languages looks like this [1,2,3]:

1 |
double value = ((double)rand())/(RAND_MAX); |

This however does not generate many possible floating-point numbers in $[0,1]$, leaving gaps – there are many floating-point numbers that should occur but cannot due to this method. To understand why, you need to understand how floating-point numbers are (usually) stored on computers.

Throughout we’ll assume the random number generators produce uniformly random integers in a known range.

## IEEE 754 floating-point format

The most common way to store floating-point numbers is the IEEE 754 format, which has 1 sign bit, $e$ exponent bits, and $m$ mantissa bits. I’ll write $S$ for the sign bit, $E$ for the exponent bit pattern and $M$ for the mantissa bit pattern. For a 64 bit `double`

used in C, C#, JavaScript, and many similar languages, it has this layout:

sign | exponent | mantissa |
---|---|---|

1 bit | 11 bits | 52 bits |

When $E\neq 0$ and $E\neq 2047$, this represents the floating point value $f=(-1)^S \times 2^{E-bias}\times 1.M$.

When $E=0$ and $M=0$, then $f=0$.

These two cases are called *normal* numbers, and most numbers you’ll use fall in this category.

When $E=0$ and $M\neq 0$, then the value $f$ is $f=(-1)^S \times 2^{1-bias}\times 0.M$.

This case is called a *subnormal* number. It lets you use smaller numbers than naively using the above format for all bit patterns. Note the implicit 0 in the mantissa and the 1 in the exponent, instead of 0. These ensure that all values behave nicely.

When $E=2047$ and $M=0$, the bit patterns represents signed $\infty$.

When $E=2047$ and $M\neq0$, the bit pattern represents **NaNs** (Not a Number), used for error conditions.

The *bias* for a 64-bit IEEE 754 floating-point number is 1023, allowing positive and negative exponents. The $1.M$ symbols mean there is an implied 1 bit, then a decimal point, then $M$ is the 52 bits following the decimal. Thus if $M=1010…000_2$, $1.M$ is $1+1\times\frac{1}{2}+0\times\frac{1}{4}+1\times\frac{1}{8}+0…0=1.625$. When a number is stored in this format the hardware determines position of the leading 1 bit, from that computes the exponent bits $E$, and if $E>0$, removes the implicit leading bit, and stores the next 52 bits for $M$. If $E>2046$, then one of the infinities is stored. If $E=0$, then subnormal numbers are stored.

For the rest of this article we’ll use the non-negative floating-point values, i.e., $S=0$.

## The problem

So, now that you understand the format, why is the original code bad? The first problem is `rand()`

in many languages does not generate enough values; in many C environments `RAND_MAX`

was (is?) 32767, meaning there are only 32767 possible output values. This is fine for a few uses, but if you really need a large number of floating-point values with more variety for modern simulations, you need more possible output values.

How many IEEE 754 floating-point values are there in $[0,1]$? For each exponent in $E=0,E=1,…,E=bias-1$ there are $2^{52}$ distinct floating-point values, and the final value $f=1$ is the exponent $E=bias$ with $M=0$. This is $1+1023\times 2^{52}$ which is almost $2^{62}$. So there is no way to generate every possible floating-point number in [0,1] with the appropriate odds using less than a 62 random bits.

So the next attempt might be something like

1 |
double value = (double)Rand64()/(pow(2,64)); |

The problem with this (and all such attempts), is this assumes the possible floating-point values are equally spaced, which they are not. For example, when $E=1$, the lowest bit of $M$ is a change of $2^{-1074}$, whereas when $E=1000$, the lowest bit of $M$ is a change of $2^{-75}$.

Thus all such attempts will fail at hitting all possible floating-point values in $[0,1]$, lowering the quality of the random number generation for demanding simulations.

## The solution

To generate all possible floating-point bit patterns with the appropriate probabilities, consider generating arbitrarily long random bit sequences $0.b_1b_2b_3…$, and rounding to the nearest representable floating-point numbers. This terminates since the smallest possible non-zero positive IEEE 754 64-bit number is $E=0$ and $M=1$, which is $2^{1-1023-51}=2^{-1074}$.

To do rounding, note that the probability of a trailing infinite sequence of 0s or 1s is each zero. (Note this is not the same as saying that it cannot happen; it is saying the probability of the infinite sequence happening is zero, which are two distinct things. As long as your simulation uses a finite number of random values, which seems pretty likely, then this is mathematically correct.).

To generate numbers in $[0,1)$, round down; to generate in $(0,1]$, round up, and for our case $[0,1]$, we’ll round up if the first bit past $M$ is 1, else we’ll round down. This will generate all possible floating-point numbers, with the correct probabilities.

In pseudo code (note this does not work in JavaScript since there is no 64 bit integer type) this is (we assume rand64 returns a uniform random 64 bit integer, over the whole range. Using other sources can be made to fit by concatenating bits as needed.):

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 |
// get exponent and first part of mantissa i32 e = -1 + 64 // want e = -1, -64 happens always below uint64 m = 0 // this must be at least 64 bits do m = rand64() // get 64 random bits e = e - 64 // we used 64 bits if (e < -1074) return 0.0 // will round to 0.0 while m == 0 // if all bits are so far 0, continue till we get a 1 // pos will be the index of the leading 1 in the mantissa i32 pos = 63 // start at highest bit while ((m >> pos) & 1) == 0 pos = pos - 1 pos = 63 - pos // invert for ease of use. pos is now the # of leading 0s if (pos != 0) // we need more bits: shift m, and adjust exponent m = (m << pos) | (rand64() >> (64-pos)) e = e - pos // at this point, e is the exponent value // m has a 1 in the highest bit, then 63 more bits, // enough to pick the proper 52 for the mantissa and round correctly if (e > -1023) // normal numbers m = m >> 10 // shift unneeded bits out, 54 left m = (m >> 1) + (m & 1) // shift and round if ((m>>53) == 1) // rounding caused overflow past implicit leading 1, fix exponent, m fixed below e = e + 1 else // subnormal numbers, want top 52 bits of m i32 d = -1023 - e // amount e is past the smallest e = -1023 m = m >> (11 + d) // want 52 bits left, minus the excess, plus 1 for carry m = (m >> 1) + (m & 1) // shift and round e = -1023 // where we are if ((m>>52) == 1) // rounding caused overflow into implicit leading 0, fix exponent, m fixed below e = e + 1 uint64 mask = 1 // a single one bit mask = (mask << 52) - 1 // done this way to avoid language specific shift issues m = m & mask // mask bits 52-63 out. m now ready // now e is the final exponent and m is the final 52 bit mantissa, with no leading 1 // create final 64 bit representation of the final value int64 fi fi = e + 1023 // exponent fi = fi << 52 // room for m fi = fi | m // final value // language specific conversion trick double f = convert fi // some example ways to convert are // C/C++ : memcpy(&f,&fi,8) // other methods often use undefined behavior // C# : f = BitConverter.Int64BitsToDouble(fi) // Java : f = Double.longBitsToDouble(fi) return f |

Finally, specific language tricks/support allow shortcuts for some of these steps.

## More problems

Do the same analysis for the interval $[a,b]$, $a\leq b$ arbitrary. It’s messy but doable from the techniques explained above.

[2] http://stackoverflow.com/questions/6218399/how-to-generate-a-random-number-between-0-and-1

[3] http://stackoverflow.com/questions/9878965/rand-between-0-and-1