Problems with the random function

mark · Post by **mark** » 13 Mar 2014, 20:17

Hi, I took a quick look at your code and grepped for the usage of rand() and want to give you some insight that I gained recently on my work in the Cockatrice project, which is another opensource game for Win/Linux/BSD/Mac.

1. The standard rand() implementation is not state of the art, which means in short: you can do better. There are faster (like in magnitudes) and "more random" mathematical designs for RNGs. For example, the cockatrice project uses SFMT (http://www.math.sci.hiroshima-u.ac.jp/~ ... index.html), which should be the best implementation out there but even if you didn't want to include another external code, you could switch to C++11 <random> and use the Mersenne Twister RNG with a uniform_distribution<int/float> distribution from there, if you are using C++11 -- I have seen that you have a C++11 thread here but I didn't read it so far (or at least recently).
SFMT is really small btw: add 4 files to your repo and you probably want one little wrapper function and you're ready.

2. There are many pitfalls which the standard developer does not know about, e.g. check http://www.azillionmonkeys.com/qed/random.html or http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx for information: People want a distribution function that translates the rand() random number into a random number of a given interval (e.g. numbers from 1 to 6 for a die roll) but they implement this distribution function with bias without being aware of that so some numbers will have a higher probability than others, which is not what youu want. This happens especially if you use rand() with modulo $value where $value is not a divisor of RAND_MAX.

Now that you are aware, you will also find many google hits discussing this.
The common sense (under people that are aware) is: use rand() only if you absolutely must (e.g. because of portability), otherwise use better RNGs and in any case, check if your distribution function is unbiased.

If you need more information, just ask. I didn't understand at first that there is a problem but after a long discussion in our project and refreshing my stochastics knowledge and fixing our code I can tell that the OpenMW code needs some random() love too

Post by **Zini** » 13 Mar 2014, 20:39

Thanks, but we are well aware of the limitations of our current RNG usage. I deem it acceptable for the time being, but we will move to a C++11 RNG, once we have moved to C++11.

Chris · Post by **Chris** » 13 Mar 2014, 21:11

mark wrote:2. There are many pitfalls which the standard developer does not know about, e.g. check http://www.azillionmonkeys.com/qed/random.html or http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx for information: People want a distribution function that translates the rand() random number into a random number of a given interval (e.g. numbers from 1 to 6 for a die roll) but they implement this distribution function with bias without being aware of that so some numbers will have a higher probability than others, which is not what youu want. This happens especially if you use rand() with modulo $value where $value is not a divisor of RAND_MAX.

We generally use

Code: Select all

int value = int(rand() / (RAND_MAX+1.0) * count);

when we need a random number in [0,count), which should give a correct 'unbiased' result (in as much as rand itself is unbiased).

We should probably create our own random methods anyway, because there's quite a number of places where we do things like the above, or other things like a random number in between [0,1] or [-n,+n]. And it would make it easier to replace the implementation later.

mark · Post by **mark** » 14 Mar 2014, 11:00

We generally use
Code: Select all
int value = int(rand() / (RAND_MAX+1.0) * count);
when we need a random number in [0,count), which should give a correct 'unbiased' result (in as much as rand itself is unbiased).

Should, but doesn't

Here is why: The result from this formula is always skewed unless RAND_MAX is a multiple of count (which varies on whatever count is exactly; it's probably a dynamic range so some times it works, some times it doesn't). Here is the problem:
If you divide the interval from 0 to RAND_MAX (which are RAND_MAX+1 numbers) into piles of size count and map rand()'s output to those piles (which is what your formula does), there are two possibilities:

RAND_MAX+1 is a multiple of count, and the piles come out even. So: you are lucky this time, everything is in equal size, you have a uniform distribution function.
RAND_MAX+1 is not a multiple of count, and the piles come out uneven. So there is a number of piles of equal size and one pile of unequal size. As long as the code is allowed to choose numbers from the small pile, you have no uniform distribution function.

RAND_MAX is often a prime number, test it: cout << RAND_MAX is probably 2147483647 = 2^32 -1, aka the 8th Mersenne prime (but don't rely on that), so if you had one case where your count had to divide a prime value, the function were always skewed.
But in this case RAND_MAX + 1 = 2^32 which means that count must be a power of 2 to make your formula uniform. If you're interested e.g. in numbers 1-6 then you're screwed: 2^32 / 6 = 715827882.667. I hope that MW is no D6 based RPG

If you wanted numbers from 1 to 16 = 2^4 instead then you're lucky because 2^32 / 16 = 268435456; it is equal pile time. Same formula, different quality of results. Get it? This is a small but important detail most programmers don't think about (no criticism intended, I never thought about that either), which is why the internet is full of biased random formulas.
The solution is to ask for random numbers until you get one which is not from the last pile if it is smaller than the others (which is called rejection sampling).

We should probably create our own random methods anyway, because there's quite a number of places where we do things like the above, or other things like a random number in between [0,1] or [-n,+n]. And it would make it easier to replace the implementation later.

exactly. This solves the biased RNG and it is a cleaner approach to have one function instead of writing the same, probably "wrong" formulas everywhere. And you can implement, test and choose whatever RNG you like on one place.
FYI, grep found modulo occurances with rand, the formula above and different looking stuff, so the OpenMW code does not use the above formula generally, it's rather a mix of everything I've seen on google (or maybe the operands were just switched, I don't remember).

We wrote a class where the constructor seeds the RNG and one method delivers a uniformly chosen random number from [min, max]. As said before, we use SFMT instead of rand() and more precisly use the function that returns uint64 random values instead of double values (which is also possible, but I like integer operations better than this unprecise double stuff). If you want you can have a look at our solution, which is well commented:
https://github.com/VanNostrand/Cockatri ... rng_sfmt.h
https://github.com/VanNostrand/Cockatri ... g_sfmt.cpp

In C++11 the code would look like this (and be 2-6x slower than SFMT, but hey... still better than rand()):

Code: Select all

// do some seeding before
...
std::mt19937_64 generator;
std::uniform_int_distribution<int> distribution(min, max);
int my_random_number = distribution(generator);

openmw.org

Problems with the random function

Problems with the random function

Re: Problems with the random function

Re: Problems with the random function

Re: Problems with the random function