Thursday, 5 July 2007

Stochastic Processes and Distributed Computations

The most disturbing problem I am struggling with way too often is related to the way random numbers are generated in the majority of modern computers. As many of you know, those are not really random, and are generated with some kind of a numerical algorithm. Most of the random number generation algorithms rely on something called "random number generator seed", that is a parameter to such an algorithm to start a particular pseudo-random sequence. But beware - the majority of algorithms will spit out exactly the same random sequence when given the same seed. The traditional solution for this problem is to initialise the seed with the current time (when the program is started), something akin to:
gsl_rng_set(r,time(NULL));
if you are using C and GSL.

This, however, will not help if you are running distributed computations on a cluster, and you need to get an independent random sequence for each of the processes. The solution in this case is to generate a unique random seed for each of the processes. Adding the rank of the current job to the current time value works the best for me when using MPI. The solution looks like the following:
gsl_rng_set(r,time(NULL) + MPI::COMM_WORLD.Get_rank());
for C++, and
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
gsl_rng_set(r,time(NULL) + rank);
if you are using C.

If you are not using MPI, but rather run independent jobs through the execution queue, you cannot compute such a rank, but you can use hostname checksum / process id combination instead. It is extremely unlikely (I am speaking about the most UNIX architectures) that two instances of your program will be run on the same host within the same second and have exactly the same process id. If it is the case, you have to think a little, but you already have the general idea.

No comments: