Random Treatment for Random Data

Unlike typical methods, this random number generator algorithm gives you a different set of numbers every time it’s used in a run. It’s also much faster than typical standard library implementations.

By Kirk J Krauss

A discussion delving into construction of skip lists of the sort described in several articles here at developforperformance.com can’t avoid at least a short foray into the concept of randomness. That’s because a garden-variety skip list doesn’t know which kind of data it will be containing. For the sake of keeping things generic, most introductions to skip lists describe setting the level for each entry based on one or more random numbers. A well-arranged list has a majority of entries at level 0, considerably fewer entries at level 1, considerably fewer still at level 2, and so forth up the line. This can be arranged using random control, based on code such as the following that grabs random numbers over and over:

Listing One

#include "windows.h"
#include "stdafx.h"
#include "stdio.h"

// CreateSkiplistEntry()
//  Returns a pointer to a skip list entry, including a number of pointers to 
//  next entries as indicated by the level.
//
//  Every entry has a level.  A search through the skip list involves walking 
//  at a given level until the target is reached, or if the target is missed, 
//  then continuing from the entry before the target, but at the next lower 
//  level.
//
//  If the indicated level is out of range (e.g. -1) then a random level is 
//  indicated.
//
void * SkipList::CreateEntry(int iLevel, unsigned int uSizeOfEntry)
{
    SkipListEntry *pEntry = NULL;
 
    if (iLevel < 0 /* || [CAN ALSO CHECK] iLevel > this->pBaseEntry->Level */)
    {
        // The new entry will have a random level.
        while ((GetTypicallyTrueRandomNumber() % 2))
        {
            if (++iLevel == this->pBaseEntry->Level)
            {
                break;
            }
        }
    }
 
    pEntry = (SkipListEntry *) malloc(uSizeOfEntry);
    pEntry->Level = iLevel;			
    return (void *) pEntry;
}

A custom random number generator routine is invoked here, not only as a randomness enhancement over the standard routines available on modern systems, but also as a performance enhancement. But what’s the performance penalty of getting random numbers by calling those standard routines? The stock random number generator routines available with most C/C++ libraries rely on data stored in thread local storage (TLS) or fiber local storage (FLS). Among other things, these data areas store seed values for the standard library’s random number generator routines srand() and rand().

As operating systems get more sophisticated, standard TLS/FLS accessors are changing. The FLS data area on Windows, for example, now contains encoded pointers for security reasons. The routine that decodes these pointers resides in a system module that must be loaded to access FLS as needed for the standard srand() and rand() implementations. Because Windows contains versionable system modules, this means that on current versions of Windows, a call to rand() ends up deep in system code looking at module versions specified via activation contexts. Stepping into a call to rand() in the debugger generates call chains fifteen routines deep into those security-related functions.

Such a roundabout method for generating random numbers is costly in terms of performance. What’s more, the rand() function doesn’t generate very random numbers; it returns pseudorandom numbers in a sequence that won’t change from run to run. Most random number generators available online also are pseudorandom number generators.

In other words, suppose you write a program that uses the Park-Miller pseudorandom number generator algorithm. And suppose your program “seeds” the algorithm with a constant value. If your program then calls the algorithm over and over without “reseeding” then you get the same output from one run to the next. Not very random, is it?

How to make things more random? By getting the computer to count cosmic rays? That won’t happen without special hardware. But the hardware that most computers already have includes the system clock. A routine that checks the system clock periodically, masking off all but the least significant bits, will generate randomish numbers. On the other hand, checking the system clock many times in quick succession will generate numbers that are all the same, or very close. Not very random, once again.

The code below can generate 16-bit random numbers on Windows, in rapid succession if necessary, without special hardware. Unlike a straight pseudorandom algorithm, this algorithm generates a different set of numbers every time it runs, and it runs much faster than rand() implementations that require TLS or FLS access. It works by treating the system clock value as a seed value for a pseudorandom number generator and by preserving the original clock seed value from one invocation to the next. It doesn’t require FLS access because it stores this seed value as static data (in user module space). If you use this algorithm in a multithreaded scenario, you may need to ensure that a lock is held as the algorithm runs, to preserve the integrity of the static data. If the clock changes substantially between calls, then this algorithm updates the seed value, to maintain real randomness rather than always going for pseudorandom numbers.

Listing Two

// GetTypicallyTrueRandomNumber()
// Returns a 16-bit random number based on the clock.  If the clock hasn't 
// changed since the last time this function was called, returns a pseudo-
// random number based on an algorithm seeded from the clock.  Compiles with 
// Microsoft Visual C++ and outperforms rand() / srand() on Windows.  A 
// Unix/Linux version could call one of the ‹chrono› routines, such as 
// std::chrono::steady_clock::now(), in place of the Windows API calls coded 
// here.
//
#define PM88_CONST 16807   // full-period multiplier		
 
unsigned int GetTypicallyTrueRandomNumber(void)
{
    static unsigned int uSeed;          // Seed value
    unsigned int        uRandom = 0;    // 32 random bits
    FILETIME            ftCreation, ftTossThis1, ftTossThis2, ftUser;
 
    // Get a random number based on time.
    if (GetThreadTimes(GetCurrentThread(), &ftCreation, &ftTossThis1, 
                       &ftTossThis2, &ftUser))
    {
        uRandom = ftCreation.dwLowDateTime - ftUser.dwLowDateTime;
 
        if (!uRandom)
        {
            GetSystemTimeAsFileTime(&ftCreation);
            uRandom = ftCreation.dwLowDateTime;
        }
    }
 
    if (uRandom == uSeed)
    {
        // Get a pseudorandom number based on the last seed value.
        uSeed = (unsigned int) ((uSeed * PM88_CONST) % 0x7fff);
    }
    else
    {
        // Save this seed value for next time.
        uSeed = uRandom;
    }
 
    return (unsigned int) uRandom & 0xffff;
}

This algorithm is useful with skip lists or any other use case where both randomness and performance matter. Finding any other online code that successfully combines a clock-based approach with a pseudorandom approach is hard; at least, I couldn’t find any, when I looked. In the above code, the clock need only change slightly to reseed the pseudorandom portion of the algorithm. You could modify it to ensure that a more substantial clock change has occurred between algorithm invocations, by comparing some threshold value with the absolute value of the difference between the current clock reading and the saved seed value. But for most purposes the output is sufficiently random, just as the algorithm is coded here. A Unix / Linux version of the routine could call clock() or one of the chrono::steady_clock routines where the Windows-specific Get* calls appear in this code.


Copyright © 2018 developforperformance.com.

Develop for Performance