Random Treatment for Random Data

Unlike typical methods, this random number generator algorithm gives you a different set of numbers every time it’s used in a run without requiring a new “seed” value. It’s also much faster than typical C standard library implementations. A JavaScript® version appears here, too.

By Kirk J Krauss

Random number generators serve many purposes, from data organization to cryptographic security, among a wide range of others. A native code method for true random number generation can be useful for scalable data management. A full stack method can help maximize cryptographic security. We’ll consider each of these use cases in turn.

Getting True Random Numbers in Native Code

Skip lists of the sort described in several articles here at developforperformance.com are among the most effective structures for searchability at scale. A discussion delving into construction of skip lists can’t avoid at least a short foray into the concept of randomness. That’s because a garden-variety skip list doesn’t know which kind of data it will be containing. For the sake of keeping things generic, most introductions to skip lists describe setting the level for each entry based on one or more random numbers. A well-arranged list has a majority of entries at level 0, considerably fewer entries at level 1, considerably fewer still at level 2, and so forth up the line. This can be accomplished using random control, based on code such as the following that grabs random numbers over and over:

Listing One

#include ‹windows.h›
#include ‹stdafx.h›
#include ‹stdio.h›

// CreateSkiplistEntry()
//  Returns a pointer to a skip list entry, including a number of pointers to 
//  next entries as indicated by the level.
//
//  Every entry has a level.  A search through the skip list involves walking 
//  at a given level until the target is reached, or if the target is missed, 
//  then continuing from the entry before the target, but at the next lower 
//  level.
//
//  If the indicated level is out of range (e.g. -1) then a random level is 
//  indicated.
//
void * SkipList::CreateEntry(int iLevel, unsigned int uSizeOfEntry)
{
    SkipListEntry *pEntry = NULL;
 
    if (iLevel < 0 /* || [CAN ALSO CHECK] iLevel > this->pBaseEntry->Level */)
    {
        // The new entry will have a random level.
        while ((GetTypicallyTrueRandomNumber() % 2))
        {
            if (++iLevel == this->pBaseEntry->Level)
            {
                break;
            }
        }
    }
 
    pEntry = (SkipListEntry *) malloc(uSizeOfEntry);
    pEntry->Level = iLevel;			
    return (void *) pEntry;
}

A custom random number generator routine is invoked here, not only as a randomness enhancement over the standard routines available on modern systems, but also as a performance enhancement. But what’s the performance penalty of getting random numbers by calling those standard routines? The stock random number generator routines available with most C/C++ libraries rely on data stored in thread local storage (TLS) or fiber local storage (FLS). Among other things, these data areas store seed values for the standard library’s random number generator routines srand() and rand().

As operating systems get more sophisticated, standard TLS/FLS accessors are changing. The FLS data area on Windows, for example, now contains encoded pointers for security reasons. The routine that decodes these pointers resides in a system module that must be loaded to access FLS as needed for the standard srand() and rand() implementations. Because Windows contains versionable system modules, this means that on current versions of Windows, a call to rand() ends up deep in system code looking at module versions specified via activation contexts. Stepping into a call to rand() in the debugger generates call chains fifteen routines deep into those security-related functions.

Such a roundabout method for generating random numbers is costly in terms of performance. What’s more, the rand() function doesn’t generate very random numbers; it returns pseudorandom numbers in a sequence that won’t change from run to run, at least not without a “reseeding” step. Most random number generators available online similarly are in fact pseudorandom number generators.

In other words, suppose you write a program that uses the Park-Miller pseudorandom number generator algorithm. And suppose your program seeds the algorithm with a constant value. If your program then calls the algorithm over and over, then you get just the same output from one run to the next. Not very random, is it?

How to make things more random? How to even come up with an actually random seed value? By getting the computer to count cosmic rays? That won’t happen without special hardware. But the hardware that most computers already have includes the system clock. A routine that checks the system clock periodically, masking off all but the least significant bits, will generate randomish numbers. On the other hand, checking the system clock many times in quick succession will generate numbers that are all the same, or very close. Not very random, once again.

The code below can generate 16-bit random numbers on Windows, in rapid succession if necessary, without special hardware. Unlike a straight pseudorandom algorithm, this algorithm generates a different set of numbers every time it runs, and it runs much faster than rand() implementations that require TLS or FLS access. It works by treating the system clock value as a seed value for a pseudorandom number generator and by preserving the original clock seed value from one invocation to the next. It doesn’t require FLS access because it stores this seed value as static data (in user module space). If you use this algorithm in a multithreaded scenario, you may need to ensure that a lock is held as the algorithm runs, to preserve the integrity of the static data. Whenever the clock changes substantially between calls, this algorithm updates the seed value, to bring in true randomness rather than always going for pseudorandom numbers.

Listing Two

// GetTypicallyTrueRandomNumber()
// Returns a 16-bit random number based on the clock.  If the clock hasn't 
// changed since the last time this function was called, returns a pseudo-
// random number based on an algorithm seeded from the clock.  Compiles with 
// Microsoft Visual C++ and outperforms rand() / srand() on Windows.  A 
// Unix/Linux version could call one of the ‹chrono› routines, such as 
// std::chrono::steady_clock::now(), in place of the Windows API calls coded 
// here.
//
#define PM88_CONST 16807   // full-period multiplier (7 to the 5th)
#define BITNESS 0xffff     // mask for random bits to be returned (16)
 
unsigned int GetTypicallyTrueRandomNumber(void)
{
    static unsigned int uSeed;          // Seed value
    unsigned int        uRandom = 0;    // 32 random bits
    FILETIME            ftCreation, ftTossThis1, ftTossThis2, ftUser;
 
    // Get a random number based on time.
    if (GetThreadTimes(GetCurrentThread(), &ftCreation, &ftTossThis1, 
                       &ftTossThis2, &ftUser))
    {
        uRandom = ftCreation.dwLowDateTime - ftUser.dwLowDateTime;
 
        if (!uRandom)
        {
            GetSystemTimeAsFileTime(&ftCreation);
            uRandom = ftCreation.dwLowDateTime;
        }
    }
 
    if (uRandom - uSeed < BITNESS)
    {
        // Get a pseudorandom number based on the last seed value.
        uSeed = (unsigned int) ((uSeed * PM88_CONST) % BITNESS);
    }
    else
    {
        // Save this seed value for next time.
        uSeed = uRandom;
    }
 
    return (unsigned int) uRandom & BITNESS;
}

This algorithm is useful with skip lists or any other use case where both randomness and performance matter. Finding any other online code that successfully combines a clock-based approach with a pseudorandom approach is hard; at least, I couldn’t find any, when I looked. In the above code, the clock need only change slightly to reseed the pseudorandom portion of the algorithm. You could modify it to ensure that a more substantial clock change has occurred between algorithm invocations, for example by comparing some threshold value with the absolute value of the difference between the current clock reading and the saved seed value. The above code simply applies a mask to make sure the whole 16 bits has ticked over. For most purposes the output is sufficiently random, just as the algorithm is coded here. A Unix / Linux version of the routine could call clock() or one of the chrono::steady_clock routines where the Windows-specific Get* calls appear in this code.

Getting True Random Numbers in JavaScript®

Random numbers are of course needed for more than just native code skip lists. In modern full-stack development, a good random number generator can make a big difference toward securing your systems. True randomness can play a significant role in cryptographic security measures and can be used for seeding popular crypto libraries, such as the OpenSSL library used for transport layer security (TLS, but not the same as the TLS described above). Providing actual random data to initialize cryptographic routines is often considered a best security practice.

The native code techniques described above depend on the system clock and expect it to be ticking over pretty quickly, compared to the time taken by functions that can call for true random numbers. Does JavaScript® provide access to a timer with high enough resolution to behave similarly? With at least some browsers where I’ve tried the following code, I’ve found that repeating calls to performance.now() can get a narrow range of results — in some cases, no change at all — unless there’s some computational, or other, delay between the calls. In those circumstances, trying to get 16-bit random numbers means mostly getting pseudorandom numbers unless I wait an appreciable amount of time for performance.now() to return something reasonably new. This can be true even with the browser settings arranged for the best available performance.now() precision. But some situations call for nothing more than an occasional true random number. For cases where random numbers are needed no more often than once every second or so, performance.now() can dish out apparently random 8-bit numbers. Here’s the code that either will do that, or will (like the above native code) back off to pseudorandom number generation if it’s called in more rapid succession.

Listing Three

// GetTypicallyTrueRandomNumber.js
// Returns an 8-bit random number based on the performance.now() timer.  In
// case the timer hasn't changed significantly since this function was last 
// called, returns a pseudorandom number based on a simple Park-Miller 
// algorithm seeded from the timer.  This code can be extended to return a 
// 12- or 16-bit random number, but the randomness of the results will depend 
// on the timing resolution of performance.now(), which involves such factors 
// as browser security settings and overall timer resolution constraints.  
//
const g_iPM88_Const = 16807; // Full-period multiplier (7 to the 5th)
const g_iBitness = 0xff;     // Can add 1 or 2 f's to get 12- or 16-bit output
var g_iSeed = 0;             // Stores last computed random number between calls
 
function GetTypicallyTrueRandomNumber()
{
    // Get a random integer.
	let iRandom = performance.now();
	iRandom = 100 * iRandom.toFixed(2);

	// Has the timer turned over, w/rt its least significant bits, since we 
	// were last here?
    if (iRandom - g_iSeed < g_iBitness)
    {
        // Get a pseudorandom number based on the last seed value.
        g_iSeed = (g_iSeed * g_iPM88_Const) % g_iBitness;
    }
    else
    {
        // Save this seed value for next time.
        g_iSeed = iRandom;
    }
 
    return iRandom & g_iBitness;
}

How does this code compare, performance-wise, with the built-in Math.random() function? Clearly, the Park-Miller algorithm incorporated in the above code could be swapped for calls into Math.random() and Math.floor(), which ideally will perform as well as Park-Miller and might even be just that. That makes the above code almost equivalent to arranging a call to Math.random() in the event that performance.now() doesn’t have enough resolution to provide random numbers as often as they’re needed. The performance difference between the above code and the built-in functionality comes down to the difference between checking the timer to decide whether to fall back on a pseudorandom return value, and just skipping that check and returning pseudorandom values no matter what.

A modest benefit of open-coding the Park-Miller algorithm above, rather than making that call to Math.random(), is that we get to use the same code to reseed the pseudorandom algorithm as we do to generate true random numbers whenever possible. But what about generating bigger random numbers, i.e. more than 8 bits apiece? What about keeping it as random as possible by waiting long enough to not fall back on pseudorandomness? This next code may not operate in a hurry, but if you know you’ll have the need for a 64-bit random number at some point, you can invoke something like this to make some asynchronous calls to the above routine, far enough in advance to have your big random number ready when you need it.

Listing Four

// GetBigTypicallyTrueRandomNumber.js
// Invokes an 8-bit random number generator at intervals to piece together
// a bigger random number.

// This value represents the number of random bits to be generated.
const g_iRandomBits = 64;

// This string contains the concatenated 8-bit results of multiple calls to 
// GetTypicallyTrueRandomNumber().
var g_strRandomNumberInHex;

// This is the number of actual random bits in the above value.
// It serves as a placeholder as they're being accumulated.
var g_iRandomBitsInString;

// These next values represent a one second interval for timing calls to the 
// 8-bit routine.  You can tweak the timing by adjusting the first value here.
//
const g_iIntervalMs = 1000;
var g_objInterval;

// This event is set up at load time and fires once a big random number is 
// all in place.
var g_eventDone;

// Accessor for DOM elements.
//
var $ = function(id)
{
	return document.getElementById(id);
}

// Calls the random number generator at intervals.
//
function AccumlateRandomBits()
{
	if (g_iRandomBitsInString < g_iRandomBits)
	{
		g_iRandomBitsInString += 8;
		g_strRandomNumberInHex += GetTypicallyTrueRandomNumber().toString(16);
	}
	else
	{
		// Shut down the calls to the random number generator routine.
		clearInterval(g_objInterval);
		document.body.dispatchEvent(g_eventDone);		
	}

	return;
}

// Arranges calls to the random number generator at intervals.
//
function GetBigTypicallyTrueRandomNumber()
{
	// Reset the global values that represent the collected result as well as 
	// the placeholder maintained during the run.
	g_strRandomNumberInHex = "";
	g_iRandomBitsInString = 0;

	// Disable the button that kicked off random number generation, until 
	// we're done generating one.
	$("getrandomnumber").disabled = true;	

	// Let 'em know we're going to take our time.
	$("randomnumber").innerHTML = " &mdash; Running &mdash; ";

	// Start up calls to the random number generator routine.
	g_objInterval = setInterval(AccumlateRandomBits, g_iIntervalMs);
	return;
}

// Displays a random number having the number of bits specified by the global 
// value that's set at TOF.
//
function DisplayRandomNumber()
{
	// Display the big random number.
	let iRandomNumber = parseInt(g_strRandomNumberInHex, 16);
	$("randomnumber").firstChild.nodeValue = iRandomNumber + " (0x" + g_strRandomNumberInHex +")";

	// Done with random number generation.  Enable the button again.
	$("getrandomnumber").disabled = false;	
	return;
}

// Resets the DOM elements that may have been updated via the above code.
//
function ClearForm()
{
	$("randomnumber").firstChild.nodeValue = "";
	return;
}

// Entry point.
//
window.onload = function()
{
	ClearForm();
	$("getrandomnumber").onclick = GetBigTypicallyTrueRandomNumber;
	g_eventDone = new CustomEvent("GotBigRandomNumber");
	document.body.addEventListener("GotBigRandomNumber", DisplayRandomNumber);	
	return;
}

The above code is complete enough to drive an HTML form like the one in Listing Five, below, so you can check out some 64-bit random numbers on-screen. You can set alternate values for g_iRandomBits, such as 32, to get any number of random bits within the valid range of an integer. But in case you change that global value and want to use this HTML with it, you might change the text within the ‹h4› tags to match.

Listing Five

‹!DOCTYPE HTML›
‹html›
‹head›
‹title›GetBigTypicallyTrueRandomNumber.js‹/title›
‹meta charset="UTF-8"›
‹meta name="description" content="Random number generator example"›
‹meta name="keywords" content="HTML,CSS,JavaScript"›
‹meta name="author" content="Kirk J Krauss"›
‹meta name="viewport" content="width=device-width, initial-scale=1.0"›
‹link href="GetBigTypicallyTrueRandomNumber.css" rel="stylesheet" /›
‹script src="GetTypicallyTrueRandomNumber.js"›‹/script›
‹script src="GetBigTypicallyTrueRandomNumber.js"›‹/script›
‹/head›

‹body›
‹main›
‹h4›64-bit random number generator‹/h4›
‹form id="randomgen_form" name="randomgen_form"›
‹input type="button" id="getrandomnumber" value="Get random number" /›
‹br /›‹br /›
‹p id="randomnumber"›&nbsp;‹/p›
‹/form›
‹/main›
‹/body›

‹footer›
‹p›‹small›Sample use case for a typically true random number generator routine.‹/small›‹/p›
‹/footer›
‹/html›

Here’s a bit of CSS code to go with that:

Listing Six

/* GetBigTypicallyTrueRandomNumber.css */
body
{
	font-family: 'Palatino', 'Palatino Linotype', 'serif';	
	font-weight: 200;
	background: #A6BAD8;
	color: #222222;
	padding: 2em;
}

textarea
{
	font-family: 'Palatino', 'Palatino Linotype', 'serif';
	font-weight: 200;
	font-size: 12pt;
	height: 100%;
	width: 100%;
	outline: none;
	background: none;
	border: none;
}

#getrandomnumber
{
	font-family: 'Helvetica Neue', 'Helvetica', 'Arial';
	font-weight: 200;
	font-size: 11pt;
	cursor: pointer;
	position: relative;
	top: 8px;
	left: 20px;
	text-align: center;
}

As a trial run using this example will show, when we’re relying on the timer to get random values, obviously we may not get them so fast as the pseudorandom way. An interpreted code environment limits our ability to do much about that, though we can reduce the g_iIntervalMs delay, to speed things along at risk of introducing some pseudorandomness. Anyway, the added performance cost of capturing true randomness in the smaller amounts of Listing Three is fairly modest, particularly if the occasional 8- or 16-bit random number is all we need. Unless we opt for a native code solution, a mix of randomness like this probably can’t be arranged to happen much faster than we’re doing it here.

Complete source code associated with this article, along with a performance comparison test case against straight Math.random() pseudorandom number generation, is available at GitHub > kirkjkrauss > RandomTreatmentForRandomData.

Develop for Performance