High Performance Debug Logging
By Kirk J Krauss
Mostly, the computerized world is getting faster. We get more bandwidth, more throughput, and more performance as bus and network capacity increases and as more and faster processors keep rolling out. Increasingly sophisticated application programs are built, and debugging them becomes increasingly tricky. Yet we still rely mostly on traditional debugging techniques — and some of those are unfortunately becoming slower.
Debug logging is a time-honored way to find out why software repeatedly gets into an unexpected state. Because software is sophisticated, knowing the right data to log isn’t always straightforward. It can be tempting to log everything that might be relevant, in the hope of collecting the right clues for diagnosing the problem at hand. The sophistication of software is hardly the only barrier to logging all of that data. The security mechanisms put in place to stave off malware, exploits, and just plain mistakes have tended to add overhead to debug logging, even as more data typically needs to be logged.
Windows provides a typical example of the security slowdown affecting modern debug logging. As with other platforms, most Windows native code developers turn to the runtime libraries provided with their tool stacks, to access files for most purposes such as creating a log. In recent versions of Visual Studio, Microsoft has provided secure versions of the C runtime functions that require variable argument lists. The legacy versions are either deprecated or simply reimplemented atop the new, more secure versions.
For example, where you once would have called vfprintf() to write formatted text to a file, you now get the relatively secure vfprintf_s(). How does this help you? Among other things, you get a guarantee that the string you pass in is treated as a NULL-terminated string. This can prevent extraneous, possibly malicious data from being passed along with the string. You also get parameter validation, including validation of the format string (that’s the parameter containing the ‘%’ characters). Applications that rely on these functions are indeed more secure than those that used the older printf() / scanf() family of functions.
There are a couple of downsides to these security measures. One is that if you use them explicitly, they make your Windows code less portable, and that’s probably why their older counterparts have been deprecated but left in place, and/or made to implicitly invoke them. Another is that there’s a performance penalty associated with the added security checking — a penalty you may no longer be able to avoid without abandoning the C runtime functions altogether. If you build your application using Visual Studio 2008 or later, you can find these secure C runtime functions not only underlying many of their older counterparts but also calling deep into Windows code that resolves activation contexts to choose which versions of modules to work with. In other words, those older counterparts, like printf() and scanf(), do these same things too, as of Visual Studio 2008. And if your code still calls these older functions, the compiler will issue warnings stating that they’ve been deprecated.
Since that already makes file I/O via the C runtime functions slower and less portable than it used to be, I set out to pursue a fast solution for debug logging by going as direct-to-disk as possible. I like to debug using logs whenever I need to understand detailed history leading up to a given condition that I can see in the debugger. I don’t need any particular security; this is debugging code. Sometimes I need the logging task to operate fast, if for no other reason than to prevent the overhead of logging from radically changing the behavior of the program I’m trying to debug. I want the fastest logging I can get.
Before I developed the code in the listings below, I already had a function that wrote to a log file using vfprintf(). And I had long since arranged for this logging function to be callable from all sorts of places in special builds of my application. With those builds, I can arrange to enable various types of logging activities. I didn’t want any of that code to change. Instead, I decided to replace the call to vfprintf(), and to go with a faster function that I would write myself.
There’s a fairly straightforward way for an application to bypass the Windows disk buffering mechanism. If you use it, your code can write straight to the sectors on your hard drive. The general technique is documented in a few places like these:
http://msdn.microsoft.com/en-us/library/cc644950(VS.85).aspx
https://support.microsoft.com/en-us/help/99794/info-file-flag-write-through-and-file-flag-no-buffering
The code in Listings 1 and 2 provides an implementation of this direct-to-disk technique. These listings effectively contain high-performance replacements for the C runtime functions fopen() and vfprintf(). Reading the above Microsoft documentation, the most significant tricks I found for achieving high-performance debug logs on Windows are these:
- Create your log file using these flags:
FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING - Create your own buffer to hold a data set that can be written to the file at least one disk sector at a time.
- It’s important that you write an entire sector at once, or multiple entire sectors, when you use the above flags. Otherwise, probably nothing will be written.
The implementation I use for creating the debug log file with the above flags is in Listing 1. The CreateLogFile() function sets things up so that your program will generate a new log file each time the function is called. The Log() function calls the code of Listing 2 to write text to a log file that’s been opened via CreateLogFile().
// Headers for code in these listings
#include ‹windows.h›
#ifdef __STDC__
# include ‹stdarg.h›
#else
# include ‹varargs.h›
#endif
#include ‹io.h>
Listing One
// This code is Windows-specific, but the approach can be used on any platform // that supports unbuffered I/O. // Log() // Interface for logging formatted text. // int Log(HANDLE hLog, const char *szFormat, ...) { va_list args; int len; int iRetVal = 0; static int iNumOperations; va_start(args, szFormat); len = BufferLogSector(hLog, szFormat, args); va_end(args); iNumOperations++; return(iRetVal); } // CreateLogFile() // Sets up a debug log file with a unique name. Bails if we can't // access the path, which can be specified in the environment or // defaults to c:\temp. // HANDLE CreateLogFile(void) { int iLogPathSize; HANDLE hTestLog = INVALID_HANDLE_VALUE; char *szLog, *szBase, *szExt; int i = 0; iLogPathSize = GetEnvironmentVariable("DEBUG_LOG", NULL, 0); if (iLogPathSize == 0) { szLog = strdup("c:\\temp\\debug.log"); iLogPathSize = strlen(szLog); } else { szLog = (char *) alloca(iLogPathSize + 1); GetEnvironmentVariable("DEBUG_LOG", szLog, iLogPathSize); } szBase = alloca(iLogPathSize + 1); szExt = alloca(7); strcpy(szBase, szLog); while (1) { int lastError = GetLastError(); hTestLog = CreateFile(szLog, // name of the write GENERIC_WRITE, // open for writing FILE_SHARE_READ, // share read access NULL, // default security CREATE_NEW, // overwrite existing FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING, NULL); // no attr. template lastError = GetLastError(); if (hTestLog == INVALID_HANDLE_VALUE) { // Found an existing file. Try for a new log file name. // Use the original name as a "base" and tack numbers onto the end. // strcpy(szLog, szBase); _itoa(i++, szExt, 10); strcat(szLog, "."); strcat(szLog, szExt); } else { break; } } if (!hTestLog || hTestLog = INVALID_HANDLE_VALUE) { // Flag a message like ("Cannot open log file %s.nn\n", szLog) } else { // Start this log off right. Log(hDebugLog, "Debug logging:\r\n"); char szWood[] = "bug free!"; // You might want to mention your application's name // here, instead, in case someone stumbles across your // log and wonders what it's about. Also can add an // identifier for a run, a timestamp, etc. Log(hDebugLog, "It's big, it's heavy, it's %s\r\n\r\n", szWood); // Leave the log file open here; it can be closed at end of the run. } return hTestLog; }
To write text to the log file, my application calls the functions in Listing 2. These functions format incoming text similarly to printf() and buffer the result in a memory block big enough to span several disk sectors on current Windows systems. The memory block is filled up to a sector-aligned limit within the block. (I chose a limit at 0x3fff bytes from the start of the block, based on what Microsoft has to say about sector sizes on current systems; I got into enough places where I had to accommodate this limit in subtractions and multiplications that I open-coded my numbers that are based on it, in the example code here.) The function then writes those bytes to disk and copies any text that’s been stored above the limit to the start of the block. After the disk write, subsequent calls to this function continue filling the memory block from there, until the limit is reached again.
Listing Two
// High-performance debug logging // The following functions, BufferLogSector(), Bufprintf(), and // WriteLogSector(), are used to maintain a sector-aligned data area for // debug log data. They're invoked via Log() as follows: // // o Before these functions are called, the log file should have been // created with the flags... // FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING // o Log() is called vfprintf-style, and in turn calls BufferLogSector(). // o The first time BufferLogSector() is called, it sets up a sector-aligned // data area with a lock to serialize access to it. // o Text strings are sequentially stored in the data area via Bufprintf(). // o When the data area fills or overflows, Bufprintf() writes the sector- // aligned data to disk in a single disk write operation via // WriteLogSector(). Any overflow data is left at the base of the data // area. // // Any text remaining in the data area after a Log() call can be written // out as needed, via a call to the FlushLog() function. This is probably // just useful for invoking from the debugger to ensure that the logfile is // complete. // // The static variables below are shared by all of the aforementioned // functions except FlushLog(). Access to them from Log() and its // descendants is serialized by the logFileCS lock initialized in Log(). // If FlushLog() is used programmatically, grab the logFileCS lock in it // too, to keep access to the variables below thread safe. // static bool bLoggingInitialized = false; static CRITICAL_SECTION logFileCS; // Serializes writes to debug log data static char * pszStaticBufBase; // Sector-aligned base of debug log data static char * pszStaticBufLimit; // Sector-aligned limit of debug log data static char * pszStaticBufOverflow; // "Top" of overflow space "above" limit static char * pszStaticBuffer; // Pointer to current text data position // WriteLogSector() // Called when the sector-aligned debug log data area is full, this function // writes the full sector of the data area out to the debug log file. // void WriteLogSector(HANDLE hLog) { int iLastError, iOverflowBytes; int iBytesWritten = 0; WriteFile(hLog, pszStaticBufBase, 0x4000, &iBytesWritten, NULL); iLastError = GetLastError(); // Did any data spill into the overflow area? iOverflowBytes = ((int) ((AddrT) pszStaticBuffer - (AddrT) pszStaticBufLimit)) - 1; if (iOverflowBytes) { // Copy it to the base of the sector-aligned data area. memcpy(pszStaticBufBase, pszStaticBufLimit + 1, iOverflowBytes); } // Set up to continue buffering into the sector-aligned data area. pszStaticBuffer = (char *) ((AddrT) pszStaticBuffer - 0x4000); } // CHAR_BUF and STR_BUF macros // These are used throughout the Bufprintf() function below. // #define CHAR_BUF(c) \ { \ if (pszStaticBuffer <= pszStaticBufOverflow) \ { \ *pszStaticBuffer++ = (c); \ } \ else \ { \ return pszStaticBufOverflow - pszStaticBuffer; \ } \ } #define STR_BUF(s) \ { \ size_t str_size = strlen(s); \ if (str_size < field_width) \ { \ memset(pszStaticBuffer, cLeading, (field_width - str_size)); \ pszStaticBuffer += field_width - str_size; \ } \ if ((pszStaticBuffer + str_size) >= pszStaticBufOverflow) \ { \ strncpy(pszStaticBuffer, (s), (pszStaticBufOverflow - pszStaticBuffer)); \ return (pszStaticBufOverflow - szpszStaticBuffer); \ } \ else \ { \ strcpy(pszStaticBuffer, (s)); \ pszStaticBuffer += str_size; \ } \ } // Bufprintf() // Places formatted text strings into the sector-aligned buffer used for // high-performance debug logging. Writes its content to disk by calling // WriteLogSector() when the buffer overruns. Any overflowed data is moved // to the beginning of the buffer. Subsequent calls to this function keep // filling the buffer beyond its previous, unwritten content. // int Bufprintf(HANDLE hLog, char * szBuffer, const char * szFormat, va_list args) { bool bIsArg64 = false; unsigned int iFieldWidth = 0; char cLeading = ' '; int iRetVal = 0; while (*szFormat) { if (*szFormat != '%') { if (*szFormat == '\n') { CHAR_BUF('\r'); } CHAR_BUF(*szFormat); } else { szFormat++; // Handle padding. if (isdigit((int) *(szFormat))) { if ((char) *(szFormat) == '0') { cLeading = '0'; szFormat++; } while (isdigit((int) *(szFormat))) { iFieldWidth = 10 * iFieldWidth + (int) ((char) *(szFormat) - '0'); szFormat++; } } // Handle the I64 size modifier. if ((*(szFormat) == 'I') && (*(szFormat + 1) == '6') && (*(szFormat + 2) == '4') ) { szFormat += 3; bIsArg64 = true; // I64 => __int64 } switch(*(szFormat)) { case 'g': case 'e': { (void) va_arg(args, double); /* Use up arg */ } // Fall through. case 'c': { char arg = va_arg(args, int); cLeading = ' '; CHAR_BUF(arg); } break; case 'C': { char tmpbuf[16]; wchar_t arg = va_arg(args, wchar_t); WideCharToMultiByte(CP_ACP, 0, &arg, 1, tmpbuf, 16, NULL, NULL); cLeading = ' '; CHAR_BUF(tmpbuf[0]); } break; case 's': { char* arg = va_arg(args, char*); if (arg == NULL) { arg = "‹null›"; } cLeading = ' '; STR_BUF(arg); } break; case 'S': { // Assume a 2048-byte range here. char tmpbuf[2048]; wchar_t *arg = va_arg(args, wchar_t*); if (arg == NULL) { arg = L"‹null›"; } WideCharToMultiByte(CP_ACP, 0, arg, -1, tmpbuf, 2048, NULL, NULL); cLeading = ' '; STR_BUF(tmpbuf); } break; case 'f': // Effectively %10.2f case 'd': case 'u': { char * szDigits = "0123456789"; double fArg; double fValue = 0.0; __int64 iRemainder; unsigned int iTmpDigit; bool bNegative = false; char szBuf[100]; char* pszBuf = &szBuf[100]; *--pszBuf = 0; if (*szFormat == 'f') { if (iFieldWidth) { STR_BUF("[padded float: can't handle] "); } fArg = va_arg(args, double); if (fArg < 0) { fArg = -fArg; bNegative = true; } iRemainder = (long) fArg; // Truncates fValue = fArg - (double) iRemainder; if (fArg > 1e10) { STR_BUF( " [float fArg too long: can't handle] "); break; } } else if (*szFormat == 'd') { if (bIsArg64) { __int64 n = va_arg(args, __int64); if (n < 0) { bNegative = true; n = - n; } iRemainder = n; bIsArg64 = false; } else { int n = va_arg(args, int); if (n < 0) { bNegative = true; n = - n; } iRemainder = (__int64) n; } } else // *szFormat == 'u' { if (bIsArg64) { iRemainder = va_arg(args, unsigned __int64); bIsArg64 = false; } else { iRemainder = (__int64) va_arg(args, unsigned); } } do { iTmpDigit = (unsigned int) iRemainder % 10; *--pszBuf = szDigits[iTmpDigit]; iRemainder = iRemainder / 10; } while (iRemainder); if (bNegative) { *--pszBuf = '-'; } STR_BUF(pszBuf); if (*szFormat == 'f') { CHAR_BUF('.'); CHAR_BUF(szDigits[(long)(fValue *= 10) % 10]); CHAR_BUF(szDigits[(long)(fValue *= 10) % 10]); } } break; case 'p': #if defined(PLATFORM64) bIsArg64 = true; #endif case 'x': { char* szDigits = "0123456789abcdef"; unsigned __int64 iRemainder; unsigned int iValue; char szBuf[100]; char* pszBuf = &szBuf[100]; *--pszBuf = 0; if (bIsArg64) { iRemainder = va_arg(args, unsigned __int64); bIsArg64 = false; } else { iRemainder = (unsigned __int64) va_arg(args, unsigned int); } do { iValue = (unsigned int) iRemainder % 16; *--pszBuf = szDigits[iValue]; iRemainder = iRemainder / 16; } while (iRemainder); STR_BUF(pszBuf); } break; case '%': CHAR_BUF(*szFormat); break; default: // Unsupported type. // Buffer an error message, and go on to any further args. { int n = va_arg(args, int); } STR_BUF(" [complex printf arg: can't handle] "); CHAR_BUF('%'); CHAR_BUF(*szFormat); CHAR_BUF(']'); break; } } szFormat++; if (pszStaticBuffer > pszStaticBufLimit) { WriteLogSector(hLog); } } CHAR_BUF(0); iRetVal = pszStaticBuffer - szBuffer; return iRetVal; } // BufferLogSector() // Maintains the sector-aligned data area used for debug logging. // Stores the log data in this area, to achieve fast writes that bypass // other buffering / caching provided by the system. // int BufferLogSector(HANDLE hLog, const char * szFormat, va_list args) { int iRetVal = 0; char * szBuffer = pszStaticBuffer; if (hLog && szFormat) { if (!bLoggingInitialized) { InitializeCriticalSection(&logFileCS); bLoggingInitialized = true; } EnterCriticalSection(&logFileCS); if (!pszStaticBuffer) { // Create the sector-aligned data area in memory, to be filled // from the base to the limit. // pszStaticBuffer = VirtualAlloc(NULL, 0x7fff, MEM_COMMIT, PAGE_READWRITE); pszStaticBufBase = szBuffer = pszStaticBuffer; pszStaticBufLimit = pszStaticBuffer + 0x3fff; // Everything from the base to the limit will get written to the // disk at once. The rest will get moved "down" to the base // afterwards. // pszStaticBufOverflow = pszStaticBuffer + 0x7fff; } iRetVal = Bufprintf(hLog, szBuffer, szFormat, args); LeaveCriticalSection(&logFileCS); } return iRetVal; }
When you run with debug logging using the code in Listings 1 and 2, you’ll find that the debug log appears to be 0 bytes while your application is running. That’s because the file system doesn’t update itself with knowledge that bytes have been written using the fast-acting scheme you’ve set up. But you can still dump the file or read it into an editor while the application runs.
When your program exits, or when you attach a debugger and want to see the whole log including the text that’s still in the buffer, you can arrange to call the function in Listing 3. This function flushes any buffered text from the memory block to the disk. Since the buffer presumably won’t be full when this function is called, the unused space is filled with NULLs, up to the limit described above. You should avoid calling this function on a routine basis during the run, since these NULLs get written to the log file, making it bigger than necessary. In other words, if you stop in the debugger, the code in Listing 3 is necessary only when there’s critical information still in the buffer. If you need that information, you can use the Set Next Statement debugging feature to execute the code of Listing 3.
Listing Three
// FlushLog() // Writes buffered data maintained by Log() to the next sector(s) in the log // file. The buffer's data area is written to disk at once. Unused bytes // are filled with NULLs before the write. // // Writing NULLs can be minimized by determining the disk's sector size and // writing just the number of sectors needed to store the buffered data. // This example code assumes a sector size that's equal to or a multiple of // the sector size of any formatted disk, including disks on Win64 systems // with large volume sizes according to MSDN. // int FlushLog(HANDLE hLog) { int iBytesToWrite = 0x4000; int iBytesWritten; if ((pszStaticBuffer - pszStaticBufBase) > iBytesToWrite) { iBytesToWrite = 0x8000; // Assume (log entry size) < (sector size). } memset(pszStaticBuffer, /* fill with NULLs */ 0, (iBytesToWrite - (pszStaticBuffer - pszStaticBufBase) - 1); WriteFile(hLog, pszStaticBufBase, iBytesToWrite, &iBytesWritten, NULL); if (iBytesWritten) { FlushFileBuffers(hLog); memset(pszStaticBufBase, 0, iBytesWritten); pszStaticBuffer = pszStaticBufBase; } return iBytesWritten; }
Using this approach to high-performance debug logging, I was able to generate debug logs in the hundreds of megabytes while running a multithreaded MFC application on Windows 7, without noticeably slowing down the application compared to running it without debug logging. Then I was able to use the logs to diagnose threading problems that didn’t manifest with my old logging implementation based on vfprintf() calls. Buffered I/O is possible on other platforms, such as Linux, where it may also become necessary as built-in security checks make their way across development environments and platforms.
Copyright © 2018 developforperformance.com.