Matching Wildcards in JavaScript®
By Kirk J Krauss
Regular expressions can be tricky to use — and sometimes even the decision to work with them can be tricky. On one hand, they offer a wide range of character pattern matching features, including support for matching wildcards. On the other hand, they require a syntax of their own that may not be entirely consistent from one platform (such as Notepad++, where you can easily play around with them) to another (such as your favorite environment for executing JavaScript®). Nevertheless, they’re among JavaScript’s standard built-in objects, made available through the RegExp() functionality. Since they’re built-in, we might as well just figure them into our code wherever we need to match wildcards or other character patterns, right?
In some cases, it makes perfect sense to apply the flexibility of regular expressions. But considering the relative simplicity of algorithms dedicated to wildcard pattern matching developed, typically, for use with native code, could it be that for JavaScript, too, a dedicated wildcard matching routine might be helpful? Since JavaScript is ordinarily handled as interpreted code, how does an algorithm open-coded in it fare, performance-wise, against the built-in RegExp()? Is it even straightforward to port a native code algorithm for matching wildcards to JavaScript? A few tests can shed light toward answering these questions.
An Implementation Ported from Native Code
As it turns out, converting ordinary character array handling logic from C/C++ to JavaScript® is about as simple as can be. JavaScript’s operators and expressions, after all, are derived from old-fashioned C. You don’t get to do a NULL check for the end of a string, but a check against JavaScript’s undefined property works about the same.
There are many carefully thought out algorithms for matching wildcards. One of them, which I’ve developed based on experience testing and optimizing these kinds of algorithms over several years, is already coded for C/C++ in two versions. There’s a first version based on pointer manipulation for native performance, and a second version based on relatively portable character array handling. Starting with that portable version, a serviceable JavaScript implementation involves only a few significant changes. The result appears in Listing One.
Listing One
// FastWildCompare.js // JavaScript(R) version of FastWildCompare(). // // Copyright 2018 Zettalocity Software. This is a Derivative Work based // on material that is copyright 2018 IBM Corporation and available at // // http://developforperformance.com/MatchingWildcards_AnImprovedAlgorithmForBigData.html // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // // Compares two text strings. Accepts '?' as a single-character wildcard. // For each '*' wildcard, seeks out a matching sequence of any characters // beyond it. Otherwise compares the strings a character at a time. // function FastWildCompare(strWild, strTame) { let iWild = 0; // Index for both tame and wild strings in upper loop let iTame; // Index for tame string, set going into lower loop let iWildSequence; // Index for prospective match after '*' (wild string) let iTameSequence; // Index for prospective match (tame string) // Find a first wildcard, if one exists, and the beginning of any // prospectively matching sequence after it. do { // Check for the end from the start. Get out fast, if possible. if (strTame[iWild] == undefined) { if (strWild[iWild] != undefined) { while (strWild[iWild++] == '*') { if (strWild[iWild] == undefined) { return true; // "ab" matches "ab*". } } return false; // "abcd" doesn't match "abc". } else { return true; // "abc" matches "abc". } } else if (strWild[iWild] == '*') { // Got wild: set up for the second loop and skip on down there. iTame = iWild; while (strWild[++iWild] == '*') { continue; } if (strWild[iWild] == undefined) { return true; // "abc*" matches "abcd". } // Search for the next prospective match. if (strWild[iWild] != '?') { while (strWild[iWild] != strTame[iTame]) { if (strTame[++iTame] == undefined) { return false; // "a*bc" doesn't match "ab". } } } // Keep fallback positions for retry in case of incomplete match. iWildSequence = iWild; iTameSequence = iTame; break; } else if (strWild[iWild] != strTame[iWild] && strWild[iWild] != '?') { return false; // "abc" doesn't match "abd". } ++iWild; // Everything's a match, so far. } while (true); // Find any further wildcards and any further matching sequences. do { if (strWild[iWild] == '*') { // Got wild again. while (strWild[++iWild] == '*') { continue; } if (strWild[iWild] == undefined) { return true; // "ab*c*" matches "abcd". } if (strTame[iTame] == undefined) { return false; // "*bcd*" doesn't match "abc". } // Search for the next prospective match. if (strWild[iWild] != '?') { while (strWild[iWild] != strTame[iTame]) { if (strTame[++iTame] == undefined) { return false; // "a*b*c" doesn't match "ab". } } } // Keep the new fallback positions. iWildSequence = iWild; iTameSequence = iTame; } else if (strWild[iWild] != strTame[iTame] && strWild[iWild] != '?') { // The equivalent portion of the upper loop is really simple. if (strTame[iTame] == undefined) { return false; // "*bcd" doesn't match "abc". } // A fine time for questions. while (strWild[iWildSequence] == '?') { ++iWildSequence; ++iTameSequence; } iWild = iWildSequence; // Fall back, but never so far again. while (strWild[iWild] != strTame[++iTameSequence]) { if (strTame[iTameSequence] == undefined) { return false; // "*a*b" doesn't match "ac". } } iTame = iTameSequence; } // Another check for the end, at the end. if (strTame[iTame] == undefined) { if (strWild[iWild] == undefined) { return true; // "*bc" matches "abc". } else { return false; // "*bc" doesn't match "abcd". } } ++iWild; // Everything's still a match. ++iTame; } while (true); }
The same code, but with shortened variable names and without the comments and white space, can be useful to load and run in minimal time in an interpreted JavaScript environment. Here’s the listing.
Listing Two
// FastWildCompare_min.js // Compares two text strings, the first of which may contain wildcards ('*' or '?'). // Copyright 2018 Zettalocity Software. // This is a Derivative Work, licensed under the Apache 2.0 license. // See comments for FastWildCompare.js, the human-readable form of this code. function FastWildCompare(w, t) { let i = 0;let j;let k;let l; do{if (t[i] == undefined){if (w[i] != undefined){while (w[i++] == '*')if (w[i] == undefined)return true;return false;}else return true;}else if (w[i] == '*'){j = i;while (w[++i] == '*')continue;if (w[i] == undefined)return true;if (w[i] != '?'){while (w[i] != t[j])if (t[++j] == undefined)return false;}k = i;l = j;break;}else if (w[i] != t[i] && w[i] != '?')return false;++i;} while (true); do{if (w[i] == '*'){while (w[++i] == '*')continue;if (w[i] == undefined)return true;if (t[j] == undefined)return false;if (w[i] != '?'){while (w[i] != t[j])if (t[++j] == undefined)return false;}k = i;l = j;}else if (w[i] != t[j] && w[i] != '?'){if (t[j] == undefined)return false;while (w[k] == '?'){++k;++l;}i = k;while (w[i] != t[++l])if (t[l] == undefined)return false;j = l;}if (t[j] == undefined){if (w[i] == undefined)return true;else return false;}++i;++j;} while (true); }
To try out this code via an HTML form, you can set up that form to have input fields for both a plain text (“tame”) string and a wildcard-containing (“wild”) string. Here’s a small example.
Listing Three
‹!DOCTYPE HTML› ‹!-- FastWildCompare.html --› ‹!-- Copyright 2018 Zettalocity Software. --› ‹!-- Licensed under the Apache License, Version 2.0 (the "License"); --› ‹!-- you may not use this file except in compliance with the License. --› ‹!-- You may obtain a copy of the License at --› ‹!-- http://www.apache.org/licenses/LICENSE-2.0 --› ‹!-- Unless required by applicable law or agreed to in writing, software --› ‹!-- distributed under the License is distributed on an "AS IS" BASIS, --› ‹!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --› ‹!-- See the License for the specific language governing permissions and --› ‹!-- limitations under the License. --› ‹html› ‹head› ‹title›FastWildCompare.js‹/title› ‹meta charset="UTF-8"› ‹meta name="description" content="Example code for matching wildcards"› ‹meta name="keywords" content="HTML,CSS,JavaScript"› ‹meta name="author" content="Kirk J Krauss"› ‹meta name="viewport" content="width=device-width, initial-scale=1.0"› ‹link href="FastWildCompare.css" rel="stylesheet" /› ‹script src="FastWildCompare_min.js"›‹/script› ‹script src="FastWildCompareForm.js"›‹/script› ‹/head› ‹body› ‹main› ‹h4›FastWildCompare.js — Matching wildcards example‹/h4› ‹form id="compare_form" name="compare_form"› ‹label for="tame"›Tame text: ‹/label› ‹input type="text" id="tame" name="tame" /› ‹label for="wild"›Wild text: ‹/label› ‹input type="text" id="wild" name="wild" /› ‹br /›‹br /› ‹input type="button" id="compare" value="Compare" /› ‹input type="button" id="clear" value="Clear" /› ‹br /›‹br /› ‹label for="result"›Match result: ‹/label› ‹p id="result"› ‹/p› ‹br /›‹br /› ‹/form› ‹/main› ‹/body› ‹footer› ‹p›‹small›Input a pair of text strings, one of which may contain wildcards (‘*’ or ‘?’).‹/small›‹/p› ‹/footer› ‹/html›
The CSS file referenced in the HTML code above can be as simple as this:
Listing Four
/* FastWildCompare.css */ body { font-family: 'Palatino', 'Palatino Linotype', 'serif'; font-weight: 200; background: #A6BAD8; color: #222222; padding: 2em; } textarea { font-family: 'Palatino', 'Palatino Linotype', 'serif'; font-weight: 200; font-size: 12pt; height: 100%; width: 100%; outline: none; background: none; border: none; } #compare { font-family: 'Helvetica Neue', 'Helvetica', 'Arial'; font-weight: 200; font-size: 11pt; cursor: pointer; position: relative; top: 8px; left: 20px; text-align: center; } #clear { font-family: 'Helvetica Neue', 'Helvetica', 'Arial'; font-weight: 200; font-size: 11pt; cursor: pointer; position: relative; top: 8px; left: 20px; text-align: center; }
The form can be driven by JavaScript code that invokes our ported routine for matching wildcards.
Listing Five
// FastWildCompareForm.js // Copyright 2018 Zettalocity Software. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // Accessor for DOM elements. // var $ = function(id) { return document.getElementById(id); } // Resets the DOM elements that may get updated via the code below. // function ClearForm() { $("wild").value = ""; $("tame").value = ""; $("result").firstChild.nodeValue = "" $("tame").focus(); return; } // Takes text strings from the two input fields. // Passes them to FastWildCompare() to find out whether there's a match. // function ValidateAndCompareInput() { var strWild = $("wild").value; var strTame = $("tame").value; var strMessage = ""; // Validation: Make sure we have two input strings. if ((typeof strWild == 'string' || strWild instanceof String) && (typeof strTame == 'string' || strTame instanceof String)) { let bMatch = FastWildCompare(strWild, strTame); if (bMatch) { $("result").firstChild.nodeValue = "Match."; } else { $("result").firstChild.nodeValue = "No match."; } } else { $("result").firstChild.nodeValue = "Please enter 'tame' and 'wild' text strings."; } return; } // Entry point. // window.onload = function() { ClearForm(); // Connect the buttons to their respective routines. $("compare").onclick = ValidateAndCompareInput; $("clear").onclick = ClearForm; // Get the Enter key to effectively click the "Compare" button. var fieldTame = $("tame"); var fieldWild = $("wild"); fieldTame.addEventListener("keyup", function(event) { if (event.keyCode === 13) { $("compare").click(); } }); fieldWild.addEventListener("keyup", function(event) { if (event.keyCode === 13) { $("compare").click(); } }); return; }
Now, by navigating your browser to the HTML file, you can compare plain text and wildcard text:
An Implementation Using Regular Expressions
A useful aspect of the code we’ve just discussed is that it doesn’t require the user to understand regular expressions. To be sure, the arsenal of special character syntax that you’d have available, in case the “Wild text” field were to support them all, could confound a typical user. That’s because special characters such as ‘$’ or ‘+’ modify the meaning of pattern matching using regular expressions. In fact, the ‘?’ character has an entirely different interpretation in the world of regular expressions than it means in conventional wildcard semantics, where it matches any single character in the “tame” string.
Because ‘*’ wildcard pattern matching is just one of the many features of regular expression pattern matching, it’s possible to simply rule out all the other possibilities with a bit of string manipulation. That way, the RegExp() function can be used to drive a user-friendly wildcard handler, like the kind discussed above, in only a few lines of code. It’s already been done. Don McCurdy’s wildcard-to-regexp.js routine, available at gist.github.com › donmccurdy/wildcard-to-regexp.js, provides the logic needed to escape the special characters of a regular expression, other than the ‘*’wildcard character. The only modification needed to make that code usable in a general-purpose routine for matching wildcards is to add a call to the test() method of the resulting RegExp object. Here’s the complete code, including that small addition:
Listing Six
// RegExWildCompare.js // A routine for matching wildcards based on JavaScript's built-in regular // expression handler. // // Copyright 2018 Zettalocity Software. This is a Derivative Work based // on material that is copyright 2018 Don McCurdy and available at // // https://gist.github.com/donmccurdy // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // // For each '*' wildcard, seeks out a matching sequence of any characters // beyond it. Otherwise compares the strings a character at a time. // Any characters that would be specially treated as regular expression // matching patterns are preserved based on code derived from // donmccurdy/wildcard-to-regexp.js, at // // https://gist.github.com/donmccurdy/6d073ce2c6f3951312dfa45da14a420f // function EscapeTameRegExChars(strWild) { return strWild.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&'); } // Creates a RegExp from the given string, converting asterisks to .* // expressions, and escaping all other characters. // function RegExToWild(strInput) { return new RegExp('^' + strInput.split(/\*+/).map(EscapeTameRegExChars).join('.*') + '$'); } // Compares two text strings. For each '*' wildcard, seeks out a matching // sequence of any characters beyond it. Otherwise compares the strings // character by character. // function RegExWildCompare(strWild, strTame) { let strEscWild = new RegExp('^' + strWild.split(/\*+/).map(EscapeTameRegExChars).join('.*') + '$'); return strEscWild.test(strTame); }
As with the FastWildCompare() routine we’ve ported from C/C++ to JavaScript, we can create a fast-loading version of the above code with comments removed, variable names shortened, and white space minimized.
Listing Seven
// RegExWildCompare_min.js // Compares two text strings, the first of which may contain wildcards ('*'). // Copyright 2018 Zettalocity Software. // This is a Derivative Work, licensed under the Apache 2.0 license. // See comments for RegExWildCompare.js, the human-readable form of this code. function EscapeTameRegExChars(w){ return w.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&'); } function RegExToWild(strInput){ return new RegExp('^' + strInput.split(/\*+/).map(EscapeTameRegExChars).join('.*') + '$'); } function RegExWildCompare(w, t){ let strEscWild = new RegExp('^' + w.split(/\*+/).map(EscapeTameRegExChars).join('.*') + '$');return strEscWild.test(t); }
The above JavaScript code can be used in an HTML form similar to the one we created previously.
Listing Eight
‹!DOCTYPE HTML› ‹!-- RegExWildCompare.html --› ‹!-- Copyright 2018 Zettalocity Software. --› ‹!-- Licensed under the Apache License, Version 2.0 (the "License"); --› ‹!-- you may not use this file except in compliance with the License. --› ‹!-- You may obtain a copy of the License at --› ‹!-- http://www.apache.org/licenses/LICENSE-2.0 --› ‹!-- Unless required by applicable law or agreed to in writing, software --› ‹!-- distributed under the License is distributed on an "AS IS" BASIS, --› ‹!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --› ‹!-- See the License for the specific language governing permissions and --› ‹!-- limitations under the License. --› ‹html› ‹head› ‹title›RegExWildCompare.js‹/title› ‹meta charset="UTF-8"› ‹meta name="description" content="Example code for matching wildcards"› ‹meta name="keywords" content="HTML,CSS,JavaScript"› ‹meta name="author" content="Kirk J Krauss"› ‹meta name="viewport" content="width=device-width, initial-scale=1.0"› ‹link href="RegExWildCompare.css" rel="stylesheet" /› ‹script src="RegExWildCompare_min.js"›‹/script› ‹/head› ‹body› ‹main› ‹h4›FastWildCompare.js — Matching wildcards example‹/h4› ‹form id="compare_form" name="compare_form"› ‹label for="tame"›Tame text: ‹/label› ‹input type="text" id="tame" name="tame" /› ‹label for="wild"›Wild text: ‹/label› ‹input type="text" id="wild" name="wild" /› ‹br /›‹br /› ‹input type="button" id="compare" value="Compare" /› ‹input type="button" id="clear" value="Clear" /› ‹br /›‹br /› ‹label for="result"›Match result: ‹/label› ‹p id="result"› ‹/p› ‹br /›‹br /› ‹/form› ‹/main› ‹/body› ‹footer› ‹p›‹small›Input a pair of text strings, one of which may contain wildcards (‘*’ or ‘?’).‹/small›‹/p› ‹/footer› ‹/html›
The CSS file to go with this can include code identical to what’s in Listing Four above (i.e. the same as FastWildCompare.css). The JavaScript code that manages the form also can be fairly similar; the only change needed is in the ValidateAndCompareInput() routine of Listing Five, which will call RegExWildCompare() instead of FastWildCompare().
Listing Nine
// Takes text strings from the two input fields. // Passes them to RegExWildCompare() to find out whether there's a match. // function ValidateAndCompareInput() { var strWild = $("wild").value; var strTame = $("tame").value; var strMessage = ""; // Validation: Make sure we have two input strings. if ((typeof strWild == 'string' || strWild instanceof String) && (typeof strTame == 'string' || strTame instanceof String)) { let bMatch = RegExWildCompare(strWild, strTame); if (bMatch) { $("result").firstChild.nodeValue = "Match."; } else { $("result").firstChild.nodeValue = "No match."; } } else { $("result").firstChild.nodeValue = "Please enter 'tame' and 'wild' text strings."; } return; }
So with a very small program based on the regular expression functionality that comes with JavaScript, we can do pattern matching with ‘*’ wildcards.
Now that we have two different ways to implement matching wildcards — one ported to JavaScript from native code, and another that uses the built-in regular expression functionality — how to choose among them? Clearly, if we want to support ‘?’ wildcard matching using regular expression methods, we’ll need to extend the character replacement logic of Listings Six and Seven. But that would add some performance overhead and code complexity. Is there a reason to do that, given that our ported routine already supports ‘?’ wildcards?
As with the port of the native code wildcard matching algorithm itself, a port of a set of routines for testing it isn’t hard to arrange. A set of correctness and performance comparison tests that I’ve put together with the input of many other developers can be augmented with HTML form handling similar to what’s coded above. So the tests can be run at the touch of a “Run test set” button, in any modern browser.
Here’s JavaScript code, ported from that native code test set, that’ll do both correctness and performance testing of the two routines for matching wildcards coded above. You can substitute any two similar routines, to compare their performance. Or you can set the bComparePerformance value to false, to check the correctness of just one routine based on a fairly comprehensive suite of tests for matching wildcards. Because our RegExWildCompare() routine doesn’t support ‘?’ wildcards, all of the correctness tests for single-character wildcard matching are left out in the bComparePerformance mode.
Listing Ten
// FastWildCompareTest.js // This is a set of performance comparison and correctness tests, for // routines for matching wildcards. // Copyright 2018 Zettalocity Software. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // These values set up specific sets of tests for the "Run test set" button. // const bCompareWild = true; const bCompareTame = true; const bCompareEmpty = true; // When this value is true, it sets up the run to compare the performance of // two routines for matching wildcards. Otherwise, just one routine is // tested, for correctness. // const bComparePerformance = true; // This next value sets a number of repetitions for performance comparison, // but applies only if the above Boolean value is true. Choose about as many // repetitions as you're expecting in the real world. // const nComparisonReps = 100; // These values store performance comparison results. // // To get meaningful results, be sure to disable any "reduced timer precision" // environment feature (e.g. privacy.reduceTimerPrecision in the about:config // settings in FireFox(R)). // var tf_cumulative; var tr_cumulative; // Accessor for DOM elements. // var $ = function(id) { return document.getElementById(id); } // This function compares a tame/wild string pair via each algorithm. // function Confirm(strTame, strWild, bExpectedResult) { let bPassed = true; let tf_delta = performance.now(); if (bExpectedResult != FastWildCompare(strWild, strTame)) { bPassed = false; } tf_cumulative += (performance.now() - tf_delta); if (bComparePerformance) { let tr_delta = performance.now(); if (bExpectedResult != RegExWildCompare(strWild, strTame)) { bPassed = false; } tr_cumulative += (performance.now() - tr_delta); } return bPassed; } // A set of wildcard comparison tests. // function TestWild() { let nReps = 1; let bAllPassed = true; if (bComparePerformance) { nReps = nComparisonReps; } while (nReps--) { // Case with first wildcard after total match. bAllPassed &= Confirm("Hi", "Hi*", true); // Case with mismatch after '*' bAllPassed &= Confirm("abc", "ab*d", false); // Cases with repeating character sequences. bAllPassed &= Confirm("abcccd", "*ccd", true); bAllPassed &= Confirm("mississipissippi", "*issip*ss*", true); bAllPassed &= Confirm("xxxx*zzzzzzzzy*f", "xxxx*zzy*fffff", false); bAllPassed &= Confirm("xxxx*zzzzzzzzy*f", "xxx*zzy*f", true); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "xxxx*zzy*fffff", false); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "xxxx*zzy*f", true); bAllPassed &= Confirm("xyxyxyzyxyz", "xy*z*xyz", true); bAllPassed &= Confirm("mississippi", "*sip*", true); bAllPassed &= Confirm("xyxyxyxyz", "xy*xyz", true); bAllPassed &= Confirm("mississippi", "mi*sip*", true); bAllPassed &= Confirm("ababac", "*abac*", true); bAllPassed &= Confirm("ababac", "*abac*", true); bAllPassed &= Confirm("aaazz", "a*zz*", true); bAllPassed &= Confirm("a12b12", "*12*23", false); bAllPassed &= Confirm("a12b12", "a12b", false); bAllPassed &= Confirm("a12b12", "*12*12*", true); // RegExWildCompare doesn't handle '?' wildcards. if (!bComparePerformance) { // From DDJ reader Andy Belf: a case of repeating text matching // the different kinds of wildcards in order of '*' and then '?'. bAllPassed &= Confirm("caaab", "*a?b", true); // This similar case was found, probably independently, by Dogan // Kurt. bAllPassed &= Confirm("aaaaa", "*aa?", true); } // Additional cases where the '*' char appears in the tame string. bAllPassed &= Confirm("*", "*", true); bAllPassed &= Confirm("a*abab", "a*b", true); bAllPassed &= Confirm("a*r", "a*", true); bAllPassed &= Confirm("a*ar", "a*aar", false); // More double wildcard scenarios. bAllPassed &= Confirm("XYXYXYZYXYz", "XY*Z*XYz", true); bAllPassed &= Confirm("missisSIPpi", "*SIP*", true); bAllPassed &= Confirm("mississipPI", "*issip*PI", true); bAllPassed &= Confirm("xyxyxyxyz", "xy*xyz", true); bAllPassed &= Confirm("miSsissippi", "mi*sip*", true); bAllPassed &= Confirm("miSsissippi", "mi*Sip*", false); bAllPassed &= Confirm("abAbac", "*Abac*", true); bAllPassed &= Confirm("abAbac", "*Abac*", true); bAllPassed &= Confirm("aAazz", "a*zz*", true); bAllPassed &= Confirm("A12b12", "*12*23", false); bAllPassed &= Confirm("a12B12", "*12*12*", true); bAllPassed &= Confirm("oWn", "*oWn*", true); // Completely tame (no wildcards) cases. bAllPassed &= Confirm("bLah", "bLah", true); bAllPassed &= Confirm("bLah", "bLaH", false); if (!bComparePerformance) { // Simple mixed wildcard tests suggested by Marlin Deckert. bAllPassed &= Confirm("a", "*?", true); bAllPassed &= Confirm("ab", "*?", true); bAllPassed &= Confirm("abc", "*?", true); // More mixed wildcard tests including coverage for false // positives. bAllPassed &= Confirm("a", "??", false); bAllPassed &= Confirm("ab", "?*?", true); bAllPassed &= Confirm("ab", "*?*?*", true); bAllPassed &= Confirm("abc", "?**?*?", true); bAllPassed &= Confirm("abc", "?**?*&?", false); bAllPassed &= Confirm("abcd", "?b*??", true); bAllPassed &= Confirm("abcd", "?a*??", false); bAllPassed &= Confirm("abcd", "?**?c?", true); bAllPassed &= Confirm("abcd", "?**?d?", false); bAllPassed &= Confirm("abcde", "?*b*?*d*?", true); // Single-character-match cases. bAllPassed &= Confirm("bLah", "bL?h", true); bAllPassed &= Confirm("bLaaa", "bLa?", false); bAllPassed &= Confirm("bLah", "bLa?", true); bAllPassed &= Confirm("bLaH", "?Lah", false); bAllPassed &= Confirm("bLaH", "?LaH", true); } // Many-wildcard scenarios. bAllPassed &= Confirm("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab", "a*a*a*a*a*a*aa*aaa*a*a*b", true); bAllPassed &= Confirm("abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", "*a*b*ba*ca*a*aa*aaa*fa*ga*b*", true); bAllPassed &= Confirm("abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", "*a*b*ba*ca*a*x*aaa*fa*ga*b*", false); bAllPassed &= Confirm("abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", "*a*b*ba*ca*aaaa*fa*ga*gggg*b*", false); bAllPassed &= Confirm("abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", "*a*b*ba*ca*aaaa*fa*ga*ggg*b*", true); bAllPassed &= Confirm("aaabbaabbaab", "*aabbaa*a*", true); bAllPassed &= Confirm("a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*", "a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*", true); bAllPassed &= Confirm("aaaaaaaaaaaaaaaaa", "*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*", true); bAllPassed &= Confirm("aaaaaaaaaaaaaaaa", "*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*", false); bAllPassed &= Confirm("abc*abcd*abcde*abcdef*abcdefg*abcdefgh*abcdefghi*abcdefghij*abcdefghijk*abcdefghijkl*abcdefghijklm*abcdefghijklmn", "abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*", false); bAllPassed &= Confirm("abc*abcd*abcde*abcdef*abcdefg*abcdefgh*abcdefghi*abcdefghij*abcdefghijk*abcdefghijkl*abcdefghijklm*abcdefghijklmn", "abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*", true); bAllPassed &= Confirm("abc*abcd*abcd*abc*abcd", "abc*abc*abc*abc*abc", false); bAllPassed &= Confirm( "abc*abcd*abcd*abc*abcd*abcd*abc*abcd*abc*abc*abcd", "abc*abc*abc*abc*abc*abc*abc*abc*abc*abc*abcd", true); bAllPassed &= Confirm("abc", "********a********b********c********", true); bAllPassed &= Confirm("********a********b********c********", "abc", false); bAllPassed &= Confirm("abc", "********a********b********b********", false); bAllPassed &= Confirm("*abc*", "***a*b*c***", true); // A case-insensitive algorithm test. // bAllPassed &= Confirm("mississippi", "*issip*PI", true); if (!bComparePerformance) { // Tests suggested by other DDJ readers. bAllPassed &= Confirm("", "?", false); bAllPassed &= Confirm("", "*?", false); bAllPassed &= Confirm("", "", true); bAllPassed &= Confirm("a", "", false); } } return bAllPassed; } // A set of tests with no '*' wildcards. // function TestTame() { let nReps = 1; let bAllPassed = true; if (bComparePerformance) { nReps = nComparisonReps; } while (nReps--) { // Case with last character mismatch. bAllPassed &= Confirm("abc", "abd", false); // Cases with repeating character sequences. bAllPassed &= Confirm("abcccd", "abcccd", true); bAllPassed &= Confirm("mississipissippi", "mississipissippi", true); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "xxxxzzzzzzzzyfffff", false); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "xxxxzzzzzzzzyf", true); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "xxxxzzy.fffff", false); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "xxxxzzzzzzzzyf", true); bAllPassed &= Confirm("xyxyxyzyxyz", "xyxyxyzyxyz", true); bAllPassed &= Confirm("mississippi", "mississippi", true); bAllPassed &= Confirm("xyxyxyxyz", "xyxyxyxyz", true); bAllPassed &= Confirm("m ississippi", "m ississippi", true); bAllPassed &= Confirm("ababac", "ababac?", false); bAllPassed &= Confirm("dababac", "ababac", false); bAllPassed &= Confirm("aaazz", "aaazz", true); bAllPassed &= Confirm("a12b12", "1212", false); bAllPassed &= Confirm("a12b12", "a12b", false); bAllPassed &= Confirm("a12b12", "a12b12", true); // A mix of cases. bAllPassed &= Confirm("n", "n", true); bAllPassed &= Confirm("aabab", "aabab", true); bAllPassed &= Confirm("ar", "ar", true); bAllPassed &= Confirm("aar", "aaar", false); bAllPassed &= Confirm("XYXYXYZYXYz", "XYXYXYZYXYz", true); bAllPassed &= Confirm("missisSIPpi", "missisSIPpi", true); bAllPassed &= Confirm("mississipPI", "mississipPI", true); bAllPassed &= Confirm("xyxyxyxyz", "xyxyxyxyz", true); bAllPassed &= Confirm("miSsissippi", "miSsissippi", true); bAllPassed &= Confirm("miSsissippi", "miSsisSippi", false); bAllPassed &= Confirm("abAbac", "abAbac", true); bAllPassed &= Confirm("abAbac", "abAbac", true); bAllPassed &= Confirm("aAazz", "aAazz", true); bAllPassed &= Confirm("A12b12", "A12b123", false); bAllPassed &= Confirm("a12B12", "a12B12", true); bAllPassed &= Confirm("oWn", "oWn", true); bAllPassed &= Confirm("bLah", "bLah", true); bAllPassed &= Confirm("bLah", "bLaH", false); // RegExWildCompare doesn't handle '?' wildcards. if (!bComparePerformance) { // Single '?' cases. bAllPassed &= Confirm("a", "a", true); bAllPassed &= Confirm("ab", "a?", true); bAllPassed &= Confirm("abc", "ab?", true); // Mixed '?' cases. bAllPassed &= Confirm("a", "??", false); bAllPassed &= Confirm("ab", "??", true); bAllPassed &= Confirm("abc", "???", true); bAllPassed &= Confirm("abcd", "????", true); bAllPassed &= Confirm("abc", "????", false); bAllPassed &= Confirm("abcd", "?b??", true); bAllPassed &= Confirm("abcd", "?a??", false); bAllPassed &= Confirm("abcd", "??c?", true); bAllPassed &= Confirm("abcd", "??d?", false); bAllPassed &= Confirm("abcde", "?b?d*?", true); } // Longer string scenarios. bAllPassed &= Confirm("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab", true); bAllPassed &= Confirm("abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", "abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", true); bAllPassed &= Confirm("abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", "abababababababababababababababababababaacacacacacacacadaeafagahaiajaxalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", false); bAllPassed &= Confirm("abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", "abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaggggagaaaaaaaab", false); bAllPassed &= Confirm("abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", "abababababababababababababababababababaacacacacacacacadaeafagahaiajakalaaaaaaaaaaaaaaaaaffafagaagggagaaaaaaaab", true); bAllPassed &= Confirm("aaabbaabbaab", "aaabbaabbaab", true); bAllPassed &= Confirm("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", true); bAllPassed &= Confirm("aaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaa", true); bAllPassed &= Confirm("aaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaa", false); bAllPassed &= Confirm("abcabcdabcdeabcdefabcdefgabcdefghabcdefghiabcdefghijabcdefghijkabcdefghijklabcdefghijklmabcdefghijklmn", "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc", false); bAllPassed &= Confirm("abcabcdabcdeabcdefabcdefgabcdefghabcdefghiabcdefghijabcdefghijkabcdefghijklabcdefghijklmabcdefghijklmn", "abcabcdabcdeabcdefabcdefgabcdefghabcdefghiabcdefghijabcdefghijkabcdefghijklabcdefghijklmabcdefghijklmn", true); if (!bComparePerformance) { bAllPassed &= Confirm("abcabcdabcdabcabcd", "abcabc?abcabcabc", false); bAllPassed &= Confirm( "abcabcdabcdabcabcdabcdabcabcdabcabcabcd", "abcabc?abc?abcabc?abc?abc?bc?abc?bc?bcd", true); bAllPassed &= Confirm("?abc?", "?abc?", true); } } return bAllPassed; } // A set of tests with empty strings. // function TestEmpty() { let nReps = 1; let bAllPassed = true; if (bComparePerformance) { nReps = nComparisonReps; } while (nReps--) { // A simple case. bAllPassed &= Confirm("", "abd", false); // Cases with repeating character sequences. bAllPassed &= Confirm("", "abcccd", false); bAllPassed &= Confirm("", "mississipissippi", false); bAllPassed &= Confirm("", "xxxxzzzzzzzzyfffff", false); bAllPassed &= Confirm("", "xxxxzzzzzzzzyf", false); bAllPassed &= Confirm("", "xxxxzzy.fffff", false); bAllPassed &= Confirm("", "xxxxzzzzzzzzyf", false); bAllPassed &= Confirm("", "xyxyxyzyxyz", false); bAllPassed &= Confirm("", "mississippi", false); bAllPassed &= Confirm("", "xyxyxyxyz", false); bAllPassed &= Confirm("", "m ississippi", false); bAllPassed &= Confirm("", "ababac*", false); bAllPassed &= Confirm("", "ababac", false); bAllPassed &= Confirm("", "aaazz", false); bAllPassed &= Confirm("", "1212", false); bAllPassed &= Confirm("", "a12b", false); bAllPassed &= Confirm("", "a12b12", false); // A mix of cases. bAllPassed &= Confirm("", "n", false); bAllPassed &= Confirm("", "aabab", false); bAllPassed &= Confirm("", "ar", false); bAllPassed &= Confirm("", "aaar", false); bAllPassed &= Confirm("", "XYXYXYZYXYz", false); bAllPassed &= Confirm("", "missisSIPpi", false); bAllPassed &= Confirm("", "mississipPI", false); bAllPassed &= Confirm("", "xyxyxyxyz", false); bAllPassed &= Confirm("", "miSsissippi", false); bAllPassed &= Confirm("", "miSsisSippi", false); bAllPassed &= Confirm("", "abAbac", false); bAllPassed &= Confirm("", "abAbac", false); bAllPassed &= Confirm("", "aAazz", false); bAllPassed &= Confirm("", "A12b123", false); bAllPassed &= Confirm("", "a12B12", false); bAllPassed &= Confirm("", "oWn", false); bAllPassed &= Confirm("", "bLah", false); bAllPassed &= Confirm("", "bLaH", false); // Both strings empty. bAllPassed &= Confirm("", "", true); // Another simple case. bAllPassed &= Confirm("abc", "", false); // Cases with repeating character sequences. bAllPassed &= Confirm("abcccd", "", false); bAllPassed &= Confirm("mississipissippi", "", false); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "", false); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "", false); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "", false); bAllPassed &= Confirm("xxxxzzzzzzzzyf", "", false); bAllPassed &= Confirm("xyxyxyzyxyz", "", false); bAllPassed &= Confirm("mississippi", "", false); bAllPassed &= Confirm("xyxyxyxyz", "", false); bAllPassed &= Confirm("m ississippi", "", false); bAllPassed &= Confirm("ababac", "", false); bAllPassed &= Confirm("dababac", "", false); bAllPassed &= Confirm("aaazz", "", false); bAllPassed &= Confirm("a12b12", "", false); bAllPassed &= Confirm("a12b12", "", false); bAllPassed &= Confirm("a12b12", "", false); // A mix of cases. bAllPassed &= Confirm("n", "", false); bAllPassed &= Confirm("aabab", "", false); bAllPassed &= Confirm("ar", "", false); bAllPassed &= Confirm("aar", "", false); bAllPassed &= Confirm("XYXYXYZYXYz", "", false); bAllPassed &= Confirm("missisSIPpi", "", false); bAllPassed &= Confirm("mississipPI", "", false); bAllPassed &= Confirm("xyxyxyxyz", "", false); bAllPassed &= Confirm("miSsissippi", "", false); bAllPassed &= Confirm("miSsissippi", "", false); bAllPassed &= Confirm("abAbac", "", false); bAllPassed &= Confirm("abAbac", "", false); bAllPassed &= Confirm("aAazz", "", false); bAllPassed &= Confirm("A12b12", "", false); bAllPassed &= Confirm("a12B12", "", false); bAllPassed &= Confirm("oWn", "", false); bAllPassed &= Confirm("bLah", "", false); bAllPassed &= Confirm("bLah", "", false); } return bAllPassed; } // Test runner for all the tests in the set. // function RunTests() { // Any test failure will set this value false. // We'll exit quickly after that. let bPassed = true; // Reset global values that store performance comparison results. tf_cumulative = tr_cumulative = 0; // Invoke the tests. if (bCompareTame) { bPassed = TestTame(); if (!bPassed) { $("testsetresult").firstChild.nodeValue = "Failed entirely-tame comparison test."; } } if (bPassed && bCompareEmpty) { bPassed = TestEmpty(); if (!bPassed) { $("testsetresult").firstChild.nodeValue = "Failed entirely-empty comparison test."; } } if (bPassed && bCompareWild) { bPassed = TestWild(); if (!bPassed) { $("testsetresult").firstChild.nodeValue = "Failed wildcard comparison test."; } } // When all tests pass, this is where we display the timings of the // routines being performance-compared. if (bPassed) { let strResult = "All tests passed"; if (bComparePerformance) { strResult += " (for both algorithms)"; } strResult += "."; $("testsetresult").firstChild.nodeValue = strResult; if (bComparePerformance) { // Show the timing results. $("time1").firstChild.nodeValue = "FastWildCompare: " + tf_cumulative.toFixed(2) + " ms"; $("time2").firstChild.nodeValue = "RegExWildCompare: " + tr_cumulative.toFixed(2) + " ms"; } } // Done with testing. Enable the button again. $("testset").disabled = false; return; } // Resets the DOM elements that may have been updated via the above code. // function ClearForm() { //$("compare_form").reset(); $("testsetresult").firstChild.nodeValue = ""; $("time1").firstChild.nodeValue = ""; $("time2").firstChild.nodeValue = ""; return; } // Displays a "please wait" status message. // Kicks off the set of tests defined by the global values set at TOF. // function MainTestRunner() { ClearForm(); // Disable the button that kicked off testing, until the testing is done. $("testset").disabled = true; if (bComparePerformance) { $("testsetresult").innerHTML = " — Running — Please wait — "; } // Start up the tests. setTimeout(RunTests, 100); return; } // Entry point. // window.onload = function() { $("testset").onclick = MainTestRunner; return; }
For performance comparison, the above code relies on a DOMHighResTimeStamp provided in most modern browsers. To get meaningful results, it’s necessary to disable the reduced timer precision feature that’s set by default in many browsers such as FireFox®. To disable the feature in FireFox, you can navigate to the about:config settings and switch off the reduced timer precision feature as follows:
To get similar timing resolution using Node.js®, you can replace the performance.now() calls in the above code with calls to process.hrtime(). A fairly high-resolution timer is necessary to single out the time spent in the calls to the routines we’re comparing, leaving out the time spent setting up calls into those routines and managing input and output / user interaction. Speaking of that, here’s HTML code that provides a “Run test set” button for use with the test suite code above.
Listing Eleven
‹!DOCTYPE HTML› ‹!-- FastWildCompareTest.html --› ‹!-- Copyright 2018 Zettalocity Software. --› ‹!-- Licensed under the Apache License, Version 2.0 (the "License"); --› ‹!-- you may not use this file except in compliance with the License. --› ‹!-- You may obtain a copy of the License at --› ‹!-- http://www.apache.org/licenses/LICENSE-2.0 --› ‹!-- Unless required by applicable law or agreed to in writing, software --› ‹!-- distributed under the License is distributed on an "AS IS" BASIS, --› ‹!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --› ‹!-- See the License for the specific language governing permissions and --› ‹!-- limitations under the License. --› ‹html› ‹head› ‹title›FastWildCompareTest.js‹/title› ‹meta charset="UTF-8"› ‹meta name="description" content="Example code for matching wildcards"› ‹meta name="keywords" content="HTML,CSS,JavaScript"› ‹meta name="author" content="Kirk J Krauss"› ‹meta name="viewport" content="width=device-width, initial-scale=1.0"› ‹link href="FastWildCompareTest.css" rel="stylesheet" /› ‹script src="FastWildCompare_min.js"›‹/script› ‹script src="RegExWildCompare_min.js"›‹/script› ‹script src="FastWildCompareTest.js"›‹/script› ‹/head› ‹body› ‹main› ‹h4›FastWildCompare.js v. RegExWildCompare.js — Matching wildcards algorithm comparison test‹/h4› ‹form id="compare_form" name="compare_form"› ‹br /›‹br /› ‹input type="button" id="testset" value="Run test set" /› ‹input type="button" id="clear" value="Clear" /›‹br /›‹br /› ‹label for="testsetresult"›Test set result: ‹/label› ‹p id="testsetresult"› ‹/p› ‹p id="time1"› ‹/p› ‹p id="time2"› ‹/p› ‹/form› ‹/main› ‹/body› ‹footer› ‹p›‹small›This matching wildcards algorithm comparison test covers both correctness and performance.‹/small›‹/p› ‹/footer› ‹/html›
The CSS code to accompany this can be the nearly same as in Listing Four (above), except the “#compare” label needs to be replaced with “#runtestset” to match the HTML form of Listing Eleven. With that, you can load the FastWildCompareTest.html page in a browser, to compare the performance of the two routines for matching wildcards. To run the test, you press the “Run test set” button and wait several seconds.
So how do the two routines compare? Here’s the result of a test run in 64-bit FireFox® Quantum 62.0.3 on an AMD FX-8300™ 3.3 GHz system running Windows 7®:
To summarize, with FireFox®, the routine I’ve ported from native code provides a performance boost of at least two orders of magnitude over the routine based on RegExp(). Of course, the results may vary across JavaScript implementations. The test also runs just fine with old-fashioned Internet Explorer® on Windows 7®, but both wildcard matching routines run longer, with the performance difference increased to fully three orders of magnitude plus some change. If you’re using a compiler for JavaScript, such as the NW.js compiler for Native Node Modules, then the overhead added by full RegExp() functionality could make the difference still greater. Plus you get both ‘?’ and ‘*’ wildcard support with the faster routine.
The pointer-based native code implementation of FastWildCompare() runs with performance that’s easily yet another order of magnitude improved over this JavaScript implementation running with FireFox. For that reason, if you need to perform a lot of wildcard matching, you might consider calling the native code, for example via a Node.js® addon. But if all you need is a reasonably efficient routine that can run on any system where there’s a JavaScript interpreter available, either of the routines tested here can do the job. While the FastWildCompare() routine offers single-character wildcard matching along with good performance by JavaScript standards, the RegExWildCompare() routine easily can be tweaked for case insensitivity or to make use of the other pattern matching options available in the regular expression toolkit. The performance of a routine based on regular expressions is certainly not bad for use with an input form, or with most any other one-time check that may happen now and then.
Complete source code associated with this article is available at GitHub > kirkjkrauss > MatchingWildcardsInJavaScript.
Copyright © 2018 developforperformance.com.
JavaScript® is a registered trademark of Oracle Corp. FireFox® is a registered trademark of the Mozilla Foundation. Internet Explorer® and Windows® are trademarks or registered trademarks of Microsoft Corp. AMD FX™ is a trademark or registered trademark of Advanced Micro Devices, Inc. Node.js® is a registered trademark of Joyent, Inc.