Content area
Full Text
A simple wildcard text-matching algorithm in a single while loop
Some programs need to compare a lot of text strings quickly. Text output filters, search engines, and natural-language processors are examples of applications where rapid text string comparison is important. Besides simple character-by-character comparisons, many text comparison routines need to support matching on wildcard characters. One of the strings might treat the "*" character as a special wildcard character; for example, *sip* should match mississippi.
When I searched for a textbook wildcard string-matching algorithm that would fit this need, I discovered that a good algorithm is harder to find than I had expected. The algorithms that I found posted on the Internet were buggy, slow, and/or needlessly complex-typically all three. So I developed my own. Some colleagues then told me that I could have found similar algorithms among open-source regular expression matching routines. I looked at some of those routines, but the ones that I found use recursion. I find recursive algorithms difficult to understand and debug. They're also exposed to stack overflows; for example, if the input strings are lacking proper termination. It's difficult, if not impossible, to prevent stack overflows in a recursive algorithm and still keep it performing fast-and stack overflows are often fatal even in the face of exception-handling logic. When no stack space is available, it's tricky to do so much as output a diagnostic message. There's no such difficulty with my nonrecursive approach.
Listing One is a coded-up version of my algorithm,...