Algorithms for the approximate string matching
This topic was presented at Search Engines Amsterdam Meetup
(28 October 2016, Academisch-cultureel centrum SPUI25).
The approximate string matching or the fuzzy search is a technique of finding strings in the text
or in the dictionary that match a pattern approximately. The most common applications of a fuzzy
search are the spell checking, the syntax error correction, the spam filtering and the correction
of OCR errors.
At ZyLAB we are developing the enterprise level Full-Text Search Engine that supports a boolean
and proximity operators, wildcards, regular expressions and a fuzzy search. Over the years the
company has gained a lot of experience in this area. ZyLAB Search Engine is built upon state of the art
algorithms and data structures.
About presenter Petro Protsyk obtained Master's Degree in Computer Science from Taras Shevchenko National University of Kyiv. Since 2006 Petro works as a professional software developer. He joined ZyLAB in 2010 to work on the company's proprietary Search Engine. During the last 6 years Petro focused on building distributed solutions for processing large volumes of data and implementing a new full-text and audio search engines for ZyLAB.
For more information about ZyLAB, see: http://www.zylab.com
Implementation of the presented approximate string matching algorithms in C#
Levenshtein distance between two strings using brute-force recursive algorithm (source).
Levenshtein distance between two strings using dynamic programming algorithm (source).
Check if two strings are within specified Levenshtein distance from each other using shift-and algorithm also known as Bitap or Baeza-Yates-Gonnet algorithm (source).
Check if two strings are within specified Levenshtein distance from each other using Levenshtein automaton (source).