Algorithms for the approximate string matching
This topic was presented at Search Engines Amsterdam Meetup
(28 October 2016, Academisch-cultureel centrum SPUI25). ![]()
Summary:
The approximate string matching or the fuzzy search is a technique of finding strings in the text
or in the dictionary that match a pattern approximately. The most common applications of a fuzzy
search are the spell checking, the syntax error correction, the spam filtering and the correction
of OCR errors.
In this talk we present an overview of the approximate string matching algorithms that
use Levenshtein metric. About presenter Petro Protsyk obtained Master's Degree in Computer Science from Taras Shevchenko National University of Kyiv. Since 2006 Petro works as a professional software developer. He joined ZyLAB in 2010 to work on the company's proprietary Search Engine. For more information about ZyLAB, see: http://www.zylab.com Links Implementation of the presented approximate string matching algorithms in C# Levenshtein distance between two strings using brute-force recursive algorithm (source). Levenshtein distance between two strings using dynamic programming algorithm (source). Check if two strings are within specified Levenshtein distance from each other using shift-and algorithm also known as Bitap or Baeza-Yates-Gonnet algorithm (source). Check if two strings are within specified Levenshtein distance from each other using Levenshtein automaton (source). |