Trie and Ternary Search Tree

Data Structure - Trie

Introduction

Trie (digital tree, radix tree or prefix tree) is a tree data structure that is used to store a set of strings or string/value pairs. Each node of the tree corresponds to a prefix of a string from the set. The root of the tree represents an empty prefix. All the descendants of a node share the common prefix associated with that node.
Trie can be used to efficiently perform the following operations:

  • Check if a given string belongs to the set.
  • Find all strings that start from a given prefix.
  • Find all strings that match given pattern (wildcard or regular expression).
  • Find all approximate matches from a given string, i.e. implement fuzzy search.
Trie is often used as a data structure to store dictionary of words. It is often found in the following applications: full-text index, syntax correction programs, auto suggestion lists.

Nodes in the trie can have a lot of children. For a UCS-2 encoded strings, each node might have up to 65536 children. This is unfortunate and in practical applications strings are usually encoded using variable length encoding schemes. For example, Hu-Tucker encoding.

Ternary search tree is a balance between binary search tree and trie: it is space efficient and supports the same operations as a trie. Each node in ternary search tree can have up to three children.

Implementation

Both implemented data structures support fuzzy search based on Levenshtein distance, see LevenshteinMatcher class in Trie.cs.

Problems

Links

Trie build from words:
  • Petro
  • Mariya
  • Sophie
Last updated: 18.12.2020,
created: 19.12.2017.