Fine tuning the enhanced suffix array
The enhanced suffix array is an indexing data structure used for a wide range of applications in Bioinformatics. It is basically the suffix array but enhanced with extra tables that provide extra information to improve the performance in theory and in practice. In this paper, we present a number of improvements to the enhanced suffix array: 1) We show how to find a pattern of length m in O(m) time, i.e., independent of the alphabet size. 2) We present an improved representation of the bucket table. 3) We improve the access time of addressing the LCP (longest common prefix) table when one byte per entry is used in representing it. The basic idea behind these improvements is the extensive use of the minimal perfect hashing technique, by which n static items can be stored in linear space while retaining O(1) access time. © 2008 IEEE.