Efficient exact edit similarity query processing with the asymmetric signature scheme


More information here

Given a query string Q, an edit similarity search finds all strings in a database whose edit distance with Q is no more than a given threshold t. Most existing method answering edit similarity queries rely on a signature scheme to generate candidates given the query string. We observe that the number of signatures generated by existing methods is far greater than the lower bound, and this results in high query time and index space complexities.

In this paper, we show that the minimum signature size lower bound is t + 1. We then propose asymmetric signature schemes that achieve this lower bound. We develop efficient query processing algorithms based on the new scheme. Several dynamic programming-based candidate pruning methods are also developed to further speed up the performance. We have conducted a comprehensive experimental study in- volving nine state-of-the-art algorithms. The experiment results clearly demonstrate the efficiency of our methods.