Authors
Avrim Blum, Tao Jiang, Ming Li, John Tromp, Mihalis Yannakakis
Publication date
1994/7/1
Journal
Journal of the ACM (JACM)
Volume
41
Issue
4
Pages
630-647
Publisher
acm
Description
We consider the following problem: given a collection of strings s1,…, sm, find the shortest string s such that each si appears as a substring (a consecutive block) of s. Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of (distinct) strings with maximum overlap until only one string remains. Let n denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length O(n) (in fact, 2n), yet the only previous nontrivial bound known for any polynomial-time algorithm is a recent O(n log n) result.
We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show …
Total citations
199319941995199619971998199920002001200220032004200520062007200820092010201120122013201420152016201720182019202020212022202320241017261510987571115141112121341071097146114115734
Scholar articles
A Blum, T Jiang, M Li, J Tromp, M Yannakakis - Journal of the ACM (JACM), 1994