Authors
William H Press, John A Hawkins
Publication date
2018/12/3
Journal
arXiv preprint arXiv:1812.01112
Description
Engineered DNA is an information channel. One can convert an arbitrary message into a string of DNA characters, or bases, {A, C, G, T}; synthesize the string into a physical DNA sample; store or transport the sample through space and time; sequence it back to a string of characters; and then hope to recover exactly the original message. Because errors are introduced at every stage of synthesis, storage, and sequencing, an error-correcting code (ECC) is needed both when message bits are converted to DNA characters (encoding) and later, when DNA characters are converted back to message bits (decoding). The ECC needs to correct three kinds of errors: substitutions of one base by another, spurious insertions of bases, and deletions of bases from the message. Insertions and deletions are commonly termed “indels”.
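To make the channel picture concrete, the following is a minimal Python sketch and not the authors' method. It assumes a naive mapping of two bits per base (00→A, 01→C, 10→G, 11→T) with no error correction at all, and a toy channel that applies the three error types independently at each position. The function names `bits_to_dna`, `dna_to_bits`, and `noisy_channel`, and the rates `p_sub`, `p_ins`, and `p_del`, are hypothetical names introduced for illustration only.

```python
import random

BASES = "ACGT"

def bits_to_dna(bits: str) -> str:
    """Encode a bit string as bases, two bits per base (assumed mapping)."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))

def dna_to_bits(dna: str) -> str:
    """Invert bits_to_dna; correct only if the strand arrived error-free."""
    return "".join(format(BASES.index(base), "02b") for base in dna)

def noisy_channel(dna: str, p_sub: float, p_ins: float, p_del: float,
                  rng: random.Random) -> str:
    """Toy channel: per-position insertions, deletions, and substitutions."""
    out = []
    for base in dna:
        if rng.random() < p_ins:
            # Spurious insertion of a random base before this position.
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < p_del:
            # Silent deletion: the base is dropped with no marker.
            continue
        if r < p_del + p_sub:
            # Substitution by one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(0)
    sent = bits_to_dna("0001101100011011")
    received = noisy_channel(sent, p_sub=0.05, p_ins=0.05, p_del=0.05, rng=rng)
    print(sent, received)
```

With this naive mapping, a single indel shifts every later base, so the decoded bits after that point no longer line up with the original message. That misalignment is what distinguishes indels from substitutions, which corrupt only the position where they occur.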
The correction of substitutions is a standard problem in coding theory, where substitutions are termed “errors”. The overarching theoretical framework for coding theory starts with Shannon [1], and there exist hundreds, if not thousands, of well-studied error-correcting codes (ECCs) [2, 3, 4, 5]. However, established methods for error correction in the case of silent deletions (termed deletion channels) are few, and there are virtually no established methods for channels with all three of deletions, insertions, and substitutions. (See [6] and [7] for reviews and references.) Indeed, no approaches